Gates Foundation

2mo ago

Public

11 Gem3Flash -Swahili (A2T) (Copy)

2mo ago

Public

10 Kikuyu (MMS+GhanaMT+GPT4.1) (A2T)

2mo ago

Public

9 Kikuyu (SnbrdV2+Gem3flash) (A2T)

2mo ago

Public

8 Kikuyu (Gemini 3 Flash) (A2T)

2mo ago

Public

10 Jacaranda + GPT4.1 -Swahili (A2T)

2mo ago

176 runs

Public

9 Jac+Gem3Flash -Swahili (A2T)

2mo ago

175 runs

Public

Eval: ASR to golden health English answer for Africa

⚖️ Eval

A bulk evaluator workflow that compares AI-generated answers (copilot responses) to a set of golden reference answers. Requires input data columns: "input_prompt" (the question/task) and "reference_answer" (the ideal response). The workflow uses custom evaluation prompts to compare outputs, scoring them for accuracy and penalizing hallucinations. Aggregates results to provide an overall performance metric for your AI answers.
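
A minimal sketch of the bulk comparison described above, under stated assumptions. The column names `input_prompt` and `reference_answer` come from the description; `token_f1`, `evaluate`, and `answer_fn` are hypothetical names. Token-overlap F1 is only a stand-in for the workflow's custom evaluation prompts, which are not reproduced here.

```python
from collections import Counter
from statistics import mean


def token_f1(candidate: str, reference: str) -> float:
    """Stand-in scorer (assumption): token-overlap F1 in [0, 1].

    This is NOT the workflow's prompt-based judge; it only fills the same slot.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Count shared tokens, respecting how often each token appears.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def evaluate(rows, answer_fn, score_fn=token_f1):
    """Score every row and aggregate into one overall metric.

    rows: dicts with "input_prompt" and "reference_answer" keys.
    answer_fn: hypothetical callable that returns the AI answer for a prompt.
    """
    scores = [
        score_fn(answer_fn(row["input_prompt"]), row["reference_answer"])
        for row in rows
    ]
    return {"scores": scores, "mean_score": mean(scores) if scores else 0.0}
```

Swapping `score_fn` for a call to an LLM judge keeps the aggregation step unchanged, which is the part this sketch is meant to show.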

2mo ago

Public

(25Qs) Swahili Audio to Text Benchmark | Gemini 3 Pro, GPT‑4o, Jacaranda & …

Computational Mama aka Ambika

🌍 Africa

🇰🇪 Kenya

(Updated Jan 2026)
This page benchmarks several Swahili (Kiswahili) speech‑to‑text systems and, in some cases, Swahili → English translation pipelines.
Every system receives the same Swahili audio clips. Each system’s text output is then compared with a reference answer and given a score between 0 and 1.
A higher score means the output is closer to the reference text, which usually indicates higher accuracy.

No.	Workflow	Accuracy (Mean)	Median Latency (s)
0	GPT-4o Audio	0.50	5.49
1	GPT-Realtime	0.45	5.13
2	Jacaranda + GPT-5.1	0.94	4.05
3	Jacaranda + Gemini 3 Pro	0.96	8.84
4	Jacaranda + GPT-5.1 + Goog MT	0.91	4.13
5	Omni + GPT-5.1 + GoogMT	0.92	4.87
6	Omni + Gemini 3 Pro	0.96	8.54
7	Omni + Gemini 3 Pro + GoogM	0.96	9.12
8	Gemini 3 Pro	0.92	9.80
9	Jacaranda + Gemini 3 Flash	0.92	5.32
10	Jacaranda + GPT-4.1	0.89	3.62
11	Gemini 3 Flash	0.85	5.95
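
The two numeric columns above are aggregates over the individual audio clips. A minimal sketch of how such a summary could be computed, assuming per-run records with `workflow`, `score`, and `latency_s` fields; these field names and the `summarize` helper are hypothetical, and the platform's own export format may differ.

```python
from collections import defaultdict
from statistics import mean, median


def summarize(runs):
    """Group per-clip runs by workflow and compute the two table columns.

    runs: iterable of dicts with "workflow", "score" (0 to 1) and
    "latency_s" (seconds) keys. The field names are assumptions.
    """
    by_workflow = defaultdict(list)
    for run in runs:
        by_workflow[run["workflow"]].append(run)

    table = []
    for name, group in by_workflow.items():
        table.append({
            "workflow": name,
            # Mean accuracy across all clips for this workflow.
            "accuracy_mean": round(mean(r["score"] for r in group), 2),
            # Median latency is less sensitive to one slow run than the mean.
            "latency_median_s": round(median(r["latency_s"] for r in group), 2),
        })
    # Highest mean accuracy first.
    return sorted(table, key=lambda row: row["accuracy_mean"], reverse=True)
```

Reporting the median rather than the mean for latency keeps a single slow request from distorting the comparison.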

On this page you can:

See which Swahili system or pipeline gets the best score
Compare different Swahili ASR and Swahili→English models side by side
Choose the best system for your app, call center, research, or product
Download all results for deeper analysis and custom reporting

3mo ago

Public

(20Qs) Hindi Audio to Text Benchmark | Gemini 3 Pro, GPT‑4o, GPT 5.1

🇮🇳 India

(20 Qs Updated Jan 2025)

This page benchmarks several Hindi speech‑to‑text systems and, in some cases, Hindi → English translation pipelines.

Every system receives the same Hindi audio clips. Each system’s text output is then compared with a reference answer and given a score between 0 and 1.
A higher score means the output is closer to the reference text, which usually indicates higher accuracy.

No.	Workflow	Accuracy (Mean)	Median Latency (s)
0	GPT4oAudio	0.77	7.33
1	GPTRealtime	0.73	6.64
2	GPT5.1	0.92	5.69
3	GPT4.1	0.90	5.67
4	Gemini 3 Pro	0.96	9.88
5	Gemini 3 Flash	0.92	7.58
6	Sarvam.AI	0.55	5.98
7	Omnilingual+GPT5-mini	0.96	9.00
8	Omnilingual+Gemini 3 Pro	0.93	10.14
9	Omnilingual+Gemini 3 Flash	0.91	7.60
10	MMS+GoogMT+GPT4.1	0.91	4.94

On this page you can:

See which Hindi system or pipeline gets the best score
Compare different Hindi ASR and Hindi→English models side by side
Choose the best system for your app, call center, research, or product
Download all results for deeper analysis and custom reporting
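
Choosing a system usually means trading accuracy against latency. The sketch below applies one possible rule to the Hindi figures listed above: keep only workflows under a latency budget, then take the highest mean accuracy, breaking ties by lower latency. The `best_under_budget` helper and the budget values are illustrative assumptions, not part of the benchmark.

```python
# (workflow, mean accuracy, median latency in seconds), copied from the Hindi table.
HINDI_RESULTS = [
    ("GPT4oAudio", 0.77, 7.33),
    ("GPTRealtime", 0.73, 6.64),
    ("GPT5.1", 0.92, 5.69),
    ("GPT4.1", 0.90, 5.67),
    ("Gemini 3 Pro", 0.96, 9.88),
    ("Gemini 3 Flash", 0.92, 7.58),
    ("Sarvam.AI", 0.55, 5.98),
    ("Omnilingual+GPT5-mini", 0.96, 9.00),
    ("Omnilingual+Gemini 3 Pro", 0.93, 10.14),
    ("Omnilingual+Gemini 3 Flash", 0.91, 7.60),
    ("MMS+GoogMT+GPT4.1", 0.91, 4.94),
]


def best_under_budget(results, max_latency_s):
    """Return the most accurate workflow within the latency budget.

    Ties on accuracy go to the faster workflow. Returns None when no
    workflow fits the budget.
    """
    eligible = [r for r in results if r[2] <= max_latency_s]
    if not eligible:
        return None
    return max(eligible, key=lambda r: (r[1], -r[2]))


print(best_under_budget(HINDI_RESULTS, 6.0))   # ('GPT5.1', 0.92, 5.69)
print(best_under_budget(HINDI_RESULTS, 10.0))  # ('Omnilingual+GPT5-mini', 0.96, 9.0)
```

With a 6-second budget the rule picks GPT5.1; with a 10-second budget the two 0.96 workflows tie on accuracy and the faster one wins.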

4mo ago

Public

10 Omni+Deepseek3.2 Kinyarwanda

Computational Mama aka Ambika

4mo ago

101 runs

Public

16 Mbaza+GPTOSS 120B Kinyarwanda

Computational Mama aka Ambika

4mo ago

51 runs

Public

(25Qs) Kikuyu Audio to Text Benchmark | GPT5.1, Gemini 3 Pro, Sunbird v2 & …

(25 Qs Updated December 2025)

This page benchmarks several Kikuyu speech‑to‑text and Kikuyu → English systems.

Every system receives the same Kikuyu audio. Each system’s text output is then compared with a reference answer and given a score between 0 and 1.
A higher score means the output is closer to the reference text, which usually indicates higher accuracy.

Ranking Table

#	Workflow	Accuracy (Mean)	Median Latency (s)
0	GPT‑Realtime	0.05	5.06
1	SunbirdV2 + Goog MT + Gem3Pro	0.78	12.38
2	SunbirdV2 + GPT5.1	0.57	4.85
3	SunbirdV2 + Gem3Pro	0.83	11.96
4	Omni + Gem3pro	0.78	13.84
5	Omni + Goog MT + Gem3Pro	0.74	12.75
6	Omni + GPT5.1	0.18	9.73
7	Gemini 3 Pro	0.81	12.02
8	Gemini 3 Flash	0.38	7.80
9	SunbirdV2 + Gem3Flash	0.75	7.23
10	Meta MMS + GPT4.1 + GhanaNLP MT	0.56	3.69

You can use this page to:

See which system gets the best score
Compare different models and pipelines side by side
Choose the best system for your app, research, or product
Download all results for deeper analysis
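
When comparing pipelines side by side, it can help to drop every workflow that another workflow beats on both accuracy and latency. A minimal sketch using the Kikuyu figures listed above; `pareto_front` is a hypothetical helper, and this filtering step is an illustration rather than something the benchmark page performs.

```python
# (workflow, mean accuracy, median latency in seconds), copied from the Kikuyu table.
KIKUYU_RESULTS = [
    ("GPT-Realtime", 0.05, 5.06),
    ("SunbirdV2 + Goog MT + Gem3Pro", 0.78, 12.38),
    ("SunbirdV2 + GPT5.1", 0.57, 4.85),
    ("SunbirdV2 + Gem3Pro", 0.83, 11.96),
    ("Omni + Gem3pro", 0.78, 13.84),
    ("Omni + Goog MT + Gem3Pro", 0.74, 12.75),
    ("Omni + GPT5.1", 0.18, 9.73),
    ("Gemini 3 Pro", 0.81, 12.02),
    ("Gemini 3 Flash", 0.38, 7.80),
    ("SunbirdV2 + Gem3Flash", 0.75, 7.23),
    ("Meta MMS + GPT4.1 + GhanaNLP MT", 0.56, 3.69),
]


def pareto_front(results):
    """Return workflows not beaten on both accuracy and latency.

    Walk the workflows from fastest to slowest and keep one only if it is
    more accurate than every faster workflow seen so far.
    """
    ordered = sorted(results, key=lambda r: (r[2], -r[1]))
    front = []
    best_accuracy = float("-inf")
    for name, accuracy, _latency in ordered:
        if accuracy > best_accuracy:
            front.append(name)
            best_accuracy = accuracy
    return front


for name in pareto_front(KIKUYU_RESULTS):
    print(name)
```

On these figures the front contains four workflows, from the fastest (Meta MMS + GPT4.1 + GhanaNLP MT) to the most accurate (SunbirdV2 + Gem3Pro); every other workflow is both slower and less accurate than one of them.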

5mo ago

Public

(20Qs) English Audio to Text Benchmark | Gemini 3 Pro, GPT‑5.2, Llama 4, …

(Updated Jan 2025)
This page benchmarks several English speech‑to‑text systems.
Every system receives the same English audio. Each system’s text output is then compared with a reference answer and given a score between 0 and 1.
A higher score means the output is closer to the reference text, which usually indicates higher accuracy.

Ranking

Workflow	Accuracy (mean score)	Latency (median, s)
GPT-4o Audio	0.93	7.75
GPT-Realtime	0.93	7.70
GPT‑5.2	0.91	5.99
Gemini 3 Pro	0.86	10.16
Llama 4	0.91	5.77
DeepSeek 3.2	0.88	6.17
Gemini 3 Flash	0.89	7.77
GPT‑4.1	0.92	6.02

With this benchmark, you can:

See which system gets the best score
Compare different models and pipelines side by side
Choose the best system for your app, research, or product
Download all results for deeper analysis

5mo ago

Public

(25Qs) Kinyarwanda Audio to Text Benchmark | Gemini 3 Pro, GPT‑4o, Mbaza, Sun …