(20Qs) English Audio to Text Benchmark | Gemini 3 Pro, GPT‑5.2, Llama 4, DeepSeek v3.2

(Updated Jan 2025)
This page compares eight English speech‑to‑text systems on accuracy and latency.
English Benchmark
We use the same English audio for every system. Then we compare each system’s text to a reference answer and give it a score between 0 and 1.
A higher score means the system's text is closer to the reference text, which usually indicates a more accurate system.
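
The page does not say which metric produces the 0 to 1 score, so the following is only a rough sketch of the idea. It assumes a word-level similarity from Python's standard `difflib` module as a stand-in; the function names `normalize` and `similarity_score` and the example sentences are hypothetical and not part of the benchmark.

```python
import re
from difflib import SequenceMatcher


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation (keeping apostrophes), and split into words."""
    cleaned = re.sub(r"[^\w\s']", " ", text.lower())
    return cleaned.split()


def similarity_score(hypothesis: str, reference: str) -> float:
    """Return a word-level similarity in [0, 1]; 1.0 means identical word sequences.

    Assumption: SequenceMatcher.ratio() is used here as an illustrative
    stand-in, not the metric the benchmark actually uses.
    """
    return SequenceMatcher(None, normalize(hypothesis), normalize(reference)).ratio()


if __name__ == "__main__":
    # Hypothetical example sentences, not taken from the benchmark audio.
    reference = "Please turn off the lights in the kitchen."
    hypothesis = "please turn of the lights in the kitchen"
    print(similarity_score(hypothesis, reference))
```

Normalizing case and punctuation before comparing keeps the score focused on the words themselves rather than on formatting differences between systems.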

Ranking

Workflow          Accuracy (mean score)   Latency (median, s)
GPT-4o Audio      0.93                    7.75
GPT-Realtime      0.93                    7.70
GPT‑5.2           0.91                    5.99
Gemini 3 Pro      0.86                    10.16
Llama 4           0.91                    5.77
DeepSeek 3.2      0.88                    6.17
Gemini 3 Flash    0.89                    7.77
GPT‑4.1           0.92                    6.02
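
The two result columns are aggregates: accuracy is a mean of scores and latency is a median of run times. As a minimal sketch of that aggregation, assuming each run yields one score and one latency per workflow, the per-run rows could be summarized as below. The `aggregate` function, the row layout, and the sample numbers are all made up for illustration and are not benchmark data.

```python
from collections import defaultdict
from statistics import mean, median


def aggregate(rows):
    """Group (workflow, score, latency_s) rows and return
    {workflow: (mean score, median latency in seconds)}, rounded to 2 places."""
    scores = defaultdict(list)
    latencies = defaultdict(list)
    for workflow, score, latency_s in rows:
        scores[workflow].append(score)
        latencies[workflow].append(latency_s)
    return {
        name: (round(mean(scores[name]), 2), round(median(latencies[name]), 2))
        for name in scores
    }


# Made-up per-run results for two placeholder workflows (not benchmark data).
sample_rows = [
    ("workflow_a", 1.0, 5.0),
    ("workflow_a", 0.8, 7.0),
    ("workflow_a", 0.9, 30.0),  # one slow outlier run
    ("workflow_b", 0.5, 2.0),
    ("workflow_b", 0.7, 4.0),
]

print(aggregate(sample_rows))
```

A median is less sensitive than a mean to a single slow run, such as the 30-second outlier in the sample rows, which is one reason it suits latency.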

With this benchmark, you can:

  • See which system gets the best score
  • Compare different models and pipelines side by side
  • Choose the best system for your app, research, or product
  • Download all results for deeper analysis
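
As one possible way to act on the points above, the sketch below ranks the published results and picks the fastest system that meets an accuracy floor. The numbers are copied from the Ranking table; the helper names `rank` and `fastest`, the latency tie-break, and the thresholds are assumptions made for illustration.

```python
# (workflow, mean accuracy score, median latency in seconds), from the Ranking table.
RESULTS = [
    ("GPT-4o Audio", 0.93, 7.75),
    ("GPT-Realtime", 0.93, 7.70),
    ("GPT-5.2", 0.91, 5.99),
    ("Gemini 3 Pro", 0.86, 10.16),
    ("Llama 4", 0.91, 5.77),
    ("DeepSeek 3.2", 0.88, 6.17),
    ("Gemini 3 Flash", 0.89, 7.77),
    ("GPT-4.1", 0.92, 6.02),
]


def rank(results):
    """Sort by accuracy (highest first); break ties by lower latency (an assumption)."""
    return sorted(results, key=lambda r: (-r[1], r[2]))


def fastest(results, min_accuracy):
    """Return the lowest-latency row with accuracy >= min_accuracy, or None."""
    eligible = [r for r in results if r[1] >= min_accuracy]
    return min(eligible, key=lambda r: r[2]) if eligible else None


if __name__ == "__main__":
    for name, accuracy, latency_s in rank(RESULTS):
        print(f"{name:<15} {accuracy:.2f}  {latency_s:.2f}s")
    print("Fastest with accuracy >= 0.90:", fastest(RESULTS, 0.90))
```

Whether accuracy or latency should dominate depends on the application, so the accuracy floor is left as a parameter rather than fixed.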

Evaluation Workflows

Run: Compare Output Text (from input_audio), Aggregate: Mean
Run: Compare Run Time, Aggregate: Median