Examples: Eval

Our general bulk evaluator compares AI-generated copilot answers against a collection of golden answers.

This recipe is used with https://gooey.ai/bulk to evaluate the latest private and open-source speech recognition models (from Google, Meta, OpenAI, and others). It takes a CSV file of golden (i.e., human-provided) translations and compares them against a set of AI-generated translations, producing scores from 0 to 1. It then takes the mean of the scores to determine which model performed best.
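The scoring step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the recipe's actual implementation: the page does not say which similarity metric is used, so the sketch substitutes `difflib.SequenceMatcher.ratio()` from the Python standard library as a stand-in that also yields values from 0 to 1. The column names (`golden`, `model_a`, `model_b`) and the sample rows are hypothetical.

```python
import csv
import io
from difflib import SequenceMatcher
from statistics import mean

# Hypothetical CSV: one golden column plus one column per model's output.
SAMPLE_CSV = """golden,model_a,model_b
hello world,hello world,hello word
good morning,good morning,good evening
"""


def similarity(golden: str, candidate: str) -> float:
    """Stand-in metric (an assumption): character-level similarity in [0, 1]."""
    a = golden.strip().lower()
    b = candidate.strip().lower()
    return SequenceMatcher(None, a, b).ratio()


def mean_scores(csv_text: str, golden_col: str = "golden") -> dict:
    """Score every model column against the golden column, then average per model."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    model_cols = [c for c in reader.fieldnames if c != golden_col]
    return {
        model: mean(similarity(row[golden_col], row[model]) for row in rows)
        for model in model_cols
    }


scores = mean_scores(SAMPLE_CSV)
# The model with the highest mean score is treated as the best performer.
best_model = max(scores, key=scores.get)
print(scores, best_model)
```

Any metric bounded between 0 and 1 could replace `similarity` here; the per-model averaging and the final comparison stay the same.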

9mo ago

160 runs

Here we compare the top 5 ASR models on a set of Telugu samples. The speech output was created with https://gooey.ai/bulk/?example_id=nrkx2u17

2y ago

258 runs

Here we compare the top 3 ASR models on a set of Kannada samples. The speech output was created with https://gooey.ai/bulk/?example_id=m8c3mb98

2y ago

Here we compare the top 6 ASR models on a set of Hindi samples. The speech translations were created with https://gooey.ai/bulk/?example_id=ueki9up0

2y ago