Gooey.AI

Speech Recognition Model Evaluator

This recipe is used with https://gooey.ai/bulk to evaluate the latest private and open-source speech recognition models (from Google, Meta, OpenAI, and others). It takes a CSV file of golden (human-provided) translations, compares them against a set of AI-generated translations, and scores each one from 0 to 1. It then takes the mean of each model's scores to determine which model performed best.

5mo ago

Input Data Spreadsheet

Upload or link to a CSV file or Google Sheet that contains your sample input data.
For example, for Copilot this would be sample questions; for Art QR Code, it would be pairs of image descriptions and URLs.
Remember to include header names in your CSV too.
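As a minimal sketch of such an input file, the Python snippet below writes a CSV with a header row using the standard csv module. The column names and the sample row are hypothetical: "English" stands in for the golden (human) text, and each model column contains the substring ") Output Text", which is what the evaluation prompt on this page filters on.

```python
import csv
import io

# Hypothetical headers: "English" holds the golden text; every model column
# contains ") Output Text" so the evaluation prompt's filter can find it.
HEADER = [
    "English",
    "Model A (Vendor A) Output Text",
    "Model B (Vendor B) Output Text",
]

# Illustrative row only, not real model output.
ROWS = [
    ["Where is the nearest clinic?", "Where is the closest clinic?", "Where clinic near?"],
]


def build_input_csv(header, rows):
    """Return CSV text: one header row followed by the data rows."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)  # header names are required by the recipe
    writer.writerows(rows)
    return buf.getvalue()


csv_text = build_input_csv(HEADER, ROWS)
print(csv_text)
```

The resulting text can be saved as a .csv file or pasted into a Google Sheet before uploading.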


Input Data Preview


Language Model

Evaluation Prompts

Specify custom LLM prompts that calculate metrics to evaluate each row of the input data. Each prompt's output should be a JSON object mapping metric names to values.
The columns dictionary can be used to reference spreadsheet columns by their header names, e.g. columns["English"].
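To illustrate what the template does with the columns dictionary, here is a rough plain-Python equivalent of the prompt's loops, assuming each row is exposed as a dict mapping header name to cell value. It keeps only columns whose name contains ") Output Text" and lists them under the golden text. The function names are hypothetical, and the real Jinja-style rendering also emits extra blank lines around the {% %} tags.

```python
def model_columns(columns):
    """Mimic the template's `if ") Output Text" in col` filter."""
    return {col: val for col, val in columns.items() if ") Output Text" in col}


def render_models_block(columns):
    """Build the 'col: val' lines the template emits, one per model column."""
    return "\n".join(f"{col}: {val}" for col, val in model_columns(columns).items())


# Hypothetical row; keys are the spreadsheet headers.
row = {
    "English": "Where is the nearest clinic?",
    "Model A (Vendor A) Output Text": "Where is the closest clinic?",
    "Model B (Vendor B) Output Text": "Where clinic near?",
}

print(f"Golden: {row['English']}")
print("Models:")
print(render_models_block(row))
```

Because the filter is a substring match, any column that happens to contain ") Output Text" will be treated as a model output, so header names need to be chosen with that in mind.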

Your job is to assess AI translations by different models. As a JSON object, give a short rationale, your selection of the best candidate, and a continuous score between 0 and 1 for each model, like so:
{ 
"rationale": string, 
"best_model": string, 
{% for col, val in columns.items() %}
{% if ") Output Text" in col %}
"{{ col }}": float
{% endif %}
{% endfor %}
}
If the model's output is not in English, consider it a bad output.

Golden: {{ columns["English"] }}
Models:
{% for col, val in columns.items() %}
{% if ") Output Text" in col %}
{{ col }}: {{ val }}
{% endif %}
{% endfor %}

Score the translations vs the Golden Output Text with continuous scores from 0 to 1 based on the following guidelines:

0 means "Nonsense/No meaning preserved", 
.2 means "Little meaning preserved and not understandable", 
.4 means "Some meaning preserved and understandable", 
.6 means "Some meaning preserved and mostly accurate", 
.8 means "Most meaning preserved with few grammar mistakes", and 
1 means "Perfect meaning and grammar".

JSON:
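The prompt asks the model for a JSON reply, so a caller would typically validate it before aggregating. The sketch below is one way to do that in Python, under stated assumptions: the reply is bare JSON (no surrounding prose or code fences), the score keys are exactly the model column names, and parse_evaluation and nearest_rubric_label are hypothetical helpers, not part of Gooey.AI's API. The rubric table copies the anchors from the scoring guidelines above.

```python
import json

# Rubric anchors copied from the scoring guidelines in the prompt.
RUBRIC = {
    0.0: "Nonsense/No meaning preserved",
    0.2: "Little meaning preserved and not understandable",
    0.4: "Some meaning preserved and understandable",
    0.6: "Some meaning preserved and mostly accurate",
    0.8: "Most meaning preserved with few grammar mistakes",
    1.0: "Perfect meaning and grammar",
}


def parse_evaluation(reply, model_cols):
    """Parse the LLM's JSON reply and check it has the requested shape."""
    data = json.loads(reply)
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    for key in ("rationale", "best_model"):
        if not isinstance(data.get(key), str):
            raise ValueError(f"missing or non-string {key!r}")
    scores = {}
    for col in model_cols:
        value = data.get(col)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"missing or non-numeric score for {col!r}")
        if not 0.0 <= value <= 1.0:  # also rejects NaN
            raise ValueError(f"score for {col!r} outside [0, 1]: {value}")
        scores[col] = float(value)
    return data["rationale"], data["best_model"], scores


def nearest_rubric_label(score):
    """Return the rubric description whose anchor is closest to score."""
    anchor = min(RUBRIC, key=lambda a: abs(a - score))
    return RUBRIC[anchor]


# Illustrative reply, not real model output.
cols = ["Model A (Vendor A) Output Text", "Model B (Vendor B) Output Text"]
reply = json.dumps({
    "rationale": "A keeps the full meaning; B drops words.",
    "best_model": cols[0],
    cols[0]: 0.95,
    cols[1]: 0.4,
})
rationale, best, scores = parse_evaluation(reply, cols)
for col, score in scores.items():
    print(col, score, nearest_rubric_label(score))
```

Rejecting out-of-range or missing scores up front keeps a single malformed reply from silently skewing the mean.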

Aggregations

Aggregate the metric values across all rows using one or more operations (computed with pandas).
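Since the aggregation step uses pandas, the "Mean" aggregate corresponds roughly to a column-wise DataFrame.mean() over the per-row scores. The sketch below assumes the scores have already been collected into one column per model; the column names and values are made up for illustration, and picking the winner with idxmax() is an assumption about how "which model performed best" is decided.

```python
import pandas as pd

# Hypothetical per-row scores in [0, 1], one column per model.
scores = pd.DataFrame({
    "Model A (Vendor A) Output Text": [0.9, 0.8, 1.0],
    "Model B (Vendor B) Output Text": [0.4, 0.6, 0.2],
})

# "Aggregate: Mean": average each model's column over all rows.
mean_scores = scores.mean()
best_model = mean_scores.idxmax()  # column label with the highest mean

print(mean_scores.round(3))
print("Best model:", best_model)

# Several operations at once; the result has one row per operation.
summary = scores.agg(["mean", "median", "std"])
print(summary)
```

Reporting the median and standard deviation alongside the mean can show whether a model's average is driven by a few very good or very bad rows.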

Run cost = 10 credits

https://storage.googleapis.com/dara-c1b52.appspot.com/daras_ai/media/077e3fc4-7ee3-11ef-a776-02420a0001e4/evaluator-2.csv

Aggregate: Mean

Generated in 17.4s on



Related Workflows

Bulk Runner and Evaluator

Copilot Builder

Speech Recognition and Translation

RAG in the Cloud: Search any document with AI
