Eval: ASR to golden health English answer for Africa

A bulk evaluator workflow that compares AI-generated answers (copilot responses) against a set of golden reference answers. The input data must include two columns: "input_prompt" (the question or task) and "reference_answer" (the ideal response). The workflow applies custom evaluation prompts to each row, scoring the output for accuracy and penalizing hallucinations, then aggregates the per-row scores into an overall performance metric for your AI answers.
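To make the data flow concrete, here is a minimal offline sketch of the same idea in Python. It is not Gooey.AI's implementation or API. The "generated_answer" column name, the `score_answer` function, the 0.5 penalty weight, and the sample rows are all assumptions made up for illustration, and a simple token-overlap heuristic stands in for the custom evaluation prompts that the real workflow would send to a language model.

```python
from statistics import mean


def score_answer(generated: str, reference: str) -> float:
    """Toy stand-in for an LLM evaluation prompt (hypothetical, not Gooey.AI's scorer).

    Accuracy is the share of reference tokens that appear in the generated
    answer. The hallucination penalty is half the share of generated tokens
    that do not appear in the reference. The result is clamped at 0.
    """
    ref_tokens = set(reference.lower().split())
    gen_tokens = set(generated.lower().split())
    if not ref_tokens or not gen_tokens:
        return 0.0
    accuracy = len(ref_tokens & gen_tokens) / len(ref_tokens)
    unsupported = len(gen_tokens - ref_tokens) / len(gen_tokens)
    return max(0.0, accuracy - 0.5 * unsupported)


# Made-up rows. "input_prompt" and "reference_answer" are the required columns;
# "generated_answer" is an assumed name for the copilot response being evaluated.
rows = [
    {
        "input_prompt": "How should I clean my hands?",
        "reference_answer": "wash hands with soap and water",
        "generated_answer": "wash hands with soap and water",
    },
    {
        "input_prompt": "How do I make water safe?",
        "reference_answer": "boil water before drinking",
        "generated_answer": "boil water before drinking and add salt daily",
    },
]

# Score every row, then aggregate with the mean.
scores = [score_answer(r["generated_answer"], r["reference_answer"]) for r in rows]
overall = mean(scores)

print(scores)   # per-row scores
print(overall)  # aggregate (mean) score
```

The second row repeats the whole reference but adds unsupported text, so it scores lower than the first. This is the behavior the workflow description asks for when it says hallucinations are penalized.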

Input Data Spreadsheet


Evaluation Prompts

Aggregations

Run cost = 30 credits




Aggregate: Mean

