Evaluator

Evaluate any Gooey.AI Workflow output against a dataset of inputs and "golden" (expert-created) desired answers. Score every row of a CSV, Google Sheet, or Excel file with GPT-4o (or any LLM you choose), then average the scores in each column to generate automated evaluations.


Input Data Spreadsheet

Upload or link to a CSV or Google Sheet that contains your sample input data.
For example, for Copilot, this would be sample questions; for Art QR Code, it would be pairs of image descriptions and URLs.
Remember to include header names in your CSV too.



GPT-4 Turbo (openai)

Evaluation Prompts

Specify custom LLM prompts to calculate metrics that evaluate each row of the input data. The output should be a JSON object mapping the metric names to values.
The columns dictionary can be used to reference the spreadsheet columns.
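
As a rough illustration of the idea above, the sketch below fills a prompt from one spreadsheet row and parses a JSON reply into metric names and values. The column names (`question`, `answer`, `golden_answer`), the `{placeholder}` syntax, and the helper functions are hypothetical and chosen for this example only; the templating syntax Gooey.AI actually uses for the columns dictionary may differ, and the LLM reply here is a hard-coded stand-in.

```python
import json

# Hypothetical prompt template. The {placeholder} syntax is an assumption
# made for this sketch, not Gooey.AI's documented template syntax.
PROMPT_TEMPLATE = (
    "Compare the answer to the golden answer and rate it from 0 to 1.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Golden answer: {golden_answer}\n"
    'Reply with JSON only, e.g. {{"score": 0.5}}'
)


def build_prompt(columns: dict) -> str:
    """Fill the template from one row, keyed by spreadsheet header name."""
    return PROMPT_TEMPLATE.format(**columns)


def parse_metrics(llm_reply: str) -> dict:
    """Parse the LLM reply into a mapping of metric name to value."""
    metrics = json.loads(llm_reply)
    if not isinstance(metrics, dict):
        raise ValueError("expected a JSON object mapping metric names to values")
    return metrics


# One illustrative row; the keys mirror the assumed CSV header names.
row = {
    "question": "What is the capital of France?",
    "answer": "Paris is the capital.",
    "golden_answer": "Paris",
}

prompt = build_prompt(row)

# Stand-in for a real LLM response to the prompt above.
metrics = parse_metrics('{"score": 0.8, "rationale": "Correct but wordy"}')
```

Asking for a JSON object keeps each metric in its own named field, so every metric can become its own column of per-row values.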


Aggregations

Aggregate using one or more operations. Uses pandas.

mean
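
A minimal sketch of what a column-wise aggregation can look like in pandas, assuming the per-row metrics have already been collected into a DataFrame. The column names and values are made up for illustration, and this is not necessarily how Gooey.AI calls pandas internally.

```python
import pandas as pd

# Illustrative per-row metric values; real values would come from the
# evaluation prompts.
scores = pd.DataFrame(
    {
        "score": [0.8, 0.5, 1.0],
        "relevance": [1.0, 0.0, 1.0],
    }
)

# Apply several aggregation operations to every metric column at once.
# The result has one row per operation and one column per metric.
summary = scores.agg(["mean", "min", "max"])

# Look up a single aggregate, here the mean of the "score" column.
mean_score = summary.loc["mean", "score"]
```

Passing a list of operation names to `DataFrame.agg` is what makes "one or more operations" convenient: adding `"median"` or `"std"` to the list adds another row to the summary.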


Run cost = 45 credits

