Evaluator

Evaluate any Gooey.AI Workflow's output against a dataset of inputs and "golden" (expert-created) desired answers. Score every row of any CSV, Google Sheet, or Excel file with an LLM-as-Judge instruction prompt, then average the scores in any column to generate automated evaluations.
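The row-scoring and column-averaging loop described above can be sketched as follows. This is a minimal offline sketch, not Gooey.AI's implementation: the column names `output` and `golden_answer`, the functions `score_rows` and `column_mean`, and the `token_overlap_judge` stand-in (which replaces a real LLM-as-Judge call with a simple token-overlap score) are all hypothetical.

```python
import csv
import io
from statistics import mean
from typing import Callable, Dict, List

Row = Dict[str, object]
# A judge maps one spreadsheet row to a numeric score.
Judge = Callable[[Row], float]


def token_overlap_judge(row: Row) -> float:
    """Hypothetical offline stand-in for an LLM-as-Judge call.

    Returns the fraction of golden-answer tokens that also appear in the output.
    The column names "golden_answer" and "output" are assumptions.
    """
    golden = set(str(row["golden_answer"]).lower().split())
    output = set(str(row["output"]).lower().split())
    if not golden:
        return 0.0
    return len(golden & output) / len(golden)


def score_rows(csv_text: str, judge: Judge, score_column: str = "score") -> List[Row]:
    """Apply the judge to every CSV row and store the result in a new column."""
    rows: List[Row] = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row[score_column] = judge(row)
    return rows


def column_mean(rows: List[Row], column: str) -> float:
    """Average every score in one column."""
    return mean(float(row[column]) for row in rows)


if __name__ == "__main__":
    data = (
        "input,output,golden_answer\n"
        "What is 2+2?,The answer is 4,4\n"
        "Capital of France?,Berlin,Paris\n"
    )
    scored = score_rows(data, token_overlap_judge)
    print(column_mean(scored, "score"))  # 0.5
```

Swapping `token_overlap_judge` for a function that sends each row plus an evaluation prompt to a language model would turn this into an actual LLM-as-Judge pass; the averaging step stays the same.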

Input Data Spreadsheet

Input Data Preview

Language Model

Evaluation Prompts

Lower values are better

Aggregations

Run cost = 9 credits

Aggregate: Mean
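Aggregating a score column and honoring the "Lower values are better" setting might look like the sketch below. It is illustrative only: `aggregate`, `best_column`, the run names, and the "median" option are hypothetical; only the mean aggregation and the lower-is-better toggle come from this page.

```python
from statistics import mean, median
from typing import Callable, Dict, List

# Aggregation functions by name. "mean" matches the page's "Aggregate: Mean";
# "median" is an illustrative extra, not a confirmed Gooey.AI option.
AGGREGATIONS: Dict[str, Callable[[List[float]], float]] = {
    "mean": mean,
    "median": median,
}


def aggregate(scores: Dict[str, List[float]], how: str = "mean") -> Dict[str, float]:
    """Collapse each score column into a single number."""
    fn = AGGREGATIONS[how]
    return {column: fn(values) for column, values in scores.items()}


def best_column(aggregated: Dict[str, float], lower_is_better: bool = False) -> str:
    """Pick the winning column, flipping the direction when lower values are better."""
    pick = min if lower_is_better else max
    return pick(aggregated, key=aggregated.__getitem__)


if __name__ == "__main__":
    # Hypothetical 1-5 judge scores for two workflow runs.
    scores = {"run_a": [2, 4], "run_b": [1, 3]}
    agg = aggregate(scores)
    print(agg)  # {'run_a': 3, 'run_b': 2}
    print(best_column(agg, lower_is_better=True))  # run_b
```

The direction flag matters because some judge prompts score errors or distance from the golden answer (where smaller is better), while others score quality (where larger is better).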

Related Workflows

Bulk Runner and Evaluator

Which AI model actually works best for your needs? Upload your own data and evaluate any Gooey.AI workflow, LLM or AI model against any other. Great for large data sets, AI model evaluation, task automation, …

Copilot Builder

Gooey.AI's base AI workflow with built-in RAG, web search, voice understanding of 1000+ languages, code creation + execution, API connections & integrations to create your own WhatsApp, Web, FB and voice AI …

Speech Recognition and Translation

Transcribe MP3s, WhatsApp voice notes, and YouTube videos in 1000+ languages with Meta's MMS/SeamlessM4T, OpenAI's GPT-4o Audio LLM, Whisper v2/v3, Azure, Google, GhanaNLP, AI4Bharat & Bhashini ASR models. Optionally …

RAG in the Cloud: Search any document with AI

We've built the best Retrieval Augmented Generation (RAG) as-a-Service anywhere - now with page-level citations! Absorb tables, PDFs, docs, links, videos or audio clips and use our synthetic data maker to …
