Evaluate any Gooey.AI workflow's output against a dataset of inputs and "golden" (expert-written) desired answers. Score every row of any CSV, Google Sheet or Excel file with any LLM-as-Judge instruction prompt, then average the scores in any column to produce automated evaluations.
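A minimal sketch of the row-by-row scoring loop described above, assuming a CSV with hypothetical `question`, `golden_answer` and `output` columns. The prompt template is illustrative, not Gooey.AI's, and `toy_judge` is a hypothetical offline stand-in for the LLM-as-Judge call (it only does a case-insensitive exact comparison) so the example runs without any API key.

```python
import csv
import io
from statistics import mean
from typing import Callable

# Illustrative judge prompt (hypothetical); on Gooey.AI the prompt is user-supplied.
JUDGE_PROMPT = (
    "Compare the answer to the golden answer.\n"
    "Question: {question}\n"
    "Golden answer: {golden}\n"
    "Answer: {answer}\n"
    "Reply with only a score from 1 to 5."
)


def toy_judge(prompt: str) -> str:
    """Hypothetical offline stand-in for an LLM-as-Judge call.

    Reads the 'Golden answer' and 'Answer' lines back out of the prompt and
    returns "5" on a case-insensitive exact match, otherwise "1".
    """
    fields = dict(line.split(": ", 1) for line in prompt.splitlines() if ": " in line)
    match = fields["Answer"].strip().lower() == fields["Golden answer"].strip().lower()
    return "5" if match else "1"


def score_rows(csv_text: str, judge: Callable[[str], str]) -> list[dict]:
    """Judge every row and store the numeric result in a new 'score' column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        prompt = JUDGE_PROMPT.format(
            question=row["question"],
            golden=row["golden_answer"],
            answer=row["output"],
        )
        # A real LLM reply may need more robust parsing than float().
        row["score"] = float(judge(prompt).strip())
    return rows


DATA = """question,golden_answer,output
Capital of France?,Paris,Paris
2 + 2?,4,5
Largest planet?,Jupiter,jupiter
"""

rows = score_rows(DATA, toy_judge)
print([r["score"] for r in rows], mean(r["score"] for r in rows))
```

In this sketch, swapping `toy_judge` for a function that sends the prompt to a real model is the only change needed to score with an actual LLM judge; the averaging step stays the same.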
LLM-as-Judge model options:
GPT-5.1 (OpenAI)
GLM 5.1
Apertus 70B Instruct
Qwen3.6 Plus
o4-mini Thinking
o4-mini
o3
GPT-Realtime 2
GPT-Realtime 1.5
GPT-5 Chat
GPT-5.5 (Thinking)
GPT-5.4 Nano
GPT-5.4 Mini
GPT-5.4
GPT 5.2 Thinking
GPT-4o Mini
GPT-4o
Kimi K2.6
Mistral Small 4
Mistral Medium 3.1
Mistral Large 3
Ministral 3 14B
MiniMax M2.7
Gemini 3 Flash
Gemini 3.5 Flash
Gemini 3.1 Pro Preview
Gemini 3.1 Flash-Lite
DeepSeek V4 Pro
Claude 4.8 Opus
Claude 4.7 Opus
Claude 4.6 Sonnet
SEA-LION v4
AgriLLM Qwen-3 30B
Aggregations (applied to score columns):
mean
median
min
max
sum
cumsum
prod
cumprod
std
var
first
last
count
cumcount
nunique
rank
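The aggregation names above appear to match pandas `GroupBy` method names. As a rough illustration of what each one computes on a single score column, here is a pure-Python sketch; it is not Gooey.AI's implementation and assumes no missing values, sample (ddof=1) `std`/`var`, and average ranks for ties, mirroring pandas defaults.

```python
import operator
import statistics
from itertools import accumulate
from math import prod


def rank(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values equal to values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


AGGREGATIONS = {
    "mean": statistics.mean,
    "median": statistics.median,
    "min": min,
    "max": max,
    "sum": sum,
    "cumsum": lambda v: list(accumulate(v)),
    "prod": prod,
    "cumprod": lambda v: list(accumulate(v, operator.mul)),
    "std": statistics.stdev,      # sample standard deviation (ddof=1)
    "var": statistics.variance,   # sample variance (ddof=1)
    "first": lambda v: v[0],      # assumes no missing values
    "last": lambda v: v[-1],      # assumes no missing values
    "count": len,
    "cumcount": lambda v: list(range(len(v))),
    "nunique": lambda v: len(set(v)),
    "rank": rank,
}

scores = [5.0, 1.0, 5.0, 3.0]
for name, fn in AGGREGATIONS.items():
    print(name, fn(scores))
```

Scalar aggregations (`mean`, `std`, `nunique`, ...) collapse a column to one number, while the cumulative ones and `rank` return one value per row.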
Run cost = 9 credits
Which AI model actually works best for your needs? Upload your own data and evaluate any Gooey.AI workflow, LLM or AI model against any other. Great for large datasets, AI model evaluation, task automation, …
Gooey.AI's base AI workflow with built-in RAG, web search, voice understanding of 1000+ languages, code creation + execution, API connections & integrations to create your own WhatsApp, Web, FB and voice AI …
Transcribe MP3s, WhatsApp voice messages and YouTube videos in 1000+ languages with Meta's MMS / SeamlessM4T, OpenAI's GPT-4o Audio LLM, Whisper v2/v3, Azure, Google, GhanaNLP, AI4Bharat & Bhashini ASR models. Optionally …
We've built the best Retrieval Augmented Generation (RAG) as-a-Service anywhere - now with page-level citations! Absorb tables, PDFs, docs, links, videos or audio clips and use our synthetic data maker to …