Here we run the questions from the public dataset of the Simple Bench benchmark to compare how well the top models perform on them. These questions are fairly obvious to most humans but tend to trip up LLMs badly. If you want to try your hand at the Jan 2025 prompt engineering contest, duplicate any of the workflows below, save it, and then add your saved workflow as another workflow to compare against (it's OK to delete the others).
Provide one or more Gooey.AI workflow runs. You can add multiple runs from the same recipe (e.g., two versions of your copilot), and we'll run the inputs over each of them.
Compare LLMs: GPT4o, Claude3.5 Sonnet, Gemini 1.5 Pro, LLaMA3 vs Mixtral
o1-preview Simple Bench Runner
GPT4o - Simple Bench Runner
Claude3.5-Sonnet - Simple Bench Runner
Gemini 1.5 Pro - Simple Bench Runner
Upload or link to a CSV or Google Sheet that contains your sample input data. For example, for Copilot this would be sample questions, and for Art QR Code it would be pairs of image descriptions and URLs. Remember to include header names in your CSV too.
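As a rough sketch of an input file with a header row, the Python below writes one question per row under a single header. The header name `question` is an assumption based on the column shown in the mapping step on this page, and the function name `build_input_csv` is hypothetical; use whatever header your own workflow expects.

```python
import csv
import io


def build_input_csv(questions: list[str], column: str = "question") -> str:
    """Return CSV text: one header row, then one question per row.

    The default header name "question" is an assumption, not a fixed requirement.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([column])  # header name used later when mapping columns
    for q in questions:
        writer.writerow([q])  # csv.writer quotes fields containing commas or newlines
    return buf.getvalue()


if __name__ == "__main__":
    # Placeholder questions, not taken from the Simple Bench dataset.
    print(build_input_csv(["Placeholder question 1?", "Placeholder, with a comma?"]))
```

Using the `csv` module rather than joining strings by hand matters here because benchmark questions often contain commas and line breaks.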
Please select which CSV column corresponds to your workflow's input fields. For the outputs, select the fields that should be included in the output CSV. To understand what each field represents, check out our API docs.
Output Text
Run URL
🧩 Developer Tools and Functions.URL
🧩 Developer Tools and Functions.Trigger
⌥ Variables.question
question
Variables Schema.question
Input Prompt
Selected Models
Avoid Repetition
Num Outputs
Quality
Max Tokens
Sampling Temperature
Response Format
Price
Run Time
Error Msg
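Conceptually, the column mapping turns each CSV row into one set of workflow inputs. The Python below is a minimal sketch of that idea, not Gooey.AI's actual request format: the mapping from the `question` column to a field keyed `Variables.question` is an assumption modeled on the labels above, and `rows_to_inputs` is a hypothetical helper.

```python
import csv
import io

# Hypothetical mapping: CSV header name -> workflow input field name.
# The field key mirrors the "Variables.question" label; the real key may differ.
COLUMN_MAP = {"question": "Variables.question"}


def rows_to_inputs(csv_text: str, column_map: dict[str, str]) -> list[dict[str, str]]:
    """Build one input dict per CSV row using the column-to-field mapping."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in column_map if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"CSV is missing mapped columns: {missing}")
    return [
        {field: row[csv_col] for csv_col, field in column_map.items()}
        for row in reader
    ]


if __name__ == "__main__":
    print(rows_to_inputs("question\nPlaceholder question?\n", COLUMN_MAP))
```

Failing early on a missing header is a deliberate choice: a misnamed column would otherwise send empty inputs to every workflow being compared.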
(optional) Add one or more Gooey.AI Evaluator Workflows to evaluate the results of your runs.
Multiple Choice Eval
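To illustrate what a multiple-choice evaluation involves, here is a hypothetical Python sketch; it is not how the Gooey.AI Multiple Choice Eval workflow is implemented. It assumes answers are single letters A to F and treats the last standalone capital letter in a model's output as its chosen option, which is a simple heuristic.

```python
import re
from typing import Optional

# Assumption: options are labeled A-F and appear as standalone capital letters.
_CHOICE = re.compile(r"\b([A-F])\b")


def extract_choice(output_text: str) -> Optional[str]:
    """Return the last standalone A-F letter in the output, or None if absent."""
    matches = _CHOICE.findall(output_text)
    return matches[-1] if matches else None


def accuracy(outputs: list[str], answers: list[str]) -> float:
    """Fraction of outputs whose extracted letter matches the answer key."""
    if len(outputs) != len(answers):
        raise ValueError("outputs and answers must have the same length")
    if not answers:
        return 0.0
    correct = sum(
        extract_choice(out) == ans.strip().upper()
        for out, ans in zip(outputs, answers)
    )
    return correct / len(answers)


if __name__ == "__main__":
    print(accuracy(["Final answer: B", "I pick D"], ["B", "A"]))
```

The last-letter heuristic can misfire on outputs that end with a sentence starting "A ...", so a stricter format (e.g., asking models to reply "Final answer: X") makes extraction more reliable.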
Run cost = 1 credit
https://storage.googleapis.com/dara-c1b52.appspot.com/daras_ai/media/e845bc02-d0ba-11ef-bc62-02420a00014e/evaluator-20.csv
https://gooey.ai/eval/?run_id=lgu3d8h1pyo0&uid=kKZgp2h1H2YxZYxZ2DbiRfUfeDM2
Generated in 515.9s on
...