Ulangizi Golden Q&A Eval

This workflow takes a Google Doc of sample questions and golden, expert-created answers as input. It then runs competing versions of the Ulangizi copilot (here with different LLMs: Gemini Pro 1 vs GPTV) and scores their answers. This helps us determine whether changes to our workflows actually improve performance, and how any gain trades off against speed and cost.

Gooey Workflows
Input Data Spreadsheet
Input Columns

Evaluation Workflows

Run: Copilot Evaluator

Aggregate: Mean
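As a rough illustration of the mean aggregation, the sketch below assumes the evaluator has already produced one numeric score per golden question for each copilot variant. It averages those scores per workflow and ranks the variants. The function names (`aggregate_mean`, `rank_workflows`) and the score dictionary layout are hypothetical, not part of Gooey.AI's API.

```python
from statistics import mean


def aggregate_mean(scores_by_workflow: dict[str, list[float]]) -> dict[str, float]:
    """Average each workflow's per-question scores.

    Hypothetical input layout (an assumption, not Gooey.AI's format): each key is
    a workflow label (e.g. one Ulangizi copilot variant per LLM) and each value is
    the evaluator's score for every golden question, in the same question order.
    """
    # All variants must be scored on the same question set, or the means are not comparable.
    lengths = {len(scores) for scores in scores_by_workflow.values()}
    if len(lengths) > 1:
        raise ValueError("every workflow must be scored on the same questions")
    return {name: mean(scores) for name, scores in scores_by_workflow.items()}


def rank_workflows(means: dict[str, float]) -> list[tuple[str, float]]:
    """Return (workflow, mean score) pairs, highest mean first."""
    return sorted(means.items(), key=lambda item: item[1], reverse=True)
```

A higher mean only says which variant scored better on this particular question set; speed and cost still have to be weighed separately before adopting a change.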