Ulangizi Golden Q&A Eval

This workflow takes a Google doc of sample questions and golden, expert-created answers as input. It then runs competing versions of the Ulangizi copilot (here with different LLMs: Gemini Pro 1 vs GPTV) and scores the answers. This helps us determine whether changes to our workflows actually increase performance (and how they compare on speed and cost as well).

11mo ago

Gooey Workflows

Provide one or more Gooey.AI workflow runs.
You can add multiple runs from the same recipe (e.g. two versions of your copilot) and we'll run the inputs over both of them.

Copilot Builder

Deprecated: Ulangizi AI with Vision - v14 (default NY/Chi)

Copilot Builder

Ulangizi AI with Vision - v12 - Gemini Pro 1.0 Vision

Input Data Spreadsheet

Upload or link to a CSV or Google Sheet that contains your sample input data.
For example, for Copilot, this would be sample questions; for Art QR Code, it would be pairs of image descriptions and URLs.
Remember to include header names in your CSV too.
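
As a minimal sketch of what such an input file might look like, the snippet below builds a CSV with a header row using Python's standard csv module. The column names `question` and `golden_answer` and the sample row are placeholders assumed for illustration; they do not come from the actual Ulangizi sheet.

```python
import csv
import io

# Hypothetical column names; use whatever headers your own sheet has.
FIELDNAMES = ["question", "golden_answer"]


def build_input_csv(rows):
    """Return CSV text with a header row followed by one line per sample."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()  # the header row lets each column be selected by name
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder data, not real evaluation content.
sample_rows = [
    {"question": "Sample question 1", "golden_answer": "Placeholder expert answer 1"},
]
csv_text = build_input_csv(sample_rows)
print(csv_text)
```

The resulting text can be saved as a .csv file and uploaded, or pasted into a Google Sheet.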


Preview: Here's what you uploaded

Columns

Please select which CSV column corresponds to each of your workflow's input fields.
For the outputs, select the fields that should be included in the output CSV.
To understand what each field represents, check out our API docs.
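
Conceptually, the column selection is a mapping from CSV headers to workflow input fields. The sketch below illustrates that idea only; it is not Gooey.AI's implementation, and the header `question` and the field name `input_prompt` are assumptions chosen for the example.

```python
import csv
import io

# Hypothetical mapping: CSV header -> workflow input field name.
COLUMN_TO_FIELD = {"question": "input_prompt"}


def rows_to_inputs(csv_text, column_to_field):
    """Turn each CSV row into a dict keyed by workflow input field."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Fail early if a mapped column is absent from the header row.
    missing = set(column_to_field) - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"CSV is missing columns: {sorted(missing)}")
    return [
        {field: row[column] for column, field in column_to_field.items()}
        for row in reader
    ]


example_csv = "question,golden_answer\nQ1,A1\n"
inputs = rows_to_inputs(example_csv, COLUMN_TO_FIELD)
print(inputs)
```

Columns that are not mapped (here `golden_answer`) are simply left out of the per-row inputs.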

Inputs

Outputs

Output Text

Run URL

Evaluation Workflows

(optional) Add one or more Gooey.AI Evaluator Workflows to evaluate the results of your runs.

Copilot Evaluator

⚙️ Settings

Run cost = 1 credit


https://storage.googleapis.com/dara-c1b52.appspot.com/daras_ai/media/a6205e4c-f1e3-11ee-8c7e-02420a000183/evaluator-0-1.csv

Aggregate: Mean
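
For readers who want to reproduce a mean aggregate from a downloaded evaluator CSV, here is a rough sketch. The column name `score` and the inline sample values are assumptions made for illustration; check the headers of the actual output file before using it.

```python
import csv
import io
from statistics import mean


def mean_of_column(csv_text, column):
    """Average the numeric values in one CSV column, skipping empty cells."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = []
    for row in reader:
        cell = (row.get(column) or "").strip()
        if cell:
            values.append(float(cell))
    # Return None when the column is missing or has no values.
    return mean(values) if values else None


# Placeholder scores, not results from the actual run.
example_csv = "score\n0.5\n1.0\n"
average = mean_of_column(example_csv, "score")
print(average)
```

Comparing this average across the competing runs gives one number per copilot version.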

https://gooey.ai/eval/?run_id=wwzfmeie7qdd&uid=fm165fOmucZlpa5YHupPBdcvDR02

Generated in 76.9s on

...

