Run: Bulk Evaluator


Input Data Spreadsheet

Upload or link to a CSV or Google Sheet that contains your sample input data.
For example, for Copilot, this would be sample questions; for Art QR Code, it would be pairs of image descriptions and URLs.
Remember to include header names in your CSV too.
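
As a rough illustration of the expected shape, here is a minimal Python sketch that builds a small CSV whose first row is the header. The column names `question` and `reference_answer` and the helper `build_sample_csv` are assumptions made up for this example; use whatever headers match your own data.

```python
import csv
import io


def build_sample_csv(rows, fieldnames):
    """Serialize a list of dict rows to CSV text, with the header row first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()  # the header names the evaluator relies on
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample data for a question-answering workflow.
sample_rows = [
    {"question": "When should I sow wheat?", "reference_answer": "After the monsoon."},
    {"question": "How often should I irrigate?", "reference_answer": "Depends on the soil."},
]

csv_text = build_sample_csv(sample_rows, ["question", "reference_answer"])
print(csv_text)
```

Save the printed text to a `.csv` file (or paste it into a Google Sheet) before uploading.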


Input Data Preview

Here's what you uploaded:



Evaluation Prompts

Specify custom LLM prompts to calculate metrics that evaluate each row of the input data. The output should be a JSON object mapping the metric names to values.
The columns dictionary can be used to reference the spreadsheet columns.
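
To make the idea concrete, here is a minimal Python sketch of the two halves of an evaluation prompt: filling a prompt from a row's `columns` dictionary and parsing the model's JSON reply into metrics. The template text, the column names `question` and `answer`, the metric name `accuracy`, and both helper functions are assumptions for illustration; the exact template syntax the workflow uses may differ.

```python
import json

# Hypothetical prompt; doubled braces are literal braces for str.format.
PROMPT_TEMPLATE = (
    "Rate how accurately the answer addresses the question on a 1-5 scale.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    'Reply with a JSON object only, for example {{"accuracy": 4}}'
)


def render_prompt(columns):
    """Fill the prompt using values from one spreadsheet row."""
    return PROMPT_TEMPLATE.format(
        question=columns["question"],
        answer=columns["answer"],
    )


def parse_metrics(llm_output):
    """Parse the model reply into a dict mapping metric names to values."""
    metrics = json.loads(llm_output)
    if not isinstance(metrics, dict):
        raise ValueError("expected a JSON object mapping metric names to values")
    return metrics


row = {"question": "When should I sow wheat?", "answer": "After the monsoon."}
prompt = render_prompt(row)
metrics = parse_metrics('{"accuracy": 4}')  # stand-in for a real model reply
```

Asking for a JSON object only keeps the reply machine-readable, so each metric can become its own column for aggregation.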


Aggregations

Aggregate the per-row metrics using one or more operations. Aggregation is done with pandas.

mean
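
A minimal sketch of what such an aggregation could look like in pandas, assuming each evaluated row produced numeric metrics. The metric names `accuracy` and `fluency` and the values are invented for the example, and this is not necessarily how the workflow calls pandas internally.

```python
import pandas as pd

# Hypothetical per-row metrics, one dict per evaluated spreadsheet row.
per_row_metrics = pd.DataFrame(
    [
        {"accuracy": 4, "fluency": 5},
        {"accuracy": 2, "fluency": 3},
    ]
)

# Passing a list of operation names yields one summary row per operation.
summary = per_row_metrics.agg(["mean"])
print(summary)
```

Adding more operation names to the list, such as `"min"` or `"max"`, would add one summary row for each.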



Run cost = 25 credits


Related Workflows

Bulk Runner and Evaluator

Which AI model actually works best for your needs? Upload your own data and evaluate any Gooey.AI workflow, LLM or AI model against any other. Great for large data sets, AI model evaluation, task automation, parallel processing and automated testing. To get started, paste in a Gooey.AI workflow, upload a CSV of your test data (with header names!), check the mapping of headers to workflow inputs and tap Submit. More tips in the Details below.

Copilot Builder

Gooey.AI's Copilot is the best chatbot builder anywhere, combining your choice of LLMs (GPT4o, Gemini, Claude3, Mixtral or LLaMA3), knowledge docs from any link or doc/PDF (with table support!), speech recognition, text-to-speech and Lipsync, editable synthetic data pipelines, built-in 👍🏿 👎🏼 feedback, conversation analysis, WhatsApp, Slack, Facebook and API integrations and much more.

In this example, we present Farmer.CHAT. It uses a collection of documents and transcripts of hundreds of videos representing best practices to answer common questions from Indian farmers. We load these documents and transcripts into a vector database. With each question, we search the vector DB and then add the results to the GPT script below to create an answer to the farmer's question. Simply change the Instructions and Knowledge docs, tap Submit and then Save to make it your own.

To learn more, head over to our Copilot Help Guide.

Speech Recognition and Translation

Transcribe mp3, WhatsApp audio and wav files with OpenAI's Whisper or AI4Bharat / Bhashini ASR models. Optionally translate to any language too.

RAG in the Cloud: Search any document with AI

We've built the best Retrieval Augmented Generation (RAG) as-a-Service anywhere - now with page-level citations! Absorb tables, PDFs, docs, links, videos or audio clips and use our synthetic data maker to generate FAQs and structured data from noisy, unstructured files. Search 1000s of files with our incredibly fast, hybrid database (finding related concepts OR specific keywords). Summarize results with OpenAI, Gemini or any open-source LLM of your choice. And finally, make informed LLM and synthetic data decisions by evaluating with your own golden data sets.

Here is our Quickstart Guide

Our benefits:

  1. Page-level citations to PDFs.
  2. We understand tables and dirty PDFs!
  3. No-code UX with full API support.
  4. Use any LLM + your own scripts to create synthetic data.
  5. Support for Google Docs, Sheets, etc. + PDF, doc, docx, txt, links, ppt, sheets, xls, wav, mp3, mp4 and mov, including transcription and optional translation from the best speech recognition models.
  6. Links to live documents are automatically re-indexed if they change.
  7. Use our Golden QnA eval framework to test any workflow (especially useful for testing different embeddings + synthetic data creation prompts).
  8. Hybrid search - search for vectors, keywords or both.
  9. Cloud-based, per-API-call and per-MB pricing.