Examples: Doc Search

We've built the best Retrieval Augmented Generation (RAG)-as-a-Service anywhere, now with page-level citations! Ingest tables, PDFs, docs, links, videos or audio clips, and use our synthetic data maker to generate FAQs and structured data from noisy, unstructured files. Search thousands of files with our fast hybrid database, which finds related concepts, specific keywords, or both. Summarize results with OpenAI, Gemini or any open-source LLM of your choice. Finally, make informed LLM and synthetic data decisions by evaluating against your own golden data sets.

Our benefits:

  1. Page-level citations to PDFs.
  2. We understand tables and dirty PDFs!
  3. No-code UX with full API support.
  4. Use any LLM + your own scripts to create synthetic data.
  5. Support for Google Docs, Sheets, etc. + PDF, doc, docx, txt, links, ppt, xls, wav, mp3, mp4 and mov.
  6. Links to live documents are automatically re-indexed if they change.
  7. Use our Golden QnA eval framework to test any workflow (especially useful for testing different embeddings + synthetic data creation prompts).
  8. Hybrid search: search by vectors, keywords or both.
  9. Cloud-based, with per-API-call and per-MB pricing.
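To make the hybrid search idea concrete, here is a minimal sketch of how a vector score and a keyword score could be blended into one ranking. It is not our actual implementation: `toy_embed` is a bag-of-words stand-in for a real embeddings model, and `hybrid_search` and its `vector_weight` parameter are hypothetical names chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would do much more.
    return text.lower().split()


def toy_embed(text):
    # ASSUMPTION: stand-in for a real embeddings model (bag-of-words counts).
    return Counter(tokenize(text))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_score(query, doc):
    # Fraction of query terms that appear verbatim in the document.
    q = set(tokenize(query))
    d = set(tokenize(doc))
    return len(q & d) / len(q) if q else 0.0


def hybrid_search(query, docs, vector_weight=0.5):
    # vector_weight=1.0 is pure vector search, 0.0 is pure keyword search.
    qv = toy_embed(query)
    scored = [
        (
            vector_weight * cosine(qv, toy_embed(doc))
            + (1 - vector_weight) * keyword_score(query, doc),
            doc,
        )
        for doc in docs
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


docs = ["invoice payment terms", "chili pest control", "weather report"]
for score, doc in hybrid_search("chili pest", docs):
    print(round(score, 3), doc)
```

Sliding the weight toward 1.0 favors related concepts; sliding it toward 0.0 favors exact keyword hits.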

🔍

Enhance your search results and summaries with Retrieval Augmented Generation. Run vector database (vectorDB) search on documents, links, PDFs, docx and txt files, and summarize the results with any LLM of your choice. You can choose from several embeddings models, customize hybrid search, pick from a range of citation styles and also create synthetic data! If you are looking for a quick RAG (Retrieval Augmented Generation) tool, look no further!

🔍

Add your PDF, Word, HTML or Text docs, train our AI on them with OpenAI embeddings & vector search and then process results with a GPT3 script. This workflow is perfect for anything NOT in ChatGPT: 250-page compliance PDFs, training manuals, your diary, etc.


🔍

Add your PDF, Word, HTML or Text docs, train our AI on them with OpenAI embeddings & vector search and then process results with the LLM script and engine of your choice.

🔍

Aimed at Indian chili farmers, this bot parses 4 documents representing best practices from the Indian Ministry of Culture and Digital Green's work to collect common questions from Indian chili farmers. We load these as text embeddings and then run a GPT3 script to create an answer to the farmer's question, giving citations back to the source documents.

🔍

Here's how you can do this for your own account:

  1. Request your Twitter archive.
  2. Download the archive, unzip it, and open Your archive.html in Chrome.
  3. Go to the tweets section and scroll to the bottom.
  4. Save the page as HTML (Web Page, Complete).
  5. Upload it here.

🔍


Here we use the WHO's 180-page guide to antenatal care for pregnancies, so you can ask any question and receive simple, referenced answers. https://www.who.int/publications-detail-redirect/9789241549912


🔍

Here we saved a webpage as a PDF and then:

  1. Parse the PDF into OpenAI embeddings.
  2. Search the PDF for "What are the most interesting aspects of this character?"
  3. Add the results into a GPT3 script that is asked to generate a poll from the most interesting aspects.
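The three steps above can be sketched roughly as follows. This is an illustration only, not the workflow's real code: the function names are hypothetical, plain word overlap stands in for embeddings search, and the final prompt is only built, not sent to any LLM.

```python
def index_pages(pages):
    # Step 1 (simplified): keep each page with its 1-based page number.
    # ASSUMPTION: a real workflow would store an embedding per chunk here.
    return [(number, text) for number, text in enumerate(pages, start=1)]


def overlap(query, text):
    # Stand-in relevance score: number of shared lowercase words.
    return len(set(query.lower().split()) & set(text.lower().split()))


def search(query, indexed_pages, top_k=2):
    # Step 2: rank pages by relevance and keep the best top_k.
    ranked = sorted(indexed_pages, key=lambda p: overlap(query, p[1]), reverse=True)
    return ranked[:top_k]


def build_prompt(task, question, hits):
    # Step 3: paste the search results, tagged with page numbers so the
    # answer can cite them, into the script handed to the LLM.
    context = "\n".join(f"[page {number}] {text}" for number, text in hits)
    return f"{task}\n\nSearch results:\n{context}\n\nQuestion: {question}\n"


pages = [
    "The hero is brave and loyal.",
    "The weather is mild.",
    "The hero fears water.",
]
question = "What are the most interesting aspects of this hero ?"
hits = search(question, index_pages(pages))
print(build_prompt("Write a poll from the most interesting aspects.", question, hits))
```

Keeping the page number alongside each chunk is what makes page-level citations possible later in the pipeline.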


🔍

Here we feed our workflow an SEC filing and then have it output structured data about the company.

🔍
