This workflow builds a vector database (vector DB) index over a collection of documents (links, PDF, DOCX, and TXT files). Given a query, it searches the index, returns page-level citation chunks, and summarizes them with a GPT-4 prompt. In particular, this workflow:
- Takes in links to PDFs
- Runs a synthetic data creation recipe on each scanned page of each PDF to create a sample FAQ for that page
- Queries the vector DB with the search query
- Gets back the matching chunks
- Summarizes the resulting chunks into an answer, following the task instructions
- Provides page-level URL citations for each referenced chunk returned from the vector DB search
All of this is also available as an API, so you can modify the source documents, the prompt, or the synthetic data creation processing while keeping the same API endpoint.
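
The retrieval and citation steps listed above can be sketched roughly as follows. This is a minimal, standard-library illustration, not the workflow's actual implementation: `Chunk`, `ToyIndex`, `embed`, and `build_prompt` are hypothetical names, the bag-of-words `embed` stands in for a real embedding model, the `#page=N` citation format is an assumption, and the final GPT-4 call is left out (only the prompt is assembled).

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    url: str   # link to the source PDF
    page: int  # 1-based page number
    text: str  # page text (a real pipeline might append the synthetic FAQ here)

    @property
    def citation(self) -> str:
        # Assumed citation format: the "#page=N" PDF URL fragment.
        return f"{self.url}#page={self.page}"


def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: lowercase term counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyIndex:
    """In-memory substitute for a vector DB, one entry per page."""

    def __init__(self) -> None:
        self.entries: list[tuple[Counter, Chunk]] = []

    def add(self, chunk: Chunk) -> None:
        self.entries.append((embed(chunk.text), chunk))

    def search(self, query: str, k: int = 2) -> list[Chunk]:
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]


def build_prompt(task: str, query: str, chunks: list[Chunk]) -> str:
    # Assemble the summarization prompt; sending it to a model is not shown.
    sources = "\n".join(
        f"[{i}] {c.text} (source: {c.citation})" for i, c in enumerate(chunks, 1)
    )
    return f"{task}\n\nQuestion: {query}\n\nSources:\n{sources}"


# Example with made-up documents and a placeholder URL.
index = ToyIndex()
index.add(Chunk("https://example.com/handbook.pdf", 1,
                "Vacation policy: employees accrue vacation days monthly."))
index.add(Chunk("https://example.com/handbook.pdf", 2,
                "Expense policy: submit receipts within 30 days."))

question = "How do I submit expense receipts?"
hits = index.search(question, k=1)
prompt = build_prompt("Answer using only the sources and cite them.", question, hits)
```

Keeping the page number on each chunk is what makes page-level citations possible: whatever the search returns can be mapped straight back to a URL for the exact page, independent of how the answer is later summarized.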