Inspiration

As impressive as LLMs are, in the vast majority of cases they are not good at recognizing facts and citing sources. The solution to this is called RAG, which stands for Retrieval Augmented Generation. The idea behind RAG is that you use prompt engineering to send the LLM not only the question but also the relevant sources. Most LLM cloud providers already offer RAG solutions. But what if I want to keep my sources locally on my device? This is where "Ask my PDF" comes into play. "Ask my PDF" is a RAG solution that runs completely on-device in the user's browser, and therefore respects privacy and is even available offline.

What it does

  1. It reads any PDF sentence by sentence.
  2. Every sentence is then converted into a vector in a high-dimensional embedding space; the sentence and its vector representation are stored in a vector DB.
  3. Once the user asks a question, the question is vectorized as well, and the sentences with the closest vectors (which should be the sentences most similar to the question), together with some surrounding context, are added to the prompt.
  4. The prompt now consists of some instructions, the context needed to answer the question, and the question itself, and is sent to the LLM to generate an answer that also cites the sources it uses.
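The retrieval step (steps 2 and 3 above) can be sketched in plain JavaScript. This is an illustrative sketch, not the project's actual code: `retrieve` ranks stored sentence vectors by cosine similarity to the question vector and expands each hit with one neighboring sentence on either side as surrounding context.

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// entries: [{ sentence, vector }] in document order.
// Returns the top-k matches, each expanded with its neighbors.
function retrieve(questionVec, entries, k = 3) {
  const ranked = entries
    .map((e, i) => ({ index: i, score: cosine(questionVec, e.vector) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
  return ranked.map(({ index, score }) => ({
    score,
    // One sentence of surrounding context on each side of the hit.
    context: entries
      .slice(Math.max(0, index - 1), index + 2)
      .map(e => e.sentence)
      .join(" "),
  }));
}
```

In the real pipeline the question vector would come from the same embedding model used to index the sentences.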

How I built it

  • Ask my PDF uses PDF.js to process a PDF locally and split the content up into separate paragraphs and sentences.
  • I created my own little in-memory VectorDB that uses transformers.js for the vector embeddings.
  • The text generation can be done using the Prompt API (if available) or Gemma2-2B / Gemma2-9B, compiled to WebAssembly and WebGPU using MLC LLM.
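A minimal in-memory vector store in the spirit described above can fit in a single class. This is a hedged sketch, not the project's implementation: the `embed` function is a pluggable parameter standing in for a transformers.js embedding pipeline, so the store itself stays library-free.

```javascript
// Minimal in-memory vector DB: stores (text, vector) pairs and
// answers nearest-neighbor queries by cosine similarity.
class VectorDB {
  constructor(embed) {
    this.embed = embed;  // async (text) => number[]; e.g. a transformers.js pipeline
    this.entries = [];   // { text, vector }
  }

  async add(text) {
    const vector = await this.embed(text);
    this.entries.push({ text, vector });
  }

  async query(text, k = 3) {
    const q = await this.embed(text);
    const dot = (a, b) => a.reduce((s, v, i) => s + v * b[i], 0);
    const norm = a => Math.sqrt(dot(a, a));
    return this.entries
      .map(e => ({
        text: e.text,
        score: dot(q, e.vector) / (norm(q) * norm(e.vector)),
      }))
      .sort((a, b) => b.score - a.score)
      .slice(0, k);
  }
}
```

Because everything lives in a plain array, the whole index disappears when the tab closes, which is exactly the privacy property the project is after.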

Challenges I ran into

For RAG to work, the LLM must follow the instructions very precisely. I tried various prompts until I found one that works relatively well and avoids hallucinations.
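One prompt shape that tends to reduce hallucinations is to give explicit instructions, number the context snippets, and require the model to answer only from the context and cite snippet numbers. The function below is illustrative, not the exact prompt "Ask my PDF" uses.

```javascript
// Assemble a RAG prompt from a question and retrieved snippets.
function buildPrompt(question, snippets) {
  const context = snippets
    .map((s, i) => `[${i + 1}] ${s}`)  // numbered so the model can cite them
    .join("\n");
  return [
    "Answer the question using ONLY the context below.",
    "Cite the snippet numbers you used, e.g. [1].",
    "If the context does not contain the answer, say so.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```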

Accomplishments that I am proud of

Well, it works :D. I am also proud of my VectorDB solution, because it is super simple and works quite well.

What I learned

I learned a lot about how LLMs perform on different machines, and also how vector search works under the hood.

What's next for Ask my PDF

Right now I am still gathering feedback on how well the application works on different machines and with different documents. One weak point is definitely the PDF parsing; I am already working on ways to improve this. Other than that, I also want to add more models (maybe even Gemma2 27B) to see the limits of on-device AI in the browser.

Built With

  • pdf.js
  • promptapi
  • transformers.js