A small Flask web app that turns a stack of PDFs into a grounded question-answering chatbot, with an optional voice loop. Upload a few documents, ask a question in the browser (or speak it), and the assistant retrieves the most relevant passage from your PDFs and asks GPT-3.5 to answer using only that passage.
Built September 2023 as a focused exercise in retrieval-augmented generation before "RAG" was the default acronym in every tutorial. Kept here as a reference implementation of the core loop without the framework weight of LangChain or LlamaIndex.
I wanted to see how far a deliberately minimal RAG stack could go for a single domain:
- Plain
PyPDF2for ingest. scikit-learnTfidfVectorizerplus cosine similarity for retrieval (no vector DB).- A tightly-scoped system prompt that refuses to answer from outside the supplied passage.
- A Flask front-end with a chat widget and a "human-like typing" effect on the response.
An optional path uses sentence-transformers plus FAISS for dense retrieval, with a similarity check that verifies the model's answer is actually grounded in the source text before returning it (see retrieval_from_embeddings.py).
The voice loop (SpeechRecognition + pyttsx3) is included but commented out in the main flow, because the web UI ended up being the more useful surface.
PDF(s) ──> PDFProcessor ──> sentence list
│
▼
TfidfVectorizer.fit_transform
│
▼
user question ──> cosine_similarity ──> best-match passage
│
▼
OpenAI Chat Completions (gpt-3.5-turbo)
system prompt: "use only this passage"
│
▼
answer ──> Flask UI / pyttsx3 TTS
Files:
app.py: Flask routes (/,/chat,/upload).assistant.py: Orchestrator that wires PDF ingest, retrieval, and chat.pdf_processor.py: Extracts and tokenises PDF text into sentences.chatbot.py: TF-IDF retrieval plus OpenAI Chat Completions call.audio_processor.py:speech_recognition(Google Web Speech API) for STT,pyttsx3(SAPI5 voice) for TTS.retrieval_from_embeddings.py: Alternate path usingsentence-transformersembeddings, FAISS index, and a grounding-verification step.templates/index.html,static/: Flask UI plus a small chat widget.
| Layer | Choice |
|---|---|
| Web | Flask, Jinja templates, vanilla JS |
| Ingest | PyPDF2 |
| Retrieval (default) | scikit-learn TF-IDF + cosine similarity |
| Retrieval (alt) | sentence-transformers (paraphrase-distilroberta-base-v1) + FAISS |
| LLM | OpenAI gpt-3.5-turbo via REST |
| STT (optional) | SpeechRecognition with Google Web Speech backend |
| TTS (optional) | pyttsx3 (Windows SAPI5) |
Requires Python 3.10+ and an OpenAI API key. The voice features additionally require system audio libraries (see below).
git clone https://github.com/dparikh79/AI-Assistant.git
cd AI-Assistant
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install flask PyPDF2 scikit-learn requests \
SpeechRecognition pyttsx3 gTTS \
sentence-transformers faiss-cpuCreate secrets.json in the project root (it is gitignored):
{
"API_ENDPOINT": "https://api.openai.com/v1/chat/completions",
"API_KEY": "sk-your-openai-key"
}Edit config.py to point PATH_TO_BOOK at your own PDFs, or leave the list empty and upload PDFs through the web UI at runtime.
Run the web app:
python app.py
# open http://127.0.0.1:5000Or run the terminal-only loop (no Flask):
python assistant.py- macOS:
brew install portaudiothenpip install pyaudio. - Debian/Ubuntu:
sudo apt-get install portaudio19-dev espeak ffmpegthenpip install pyaudio. - Windows:
pyttsx3uses SAPI5 out of the box;pip install pyaudiousually works on recent Python builds.
The voice calls are commented out in assistant.py::main. Uncomment the AudioProcessor.text_to_speech(...) and AudioProcessor.voice_to_text() lines to switch from typed input to spoken input.
This is a 2023-era sketch, not a product. Worth being upfront about what it is and is not:
- TF-IDF retrieval over sentence splits is fast but brittle on long, multi-page reasoning. The FAISS path in
retrieval_from_embeddings.pyis better but not wired into the Flask app. pyttsx3is initialised withsapi5, so out-of-the-box TTS is Windows-only. On macOS or Linux, swap the driver (nsssorespeak) inaudio_processor.py.- The Google Web Speech backend used by
SpeechRecognitionis undocumented and unsuitable for production. A modern rewrite would use Whisper (local) or the OpenAI audio API. - No wake word, no OS control (no app launching, no web search). It is a grounded Q&A loop, not a Siri replacement.
- Bare
requests.postto OpenAI with no retry/backoff and no streaming; atry/exceptswallows errors and prints to stdout. - The
app.secret_keyinapp.pyis a placeholder. Set it from an environment variable before exposing the app to anyone other than yourself.
If I rebuilt this in 2026 I would: swap PyPDF2 for pypdf or unstructured; replace TF-IDF with a small embedding model and an actual vector store; move the OpenAI call to the official SDK with streaming and structured output; use Whisper for STT; and put the grounding check from retrieval_from_embeddings.py in front of every answer, not just the alternate path.
MIT. See LICENSE.