An AI-powered document Q&A chatbot using Retrieval-Augmented Generation (RAG)
Upload PDF or text documents → Ask questions → Get accurate AI-powered answers with source citations.
┌─────────────────────────────────────────────────────────────────┐
│                           DocuChat AI                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  📄 Documents   ✂️ Chunking    🔢 Embeddings   🗄️ ChromaDB       │
│  ──────────>   ───────────>   ────────────>   ──────────>       │
│   PDF / TXT     1000 char      all-MiniLM      Vector Store     │
│                 chunks         -L6-v2          (In-Memory)      │
│                                                                 │
│  ┌──────────────────────────────────────────────────────┐       │
│  │                    Query Pipeline                    │       │
│  │                                                      │       │
│  │  💬 User Question                                    │       │
│  │        │                                             │       │
│  │        ▼                                             │       │
│  │  🔍 Semantic Search (Top-3 chunks from ChromaDB)     │       │
│  │        │                                             │       │
│  │        ▼                                             │       │
│  │  🧠 Claude AI (Generates answer from chunks)         │       │
│  │        │                                             │       │
│  │        ▼                                             │       │
│  │  💬 Answer + 📚 Source Citations                     │       │
│  └──────────────────────────────────────────────────────┘       │
│                                                                 │
│  Frontend: Streamlit │ Backend: LangChain │ LLM: Claude         │
└─────────────────────────────────────────────────────────────────┘
| Category | Technology | Purpose |
|---|---|---|
| LLM | Claude (Anthropic) | Answer generation |
| Embeddings | all-MiniLM-L6-v2 | Text → vectors (runs locally, free) |
| Vector DB | ChromaDB | Semantic similarity search |
| Framework | LangChain | RAG pipeline orchestration |
| Frontend | Streamlit | Web interface |
| Language | Python 3.10+ | Core application |
| Container | Docker | Deployment |
| Testing | Pytest | Unit tests |
- Document Upload — Drag & drop PDF and TXT files
- RAG Pipeline — Automatic chunking, embedding, and retrieval
- Source Citations — See exactly which document sections were used
- Demo Mode — Try instantly with a pre-loaded AI Engineering guide
- Chat Export — Download conversation as text
- Suggested Questions — One-click starter questions in demo mode
- Modular Architecture — Clean separation of concerns (config, loader, store, chatbot)
- Docker Support — One command to build and run
- Tested — Unit tests for core modules
# Clone
git clone https://github.com/Shrinija17/rag-chatbot.git
cd rag-chatbot

# Environment
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# API Key
echo "ANTHROPIC_API_KEY=your-key-here" > .env

# Run
streamlit run app.py

Or build and run with Docker:

docker build -t docuchat-ai .
docker run -p 8501:8501 -e ANTHROPIC_API_KEY=your-key docuchat-ai

Then open http://localhost:8501.
Don't have documents ready? Switch to Demo Mode in the sidebar to instantly try the chatbot with a pre-loaded AI Engineering guide. Ask about:
- "What is RAG and how does it work?"
- "Compare different vector databases"
- "What's the career path for an AI Engineer?"
rag-chatbot/
├── app.py # Streamlit web interface
├── src/
│ ├── config.py # Configuration & settings
│ ├── document_loader.py # Document loading & chunking
│ ├── vector_store.py # Vector store creation
│ └── chatbot.py # RAG chain & query logic
├── tests/
│ └── test_document_loader.py # Unit tests
├── data/
│ └── sample/ # Demo mode documents
├── rag_chatbot.py # CLI version
├── Dockerfile # Container support
├── requirements.txt # Dependencies
└── .env.example # API key template
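The layout above keeps configuration separate from logic. As a hypothetical sketch (the field and function names here are illustrative, not confirmed from the repo), `src/config.py` could centralize the parameters this README cites (1000-character chunks, 200-character overlap, top-3 retrieval) plus the API key read from the environment:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    """Central app configuration; field names are illustrative."""
    anthropic_api_key: str
    chunk_size: int = 1000           # characters per chunk
    chunk_overlap: int = 200         # characters shared between neighbors
    top_k: int = 3                   # chunks retrieved per query
    embedding_model: str = "all-MiniLM-L6-v2"

def load_settings() -> Settings:
    # The key lands in the environment via the .env file created above.
    return Settings(anthropic_api_key=os.environ.get("ANTHROPIC_API_KEY", ""))
```

Freezing the dataclass keeps chunking and retrieval parameters consistent across modules, and tests can construct a `Settings` directly instead of touching the environment.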
python -m pytest tests/ -v

How it works:
- Load — PDF and text files are read into memory
- Chunk — Documents are split into ~1000 character pieces with 200 char overlap
- Embed — Each chunk is converted to a 384-dimensional vector using all-MiniLM-L6-v2
- Store — Vectors are indexed in ChromaDB for fast similarity search
- Retrieve — User questions are embedded and the top 3 most similar chunks are found
- Generate — Claude receives the question + relevant chunks and generates an accurate answer
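The steps above can be sketched end to end. This toy version swaps each real component for a stand-in: a character chunker with overlap in place of LangChain's splitter, bag-of-words counts in place of the 384-dimensional all-MiniLM-L6-v2 embeddings, and a linear cosine-similarity scan in place of ChromaDB; the Claude call is omitted. The control flow, though, mirrors the pipeline:

```python
from collections import Counter
from math import sqrt

def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks with overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text: str) -> Counter:
    """Toy 'embedding': bag-of-words counts (stand-in for a real model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Rank chunks by similarity to the question; keep the top_k."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:top_k]

chunks = chunk_text("RAG retrieves relevant chunks before the model answers. " * 50)
context = retrieve("How does RAG retrieve chunks?", chunks)
```

In the real app, `context` is not the output: the retrieved chunks are formatted into the prompt sent to Claude, which is what grounds the answer and makes the source citations possible.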
Built by Shrinija Kummari · Powered by Claude AI