A simple yet powerful Retrieval-Augmented Generation (RAG) system built with Node.js, Groq API, and local embeddings. Upload PDF documents and ask questions about their content with real-time streaming responses.
- PDF Upload & Processing: Upload multiple PDF documents (up to 50MB each)
- AI-Powered Q&A: Ask questions and get intelligent answers based on your documents
- Smart Search: Uses vector embeddings to find relevant information
- Source Citations: See which parts of your documents were used in answers
- Real-time Streaming: Watch responses generate in real-time
- Cost-Effective: Free local embeddings + Groq's fast, affordable LLM
- Clean UI: Modern, intuitive web interface
- Backend: Node.js + Express + Socket.IO
- Frontend: HTML5 + CSS3 + Vanilla JavaScript
- PDF Parsing: pdf-parse
- Embeddings: @xenova/transformers (local, free)
- Vector Storage: In-memory with cosine similarity
- LLM: Groq API (openai/gpt-oss-20b)
- Node.js 16+ installed
- Groq API key (get one at console.groq.com)
1. Clone or download this project.

2. Install dependencies:

   ```bash
   npm install
   ```

3. Set up environment variables: edit the `.env` file and add your Groq API key:

   ```
   GROQ_API_KEY=your_actual_groq_api_key_here
   PORT=3000
   ```

4. Start the server:

   ```bash
   npm start
   ```

5. Open your browser and navigate to `http://localhost:3000`.
- Click the upload area or drag & drop a PDF file
- Click "Upload & Process"
- Wait for processing (first-time embedding model download may take a moment)
- Your PDF is now ready for queries!
- Type your question in the chat input
- Press Enter or click Send
- Watch the response stream in real-time
- See source citations at the bottom of each answer
- View document statistics in the sidebar
- Clear all data with "Clear All Data" button
- Upload multiple PDFs to expand your knowledge base
Edit `src/ragService.js` to adjust:
- `chunkSize`: 1000 (characters per chunk)
- `chunkOverlap`: 200 (overlap between chunks)
- `topK`: 3 (number of chunks to retrieve)
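The chunking step these settings control can be sketched as a simple sliding window. This is an illustrative sketch, not the project's actual implementation; the `chunkText` helper name is made up here:

```javascript
// Split text into overlapping chunks (illustrative sketch; the real
// ragService.js may differ). Each chunk is up to `chunkSize` characters
// and shares `chunkOverlap` characters with its predecessor, so a
// sentence cut at one boundary still appears whole in a neighbor chunk.
function chunkText(text, chunkSize = 1000, chunkOverlap = 200) {
  const chunks = [];
  const step = chunkSize - chunkOverlap; // advance 800 chars per chunk
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}

// Example: 2500 characters -> windows starting at 0, 800, 1600
const chunks = chunkText('x'.repeat(2500));
console.log(chunks.length); // 3
```

The 200-character overlap trades a little extra storage and embedding work for better recall at chunk boundaries.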
Edit `server.js` to change:
- Groq model: `openai/gpt-oss-20b`
- Temperature: 0.7 (higher = more creative)
- Max tokens: 2048
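These three settings correspond to fields on the options object handed to Groq's chat-completions call. A sketch of what that fragment might look like (field names follow Groq's OpenAI-compatible API; the exact shape in `server.js` may differ):

```javascript
// Options for the chat-completions request (values mirror this
// project's defaults; any Groq-hosted model ID can go in `model`).
const completionOptions = {
  model: 'openai/gpt-oss-20b',
  temperature: 0.7, // higher = more creative, lower = more deterministic
  max_tokens: 2048, // upper bound on the length of the generated answer
  stream: true,     // emit tokens incrementally for the real-time UI
};
```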
```
groq-rag-system/
├── package.json            # Dependencies
├── .env                    # Environment variables
├── server.js               # Express server + Socket.IO
├── src/
│   ├── pdfService.js       # PDF text extraction
│   ├── embeddingService.js # Local embeddings
│   ├── vectorStore.js      # In-memory vector DB
│   └── ragService.js       # RAG orchestration
└── public/
    ├── index.html          # Frontend UI
    ├── style.css           # Styling
    └── script.js           # Client-side logic
```
- Upload PDF: Extract text from uploaded PDF
- Chunk Text: Split text into smaller, manageable chunks
- Generate Embeddings: Convert chunks to vector embeddings (local, free)
- Store Vectors: Save embeddings in memory with metadata
- Query Process:
- Convert question to embedding
- Find similar chunks using cosine similarity
- Retrieve top-k relevant chunks
- Send chunks + question to Groq LLM
- Stream response back to user
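The retrieval half of the query process (find similar chunks, keep the top k) reduces to cosine similarity over the stored vectors. A minimal sketch, assuming embeddings are plain number arrays; the function names are illustrative, not the project's actual API:

```javascript
// Cosine similarity between two equal-length vectors: dot product
// divided by the product of the vector magnitudes.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank every stored chunk against the query embedding, keep the top k.
function topK(queryEmbedding, store, k = 3) {
  return store
    .map(({ text, embedding }) => ({
      text,
      score: cosineSimilarity(queryEmbedding, embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy example with 2-D "embeddings" (real ones have hundreds of dims).
const store = [
  { text: 'about cats', embedding: [1, 0] },
  { text: 'about dogs', embedding: [0, 1] },
  { text: 'cats and dogs', embedding: [1, 1] },
];
const results = topK([1, 0], store, 2);
console.log(results[0].text); // 'about cats' (similarity 1.0)
```

With a handful of chunks per document, a linear scan like this in-memory approach is plenty fast; a dedicated vector database only pays off at much larger scale.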