A simple yet powerful Retrieval-Augmented Generation (RAG) system built with Node.js, Groq API, and local embeddings. Upload PDF documents and ask questions about their content with real-time streaming responses.
- PDF Upload & Processing: Upload multiple PDF documents (up to 50MB each)
- AI-Powered Q&A: Ask questions and get intelligent answers based on your documents
- Smart Search: Uses vector embeddings to find relevant information
- Source Citations: See which parts of your documents were used in answers
- Real-time Streaming: Watch responses generate in real-time
- Cost-Effective: Free local embeddings + Groq's fast, affordable LLM
- Clean UI: Modern, intuitive web interface
- Backend: Node.js + Express + Socket.IO
- Frontend: HTML5 + CSS3 + Vanilla JavaScript
- PDF Parsing: pdf-parse
- Embeddings: @xenova/transformers (local, free)
- Vector Storage: In-memory with cosine similarity
- LLM: Groq API (openai/gpt-oss-20b)
- Node.js 16+ installed
- Groq API key (get one at console.groq.com)
1. Clone or download this project.

2. Install dependencies:

   ```bash
   npm install
   ```

3. Set up environment variables: edit the `.env` file and add your Groq API key:

   ```
   GROQ_API_KEY=your_actual_groq_api_key_here
   PORT=3000
   ```

4. Start the server:

   ```bash
   npm start
   ```

5. Open your browser and navigate to `http://localhost:3000`.
- Click the upload area or drag & drop a PDF file
- Click "Upload & Process"
- Wait for processing (first-time embedding model download may take a moment)
- Your PDF is now ready for queries!
- Type your question in the chat input
- Press Enter or click Send
- Watch the response stream in real-time
- See source citations at the bottom of each answer
- View document statistics in the sidebar
- Clear all data with "Clear All Data" button
- Upload multiple PDFs to expand your knowledge base
Edit `src/ragService.js` to adjust:
- `chunkSize`: 1000 (characters per chunk)
- `chunkOverlap`: 200 (overlap between chunks)
- `topK`: 3 (number of chunks to retrieve)
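The chunking step these settings control can be sketched as a simple sliding window. This is an illustrative sketch, not the project's actual implementation; the `chunkText` helper name is made up here:

```javascript
// Split text into overlapping chunks (illustrative sketch; the real
// ragService.js may differ). Each chunk is up to `chunkSize` characters
// and shares `chunkOverlap` characters with its predecessor, so a
// sentence cut at one boundary still appears whole in a neighbor chunk.
function chunkText(text, chunkSize = 1000, chunkOverlap = 200) {
  const chunks = [];
  const step = chunkSize - chunkOverlap; // advance 800 chars per chunk
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}

// Example: 2500 characters -> windows starting at 0, 800, 1600
const chunks = chunkText('x'.repeat(2500));
console.log(chunks.length); // 3
```

The 200-character overlap trades a little extra storage and embedding work for better recall at chunk boundaries.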
Edit `server.js` to change:
- Groq model: `openai/gpt-oss-20b`
- Temperature: 0.7 (higher = more creative)
- Max tokens: 2048
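These three settings correspond to fields on the options object handed to Groq's chat-completions call. A sketch of what that fragment might look like (field names follow Groq's OpenAI-compatible API; the exact shape in `server.js` may differ):

```javascript
// Options for the chat-completions request (values mirror this
// project's defaults; any Groq-hosted model ID can go in `model`).
const completionOptions = {
  model: 'openai/gpt-oss-20b',
  temperature: 0.7, // higher = more creative, lower = more deterministic
  max_tokens: 2048, // upper bound on the length of the generated answer
  stream: true,     // emit tokens incrementally for the real-time UI
};
```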
```
groq-rag-system/
├── package.json            # Dependencies
├── .env                    # Environment variables
├── server.js               # Express server + Socket.IO
├── src/
│   ├── pdfService.js       # PDF text extraction
│   ├── embeddingService.js # Local embeddings
│   ├── vectorStore.js      # In-memory vector DB
│   └── ragService.js       # RAG orchestration
└── public/
    ├── index.html          # Frontend UI
    ├── style.css           # Styling
    └── script.js           # Client-side logic
```
- Upload PDF: Extract text from uploaded PDF
- Chunk Text: Split text into smaller, manageable chunks
- Generate Embeddings: Convert chunks to vector embeddings (local, free)
- Store Vectors: Save embeddings in memory with metadata
- Query Process:
- Convert question to embedding
- Find similar chunks using cosine similarity
- Retrieve top-k relevant chunks
- Send chunks + question to Groq LLM
- Stream response back to user
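The retrieval half of the query process (find similar chunks, keep the top k) reduces to cosine similarity over the stored vectors. A minimal sketch, assuming embeddings are plain number arrays; the function names are illustrative, not the project's actual API:

```javascript
// Cosine similarity between two equal-length vectors: dot product
// divided by the product of the vector magnitudes.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank every stored chunk against the query embedding, keep the top k.
function topK(queryEmbedding, store, k = 3) {
  return store
    .map(({ text, embedding }) => ({
      text,
      score: cosineSimilarity(queryEmbedding, embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy example with 2-D "embeddings" (real ones have hundreds of dims).
const store = [
  { text: 'about cats', embedding: [1, 0] },
  { text: 'about dogs', embedding: [0, 1] },
  { text: 'cats and dogs', embedding: [1, 1] },
];
const results = topK([1, 0], store, 2);
console.log(results[0].text); // 'about cats' (similarity 1.0)
```

With a handful of chunks per document, a linear scan like this in-memory approach is plenty fast; a dedicated vector database only pays off at much larger scale.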