Vision Compression Document Chat

A web application that lets you upload PDF documents and chat with an AI assistant that answers questions using the document content. It uses the Gemini Vision API for document compression and Supermemory for semantic search.

How It Works

Upload PDF: Upload a PDF document through the web interface
Process: The backend extracts each page using the Gemini Vision API, compresses it to structured JSON, and ingests it into Supermemory
Chat: Ask questions about the document - the system retrieves relevant pages and generates answers with citations
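
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: `compress_page`, `InMemoryStore`, `cite`, and `answer` are hypothetical names. The Gemini Vision call is replaced by a stub that wraps already-extracted text, and Supermemory's semantic search is replaced by naive keyword overlap, so the sketch runs without API keys.

```python
import json
import re
from dataclasses import dataclass, field
from typing import Dict, List, Set


def compress_page(doc_id: str, page_no: int, text: str) -> str:
    """Stand-in for the vision step: emit one page as structured JSON."""
    return json.dumps({"doc_id": doc_id, "page": page_no, "summary": text.strip()})


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric words, used for the toy relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class InMemoryStore:
    """Stand-in for the search backend; ranks pages by shared words."""
    records: List[Dict] = field(default_factory=list)

    def ingest(self, page_json: str) -> None:
        self.records.append(json.loads(page_json))

    def search(self, query: str, top_k: int = 2) -> List[Dict]:
        q = tokens(query)
        scored = [(len(q & tokens(r["summary"])), r) for r in self.records]
        scored = [(s, r) for s, r in scored if s > 0]  # drop pages with no overlap
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [r for _, r in scored[:top_k]]


def cite(record: Dict) -> str:
    """Format a page reference in the (doc_id p.N) style."""
    return f"({record['doc_id']} p.{record['page']})"


def answer(store: InMemoryStore, question: str) -> str:
    """Toy answer step: echo each retrieved page followed by its citation."""
    hits = store.search(question)
    if not hits:
        return "No relevant pages found."
    return " ".join(f"{h['summary']} {cite(h)}" for h in hits)
```

In the real pipeline the answer text would come from the Gemini text model rather than from echoing page summaries; only the retrieve-then-cite shape is meant to carry over.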

Architecture

Frontend: Next.js web UI (React + TypeScript + TailwindCSS)
Backend: FastAPI service (Python) that handles PDF processing and question answering
APIs: Google Gemini (vision + text) and Supermemory (semantic search)

Quick Start

Backend Setup

cd backend
python -m venv venv
venv\Scripts\activate  # Windows
# source venv/bin/activate  # macOS/Linux

pip install -r requirements.txt

Create backend/.env:

GEMINI_API_KEY=your_key_here
SUPERMEMORY_API_KEY=your_key_here
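
A small check like the following can catch a missing or placeholder key before the backend starts. It is a sketch under assumptions: `missing_keys` is a hypothetical helper, and it inspects the process environment only. How the backend itself loads backend/.env is not covered here.

```python
import os
from typing import List, Mapping

REQUIRED_KEYS = ("GEMINI_API_KEY", "SUPERMEMORY_API_KEY")
PLACEHOLDER = "your_key_here"  # the sample value shown in the .env example


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return required variable names that are unset, blank, or still the placeholder."""
    missing = []
    for key in REQUIRED_KEYS:
        value = env.get(key, "").strip()
        if not value or value == PLACEHOLDER:
            missing.append(key)
    return missing
```

Calling `missing_keys()` with no arguments checks the current environment; an empty list means both keys are set.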

Run backend:

uvicorn app.main:app --reload

Backend runs at http://localhost:8000

Frontend Setup

cd frontend
npm install

Create frontend/.env.local:

NEXT_PUBLIC_BACKEND_URL=http://localhost:8000

Run frontend:

npm run dev

Frontend runs at http://localhost:3000

Usage

Open http://localhost:3000 in your browser
Upload a PDF file and click "Process & Ingest"
Wait for processing to complete (shows progress: pages ingested/total)
Ask questions in the chat interface
View retrieved evidence in the right panel

Deployment

Backend (Google Cloud Run)

See backend/CLOUD_RUN_SETUP.md for detailed instructions. Use the backend/deploy-with-cloud-build.ps1 script for automated deployment.

Frontend (Vercel)

Push the frontend code to GitHub
Import the repository in Vercel
Set the NEXT_PUBLIC_BACKEND_URL environment variable to your Cloud Run URL
Deploy

Features

Parallel Processing: Pages processed concurrently for faster ingestion
Thread-Safe: Each processing thread creates its own model instance
Error Handling: Failed pages are tracked and can be retried
Citations: Answers include page references like (doc_id p.7)
Evidence Panel: View retrieved pages and excerpts supporting answers
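
The parallel processing, per-thread model, and retry features listed above could fit together roughly as follows. This is a sketch, not the backend's implementation: `FakeModel`, `get_model`, and `process_pages` are hypothetical names, and the fake model stands in for a real Gemini client so the example runs offline.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Set, Tuple

_local = threading.local()  # holds one model per worker thread


class FakeModel:
    """Hypothetical stand-in for a vision model client."""

    def __init__(self, fail_on: Set[int]):
        self.fail_on = fail_on

    def compress(self, page_no: int) -> str:
        if page_no in self.fail_on:
            raise RuntimeError(f"page {page_no} failed")
        return f"page-{page_no}-json"


def get_model(factory: Callable[[], FakeModel]) -> FakeModel:
    """Create the model on first use in each thread, then reuse it there."""
    if not hasattr(_local, "model"):
        _local.model = factory()
    return _local.model


def process_pages(
    pages: Iterable[int],
    factory: Callable[[], FakeModel],
    max_workers: int = 4,
) -> Tuple[Dict[int, str], List[int]]:
    """Process pages concurrently; return (results, failed page numbers)."""
    def work(page_no: int) -> str:
        return get_model(factory).compress(page_no)

    # Leaving the with-block waits for every submitted page to finish.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {p: pool.submit(work, p) for p in pages}

    done: Dict[int, str] = {}
    failed: List[int] = []
    for page_no, future in futures.items():
        try:
            done[page_no] = future.result()
        except Exception:
            failed.append(page_no)  # tracked so the caller can retry later
    return done, failed
```

A retry is then just a second call with the failed page numbers, for example `process_pages(failed, factory)`. Keeping the model in `threading.local` avoids sharing one client object across threads, which matters when the client is not documented as thread-safe.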

Project Structure

vision-compression-project/
├── backend/          # FastAPI backend
│   ├── app/         # Application code
│   └── requirements.txt
├── frontend/        # Next.js frontend
│   ├── app/        # Pages and components
│   └── package.json
└── README.md

Requirements

Python 3.7+ (backend)
Node.js 18+ (frontend)
Google Gemini API key
Supermemory API key
Poppler (for PDF processing) - see backend README for installation

License

MIT
