A web application that lets you upload PDF documents and chat with an AI assistant that answers questions using the document content. It uses the Gemini Vision API to compress each page into structured JSON and Supermemory for semantic search.
- Upload PDF: Upload a PDF document through the web interface
- Process: The backend reads each page with the Gemini Vision API, compresses it to structured JSON, and ingests it into Supermemory
- Chat: Ask questions about the document; the system retrieves relevant pages and generates answers with citations
- Frontend: Next.js web UI (React + TypeScript + TailwindCSS)
- Backend: FastAPI service (Python) that handles PDF processing and question answering
- APIs: Google Gemini (vision + text) and Supermemory (semantic search)
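The retrieve-then-answer step can be illustrated with a small sketch of how retrieved pages might be labelled before they are handed to Gemini. This is not the backend's actual code: the `RetrievedPage` shape and the `format_citation` / `build_context` helpers are hypothetical, and the real Supermemory search call and Gemini generation call are deliberately left out. It only shows one way to attach the `(doc_id p.7)` citation style to each excerpt so the model can cite it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RetrievedPage:
    """One search hit (hypothetical shape; the real Supermemory response will differ)."""
    doc_id: str
    page: int
    excerpt: str


def format_citation(doc_id: str, page: int) -> str:
    """Render a citation in the README's style, e.g. '(doc_id p.7)'."""
    return f"({doc_id} p.{page})"


def build_context(question: str, hits: list[RetrievedPage]) -> str:
    """Assemble a prompt that pairs each retrieved excerpt with its citation label."""
    # Sort by document and page so the evidence reads in document order.
    ordered = sorted(hits, key=lambda h: (h.doc_id, h.page))
    blocks = [f"{format_citation(h.doc_id, h.page)} {h.excerpt}" for h in ordered]
    return (
        "Answer using only the evidence below and cite pages as (doc_id p.N).\n\n"
        + "\n".join(blocks)
        + f"\n\nQuestion: {question}"
    )
```

Keeping the label next to each excerpt lets the model copy citations verbatim instead of inventing page numbers.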
cd backend
python -m venv venv
venv\Scripts\activate # Windows
# source venv/bin/activate # macOS/Linux
pip install -r requirements.txt
Create backend/.env:
GEMINI_API_KEY=your_key_here
SUPERMEMORY_API_KEY=your_key_here
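If you want to confirm both keys are picked up before starting the server, a standard-library-only check might look like the sketch below. It assumes simple `KEY=value` lines; the backend may well load `.env` differently (for example with a dotenv package), and `load_env_file` / `require_keys` are hypothetical helpers, not part of the project.

```python
from __future__ import annotations

import os
from pathlib import Path

# The two settings the README asks for in backend/.env.
REQUIRED_KEYS = ("GEMINI_API_KEY", "SUPERMEMORY_API_KEY")


def load_env_file(path: str | Path) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and '#' comments."""
    values: dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_keys(env: dict[str, str]) -> dict[str, str]:
    """Return the required keys; real environment variables override the file."""
    merged = {**env, **{k: os.environ[k] for k in REQUIRED_KEYS if k in os.environ}}
    missing = [k for k in REQUIRED_KEYS if not merged.get(k)]
    if missing:
        raise RuntimeError(f"Missing settings: {', '.join(missing)}")
    return {k: merged[k] for k in REQUIRED_KEYS}
```

Usage: `require_keys(load_env_file("backend/.env"))` raises with the names of any empty or absent keys.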
Run backend:
uvicorn app.main:app --reload
Backend runs at http://localhost:8000
cd frontend
npm install
Create frontend/.env.local:
NEXT_PUBLIC_BACKEND_URL=http://localhost:8000
Run frontend:
npm run dev
Frontend runs at http://localhost:3000
- Open http://localhost:3000 in your browser
- Upload a PDF file and click "Process & Ingest"
- Wait for processing to complete (shows progress: pages ingested/total)
- Ask questions in the chat interface
- View retrieved evidence in the right panel
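The "pages ingested/total" progress shown during processing needs a counter that several worker threads can update safely. A minimal sketch under that assumption follows; `IngestProgress` and its fields are hypothetical names, not the backend's actual status model.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass, field


@dataclass
class IngestProgress:
    """Thread-safe tally behind a 'pages ingested/total' indicator (hypothetical)."""
    total: int
    ingested: int = 0
    failed: list[int] = field(default_factory=list)
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def mark_ingested(self) -> None:
        # Called by a worker thread after a page is stored successfully.
        with self._lock:
            self.ingested += 1

    def mark_failed(self, page: int) -> None:
        # Remember which page failed so it can be retried later.
        with self._lock:
            self.failed.append(page)

    def snapshot(self) -> dict:
        """Copy the counters under the lock so a status poll never sees a half-update."""
        with self._lock:
            done = self.ingested + len(self.failed)
            return {
                "ingested": self.ingested,
                "total": self.total,
                "failed_pages": sorted(self.failed),
                "complete": done >= self.total,
            }
```

Counting failed pages toward "complete" lets the UI stop waiting once every page has been attempted, while still reporting which ones need a retry.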
See backend/CLOUD_RUN_SETUP.md for detailed Cloud Run deployment instructions, or use the backend/deploy-with-cloud-build.ps1 script to automate the deployment.
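One detail worth checking when deploying to Cloud Run: the container must listen on `0.0.0.0` at the port given in the `PORT` environment variable (8080 by default), not on the local default of 8000. The sketch below assumes you start uvicorn from Python; `server_settings` is a hypothetical helper, and the project's setup guide or deploy script may already handle this another way.

```python
import os


def server_settings(default_port: int = 8080) -> dict:
    """Host/port for uvicorn: Cloud Run injects PORT; fall back to 8080 when unset."""
    return {"host": "0.0.0.0", "port": int(os.environ.get("PORT", default_port))}
```

The settings would be passed as `uvicorn.run("app.main:app", **server_settings())`; the equivalent shell form is `uvicorn app.main:app --host 0.0.0.0 --port $PORT`.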
- Push frontend code to GitHub
- Import repository in Vercel
- Set the NEXT_PUBLIC_BACKEND_URL environment variable to your Cloud Run URL
- Deploy
- Parallel Processing: Pages processed concurrently for faster ingestion
- Thread-Safe: Each processing thread creates its own model instance
- Error Handling: Failed pages are tracked and can be retried
- Citations: Answers include page references like (doc_id p.7)
- Evidence Panel: View retrieved pages and excerpts supporting answers
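The parallel-processing, thread-safety, and retry features above can be sketched together. This is an illustration under stated assumptions, not the backend's code: `threading.local` is one way to give each worker thread its own model instance (the README does not say which mechanism is used), and `compress_page`, `model_factory`, `process_pages`, and `process_with_retry` are hypothetical stand-ins for the real Gemini call and pipeline.

```python
from __future__ import annotations

import threading
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable

# Each worker thread stores its own model here, so instances are never shared.
_local = threading.local()


def get_model(factory: Callable[[], object]) -> object:
    """Return this thread's model, creating it on first use."""
    if not hasattr(_local, "model"):
        _local.model = factory()
    return _local.model


def process_pages(
    pages: list[int],
    compress_page: Callable[[object, int], dict],
    model_factory: Callable[[], object],
    max_workers: int = 4,
) -> tuple[dict[int, dict], list[int]]:
    """Compress pages concurrently; return results by page and the pages that failed."""
    results: dict[int, dict] = {}
    failed: list[int] = []

    def worker(page: int) -> dict:
        return compress_page(get_model(model_factory), page)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, p): p for p in pages}
        # Results are collected on the calling thread, so no lock is needed here.
        for fut in as_completed(futures):
            page = futures[fut]
            try:
                results[page] = fut.result()
            except Exception:
                failed.append(page)  # keep going; this page can be retried
    return results, sorted(failed)


def process_with_retry(pages, compress_page, model_factory, retries: int = 1):
    """Run one pass over all pages, then re-run only the failed ones."""
    results, failed = process_pages(pages, compress_page, model_factory)
    for _ in range(retries):
        if not failed:
            break
        retried, failed = process_pages(failed, compress_page, model_factory)
        results.update(retried)
    return results, failed
```

Retrying only the failed pages keeps a transient API error on one page from forcing the whole document to be reprocessed.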
vision-compression-project/
├── backend/ # FastAPI backend
│ ├── app/ # Application code
│ └── requirements.txt
├── frontend/ # Next.js frontend
│ ├── app/ # Pages and components
│ └── package.json
└── README.md
- Python 3.7+ (backend)
- Node.js 18+ (frontend)
- Google Gemini API key
- Supermemory API key
- Poppler (for PDF processing): see the backend README for installation instructions
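A quick way to confirm Poppler is installed is to check that its command-line tools are on `PATH`. The sketch assumes the backend renders pages through Poppler's `pdftoppm` and `pdfinfo` utilities (as wrappers such as pdf2image do); `missing_poppler_tools` is a hypothetical helper.

```python
import shutil


def missing_poppler_tools(tools=("pdftoppm", "pdfinfo")) -> list:
    """Return the Poppler executables that cannot be found on PATH."""
    # shutil.which also resolves Windows extensions such as .exe via PATHEXT.
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list means the tools were found; otherwise, install Poppler or add its `bin` directory to `PATH` before processing PDFs.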
MIT