Inspiration
I love reading books, especially self-help ones. And as an AI Engineer, I also have to go through a lot of research papers and technical literature. At other times, I'm reading and learning from my course books.
Most of the self-help books and technical literature I love are in English, and my university courses are also completely in English. But I'm a non-native English speaker.
While reading PDFs, I'd constantly hit a wall — a complex phrase, an idiom, a dense paragraph — and break my flow to switch apps: Google Translate, dictionary, ChatGPT, notes. It was exhausting.
I realized: if I'm struggling, millions of language learners, students, and professionals are too. That's when I thought of building something to help readers and learners read, understand, and learn all in one place with the help of Gemini 3's revolutionary multimodal capabilities.
I wanted to leverage Gemini 3's enhanced reasoning and reduced latency to create an AI that doesn't just translate or summarize — but thinks with you as you read. I thought:
"What if the PDF itself became your AI-powered tutor?"
That spark became DuoRead AI — powered by Gemini 3.
What it does
DuoRead transforms any static PDF into a dynamic, AI-powered learning canvas — no app-switching required.

Core Features Powered by Gemini 3:
- DuoRead Agent — A conversational AI assistant that understands your entire book, powered by Gemini 3's long-context reasoning. Ask complex questions, get contextual answers, and have follow-up discussions.
- Instant Translation — Translate any selected text into your native language with Gemini 3's multilingual capabilities
- Smart Simplification — Break down complex academic or technical passages using Gemini 3's advanced language understanding
- Contextual Summarization — Summarize chapters, sections, or selected passages while preserving key insights
- Deep Explanations — Get ELI5 explanations, detailed breakdowns, or context-aware definitions using Gemini 3's reasoning engine
- Pronunciation & Read Aloud — Hear correct pronunciation and have entire pages read to you
- Smart Sticky Notes — Add annotations that sync across devices
Dual AI Architecture:
- Hybrid Mode: Powered by Gemini 3 API — unlimited context, advanced reasoning, multimodal understanding
- Client Mode: Chrome's Gemini Nano for instant, privacy-first operations
Works as a web app accessible anywhere.
How I built it
DuoRead AI is a full-stack, Gemini 3-powered reading platform that puts Google's most advanced AI directly into your reading experience.
Frontend Architecture:
The frontend is built with React + Vite and deployed on Vercel. For PDF rendering, I used react-pdf (v10.2.0) as the React wrapper and pdfjs-dist (v5.4.296) — Mozilla's PDF.js — as the core engine. This combination enables dynamic text extraction, precise selection, and performance optimization.
PDFs are loaded as blobs, converted to object URLs, and rendered with full text and annotation layers enabled. A custom selection overlay captures user highlights and triggers a floating AI action menu — all interactions are instant and contextual.
Gemini 3 Integration (Core AI Layer): The heart of DuoRead is Gemini 3 (gemini-2.5-flash) accessed via the Google AI Studio API. Here's how I leveraged Gemini 3's capabilities:
Long-Context Reasoning — Gemini 3 processes entire book chapters (up to 1M tokens) to answer complex questions like "How do the concepts in Chapter 3 relate to the conclusion?" or "Compare the author's arguments in sections 2 and 5."
Enhanced Multimodal Understanding — When users select text containing references to figures, tables, or diagrams, Gemini 3 provides context-aware explanations that reference visual elements (future OCR integration planned).
Reduced Latency — Gemini 3's speed improvements enable real-time AI responses while reading. Users get instant translations, summaries, and explanations without breaking flow.
Advanced Reasoning for Learning — I engineered specialized prompts that leverage Gemini 3's reasoning capabilities:
- Contextual Definitions: "Define [term] in the context of this passage"
- Layered Explanations: "Explain this concept at beginner, intermediate, and expert levels"
- Comparative Analysis: "How does this section's argument differ from [previous section]?"
LangChain + LangGraph Orchestration: I built a stateful AI agent using LangChain + LangGraph that maintains conversation history and book context across interactions. The agent uses MongoDB as a checkpointer to preserve multi-turn reasoning — enabling follow-up questions like:
- User: "Summarize Chapter 5"
- Agent: [Provides summary via Gemini 3]
- User: "What are the practical applications?"
- Agent: [Continues with context from previous summary]
Backend Infrastructure:
The backend is FastAPI in Python, serving as a high-performance gateway to Gemini 3. User data, uploaded books, and vector embeddings for semantic search are stored in PostgreSQL with the pgvector extension — enabling similarity-based search across book content.
Dual-Mode Intelligence:
- Hybrid Mode (Gemini 3): Full power for complex reasoning, long documents, and advanced queries
- Client Mode (Gemini Nano): Chrome's built-in AI for instant, offline operations like quick translations and simple definitions
Privacy-First Design: In Client Mode, zero data leaves the device. In Hybrid Mode, only selected text is sent to Gemini 3 with user consent. No full documents are uploaded without explicit permission.
The entire system is deployed with Vercel (frontend), Hugging Face Docker Spaces (backend), Neon (PostgreSQL), and MongoDB Atlas (agent state).
Technology Stack:
- AI: Gemini 3 API (gemini-2.5-flash), Chrome Gemini Nano
- Orchestration: LangChain, LangGraph
- Frontend: React, Vite, react-pdf, pdfjs-dist
- Backend: FastAPI, PostgreSQL (pgvector), MongoDB Atlas
- Deployment: Vercel, Hugging Face Spaces, Neon, MongoDB Atlas
Challenges I ran into
Building a Dynamic, AI-Native PDF Canvas — Creating a fully interactive PDF experience where AI feels native to the document (not bolted on) required careful coordination between PDF.js rendering, text selection, and AI action triggers. Making one feature work often broke another.
Leveraging Gemini 3's Long Context Effectively — With access to 1M+ token context, the challenge wasn't capacity — it was intelligent context selection. I had to build systems to:
- Extract only relevant sections when answering specific questions
- Maintain conversation history without exceeding context limits
- Balance between sending full chapters vs. targeted passages
Stateful Agent Architecture with LangGraph — Managing multi-turn conversations where the agent remembers both the chat history AND dynamically added book context was complex. I needed to:
- Preserve conversation state across sessions (MongoDB checkpointer)
- Dynamically inject book sections into the agent's context
- Handle context window limits gracefully as conversations grew
Prompt Engineering for Reading Scenarios — Generic prompts don't work for reading comprehension. I spent significant time engineering prompts that leverage Gemini 3's reasoning for:
- Context-aware definitions (understanding terms within the passage)
- Layered explanations (adjusting complexity based on user needs)
- Comparative analysis (relating concepts across chapters)
- Simplification without losing meaning
Balancing Speed and Quality — Even with Gemini 3's reduced latency, processing long passages for summarization takes time. I implemented:
- Progressive streaming of responses
- Smart chunking for large selections
- Fallback to Gemini Nano for simple, instant operations
Dual-Mode UX Complexity — Creating a seamless toggle between Client Mode (Gemini Nano) and Hybrid Mode (Gemini 3) that users actually understand and trust required multiple design iterations.
Accomplishments that I'm proud of
First-ever AI-native PDF reader powered by Gemini 3 — bringing Google's most advanced AI directly into document reading at a level never seen before in reading tools.
Revolutionary learning experience — Users say: "It feels like having a personal tutor inside every book." The combination of Gemini 3's reasoning + long-context understanding + reduced latency creates a truly magical reading experience.
Intelligent dual-mode architecture — Hybrid Mode (Gemini 3 for power) and Client Mode (Gemini Nano for privacy) with a user-controlled toggle that respects both privacy and performance.
Stateful AI agent with memory — Built with LangChain + LangGraph + MongoDB, enabling natural conversations about book content with follow-up questions and contextual understanding.
Zero friction UX — No app-switching, no copy-pasting, no breaking reading flow. AI actions happen instantly within the PDF itself through intuitive text selection.
Production-grade infrastructure — Full-stack deployment with React + Vercel, FastAPI + PostgreSQL + MongoDB, all connected with clean APIs and CI/CD pipelines.
Privacy-first design — In Client Mode, no text ever leaves the device. In Hybrid Mode, only selected text is sent with explicit user consent.
Leveraging Gemini 3's full potential — Successfully integrated long-context reasoning, reduced latency, and enhanced language understanding to create something genuinely new in the reading space.
What I learned
Gemini 3's long-context is a game-changer — 1M+ token context isn't just a spec — it fundamentally transforms how AI can assist with reading. The ability to "hold" entire books in context enables questions impossible with previous models.
Reduced latency matters more than expected — Gemini 3's speed improvements make the difference between "cool AI feature" and "feels like magic." Users notice and value instant responses while reading.
Context management > raw context size — Having 1M tokens available doesn't mean sending everything. Smart context selection, relevance ranking, and progressive loading are critical for both cost and quality.
Stateful agents with LangGraph + checkpointers unlock genuinely useful AI — The ability to maintain conversation history while dynamically adding book context creates experiences that feel like talking to a knowledgeable tutor who's read the same book.
Reading is a prime use case for advanced AI — Documents are where knowledge lives. Gemini 3's reasoning + multimodal understanding + speed positioned perfectly for transforming how people learn from text.
Privacy and power don't have to conflict — Dual-mode architecture (Gemini Nano for privacy, Gemini 3 for power) gives users real control over their data and experience.
PDF.js + React is production-ready — For dynamic, selectable, annotatable PDFs in web apps,
react-pdf+pdfjs-distis the gold standard.Prompt engineering for education requires domain expertise — Generic "summarize this" prompts fail in learning contexts. Effective prompts leverage understanding of pedagogy and learning theory.
What's next for DuoRead
Near-Term (Leveraging Gemini 3 Further):
Multimodal Understanding — Use Gemini 3's vision capabilities to understand images, charts, diagrams, and equations within PDFs. Imagine asking "Explain this flowchart" or "Solve this equation step-by-step."
Advanced Reasoning Tools — Integrate web search, math solver, citation finder, and code interpreter as tools for the DuoRead Agent, leveraging Gemini 3's tool-use capabilities for deeper, more accurate answers.
Voice-First Experience — Build a full voice agent powered by Gemini 3: speak to your PDF, ask questions naturally, get explanations read aloud with perfect intonation.
Cross-Document Reasoning — Enable questions across multiple books: "Compare the economic theories in Book A and Book B" or "Find connections between these three research papers."
Medium-Term:
Modular AI PDF Canvas — Package DuoRead as a plug-and-play component for LMS platforms (Moodle, Canvas, Blackboard) and note-taking apps.
Mobile Apps — Native iOS/Android with offline AI (Gemini Nano), background sync, and gesture controls.
Collaborative Learning — Share annotations, AI insights, and discussions with study groups — think "Google Docs comments" but AI-enhanced.
Personal Library with AI — Cloud + local library with AI-powered search, automatic tagging, reading progress tracking, and personalized recommendations.
Long-Term:
Enterprise Mode — Secure, on-prem deployment for sensitive documents (legal, medical, finance) with Gemini 3 Enterprise features.
Research Assistant Mode — For researchers and academics: automatic citation management, literature review assistance, methodology analysis, and paper drafting support.
Universal Knowledge Companion — Expand beyond PDFs to ePubs, web articles, videos, podcasts — Gemini 3-powered learning across all content formats.
And finally, it's live here: DuoRead
Built With
- chrome
- cursor
- docker
- fastapi
- gemini
- huggingface
- javascript
- langchain
- langgraph
- lovable
- mongodb
- postgresql
- python
- rag
- react
- vercel

Log in or sign up for Devpost to join the conversation.