DuoRead

Inspiration

I love reading books, especially self-help ones. And as an AI Engineer, I also have to go through a lot of research papers and technical literature. At other times, I'm reading and learning from my course books.

Most of the self-help books and technical literature I love are in English, and my university courses are also completely in English. But I'm a non-native English speaker.

While reading PDFs, I'd constantly hit a wall — a complex phrase, an idiom, a dense paragraph — and break my flow to switch apps: Google Translate, dictionary, ChatGPT, notes. It was exhausting.

I realized: if I'm struggling, millions of language learners, students, and professionals are too. That's when I thought of building something to help readers and learners read, understand, and learn all in one place with the help of Gemini 3's revolutionary multimodal capabilities.

I wanted to leverage Gemini 3's enhanced reasoning and reduced latency to create an AI that doesn't just translate or summarize — but thinks with you as you read. I thought:

"What if the PDF itself became your AI-powered tutor?"

That spark became DuoRead AI — powered by Gemini 3.

What it does

DuoRead transforms any static PDF into a dynamic, AI-powered learning canvas — no app-switching required.

DuoRead Agent

Core Features Powered by Gemini 3:

DuoRead Agent — A conversational AI assistant that understands your entire book, powered by Gemini 3's long-context reasoning. Ask complex questions, get contextual answers, and have follow-up discussions.
Instant Translation — Translate any selected text into your native language with Gemini 3's multilingual capabilities
Smart Simplification — Break down complex academic or technical passages using Gemini 3's advanced language understanding
Contextual Summarization — Summarize chapters, sections, or selected passages while preserving key insights
Deep Explanations — Get ELI5 explanations, detailed breakdowns, or context-aware definitions using Gemini 3's reasoning engine
Pronunciation & Read Aloud — Hear correct pronunciation and have entire pages read to you
Smart Sticky Notes — Add annotations that sync across devices

Dual AI Architecture:

Hybrid Mode: Powered by Gemini 3 API — unlimited context, advanced reasoning, multimodal understanding
Client Mode: Chrome's Gemini Nano for instant, privacy-first operations

Works as a web app accessible anywhere.

How I built it

DuoRead AI is a full-stack, Gemini 3-powered reading platform that puts Google's most advanced AI directly into your reading experience.

Frontend Architecture: The frontend is built with React + Vite and deployed on Vercel. For PDF rendering, I used react-pdf (v10.2.0) as the React wrapper and pdfjs-dist (v5.4.296) — Mozilla's PDF.js — as the core engine. This combination enables dynamic text extraction, precise selection, and performance optimization.

PDFs are loaded as blobs, converted to object URLs, and rendered with full text and annotation layers enabled. A custom selection overlay captures user highlights and triggers a floating AI action menu — all interactions are instant and contextual.

Gemini 3 Integration (Core AI Layer): The heart of DuoRead is Gemini 3 (gemini-2.5-flash) accessed via the Google AI Studio API. Here's how I leveraged Gemini 3's capabilities:

Long-Context Reasoning — Gemini 3 processes entire book chapters (up to 1M tokens) to answer complex questions like "How do the concepts in Chapter 3 relate to the conclusion?" or "Compare the author's arguments in sections 2 and 5."
Enhanced Multimodal Understanding — When users select text containing references to figures, tables, or diagrams, Gemini 3 provides context-aware explanations that reference visual elements (future OCR integration planned).
Reduced Latency — Gemini 3's speed improvements enable real-time AI responses while reading. Users get instant translations, summaries, and explanations without breaking flow.
Advanced Reasoning for Learning — I engineered specialized prompts that leverage Gemini 3's reasoning capabilities:
- Contextual Definitions: "Define [term] in the context of this passage"
- Layered Explanations: "Explain this concept at beginner, intermediate, and expert levels"
- Comparative Analysis: "How does this section's argument differ from [previous section]?"

LangChain + LangGraph Orchestration: I built a stateful AI agent using LangChain + LangGraph that maintains conversation history and book context across interactions. The agent uses MongoDB as a checkpointer to preserve multi-turn reasoning — enabling follow-up questions like:

User: "Summarize Chapter 5"
Agent: [Provides summary via Gemini 3]
User: "What are the practical applications?"
Agent: [Continues with context from previous summary]

Backend Infrastructure: The backend is FastAPI in Python, serving as a high-performance gateway to Gemini 3. User data, uploaded books, and vector embeddings for semantic search are stored in PostgreSQL with the pgvector extension — enabling similarity-based search across book content.

Dual-Mode Intelligence:

Hybrid Mode (Gemini 3): Full power for complex reasoning, long documents, and advanced queries
Client Mode (Gemini Nano): Chrome's built-in AI for instant, offline operations like quick translations and simple definitions

Privacy-First Design: In Client Mode, zero data leaves the device. In Hybrid Mode, only selected text is sent to Gemini 3 with user consent. No full documents are uploaded without explicit permission.

The entire system is deployed with Vercel (frontend), Hugging Face Docker Spaces (backend), Neon (PostgreSQL), and MongoDB Atlas (agent state).

Technology Stack:

AI: Gemini 3 API (gemini-2.5-flash), Chrome Gemini Nano
Orchestration: LangChain, LangGraph
Frontend: React, Vite, react-pdf, pdfjs-dist
Backend: FastAPI, PostgreSQL (pgvector), MongoDB Atlas
Deployment: Vercel, Hugging Face Spaces, Neon, MongoDB Atlas

Challenges I ran into

Building a Dynamic, AI-Native PDF Canvas — Creating a fully interactive PDF experience where AI feels native to the document (not bolted on) required careful coordination between PDF.js rendering, text selection, and AI action triggers. Making one feature work often broke another.

Leveraging Gemini 3's Long Context Effectively — With access to 1M+ token context, the challenge wasn't capacity — it was intelligent context selection. I had to build systems to:

Extract only relevant sections when answering specific questions
Maintain conversation history without exceeding context limits
Balance between sending full chapters vs. targeted passages

Stateful Agent Architecture with LangGraph — Managing multi-turn conversations where the agent remembers both the chat history AND dynamically added book context was complex. I needed to:

Preserve conversation state across sessions (MongoDB checkpointer)
Dynamically inject book sections into the agent's context
Handle context window limits gracefully as conversations grew

Prompt Engineering for Reading Scenarios — Generic prompts don't work for reading comprehension. I spent significant time engineering prompts that leverage Gemini 3's reasoning for:

Context-aware definitions (understanding terms within the passage)
Layered explanations (adjusting complexity based on user needs)
Comparative analysis (relating concepts across chapters)
Simplification without losing meaning

Balancing Speed and Quality — Even with Gemini 3's reduced latency, processing long passages for summarization takes time. I implemented:

Progressive streaming of responses
Smart chunking for large selections
Fallback to Gemini Nano for simple, instant operations

Dual-Mode UX Complexity — Creating a seamless toggle between Client Mode (Gemini Nano) and Hybrid Mode (Gemini 3) that users actually understand and trust required multiple design iterations.

Accomplishments that I'm proud of

First-ever AI-native PDF reader powered by Gemini 3 — bringing Google's most advanced AI directly into document reading at a level never seen before in reading tools.
Revolutionary learning experience — Users say: "It feels like having a personal tutor inside every book." The combination of Gemini 3's reasoning + long-context understanding + reduced latency creates a truly magical reading experience.
Intelligent dual-mode architecture — Hybrid Mode (Gemini 3 for power) and Client Mode (Gemini Nano for privacy) with a user-controlled toggle that respects both privacy and performance.
Stateful AI agent with memory — Built with LangChain + LangGraph + MongoDB, enabling natural conversations about book content with follow-up questions and contextual understanding.
Zero friction UX — No app-switching, no copy-pasting, no breaking reading flow. AI actions happen instantly within the PDF itself through intuitive text selection.
Production-grade infrastructure — Full-stack deployment with React + Vercel, FastAPI + PostgreSQL + MongoDB, all connected with clean APIs and CI/CD pipelines.
Privacy-first design — In Client Mode, no text ever leaves the device. In Hybrid Mode, only selected text is sent with explicit user consent.
Leveraging Gemini 3's full potential — Successfully integrated long-context reasoning, reduced latency, and enhanced language understanding to create something genuinely new in the reading space.

What I learned

Gemini 3's long-context is a game-changer — 1M+ token context isn't just a spec — it fundamentally transforms how AI can assist with reading. The ability to "hold" entire books in context enables questions impossible with previous models.
Reduced latency matters more than expected — Gemini 3's speed improvements make the difference between "cool AI feature" and "feels like magic." Users notice and value instant responses while reading.
Context management > raw context size — Having 1M tokens available doesn't mean sending everything. Smart context selection, relevance ranking, and progressive loading are critical for both cost and quality.
Stateful agents with LangGraph + checkpointers unlock genuinely useful AI — The ability to maintain conversation history while dynamically adding book context creates experiences that feel like talking to a knowledgeable tutor who's read the same book.
Reading is a prime use case for advanced AI — Documents are where knowledge lives. Gemini 3's reasoning + multimodal understanding + speed positioned perfectly for transforming how people learn from text.
Privacy and power don't have to conflict — Dual-mode architecture (Gemini Nano for privacy, Gemini 3 for power) gives users real control over their data and experience.
PDF.js + React is production-ready — For dynamic, selectable, annotatable PDFs in web apps, react-pdf + pdfjs-dist is the gold standard.
Prompt engineering for education requires domain expertise — Generic "summarize this" prompts fail in learning contexts. Effective prompts leverage understanding of pedagogy and learning theory.

What's next for DuoRead

Near-Term (Leveraging Gemini 3 Further):

Multimodal Understanding — Use Gemini 3's vision capabilities to understand images, charts, diagrams, and equations within PDFs. Imagine asking "Explain this flowchart" or "Solve this equation step-by-step."
Advanced Reasoning Tools — Integrate web search, math solver, citation finder, and code interpreter as tools for the DuoRead Agent, leveraging Gemini 3's tool-use capabilities for deeper, more accurate answers.
Voice-First Experience — Build a full voice agent powered by Gemini 3: speak to your PDF, ask questions naturally, get explanations read aloud with perfect intonation.
Cross-Document Reasoning — Enable questions across multiple books: "Compare the economic theories in Book A and Book B" or "Find connections between these three research papers."

Medium-Term:

Modular AI PDF Canvas — Package DuoRead as a plug-and-play component for LMS platforms (Moodle, Canvas, Blackboard) and note-taking apps.
Mobile Apps — Native iOS/Android with offline AI (Gemini Nano), background sync, and gesture controls.
Collaborative Learning — Share annotations, AI insights, and discussions with study groups — think "Google Docs comments" but AI-enhanced.
Personal Library with AI — Cloud + local library with AI-powered search, automatic tagging, reading progress tracking, and personalized recommendations.

Long-Term:

Enterprise Mode — Secure, on-prem deployment for sensitive documents (legal, medical, finance) with Gemini 3 Enterprise features.
Research Assistant Mode — For researchers and academics: automatic citation management, literature review assistance, methodology analysis, and paper drafting support.
Universal Knowledge Companion — Expand beyond PDFs to ePubs, web articles, videos, podcasts — Gemini 3-powered learning across all content formats.

And finally, it's live here: DuoRead

Built With

chrome
cursor
docker
fastapi
gemini
huggingface
javascript
langchain
langgraph
lovable
mongodb
postgresql
python
rag
react
vercel

Updates

MUHAMMAD ZAIN ATTIQ started this project — Feb 09, 2026 04:17 PM EST

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.