Inspiration
I love reading books, especially self-help ones. And as an AI Engineer, I also have to go through a lot of research papers and technical literature. At other times, I’m reading and learning from my course books.
Most of the self-help books and technical literature I love are in English, and my university courses are also completely in English. But I’m a non-native English speaker.
While reading PDFs, I’d constantly hit a wall — a complex phrase, an idiom, a dense paragraph — and break my flow to switch apps: Google Translate, dictionary, ChatGPT, notes. It was exhausting.
I realized: if I’m struggling, millions of language learners, students, and professionals are too. That’s when I thought of building something to help readers and learners read, understand, and learn all in one place with the help of a Dynamic Agentic AI-Powered Platform.
In some cases, I was worried about privacy and reliability — and that’s when Chrome’s Gemini Nano caught my eye: AI that runs offline, in the browser, with full privacy. I thought:
"What if the PDF itself became your tutor?"
That spark became DuoRead AI.
What it does
DuoRead turns any static PDF into a dynamic AI canvas — no app-switching, no internet required (in Client Mode).

- DuoRead Agent: a chat agent that runs alongside the PDF. Ask it anything about the book.
- Translate into your native language (Translator API)
- Simplify complex sentences (Rewriter API)
- Summarize long passages (Summarizer API)
- Explain concepts in context (Writer API)
- Definitions: define words or terms (Prompt API)
- Pronunciation & Read Aloud: pronounce words or read whole pages aloud (Speech API)
- Add sticky notes that save offline
- Toggle Dual Modes:
  - Client Mode: 100% Gemini Nano — offline, private, instant
  - Hybrid Mode: Gemini Developer API — unlimited power
Works as a web app
How we built it
DuoRead AI is a full-stack, dual-mode AI reading platform built in under 48 hours using Chrome’s built-in Gemini Nano AI, modern web technologies, and a smart hybrid backend that scales from offline to cloud seamlessly.
The frontend is a React + Vite web app deployed on Vercel. For dynamic PDF rendering, we used a powerful dual-library approach: react-pdf (v10.2.0) as the primary React wrapper and pdfjs-dist (v5.4.296) — Mozilla’s PDF.js — as the core rendering and parsing engine. This combination gives us React-friendly components while retaining full control over text extraction, selection, and performance.
PDFs are fetched as blobs from the API, converted to object URLs using URL.createObjectURL(), and loaded dynamically. The <Document> component wraps the file, and each <Page> is rendered with text and annotation layers enabled (renderTextLayer={true}, renderAnnotationLayer={true}) — critical for accurate text selection and future annotation features. The PDF.js worker is loaded from a CDN (unpkg.com/pdfjs-dist) to keep parsing non-blocking and fast.
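One detail of the blob-to-object-URL step is worth a sketch: object URLs stay alive until they are revoked, so swapping books without cleanup can leak memory. The helper below is a minimal illustration, not our exact code. `PdfUrlHolder` and `ObjectUrlApi` are hypothetical names, and the browser functions are injected so the logic can also run outside a browser.

```typescript
// Hypothetical helper: holds the object URL of the PDF currently on screen.
// `create` and `revoke` are injected; in the app they would wrap
// URL.createObjectURL and URL.revokeObjectURL.
interface ObjectUrlApi<B> {
  create(blob: B): string;
  revoke(url: string): void;
}

class PdfUrlHolder<B> {
  private api: ObjectUrlApi<B>;
  private current: string | null = null;

  constructor(api: ObjectUrlApi<B>) {
    this.api = api;
  }

  /** Swap in a new PDF blob and release the URL of the previous one. */
  load(blob: B): string {
    this.release();
    const url = this.api.create(blob);
    this.current = url;
    return url;
  }

  /** Revoke the current URL, if any, so the browser can free the blob. */
  release(): void {
    if (this.current !== null) {
      this.api.revoke(this.current);
      this.current = null;
    }
  }
}
```

In a browser this could be constructed as `new PdfUrlHolder<Blob>({ create: (b) => URL.createObjectURL(b), revoke: (u) => URL.revokeObjectURL(u) })`, with the string returned by `load` passed to the `file` prop of `<Document>`.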
A custom selection overlay system listens for user highlights. When text is selected, we capture the DOM range and map it to PDF coordinates using PDF.js internals. This triggers a floating AI action menu with options to translate, summarize, simplify, explain, or add a sticky note — all powered directly in the browser.
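To make the coordinate mapping concrete, here is a simplified sketch of the math involved. It assumes an unrotated page rendered at a uniform scale (CSS pixels per PDF point) and a PDF origin at the bottom-left corner. The names `Rect`, `PdfRect`, and `toPdfRect` are illustrative only. PDF.js viewports also expose a `convertToPdfPoint` helper that covers rotation, which this sketch does not.

```typescript
// Rectangle in viewport (CSS pixel) coordinates, e.g. from getBoundingClientRect().
interface Rect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Rectangle in PDF user space (points, origin at the bottom-left of the page).
interface PdfRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

/**
 * Convert a selection rectangle into PDF points.
 * Assumption: no page rotation, uniform `scale` in CSS pixels per PDF point.
 */
function toPdfRect(selection: Rect, pageBox: Rect, scale: number): PdfRect {
  if (scale <= 0) throw new Error("scale must be positive");
  const pageHeightPts = pageBox.height / scale;
  const x = (selection.left - pageBox.left) / scale;
  const topPts = (selection.top - pageBox.top) / scale;
  const width = selection.width / scale;
  const height = selection.height / scale;
  // Flip the y axis: the DOM measures from the top, PDF from the bottom.
  return { x, y: pageHeightPts - topPts - height, width, height };
}
```

Storing highlights and sticky notes in PDF points rather than pixels means they stay in the right place when the user zooms.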
The core AI layer runs entirely on Chrome’s built-in AI APIs (window.ai.*) in Client Mode. Translation uses the Translator API, summarization uses the Summarizer API, simplification uses the Rewriter API, and context-aware explanations use the Prompt API — all powered by Gemini Nano, running 100% on-device, with zero network calls, no data leaving the browser, and instant response times, even offline.
We designed Dual AI Modes with a user-controlled toggle. In Client Mode, everything stays local and private. In Hybrid Mode, when tasks exceed Gemini Nano’s limits (like summarizing long passages), the app intelligently falls back to a cloud backend. This mode activates only with user consent and sends only the selected text.
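The routing rule described above can be sketched as a small pure function. This is an illustration under assumptions, not our production logic: `chooseRoute`, `RouteInput`, and the character-based `localLimit` are hypothetical, and the real limit of Gemini Nano is measured in tokens and depends on the API.

```typescript
type Mode = "client" | "hybrid";
type Route = "local" | "cloud" | "blocked";

interface RouteInput {
  mode: Mode;             // the user-controlled toggle
  textLength: number;     // length of the selected text, in characters
  localLimit: number;     // hypothetical on-device limit, in characters
  userConsented: boolean; // has the user agreed to send text to the cloud?
}

/** Decide where a request for the selected text should run. */
function chooseRoute(input: RouteInput): Route {
  // Anything the on-device model can handle stays on the device in both modes.
  if (input.textLength <= input.localLimit) return "local";
  // Oversized text may go to the cloud only in Hybrid Mode and only with consent.
  if (input.mode === "hybrid" && input.userConsented) return "cloud";
  // Otherwise nothing leaves the browser; the UI can ask for a shorter selection.
  return "blocked";
}
```

Keeping the decision in one function makes the privacy guarantee easy to check: the only path to `"cloud"` requires both the Hybrid toggle and explicit consent.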
The backend (Hybrid Mode) is powered by FastAPI in Python, acting as a high-performance API gateway with async endpoints and JWT authentication. User data, uploaded books, and vector embeddings for semantic search are stored in PostgreSQL using the pgvector extension for efficient similarity lookups. MongoDB serves as the checkpointer for LangGraph agents, preserving multi-step reasoning state across interactions — for example, when a user asks follow-up questions about a previous explanation.
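For readers unfamiliar with similarity lookups, the sketch below shows in memory what such a query computes: rank stored chunks by cosine distance to a query embedding and keep the closest few. It is written in TypeScript only to illustrate the idea. In the real backend the ranking happens inside PostgreSQL (pgvector provides a cosine distance operator, `<=>`), and the names `Embedded`, `cosineDistance`, and `nearest` are hypothetical.

```typescript
interface Embedded {
  id: string;
  vector: number[];
}

/** Cosine distance: 0 for identical direction, 1 for orthogonal vectors. */
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same dimension");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Assumption: treat a zero vector as unrelated to everything.
  if (normA === 0 || normB === 0) return 1;
  return 1 - dot / Math.sqrt(normA * normB);
}

/** Return the k stored items closest to the query, nearest first. */
function nearest(rows: Embedded[], query: number[], k: number): { id: string; distance: number }[] {
  return rows
    .map((row) => ({ id: row.id, distance: cosineDistance(row.vector, query) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}
```

A database index avoids scanning every row the way this loop does, which is the main reason to push the lookup into pgvector.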
LangChain + LangGraph orchestrate complex AI workflows: chaining simplification, summarization, and explanation in a single agent flow with memory and tool use. When needed, it calls the Gemini Developer API (gemini-2.5-flash) for long-context or computationally heavy tasks.
Offline persistence is handled via IndexedDB for notes, highlights, and preferences. In Hybrid Mode, local changes sync to PostgreSQL on reconnect. The entire system is deployed with Vercel (frontend), Hugging Face Docker Spaces (backend), Neon (PostgreSQL), and MongoDB Atlas.
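The sync-on-reconnect behavior can be sketched as a small queue of pending changes. This is a simplified in-memory illustration, assuming last-write-wins per item. In the app the pending changes would live in IndexedDB and the upload would be triggered when the browser comes back online. `SyncQueue` and `PendingChange` are hypothetical names.

```typescript
interface PendingChange {
  id: string;
  kind: "note" | "highlight" | "preference";
  payload: string;
  updatedAt: number; // timestamp in milliseconds
}

class SyncQueue {
  private pending = new Map<string, PendingChange>();

  /** Record a local change; only the newest version of each item is kept. */
  record(change: PendingChange): void {
    const existing = this.pending.get(change.id);
    if (!existing || change.updatedAt >= existing.updatedAt) {
      this.pending.set(change.id, change);
    }
  }

  /** Snapshot of everything waiting to be uploaded, oldest first. */
  takeBatch(): PendingChange[] {
    return Array.from(this.pending.values()).sort((a, b) => a.updatedAt - b.updatedAt);
  }

  /** Drop uploaded changes, keeping any item that was edited again meanwhile. */
  acknowledge(batch: PendingChange[]): void {
    for (const sent of batch) {
      if (this.pending.get(sent.id) === sent) {
        this.pending.delete(sent.id);
      }
    }
  }

  get size(): number {
    return this.pending.size;
  }
}
```

Acknowledging by identity rather than by id means an edit made while an upload is in flight is not lost; it simply goes out with the next batch.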
We prioritized working AI in a real PDF first, then dual-mode reliability, and finally demo polish — resulting in a magical, production-ready experience that redefines how people read and learn.
Challenges we ran into
Building a Dynamic PDF Canvas — Building a fully dynamic, AI-native PDF canvas was the biggest problem I faced. Whenever I got one thing working, something else would break, so it took a lot of time to manage every action and feature we provide on the canvas.
Building and Managing Context of DuoRead Agent — Building the DuoRead Agent logic on both the frontend and backend was one of the toughest challenges. It was difficult to provide the agent with the right context from the book dynamically while considering the context window.
LangGraph Agent State Management — Managing the state of the agent was another world of challenges. It was hard to maintain the context of the conversation along with the context from the book, which the user dynamically adds to the chat. This required dynamic state management.
Gemini Nano Token Limits — Summarization fails on long text.
Offline-First UX — Gracefully degrading to Hybrid Mode.
Prompt Engineering — Engineering the right prompts for definitions, synonyms, explanations, simplifications, summarizations, and chat agents took a lot of time and iteration to reach a stable position.
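The context-window challenge above comes down to a budgeting problem: given candidate passages from the book, pick the most relevant ones that still fit. The sketch below shows one simple greedy approach under assumptions. The four-characters-per-token estimate is a rough heuristic, not a property of any specific model, and `Snippet`, `estimateTokens`, and `packContext` are hypothetical names.

```typescript
interface Snippet {
  text: string;
  score: number; // relevance to the user's question, higher is better
}

/** Rough token estimate. Assumption: about four characters per token. */
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

/** Greedily keep the highest-scoring snippets that fit within the budget. */
function packContext(snippets: Snippet[], budgetTokens: number): Snippet[] {
  const ranked = snippets.slice().sort((a, b) => b.score - a.score);
  const chosen: Snippet[] = [];
  let used = 0;
  for (const snippet of ranked) {
    const cost = estimateTokens(snippet.text);
    // Skip snippets that would overflow; a smaller one later may still fit.
    if (used + cost > budgetTokens) continue;
    chosen.push(snippet);
    used += cost;
  }
  return chosen;
}
```

In practice part of the budget also has to be reserved for the conversation history and the model's reply, so the value passed as `budgetTokens` would be smaller than the full context window.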
Accomplishments that we're proud of
- First-ever PDF reader powered by Chrome’s Gemini Nano — bringing on-device AI into document interaction at a level never seen before in hackathons or production apps.
- True dual-mode AI architecture — Client Mode (100% offline, private, instant) and Hybrid Mode (cloud-powered with LangChain + Gemini) — with a user-controlled toggle that respects both privacy and performance.
- Seamless text selection + AI actions using react-pdf + pdfjs-dist — enabling context-aware translation, summarization, simplification, and explanation directly inside any PDF.
- Zero data leakage in Client Mode — no text ever leaves the device, verified with network logs and privacy-first design.
- Production-grade stack: React + Vite + Vercel, FastAPI + PostgreSQL + MongoDB + LangChain, all connected with clean APIs and CI/CD.
- Magical UX — users say: “It feels like the PDF is alive.”
What we learned
- Gemini Nano is production-ready — fast, accurate, and truly offline. It’s not just a demo tool — it powers real apps.
- Hybrid AI > all-or-nothing — users want control: speed vs power, privacy vs capability. A toggle beats auto-fallback.
- react-pdf + pdfjs-dist is the gold standard — for dynamic, selectable, annotatable PDFs in React.
- LangGraph + MongoDB checkpointers unlock stateful AI agents in document tools — enabling follow-up questions and memory.
- Privacy is a feature — not a compliance checkbox. Users notice and trust when nothing leaves their device.
What's next for DuoRead
- Modular AI PDF Canvas — a plug-and-play component to integrate DuoRead into any LMS (Moodle, Canvas, Blackboard).
- Mobile App — native iOS/Android with offline AI, background sync, and gesture controls.
- Full Voice Agent — speak to your PDF: ask questions, get summaries, hear explanations aloud.
- Multimodality for DuoRead Agent — understand images, charts, diagrams in PDFs (OCR + vision models).
- More Tools for DuoRead Agent — web search, math solver, citation finder, code interpreter — for deeper, smarter answers about the book.
- Enterprise Mode — secure, on-prem Hybrid backend for sensitive documents (legal, medical, finance).
- Book Library — personal cloud + local library with AI-powered search, tagging, and reading progress.
And finally, it's live here: DuoRead
Built With
- docker
- fastapi
- gemini
- huggingface
- javascript
- langchain
- langgraph
- mongodb
- postgresql
- python
- rag
- react
- typescript
- vercel
- vite

