documind

Problem

Learning from PDFs sucks, yet every course requires it.

Chrome’s built-in PDF reader is extremely basic, which makes studying a 400-page textbook or reading through hundreds of research papers difficult and demotivating. Uploading large PDFs to external AI platforms adds friction: slow upload times, paywalls, and clunky interfaces. Students are left with no choice but to spend their energy navigating a difficult interface rather than learning.

Solution

documind is a Chrome extension that automatically replaces Chrome's default PDF reader with an interface that offers a variety of study resources. Other than installing the extension from the Chrome Web Store, there is no extra setup, no new platform, and no additional learning curve. This sets our project apart from the thousands of study tools that force students to change their entire study routine.

Features

PDF reopens on the last visited page
Add comments and inline highlights, and access these through bookmarks
Get AI explanations on any highlighted passage
A table of contents is automatically generated for easy navigation between sections
Smart Reader Mode: Common terms that may need clarification are underlined. Clicking on these terms will generate AI definitions and key notes.
Teacher*: A chatbot that answers your questions using only the textbook content and notes you choose to include as context.
Text-to-speech of any highlighted passage or AI summary for on-the-go learning
Download the modified PDF with all notes, highlights, and comments
All of Chrome’s existing PDF features (annotating, zoom, print, two-page view, etc.)

How we built it

We built documind as a modular Chrome extension using React and TypeScript, with Vite for development and Tailwind CSS for styling. PDF rendering and annotation are handled by PDF.js. The extension architecture separates background scripts, offscreen workers, and the UI. For AI features, the offscreen scripts chunk and embed the document text, and the resulting chunks serve as context for AI summarization and chatbot responses, powered by Gemini. ElevenLabs provides text-to-speech so users can listen to document content. All major features, including chat, highlighting, and term definitions, run client-side for privacy and responsiveness.
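To make the chunking step concrete, here is a minimal sketch of overlapping fixed-size chunking. It is an illustration under assumptions, not our exact implementation: the TextChunk shape, the chunkText name, and the default sizes are all hypothetical.

```typescript
// Hypothetical chunk shape; the field names are illustrative only.
interface TextChunk {
  index: number; // position of the chunk within the document
  start: number; // character offset where the chunk begins
  text: string;
}

// Split text into fixed-size windows that overlap, so a sentence cut at one
// boundary still appears whole in the neighboring chunk.
// The default sizes are assumptions, not tuned values.
function chunkText(text: string, chunkSize = 1000, overlap = 200): TextChunk[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be in [0, chunkSize)");
  }
  const chunks: TextChunk[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push({
      index: chunks.length,
      start,
      text: text.slice(start, start + chunkSize),
    });
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}
```

Each chunk can then be embedded on its own, and only the most relevant chunks need to be sent to the model as context.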

Challenges we ran into

Whole textbooks were too long to fit into LLM context windows
Database versioning issues between AI-generated data, user notes, drawings and comments, and other PDF data
Numerous merge conflicts that were challenging to resolve during development
ElevenLabs audio playback integration on the frontend
The Chunkr API hitting processing limits, which forced us to switch models
Getting rate-limited by the Gemini API
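One common way to cope with rate limits like these is to retry with exponential backoff. The sketch below is a generic pattern, not a description of what we shipped: withBackoff, backoffDelayMs, and the default delays are hypothetical, and the caller decides which errors count as retryable (for example an HTTP 429 response).

```typescript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run `task`, retrying only when `isRetryable` reports a transient error.
// Rethrows the last error after `maxRetries` retries have been used up.
// `wait` is injectable so the delay can be replaced in tests.
async function withBackoff<T>(
  task: () => Promise<T>,
  isRetryable: (err: unknown) => boolean,
  maxRetries = 4,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      if (attempt >= maxRetries || !isRetryable(err)) throw err;
      await wait(backoffDelayMs(attempt));
    }
  }
}
```

Wrapping each model request in withBackoff keeps retry logic out of the UI code. Adding random jitter to the delay is a common refinement when many requests fail at once.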

Accomplishments we’re proud of

Accurately generating a table of contents
Exporting PDFs with native notes and illustrations embedded
Efficiently chunking PDFs for quick AI response times
Polishing the frontend to be as sleek as Chrome’s original reader, but with far more functionality

What we learned

How to build a Chrome extension
How PDFs are rendered under the hood
How to perform semantic search and text embedding
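As a rough illustration of semantic search over embedded text, the sketch below ranks chunks by cosine similarity to a query vector. It assumes the embeddings already exist as plain number arrays; EmbeddedChunk, cosineSimilarity, and topK are hypothetical names, and a real setup would get the vectors from an embedding model.

```typescript
// Hypothetical record pairing a chunk of text with its embedding vector.
interface EmbeddedChunk {
  text: string;
  vector: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector lengths differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks whose vectors are most similar to the query vector.
function topK(query: number[], chunks: EmbeddedChunk[], k = 3): EmbeddedChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```

A linear scan like this is simple and works for a single document; a much larger collection would usually call for an approximate nearest-neighbor index instead.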

What’s next for documind

Workspaces: share AI context across multiple files
Collaborative editing and annotation across PDFs
Intercepting PDFs embedded on webpages (e.g. Brightspace)
Monetizing AI features, publishing on the Chrome Web Store, and growing our user base
Expanding to other browsers