Problem

Learning from PDFs sucks, yet every course requires it.

Chrome’s built-in PDF reader is extremely basic, which makes it difficult and demotivating to study 400-page textbooks or read through hundreds of research papers. Uploading large PDFs to external AI platforms introduces too much friction, including slow upload times, paywalls, and clunky interfaces. Students are left with no choice but to spend their energy navigating a difficult interface rather than learning.

Solution

documind is a Chrome extension that automatically intercepts Chrome's default PDF reader and replaces it with an interface that offers a variety of study resources. Other than installing the extension from the Chrome Web Store, there is no extra setup, new platform, or additional learning curve. This sets our project apart from the thousands of study tools that force students to change their entire study routine.

Features

  • PDF reopens on the last visited page
  • Add comments and inline highlights, and access these through bookmarks
  • Get AI explanations on any highlighted passage
  • A table of contents is automatically generated for easy navigation between sections
  • Smart Reader Mode: Common terms that may need clarification are underlined. Clicking on these terms will generate AI definitions and key notes.
  • Teacher*: A chatbot that answers your questions using strictly the textbook content and notes you choose to include as context.
  • Text-to-speech for any highlighted passage or AI summary, for on-the-go learning
  • Download the modified PDF with all notes, highlights, and comments
  • All of Chrome’s existing PDF features (annotating, zoom, print, two-page view, etc.)

How we built it

We built documind as a modular Chrome extension using React and TypeScript, with Vite for development and Tailwind CSS for styling. PDF rendering and annotation are handled by PDF.js. The extension architecture separates background scripts, offscreen workers, and the UI. For AI features, the offscreen scripts chunk each document and compute text embeddings, which supply the context for AI summarization and chatbot responses powered by Gemini. ElevenLabs provides text-to-speech so users can listen to document content. All major features, including chat, highlighting, and term definitions, run client-side for privacy and responsiveness.
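The chunk-and-embed step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `chunkText`, `cosineSimilarity`, and `topKChunks` are hypothetical, the chunk size and overlap are arbitrary, and the embedding vectors are assumed to come from whatever embedding model the extension calls.

```typescript
// Hypothetical helpers for illustration; not taken from the documind codebase.

/** Split text into fixed-size character windows that overlap by `overlap` characters. */
function chunkText(text: string, size = 1000, overlap = 200): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    // Stop once a window has reached the end of the text.
    if (start + size >= text.length) break;
  }
  return chunks;
}

/** Cosine similarity between two vectors of equal length (0 if either is all zeros). */
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/** A chunk paired with its embedding (assumed to be computed elsewhere). */
interface EmbeddedChunk {
  text: string;
  embedding: number[];
}

/** Return the k chunks whose embeddings are most similar to the query embedding. */
function topKChunks(query: number[], chunks: EmbeddedChunk[], k = 3): EmbeddedChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}
```

Only the top-ranked chunks would then be passed to the model as context, which keeps each prompt small regardless of how long the source PDF is.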

Challenges we ran into

  • Whole textbooks were too long to fit into LLM context windows
  • Database versioning issues between AI-generated data, user notes, drawings and comments, and other PDF data
  • Numerous merge conflicts that were challenging to resolve during development
  • ElevenLabs audio playback integration on the frontend
  • Chunkr API hitting processing limits and needing to switch models
  • Getting rate-limited by Gemini API
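
One common way to cope with API rate limits like the one above is to retry with exponential backoff. The sketch below is an assumption about how that could look, not a description of what documind does: `backoffDelayMs` and `withRetry` are hypothetical names, the delays are arbitrary, and the caller supplies its own `isRateLimited` check because the shape of the error depends on the client library in use.

```typescript
// Hypothetical retry helper for illustration; not taken from the documind codebase.

/** Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs. */
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

/**
 * Run `call`, retrying only when `isRateLimited(err)` is true.
 * Gives up and rethrows after `maxAttempts` total attempts.
 */
async function withRetry<T>(
  call: () => Promise<T>,
  isRateLimited: (err: unknown) => boolean,
  maxAttempts = 5,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (!isRateLimited(err) || attempt + 1 >= maxAttempts) throw err;
      const delay = backoffDelayMs(attempt);
      // Add up to 25% random jitter so parallel requests do not retry in lockstep.
      const jitter = Math.random() * 0.25 * delay;
      await new Promise<void>((resolve) => setTimeout(resolve, delay + jitter));
    }
  }
}
```

A call site would wrap each model request, for example `withRetry(() => askModel(prompt), looksLikeRateLimit)`, where both `askModel` and `looksLikeRateLimit` are placeholders for the application's own functions.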

Accomplishments we’re proud of

  • Accurately generating a table of contents
  • Exporting PDFs with native notes and illustrations embedded
  • Efficiently chunking PDFs for quick AI response times
  • Polishing the frontend to be as sleek as Chrome’s original reader, but with far more functionality

What we learned

  • How to build a Chrome extension
  • How PDFs are rendered under the hood
  • How to perform semantic search and text embedding

What’s next for documind

  • Workspaces: share AI context across multiple files
  • Collaborative editing and annotation across PDFs
  • Intercepting PDFs embedded on webpages (e.g. Brightspace)
  • Monetizing AI features, publishing on the Chrome Web Store, and growing our user base
  • Expanding to other browsers
