DEMO LINKS: Presentation Demo

Inspiration

As three people from research and corporate backgrounds, we find that large, complicated documentation constantly plagues our everyday lives. While building this project, we all noticed a common theme in the way we learn: where you glance on the page and how you interact with a document give clear signals of when you are confused. While confusion often drives you to explore more, sometimes you glance over important details and miss critical context.

What it does

DocuParse is an AI agent companion that reads through a document with you interactively. As you read, it watches for areas that you have reviewed repeatedly and for areas that carry important context, judged by their similarity to other parts of the text. When it detects these through patterns in your eye movement, your interactions with the document in Focus Mode, or relevance within the text, the agent prompts you to make sure you fully understand what is going on.
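To make the "similarity to other parts of the text" idea concrete, here is a minimal sketch of how related sections could be ranked. It assumes each section already has an embedding vector. The function names, the toy vectors, and the 0.8 threshold are all hypothetical illustrations, not DocuParse's actual code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_sections(target, sections, threshold=0.8):
    """Return section names whose embedding is similar to `target`, most similar first.

    `sections` maps a section name to its embedding vector.
    `threshold` is an assumed cutoff and would need tuning on real embeddings.
    """
    scored = [(name, cosine_similarity(target, vec)) for name, vec in sections.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, score in scored if score >= threshold]


# Toy two-dimensional "embeddings", for illustration only.
sections = {"intro": [1.0, 0.0], "methods": [0.9, 0.1], "appendix": [0.0, 1.0]}
print(related_sections([1.0, 0.0], sections))
```

In a real deployment the vectors would come from an embedding model, and a vector database would typically handle the search rather than a Python loop.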

How we built it

We started with Bolt to create the first interactive prototype of the application.

The platform starts by converting the PDF to a DOM via the PDF.co API, which makes analysis and retrieval by the agent easier.

Tech Stack

  • Frontend: TypeScript, Next.js, and TailwindCSS for styling
  • Backend: FastAPI, with Supabase for database and vector DB work
  • Agent: We used Dev MCP servers to give the agent additional context for different high-signal sections. We deployed a RAG agent on the backend through the OpenAI API.
  • Eye Tracking Module: We used the open source Webgazer.js to drive most of our eye tracking module and HTML2Canvas to take the screenshots for analysis.

  • We use a grid-based clustering algorithm that identifies high-focus areas by counting gaze points per cell. Cells with enough points form clusters, and we compute their center of mass. The largest cluster’s center is used to grab a screenshot for agent memory.
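The grid-based clustering described above can be sketched roughly as follows. This is an illustrative Python version, not the project's browser code. The cell size, the point threshold, the merging of adjacent dense cells into one cluster, and ranking clusters by gaze-point count are all assumptions, and `largest_gaze_cluster` is a hypothetical name.

```python
from collections import defaultdict, deque


def largest_gaze_cluster(points, cell_size=50, min_points=5):
    """Return the (x, y) center of mass of the largest dense gaze cluster, or None.

    points: iterable of (x, y) gaze coordinates in pixels.
    cell_size and min_points are assumed defaults that would need tuning.
    """
    # Bin every gaze point into a square grid cell.
    cells = defaultdict(list)
    for x, y in points:
        cells[(int(x // cell_size), int(y // cell_size))].append((x, y))

    # A cell counts as high-focus only if it holds enough gaze points.
    dense = {cell for cell, pts in cells.items() if len(pts) >= min_points}

    # Assumption: neighbouring dense cells (including diagonals) form one cluster.
    best = []
    seen = set()
    for start in dense:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        members = []
        while queue:
            cx, cy = queue.popleft()
            members.extend(cells[(cx, cy)])
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbour = (cx + dx, cy + dy)
                    if neighbour in dense and neighbour not in seen:
                        seen.add(neighbour)
                        queue.append(neighbour)
        # "Largest" is taken here to mean the cluster with the most gaze points.
        if len(members) > len(best):
            best = members

    if not best:
        return None
    count = len(best)
    return (sum(x for x, _ in best) / count, sum(y for _, y in best) / count)


# Six points packed into one cell, plus two stray glances that are ignored.
gaze = [(100 + i, 100) for i in range(6)] + [(400, 400), (10, 10)]
print(largest_gaze_cluster(gaze))
```

The returned coordinate is the point around which a screenshot region would then be captured.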

Challenges we ran into

Our first big challenge was eye tracking. The open source documentation for eye tracking is rather outdated and thin. Most eye tracking tools available today are not built to interact with what is on your screen, and they require a lot of calibration to be even somewhat usable.

Our next big challenge was using the Dex MCP servers in conjunction with the application. We are pretty new to MCP and how it works, so there was a learning curve. On top of that, the Dex MCP server came with a lot of new functions and quirks that we had to learn and work around.

Accomplishments that we're proud of

Converting PDF to DOM: We worked through a number of different methods for breaking down the PDF, and in the end we arrived at an interesting way of using the DOM to retrieve content very easily.

MVP on attention-based learning: This project was rather ambitious, but in a short time we learned a lot about how to create an interactive learning experience for an everyday task, despite the number of moving parts. The idea we introduced can also be applied to almost any context, well beyond research papers.

What we learned

We learned a lot about the emerging fields of MCP and AI agents and got real-world exposure to both. Working with the eye tracking module also gave us insight into an interesting application of computer vision that we had never explored before.

What's next for DocuParse

  • The current eye tracking module is rather "finicky." We want to build a seamless calibration experience, create a clearer pipeline from eye tracking data to key areas of interest, and use more advanced computer vision techniques to improve eye tracking accuracy.
  • We could use more MCPs to bring more external context into the agent.
  • We can improve some of the frontend elements to make them more interactive and to show users more clearly the value they are getting.

Built With

  • bolt
  • dev-mcp
  • fastapi
  • html2canvas
  • next.js
  • openai-api
  • pdf.co-api
  • python
  • supabase
  • tailwindcss
  • typescript
  • webgazer.js