Inspiration
The team consists of math majors at UH. We decided on this project based on our own experiences with the limitations of our math education, as well as limitations other people may face.
What it does
O.W.L. scans documents, images, websites, and more using Mathpix, then outputs a transcription along with a Gemini-generated explanation of the document in a language of your choice. It also recognizes text selected from the transcription or explanation and uses a relevancy algorithm to surface related links for the user. It can read text aloud through our text-to-speech system, and it is customizable so each user can apply their preferred settings and display.
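The submission does not spell out how the relevancy algorithm works, so here is a minimal sketch of one plausible approach: ranking candidate links by term overlap (Jaccard similarity) between the user's selected text and each link's title. All names (`CandidateLink`, `rankLinks`) are illustrative, not the actual implementation.

```typescript
// Hypothetical relevancy scorer: ranks candidate links by how many
// terms they share with the user's selected text.

interface CandidateLink {
  title: string;
  url: string;
}

// Lowercase, split on non-word characters, and drop very short tokens.
function tokenize(text: string): Set<string> {
  return new Set(
    text.toLowerCase().split(/\W+/).filter((t) => t.length > 2)
  );
}

// Jaccard similarity between the selection and a link's title.
function relevance(selection: string, link: CandidateLink): number {
  const a = tokenize(selection);
  const b = tokenize(link.title);
  let shared = 0;
  for (const t of a) if (b.has(t)) shared++;
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Return links sorted from most to least relevant to the selection.
function rankLinks(selection: string, links: CandidateLink[]): CandidateLink[] {
  return [...links].sort(
    (x, y) => relevance(selection, y) - relevance(selection, x)
  );
}
```

A real version would likely weight rarer terms more heavily (e.g. TF-IDF), but the overlap idea is the same.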
How we built it
We wanted to build a Chrome extension, so we used a React/TypeScript frontend and Cloudflare Workers for each of our backend processes. We all contributed to our GitHub repo and used Claude Code to help accelerate development. The extension finds PDFs on any page in Chrome and lets you capture images of the page you are currently viewing, then extracts text from those images using our backend workers. First it calls our handwritten-OCR worker, which leverages Chrome Vision, and that worker passes an image with the digitized handwritten text to the worker that calls the Mathpix API.
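The two-stage flow above (handwriting OCR first, then Mathpix) can be sketched as a simple pipeline where each worker's output feeds the next stage. This is an illustrative sketch only: the stage names and signatures are assumptions, and the real extension would make HTTP calls to the deployed Cloudflare Workers instead of the stubs shown here.

```typescript
// A stage takes the previous stage's output and returns a Promise of
// its own output (e.g. an image URL in, digitized text out).
type Stage = (input: string) => Promise<string>;

// Run the captured image through each stage in order, threading the
// result of one stage into the next.
async function runPipeline(image: string, stages: Stage[]): Promise<string> {
  let result = image;
  for (const stage of stages) {
    result = await stage(result);
  }
  return result;
}

// Stubbed stages for illustration; in the real project these would
// fetch() the handwritten-OCR worker and the Mathpix worker.
const handwritingOcr: Stage = async (img) => `digitized(${img})`;
const mathpixTranscribe: Stage = async (txt) => `latex(${txt})`;
```

Keeping each stage behind the same `Stage` signature makes it easy to insert or swap workers (for example, a translation stage) without touching the orchestration.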
Challenges we ran into
We had three main obstacles: lack of resources, edge cases, and version control. On top of the obvious time crunch, we ran into multiple problems with the provided API keys that severely limited some of our features. Some of our language settings also involved training AI on examples of handwritten Japanese text, which was a serious time sink; we came close to pulling the plug on it entirely multiple times. We also sank a lot of our limited time into handling edge cases. We put considerable effort into making the final product clean and easy to use, and a consequence of that was hours of testing and bug fixing to make sure we ended up with a project we were happy with. An hour before the end of the hackathon, we had a version that could use our text-to-speech worker, but when we went to push, an unexplained Git authentication failure occurred, so we were unable to use that code. Something similar happened with our keyword-search UI, as sudden problems made it impossible to get working in time.
Accomplishments that we're proud of
After the last 24 hours, we're extremely proud of O.W.L.'s overall functionality. Regardless of what happens with the project, we are left with a program that all of us find useful and will likely incorporate into our own education to make studying easier. That is a big part of why we are so proud of our product: we have created something that benefits us and can potentially help many other people as well.
What we learned
Our members learned a lot of valuable development skills, including TypeScript, web development, Cloudflare Workers, alternative operating systems, and much more. Despite our pride in the program, we are even prouder of something else: we will be leaving CodeRed having doubled, or in some cases tripled, our development knowledge.
What's next for Optical Web Scraper
We will continue this project after CodeRed to optimize O.W.L. and nail down all of its features, until we are left with nothing but our aspired result: the ultimate accessibility tool for STEM.