Inspiration
So much learning stalls not at the final answer, but at the moment a student is unsure how to proceed. A small slip or unclear next step interrupts momentum, confidence dips, and the confusion can solidify into habits or misconceptions. Many tools rush to finished solutions or make AI help easy to copy, while few offer timely guidance in the middle of doing the problem. We set out to meet all students where they are, including students with anxiety, low vision, dyslexia, and ADHD. Our answer is a real-time, accessibility-first math coach that observes each step as you write, quietly auto-captures your work when you pause, and provides concise audio feedback with live captions so you can keep moving.
What it does
FujiVoice is a real-time, voice-optional math coach that sits on your worksheet. You upload a PDF and write directly on the page; as you finish pen strokes or pause, the app smoothly auto-captures your work in the background without changing the view. When it detects an issue, it plays a short, clear audio explanation with live captions so you can keep working without interruption. If you want extra help, tap Talk and speak what is confusing; FujiVoice combines your spoken question with the most recent auto-capture to deliver targeted guidance.
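The "capture when you pause" behavior described above can be sketched as a small pause detector: record when each stroke ends, and fire a capture once the pen has been idle long enough. This is a minimal illustrative sketch, not FujiVoice's actual code; the names (`PauseDetector`, `QUIET_MS`) and the 1200 ms threshold are assumptions.

```typescript
// Illustrative pause-detection sketch: auto-capture after the pen has
// been idle for a quiet period. QUIET_MS is an assumed threshold.
const QUIET_MS = 1200;

class PauseDetector {
  private lastStrokeAt: number | null = null;

  // Call on every stroke-end (pen-up) event with a timestamp in ms.
  strokeEnded(now: number): void {
    this.lastStrokeAt = now;
  }

  // Poll periodically (e.g. from setInterval). Returns true exactly once
  // per pause, then resets so the same pause is not captured twice.
  shouldCapture(now: number): boolean {
    if (this.lastStrokeAt !== null && now - this.lastStrokeAt >= QUIET_MS) {
      this.lastStrokeAt = null; // consume this pause
      return true;
    }
    return false;
  }
}
```

Consuming the pause after one positive check is what keeps the background capture quiet: the user gets at most one snapshot per pause rather than one per poll tick.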
How we built it
We combined a JavaScript frontend (Next.js + React) with a Python backend (FastAPI) to deliver real-time help in the flow of writing. As you finish pen strokes, the app quietly auto-captures the work, sends the crop to Mathpix for STEM OCR, passes the result to Gemini for a concise tutoring response, and streams the explanation through ElevenLabs, so you hear fast, clear, accessible feedback with live captions.
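The capture → OCR → tutor → speech loop above can be sketched as a pipeline of injected async steps, so each vendor call (Mathpix, Gemini, ElevenLabs) is swappable and testable. This is a hedged sketch of the shape of the loop, not the project's real code; all names and types here are illustrative.

```typescript
// Illustrative pipeline sketch. Each stage is injected so real vendor
// clients (Mathpix OCR, Gemini, ElevenLabs TTS) can be stubbed in tests.
type Ocr = (pngBase64: string) => Promise<string>;   // capture -> LaTeX
type Tutor = (latex: string) => Promise<string>;     // LaTeX -> feedback text
type Tts = (text: string) => Promise<Uint8Array>;    // text -> audio bytes

interface FeedbackResult {
  latex: string;      // what the OCR stage read from the capture
  caption: string;    // tutor text, doubles as the live caption
  audio: Uint8Array;  // synthesized audio to stream back to the client
}

async function runFeedbackLoop(
  pngBase64: string,
  ocr: Ocr,
  tutor: Tutor,
  tts: Tts,
): Promise<FeedbackResult> {
  const latex = await ocr(pngBase64);     // Mathpix-style STEM OCR
  const caption = await tutor(latex);     // Gemini-style tutoring response
  const audio = await tts(caption);       // ElevenLabs-style speech
  return { latex, caption, audio };
}
```

Returning the caption alongside the audio is what lets the UI show live captions in sync with the spoken feedback.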
Challenges we ran into
Our first hurdle was rendering the PDF and ink layer together so a single snapshot captured both the page and our handwriting, then sending that image to OCR and getting accurate LaTeX. Early captures kept grabbing one layer or the other, so we reworked layering, timing, and scaling to align them. We also hit friction deploying and wiring the frontend and backend on Vercel, from CORS to environment routing. Beyond tech, the scope was big and unclear at first, so breaking the idea into concrete steps took iteration; when one piece stalled, it blocked the steps that depended on it.
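A common source of the misaligned-layer problem described above is scale: the PDF canvas and the ink canvas must be drawn into the snapshot at the same device-pixel scale, or one layer lands offset or blurred. A pure helper that converts a CSS-pixel crop rectangle into backing-store pixels makes that mapping explicit; this is an illustrative sketch, and `toDevicePixels` and its rounding choices are assumptions, not FujiVoice's actual code.

```typescript
// Illustrative crop-rect scaling for snapshot capture. Flooring the
// origin and ceiling the size keeps the full crop inside the rect even
// at fractional devicePixelRatio values (e.g. 1.5 on some displays).
interface Rect {
  x: number;
  y: number;
  w: number;
  h: number;
}

function toDevicePixels(cssRect: Rect, devicePixelRatio: number): Rect {
  return {
    x: Math.floor(cssRect.x * devicePixelRatio),
    y: Math.floor(cssRect.y * devicePixelRatio),
    w: Math.ceil(cssRect.w * devicePixelRatio),
    h: Math.ceil(cssRect.h * devicePixelRatio),
  };
}
```

With both layers scaled through the same mapping, the snapshot step reduces to drawing the PDF canvas first and the ink canvas second into one offscreen canvas before cropping and sending the image to OCR.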
Accomplishments that we're proud of
We pushed past every blocker and shipped the system we envisioned. We cracked reliable PDF+ink captures, wired Mathpix → Gemini → ElevenLabs into a fast loop, and now FujiVoice detects handwriting and delivers real-time audio feedback with live captions while you keep writing. We bridged a JavaScript frontend and a Python backend on Vercel, hit our core goals, and the end product surpassed our expectations.
What we learned
We learned how to take a big idea and make it real by breaking it into clear, shippable parts, and collaborated on a full-stack app for the first time. Along the way we integrated Mathpix for STEM OCR, Gemini for concise tutoring responses, and ElevenLabs for natural audio, then stitched them into a smooth loop that helps students in real time. Most importantly, we learned how to plan a full-stack application, divide ownership, and keep momentum when one piece stalled, turning a complex concept into a focused, working product.
What's next for FujiVoice
We will add accounts and a simple dashboard, then grow FujiVoice into an insights platform. By analyzing step patterns, we will surface common mistakes for each learner and expand tools for teachers to see class-wide trends and assign targeted practice.