Inspiration

We noticed how difficult and intimidating it can be for immigrants to fill out government forms — especially when English isn’t their first language. With long, complex questions and zero guidance, many give up or risk making costly mistakes. We wanted to create a tool that not only understands their voice but speaks their language — literally.

What it does

Passage is a multilingual AI-powered assistant that helps users fill out immigration forms just by having a conversation. It speaks with the user in real time, listens to their answers, and visually fills out the form for them — no typing, no guesswork.

How we built it

We used Google Gemini's Live API for real-time conversation, Azure Form Recognizer to extract fields and bounding boxes from PDFs, and react-pdf to render and annotate documents in the browser. The backend runs on Flask, streams updates to the frontend via Server-Sent Events, and stores state locally. We also integrated voice recognition, and we used pdf-lib to write the spoken values into the PDF, completing the loop from voice to filled-out form.
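To make the streaming step concrete, here is a minimal sketch of how a field update could be framed as a Server-Sent Events message. It is not our actual backend code: the function names, the `field_update` event name, and the payload keys are assumptions chosen for illustration, and it uses only the Python standard library.

```python
import json
import queue
from typing import Iterator, Optional


def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Serialize one Server-Sent Events message (text/event-stream framing)."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps emits a single line by default, so one "data:" line suffices.
    lines.append(f"data: {json.dumps(data, ensure_ascii=False)}")
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"


def stream_updates(updates: "queue.Queue[Optional[dict]]") -> Iterator[str]:
    """Yield SSE messages until a None sentinel arrives on the queue."""
    while True:
        update = updates.get()  # blocks until the next update is available
        if update is None:
            break
        # "field_update" is a hypothetical event name for this sketch.
        yield format_sse(update, event="field_update")
```

In a Flask route, a generator like this could be returned as `Response(stream_updates(q), mimetype="text/event-stream")`, and the browser could subscribe with `EventSource`. Keeping non-ASCII characters unescaped (`ensure_ascii=False`) is a small choice that suits multilingual field values.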

Challenges we ran into

Getting Gemini to fill fields in a natural, voice-based way without repeating tool calls was tricky. We also hit compatibility issues with PDF standards across different systems — especially when exporting modified PDFs. Handling multilingual voice input, rendering live field overlays, and syncing frontend and backend updates required a lot of careful orchestration.
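One way to blunt repeated tool calls is to make the fill operation idempotent on the backend, so a duplicate call does nothing and tells the model the field is already done. The sketch below shows that idea under assumptions: `FieldFillGuard`, the status strings, and the `field_id` key are hypothetical names for illustration, not our exact implementation or part of any Gemini API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FieldFillGuard:
    """Tracks applied values so a repeated tool call becomes a no-op."""

    filled: Dict[str, str] = field(default_factory=dict)

    def apply(self, field_id: str, value: str) -> dict:
        normalized = value.strip()
        if self.filled.get(field_id) == normalized:
            # Same field, same value: report it so the model can move on.
            return {"status": "already_filled", "field_id": field_id}
        # A new value for a known field is treated as a user correction.
        status = "updated" if field_id in self.filled else "filled"
        self.filled[field_id] = normalized
        return {"status": status, "field_id": field_id}
```

The returned dictionary would be sent back as the tool result. An explicit `already_filled` status gives the model a signal to continue to the next question instead of retrying the same field.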

Accomplishments that we're proud of

We’re proud that we got real-time voice interaction working — the assistant listens, responds, and fills in the form as you talk to it. It even works in multiple languages like Hindi, Spanish, and Chinese. And seeing the PDF update live with values you just spoke aloud? Magical.

What we learned

We learned a ton about form parsing, voice-to-text, and working with experimental APIs like Gemini Live. It pushed us to think deeply about accessibility, multilingual UX, and how to build trust in AI for critical applications.

What's next for Passage

We want to integrate support for more government forms, improve the accuracy of field matching, and support mobile browsers. Long-term, we envision Passage as a general-purpose assistant for any kind of official paperwork — a voice that speaks your language and helps you be heard.
