Inspiration

Imagine not being able to read the label on your medication.

Not knowing if the product in your hand contains an allergen that could send you to hospital.

Not being able to tell if what you're holding is a cleaning chemical or a food product.

This is daily life for over 575,000 Australians living with blindness or low vision. The tools built to help them are failing. Be My Eyes connects users to random, unscreened volunteers, and a Reddit thread with thousands of upvotes tells the real story: wrong information, dismissiveness, and, in rare cases, deliberate harm. For someone who has no way to verify the answer they're given, this isn't just a bad experience. It's a safety risk.

So we tested the AI alternative. We held a tissue box designed to look like a bag of chips in front of Be My Eyes' AI and it confidently told us it was chips. No hesitation. No "I'm not sure." Just a wrong answer, delivered with complete certainty. For a visually impaired person asking about medication dosage or food allergens, that kind of false confidence isn't a bug. It's dangerous.

And even when these tools get the answer right, the experience is still broken. Navigating menus. Tapping tiny buttons. Reading text on a screen. Every current solution asks visually impaired users to operate an interface that was never designed for them.

We saw three broken things and decided to fix all of them.

That's why we built Iris.

What it does

The app runs in a phone browser. When the user holds the mic button and asks a question, the browser captures their voice using the Web Speech API and converts it to text. At the same moment, the camera captures a still photo via the browser's getUserMedia API.
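In rough terms, the capture step looks like the sketch below. The function names (`captureFrame`, `startListening`) are illustrative, not our exact code:

```javascript
// SpeechRecognition is prefixed in some browsers; null outside a browser.
const SpeechRec =
  typeof window !== 'undefined'
    ? window.SpeechRecognition || window.webkitSpeechRecognition
    : null;

// Grab a single still frame from an active <video> element as base64 JPEG.
function captureFrame(video) {
  const canvas = document.createElement('canvas');
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext('2d').drawImage(video, 0, 0);
  // Strip the "data:image/jpeg;base64," prefix so the backend gets raw base64.
  return canvas.toDataURL('image/jpeg', 0.8).split(',')[1];
}

// While the mic button is held, transcribe speech and hand back the text.
function startListening(onResult) {
  const rec = new SpeechRec();
  rec.lang = 'en-AU';
  rec.interimResults = false;
  rec.onresult = (e) => onResult(e.results[0][0].transcript);
  rec.start();
  return rec; // the caller invokes rec.stop() when the button is released
}
```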

Both the photo and the question are sent to our Express backend running on Node.js. The backend first checks if the frontend detected a barcode in the image using the browser's built-in BarcodeDetector API. If a barcode was found, it searches three free product databases in order (Open Food Facts, Open Beauty Facts, then Open Products Facts), covering food, cosmetics, and household products. If the product is found, it returns structured information immediately without needing AI. If no barcode is detected, or the product isn't in any database, the backend sends the image and question to Claude's vision API. Claude analyses the image and answers the question naturally.
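The fallback chain over the three databases can be sketched like this. The URL pattern follows the public Open Food Facts product API; the helper name `lookupBarcode` and the injectable `fetchJson` parameter are our illustration, not the actual backend code:

```javascript
// The three Open*Facts databases share the same product-lookup URL shape.
const SOURCES = [
  'https://world.openfoodfacts.org/api/v0/product/',
  'https://world.openbeautyfacts.org/api/v0/product/',
  'https://world.openproductsfacts.org/api/v0/product/',
];

// Try each database in order; return the first hit, or null if none match.
async function lookupBarcode(barcode, fetchJson) {
  for (const base of SOURCES) {
    try {
      const data = await fetchJson(base + barcode + '.json');
      // status === 1 means the product exists in this database.
      if (data && data.status === 1 && data.product) {
        return { name: data.product.product_name, source: base };
      }
    } catch (_) {
      // Network error or 404: fall through to the next database.
    }
  }
  return null; // not found anywhere → caller falls back to Claude vision
}
```

Returning `null` rather than throwing keeps the AI fallback a one-line check in the route handler.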

One problem we identified early was that AI models exhibit false confidence. They can sound completely certain while being wrong. This is especially dangerous for visually impaired users who rely entirely on the answer they hear. To address this, we built a transparency system around Claude's responses. Claude is explicitly prompted to rate its own confidence as HIGH, MEDIUM, or LOW based on how clearly it can see the object and read any text in the image. The backend parses this rating and uses it to drive the UI, showing a confidence bar so the user understands how reliable the answer is. If confidence is medium or low, the volunteer call button is automatically surfaced.

Additionally, the backend scans Claude's response for sensitive keywords relating to medicines, chemicals, and hazardous materials. If any are detected, Claude is instructed to verbally recommend calling a volunteer in its answer, and the volunteer button appears in the UI simultaneously, so the user both hears the warning and sees the option to get human help.
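Distilled down, the transparency layer is two small checks. The function names, the exact `CONFIDENCE:` tag format, and the keyword list here are illustrative, not our production values:

```javascript
// Claude is prompted to end its answer with a line like "CONFIDENCE: MEDIUM".
function parseConfidence(text) {
  const m = text.match(/CONFIDENCE:\s*(HIGH|MEDIUM|LOW)/i);
  return m ? m[1].toUpperCase() : 'LOW'; // fail safe: missing tag counts as LOW
}

// Illustrative subset of the sensitive-topic keywords.
const SENSITIVE = ['medicine', 'medication', 'tablet', 'dose',
                   'chemical', 'bleach', 'poison', 'flammable', 'allergen'];

function isSensitive(text) {
  const lower = text.toLowerCase();
  return SENSITIVE.some((w) => lower.includes(w));
}

// The UI surfaces the volunteer call button whenever either check trips.
function shouldOfferVolunteer(answer) {
  return parseConfidence(answer) !== 'HIGH' || isSensitive(answer);
}
```

Defaulting a missing confidence tag to LOW means a malformed model response degrades toward human help, never away from it.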

The answer is sent back to the frontend and read aloud using the Web Speech Synthesis API so the user never needs to look at the screen.

How we built it

Frontend: We built the frontend in React using Vite as the build tool for fast development. Styling is done with Tailwind CSS. We used Axios for HTTP requests to the backend.

For the core accessibility features we deliberately avoided third party libraries and used native browser APIs instead: getUserMedia for camera access, SpeechRecognition for voice input, SpeechSynthesis for reading answers aloud, and BarcodeDetector for barcode scanning. This keeps the bundle small and the app fast on mobile.
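Because not every browser ships all four APIs (BarcodeDetector in particular is Chromium-only at the time of writing), feature detection matters. A rough sketch, with the `detectSupport` name being our own:

```javascript
// Check which of the four native APIs the current environment provides.
// Takes the global object as a parameter so it can be tested with a mock.
function detectSupport(w) {
  return {
    camera: !!(w.navigator && w.navigator.mediaDevices &&
               w.navigator.mediaDevices.getUserMedia),
    speechIn: !!(w.SpeechRecognition || w.webkitSpeechRecognition),
    speechOut: !!w.speechSynthesis,
    barcode: !!w.BarcodeDetector,
  };
}
```

When `barcode` is false, the frontend simply skips the barcode path and every scan goes straight to the AI fallback.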

Backend: We built the backend in Node.js with Express.js as the web framework. We used Axios to make HTTP requests from the backend to external APIs.

The backend has a single main route, /scan, that orchestrates the whole flow: receiving the image and question, running the barcode lookup, and falling back to Claude if needed.

AI Integration: We integrated Anthropic's Claude vision API using direct HTTP calls via Axios. The image is sent as a base64 encoded string in the request body alongside a structured prompt that tells Claude how to format its confidence rating.
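The request body follows the shape of Anthropic's Messages API (base64 image block plus a text block). The model name, token limit, and exact prompt wording below are illustrative, not our production values:

```javascript
// Build the Messages API request body: one user turn containing the
// photo (base64 JPEG) and the spoken question plus formatting instructions.
function buildClaudeRequest(base64Image, question) {
  return {
    model: 'claude-3-5-sonnet-latest', // illustrative model name
    max_tokens: 512,
    messages: [{
      role: 'user',
      content: [
        { type: 'image',
          source: { type: 'base64', media_type: 'image/jpeg', data: base64Image } },
        { type: 'text',
          text: question +
            '\n\nEnd your answer with "CONFIDENCE: HIGH", "CONFIDENCE: MEDIUM" ' +
            'or "CONFIDENCE: LOW" based on how clearly you can see the object ' +
            'and read any text in the image.' },
      ],
    }],
  };
}

// Sent with: axios.post('https://api.anthropic.com/v1/messages', body,
//   { headers: { 'x-api-key': API_KEY, 'anthropic-version': '2023-06-01' } })
```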

Deployment: Frontend deployed on Vercel, backend deployed on Render.

Challenges we ran into

  • Defining the project scope was one of our earliest challenges. We had ambitious ideas and had to make deliberate decisions about what was achievable within the hackathon timeframe.
  • AI integration proved more complex than expected. We experimented with several LLMs including Gemini and open source alternatives before realising that reliable vision capabilities required a paid API. We ultimately chose Claude by Anthropic for its accuracy in reading real-world objects and text.
  • Deployment was another learning curve. Getting the app working locally was straightforward, but deploying a fullstack application with a separate frontend and backend introduced new challenges around environment variables, CORS, and hosting configuration that we had to work through under time pressure.
  • Due to scope constraints, we were unable to fully implement the backend infrastructure for user accounts and volunteer management. However we built out the complete frontend to demonstrate the intended user experience and product vision.
  • Building as a web app introduced device compatibility issues. Browser APIs like BarcodeDetector and camera access behave differently across devices, causing layout and functionality inconsistencies. Going forward we would build this as a native mobile app to take full advantage of device hardware, ensure consistent formatting across screen sizes, and deliver a more reliable experience for the visually impaired community we are designing for.
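The CORS work mentioned above came down to the backend on Render explicitly allowing the frontend's origin on Vercel. A minimal sketch of the logic (the origins listed and the `corsHeaders` name are hypothetical):

```javascript
// Only answer cross-origin requests from origins we recognise.
const ALLOWED = ['https://iris-app.vercel.app', 'http://localhost:5173'];

function corsHeaders(requestOrigin) {
  if (!ALLOWED.includes(requestOrigin)) return {}; // unknown origin: no CORS
  return {
    'Access-Control-Allow-Origin': requestOrigin,
    'Access-Control-Allow-Methods': 'POST, OPTIONS',
    'Access-Control-Allow-Headers': 'Content-Type',
  };
}
```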

Accomplishments that we're proud of

We're proud that we built something that actually works. A real application where you point a camera at an object, ask a question out loud, and hear a real answer powered by a backend we built, connected to an AI we integrated, talking to databases we wired up ourselves. We're also proud that what we built directly addresses the problems we set out to solve. The confidence system exists because we genuinely worry about a visually impaired person acting on a wrong answer about their medication. The three database barcode lookup exists because we wanted structured, reliable product information before falling back to AI. The voice first interface exists because we refused to build another app that makes visually impaired users navigate a screen. We identified real problems, made real design decisions, and shipped something real.

What we learned

We learned that accessibility is hard to get right and easy to get wrong. It's not enough to build something that works for sighted users and assume it transfers. Every design decision, from how the mic button works to how the answer is read aloud, had to be reconsidered through the lens of someone who can't see the screen.

We learned that AI confidence is a product problem, not just a technical one. Deciding how to communicate uncertainty to a user who is relying entirely on what they hear requires real thought about trust, safety, and responsibility.

We learned how to build and deploy a fullstack application under pressure, wiring together a React frontend, a Node.js backend, a third party AI API, and multiple product databases, then getting it live on Vercel and Render in a single night.

What's next for Iris

The most immediate priority is completing the backend infrastructure for user accounts: allowing visually impaired users to create profiles, save their scan history, and set personal preferences like preferred language and voice settings. On the volunteer side, we want to build a proper volunteer management system with background verification and screening to ensure the safety of users, addressing one of the core problems we identified with existing solutions like Be My Eyes.

We also want to give users control over the voice that reads answers aloud. Different voices, accents, and speeds can make a significant difference in accessibility and comfort, and the Web Speech Synthesis API already supports this; it just needs to be surfaced as a user preference.

The biggest feature we want to build next is conversational context. Right now the app answers one question per scan and resets. In reality, a user might want to follow up by asking about specific ingredients after hearing the product name, or asking a clarifying question about a warning label. We want to build a persistent conversation around each product scan, so the user can have a natural back-and-forth dialogue rather than having to re-scan every time they want to know something new.
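Surfacing voice preference could be as small as the sketch below, since `speechSynthesis.getVoices()` already exposes what the device offers. The `pickVoice` and `speakWithPreference` names and the preference shape are our illustration:

```javascript
// Choose a voice by exact name if set, else by language, else the first one.
function pickVoice(voices, { lang = 'en-AU', name } = {}) {
  return voices.find((v) => name && v.name === name)
      || voices.find((v) => v.lang === lang)
      || voices[0] || null;
}

// Read an answer aloud using the user's saved voice and speed preferences.
function speakWithPreference(text, prefs = {}) {
  const u = new SpeechSynthesisUtterance(text);
  const voice = pickVoice(window.speechSynthesis.getVoices(), prefs);
  if (voice) u.voice = voice;
  u.rate = prefs.rate || 1.0; // user-controlled reading speed
  window.speechSynthesis.speak(u);
}
```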

Further down the road, we would rebuild this as a native mobile app to resolve the device compatibility issues we encountered with browser APIs, ensure consistent formatting across all screen sizes, and take full advantage of mobile hardware for faster and more reliable barcode scanning and camera access.
