Inspiration

There's a moment in every lab where a student pauses, looks around, and realizes they don't know if what they're doing is right. One teaching assistant for 40 students means most students spend their lab sessions confused, making mistakes nobody catches, and writing reports without real feedback. We wanted to fix that for every student, including those without access to quality STEM education, those with learning disabilities or who are neurodivergent, and anyone else facing barriers to hands-on science. IRIS puts an AI teaching assistant directly in the student's pocket, making STEM more accessible than ever.

What it does

IRIS is an AI-powered lab assistant mounted on safety glasses. As a student works through an experiment, IRIS watches through a camera, understands each step, and speaks real-time guidance directly to them. When something goes wrong, IRIS flags it immediately. When the student has a question, they ask out loud and get an answer in seconds. When the session ends, IRIS coaches them through their lab report, not by writing it for them but by asking the right questions and giving the grading tips a TA would. Every part of it is personalized to what that student struggles with.

  • Real-time vision guidance — Overshoot's vision model watches the camera feed and confirms when each step is completed correctly, or flags an error immediately
  • Voice Q&A — students hold a button on the iOS app, ask any question out loud, and IRIS answers in 2-3 spoken sentences tailored to their current step
  • Personalized coaching — before the experiment starts, students tell IRIS what they struggle with (measurement, procedure writing, lab safety). Every tip and guidance message is tailored to those struggles throughout the session
  • AI experiment generation — students can search any experiment by name and IRIS generates step-by-step instructions, materials list, and visual detection conditions using Gemini
  • Document upload — students can upload their actual lab sheet (PDF, DOCX, or TXT) and IRIS extracts the procedure and guides them through it
  • Lab report coaching — at the end, IRIS doesn't write the report for the student. Instead it walks them through each section with guiding questions, grading tips from a TA's perspective, and specific coaching for their struggle areas
  • Experiment summary — a spoken summary of what the student accomplished is played at the end of every session

How we built it

Hardware — An ESP32-S3 camera module mounted on standard safety glasses streams a first-person MJPEG video feed over WiFi to the backend.
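An MJPEG stream is just JPEG images concatenated behind multipart boundaries, so the backend can recover frames by scanning for the standard JPEG start/end markers. Here is a minimal sketch of that idea in Python; the function names are ours, not from the IRIS codebase:

```python
# Pull complete JPEG frames out of a raw MJPEG byte stream, the kind an
# ESP32-S3 camera typically serves over HTTP. Frames are delimited by the
# JPEG start-of-image (SOI) and end-of-image (EOI) markers.
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker

def extract_frames(buffer: bytes):
    """Return (complete JPEG frames found, leftover bytes to keep buffering)."""
    frames = []
    while True:
        start = buffer.find(SOI)
        end = buffer.find(EOI, start + 2)
        if start == -1 or end == -1:
            break  # no complete frame yet; wait for more bytes
        frames.append(buffer[start:end + 2])
        buffer = buffer[end + 2:]
    return frames, buffer
```

In a real consumer this runs in a loop over the WiFi socket, appending received chunks to the buffer and handing each extracted frame to the vision model.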

Backend — FastAPI server written in Python handling all AI orchestration. Gemini 2.5 Flash powers experiment generation, step-by-step instructions, Q&A answers, contextual explanations, and lab report coaching. Overshoot's vision API watches the camera stream and returns structured JSON confirming whether each step is complete or flagging errors. ElevenLabs Flash TTS converts all text responses to natural spoken audio delivered to the student's phone via WebSocket.
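Since guidance text and its TTS audio travel over the same WebSocket, a small JSON envelope keeps them together. A sketch of what such a message could look like (our own naming, base64 for the audio bytes; not the exact IRIS wire format):

```python
# Hypothetical WebSocket message envelope: one JSON object carrying the
# guidance text plus base64-encoded TTS audio, so the iOS app can display
# and play both from a single message.
import base64
import json

def make_guidance_message(step: int, text: str, audio: bytes) -> str:
    """Serialize one guidance update into a JSON string for the WebSocket."""
    return json.dumps({
        "type": "guidance",
        "step": step,
        "text": text,
        "audio_b64": base64.b64encode(audio).decode("ascii"),
    })

def parse_guidance_message(raw: str):
    """Inverse of make_guidance_message: recover step, text, and raw audio."""
    msg = json.loads(raw)
    return msg["step"], msg["text"], base64.b64decode(msg["audio_b64"])
```

Base64 costs ~33% overhead but keeps the protocol to a single text frame, which is simpler to handle on the Swift side than interleaved binary frames.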

iOS App — Built in SwiftUI. Displays the current step, plays audio guidance through the speaker, handles voice input via SFSpeechRecognizer for Q&A, and manages the full experiment flow from search through report generation.

Auth — Auth0 secures the AI agent-to-student relationship. Only authenticated students can start lab sessions, ensuring IRIS acts within an authorized student's context.

Key integrations — Gemini 2.5 Flash, ElevenLabs Flash, Overshoot vision API, Auth0, ESP32 camera hardware.

Challenges we ran into

The hardest problem was network architecture at a hackathon. The ESP32 camera needed a private WiFi router to maintain a stable stream — it couldn't reliably connect to public or hotspot networks. This meant the hardware demo had to run on a dedicated router completely separate from the app's network, requiring careful coordination between two laptops and two phones all on different networks.

Overshoot's vision model proved tricky to integrate correctly. Getting it to return structured JSON with done, err, and msg fields consistently required building a robust fallback parser that handles multiple response formats. We also had to tune the CONFIRM_THRESHOLD to avoid false positives: the camera must see the completed condition twice in a row before auto-advancing to the next step.
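The two pieces described above can be sketched as follows. This is an illustrative reconstruction, not the IRIS source: a tolerant parser that coerces whatever the model returns into a `{done, err, msg}` dict, plus a streak counter implementing the two-in-a-row rule:

```python
# Fallback parsing for vision replies plus the consecutive-confirmation rule.
# Field names (done/err/msg) follow the text above; everything else is ours.
import json
import re

CONFIRM_THRESHOLD = 2  # frames in a row that must agree before auto-advancing

def parse_vision_reply(raw: str) -> dict:
    """Coerce a model reply into {done, err, msg}, tolerating prose and code fences."""
    # Try the raw string as strict JSON first, then any {...} block inside it.
    for candidate in (raw, *re.findall(r"\{.*?\}", raw, re.DOTALL)):
        try:
            obj = json.loads(candidate)
            return {"done": bool(obj.get("done")),
                    "err": bool(obj.get("err")),
                    "msg": str(obj.get("msg", ""))}
        except (json.JSONDecodeError, AttributeError):
            continue
    # Nothing parseable: treat the whole reply as a message, don't advance.
    return {"done": False, "err": False, "msg": raw.strip()}

class StepConfirmer:
    """Advance only after CONFIRM_THRESHOLD consecutive 'done' frames."""
    def __init__(self):
        self.streak = 0

    def update(self, done: bool) -> bool:
        self.streak = self.streak + 1 if done else 0
        return self.streak >= CONFIRM_THRESHOLD
```

The streak counter is what turns a noisy per-frame classifier into a stable step-advance signal: one spurious "done" frame resets nothing downstream.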

Auth0 callback URL configuration for iOS was unexpectedly painful. Bundle ID changes broke the callback URL registration multiple times, and the iOS URL scheme had to be registered in three separate places before the login flow worked end to end.

Gemini model deprecations hit us mid-hackathon. We swapped from the deprecated google.generativeai package to the new google-genai SDK, and the model we had targeted (gemini-2.5-flash-preview-04-17) turned out to be unavailable for new API keys. We had to query the available models list and switch to gemini-2.5-flash.
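The fix we landed on generalizes to a small preference-list fallback. The selection helper below is our own and pure (testable offline); the SDK calls are shown as comments and should be checked against the current google-genai docs:

```python
# Pick the first preferred Gemini model that the API key can actually see,
# instead of hard-coding one name that may be deprecated or gated.
PREFERRED = ["gemini-2.5-flash-preview-04-17", "gemini-2.5-flash"]

def pick_model(available: list, preferred: list = PREFERRED) -> str:
    """Return the first preferred model present in the available list."""
    for name in preferred:
        if name in available:
            return name
    raise RuntimeError(f"none of {preferred} is available to this API key")

# With the new SDK (sketch; verify field names against the google-genai docs):
# from google import genai
# client = genai.Client(api_key=API_KEY)
# names = [m.name for m in client.models.list()]
# MODEL = pick_model(names)
```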

Accomplishments that we're proud of

Getting real audio flowing from a camera on a pair of safety glasses to a student's iPhone in real time felt genuinely magical the first time it worked. The full pipeline (ESP32 captures a frame, Overshoot analyzes it, Gemini enriches the response, ElevenLabs speaks it, iOS plays it) runs end to end in under two seconds, and we're proud of that.

The lab report coaching output genuinely surprised us. When we tested it with struggle areas set to "measurement" and "procedure writing", it produced section-by-section guidance, passive voice cheat sheets, significant figures reminders, and a complete submission checklist that would have been useful to us as students. It teaches rather than does the work for you.

The AI-powered experiment search is a feature we love. A student can type "baking soda volcano" into a search bar and within seconds have a fully structured experiment with steps, materials, camera detection conditions, and an audio introduction ready to go. And this works for any experiment, not just preloaded ones.

What we learned

Building hardware-software systems under time pressure requires much more coordination than pure software projects. The camera, the ESP32 firmware, the server, the iOS app, and the network all have to agree on the same protocol. When one piece changes, everything breaks in unexpected ways.

We learned that vision models need very specific prompts to be useful. A generic "watch the experiment and confirm the step" prompt produces vague responses. Sami's approach of building structured per-step prompts with explicit detect conditions and errors arrays made Overshoot dramatically more accurate.
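A hypothetical reconstruction of that structured per-step prompt idea (field names and wording are ours): each step ships explicit detect conditions and known failure modes, and the model is told to answer only in the `{done, err, msg}` JSON shape:

```python
# Build a per-step vision prompt from explicit detect conditions and an
# errors array, rather than a generic "watch the experiment" instruction.
def build_step_prompt(step: dict) -> str:
    """Render one experiment step as a constrained vision-model prompt."""
    detect = "\n".join(f"- {c}" for c in step["detect"])
    errors = "\n".join(f"- {e}" for e in step["errors"])
    return (
        f"Step {step['n']}: {step['text']}\n"
        f"Mark done ONLY if all of these are visible:\n{detect}\n"
        f"Flag err if you see any of:\n{errors}\n"
        'Reply with JSON only: {"done": bool, "err": bool, "msg": str}'
    )
```

Constraining both the evidence (detect conditions) and the output shape (JSON only) is what moves the model from vague narration to a usable yes/no signal.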

Personalization makes AI dramatically more useful. The same guidance about measurement delivered to a student who said they struggle with measurement hits differently than generic feedback. Storing student struggle areas and threading them through every Gemini prompt throughout the session was one of our best design decisions.

What's next for IRIS

Cross-Industry Applications — apply IRIS's concept to industries like construction, medicine, first responder/EMS, and more where real-time guidance can make life-changing differences.

Mistake tracking — log every error Overshoot flags during the session and include them in the end summary and report coaching. "You made 2 errors today — wrong flask on step 2, too much indicator on step 4" gives students specific things to improve.

Institution integration — connect IRIS to a university's LMS so lab report coaching is tailored to the specific rubric the professor is using, not generic TA advice.

More hardware — ESP32 is a proof of concept. A proper glasses integration with bone conduction audio, so the student hears IRIS without earphones and keeps full situational awareness in the lab, is the production version of this idea.

Automatic safety alerts — detect dangerous conditions (wrong chemical, spill, incorrect PPE) and interrupt everything with an urgent spoken alert before harm occurs.

Built With

  • arduino
  • auth0
  • elevenlabs
  • esp32-s3
  • fastapi
  • googlegemini2.5flash
  • overshootapi
  • python
  • sfspeechrecognition
  • swiftui
  • websocket
  • xcode