Inspiration

One of our team members’ grandparents had cataract surgery and had to live without vision for a while. Seeing how difficult even simple tasks became, like finding a door or a chair, made us realize just how hard it is to live without sight. That inspired us to build a tool that can restore some of that independence.

What it does

VisionAid uses AI + LiDAR + spatial audio to help people who can’t see find important objects in their environment. A user can say “Guide me to the door,” and VisionAid:

- Uses AI (Gemini) to recognize the door in the camera feed.
- Uses LiDAR to anchor it in 3D space.
- Plays a spatial audio beacon through AirPods, so the sound seems to come directly from the door.

The user simply follows the sound until they arrive.

How we built it

- ARKit + LiDAR for mapping the room in 3D.
- Gemini Vision AI for real-time object detection.
- AVAudioEngine with AirPods head tracking for spatial audio beacons.
- Apple Speech Framework for voice commands.

We integrated these into an iOS app that links camera input, AI detection, and spatial audio feedback in real time.
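The spatial audio piece can be sketched roughly like this (a simplified illustration, not our exact code; the class name, fixed sample rate, and listener-at-origin assumption are ours). AVAudioEngine’s environment node spatializes a mono source placed at a 3D point, and a head-tracking callback keeps the beacon anchored as the user turns:

```swift
import AVFoundation

// Minimal spatial-audio beacon: a looping mono buffer placed at a 3D
// point relative to the listener. AVAudioEnvironmentNode performs the
// spatial rendering so the sound appears to come from that point.
final class BeaconEngine {
    private let engine = AVAudioEngine()
    private let environment = AVAudioEnvironmentNode()
    private let player = AVAudioPlayerNode()

    func start(beaconAt position: AVAudio3DPoint, buffer: AVAudioPCMBuffer) throws {
        engine.attach(environment)
        engine.attach(player)

        // Sources must be mono for 3D spatialization to apply.
        let monoFormat = AVAudioFormat(standardFormatWithSampleRate: 44_100, channels: 1)
        engine.connect(player, to: environment, format: monoFormat)
        engine.connect(environment, to: engine.mainMixerNode, format: nil)

        // Place the beacon in listener-relative coordinates (meters).
        player.position = position
        environment.listenerPosition = AVAudio3DPoint(x: 0, y: 0, z: 0)

        try engine.start()
        player.scheduleBuffer(buffer, at: nil, options: .loops, completionHandler: nil)
        player.play()
    }

    // Fed from a CMHeadphoneMotionManager update (AirPods head tracking)
    // so the beacon stays fixed in the world while the head rotates.
    func updateListenerOrientation(yaw: Float, pitch: Float, roll: Float) {
        environment.listenerAngularOrientation =
            AVAudio3DAngularOrientation(yaw: yaw, pitch: pitch, roll: roll)
    }
}
```

The key design point is that the beacon’s world position stays fixed while the listener orientation is updated from head tracking, which is what makes the sound feel anchored to the door rather than to the user’s head.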

Challenges we ran into

- Getting spatial audio to feel natural: early versions sounded like annoying chimes rather than intuitive beacons.
- Integrating AirPods head tracking so the sound stayed anchored even when the user turned.
- Bridging AI detections with LiDAR depth: Gemini gives 2D pixels, but we had to map them into real-world 3D coordinates.
- Debugging ARKit errors like poor SLAM tracking and frame retention issues.
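The 2D-to-3D bridging step can be sketched as follows (a minimal sketch, assuming Gemini’s pixel bounding-box center has already been normalized into ARKit’s [0, 1] image coordinate space; the function name is ours for illustration). ARKit’s LiDAR-backed raycasting lifts the screen point onto the nearest surface:

```swift
import ARKit

// Lift a 2D detection (e.g. the normalized center of Gemini's bounding
// box) into a world-space position using ARKit raycasting, which on
// LiDAR devices hits the reconstructed scene geometry.
func worldPosition(for normalizedPoint: CGPoint,
                   in session: ARSession) -> simd_float3? {
    guard let frame = session.currentFrame else { return nil }

    let query = frame.raycastQuery(from: normalizedPoint,
                                   allowing: .estimatedPlane,
                                   alignment: .any)

    // The first result is the nearest surface hit along the ray.
    guard let hit = session.raycast(query).first else { return nil }

    // The translation column of the 4x4 transform is the world position.
    let t = hit.worldTransform.columns.3
    return simd_float3(t.x, t.y, t.z)
}
```

The returned position can then be handed to the audio engine as the beacon location, closing the loop from 2D detection to 3D sound source.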

Accomplishments that we're proud of

- We anchored sound to real objects in the world, not just arbitrary points.
- We successfully fused AI object detection with ARKit spatial mapping.
- We built a working demo in under 24 hours that can guide someone to a door using only AirPods and an iPhone.
- We designed pleasant, intuitive audio cues that make navigation feel good.

What we learned

- How to combine AI and AR in real time: taking 2D detections and lifting them into 3D.
- How critical user experience is for accessibility: it’s not enough for the tech to work, it has to feel intuitive and non-intrusive.
- The power of multimodal systems: speech, vision, spatial audio, and LiDAR working together can create experiences far beyond what any one modality could do alone.

What's next for VisionAid

- Expand the object library: beyond doors and chairs to personal belongings like bags, keys, or phones.
- Persistent mapping: save anchors so your home or office “remembers” where everything is.
- Obstacle awareness: use LiDAR meshes to warn users of hazards in their path.
- Wearable integration: extend to lightweight glasses or head-mounted cameras for hands-free use.
- User testing with visually impaired communities to refine the experience and prove real-world value.

Built With

ARKit, LiDAR, Gemini Vision AI, AVAudioEngine, Apple Speech Framework, iOS