## Inspiration

Guide dogs are incredible: they navigate blind people through crowds, avoid obstacles, and keep their owners safe. But they can't tell you that there's a Tim Hortons on your right, that someone is approaching from your left, or whether you're holding a ketchup bottle or an ice cream container. We built Insight to fill that gap: a real-time visual assistant that gives blind people the spatial awareness a guide dog can't.

## What it does

Insight uses your phone's camera to continuously narrate your surroundings — obstacles, people, landmarks, hazards — in short, direct spoken sentences. Tap once to ask a question ("am I holding salt or pepper?", "is there anyone near me?"). Say a destination and it navigates you there with turn-by-turn walking directions. Haptic feedback gives physical cues for hazards, turns, and arrival. A Supabase/PostgreSQL backend stores your session history so Insight gets smarter over time and never repeats itself.

## How we built it

  • Claude Haiku for real-time vision and language — one API call handles both scene understanding and natural language output, no separate CV pipeline needed
  • Google Places + Directions APIs for walking navigation from natural language queries
  • SFSpeechRecognizer for low-latency, on-device voice input
  • AVFoundation for camera capture and text-to-speech
  • CoreLocation for GPS step tracking
  • Supabase (Postgres) for long-term memory, keyed to a UUID auto-generated in the iOS Keychain — no signup, no friction
  • On-device 16×16-pixel scene-change detection to skip API calls when nothing in view has changed
  • On-device keyword detection for navigation intent — no extra API round trip
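The scene-change gate in the list above can be sketched roughly as follows: downsample each frame to a 16×16 grayscale thumbnail, compare it against the previous one, and only call the API when the mean per-pixel difference crosses a threshold. This is a minimal illustration, not the app's actual code — the type names and the threshold value are assumptions.

```swift
import Foundation

// Hypothetical sketch of the 16x16 scene-change gate. `pixels` is a
// 256-element (16x16) grayscale thumbnail of the current frame.
struct SceneChangeDetector {
    private var previous: [UInt8]?
    let threshold: Double  // mean absolute per-pixel difference (0-255); value is illustrative

    init(threshold: Double = 12.0) {
        self.threshold = threshold
    }

    mutating func hasSceneChanged(pixels: [UInt8]) -> Bool {
        defer { previous = pixels }  // remember this frame for the next comparison
        guard let prev = previous, prev.count == pixels.count else {
            return true  // first frame always counts as a change
        }
        let totalDiff = zip(prev, pixels).reduce(0) { sum, pair in
            sum + abs(Int(pair.0) - Int(pair.1))
        }
        let meanDiff = Double(totalDiff) / Double(pixels.count)
        return meanDiff >= threshold
    }
}
```

A tiny thumbnail keeps the comparison cheap enough to run on every frame while still catching the gross scene changes that warrant a fresh narration call.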
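The on-device keyword detection for navigation intent could look something like this: scan the speech transcript for trigger phrases and route the utterance either to navigation or to the vision Q&A path. The trigger-phrase list and names here are illustrative assumptions, not the app's actual implementation.

```swift
import Foundation

// Illustrative on-device intent routing: no API round trip is needed to
// decide whether an utterance is a navigation request or a question.
enum Intent {
    case navigate(destination: String)
    case question
}

func classify(_ transcript: String) -> Intent {
    let lowered = transcript.lowercased()
    // Assumed trigger phrases; the real app's list may differ.
    let triggers = ["take me to ", "navigate to ", "walk me to ", "directions to "]
    for trigger in triggers {
        if let range = lowered.range(of: trigger) {
            let destination = String(lowered[range.upperBound...])
                .trimmingCharacters(in: .whitespaces)
            if !destination.isEmpty { return .navigate(destination: destination) }
        }
    }
    return .question  // anything else goes to the vision Q&A path
}
```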

## Challenges we ran into

Getting narration, Q&A, and navigation instructions to interleave smoothly without ever cutting each other off was harder than expected. We built a priority speech queue and a pending-instruction buffer so Insight always finishes its current sentence before speaking the next thing. GPS unreliability indoors required a graceful fallback to hardcoded coordinates for demo purposes. iOS TTS read road abbreviations literally ("Dr" as "dee-ar"), so we added a custom abbreviation-expansion layer that runs before every spoken instruction.
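The priority speech queue can be sketched as below: a hazard jumps ahead of queued items, but nothing interrupts the sentence currently being spoken. The priority tiers and names are assumptions for illustration; the real queue likely drives AVSpeechSynthesizer and reacts to its delegate callbacks.

```swift
import Foundation

// Minimal sketch of a priority speech queue: higher-priority utterances are
// inserted ahead of lower-priority ones, but the current sentence always
// finishes before the next one starts.
enum SpeechPriority: Int, Comparable {
    case ambient = 0, instruction = 1, hazard = 2  // illustrative tiers
    static func < (lhs: Self, rhs: Self) -> Bool { lhs.rawValue < rhs.rawValue }
}

struct SpeechQueue {
    private var pending: [(priority: SpeechPriority, text: String)] = []
    private(set) var current: String?

    mutating func enqueue(_ text: String, priority: SpeechPriority) {
        // Insert before the first strictly lower-priority item, so equal
        // priorities keep arrival order.
        let idx = pending.firstIndex { $0.priority < priority } ?? pending.endIndex
        pending.insert((priority: priority, text: text), at: idx)
        if current == nil { speakNext() }
    }

    // Called when the TTS engine reports it finished the current utterance.
    mutating func didFinishSpeaking() {
        current = nil
        speakNext()
    }

    private mutating func speakNext() {
        guard !pending.isEmpty else { return }
        current = pending.removeFirst().text
        // In the real app, this is where the synthesizer would speak `current`.
    }
}
```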
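The abbreviation-expansion layer might look roughly like this: replace road abbreviations with full words, matching whole tokens only so words like "Drive" are left alone. The lookup table is an illustrative assumption, and a simple token match like this would still trip over ambiguous cases ("St Clair" as in Saint).

```swift
import Foundation

// Hypothetical abbreviation expansion run before every spoken instruction,
// so TTS says "Drive" instead of spelling out "Dr".
func expandAbbreviations(_ instruction: String) -> String {
    let expansions: [String: String] = [  // illustrative table
        "Dr": "Drive", "St": "Street", "Ave": "Avenue",
        "Blvd": "Boulevard", "Rd": "Road", "Ln": "Lane",
    ]
    return instruction
        .split(separator: " ")
        .map { token -> String in
            // Strip a trailing period so "St." matches the "St" entry.
            let trimmed = token.hasSuffix(".") ? String(token.dropLast()) : String(token)
            return expansions[trimmed] ?? String(token)
        }
        .joined(separator: " ")
}
```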

## Accomplishments that we're proud of

Building a genuinely useful accessibility tool in under 10 hours as a solo developer. The latency — under 2 seconds from scene change to spoken words — is fast enough to be useful in real life. The haptic feedback system creates a parallel physical communication channel, one sighted people rarely think about but whose value blind users will feel immediately.

## What we learned

Traditional computer vision models like YOLO give you bounding boxes and labels. Blind people don't need bounding boxes — they need judgment, context, and spatial reasoning. Claude handles both the vision and the language in one call, which is architecturally simpler and faster than a two-model pipeline. Accessibility apps require a completely different design philosophy: every word Insight speaks costs the user time and attention, so brevity isn't a nice-to-have, it's a core requirement.

## What's next for Insight

  • Face recognition for familiar people ("Sarah is near the entrance, she's waving")
  • "Lost item" radar mode — pan the room and haptics guide you to your keys
  • Outfit coordination ("those socks don't match")
  • Richer long-term memory — learning your daily routes and flagging changes
  • Social mode for networking events — real-time vibe reading and crowd awareness