BlindSpot
BlindSpot is a program developed to help visually impaired individuals navigate safer, smarter cities by identifying surrounding objects in real time.
The program uses YOLOv8 and MiDaS models to analyze frames of video received from the user's phone camera. When the models detect obstacles in front of the user, an API request is sent to Google Gemini, which generates a custom warning message about potential obstacles using both the models' detections and its own analysis of the image.
Before the message is presented to the user, it is sent to ElevenLabs Turbo v2.5, which converts the text into a natural, human-sounding voice clip.
Custom messages can also be sent directly to the phone, for emergency alerts or city announcements.
Google Gemini also enables easy multilingual support: combined with ElevenLabs voices, the app translates and speaks warnings about nearby obstacles in whichever language is selected.
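As a rough sketch of how the detection and depth stages can be combined (function names and thresholds here are illustrative, not the exact BlindSpot code): MiDaS produces a relative inverse-depth map, so a higher value inside a YOLO bounding box suggests a closer object.

```python
import numpy as np

def proximity_per_detection(depth_map, boxes):
    """For each YOLO box (x1, y1, x2, y2), sample the MiDaS
    inverse-depth map and return a median proximity score.
    Higher score = likely closer object."""
    scores = []
    for x1, y1, x2, y2 in boxes:
        patch = depth_map[y1:y2, x1:x2]
        scores.append(float(np.median(patch)) if patch.size else 0.0)
    return scores

# Synthetic example: one "near" box, one "far" box.
depth = np.zeros((100, 100))
depth[10:40, 10:40] = 0.9   # near region (high inverse depth)
depth[60:90, 60:90] = 0.2   # far region
print(proximity_per_detection(depth, [(10, 10, 40, 40), (60, 60, 90, 90)]))
```

Using the median rather than the mean keeps a box's score from being skewed by background pixels at the box edges.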
Inspiration
From road signs to physical obstacles such as poles and cones, many aspects of modern cities are heavily visual. Simply walking down the street requires constant visual awareness, which does not always accommodate visually impaired individuals.
BlindSpot aims to make cities smarter and safer for those who need assistance navigating them.
What It Does
Identify & Warn About Dangers
- Detects nearby obstacles using object detection and depth estimation
- Prioritizes higher-risk objects
- Generates custom warning messages
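One way the "prioritizes higher-risk objects" step can work (the per-class risk weights below are hypothetical examples, not the app's actual tuning): combine a danger weight for each object class with its estimated proximity, then sort descending.

```python
# Hypothetical per-class risk weights; a real deployment would tune these.
RISK_WEIGHTS = {"car": 1.0, "pole": 0.6, "cone": 0.4, "bench": 0.3}

def prioritize(detections):
    """detections: list of (label, proximity) pairs, proximity in (0, 1],
    higher = closer. Returns labels ordered most- to least-urgent."""
    ranked = sorted(
        detections,
        key=lambda d: RISK_WEIGHTS.get(d[0], 0.2) * d[1],
        reverse=True,
    )
    return [label for label, _ in ranked]

# A distant car can still outrank a nearby cone.
print(prioritize([("cone", 0.9), ("car", 0.5), ("pole", 0.8)]))
```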
Sensory Features
- Vibration feedback
- Voice input and output
Multilingual Support
Supports six languages:
- English
- Spanish
- French
- Vietnamese
- Japanese
- Chinese
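Gemini can be steered toward the selected language with a simple prompt template; a minimal sketch (the template wording and language codes are illustrative):

```python
SUPPORTED = {"en": "English", "es": "Spanish", "fr": "French",
             "vi": "Vietnamese", "ja": "Japanese", "zh": "Chinese"}

def build_warning_prompt(objects, lang_code):
    """Build a Gemini prompt asking for a spoken-style warning
    in the user's selected language (falls back to English)."""
    lang = SUPPORTED.get(lang_code, "English")
    return (
        "You are a navigation assistant for a visually impaired user. "
        f"Obstacles detected, nearest first: {', '.join(objects)}. "
        f"Write one short spoken warning in {lang}."
    )

print(build_warning_prompt(["car", "pole"], "ja"))
```

Because the translation happens in the prompt, adding a seventh language is a one-line change to the table.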
City Alerts
- External devices can send alerts to the user
- Users receive voice notifications on their phone
- Future smart city integration (traffic lights, cars, etc.) could notify users when it is safe to cross
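City alerts could be delivered as a small JSON payload that the phone converts to speech; a hypothetical schema sketch (field names are assumptions, not a finalized protocol):

```python
import json

def make_alert(message, priority="info", lang="en"):
    """Serialize a hypothetical alert an external device might send
    to the user's phone for voice playback."""
    return json.dumps({
        "type": "city_alert",
        "priority": priority,   # e.g. "info" | "warning" | "emergency"
        "lang": lang,
        "message": message,
    })

payload = make_alert("Crosswalk signal ahead is out of service.",
                     priority="warning")
print(json.loads(payload)["priority"])
```

A priority field like this would also let the phone decide between a gentle chime and immediate vibration-plus-voice playback.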
How We Built It
- Expo Go — Mobile app development
- Tailscale — VPN to bypass school firewall restrictions
- YOLOv8 — Object detection
- COCO Dataset — Object training dataset
- MiDaS — Depth analysis
- ONNX — Model optimization and simplification
- Google Gemini — Text generation for warning messages
- ElevenLabs — Voice generation
Challenges We Ran Into
- Prioritizing which descriptions to output
- Handling rapidly changing surroundings
- Detecting:
  - Closer objects
  - Fast-moving objects
  - Turning direction
- Fixing inaccurate descriptions
- Adjusting datasets for improved detection
- Minimizing AI response time (critical for emergency situations)
- Getting audio features working on newer versions of Expo Go
  - Ultimately used an older version
Accomplishments We're Proud Of
- Creating an app that improves safety and accessibility
- Running AI models efficiently in near real time
- Supporting multiple languages accurately
- Building the complete app within 36 hours
What We Learned
- Challenges faced by visually impaired individuals and existing solutions
- How users navigate apps with minimal physical cues
- Mobile app development process
- Building for iOS using a non-iOS system
- Object detection and tracking
- Working with datasets for identification
- Designing a prioritization hierarchy for detected objects
What's Next for BlindSpot
Potential Improvements
- 360° cameras to detect surroundings in all directions
- Integration with smart glasses (e.g., Meta glasses)
- Improved detection of transparent objects (e.g., glass)
- Larger budget for tools and API usage
- Running AI directly on the mobile device (requires paid developer tools)
- Higher-tier API access for:
  - Gemini
  - ElevenLabs
BlindSpot is designed to make urban navigation safer, smarter, and more accessible for everyone.