BlindSpot
BlindSpot is a program developed to help visually impaired individuals navigate safer, smarter cities by identifying surrounding objects in real time.
The program uses YOLOv8 and MiDaS models to analyze frames of video received from the user's phone camera. When the models detect obstacles in front of the user, an API request is sent to Google Gemini, which generates a custom warning message about potential obstacles using both the models' detections and its own analysis of the image.
Before the message is presented to the user, it is sent to ElevenLabs Turbo v2.5, which converts the text into a natural, human-sounding voice clip.
Custom messages can also be sent directly to the phone, for emergency alerts or city announcements.
Google Gemini also enables easy multilingual support: combined with ElevenLabs voices, the app translates and speaks warnings about nearby obstacles in whichever language is selected.
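As a rough sketch of how the detection and depth stages can be combined (function names and thresholds here are illustrative, not the exact BlindSpot code): MiDaS produces a relative inverse-depth map, so a higher value inside a YOLO bounding box suggests a closer object.

```python
import numpy as np

def proximity_per_detection(depth_map, boxes):
    """For each YOLO box (x1, y1, x2, y2), sample the MiDaS
    inverse-depth map and return a median proximity score.
    Higher score = likely closer object."""
    scores = []
    for x1, y1, x2, y2 in boxes:
        patch = depth_map[y1:y2, x1:x2]
        scores.append(float(np.median(patch)) if patch.size else 0.0)
    return scores

# Synthetic example: one "near" box, one "far" box.
depth = np.zeros((100, 100))
depth[10:40, 10:40] = 0.9   # near region (high inverse depth)
depth[60:90, 60:90] = 0.2   # far region
print(proximity_per_detection(depth, [(10, 10, 40, 40), (60, 60, 90, 90)]))
```

Using the median rather than the mean keeps a box's score from being skewed by background pixels at the box edges.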
Inspiration
From road signs to physical obstacles such as poles and cones, many aspects of modern cities are heavily visual. Simply walking down the street requires constant visual awareness, which does not always accommodate visually impaired individuals.
BlindSpot aims to make cities smarter and safer for those who need assistance navigating them.
What It Does
Identify & Warn About Dangers
- Detects nearby obstacles using object detection and depth estimation
- Prioritizes higher-risk objects
- Generates custom warning messages
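One way the "prioritizes higher-risk objects" step can work (the per-class risk weights below are hypothetical examples, not the app's actual tuning): combine a danger weight for each object class with its estimated proximity, then sort descending.

```python
# Hypothetical per-class risk weights; a real deployment would tune these.
RISK_WEIGHTS = {"car": 1.0, "pole": 0.6, "cone": 0.4, "bench": 0.3}

def prioritize(detections):
    """detections: list of (label, proximity) pairs, proximity in (0, 1],
    higher = closer. Returns labels ordered most- to least-urgent."""
    ranked = sorted(
        detections,
        key=lambda d: RISK_WEIGHTS.get(d[0], 0.2) * d[1],
        reverse=True,
    )
    return [label for label, _ in ranked]

# A distant car can still outrank a nearby cone.
print(prioritize([("cone", 0.9), ("car", 0.5), ("pole", 0.8)]))
```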
Sensory Features
- Vibration feedback
- Voice input and output
Multilingual Support
Supports six languages:
- English
- Spanish
- French
- Vietnamese
- Japanese
- Chinese
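Gemini can be steered toward the selected language with a simple prompt template; a minimal sketch (the template wording and language codes are illustrative):

```python
SUPPORTED = {"en": "English", "es": "Spanish", "fr": "French",
             "vi": "Vietnamese", "ja": "Japanese", "zh": "Chinese"}

def build_warning_prompt(objects, lang_code):
    """Build a Gemini prompt asking for a spoken-style warning
    in the user's selected language (falls back to English)."""
    lang = SUPPORTED.get(lang_code, "English")
    return (
        "You are a navigation assistant for a visually impaired user. "
        f"Obstacles detected, nearest first: {', '.join(objects)}. "
        f"Write one short spoken warning in {lang}."
    )

print(build_warning_prompt(["car", "pole"], "ja"))
```

Because the translation happens in the prompt, adding a seventh language is a one-line change to the table.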
City Alerts
- External devices can send alerts to the user
- Users receive voice notifications on their phone
- Future smart city integration (traffic lights, cars, etc.) could notify users when it is safe to cross
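City alerts could be delivered as a small JSON payload that the phone converts to speech; a hypothetical schema sketch (field names are assumptions, not a finalized protocol):

```python
import json

def make_alert(message, priority="info", lang="en"):
    """Serialize a hypothetical alert an external device might send
    to the user's phone for voice playback."""
    return json.dumps({
        "type": "city_alert",
        "priority": priority,   # e.g. "info" | "warning" | "emergency"
        "lang": lang,
        "message": message,
    })

payload = make_alert("Crosswalk signal ahead is out of service.",
                     priority="warning")
print(json.loads(payload)["priority"])
```

A priority field like this would also let the phone decide between a gentle chime and immediate vibration-plus-voice playback.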
How We Built It
- Expo Go — Mobile app development
- Tailscale — VPN to bypass school firewall restrictions
- YOLOv8 — Object detection
- COCO Dataset — Object training dataset
- MiDaS — Depth analysis
- ONNX — Model optimization and simplification
- Google Gemini — Text generation for warning messages
- ElevenLabs — Voice generation
Challenges We Ran Into
- Prioritizing which descriptions to output
- Handling rapidly changing surroundings
- Detecting:
  - Closer objects
  - Fast-moving objects
  - Turning direction
- Fixing inaccurate descriptions
- Adjusting datasets for improved detection
- Minimizing AI response time (critical for emergency situations)
- Getting audio features working on newer versions of Expo Go
  - Ultimately used an older version
Accomplishments We're Proud Of
- Creating an app that improves safety and accessibility
- Running AI models efficiently in near real time
- Supporting multiple languages accurately
- Building the complete app within 36 hours
What We Learned
- Challenges faced by visually impaired individuals and existing solutions
- How users navigate apps with minimal physical cues
- Mobile app development process
- Building for iOS using a non-iOS system
- Object detection and tracking
- Working with datasets for identification
- Designing a prioritization hierarchy for detected objects
What's Next for BlindSpot
Potential Improvements
- 360° cameras to detect surroundings in all directions
- Integration with smart glasses (e.g., Meta glasses)
- Improved detection of transparent objects (e.g., glass)
- Larger budget for tools and API usage
- Running AI directly on the mobile device (requires paid developer tools)
- Higher-tier API access for:
  - Gemini
  - ElevenLabs
BlindSpot is designed to make urban navigation safer, smarter, and more accessible for everyone.