[Gallery] Training our custom SegFormer model on 600 annotated street-level images covering curbs, sidewalks, poles, and obstacles; Raspberry Pi 5 + Camera Module setup; Workflow; Human Detection
Inspiration
Vizzion was born from the realization that while GPS can tell a visually impaired person where they are, it cannot tell them what is three feet in front of them. Traditional white canes provide local tactile feedback, but they cannot anticipate a rapidly approaching vehicle or distinguish between a safe sidewalk and a dangerous drop-off until it's too late. We wanted to build a "digital co-pilot" that uses semantic understanding to bridge that gap.
What it does
Vizzion is an AI-powered assistive navigation system that provides real-time spatial awareness for visually impaired users. By combining semantic segmentation with object detection, it identifies safe paths (sidewalks/roads), detects structural hazards (stairs/curbs), and tracks threats (vehicles/people). The system uses a prioritized haptic feedback loop: the buzzer intensity and pulse rate increase as hazards get closer, effectively "translating" the visual world into a language of vibrations.
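The prioritized haptic loop described above can be sketched as a simple proximity-to-pulse mapping. The function name, thresholds, and the `hazard_weight` parameter below are illustrative assumptions, not the project's actual values:

```python
def haptic_command(proximity: float, hazard_weight: float = 1.0):
    """Map a normalized hazard proximity (0 = far, 1 = touching) to a
    buzzer intensity and pulse interval.

    Closer hazards produce stronger, faster pulses; hazard_weight lets
    high-priority classes (e.g. vehicles) escalate sooner than benign ones.
    """
    urgency = min(1.0, max(0.0, proximity) * hazard_weight)
    intensity = urgency                      # PWM duty cycle, 0..1
    pulse_interval_s = 1.0 - 0.9 * urgency  # ~1 s apart when far, ~0.1 s when close
    return intensity, pulse_interval_s

# A near vehicle pulses harder and faster than a distant, low-priority object.
print(haptic_command(0.8, hazard_weight=1.2))
print(haptic_command(0.2, hazard_weight=0.5))
```

The key design point is monotonicity: intensity and pulse rate both increase continuously with urgency, so the user feels distance as a gradient rather than a binary alarm.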
How we built it
We engineered a multi-layered computer vision pipeline using PyTorch and Hugging Face Transformers.
The Brain: We fine-tuned a SegFormer (B0) model on a custom street-navigation dataset from Roboflow to achieve pixel-perfect curb and stair detection.
The Logic: We implemented a stateful "Object Growth" algorithm using Exponential Moving Averages (EMA) to track the changing area of objects. This allows the system to ignore static distractions and only alert the user to things they are actively approaching.
The Speed: To achieve real-time 30+ FPS performance, we optimized the inference engine to utilize Metal Performance Shaders (MPS) for Mac GPU acceleration and CUDA for PCs, ensuring the system reacts in milliseconds.
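The "Object Growth" idea can be sketched with a plain EMA over an object's on-screen area. The class name, smoothing factor, and growth threshold below are illustrative, not the project's tuned values:

```python
class GrowthTracker:
    """Smooth an object's on-screen area with an exponential moving
    average (EMA) and flag it only when the smoothed area is trending
    upward (the user is approaching it), ignoring static or receding
    objects."""

    def __init__(self, alpha: float = 0.3, growth_ratio: float = 1.05):
        self.alpha = alpha                # EMA smoothing factor
        self.growth_ratio = growth_ratio  # min relative increase to count as "approaching"
        self.ema = None

    def update(self, area: float) -> bool:
        """Feed one frame's measured area; return True if the object
        appears to be getting closer."""
        if self.ema is None:
            self.ema = area
            return False
        prev = self.ema
        self.ema = self.alpha * area + (1 - self.alpha) * prev
        return self.ema > prev * self.growth_ratio
```

The EMA is what makes the alert stateful: a single noisy large detection barely moves the average, while a sustained approach pushes it past the growth threshold frame after frame.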
Challenges we ran into
Hardware Constraints: Running heavy segmentation models on a Raspberry Pi 5 required aggressive optimization of input resolution (512x512) and batch processing to maintain a safe frame rate.
Beeper Fatigue: Initially, the system beeped for everything. We had to build a temporal trend filter so the buzzer only activates when an object is physically getting larger in the frame (indicating an imminent collision).
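A temporal trend filter of this kind can be approximated by requiring several consecutive growth frames before sounding the buzzer. The window length and threshold below are assumptions for illustration, not the project's values:

```python
from collections import deque

class TrendFilter:
    """Suppress one-off detections: the buzzer fires only if the object
    grew in at least `needed` of the last `window` frames."""

    def __init__(self, window: int = 5, needed: int = 4):
        self.history = deque(maxlen=window)  # rolling record of grow/no-grow
        self.needed = needed

    def should_buzz(self, grew_this_frame: bool) -> bool:
        self.history.append(grew_this_frame)
        return sum(self.history) >= self.needed
```

A short sliding window like this trades a few frames of reaction delay for a large drop in false alarms, which is exactly the beeper-fatigue problem described above.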
Accomplishments that we're proud of
Real-time Semantic Segmentation: Achieving near-zero latency for pixel-wise classification on a local machine.
Curvature Detection: Our model successfully distinguishes a flat sidewalk from a 6-inch curb, a feat that traditional ultrasonic sensors often miss.
What we learned
We learned that in assistive technology, latency is safety. A model that is 99% accurate but only runs at 2 FPS is useless for a person walking at 3 mph. We also learned how to navigate the complexities of the Hugging Face ecosystem, specifically regarding dataset versioning and the move away from legacy loading scripts.
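The "latency is safety" point above can be made concrete with a quick calculation of how far a pedestrian travels between frames:

```python
MPH_TO_MPS = 0.44704  # miles per hour to metres per second

def travel_per_frame(speed_mph: float, fps: float) -> float:
    """Distance in metres a pedestrian covers between consecutive
    processed frames."""
    return speed_mph * MPH_TO_MPS / fps

# At 3 mph, a 2 FPS model only sees the world every ~0.67 m of travel,
# while 30 FPS updates roughly every 4.5 cm.
print(travel_per_frame(3, 2))
print(travel_per_frame(3, 30))
```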
What's next for Vizzion
Haptic Belt Integration: Moving beyond a single buzzer to a multi-motor haptic belt that provides directional "wayfinding" (e.g., vibration on the left hip means the safe sidewalk is to the left).
Edge Cases: Fine-tuning the model for low-light/nighttime navigation and adverse weather like rain or snow.
Audio/LLM Integration: Using a local LLM to provide descriptive "scene summaries" on-demand (e.g., "There is a park bench 10 feet ahead and a coffee shop to your right").
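The haptic-belt wayfinding idea above reduces to mapping a hazard or path bearing to the nearest motor around the waist. The motor count and layout here are hypothetical:

```python
def motor_for_bearing(bearing_deg: float, n_motors: int = 8) -> int:
    """Pick the belt motor closest to a bearing (0 degrees = straight
    ahead, positive = clockwise), with motors spaced evenly around the
    waist. Motor 0 sits at the navel."""
    sector = 360.0 / n_motors
    return int(round((bearing_deg % 360.0) / sector)) % n_motors

# Safe sidewalk 90 degrees to the left -> the left-hip motor (index 6
# on a hypothetical 8-motor belt).
print(motor_for_bearing(-90))
```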

