Inspiration
While researching navigation solutions for the visually impaired in developing nations, I found that traditional aids like white canes are affordable but offer limited spatial awareness. At the other end of the spectrum, guide dogs provide excellent dynamic assistance but are prohibitively expensive: training costs in the United States range from $15,000 to $40,000. That price barrier puts them out of reach for the vast majority of people who need them. I wanted to bridge the gap by building an autonomous, intelligent agent that offers the situational awareness of a guide dog at the cost of consumer electronics.
What it does
Bird's I is an autonomous drone that acts as a personal aerial guide for the visually impaired. Instead of just detecting obstacles at ground level, it hovers overhead to analyze the entire environment in real-time. It identifies dynamic hazards like cars and bicycles as well as static obstacles like street furniture. By calculating a comprehensive Risk Score for every object it sees, it prioritizes the most immediate dangers and communicates them to the user through a natural, AI-generated voice using Fish Audio's S1 mini model. It tells the user exactly what the hazard is and where it is located relative to them (e.g., "Danger, car approaching on your left"), allowing them to navigate complex environments with greater confidence and independence.
How we built it
Hardware & Connectivity
I used a DJI Tello EDU drone as the mobile sensor platform. The drone communicates with a host computer (a MacBook) over a dedicated Wi-Fi link, streaming 720p video and receiving control commands via UDP.
Computer Vision Pipeline
The core vision system is built in Python using OpenCV. I implemented a dual-branch vision architecture:
Object Detection: I utilized the YOLOv3-Tiny neural network for real-time object detection. This model identifies 80 different classes of objects (from people to traffic lights). I process the raw video frames into 416x416 blobs to feed the neural network efficiently on the CPU.
Pose Estimation: To create a "virtual leash," I employed ArUco markers. I calibrated the drone's camera to obtain its intrinsic matrix and distortion coefficients; with those parameters, the system recovers the user's precise 3D position $(x, y, z)$ relative to the drone using OpenCV's solvePnP.
Control Systems
I implemented a custom flight controller that runs a continuous feedback loop.
Height Lock: I use the drone's downward-facing Time-of-Flight (ToF) infrared sensor to maintain a constant altitude (e.g., eye level).
Distance Maintenance: Using the Z-axis data from the ArUco tracking, the drone automatically adjusts its pitch to maintain a fixed distance from the user.
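The distance hold reduces to proportional control on the Z error; a minimal sketch, where the setpoint, gain, and command limits are assumed values rather than the project's tuned ones:

```python
def pitch_command(z_measured, z_target=1.5, kp=40.0, limit=100):
    """Proportional controller for the forward/back (pitch) channel.
    z_measured: current distance to the user from ArUco tracking (m).
    Returns an RC-style integer command clamped to [-limit, limit];
    positive means pitch forward (the user is too far away)."""
    error = z_measured - z_target
    cmd = int(kp * error)
    return max(-limit, min(limit, cmd))
```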
Multithreading Architecture
To keep the video feed smooth while running heavy AI inference, I separated the application into three concurrent threads:
Video Thread: continuously grabs the latest frame from the UDP stream to prevent buffer lag.
Logic Thread: runs the YOLO inference, calculates control signals, and manages the state machine.
Audio Thread: manages a FIFO queue for text-to-speech synthesis to prevent the flight loop from blocking during audio generation.
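The audio thread's FIFO pattern can be sketched with the standard library's `queue` and `threading` modules, assuming a `speak` callable that wraps the synthesis call:

```python
import queue
import threading

speech_q = queue.Queue()  # FIFO of phrases awaiting synthesis

def audio_worker(speak):
    """Drain the speech queue in its own thread so slow TTS calls
    never block the flight loop; `speak` does the actual synthesis."""
    while True:
        phrase = speech_q.get()
        if phrase is None:          # sentinel to shut the thread down
            break
        speak(phrase)
        speech_q.task_done()

def start_audio_thread(speak):
    # daemon=True: the thread dies with the main program.
    t = threading.Thread(target=audio_worker, args=(speak,), daemon=True)
    t.start()
    return t
```

The flight loop then just calls `speech_q.put("Danger, car approaching on your left")` and moves on.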
AI Speech Synthesis
For audio feedback, I deployed Fish Audio's OpenAudio S1 Mini model, hosting this 0.5-billion-parameter model as a local API server. This lets the system generate expressive, human-like warnings with specific emotional tones (e.g., a "shouting" or "anxious" tone for high-risk alerts) without needing an internet connection.
Challenges I ran into
My biggest hurdles stemmed from the complexity of running multiple asynchronous systems simultaneously.
Local AI Deployment
Getting Fish Audio to run locally on an Apple Silicon Mac was difficult. I faced significant dependency conflicts; in particular, `pyaudio` and `portaudio` required manual compilation flags to find the correct C libraries. I also had to patch internal files in the `torchaudio` backend to force compatibility with my audio drivers, and handling the model configuration files for the S1 Mini decoder meant manually aligning checkpoint paths to prevent size-mismatch errors during tensor loading.
Concurrency & Latency
Early in development, the video feed would freeze whenever the drone executed a movement command or the AI processed a frame. I solved this by moving video capture and audio generation into daemon threads, which kept the main control loop non-blocking and let the drone react instantly even while the AI was "thinking."
Network Routing
Because the drone creates its own Wi-Fi network without internet access, the operating system would often refuse to route packets to the drone's IP address, producing `OSError: [Errno 65] No route to host`. I had to force traffic through the correct interface with strict network isolation.
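One common mitigation is to bind the drone-facing UDP socket explicitly to the Wi-Fi interface's address so the OS cannot pick another route; a minimal sketch, where the addresses are the Tello's documented defaults and the local port is arbitrary:

```python
import socket

TELLO_IP = "192.168.10.1"    # Tello's default address on its own AP
TELLO_PORT = 8889            # Tello's UDP command port

def make_command_socket(local_ip, local_port=9000):
    """Bind the UDP command socket to the drone-facing interface so
    packets are forced over the drone's Wi-Fi link even when another
    interface (e.g. Ethernet) is also up. `local_ip` is the address
    your machine received from the drone's network."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((local_ip, local_port))
    return sock

# Usage: sock = make_command_socket("192.168.10.2")
#        sock.sendto(b"command", (TELLO_IP, TELLO_PORT))
```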
Accomplishments that we're proud of
I am particularly proud of the Dynamic Risk Algorithm. Instead of alerting the user to every single object, I designed a formula that calculates a risk score ($R$) based on three factors:
$$R = B \times P \times C$$
Where $B$ is the base danger of the object class (e.g., a car is higher than a chair), $P$ is the proximity (calculated via the bounding box coverage area), and $C$ is the centering factor. This ensures the system only speaks when it truly matters.
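A minimal sketch of the scoring function, computing proximity and centering from the bounding box as described; the linear fall-off for the centering factor is an assumed shape, not the tuned one:

```python
def risk_score(base_danger, bbox_area, frame_area, cx, frame_w):
    """R = B * P * C for one detected object.
    base_danger (B): per-class weight, e.g. car >> chair.
    P: proximity proxy -- fraction of the frame the bounding box covers.
    C: centering factor -- 1.0 dead ahead, 0.0 at the frame edge.
    cx: horizontal centroid of the bounding box in pixels."""
    P = bbox_area / frame_area
    C = 1.0 - abs(cx - frame_w / 2) / (frame_w / 2)
    return base_danger * P * C
```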
I also successfully implemented Spatial Audio Cues. By analyzing the centroid of the bounding box relative to the frame width, the system intelligently determines if an obstacle is on the "left," "right," or "dead ahead," giving the user actionable directional advice rather than just generic warnings.
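The direction logic reduces to comparing the centroid against a band around the frame centre; a sketch, where the 20% band width is an assumed threshold:

```python
def direction(cx, frame_w, center_band=0.2):
    """Map a bounding-box centroid to a spoken direction.
    center_band: fraction of the half-width around centre that is
    reported as "dead ahead"."""
    offset = (cx - frame_w / 2) / (frame_w / 2)   # normalised to -1 .. 1
    if abs(offset) <= center_band:
        return "dead ahead"
    return "left" if offset < 0 else "right"
```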
What we learned
I learned a great deal about the difference between 2D object detection and 3D spatial reasoning. Simply knowing what is in an image is not enough for robotics; you must know where it is in 3D space. I also learned the hard way that real-time robots don't forgive lag. I spent hours debugging crashes only to realize that a single blocking function was freezing the flight controls. It forced me to learn threading and queues because, unlike a web app, if this code lags, the drone hits a wall.
What's next for Bird's I
I plan to expand the system with trajectory pathing: it currently detects position, but I want to calculate the velocity vectors of moving objects to predict collisions before they happen. I also recognize that drones may not be permitted in all public spaces, so I want to prototype a wearable alternative: a vest or harness equipped with four wide-angle cameras that runs the same computer vision and risk-assessment software I built, offering a discreet option for daily use.