Inspiration
Falls are the leading cause of injury among adults aged 65 and older in the United States. More than one in four older adults falls each year, resulting in approximately 3 million emergency department visits, 1 million hospitalizations, and 32,000 deaths annually. But the injury itself is only part of the problem — a significant portion of those who fall are unable to get up or call for help, and it's that window of time between the fall and receiving assistance where outcomes are most dramatically affected. Even a delay of one to two hours can lead to hypothermia, pressure injuries, and sharply increased mortality.
A near fall is a stumble the person catches themselves from, a lurch they recover from without hitting the ground. It looks like nothing. But research shows that people who experience near falls are six times more likely to suffer a serious fall within the following six months. A near fall isn't a close call — it's an early warning that the body's balance recovery systems are beginning to fail, before catastrophe has occurred.
The solutions that exist today don't address either of these problems well. Wearable panic buttons require the person to be conscious and capable enough to press them — which fails exactly when it matters most. Cloud-based smart home cameras introduce continuous video surveillance of a vulnerable person in their own home, which raises serious privacy concerns for a population that deserves both safety and dignity.
We built this because we wanted something that could catch the warning signs early, respond automatically when it matters, and do all of it without compromising the privacy of the person it's protecting.
What It Does
The system runs continuously on a standard PC with a webcam. MediaPipe extracts body keypoints from the live feed in real time — raw pixels never leave the pose estimation module. A trained Random Forest classifier analyzes those keypoints to detect falls, while a parallel rule-based module detects near falls: stumbles the person recovers from without hitting the ground. When a fall is confirmed, a local voice assistant checks in verbally, and if there's no response, it automatically sends an SMS and places a voice call to emergency contacts via Twilio. Everything runs locally — the network is only used for the outbound alert.
System flow:
- 📷 Webcam captures live video feed
- 🦴 MediaPipe extracts 33 body keypoints per frame — raw pixels are immediately discarded
- 🤖 Random Forest classifier analyzes keypoint features to detect falls
- 🚶 Parallel near-fall detector flags stumbles and logs them silently for trend tracking
- 🗣️ On confirmed fall, local voice assistant checks in with the user verbally
- ✅ User confirms they're okay → system resumes monitoring
- 🚨 No response or user requests help → SMS alert and voice call sent to emergency contacts via Twilio
- 💾 Skeleton-only wireframe clip of the event is saved locally for first responders to review
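The escalation step in the flow above can be sketched as a small state machine. This is a minimal sketch, not the real implementation: the state names, the `EscalationPolicy` class, and the 30-second response timeout are illustrative placeholders.

```python
import time

# Hypothetical states for the check-in/escalation flow described above.
MONITORING, CHECKING_IN, ALERTING = "monitoring", "checking_in", "alerting"

class EscalationPolicy:
    """Decides what happens after the classifier confirms a fall."""

    def __init__(self, response_timeout_s=30.0):
        self.state = MONITORING
        self.response_timeout_s = response_timeout_s
        self._check_in_started = None

    def on_fall_confirmed(self, now=None):
        # Voice assistant asks "Are you okay?" and a response timer starts.
        self.state = CHECKING_IN
        self._check_in_started = now if now is not None else time.monotonic()

    def on_user_response(self, says_ok):
        # Spoken reply, as parsed by the local speech-to-text module.
        self.state = MONITORING if says_ok else ALERTING
        self._check_in_started = None

    def tick(self, now=None):
        # Called every frame; silence past the timeout escalates to alerts.
        now = now if now is not None else time.monotonic()
        if self.state == CHECKING_IN and now - self._check_in_started > self.response_timeout_s:
            self.state = ALERTING
        return self.state
```

Once the policy reaches the alerting state, the Twilio SMS and voice call fire; a confirmed "I'm okay" drops it straight back to monitoring.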
How We Built It
The stack is intentionally lean and local:
| Layer | Technology |
|---|---|
| Pose Estimation | MediaPipe Pose |
| Camera | OpenCV |
| Fall Classification | scikit-learn Random Forest |
| Near-Fall Detection | Custom rule-based state machine |
| Voice Assessment | pyttsx3 + whisper.cpp |
| Alerts | Twilio |
| UI | Tkinter |
The near-fall detector operates on body-height-normalised hip position, making it invariant to camera distance. While the person is standing, the normalised hip value $h$ holds a steady baseline:
$$h = \frac{y_{\text{hip}} - y_{\text{shoulder}}}{y_{\text{knee}} - y_{\text{shoulder}}}$$
A near fall is confirmed through three sequential checks:
- Velocity spike — $$\Delta h > \theta_v$$ in the downward direction, while $h$ is already below the standing baseline
- Activity ratio disambiguation — $$r = \frac{\text{leg height}}{\text{torso height}}$$ distinguishes a stumble from a deliberate sit-down
- Recovery window — within 30 frames, the hip must return to within $\epsilon$ of the standing baseline
Only if all three conditions are met in sequence is the event classified as a near fall.
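The three sequential checks can be sketched as a per-frame detector. This is a minimal sketch, assuming $h$ decreases as the hips drop; the threshold values, the ratio bounds, and the `NearFallDetector` name are illustrative placeholders, not the tuned values from the real system.

```python
class NearFallDetector:
    """Sketch of the three sequential near-fall checks (illustrative thresholds)."""

    def __init__(self, baseline_h, theta_v=0.05, ratio_bounds=(0.8, 1.6),
                 recovery_window=30, epsilon=0.05):
        self.baseline = baseline_h          # standing-baseline hip value
        self.theta_v = theta_v              # per-frame velocity threshold
        self.ratio_bounds = ratio_bounds    # leg/torso ratios typical of a stumble
        self.recovery_window = recovery_window  # frames allowed for recovery
        self.epsilon = epsilon
        self.prev_h = baseline_h
        self.frames_since_spike = None      # None => no candidate event pending

    def update(self, h, leg_height, torso_height):
        """Feed one frame; returns True only when a near fall is confirmed."""
        dh = self.prev_h - h  # positive when the hips are dropping
        self.prev_h = h
        if self.frames_since_spike is None:
            # Check 1: downward velocity spike while already below baseline.
            if dh > self.theta_v and h < self.baseline - self.epsilon:
                # Check 2: activity ratio rules out a deliberate sit-down.
                r = leg_height / torso_height
                if self.ratio_bounds[0] <= r <= self.ratio_bounds[1]:
                    self.frames_since_spike = 0
            return False
        # Check 3: hip must return near baseline within the recovery window.
        self.frames_since_spike += 1
        if abs(h - self.baseline) <= self.epsilon:
            self.frames_since_spike = None
            return True
        if self.frames_since_spike >= self.recovery_window:
            self.frames_since_spike = None  # no recovery: a fall or a sit, not a near fall
        return False
```

A dip that never recovers simply times out of the candidate window, leaving the fall classifier to make the call.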
Training data came from the Le2i fall detection dataset, processed frame by frame through MediaPipe to extract keypoint sequences, then windowed and labeled for the classifier.
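The windowing step might look like the sketch below; the 30-frame window, 10-frame stride, and the any-frame labeling rule are illustrative assumptions, not the exact preprocessing used.

```python
def make_windows(keypoint_frames, labels, window=30, stride=10):
    """Slide a fixed-length window over per-frame keypoint vectors.

    keypoint_frames: list of flat feature vectors, one per video frame.
    labels: per-frame labels ("fall" / "no_fall"); here a window is
    labeled "fall" if any frame inside it is a fall frame.
    """
    windows, window_labels = [], []
    for start in range(0, len(keypoint_frames) - window + 1, stride):
        chunk = keypoint_frames[start:start + window]
        # Flatten the window into one feature row for the Random Forest.
        windows.append([v for frame in chunk for v in frame])
        window_labels.append(
            "fall" if "fall" in labels[start:start + window] else "no_fall"
        )
    return windows, window_labels
```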
The event logger maintains a rolling 5-second pre-fall frame buffer in memory at all times. The moment a fall is detected, it locks that buffer and captures post-fall frames, then assembles them into a timestamped .mp4 containing only the skeleton wireframe overlay — preserving the clinical information first responders need without storing any identifiable image of the person.
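A rolling pre-fall buffer like this is naturally expressed as a bounded deque. The sketch below assumes a 30 fps capture rate and a matching 5-second post-fall capture; the `EventRecorder` name and frame counts are illustrative, and actual .mp4 assembly of the skeleton frames is left out.

```python
from collections import deque

FPS = 30  # assumed capture rate

class EventRecorder:
    """Keeps a rolling pre-fall history; locks it the moment a fall fires."""

    def __init__(self, pre_seconds=5, post_seconds=5):
        self.pre_buffer = deque(maxlen=pre_seconds * FPS)
        self.post_target = post_seconds * FPS
        self.locked = None   # frozen pre-fall frames once a fall is detected
        self.post_frames = []

    def push(self, skeleton_frame):
        if self.locked is None:
            self.pre_buffer.append(skeleton_frame)  # rolling pre-fall history
            return None
        self.post_frames.append(skeleton_frame)
        if len(self.post_frames) >= self.post_target:
            clip = self.locked + self.post_frames  # hand off for video assembly
            self.locked, self.post_frames = None, []
            return clip
        return None

    def on_fall(self):
        # Freeze the rolling buffer exactly at the moment of detection.
        self.locked = list(self.pre_buffer)
```

Because only skeleton frames ever enter the buffer, the saved clip carries the pose information without any identifiable imagery.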
Challenges
False positives were the hardest problem. Sitting down quickly, bending to pick something up, kneeling — all of these produce hip velocity spikes that look like falls. The activity ratio check and the recovery window both exist specifically to handle this. Getting the normalised thresholds right required a lot of iteration against real video.
What We Learned
I came in thinking the ML classifier would be the hard part. It wasn't — once MediaPipe reduces the input to structured keypoints, classification is relatively straightforward. The hard parts were the edge cases in the rule-based logic, the UX of a system designed for elderly users, and the privacy architecture. Building something where the constraints genuinely shape every design decision — rather than being a checkbox at the end — changes how you think about the problem entirely.
What's Next
- Longitudinal near-fall trend analysis with a family-facing dashboard
- On-device model fine-tuning from the user's own movement patterns over time
- Multi-room support with multiple cameras and unified event correlation
- Fall risk scoring that combines near-fall frequency, velocity profiles, and time-of-day patterns into a single indicator for clinical handoff

