Inspiration
I kept coming back to a simple problem in patient care: the moments that matter most (a fall, a seizure, a face twisted in pain) often happen when nobody is in the room. Cameras are everywhere in care settings, but a raw video feed only helps if someone is watching it. I wanted the camera itself to understand what it sees and raise the alarm. I have a background in embedded systems and a fall-prevention prototype I built earlier, so pairing a vision model with real hardware felt like the natural thing to try.
What it does
VitalWatch points a camera at a patient's face. It watches for two things: abnormal head movement, like rapid jerking that could signal a seizure, and facial distress, like a contorted or grimacing face. When it detects either, it raises an alert in three places: on a console, in a log, and on a physical bedside unit built on an ESP32, which shows the alert on a 16x2 LCD and sounds a buzzer.
The key idea is the split between cheap detection and smart judgment. OpenCV decides when to look. Claude decides what it is seeing.
How I built it
The pipeline has three layers:
Local motion detection (OpenCV). Background subtraction measures how much the head is moving every frame. This runs for free and never calls the model. It is just a trigger.
Claude as the classifier. When movement spikes, or on a periodic heartbeat, the app captures that frame and hands it to Claude. Claude looks at the image and the live motion reading, then returns a structured JSON verdict: a category, a severity, a confidence, a one-line reason, and a recommended action. This is the brain. The motion meter cannot tell a yawn from a grimace. Claude can.
The ESP32 bedside unit. The laptop pushes serious alerts over WiFi to the ESP32, which drives the LCD and buzzer. Facial distress sounds a steady buzzer tone. Abnormal head movement sounds an intermittent beep.
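As a rough illustration of the last layer, here is a minimal sketch of pushing one alert from the laptop to the bedside unit. The transport is an assumption: this write-up only says the alert travels over WiFi, so the HTTP `/alert` endpoint, the `type` and `msg` parameter names, and the host address are all hypothetical. The message is trimmed to 16 characters to fit one row of the 16x2 LCD, and a network failure returns `False` instead of raising.

```python
import urllib.parse
import urllib.request

LCD_COLS = 16  # one row of the 16x2 LCD on the bedside unit


def build_alert_url(host, category, message):
    """Build the request URL. The /alert path and parameter names are hypothetical."""
    line = message[:LCD_COLS]  # keep only what fits on one LCD row
    query = urllib.parse.urlencode({"type": category, "msg": line})
    return f"http://{host}/alert?{query}"


def push_alert(host, category, message, opener=urllib.request.urlopen, timeout=2.0):
    """Send one alert. Returns False instead of raising if the unit is unreachable."""
    url = build_alert_url(host, category, message)
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and socket timeouts are OSError subclasses: log and carry on.
        return False
```

A call such as `push_alert("192.168.4.1", "facial_distress", "Patient grimacing")` would then either reach the unit or return `False` after the timeout, so the main loop never blocks on the hardware.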
To keep cost and latency sane, Claude is only called on frames worth judging, gated by a cooldown. Throwing every frame at a model would be slow and wasteful. Waking it only when the cheap filter sees something is what makes this an agent and not a wrapper.
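The gating logic can be sketched in a few lines. This is an illustration under stated assumptions, not the project's actual code: the class name `CallGate`, the motion score scale (0 to 1), and the threshold, cooldown, and heartbeat values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CallGate:
    """Decides whether the current frame is worth sending to the model."""

    motion_threshold: float = 0.15  # assumed: motion score on a 0 to 1 scale
    cooldown_s: float = 5.0         # assumed minimum gap between model calls
    heartbeat_s: float = 30.0       # assumed periodic check even when still
    last_call: float = float("-inf")

    def should_call(self, motion_score: float, now: float) -> bool:
        elapsed = now - self.last_call
        if elapsed < self.cooldown_s:
            return False  # still cooling down, even if motion is high
        spike = motion_score >= self.motion_threshold
        heartbeat = elapsed >= self.heartbeat_s
        if spike or heartbeat:
            self.last_call = now
            return True
        return False
```

Passing `now` in explicitly (for example from `time.monotonic()`) keeps the gate easy to test. Because `last_call` starts at negative infinity, the first frame always counts as a heartbeat.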
One design choice I'm proud of: I had no API credits, only a Claude subscription. So instead of calling the paid API, the classifier shells out to Claude Code in headless mode, which runs on the subscription. The frame is saved to disk, Claude Code reads it with its vision capability, and returns the classification. No pay-as-you-go credits spent.
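A minimal sketch of that shell-out, assuming the `claude` CLI is on the PATH and using its `-p` (print) flag for non-interactive runs. The prompt wording, the key name `recommended_action`, the 0 to 1 motion scale, and the function names are hypothetical, and depending on the local permission settings Claude Code may need to be allowed to read the frame file.

```python
import subprocess


def build_prompt(frame_path, motion_score):
    # Hypothetical prompt wording; the real prompt is not shown in this write-up.
    return (
        f"Read the image at {frame_path}. "
        f"The motion sensor reading is {motion_score:.2f} on a 0 to 1 scale. "
        "Reply with only a JSON object with the keys category, severity, "
        "confidence, reason, and recommended_action."
    )


def classify_frame(frame_path, motion_score, run=subprocess.run, timeout=60):
    """Ask Claude Code (headless) to classify a saved frame; returns raw text."""
    command = ["claude", "-p", build_prompt(frame_path, motion_score)]
    result = run(command, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        raise RuntimeError(f"claude exited with code {result.returncode}")
    return result.stdout.strip()
```

The `run` parameter exists only so the function can be exercised without the CLI installed. In normal use it stays at its default, `subprocess.run`.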
Challenges I ran into
- A still frame cannot show shaking. Head shaking is movement over time, but a single image is frozen. I solved this by feeding the OpenCV motion score into Claude's prompt as context, so the model reasons over the image plus a live sensor reading instead of the picture alone.
- Reading an image from disk in headless mode is finicky. I built an automatic fallback: if Claude Code can't read the frame, the system drops to a motion-only heuristic so a live demo never dies.
- Hardware on stage is fragile. The app keeps running even if the ESP32 drops off the network, so a loose wire can't sink the demo.
- Small environment hurdles. My Mac refused system-wide pip installs, which pushed me to a clean virtual environment.
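The fallback described above can be sketched as a thin wrapper around whatever function calls the model. Everything named here is hypothetical: the category labels, the 0.15 threshold, the `source` field, and the function names are illustrative assumptions, not the project's actual values.

```python
def motion_only_verdict(motion_score, threshold=0.15):
    """Crude heuristic used when the model cannot be reached (assumed threshold)."""
    moving = motion_score >= threshold
    return {
        "category": "abnormal_head_movement" if moving else "normal",
        "severity": "medium" if moving else "low",
        "confidence": 0.5,  # deliberately modest: no image was inspected
        "reason": "motion-only fallback",
        "recommended_action": "check patient" if moving else "none",
        "source": "motion_only",
    }


def classify_with_fallback(classify, frame_path, motion_score):
    """Try the model first; on any failure, drop to the motion-only heuristic."""
    try:
        return classify(frame_path, motion_score)
    except Exception:
        # Broad on purpose: for a live demo, any failure should degrade, not crash.
        return motion_only_verdict(motion_score)
```

Tagging the fallback verdict with a `source` field is one way to keep the log honest about which alerts came from the model and which came from the heuristic.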
What I learned
I learned to treat the model as one component in a system, not the whole system. The interesting engineering was deciding when to call it, what context to give it, and how to fail gracefully when a part breaks. I also learned how much a structured output contract (strict JSON) tightens the gap between a vision model and real actuators like a buzzer.
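To make the idea of a structured output contract concrete, here is a minimal sketch of validating the verdict before anything reaches the buzzer. The five fields come from the description above, but the exact key names, the allowed category and severity labels, and the 0 to 1 confidence range are assumptions for illustration.

```python
import json
from dataclasses import dataclass

# Assumed label sets; the real ones are not listed in this write-up.
CATEGORIES = {"normal", "abnormal_head_movement", "facial_distress"}
SEVERITIES = {"low", "medium", "high"}


@dataclass(frozen=True)
class Verdict:
    category: str
    severity: str
    confidence: float
    reason: str
    recommended_action: str


def parse_verdict(raw):
    """Parse and validate the model's JSON reply; raise ValueError if it is off-contract."""
    data = json.loads(raw)  # json.JSONDecodeError is itself a ValueError
    try:
        verdict = Verdict(
            category=data["category"],
            severity=data["severity"],
            confidence=float(data["confidence"]),
            reason=str(data["reason"]),
            recommended_action=str(data["recommended_action"]),
        )
    except (KeyError, TypeError) as exc:
        raise ValueError(f"verdict is missing or has a malformed field: {exc}") from exc
    if verdict.category not in CATEGORIES:
        raise ValueError(f"unknown category: {verdict.category!r}")
    if verdict.severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {verdict.severity!r}")
    if not 0.0 <= verdict.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {verdict.confidence}")
    return verdict
```

Rejecting anything off-contract at this boundary means the actuator code only ever sees a known category and severity.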
What's next
Edge deployment so the camera runs without a laptop, a proper nurse-station notification channel, and tracking expressions across a short window of frames for more reliable movement detection.
A note on limits
This is a prototype, not a medical device. Camera-based monitoring raises real privacy questions, and a real deployment would need consent, on-device processing, and clinical validation. Naming that is part of building it responsibly.