One missed step in a surgical scrub can cost a patient their life. ScrubAI makes sure that never happens.
Built at WeHack 2026: an AI system that combines voice control and computer vision to verify surgical hand hygiene compliance in real time. No clipboards. No self-reporting. No honor system.
Surgical site infections affect roughly 1 in 24 hospital patients. Most are preventable.
Semmelweis proved handwashing saved lives in the 1840s and was laughed out of medicine for it. Thousands died waiting for the establishment to accept something simple. Today we have the protocols — an 18-step sterile scrub procedure that every surgeon must complete before entering an operating room. But compliance is still tracked manually, by people, under pressure, in environments where shortcuts happen.
One missed step that nobody catches. One patient that gets infected. That's the gap ScrubAI closes.
ScrubAI uses two sensors simultaneously — a camera and a microphone — to verify a surgeon's sterile scrub in real time.
Voice layer: The surgeon narrates each step as they perform it. ScrubAI transcribes speech offline using OpenAI Whisper, then runs it through a three-layer AI matching engine to identify which clinical step was just performed — even with accents, mumbling, or natural phrasing that doesn't match the protocol word for word. Saying "taking off my bracelet" correctly identifies the jewelry removal step. Saying "running my hands underwater" matches "Wet hands and forearms" even though the two phrases share no words in common.
Vision layer: A camera monitors the sink zone using OpenCV optical flow analysis. Every frame is compared to the previous one to detect motion magnitude and direction. The sink zone is split into halves and the system looks for opposing motion — the physical signature of two hands actually scrubbing against each other. It can tell the difference between a surgeon scrubbing and someone just standing at the sink.
When the surgeon says "done", the system generates a full compliance report — what was completed, what was missed, and whether they are cleared for the operating room.
Matching a surgeon's natural speech to a clinical protocol is harder than it sounds. Words like "rinse" and "nails" appear in multiple steps. People don't speak in protocol language. Whisper sometimes mishears words.
We built a three-layer matching engine that runs simultaneously on every input:
| Layer | Method | Weight | Purpose |
|---|---|---|---|
| 1 | Keyword matching | 0.5 | Precision on exact protocol terms |
| 2 | Semantic embeddings | 0.4 | Understanding meaning not just words |
| 3 | Fuzzy matching | 0.1 | Catching typos and transcription errors |
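As a rough illustration of how the three layers blend, here is a minimal sketch in plain Python. The names (`keyword_score`, `blended_score`) are illustrative, not the project's actual API; `difflib.SequenceMatcher` stands in for rapidfuzz, and the embedding similarity is passed in precomputed (e.g. a cosine similarity from all-MiniLM-L6-v2), since loading the model is out of scope here.

```python
from difflib import SequenceMatcher

# Weights follow the table above.
WEIGHTS = {"keyword": 0.5, "embedding": 0.4, "fuzzy": 0.1}

def keyword_score(utterance: str, step: str) -> float:
    """Fraction of the step's words that appear in the utterance."""
    step_words = set(step.lower().split())
    heard = set(utterance.lower().split())
    return len(step_words & heard) / len(step_words) if step_words else 0.0

def fuzzy_score(utterance: str, step: str) -> float:
    """Character-level similarity, catching typos and mistranscriptions."""
    return SequenceMatcher(None, utterance.lower(), step.lower()).ratio()

def blended_score(utterance: str, step: str, embedding_sim: float) -> float:
    """Weighted sum of the three layers for one (utterance, step) pair."""
    return (WEIGHTS["keyword"] * keyword_score(utterance, step)
            + WEIGHTS["embedding"] * embedding_sim
            + WEIGHTS["fuzzy"] * fuzzy_score(utterance, step))
```

Keyword matching alone would score "running my hands underwater" near zero against "Wet hands and forearms"; the embedding layer is what carries that case, which is why it gets almost as much weight.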
The combined score is then adjusted by a context-aware sequential multiplier we built from scratch:
| Situation | Multiplier |
|---|---|
| Next expected step | ×2.0 |
| Nearby upcoming steps | ×1.3 |
| Far future steps | ×0.8 |
| Skipped earlier, still incomplete | ×0.85 |
| Already completed | ×0.2 |
This is what lets "rinse" mean the first rinse early in the procedure and the final rinse at the end — the system knows where it is and uses that to understand what you mean.
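The multiplier table above can be sketched as a single function. This is a hypothetical reconstruction, assuming steps are tracked by index, "nearby" means within two steps of the expected one, and `current_index` points at the next expected step; the real engine's names and window size may differ.

```python
def sequential_multiplier(step_index: int, current_index: int,
                          completed: set[int]) -> float:
    """Context-aware multiplier applied to a step's blended match score."""
    if step_index in completed:
        return 0.2                            # already completed
    if step_index == current_index:
        return 2.0                            # next expected step
    if current_index < step_index <= current_index + 2:
        return 1.3                            # nearby upcoming steps
    if step_index > current_index + 2:
        return 0.8                            # far future steps
    return 0.85                               # skipped earlier, still incomplete
```

With this in place, "rinse" spoken at step 7 boosts the first rinse (×2.0) and damps the final rinse (×0.8); spoken at step 17, the multipliers flip, because `current_index` has moved.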
Voice pipeline:

```
Microphone Input
      │
      ▼
sounddevice (48 kHz capture)
      │
      ▼
scipy resample → 16 kHz
      │
      ▼
OpenAI Whisper (local, offline)
      │
      ▼
ScrubChecker Engine
  ├── Keyword Match
  ├── Embedding Similarity (all-MiniLM-L6-v2)
  └── Fuzzy Match (rapidfuzz)
      │
      ▼
Context-Aware Sequential Scoring
      │
      ▼
Step Verified / Flagged / Missed
```
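The resampling stage is a one-liner worth showing, since Whisper expects 16 kHz mono while most microphones capture at 48 kHz. A minimal sketch, assuming scipy is available and the capture is already mono float32 (the function name is illustrative):

```python
import numpy as np
from scipy.signal import resample_poly

def to_whisper_rate(audio_48k: np.ndarray) -> np.ndarray:
    """Downsample 48 kHz float32 audio to the 16 kHz Whisper expects."""
    # 48000 / 16000 = 3, so a polyphase filter with up=1, down=3 suffices
    # and avoids the FFT artifacts of naive resampling.
    return resample_poly(audio_48k, up=1, down=3).astype(np.float32)
```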
Vision pipeline:

```
Camera Input
      │
      ▼
OpenCV Frame Capture
      │
      ▼
Farneback Optical Flow
      │
      ▼
Sink Zone Motion Analysis
      │
      ▼
Scrubbing Detected / Not Detected
```
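The opposing-motion test at the heart of the vision layer can be sketched with NumPy alone. Here `flow` is an (H, W, 2) array of per-pixel (dx, dy) vectors, as returned by `cv2.calcOpticalFlowFarneback`; the function name and thresholds are made up for illustration and would need tuning against real footage.

```python
import numpy as np

def looks_like_scrubbing(flow: np.ndarray, min_motion: float = 1.0) -> bool:
    """True if the two halves of the sink zone show opposing motion."""
    h, w, _ = flow.shape
    left, right = flow[:, : w // 2], flow[:, w // 2 :]
    # Mean horizontal motion in each half of the sink zone.
    lx, rx = left[..., 0].mean(), right[..., 0].mean()
    # Enough overall motion, plus the halves moving in opposite directions,
    # is the physical signature of two hands scrubbing against each other.
    enough_motion = np.abs(flow).mean() > min_motion / 2
    opposing = lx * rx < 0 and min(abs(lx), abs(rx)) > min_motion
    return bool(enough_motion and opposing)
```

Someone standing still at the sink produces near-zero flow and fails the motion gate; someone waving one hand produces motion in the same direction in both halves and fails the opposing-motion check.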
The full sterile scrub procedure is stored in `config/scrub_checklist.json` and can be updated without touching any code:
- Remove all jewelry
- Check for sores or abrasions
- Wet hands and forearms
- Apply soap and rub 15 seconds
- Scrub back of fingers
- Scrub between fingers
- Scrub around thumbs
- Rinse hands and forearms
- Clean under fingernails
- Rinse after nail cleaning
- Apply soap with sponge
- Scrub all sides of each finger
- Scrub palm
- Scrub nails
- Scrub knuckles
- Scrub back of hand
- Scrub from wrist to above elbow
- Final rinse fingertips to elbows
```
surgical-scrub-checker/
├── config/
│   └── scrub_checklist.json   # Clinical protocol — swap to change procedure
├── scrub_checker.py           # Core AI matching engine
├── voice_loop.py              # Live microphone pipeline and session management
├── vision.py                  # Computer vision scrub detection
├── interactive_mode.py        # Type inputs manually for testing
├── requirements.txt
└── README.md
```
Requirements: Python 3.10 or 3.11 (not 3.13 — some dependencies are not yet compatible)
```bash
# Clone the repo
git clone https://github.com/your-username/surgical-scrub-checker.git
cd surgical-scrub-checker

# Create virtual environment
python -m venv venv

# Activate it
# Windows:
venv\Scripts\activate
# Mac/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

Run the voice system:

```bash
python voice_loop.py
```

Run the computer vision system:

```bash
python vision.py
```

Run in interactive mode (type instead of speak — good for testing):

```bash
python interactive_mode.py
```

- Run `voice_loop.py`
- Say "start washing hands" to begin a session
- Narrate each step as you perform it — speak naturally, you don't need to use exact protocol wording
- Say "done" when finished
- The system reports what was completed, what was missed, and whether you are cleared
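The end-of-session report described above reduces to a small function. This is a hypothetical shape, not the project's actual output format: it assumes a surgeon is cleared only when no step was missed.

```python
def compliance_report(steps: list[str], completed: set[str]) -> dict:
    """Summarize a session: what was done, what was missed, cleared or not."""
    missed = [s for s in steps if s not in completed]
    return {
        "completed": [s for s in steps if s in completed],
        "missed": missed,
        "cleared": not missed,  # cleared for the OR only if nothing was missed
    }
```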
Example phrases that work:
- "taking off my bracelet and earrings" → Remove all jewelry ✅
- "running my hands under the water" → Wet hands and forearms ✅
- "scrubbing between my fingers" → Scrub between fingers ✅
- "using the nail pick" → Clean under fingernails ✅
- "rubbing up to my elbow" → Scrub from wrist to above elbow ✅
Change microphone device:

```python
# In voice_loop.py
DEVICE_INDEX = 24  # Run list_audio_devices() to find yours
```

Adjust sensitivity:

```python
VOLUME_THRESHOLD = 0.09    # Lower = more sensitive
SILENCE_LIMIT = 0.8        # Seconds of silence before processing
MIN_SPEECH_DURATION = 0.8  # Minimum seconds to count as speech
```

Change matching threshold:

```python
# In scrub_checker.py
confidence_threshold = 0.40  # Raise to require higher confidence
```

Find your camera index:

```python
# In vision.py
cap = cv2.VideoCapture(1, cv2.CAP_DSHOW)  # Try 0, 1, 2 until camera opens
```

| Category | Technology |
|---|---|
| Language | Python 3.10 |
| Speech Recognition | OpenAI Whisper (local, offline) |
| Semantic Matching | sentence-transformers / all-MiniLM-L6-v2 |
| Fuzzy Matching | rapidfuzz |
| Audio Capture | sounddevice |
| Audio Processing | scipy, numpy |
| Computer Vision | OpenCV (optical flow) |
| Protocol Storage | JSON |
| Version Control | Git / GitHub |
Whisper runs entirely locally — no audio is sent to OpenAI or any external server. The model file is downloaded once and runs on device with no network connection. No data leaves the machine.
In a real clinical deployment, the path to full HIPAA compliance would involve one of three approaches: dedicated edge hardware at each sink running local inference, a private on-premises server handling inference over a secured internal network, or a HIPAA Business Associate Agreement with a compliant cloud provider using encrypted transmission and zero data retention. The local-first architecture of this prototype is the right foundation for any of those paths.
- Expand to other procedures — gowning, gloving, patient site prep are all config file swaps
- Compliance dashboard — track missed steps across sessions to find systemic gaps
- Clinical validation — work with actual surgeons and scrub nurses to tune the protocol encoding
- Dedicated hardware — waterproof wall-mounted unit with far-field microphone array for real sink deployment
- Open source matching engine — the context-aware sequential matching approach applies to any step-by-step procedural verification problem
Built at WeHack 2026
MIT License — see LICENSE for details.