Inspiration
WiFi signals pass through walls, furniture, and human bodies. We wanted to exploit this — using the WiFi already around us to detect people through walls, then fuse it with camera-based pose tracking to build something neither sensor could do alone.
What it does
WallSense detects human presence through walls by combining WiFi signal analysis with real-time skeletal pose tracking. A camera tracks a person using 33 body landmarks. When they walk behind a wall, WiFi Channel State Information (CSI) takes over, detecting their continued presence from disturbances in subcarrier amplitudes. A live dashboard shows three states: VISIBLE (camera sees them), OCCLUDED (detected through the wall via WiFi), and ABSENT (no one there).
How we built it
Hardware: Two ESP32-S3 boards: one transmits 100 WiFi packets/sec, the other extracts Channel State Information (amplitude and phase across 64 subcarrier frequencies) from each received frame.
Detection pipeline: Raw I/Q values are converted to per-subcarrier amplitudes. Three detection systems run in parallel:
1. Threshold detector — sliding window variance compared against a calibrated empty-room baseline. Sub-second response.
2. CNN (CSINet) — 4-layer convolutional neural network (147K parameters) processes 52×400 CSI spectrograms (subcarrier frequency × time). Trained on a through-wall research dataset, 100% test accuracy on binary presence detection. An adapter converts our live serial stream into normalized spectrograms matching the training distribution.
3. MediaPipe Pose — USB webcam at 23 FPS detecting full skeletal landmarks.
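The first two stages of the pipeline above (per-subcarrier amplitude extraction from I/Q values, then sliding-window variance against a calibrated empty-room baseline) can be sketched roughly as follows. All names, window sizes, and thresholds here are illustrative, not WallSense's actual implementation:

```python
# Sketch of the detection pipeline: per-subcarrier amplitudes from raw I/Q
# pairs, then a sliding-window variance detector compared against an
# empty-room baseline. Hypothetical names and constants throughout.
from collections import deque

import numpy as np

N_SUBCARRIERS = 64
WINDOW = 50  # frames (~0.5 s at 100 pkt/s)

def csi_amplitudes(iq_frame: np.ndarray) -> np.ndarray:
    """Convert one frame of interleaved I/Q values to per-subcarrier amplitudes."""
    i, q = iq_frame[0::2], iq_frame[1::2]
    return np.sqrt(i.astype(float) ** 2 + q.astype(float) ** 2)

class ThresholdDetector:
    def __init__(self, baseline_var: float, margin: float = 3.0):
        self.window = deque(maxlen=WINDOW)
        # Threshold derived from the calibrated empty-room baseline variance.
        self.threshold = baseline_var * margin

    def update(self, amplitudes: np.ndarray) -> bool:
        self.window.append(float(amplitudes.mean()))
        if len(self.window) < self.window.maxlen:
            return False  # still warming up
        # High variance over the window indicates a motion-induced disturbance.
        return float(np.var(self.window)) > self.threshold
```

A still room keeps the window variance near the baseline; a person moving behind the wall pushes it over the margin within about one window length, which is what gives the sub-second response.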
Fusion: A state machine combines all three signals. The camera takes priority when available. When the camera loses the person, the CNN overrides the threshold detector whenever it is confident (>60%), reducing detection flickering. The result is stable through-wall presence detection.
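The fusion logic might look like the minimal sketch below. The state names match the dashboard; the function signature, the 0.6 upper band, and the symmetric 0.4 lower band for "confidently absent" are assumptions for illustration:

```python
# Minimal three-source fusion sketch: camera wins when it sees a person;
# otherwise a confident CNN overrides the noisy threshold detector.
# Hypothetical names; the 0.4 lower band is an assumed symmetric cutoff.
VISIBLE, OCCLUDED, ABSENT = "VISIBLE", "OCCLUDED", "ABSENT"

def fuse(camera_sees: bool, cnn_prob: float, threshold_hit: bool) -> str:
    if camera_sees:
        return VISIBLE          # camera has priority when available
    if cnn_prob > 0.6:
        return OCCLUDED         # confident CNN overrides the threshold detector
    if cnn_prob < 0.4:
        return ABSENT           # confidently empty
    # CNN uncertain: fall back to the variance threshold detector
    return OCCLUDED if threshold_hit else ABSENT
```

Keeping the uncertain band narrow is what suppresses the OCCLUDED/ABSENT flicker: the threshold detector only decides when the CNN abstains.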
Challenges we ran into
- Detection flickering: CSI variance hovers near the threshold when someone stands behind a wall, causing rapid OCCLUDED/ABSENT toggling. This motivated integrating the CNN for more stable predictions.
- Training-to-live data gap: The CNN was trained on pre-made spectrogram images. Bridging to live serial data required an adapter handling subcarrier count mismatches (54→52), per-spectrogram min-max normalization to match PNG creation, and z-score normalization with dataset statistics.
- ESP-IDF baud rate: The UART configuration kept reverting to 115200 baud on every build, throttling CSI throughput to ~20 pkt/s instead of 100.
- Two-machine coordination: Debugging WiFi handshakes, channel mismatches, and serial issues across two separate ESP32s and computers.
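The adapter described in the training-to-live bullet above could be sketched as the function below. The trim indices and the dataset mean/std are placeholder values, not WallSense's real statistics:

```python
# Sketch of the live-data adapter: trim 54 subcarriers to the 52 the CNN
# expects, per-spectrogram min-max normalization (matching how the training
# PNGs were rendered), then z-score with dataset statistics. The trim range
# and DATASET_MEAN/STD are illustrative placeholders.
import numpy as np

DATASET_MEAN, DATASET_STD = 0.45, 0.22  # placeholder training-set statistics

def adapt_spectrogram(live: np.ndarray) -> np.ndarray:
    """Map a live 54x400 CSI spectrogram into the CNN's training distribution."""
    assert live.shape == (54, 400)
    x = live[1:53, :]                                # 54 -> 52 subcarriers (placeholder trim)
    x = (x - x.min()) / (x.max() - x.min() + 1e-8)   # per-spectrogram min-max, like PNG creation
    return (x - DATASET_MEAN) / DATASET_STD          # z-score with dataset stats
```

The ordering matters: min-max must run before the z-score so the live values land in the same [0, 1] range the training PNGs occupied before dataset statistics were applied.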
Accomplishments that we're proud of
- Working through-wall human detection using $10 ESP32 boards and a webcam
- Real-time CNN inference on live WiFi signals with proper domain adaptation
- Seamless camera-to-WiFi handoff as a person walks behind a wall
- Three-source fusion more robust than any single approach
What we learned
- A human body creates measurable disturbances across 64 WiFi subcarrier frequencies — detectable through walls
- The gap between a trained model and a deployed system is significant — normalization and format matching required as much work as the model itself
- Threshold detectors break at decision boundaries; CNNs provide stability but need warmup. Combining both gives the best of both worlds.
- Sensor fusion logic — when to trust which source, how long to hold a detection — is where the real intelligence lives
What's next for WallSense
- Fine-tune on our own data — we collect labeled CSI every session; training on our specific environment would improve accuracy significantly
- Full 100 pkt/s throughput — 5x better temporal resolution, CNN buffer fills in 4 seconds instead of 20
- Activity recognition — the same architecture can distinguish walking, standing, and arm-waving through walls for fall detection and occupancy analytics