Inspiration

WiFi signals pass through walls, furniture, and human bodies. We wanted to exploit this — using the WiFi already around us to detect people through walls, then fuse it with camera-based pose tracking to build something neither sensor could do alone.

What it does

WallSense detects human presence through walls using WiFi signal analysis combined with real-time skeletal pose tracking. A camera tracks a person with 33 body landmarks. When they walk behind a wall, WiFi CSI takes over — detecting their continued presence from disturbances in subcarrier amplitudes. A live dashboard shows three states: VISIBLE (camera sees them), OCCLUDED (detected through the wall via WiFi), and ABSENT (no one there).

How we built it

Hardware: Two ESP32-S3 boards — one transmits 100 WiFi packets/sec, the other extracts Channel State Information (amplitude and phase across 64 subcarrier frequencies) from each received frame.

Detection pipeline: Raw I/Q values are converted to per-subcarrier amplitudes. Three detection systems run in parallel:

  1. Threshold detector — sliding window variance compared against a calibrated empty-room baseline. Sub-second response.
  2. CNN (CSINet) — 4-layer convolutional neural network (147K parameters) processes 52×400 CSI spectrograms (subcarrier frequency × time). Trained on a through-wall research dataset, 100% test accuracy on binary presence detection. An adapter converts our live serial stream into normalized spectrograms matching the training distribution.
  3. MediaPipe Pose — USB webcam at 23 FPS detecting full skeletal landmarks.
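
The first stage of the pipeline — amplitudes from raw I/Q, then a sliding-window variance check against the empty-room baseline — can be sketched roughly as below. The window size and the variance margin are illustrative assumptions, not the project's calibrated values, and all names are hypothetical:

```python
from collections import deque
import math

WINDOW = 50   # packets per sliding window (assumed)
MARGIN = 3.0  # flag presence when variance exceeds baseline by this factor (assumed)

def amplitudes(iq_pairs):
    """Convert raw per-subcarrier (I, Q) values to amplitudes."""
    return [math.hypot(i, q) for i, q in iq_pairs]

class ThresholdDetector:
    def __init__(self, baseline_var):
        self.baseline = baseline_var        # calibrated on an empty room
        self.window = deque(maxlen=WINDOW)  # mean amplitude per packet

    def update(self, iq_pairs):
        """Feed one CSI frame; return True if presence is detected."""
        amps = amplitudes(iq_pairs)
        self.window.append(sum(amps) / len(amps))
        if len(self.window) < WINDOW:
            return False                    # still warming up
        mean = sum(self.window) / WINDOW
        var = sum((a - mean) ** 2 for a in self.window) / WINDOW
        return var > MARGIN * self.baseline
```

A body moving behind the wall perturbs the subcarrier amplitudes packet to packet, pushing the window variance well above the calm empty-room baseline, which is what gives this detector its sub-second response.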

Fusion: A state machine combines all three signals. The camera takes priority when available. When the camera loses the person, the CNN overrides the threshold detector whenever it is confident (>60%), reducing detection flickering. The result is stable through-wall presence detection.
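
The priority ordering above can be sketched as a single fusion function. The state names match the dashboard and the 60% confidence gate is from our design, but the function signature and the symmetric low-confidence cutoff (0.40) are illustrative assumptions:

```python
VISIBLE, OCCLUDED, ABSENT = "VISIBLE", "OCCLUDED", "ABSENT"

def fuse(camera_sees_person, cnn_presence_prob, threshold_detects):
    """Combine camera, CNN, and threshold-detector signals into one state."""
    if camera_sees_person:
        return VISIBLE  # camera takes priority when available
    # Camera lost: trust the CNN when it is confident either way,
    # otherwise fall back to the faster threshold detector.
    if cnn_presence_prob is not None:
        if cnn_presence_prob > 0.60:
            return OCCLUDED
        if cnn_presence_prob < 0.40:   # assumed symmetric absence cutoff
            return ABSENT
    return OCCLUDED if threshold_detects else ABSENT
```

Routing uncertain CNN outputs (or a still-warming CNN, passed as `None`) to the threshold detector keeps the system responsive while the spectrogram buffer fills.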

Challenges we ran into

  • Detection flickering: CSI variance hovers near the threshold when someone stands behind a wall, causing rapid OCCLUDED/ABSENT toggling. This motivated integrating the CNN for more stable predictions.
  • Training-to-live data gap: The CNN was trained on pre-made spectrogram images. Bridging to live serial data required an adapter handling subcarrier count mismatches (54→52), per-spectrogram min-max normalization to match PNG creation, and z-score normalization with dataset statistics.
  • ESP-IDF baud rate: The UART config kept reverting to 115200 on every build, throttling throughput to ~20 pkt/s instead of 100.
  • Two-machine coordination: Debugging WiFi handshakes, channel mismatches, and serial issues across two separate ESP32s and computers.
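
The adapter that bridged the training-to-live gap can be sketched with NumPy. The dataset mean/std values and the choice of which two subcarriers to drop are placeholders here — the real numbers come from the training set:

```python
import numpy as np

DATASET_MEAN, DATASET_STD = 0.5, 0.25  # assumed training-set statistics

def adapt(raw, mean=DATASET_MEAN, std=DATASET_STD):
    """Convert a (54, 400) live CSI amplitude buffer into a normalized
    (52, 400) spectrogram matching the CNN's training distribution."""
    spec = raw[1:53, :]                    # drop 2 subcarriers: 54 -> 52 (assumed indices)
    lo, hi = spec.min(), spec.max()
    spec = (spec - lo) / (hi - lo + 1e-8)  # per-spectrogram min-max, mirroring PNG creation
    return (spec - mean) / std             # z-score with dataset statistics
```

Applying min-max first reproduces how the training PNGs were rendered; the z-score then matches the normalization the CNN saw during training, so live spectrograms land in the same distribution.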

Accomplishments that we're proud of

  • Working through-wall human detection using $10 ESP32 boards and a webcam
  • Real-time CNN inference on live WiFi signals with proper domain adaptation
  • Seamless camera-to-WiFi handoff as a person walks behind a wall
  • Three-source fusion more robust than any single approach

What we learned

  • A human body creates measurable disturbances across 64 WiFi subcarrier frequencies — detectable through walls
  • The gap between a trained model and a deployed system is significant — normalization and format matching required as much work as the model itself
  • Threshold detectors break at decision boundaries; CNNs provide stability but need warmup. Combining both gives the best of both worlds.
  • Sensor fusion logic — when to trust which source, how long to hold a detection — is where the real intelligence lives

What's next for WallSense

  • Fine-tune on our own data — we collect labeled CSI every session; training on our specific environment would improve accuracy significantly
  • Full 100 pkt/s throughput — 5x better temporal resolution, CNN buffer fills in 4 seconds instead of 20
  • Activity recognition — the same architecture can distinguish walking, standing, and arm-waving through walls for fall detection and occupancy analytics
