A robot fleet-management system: one operator oversees an autonomous robot line, a VLM correction layer auto-flags failures, and the operator can take over any robot and drive it with their own hand via CV hand-tracking.

Status: the control plane is built as a runnable mock fleet (hub + 2 agents + perception + VLM), all speaking the real wire contract, with the full failure→takeover→recover loop verified headlessly (no hardware, no API key; python3 -m pytest -q). The ESP32-S3 hand firmware and webcam teleop bridge are the live hardware paths. Remaining hardware (real cameras, ACT policy, Jetson/ZED, CAN) plugs into the mocks one seam at a time; see docs/OPEN_DEVELOPMENT.md. Recent scope decisions: docs/DECISIONS.md.
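
The failure→takeover→recover loop can be pictured as a small pure state machine. The sketch below is illustrative only: the mode names, event names, and transition table are assumptions made for this example and are not taken from fleet.py.

```python
from enum import Enum


class Mode(Enum):
    # Hypothetical mode names; the real state machine may differ.
    AUTO = "auto"        # robot runs its autonomous/scripted behaviour
    FLAGGED = "flagged"  # the correction layer has flagged a failure
    TELEOP = "teleop"    # the operator is driving the robot by hand


# (current mode, event) -> next mode. Event names are assumptions.
TRANSITIONS = {
    (Mode.AUTO, "fault_flagged"): Mode.FLAGGED,
    (Mode.AUTO, "takeover"): Mode.TELEOP,      # operator may take over any robot
    (Mode.FLAGGED, "takeover"): Mode.TELEOP,
    (Mode.TELEOP, "release"): Mode.AUTO,       # recover: hand control back
}


def step(mode: Mode, event: str) -> Mode:
    """Return the next mode; events that do not apply leave the mode unchanged."""
    return TRANSITIONS.get((mode, event), mode)


def run(events, start: Mode = Mode.AUTO):
    """Apply a sequence of events and return every mode visited, in order."""
    mode = start
    history = [mode]
    for event in events:
        mode = step(mode, event)
        history.append(mode)
    return history
```

Keeping the transition logic free of I/O is what makes a loop like this checkable headlessly: a test feeds it events and asserts on the returned modes.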

| Path | What it is |
|---|---|
| shared/schemas.py | The contract spine: Pydantic wire messages + the transport seam. |
| control_plane/ | fleet.py (pure state machine) + hub.py (FastAPI WebSocket hub). |
| agents/ | Mock SO-101 + Piper agents: telemetry, scripted AUTO, staged fault, teleop + auto-relax safety, synthetic MJPEG. |
| perception/ | Mock HandPose stream: the seam the Jetson/ZED + MediaPipe code plugs into. |
| vlm/ | VLM correction layer: offline /truth mode + real Claude (Haiku/Opus) path. |
| tests/ | State-machine, agent-safety, and end-to-end smoke tests. |
| firmware/ | ESP-IDF firmware for the XIAO ESP32-S3 + PCA9685 driving the 5-servo hand. Read firmware/CLAUDE.md first for hardware gotchas + servo safety. |
| host/ | hand_teleop.py: MediaPipe webcam → 5 finger closures → serial. See host/README.md. |
| docs/ | System design (architecture, plan, topology) + DECISIONS.md. Plain markdown; read directly. |
| CLAUDE.md | Agent briefing + the load-bearing contract + what's real vs. planned. |

    pip install -r requirements.txt
    python3 run.py            # hub :8000 + 2 agents + perception + VLM in one process
    python3 -m pytest -q      # 22 tests: state machine, agent safety, end-to-end loop

Watch http://127.0.0.1:8000/healthz. With ANTHROPIC_API_KEY set, the VLM uses real Claude; without it, the VLM falls back to the offline /truth path, so the loop runs with zero spend.
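
The key-or-offline switch can be sketched in a few lines. This is an assumption-laden illustration, not the code in vlm/: the function name `pick_vlm_backend`, the returned labels, and the choice to treat a blank key as unset are all hypothetical.

```python
import os


def pick_vlm_backend(env=None) -> str:
    """Return "claude" when an API key is present, otherwise "offline".

    `env` defaults to the process environment; passing a dict makes the
    choice easy to test. Treating a blank key as unset is an assumption.
    """
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "")
    return "claude" if key.strip() else "offline"
```

For example, `pick_vlm_backend({})` selects the offline path, which is the mode the headless tests rely on.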

1. Flash the hand firmware (ESP-IDF; see firmware/CLAUDE.md for the flash-through-hub warning and pin notes):

       cd firmware
       idf.py build flash monitor   # flash the XIAO ESP32-S3 directly (not through a hub)

   In the serial monitor you can drive the hand manually: `open`, `close`, `relax`, `f <i> <c>` (one finger), or a bare `f0 f1 f2 f3 f4` line of 5 closures in [0,1].

2. Run the webcam teleop bridge (Python 3.12 venv at repo root; see host/README.md for one-time setup):

       .venv-cv/bin/python host/hand_teleop.py             # full teleop (auto-detects the port)
       .venv-cv/bin/python host/hand_teleop.py --dry-run   # track + preview, DON'T move servos

Serial protocol: 5 space-separated finger closures in [0,1], thumb→pinky, newline-terminated:

    f0 f1 f2 f3 f4\n   # 0 = open, 1 = fully curled; ch0=thumb .. ch4=pinky

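
A minimal host-side sketch of building one such line is shown here. The helper name `format_closures`, the three-decimal precision, and the clamping on the host are assumptions for illustration; hand_teleop.py may format its output differently.

```python
import math


def format_closures(closures) -> bytes:
    """Encode 5 finger closures (thumb..pinky) as one newline-terminated line."""
    values = [float(v) for v in closures]
    if len(values) != 5:
        raise ValueError(f"expected 5 closures, got {len(values)}")
    if any(math.isnan(v) for v in values):
        raise ValueError("closures must not be NaN")
    # Clamp to [0, 1]: 0 = open, 1 = fully curled.
    clamped = [min(1.0, max(0.0, v)) for v in values]
    line = " ".join(f"{v:.3f}" for v in clamped) + "\n"
    return line.encode("ascii")
```

The returned bytes could then be written to an open serial port; for instance, `format_closures([0, 0.5, 1, 1.2, -0.3])` yields `b"0.000 0.500 1.000 1.000 0.000\n"`.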
The firmware owns all servo safety (clamp to calibrated range, slew-limit, auto-relax ~2 s
after the stream stops), so the host can only ever send 5 floats in [0,1]. Full protocol:
docs/PLAN.md §4.5.
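
The three safety behaviours named above (clamp, slew-limit, auto-relax) can be modelled in Python for one finger. This is a conceptual sketch only, not the ESP-IDF firmware: the class name, the per-tick step size, and expressing the calibrated range in closure units are assumptions; only the roughly 2 s relax timeout comes from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FingerSafety:
    lo: float = 0.0             # calibrated lower bound (assumed units: closure)
    hi: float = 1.0             # calibrated upper bound
    max_step: float = 0.05      # largest position change per tick (assumed value)
    relax_after_s: float = 2.0  # relax when no command has arrived for this long
    position: float = 0.0
    target: float = 0.0
    last_cmd_time: Optional[float] = None
    relaxed: bool = True

    def command(self, closure: float, now: float) -> None:
        """Accept a new closure, clamped to the calibrated range."""
        self.target = min(self.hi, max(self.lo, closure))
        self.last_cmd_time = now
        self.relaxed = False

    def tick(self, now: float) -> Optional[float]:
        """Advance one control tick; return the new position, or None if relaxed."""
        if self.last_cmd_time is None or now - self.last_cmd_time > self.relax_after_s:
            self.relaxed = True  # stream stopped: stop driving the servo
            return None
        # Slew limit: move toward the target by at most max_step per tick.
        delta = self.target - self.position
        delta = max(-self.max_step, min(self.max_step, delta))
        self.position += delta
        return self.position
```

Because every check lives on the receiving side, a misbehaving or crashed host cannot push a servo past its range or hold it stalled; the worst it can do is stop sending, after which the finger relaxes.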