Inspiration

Drone operators in the field shouldn't have to fight a UI. They're managing complex interfaces under pressure when their focus should be entirely on the mission. We asked a simple question: what if you could just talk to your drone?

What it does

C2 Voice Command lets an operator control a drone using natural spoken language. You speak a command, Whisper transcribes it, Claude Opus parses the intent into a structured flight command, a safety validator checks it against no-fly zones and impossible actions, and pymavlink sends it directly to the drone via MAVLink 2 — all in under a second. Unsafe commands like flying into restricted zones are rejected with a spoken explanation before anything moves.
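The sequence above amounts to a four-stage pipeline with a safety gate before anything reaches the drone. Here is a minimal sketch of that orchestration; every name (FlightCommand, run_pipeline, the stage callables) is an illustrative stand-in, not the project's actual API:

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative stand-ins for the real pipeline stages; names are assumptions.

@dataclass
class FlightCommand:
    action: str      # e.g. "takeoff", "goto", "return_home"
    params: dict     # action-specific parameters (lat/lon, altitude, ...)

@dataclass
class PipelineResult:
    accepted: bool
    message: str     # spoken back to the operator either way

def run_pipeline(
    audio: bytes,
    transcribe: Callable[[bytes], str],            # faster-Whisper stage
    parse_intent: Callable[[str], FlightCommand],  # Claude Opus tool call
    validate: Callable[[FlightCommand], tuple],    # (ok, reason) safety check
    send: Callable[[FlightCommand], None],         # pymavlink -> MAVLink 2
) -> PipelineResult:
    text = transcribe(audio)
    cmd = parse_intent(text)
    ok, reason = validate(cmd)
    if not ok:
        # Rejected commands never reach the drone; the reason is spoken back.
        return PipelineResult(False, f"Rejected: {reason}")
    send(cmd)
    return PipelineResult(True, f"Executing {cmd.action}")
```

Keeping the validator as a separate stage (rather than trusting the LLM) is what lets unsafe commands be refused with an explanation before any MAVLink traffic is generated.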

How we built it

We built a full voice-to-MAVLink pipeline from scratch: Silero VAD for voice activity detection, faster-Whisper for local speech-to-text, Claude Opus via the Anthropic API for intent parsing using structured tool-calling, a custom Python safety validator with compound map awareness, and pymavlink sending commands to ArduPilot SITL running in Gazebo Garden. On top of that, a TypeScript/Vite dashboard streams live telemetry over WebSockets.
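For the intent-parsing stage, structured tool-calling means Claude answers with a machine-readable tool_use content block rather than free text. A plausible sketch is below: the tool definition follows the Anthropic Messages API `tools` / `tool_use` format, but the specific tool name and field names are our assumptions, and the extractor works on the dict form of the response content:

```python
# Assumed tool schema; only the Anthropic tools/tool_use *format* is real,
# the field names here are illustrative.
FLIGHT_COMMAND_TOOL = {
    "name": "issue_flight_command",
    "description": "Issue one validated flight command to the drone.",
    "input_schema": {
        "type": "object",
        "properties": {
            "action": {
                "type": "string",
                "enum": ["takeoff", "land", "goto", "return_home"],
            },
            "lat": {"type": "number"},
            "lon": {"type": "number"},
            "alt_m": {"type": "number", "minimum": 0},
        },
        "required": ["action"],
    },
}

def extract_command(response_content: list) -> dict:
    """Pull the tool call's typed input out of the response content blocks.

    `response_content` is the dict form of a Messages API response's
    `content` list; returns {} if the model made no tool call.
    """
    for block in response_content:
        if (block.get("type") == "tool_use"
                and block.get("name") == "issue_flight_command"):
            return block["input"]
    return {}
```

Because the model must fill a JSON Schema, downstream code receives typed fields (`action`, `alt_m`, ...) instead of having to regex a sentence apart.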

Challenges we ran into

Getting the full pipeline to run end-to-end with sub-second latency was the core challenge — each component introduced delay and they had to be tuned together. Handling ambiguous commands gracefully (partial location names, multi-step instructions, altitude-conditional no-fly zones) required careful prompt engineering. Keeping the Gazebo + SITL environment stable while streaming live telemetry to the dashboard simultaneously took significant debugging.
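To make "altitude-conditional no-fly zones" concrete: a zone check has to combine horizontal distance with an altitude condition, so the same coordinates can be legal at 10 m and forbidden at 50 m. A minimal sketch, assuming circular zones with an altitude floor (the real validator's data model isn't shown in this write-up):

```python
import math
from dataclasses import dataclass

# Illustrative zone model; the project's actual validator structures are
# assumptions here.

@dataclass
class NoFlyZone:
    lat: float
    lon: float
    radius_m: float
    floor_alt_m: float = 0.0  # zone only applies at or above this altitude

def violates(zone: NoFlyZone, lat: float, lon: float, alt_m: float) -> bool:
    """True if the target point falls inside the zone's restricted volume."""
    # Equirectangular approximation: accurate enough at drone ranges.
    dlat = math.radians(lat - zone.lat)
    dlon = math.radians(lon - zone.lon) * math.cos(math.radians(zone.lat))
    dist_m = 6371000.0 * math.hypot(dlat, dlon)
    return dist_m <= zone.radius_m and alt_m >= zone.floor_alt_m
```

The altitude floor is what makes commands like "climb to fifty meters" rejectable even when the drone is already hovering legally underneath the restricted volume.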

What we learned

Structured tool-calling with Claude is significantly more reliable than free-form parsing for safety-critical applications — typed output means no ambiguity. MAVLink 2 is remarkably approachable once you understand the coordinate frame conventions. And voice interfaces demand a different kind of robustness than text — you have to design for the full range of how humans actually speak.
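On that last point, even a single spoken altitude arrives in many surface forms ("fifty meters", "one hundred metres"). In our system that variability is absorbed by the LLM, so the helper below is purely illustrative of the problem, not code from the project:

```python
import re

# Toy normalizer for spoken altitudes; illustrative only. In the real
# pipeline, Claude's structured parsing handles this variability.
WORD_NUMS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50, "hundred": 100,
}

def spoken_altitude_to_meters(text: str):
    """Parse phrases like 'fifty meters' or 'one hundred metres', else None."""
    m = re.search(r"((?:\w+\s)+?)met(?:er|re)s?", text.lower())
    if not m:
        return None
    value = 0
    for word in m.group(1).split():
        if word not in WORD_NUMS:
            continue  # skip non-number words like "to"
        n = WORD_NUMS[word]
        # "one hundred" -> multiply the running value instead of adding
        value = max(value, 1) * 100 if n == 100 else value + n
    return value or None
```

Multiply that by phrasing for locations, headings, and multi-step instructions and the robustness requirement becomes clear.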

What's next for Super Powers

Multi-vehicle coordination, agentic mission planning, and fully offline operation with local LLM inference. The MAVLink stack requires zero changes to run on real Pixhawk hardware — that's the next step.

Built With

Python, TypeScript, Vite, WebSockets, pymavlink, faster-Whisper, Silero VAD, Anthropic API (Claude Opus), ArduPilot SITL, Gazebo Garden, MAVLink 2