# Voice2Vector — Voice-Driven Drone Command Console

## What It Is

Voice2Vector is a real-time, voice-controlled drone operations system built for the Red Team Hackathon. It lets an operator speak natural-language commands — "take off to 10 meters", "fly to the northwest watch tower", "return to the landing pad and land" — and the system parses, validates, and executes them against a live drone simulation, all through a polished web interface.

## The Problem We're Solving

In real-world UxS (Unmanned Systems) operations, drone operators juggle radios, controllers, screens, and maps simultaneously under high-stress, high-interruption conditions. Traditional interfaces require precise button sequences or typed coordinates — slow, error-prone, and cognitively expensive.

Voice2Vector eliminates that friction. An operator just speaks what they want the drone to do. The system handles parsing intent, validating safety constraints, planning the flight path, and executing — then reports back with clear feedback. The operator stays eyes-up, hands-free, focused on the mission rather than the interface.

## Why It Has Real-World Military Application

This isn't just a hackathon demo — it addresses genuine operational needs:

- **Reduced cognitive load:** In contested environments, operators can't afford to look down at keyboards. Voice input keeps attention on the battlespace.
- **Safety enforcement:** The system knows compound geometry, building heights, and no-go zones (fuel depots, comms tower exclusion zones). It automatically rejects unsafe commands and explains why — preventing accidents before they happen.
- **Situational awareness:** Real-time 2D tactical map and 3D visualization show drone position, flight path, structures, and hazard zones simultaneously. Operators can toggle views based on what they need in the moment.
- **Natural language robustness:** Operators don't need to memorize command syntax. "Fly north 50 meters", "head to the NE tower", "drop down to 5 meters" — the system understands all of it, including ambiguous or oddly-phrased commands via GPT fallback parsing.
- **Audit trail:** Every command is logged with timestamp, result, and explanation — critical for post-mission review and accountability.

## How We Built It

Architecture — a five-stage pipeline:

1. **Listen** — Voice input is captured via the browser microphone and transcribed to text using OpenAI Whisper.
2. **Parse** — Natural language is parsed into structured drone commands. A fast regex parser handles standard patterns; ambiguous commands fall through to GPT for intent extraction.
3. **Validate** — Every command is checked against compound geometry, no-go zone boundaries, altitude constraints, and drone capabilities. Unsafe or impossible commands are rejected with an explanation.
4. **Execute** — Valid commands are translated to MAVLink messages and sent to the flight controller (ArduPilot SITL) over TCP, which drives the drone in the Gazebo 3D physics simulation.
5. **Report** — The operator gets immediate visual feedback: updated map position, mission log entry, telemetry cards, and status indicators.
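The Parse and Validate stages can be sketched in miniature. This is an illustrative assumption, not the project's actual code: the regex patterns, command schema, `gpt_fallback` stub, altitude ceiling, and no-go zone coordinates are all hypothetical placeholders.

```python
import re

# --- Parse: fast regex path, with a stubbed GPT fallback for odd phrasings ---
COMMAND_PATTERNS = [
    (re.compile(r"take off to (\d+) m(?:eters)?"),
     lambda m: {"action": "takeoff", "alt": int(m.group(1))}),
    (re.compile(r"fly (north|south|east|west) (\d+) m(?:eters)?"),
     lambda m: {"action": "move", "heading": m.group(1), "dist": int(m.group(2))}),
    (re.compile(r"\bland\b"),
     lambda m: {"action": "land"}),
]

def parse(text: str) -> dict:
    """Return a structured command dict; fall back to the LLM for odd phrasings."""
    text = text.lower().strip()
    for pattern, build in COMMAND_PATTERNS:
        m = pattern.search(text)
        if m:
            return build(m)
    return gpt_fallback(text)

def gpt_fallback(text: str) -> dict:
    # Placeholder: the real system would call GPT here to extract intent.
    return {"action": "unknown", "raw": text}

# --- Validate: reject commands that break altitude limits or no-go zones ---
MAX_ALT = 120            # assumed ceiling, meters
NO_GO_ZONES = [          # (x, y, radius) exclusion circles, e.g. the fuel depot
    (50.0, 80.0, 15.0),
]

def validate(cmd: dict, target: tuple) -> tuple:
    """Return (ok, reason) for a parsed command at a target (x, y) position."""
    if cmd["action"] == "takeoff" and cmd["alt"] > MAX_ALT:
        return False, f"altitude {cmd['alt']} m exceeds the {MAX_ALT} m ceiling"
    x, y = target
    for zx, zy, r in NO_GO_ZONES:
        if (x - zx) ** 2 + (y - zy) ** 2 <= r ** 2:
            return False, "target position is inside a no-go zone"
    return True, "ok"
```

The design point is the ordering: the cheap regex path resolves the common phrasings instantly, the LLM is only paid for on ambiguous input, and every command, regardless of which parser produced it, passes through the same validator before anything reaches the flight controller.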

## Built With

- OpenAI Whisper (speech-to-text)
- GPT (fallback intent parsing)
- MAVLink / ArduPilot SITL (flight control)
- Gazebo (3D physics simulation)
- Browser microphone capture and web UI
