Vox: Voice-Driven Coding

Inspiration

It started with pain. One of our teammates, a computer science major, was struggling with carpal tunnel syndrome. Writing even a few lines of code felt painful, and she almost couldn’t attend the hackathon because of it.

For developers, keyboards are our tools and our creative outlets. But what happens when you can’t use them?

That question hit close to home, and it made us realize how many others might be quietly dealing with the same problem. We wanted to build something that would make coding accessible, no matter your physical limitations. Something that said: even if your hands can't move, your ideas still can.

What It Does

That’s why we built Vox, a voice-based IDE built from scratch so developers can code using only their voice.

Think Cursor meets speech recognition. You speak, and Vox writes the code. But it goes further:

Navigate files and folders
Search your codebase
Open and close terminals
Run shell commands

All without touching your keyboard.

Core Stack

Vapi – Real-time speech-to-text
Gemini – Translate natural language into code
Rust – Powering the IDE backend
Vite (React) – Clean, responsive frontend UI

Whether you're prototyping, experiencing fatigue, or simply thinking out loud, Vox turns your spoken ideas into executable code.

How We Built It

Frontend

Vite with React
Tailwind CSS for elegant design

Backend

Vapi API for accurate voice transcription
Gemini LLM for translating transcribed speech into code
Rust for high-performance core functionality
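
To make the flow between these pieces concrete, here is a minimal sketch of the two-stage pipeline: audio goes to a transcriber, and the resulting text goes to a code generator. Everything in it is an assumption for illustration. The trait names, `FakeTranscriber`, `FakeGenerator`, and `voice_to_code` are hypothetical and are not taken from Vapi, Gemini, or the Vox codebase. The stand-ins make no network calls.

```rust
// Hypothetical sketch: none of these names come from Vapi, Gemini, or Vox itself.

/// Turns a chunk of audio into text (the role Vapi plays in Vox).
trait Transcriber {
    fn transcribe(&self, audio: &[u8]) -> String;
}

/// Turns a natural-language request into source code (the role Gemini plays).
trait CodeGenerator {
    fn generate(&self, request: &str) -> String;
}

/// Stand-in transcriber: pretends the audio bytes are already UTF-8 text.
struct FakeTranscriber;

impl Transcriber for FakeTranscriber {
    fn transcribe(&self, audio: &[u8]) -> String {
        String::from_utf8_lossy(audio).trim().to_string()
    }
}

/// Stand-in generator: emits a Python comment instead of calling a model.
struct FakeGenerator;

impl CodeGenerator for FakeGenerator {
    fn generate(&self, request: &str) -> String {
        format!("# TODO: {}\n", request)
    }
}

/// Runs one utterance through both stages and returns the generated code.
fn voice_to_code(t: &dyn Transcriber, g: &dyn CodeGenerator, audio: &[u8]) -> String {
    let text = t.transcribe(audio);
    g.generate(&text)
}
```

Keeping each stage behind a trait like this would let either service be swapped for a test double, which is one plausible way to develop the rest of the IDE without a working microphone.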

Features

Natural-language code generation for Python
Edit and refactor code using voice commands
File and folder navigation (e.g., “open main.py”, “go to line 12”)
Terminal command execution (e.g., “run npm install”, “git status”)
Full support for hands-free coding sessions
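
As a rough illustration of how a transcript could be routed to the right feature, the sketch below maps phrases such as “open main.py”, “go to line 12”, and “run npm install” onto a small command type. The `Command` enum, `parse_command`, and the prefix grammar are hypothetical. The sketch also assumes that line numbers arrive as digits and that shell commands are prefixed with “run”; anything unrecognized falls through to code generation.

```rust
// Hypothetical sketch: the command names and grammar are assumptions,
// not Vox's actual parser.

#[derive(Debug, PartialEq)]
enum Command {
    OpenFile(String),
    GoToLine(usize),
    RunShell(String),
    /// Anything else is handed to the code generator as a request.
    Generate(String),
}

/// Strips `prefix` from the start of `text`, ignoring ASCII case.
fn strip_prefix_ignore_case<'a>(text: &'a str, prefix: &str) -> Option<&'a str> {
    // `get` returns None if the text is too short or the cut is not on a char boundary.
    let head = text.get(..prefix.len())?;
    if head.eq_ignore_ascii_case(prefix) {
        Some(text[prefix.len()..].trim())
    } else {
        None
    }
}

fn parse_command(transcript: &str) -> Command {
    let text = transcript.trim();

    if let Some(rest) = strip_prefix_ignore_case(text, "go to line ") {
        // Only digit transcriptions ("12") are handled; "twelve" falls through.
        if let Ok(line) = rest.parse::<usize>() {
            return Command::GoToLine(line);
        }
    }
    if let Some(path) = strip_prefix_ignore_case(text, "open ") {
        return Command::OpenFile(path.to_string());
    }
    if let Some(cmd) = strip_prefix_ignore_case(text, "run ") {
        return Command::RunShell(cmd.to_string());
    }
    Command::Generate(text.to_string())
}
```

With this grammar, `parse_command("Go to line 12")` yields `Command::GoToLine(12)`, while “make a new function” becomes `Command::Generate(...)` and would be sent on to the language model.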

Challenges

Getting microphone access and real-time audio streaming to work on macOS was unexpectedly difficult. Only one of our laptops initially supported it, which slowed early testing.
Telling apart spoken phrases that carry real coding intent (“make a new function”) from non-coding commentary (“uhh wait what was that”) was harder than expected, since both arrive as plain transcribed text.
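
One simple way to approach the second problem is a keyword heuristic: skip leading filler words, then check whether the first remaining word is an action verb. The sketch below is an illustration under those assumptions, not a description of what Vox does. The `FILLERS` and `ACTION_VERBS` lists and the `is_instruction` function are hypothetical, and a real system might instead ask the language model to classify the phrase.

```rust
// Hypothetical heuristic: the word lists and the "first meaningful word"
// rule are assumptions for illustration only.

const FILLERS: [&str; 6] = ["uh", "uhh", "um", "umm", "hmm", "wait"];
const ACTION_VERBS: [&str; 8] = [
    "make", "create", "add", "open", "run", "go", "delete", "rename",
];

/// Lowercases a word and strips surrounding punctuation.
fn normalize(word: &str) -> String {
    word.trim_matches(|c: char| !c.is_alphanumeric())
        .to_lowercase()
}

/// Returns true if the first non-filler word is a known action verb.
fn is_instruction(transcript: &str) -> bool {
    transcript
        .split_whitespace()
        .map(normalize)
        .find(|w| !w.is_empty() && !FILLERS.contains(&w.as_str()))
        .map_or(false, |first| ACTION_VERBS.contains(&first.as_str()))
}
```

Under these lists, “make a new function” is accepted and “uhh wait what was that” is rejected, because its first non-filler word is “what”. The obvious weakness is that any instruction starting with a verb outside the list is dropped, so the lists would need tuning against real transcripts.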

Accomplishments We’re Proud Of

One of the most rewarding moments was getting the full voice-to-code flow working on all our machines. In the beginning, only one laptop could handle audio input correctly, which slowed us down significantly. After rewriting parts of our input pipeline, we got it running smoothly across all four machines.

We’re also proud of how well the backend came together. Using Vapi for voice transcription and Gemini for text-to-code generation gave us strong results. The two tools worked better together than we expected, and we adjusted how they hand off to each other until the experience felt natural. It was exciting to see the LLM understand what we meant and return usable, runnable code.

On the frontend, we focused on making the UI clean, intuitive, and approachable. We didn’t just want to build a functional tool: we wanted it to feel good to use. That included everything from layout and styling to the product name itself. We chose “Vox” because it’s simple, memorable, and communicates exactly what the product is about: your voice, translated into code.

What We Learned

Throughout this project, we learned a lot about building end-to-end systems, especially ones that rely on voice as the primary input. We gained experience integrating multiple APIs, like Vapi and Gemini, and learned how to tune the handoff between them so they work together smoothly.

None of us had used Rust before, so diving into it for the backend was a challenge that pushed us to grow quickly. On the frontend, we learned how much design matters when building developer tools. Making the interface simple, intuitive, and responsive made a huge difference.

Most importantly, we learned how powerful good teamwork can be. Each of us brought different skills to the table, and figuring things out together made the process faster, smoother, and a lot more fun.