Vox: Voice-Driven Coding
Inspiration
It started with pain. One of our teammates, a computer science major, was struggling with carpal tunnel syndrome. Writing even a few lines of code felt painful, and she almost couldn’t attend the hackathon because of it.
As developers, our keyboards are our tools, our creative outlets. But what happens when you can’t use them?
That question hit close to home, and it made us realize how many others might be quietly dealing with the same problem. We wanted to build something that would make coding accessible, no matter your physical limitations. Something that said: even if your hands can't move, your ideas still can.
What It Does
That’s why we built Vox, a voice-based IDE designed from scratch to empower developers to code using only their voice.
Think Cursor meets speech recognition. You speak, and Vox writes the code. But it goes further:
- Navigate files and folders
- Search your codebase
- Open and close terminals
- Run shell commands
All without touching your keyboard.
Core Stack
- Vapi – Real-time speech-to-text
- Gemini – Translate natural language into code
- Rust – Powering the IDE backend
- Vite (React) – Clean, responsive frontend UI
Whether you're prototyping, experiencing fatigue, or simply thinking out loud, Vox turns your spoken ideas into executable code.
How We Built It
Frontend
- Vite with React
- Tailwind CSS for elegant design
Backend
- Vapi API for accurate voice transcription
- Gemini LLM for translating transcribed speech into code
- Rust for high-performance core functionality
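The transcription-to-code handoff can be sketched roughly as follows. This is a minimal illustration in Python, not our actual backend (which is in Rust): `build_prompt`, `extract_code`, and `voice_to_code` are hypothetical names, the prompt wording is an assumption, and the LLM call is passed in as a plain `generate` callable so that no specific Vapi or Gemini API is assumed.

```python
import re
from typing import Callable

# Build the fence marker programmatically so this example stays easy to embed.
FENCE = "`" * 3
FENCE_RE = re.compile(FENCE + r"[\w+-]*\n(.*?)" + FENCE, re.DOTALL)


def build_prompt(transcript: str, language: str = "python") -> str:
    """Wrap a transcribed utterance in instructions for the code model (assumed wording)."""
    return (
        f"You are a coding assistant. Write {language} code for the request below.\n"
        "Reply with a single fenced code block and no commentary.\n\n"
        f"Request: {transcript.strip()}"
    )


def extract_code(reply: str) -> str:
    """Return the first fenced block in the reply, or the whole reply if none is found."""
    match = FENCE_RE.search(reply)
    body = match.group(1) if match else reply
    return body.strip("\n")


def voice_to_code(transcript: str, generate: Callable[[str], str]) -> str:
    """Transcript -> prompt -> model reply -> bare code, with the model call injected."""
    return extract_code(generate(build_prompt(transcript)))
```

Injecting `generate` keeps the sketch testable with a stub and leaves the choice of model client open.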
Features
- Natural-language code generation for Python
- Edit and refactor code using voice commands
- File and folder navigation (e.g., “open main.py”, “go to line 12”)
- Terminal command execution (e.g., “run npm install”, “git status”)
- Full support for hands-free coding sessions
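To make the command examples above concrete, here is one possible way to route a transcribed utterance to an editor action. It is a simplified Python sketch under our own assumptions, not the Rust implementation: the `Action` type, the action names, and the regular expressions are all hypothetical, and anything that matches no pattern is treated as a code-generation request.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str       # hypothetical action name, e.g. "open_file"
    argument: str   # the text captured from the utterance


# Ordered patterns: the first match wins. All patterns are illustrative.
PATTERNS = [
    ("open_file", re.compile(r"^open (?P<arg>\S+)$", re.IGNORECASE)),
    ("goto_line", re.compile(r"^go to line (?P<arg>\d+)$", re.IGNORECASE)),
    ("run_shell", re.compile(r"^run (?P<arg>.+)$", re.IGNORECASE)),
    ("run_shell", re.compile(r"^(?P<arg>git .+)$", re.IGNORECASE)),
]


def route(utterance: str) -> Action:
    """Map a transcribed utterance to an editor action."""
    # Collapse whitespace and drop sentence punctuation a transcriber may append.
    text = " ".join(utterance.split()).rstrip(".!?")
    for kind, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return Action(kind, match.group("arg"))
    # No navigation or terminal pattern matched: hand the text to code generation.
    return Action("generate_code", text)
```

For example, `route("go to line 12")` yields a `goto_line` action with argument `"12"`, while `route("make a new function")` falls through to `generate_code`.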
Challenges
- Getting microphone access and real-time audio streaming to work on macOS was unexpectedly difficult. Only one of our laptops initially supported it, which slowed early testing.
- Distinguishing spoken coding instructions (“make a new function”) from non-coding commentary (“uhh wait what was that”) required extra attention.
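One simple way to approach the instruction-versus-commentary problem is a keyword heuristic applied before anything reaches the LLM. The Python sketch below is only an illustration of that idea, not the approach we shipped: the filler pattern and the `COMMAND_VERBS` set are assumptions chosen to cover the two example phrases.

```python
import re

# Words treated as filler (assumed list): "uh", "uhh", "um", "hmm", "wait", ...
FILLER = re.compile(r"u+h+|u+m+|h+m+|er+|wait|okay|ok|so|like")

# Verbs that we assume begin an actionable instruction.
COMMAND_VERBS = {
    "make", "create", "add", "open", "close", "go", "run",
    "delete", "rename", "refactor", "search", "write", "define",
}


def strip_fillers(utterance: str) -> list[str]:
    """Lowercase and tokenize the utterance, then drop filler words."""
    words = re.findall(r"[a-z0-9_.']+", utterance.lower())
    return [w for w in words if not FILLER.fullmatch(w)]


def is_command(utterance: str) -> bool:
    """Treat the utterance as a command if its first non-filler word is a known verb."""
    words = strip_fillers(utterance)
    return bool(words) and words[0] in COMMAND_VERBS
```

With these lists, “make a new function” is accepted and “uhh wait what was that” is rejected. A heuristic like this is cheap but brittle, so it is best seen as a first-pass filter.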
Accomplishments We’re Proud Of
One of the most rewarding moments was getting the full voice-to-code flow working on all our machines. In the beginning, only one laptop could handle audio input correctly, which slowed us down significantly. After rewriting parts of our input pipeline, we got it running smoothly across all four machines.
We’re also proud of how well the backend came together. Using Vapi for voice transcription and Gemini for text-to-code generation gave us strong results. The two tools worked better than expected when integrated, and we tuned how they interact to make the experience feel natural. It was exciting to see the LLM understand what we meant and return usable, runnable code.
On the frontend, we focused on making the UI clean, intuitive, and approachable. We didn’t just want to build a functional tool: we wanted it to feel good to use. That included everything from layout and styling to the product name itself. We chose “Vox” because it’s simple, memorable, and communicates exactly what the product is about: your voice, translated into code.
What We Learned
Throughout this project, we learned a lot about building end-to-end systems, especially those relying on voice as the primary input. We gained experience integrating multiple APIs, like Vapi and Gemini, and learned how to tune them to work together smoothly.
None of us had used Rust before, so diving into it for the backend was a challenge that pushed us to grow quickly. On the frontend, we learned how much design matters when building developer tools. Making the interface simple, intuitive, and responsive made a huge difference.
Most importantly, we learned how powerful good teamwork can be. Each of us brought different skills to the table, and figuring things out together made the process faster, smoother, and a lot more fun.
What’s Next for Vox
- Expand code generation to more programming languages beyond JavaScript and Python
- Enable real-time code generation with streaming output
- Improve LLM performance and responsiveness
- Native Git integration
- Voice-based natural language search
- Support all dev workflows using only voice