Voxa AI — Voice Meets Control
What if someone with no hands could control a computer — precisely, independently, and fully — using only their voice and AI?
Voxa AI was built to make that question obsolete.
It is an AI-powered voice agent that gives people with limited hand mobility full control over their computers: not just browsers or apps, but the entire OS.
It already works, and it is not a prototype. Voxa AI is a fully functional, production-ready application built for everyday use.
What It Does
Voxa AI is a voice-first desktop agent, designed for hands-free precision control.
The current version lets users:
- Click anywhere on screen using a custom two-step visual recognition grid
- Speak naturally and have their intent understood
- Execute macros and custom actions, fully by voice
- Leverage Gemini for both reasoning and UI recognition
- Switch to a non-smart mode for manual control without AI
- Change theme and adjust flexible settings to fit personal preferences
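As an illustration of how voice macros and the non-smart mode might fit together, here is a minimal sketch of exact-phrase dispatch with no AI in the loop. The class name MacroRegistry and its methods are hypothetical and not taken from the Voxa AI codebase. Each action returns a description string so the example runs anywhere; a real action could call something like pyautogui.hotkey("ctrl", "c") instead.

```python
from typing import Callable, Dict, Optional

# An action takes no arguments and returns a short description of what it did.
Action = Callable[[], str]


class MacroRegistry:
    """Hypothetical registry mapping spoken phrases to actions (no AI involved)."""

    def __init__(self) -> None:
        self._macros: Dict[str, Action] = {}

    @staticmethod
    def _normalize(phrase: str) -> str:
        # Lowercase and collapse whitespace so small transcript
        # differences still match the registered phrase.
        return " ".join(phrase.lower().split())

    def register(self, phrase: str, action: Action) -> None:
        self._macros[self._normalize(phrase)] = action

    def dispatch(self, transcript: str) -> Optional[str]:
        """Run the macro whose phrase matches the transcript, if any."""
        action = self._macros.get(self._normalize(transcript))
        return action() if action is not None else None


registry = MacroRegistry()
registry.register("Copy that", lambda: "pressed ctrl+c")
print(registry.dispatch("  copy   THAT "))
print(registry.dispatch("open the pod bay doors"))
```

A transcript that matches no registered phrase returns None, which is the point where a smart mode could hand the utterance to a language model instead.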
Installation is simple — just run the installer and you’re ready to go.
How It Works
The system uses:
- Python for backend logic and control flow
- Gemini 2.5 Flash for language understanding and visual UI analysis
- PyAutoGUI for screen control and input
- Google Speech API for real-time speech recognition
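One way to picture how these pieces connect is a loop that turns each transcript into an intent and then acts on it. The sketch below is an assumption about the overall shape, not the actual Voxa AI code: Intent, run_pipeline, and the interpret and act callables are hypothetical names. The speech recognizer, the Gemini call, and the PyAutoGUI call are passed in as plain functions so the example stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Intent:
    action: str    # e.g. "click" or "type"
    argument: str  # e.g. a target description, or the text to type


def run_pipeline(
    utterances: Iterable[str],
    interpret: Callable[[str], Intent],
    act: Callable[[Intent], None],
) -> int:
    """Feed each non-empty transcript through interpret, then act.

    `utterances` stands in for the speech recognizer's output,
    `interpret` for the language model, and `act` for screen control.
    Returns the number of utterances that were handled.
    """
    handled = 0
    for text in utterances:
        if not text.strip():
            continue  # skip silence or empty transcripts
        act(interpret(text))
        handled += 1
    return handled
```

Keeping the three stages behind simple callables also makes it easy to swap the interpreter for a rule-based one when AI is switched off.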
At the core is a custom-built dual-grid vision targeting system:
- The screen is divided into a coarse grid
- Gemini identifies the target cell
- A finer grid is drawn within that cell
- The system clicks with pixel-level accuracy
This architecture allows AI to interact with any UI element on any screen, without training data or model fine-tuning.
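The coordinate arithmetic behind the two grid steps can be sketched as follows. This is a minimal illustration under stated assumptions: the grid sizes (6x8 coarse, 4x4 fine), the 0-based row-major cell numbering, the choice to click the center of the fine cell, and the names Rect, cell_rect, center, and dual_grid_target are all hypothetical. The two cell indices stand in for whatever the model selects at each step.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    left: int
    top: int
    width: int
    height: int


def cell_rect(region: Rect, rows: int, cols: int, index: int) -> Rect:
    """Return the rectangle of cell `index` (row-major, 0-based) inside `region`."""
    if not 0 <= index < rows * cols:
        raise ValueError(f"cell index {index} is outside a {rows}x{cols} grid")
    row, col = divmod(index, cols)
    # Compute both edges with integer division so adjacent cells
    # tile the region without gaps or overlap.
    x0 = region.left + col * region.width // cols
    x1 = region.left + (col + 1) * region.width // cols
    y0 = region.top + row * region.height // rows
    y1 = region.top + (row + 1) * region.height // rows
    return Rect(x0, y0, x1 - x0, y1 - y0)


def center(rect: Rect) -> Tuple[int, int]:
    """Return the pixel at the middle of `rect`."""
    return rect.left + rect.width // 2, rect.top + rect.height // 2


def dual_grid_target(
    screen: Rect,
    coarse_index: int,
    fine_index: int,
    coarse: Tuple[int, int] = (6, 8),  # (rows, cols), assumed
    fine: Tuple[int, int] = (4, 4),    # (rows, cols), assumed
) -> Tuple[int, int]:
    """Narrow the screen to a coarse cell, then to a fine cell, and return its center."""
    coarse_cell = cell_rect(screen, coarse[0], coarse[1], coarse_index)
    fine_cell = cell_rect(coarse_cell, fine[0], fine[1], fine_index)
    return center(fine_cell)


x, y = dual_grid_target(Rect(0, 0, 1920, 1080), coarse_index=9, fine_index=5)
print(x, y)  # 330 247
```

On a 1920x1080 screen these assumed grid sizes give fine cells of 60x45 pixels, so the click lands within a few dozen pixels of anything inside the chosen cell. The resulting pair could then be passed to PyAutoGUI, for example pyautogui.click(x, y).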
Why It Matters
We built a tool that works now — for real people, in real scenarios.
Voxa AI listens, understands, and acts — not just in chat, but on your desktop.
It doesn’t simulate assistance — it delivers it, reliably, in a form you can install and start using immediately.
What We Learned
- You don’t need brain implants to make machines listen — just the right architecture
- AI’s real power isn’t in talking — it’s in doing
- Users with disabilities don’t want pity or gimmicks — they need power on their terms
- Gemini can understand UIs from screenshots — if you prompt it precisely enough
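To make the last point concrete, here is a sketch of what a precise prompt and a strict reply parser might look like for picking a grid cell. The prompt wording, the 1-based cell numbering shown to the model, and the function names build_cell_prompt and parse_cell_reply are assumptions for illustration; the prompts Voxa AI actually uses are not shown in this write-up. No model call is made here, only the text going in and the validation of what comes back.

```python
import re
from typing import Optional


def build_cell_prompt(target: str, rows: int, cols: int) -> str:
    """Build an unambiguous question about a numbered grid overlay."""
    last = rows * cols
    return (
        f"The screenshot is overlaid with a {rows}x{cols} grid. "
        f"Cells are numbered 1 to {last}, left to right, top to bottom. "
        f"Which cell contains: {target}? "
        "Reply with the cell number only."
    )


def parse_cell_reply(reply: str, rows: int, cols: int) -> Optional[int]:
    """Return the 0-based cell index from the model's reply, or None if unusable."""
    match = re.search(r"\d+", reply)
    if match is None:
        return None  # the model did not name a cell
    number = int(match.group())
    if 1 <= number <= rows * cols:
        return number - 1
    return None  # number is outside the grid


print(build_cell_prompt("the Save button", 6, 8))
print(parse_cell_reply("Cell 10.", 6, 8))
```

Rejecting out-of-range or missing numbers, rather than guessing, gives the caller a clear signal to ask again instead of clicking in the wrong place.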
What’s Next
This version is the Support Edition, focused on assistive voice control.
But the vision is bigger:
- Launch as an open-source project for global accessibility
- Add memory and multi-step action planning via dialogue
- Extend functionality with full typing and drag-and-drop
Why Voxa AI Is Different
- To our knowledge, the first to use a dual-grid targeting system with Gemini
- Integrates voice commands, macros, clicks, typing, and manual mode in one tool
- Works across any app, any screen — no retraining, no prep
- Accessible, intuitive, and production-grade
Voxa AI is not a concept — it’s a complete, installable software application, built to empower.

