Voxa AI — Voice Meets Control

What if someone with no hands could control a computer — precisely, independently, and fully — using only their voice and AI?
Voxa AI was built to make that question obsolete.

Voxa AI is an AI-powered voice agent that gives people with limited hand mobility full control over their computers: not just browsers or apps, but the entire operating system.
It already works and is not a prototype: it is a fully functional, production-ready application built for everyday use.


What It Does

Voxa AI is a voice-first desktop agent designed for precise, hands-free control.
The current MVP allows users to:

  • Click anywhere on screen using a custom two-step visual recognition grid
  • Speak naturally and have their intent understood
  • Execute macros and custom actions, fully by voice
  • Leverage Gemini for both reasoning and UI recognition
  • Switch to a non-smart mode for manual control without AI
  • Change the theme and adjust settings to fit personal preferences
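The command handling described above (voice macros, plus a non-smart mode that works without AI) can be sketched as a small router. This is a minimal sketch under stated assumptions, not Voxa AI's actual code: the macro phrases, the `click <column><row>` command syntax, and the `route`/`Action` names are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical macro table: spoken phrase -> key combination.
MACROS: dict[str, list[str]] = {
    "copy that": ["ctrl", "c"],
    "paste it": ["ctrl", "v"],
    "new tab": ["ctrl", "t"],
}

# Hypothetical manual-mode syntax: "click C4" or "click c 4" (column letter, row number).
CLICK_PATTERN = re.compile(r"^click\s+([a-z])\s*(\d{1,2})$")


@dataclass
class Action:
    kind: str        # "hotkey", "click_cell", or "ask_ai"
    payload: object  # key list, cell label, or raw transcript


def route(transcript: str, smart_mode: bool = True) -> Optional[Action]:
    """Resolve macros and explicit cell clicks locally; defer everything else to the AI."""
    text = transcript.strip().lower()
    if text in MACROS:
        return Action("hotkey", MACROS[text])
    match = CLICK_PATTERN.match(text)
    if match:
        column, row = match.group(1).upper(), int(match.group(2))
        return Action("click_cell", f"{column}{row}")
    if smart_mode:
        return Action("ask_ai", transcript)
    return None  # non-smart mode: unrecognized commands are ignored
```

Executing the result is then a matter of dispatch, for example `pyautogui.hotkey(*action.payload)` for a `hotkey` action. Checking macros and explicit cell commands before falling back to the model keeps common commands fast and lets them keep working with AI switched off.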

Installation is simple — just run the installer and you’re ready to go.


How It Works

The system uses:

  • Python for backend logic and control flow
  • Gemini 2.5 Flash for language understanding and visual UI analysis
  • PyAutoGUI for screen control and input
  • Google Speech API for real-time speech recognition
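One way to wire these components into a single listen, interpret, act loop is shown below. It is a sketch with hypothetical names: `listen`, `interpret`, and `act` are placeholders that, in a real build, would wrap the speech recognizer, a Gemini call, and PyAutoGUI respectively. Injecting them as plain callables also makes the loop testable without a microphone or a screen.

```python
from typing import Callable, Iterable

# Hypothetical stage signatures (placeholders, not Voxa AI's real API).
Listen = Callable[[], str]          # returns one transcribed utterance
Interpret = Callable[[str], dict]   # turns a transcript into an action description
Act = Callable[[dict], None]        # performs the action on the desktop


def run_agent(listen: Listen, interpret: Interpret, act: Act,
              max_turns: int,
              stop_phrases: Iterable[str] = ("stop listening",)) -> int:
    """Run up to max_turns listen/interpret/act cycles; return how many actions ran."""
    stops = {phrase.lower() for phrase in stop_phrases}
    handled = 0
    for _ in range(max_turns):
        transcript = listen().strip()
        if not transcript:
            continue  # silence or audio the recognizer could not transcribe
        if transcript.lower() in stops:
            break     # user asked the agent to stop
        act(interpret(transcript))
        handled += 1
    return handled
```

For example, feeding it a scripted list of utterances and a list's `append` as `act` shows exactly which actions would have been performed.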

At the core is a custom-built dual-grid vision targeting system:

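  1. The screen is divided into a coarse grid
  2. Gemini identifies the cell containing the target
  3. A finer grid is drawn within that cell, and Gemini picks the sub-cell containing the target
  4. The system clicks the center of that sub-cell with pixel-level accuracy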
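The coordinate arithmetic behind these four steps can be sketched in a few lines. This assumes 8x8 grids at both levels, cells labeled with a column letter and a row number (e.g. `C4`), and hypothetical helper names; the real grid sizes and labeling scheme are not stated here. The model's only job is to return one label per step; the geometry itself is deterministic.

```python
import string
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    left: float
    top: float
    width: float
    height: float


def parse_label(label: str) -> tuple[int, int]:
    """'C4' -> (2, 3): the letter selects the column, the number the row (both 0-based)."""
    col = string.ascii_uppercase.index(label[0].upper())
    row = int(label[1:]) - 1
    return col, row


def cell_rect(area: Rect, label: str, cols: int, rows: int) -> Rect:
    """Return the rectangle of one labeled cell when `area` is split into cols x rows."""
    col, row = parse_label(label)
    if not (0 <= col < cols and 0 <= row < rows):
        raise ValueError(f"cell {label!r} is outside a {cols}x{rows} grid")
    w, h = area.width / cols, area.height / rows
    return Rect(area.left + col * w, area.top + row * h, w, h)


def click_point(screen: Rect, coarse: str, fine: str,
                cols: int = 8, rows: int = 8) -> tuple[int, int]:
    """Map a (coarse, fine) label pair to the screen pixel at the fine cell's center."""
    outer = cell_rect(screen, coarse, cols, rows)  # step 2: coarse cell
    inner = cell_rect(outer, fine, cols, rows)     # step 3: sub-cell inside it
    return (round(inner.left + inner.width / 2),   # step 4: center of the sub-cell
            round(inner.top + inner.height / 2))
```

Under these assumptions on a 1920x1080 screen, each fine cell is 30 x 16.875 px (1920/64 by 1080/64), so clicking its center lands within 15 px horizontally and about 8.4 px vertically of any point in that cell. The returned point can then be passed to `pyautogui.click(x, y)`.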

This architecture lets the AI interact with any UI element visible on screen, in any application, without training data or model fine-tuning.
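Because the model only has to name a labeled cell rather than produce raw coordinates, its answer can be checked against the labels actually drawn on the overlay before anything is clicked. The sketch below shows one way to do that; the prompt wording and the function names are hypothetical, and the Gemini request itself (sending the screenshot together with the prompt) is omitted.

```python
import re
import string
from typing import Optional


def grid_labels(cols: int, rows: int) -> set[str]:
    """All labels drawn on a cols x rows overlay, e.g. A1 through H8 for 8x8."""
    return {f"{string.ascii_uppercase[c]}{r + 1}" for c in range(cols) for r in range(rows)}


def build_prompt(command: str, cols: int, rows: int) -> str:
    # Hypothetical prompt wording, not the project's actual prompt.
    last_col = string.ascii_uppercase[cols - 1]
    return (
        f"The screenshot is overlaid with a {cols}x{rows} grid. Columns are lettered "
        f"A-{last_col} left to right; rows are numbered 1-{rows} top to bottom. "
        f"The user said: \"{command}\". Reply with only the label (A1 to {last_col}{rows}) "
        f"of the cell containing the element to click, or NONE if it is not visible."
    )


# A single letter, optional spaces, then one or two digits, as a standalone token.
LABEL_RE = re.compile(r"\b([A-Z])\s*(\d{1,2})\b")


def parse_reply(reply: str, cols: int, rows: int) -> Optional[str]:
    """Return the first valid cell label in the model's reply, or None."""
    valid = grid_labels(cols, rows)
    for letter, number in LABEL_RE.findall(reply.upper()):
        label = f"{letter}{number}"
        if label in valid:
            return label
    return None  # no usable label: re-prompt or ask the user instead of clicking
```

Returning `None` for a missing or out-of-range answer gives the agent a safe point to re-prompt or ask the user, instead of clicking somewhere arbitrary.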


Why It Matters

We built a tool that works now — for real people, in real scenarios.
Voxa AI listens, understands, and acts — not just in chat, but on your desktop.
It doesn’t simulate assistance — it delivers it, reliably, in a form you can install and start using immediately.


What We Learned

  • You don’t need brain implants to make machines listen — just the right architecture
  • AI’s real power isn’t in talking — it’s in doing
  • Users with disabilities don’t want pity or gimmicks — they need power on their terms
  • Gemini can understand UIs from screenshots — if you prompt it precisely enough

What’s Next

This version is the Support Edition, focused on assistive voice control.
But the vision is bigger:

  • Launch as an open-source project for global accessibility
  • Add memory and multi-step action planning via dialogue
  • Extend functionality with full typing and drag-and-drop

Why Voxa AI Is Different

  • To our knowledge, the first to use a dual-grid targeting system with Gemini
  • Integrates voice commands, macros, clicks, typing, and manual mode in one tool
  • Works across any app, any screen — no retraining, no prep
  • Accessible, intuitive, and production-grade

Voxa AI is not a concept — it’s a complete, installable software application, built to empower.
