Website: https://anubhav-gupta-software.github.io/voiceagents/
VoiceAgents is a dual-agent accessibility project:
- chromium-voice-agent/ for voice-first web navigation in Chromium
- lmmsagent/ for voice/text control of LMMS through an in-app AgentControl plugin
The practical goal is simple: reduce the operational burden of complex software by turning spoken intent into safe, actionable steps.
Both projects use a layered command strategy:
- deterministic commands for speed and reliability on known actions
- fuzzy normalization for common speech-to-text errors and phrasing variation
- LLM fallback only when needed, with guardrails, so unrelated speech does not trigger destructive actions
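The layered strategy above can be sketched as a simple dispatcher. This is an illustrative sketch, not the project's actual API: the command names, the action strings, and the fuzzy-match threshold are assumptions.

```python
# Sketch of a layered command dispatcher: deterministic lookup first,
# fuzzy normalization second, gated fallback last. All names here are
# illustrative assumptions.
import difflib

# Layer 1: deterministic commands — exact matches run immediately.
COMMANDS = {
    "open new tab": "tab.open",
    "scroll down": "page.scroll_down",
    "close tab": "tab.close",
}

FUZZY_THRESHOLD = 0.8  # assumed cutoff for accepting a near match

def dispatch(utterance: str):
    text = utterance.lower().strip()

    # Layer 1: deterministic lookup — fast and predictable.
    if text in COMMANDS:
        return COMMANDS[text]

    # Layer 2: fuzzy normalization for transcription noise
    # ("scroll don" -> "scroll down").
    close = difflib.get_close_matches(text, COMMANDS, n=1, cutoff=FUZZY_THRESHOLD)
    if close:
        return COMMANDS[close[0]]

    # Layer 3: an LLM fallback would go here, gated so that unrelated or
    # low-confidence speech is refused rather than executed.
    return None  # refuse: no safe interpretation found

print(dispatch("open new tab"))   # -> "tab.open"
print(dispatch("scroll don"))     # -> "page.scroll_down"
print(dispatch("order a pizza"))  # -> None (refused)
```

The refusal path is the safety gate: anything that fails both deterministic and fuzzy matching is treated as out of scope unless a higher-confidence interpretation exists.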
This design is intentional:
- deterministic paths keep common commands fast and predictable
- fallback intelligence improves real-world usability when transcription is imperfect
- safety gates preserve trust by refusing unrelated or low-confidence commands
These agents are built to support users who may face barriers with mouse-heavy, menu-dense software, including:
- people with motor/physical disabilities who benefit from reduced fine-pointer demands
- people with learning disabilities or cognitive load sensitivity who benefit from intent-level commands
- beginners who know what they want to do but not where to click
The objective is not to replace UI knowledge; it is to lower entry cost, reduce fatigue, and make advanced tools more reachable.
Web workflows are full of repetitive mechanics: tab switching, scrolling, opening tools, confirming dialogs, and navigating deep page layouts.
chromium-voice-agent/ targets these mechanics directly and allows users to operate the browser by intent rather than pointer precision.
For users with disabilities, this is especially valuable because it:
- reduces repetitive cursor travel and click strain
- shortens multi-step UI paths into one spoken action
- keeps interaction in a single modality when context switching is costly
Digital Audio Workstations are powerful but highly complex. LMMS has many windows, tracks, editors, and plugin workflows that can overwhelm first-time users.
lmmsagent/ focuses on that exact problem:
- opening and focusing the right tool windows
- creating tracks and patterns with direct commands
- importing files and controlling common slicer workflows
- normalizing noisy spoken commands into executable LMMS actions
For beginners, this turns DAW navigation from “discover hidden UI pathways” into “state musical intent and iterate.”
For accessibility users, it reduces the interaction complexity of dense production interfaces.
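Normalizing noisy spoken commands into executable actions can be sketched with pattern matching. The phrase patterns and action names below are hypothetical; the project's real command map lives in lmmsagent/docs/.

```python
# Hypothetical sketch of normalizing a spoken phrase into an LMMS-style
# action. Patterns and action names are illustrative assumptions.
import re

# Spoken phrasing varies; map several phrasings to one canonical action.
PATTERNS = [
    # "open the piano roll", and misrecognitions like "piano role"
    (re.compile(r"(?:open|show)\s+(?:the\s+)?piano\s*roll?"),
     ("open_window", "piano_roll")),
    # "create a bass track", "add drum track", ...
    (re.compile(r"(?:add|create|make)\s+(?:a\s+)?(\w+)\s+track"),
     ("create_track", None)),
]

def normalize(utterance: str):
    text = utterance.lower().strip()
    for pattern, (action, fixed_arg) in PATTERNS:
        m = pattern.search(text)
        if m:
            arg = fixed_arg if fixed_arg is not None else m.group(1)
            return (action, arg)
    return None  # unrecognized: safer to refuse than to guess

print(normalize("please open the piano role"))  # -> ("open_window", "piano_roll")
print(normalize("make a bass track"))           # -> ("create_track", "bass")
```

The key idea is that normalization absorbs speech-to-text variation before anything touches the application, so downstream execution only ever sees canonical actions.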
Browser automation prototype for voice-driven web control.
Key files:
- chromium-voice-agent/manifest.json
- chromium-voice-agent/background.js
- chromium-voice-agent/speech.js
- chromium-voice-agent/popup.html
- chromium-voice-agent/popup.js
LMMS automation project for controlling LMMS through a local plugin boundary.
Key directories:
- lmmsagent/integrations/lmms/AgentControl/ - LMMS plugin source
- lmmsagent/integrations/lmms/patches/ - minimal LMMS host patch set
- lmmsagent/lmms-text-agent/ - local text command client
- lmmsagent/lmms-voice-agent/ - local voice bridge
- lmmsagent/shared/ - shared LMMS socket client and command normalization
- lmmsagent/scripts/ - install and build scripts for an external LMMS checkout
- lmmsagent/docs/ - architecture, command map, and demo notes
- lmmsagent/demo/ - smoke-test commands
- use chromium-voice-agent/ for browser-side voice accessibility and automation experiments
- use lmmsagent/ for accessible LMMS control, beginner onboarding, and workflow acceleration