anubhav-gupta-software/voiceagents

Website: https://anubhav-gupta-software.github.io/voiceagents/

VoiceAgents

VoiceAgents is a dual-agent accessibility project:

  • chromium-voice-agent/ for voice-first web navigation in Chromium
  • lmmsagent/ for voice/text control of LMMS through an in-app AgentControl plugin

The practical goal is simple: reduce the operational burden of complex software by turning spoken intent into safe, actionable steps.

Why this approach

Both projects use a layered command strategy:

  1. deterministic commands for speed and reliability on known actions
  2. fuzzy normalization for common speech-to-text errors and phrasing variation
  3. LLM fallback only when needed, with guardrails, so unrelated speech does not trigger destructive actions

This design is intentional:

  • deterministic paths keep common commands fast and predictable
  • fallback intelligence improves real-world usability when transcription is imperfect
  • safety gates preserve trust by refusing unrelated or low-confidence commands
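The three layers above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's code: the command table, the action names, the `resolve` function, the thresholds, and the shape of the `llm` callback (returning an action name and a confidence score) are all assumptions made for the example.

```python
import difflib

# Hypothetical command table; phrases and action names are illustrative only.
COMMANDS = {
    "new tab": "open_tab",
    "close tab": "close_tab",
    "scroll down": "scroll_down",
    "scroll up": "scroll_up",
}


def resolve(transcript, llm=None, fuzzy_cutoff=0.8, llm_min_confidence=0.9):
    """Map a transcript to (action, layer), or (None, "rejected")."""
    # Normalize case and whitespace before any matching.
    text = " ".join(transcript.lower().split())

    # Layer 1: deterministic lookup for known phrases.
    if text in COMMANDS:
        return COMMANDS[text], "deterministic"

    # Layer 2: fuzzy match to absorb small speech-to-text errors.
    close = difflib.get_close_matches(text, list(COMMANDS), n=1, cutoff=fuzzy_cutoff)
    if close:
        return COMMANDS[close[0]], "fuzzy"

    # Layer 3: optional LLM fallback (assumed to return (action, confidence)).
    # Guardrail: accept only known actions above a confidence threshold.
    if llm is not None:
        action, confidence = llm(text)
        if action in COMMANDS.values() and confidence >= llm_min_confidence:
            return action, "llm"

    # Unrelated or low-confidence speech triggers nothing.
    return None, "rejected"
```

Because the fallback can only choose from actions already in the table, and only above the threshold, unrelated speech ends in a refusal rather than a guess.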

Accessibility impact

These agents are built to support users who may face barriers with mouse-heavy, menu-dense software, including:

  • people with motor/physical disabilities who benefit from reduced fine-pointer demands
  • people with learning disabilities or sensitivity to cognitive load who benefit from intent-level commands
  • beginners who know what they want to do but not where to click

The objective is not to replace UI knowledge; it is to lower entry cost, reduce fatigue, and make advanced tools more reachable.

Why Chromium voice control is effective

Web workflows are full of repetitive mechanics: tab switching, scrolling, opening tools, confirming dialogs, and navigating deep page layouts.
chromium-voice-agent/ targets these mechanics directly and allows users to operate the browser by intent rather than pointer precision.

For users with disabilities, this is especially valuable because it:

  • reduces repetitive cursor travel and click strain
  • shortens multi-step UI paths into one spoken action
  • keeps interaction in a single modality when context switching is costly

Why LMMS voice control matters

Digital Audio Workstations are powerful but highly complex. LMMS has many windows, tracks, editors, and plugin workflows that can overwhelm first-time users.

lmmsagent/ focuses on that exact problem:

  • opening and focusing the right tool windows
  • creating tracks and patterns with direct commands
  • importing files and controlling common slicer workflows
  • normalizing noisy spoken commands into executable LMMS actions

For beginners, this turns DAW navigation from “discover hidden UI pathways” into “state musical intent and iterate.”
For users with accessibility needs, it reduces the interaction complexity of dense production interfaces.
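As a rough illustration of turning a noisy spoken command into a structured action, the sketch below strips filler words, maps commonly misheard number words to a count, and emits an action dictionary. The filler list, the number map, the `parse_lmms_command` function, and the `create_track` / `create_pattern` action names are hypothetical and do not reflect the AgentControl plugin's actual command set.

```python
import re

# Hypothetical vocabulary for this sketch, including common mishearings
# of number words ("to"/"too" for "two", "for" for "four").
NUMBER_WORDS = {
    "a": 1, "an": 1, "one": 1,
    "two": 2, "to": 2, "too": 2,
    "three": 3,
    "four": 4, "for": 4,
    "five": 5,
}
FILLERS = {"please", "um", "uh", "the", "can", "you", "could"}

# Matches phrases such as "add two tracks" or "create a pattern".
PATTERN = re.compile(r"^(?:add|create|make) (?P<count>\w+) (?P<kind>track|pattern)s?$")


def parse_lmms_command(transcript):
    """Return an action dict for a recognized command, or None otherwise."""
    # Lowercase, drop punctuation, and remove filler words.
    words = [w for w in re.findall(r"[a-z0-9]+", transcript.lower())
             if w not in FILLERS]
    match = PATTERN.match(" ".join(words))
    if not match:
        return None

    raw = match.group("count")
    count = int(raw) if raw.isdigit() else NUMBER_WORDS.get(raw)
    if not count:
        # Unknown or zero count: refuse rather than guess.
        return None

    return {"action": "create_" + match.group("kind"), "count": count}
```

In this sketch, a transcript such as "Um, please add to tracks" resolves to a request for two tracks, while anything outside the pattern returns `None` and triggers no action.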

Project layout

chromium-voice-agent/

Browser automation prototype for voice-driven web control.

Key files:

  • chromium-voice-agent/manifest.json
  • chromium-voice-agent/background.js
  • chromium-voice-agent/speech.js
  • chromium-voice-agent/popup.html
  • chromium-voice-agent/popup.js

lmmsagent/

LMMS automation project for controlling LMMS through a local plugin boundary.

Key directories:

  • lmmsagent/integrations/lmms/AgentControl/ - LMMS plugin source
  • lmmsagent/integrations/lmms/patches/ - minimal LMMS host patch set
  • lmmsagent/lmms-text-agent/ - local text command client
  • lmmsagent/lmms-voice-agent/ - local voice bridge
  • lmmsagent/shared/ - shared LMMS socket client and command normalization
  • lmmsagent/scripts/ - install and build scripts for an external LMMS checkout
  • lmmsagent/docs/ - architecture, command map, and demo notes
  • lmmsagent/demo/ - smoke-test commands

Intended use

  • use chromium-voice-agent/ for browser-side voice accessibility and automation experiments
  • use lmmsagent/ for accessible LMMS control, beginner onboarding, and workflow acceleration
