VOCO is a local-first Linux dictation app. Press a hotkey, speak, press it again, and VOCO types at your cursor.
VOCO can also run as an optional voice bridge for OpenClaw: speak locally, let VOCO transcribe on-device, then send the transcript to a configured OpenClaw CLI agent and type the agent's answer at your cursor.
For low-latency back-and-forth voice, VOCO also has an opt-in realtime conversation toggle. It keeps the OpenAI API key in the local Tauri backend, mints a short-lived Realtime token, and streams 24 kHz PCM audio over a WebSocket so the Linux WebView does not depend on WebRTC support.
Recommended:
wget https://raw.githubusercontent.com/sergiopesch/voco/voco.2026.0.16/install -O voco-install
chmod +x voco-install
./voco-install

Optional:
less ./voco-install

Manual .deb fallback:
wget -O voco_latest_amd64.deb https://github.com/sergiopesch/voco/releases/latest/download/voco_latest_amd64.deb
sudo dpkg -i voco_latest_amd64.deb

Primary tested path: Ubuntu and Debian.
- Launch VOCO from your app menu or run voco.
- Finish the short setup.
- Press Alt+D.
- Speak.
- Press Alt+D again.
- Confirm the text is inserted at your cursor.
To use OpenClaw mode, open Settings -> Output, choose Ask OpenClaw and type answer or Ask OpenClaw and speak answer, and keep the OpenClaw gateway/agent available from your shell environment. Spoken answers also require OpenClaw TTS and ffplay.
To use realtime conversation, store OPENAI_API_KEY=... in ~/.openclaw/realtime.env, then press Alt+R or open the VOCO popover and press Start realtime. Press Alt+R again or press Stop realtime to end the session. While realtime is active, the VOCO mic visual appears in the popover or hidden overlay and follows both your microphone level and the assistant's spoken response level.
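Storing the key can be sketched as a small shell helper. This is a minimal sketch, not part of VOCO: `write_realtime_env` is a hypothetical function name, and the owner-only file permissions are an assumption about good practice for secrets rather than something VOCO is known to require.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with VOCO): write the API key to the
# env file that realtime mode reads, readable by the owner only.
write_realtime_env() {
  local key="$1"
  local env_file="${2:-$HOME/.openclaw/realtime.env}"

  if [ -z "$key" ]; then
    echo "usage: write_realtime_env KEY [FILE]" >&2
    return 1
  fi

  mkdir -p "$(dirname "$env_file")" || return 1

  # umask 077 inside a subshell so a newly created file starts private
  # without changing the caller's umask.
  ( umask 077 && printf 'OPENAI_API_KEY=%s\n' "$key" > "$env_file" ) || return 1

  # Tighten permissions in case the file already existed with a looser mode.
  chmod 600 "$env_file"
}
```

Calling `write_realtime_env "$OPENAI_API_KEY"` with the key already exported in your shell writes it to the default path, so the key itself never has to appear in your shell history.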
Detailed realtime behavior, first-toggle guarantees, diagnostics, and QA criteria are defined in docs/realtime-conversation-spec.md.
- Ubuntu or Debian
- PulseAudio or PipeWire
- Wayland: ydotool, wl-clipboard, and access to the input group for the most reliable hotkey path
- X11: xdotool and xclip
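The requirements above can be checked before installing. The following is a rough sketch under stated assumptions: `missing_tools` and `in_input_group` are hypothetical helpers, not VOCO commands, and the wl-clipboard package is checked through the `wl-copy` and `wl-paste` commands it ships. The session type is taken from `XDG_SESSION_TYPE` unless passed explicitly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each required command that is not on PATH
# for the given session type (defaults to $XDG_SESSION_TYPE).
missing_tools() {
  local session="${1:-${XDG_SESSION_TYPE:-}}"
  local tools tool

  case "$session" in
    wayland) tools="ydotool wl-copy wl-paste" ;;  # wl-clipboard provides wl-copy/wl-paste
    x11)     tools="xdotool xclip" ;;
    *)
      echo "unknown session type: ${session:-unset}" >&2
      return 2
      ;;
  esac

  for tool in $tools; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Hypothetical helper: succeed when the current user is in the input group.
in_input_group() {
  id -nG | tr ' ' '\n' | grep -qx input
}
```

An empty result from `missing_tools` means every command for that session type was found. On Wayland, `in_input_group || echo "not in input group"` covers the group-access requirement.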
git clone https://github.com/sergiopesch/voco.git
cd voco
npm install
./scripts/setup.sh --install
npm run dev

npm run check
npm run lint
npm test
cargo test --manifest-path apps/desktop/src-tauri/Cargo.toml
npm run rehearse:release
npm run report:linux-runtime

- First launch downloads the speech model once.
- Single dictation recordings are currently capped at 60 seconds.
- On Wayland, Alt+D and Alt+Shift+D are the most reliable hotkeys right now.
- Realtime conversation uses Alt+R.
- The realtime VOCO mic animation is driven by live input and output audio levels.
- Wayland text insertion depends on ydotool, compositor support, and often input group access.
- OpenClaw mode is opt-in and requires the openclaw CLI to be available in PATH.
- Realtime conversation is opt-in and requires OPENAI_API_KEY in the environment or ~/.openclaw/realtime.env.
- Config lives at ~/.config/voco/config.json.
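The two opt-in prerequisites in the notes above can be checked from a shell. This is a hedged sketch: `has_openclaw_cli` and `has_realtime_key` are hypothetical helpers, and the check for a line starting with `OPENAI_API_KEY=` is an assumption about the env file layout, not a description of how VOCO parses it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when the openclaw CLI is on PATH.
has_openclaw_cli() {
  command -v openclaw >/dev/null 2>&1
}

# Hypothetical helper: succeed when a non-empty OPENAI_API_KEY is set in
# the environment, or the env file has an OPENAI_API_KEY=<value> line.
has_realtime_key() {
  local env_file="${1:-$HOME/.openclaw/realtime.env}"

  [ -n "${OPENAI_API_KEY:-}" ] && return 0
  [ -r "$env_file" ] && grep -Eq '^OPENAI_API_KEY=.+' "$env_file"
}
```

For example, `has_openclaw_cli || echo "openclaw not found in PATH"` and `has_realtime_key || echo "no realtime key configured"` report which opt-in mode is not yet usable.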
MIT