feat(tui): voice mode CLI parity (VAD loop + TTS + crash forensics) by 0xbyt4 · Pull Request #14808 · NousResearch/hermes-agent

0xbyt4 · 2026-04-23T23:12:47Z

Summary

Brings `hermes --tui` voice mode to full parity with the classic CLI (`cli.py`'s `voice*` machinery). Before this PR the TUI voice pipeline was broken in three compounding ways:

1. Ctrl+B did nothing: on macOS the keybinding fell through `isAction` (which requires Cmd, not Ctrl), then through TextInput without hitting the voice-toggle handler, landing a literal `b` in the composer. Reported in Discord by timmie.
2. Gateway RPC was dead code: `tui_gateway/server.py` imported `hermes_cli.voice.{start_recording,stop_and_transcribe,speak_text}` but that module never existed in the tree. Every `voice.record` / `voice.tts` RPC landed in `ImportError` and surfaced as "voice module not available — install audio dependencies" even with all audio deps installed.
3. Model drift: TUI was push-to-talk, CLI is VAD-driven continuous (`/voice on` then Ctrl+B starts an auto-stop → transcribe → submit → auto-restart loop, 3 silent cycles halt). Memory of the repo and `tips.py` both document the CLI model as the intended one.

What's in the PR

Keybinding + TextInput — `ui-tui/src/lib/platform.ts` gains `isVoiceToggleKey` that accepts raw Ctrl+B on every platform (docs-default in `config.yaml`, `tips.py`, and the Python CLI all agree on Ctrl+B) plus Cmd+B on macOS so existing muscle memory still works. TextInput adds Ctrl+B to its pass-through list so the global handler actually receives the chord.

`hermes_cli/voice.py` — New wrapper on top of `tools.voice_mode` + `tools.tts_tool`:

`start_continuous(on_transcript, on_status, on_silent_limit, silence_threshold, silence_duration)` / `stop_continuous()` — VAD-driven recording loop with auto-restart, mirroring `cli.py:_voice_stop_and_transcribe` + `_restart_recording`. Three consecutive no-speech cycles trigger `on_silent_limit` and stop the loop (CLI parity).
`speak_text(text)` — exact port of `cli.py:_voice_speak_response`: same markdown-strip regex pipeline, 4000-char cap, explicit MP3 output path, MP3-over-OGG playback choice, cleanup of both extensions.
`_tts_playing` Event mirroring `cli.py:_voice_tts_done`: the continuous loop waits for TTS playback to finish before re-arming the mic, and `speak_text` cancels the active recorder before opening the speakers so the agent's spoken reply doesn't feedback-loop through the microphone.
880 Hz single-beep on record start, 660 Hz double-beep on record stop — same frequencies `cli.py` uses.
`start_recording` / `stop_and_transcribe` PTT API kept for backward-compat (`voice.record` RPC clients).
Key-name bug fix: `transcribe_recording()` returns `{"success": bool, "transcript": str}` — the earlier draft read `result["text"]` and silently dropped every valid STT result.
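
The key-name fix above can be illustrated with a minimal sketch. It assumes only the `{"success": bool, "transcript": str}` result shape stated in this PR; the helper name `extract_transcript` is hypothetical and is not the wrapper's actual function.

```python
from typing import Optional


def extract_transcript(result: dict) -> Optional[str]:
    """Return the transcript from an STT result, or None when there is none.

    Assumes the {"success": bool, "transcript": str} shape described above.
    Reading result["text"] instead would always miss the transcript.
    """
    if not result.get("success"):
        return None
    text = (result.get("transcript") or "").strip()
    return text or None
```

Returning `None` for failed or empty results lets a caller treat "no speech" and "STT failed" the same way when counting silent cycles.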

`tui_gateway/server.py` —

`voice.toggle` supports `on` / `off` / `tts` / `status`, all matching CLI output shape (status surfaces STT/audio provider details so `STT provider: MISSING (...)` is visible at a glance).
`voice.record` `start` refuses when voice mode is off (mirrors `cli.py:handle_voice_record`'s early return on `not _voice_mode`).
New event types `voice.status` (`listening` / `transcribing` / `idle`) and `voice.transcript` feed the loop's state to the TUI.
Agent-reply TTS dispatch on `message.complete` when `_voice_tts_enabled()`, threaded so the RPC returns immediately.
Runtime-only state: `voice.toggle` no longer persists `display.voice_enabled` / `display.voice_tts` to `config.yaml` — the TUI now starts with voice OFF every launch (CLI parity, no stale auto-REC).

TUI (`ui-tui/`) —

Ctrl+B calls `voice.record` (mode-aware: start / stop if mode on, sys-nudge if off), no longer `voice.toggle`.
`createGatewayEventHandler` handles `voice.status` (drives ● REC / ◉ STT badges) and `voice.transcript` (auto-submits: `setTimeout(() => submitRef.current(text), 0)` outside any React updater so strict-mode double-invoke can't duplicate the submit).
Status-bar badge colours: `● REC` red (`t.color.error`), `◉ STT` amber (`t.color.warn`), matching `cli.py:_get_voice_status_fragments`.

Crash forensics (`tui_gateway/{entry,server}.py`) —

`sys.excepthook` + `threading.excepthook` → dump full traceback (+ thread name) to `~/.hermes/logs/tui_gateway_crash.log`, echo one-liner to stderr so the TUI Activity surfaces it.
Entry `_log_exit` tags each exit path with the JSON-RPC method name so "died handling voice.record" is distinguishable from "died on stdin EOF".
`_log_signal` for SIGTERM / SIGHUP writes stack of every live thread via `sys._current_frames()` — the only way to see which background thread's write to stdout triggered a pipe break.
SIGPIPE is now `SIG_IGN` instead of `SIG_DFL`. The old binding was killing the gateway silently whenever a background thread (TTS playback, voice debug stderr emitter, beep) wrote to a pipe the TUI had gone quiet on — even though the main thread was fine. `signal.SIG_IGN` lets Python raise `BrokenPipeError` on the offending write, which `write_json` already handles with a clean `sys.exit(0)` via `_log_exit`. Verified by spawning the gateway subprocess and sending RPCs — process stays alive, clean EOF path fires on stdin close.

Test plan

Python / TS coverage

12 Python tests (`tests/hermes_cli/test_voice_wrapper.py`) — public API, idempotency, continuous-loop mock simulation (auto-restart, 3-strikes halt, stop-during-transcribe)
244 TypeScript tests (`ui-tui/` vitest suite) — platform `isVoiceToggleKey` unit tests, gateway event handler fixtures
Type-check + lint clean on both sides

When the user runs /voice and then presses Ctrl+B in the TUI, three handlers collaborate to consume the chord and none of them dispatch voice.record:

- isAction() is platform-aware: on macOS it requires Cmd (meta/super), so Ctrl+B fails the match in useInputHandlers and never triggers voiceStart/voiceStop.
- TextInput's Ctrl+B pass-through list doesn't include 'b', so the keystroke falls through to the wordMod backward-word branch on Linux and to the printable-char insertion branch on macOS. The latter is exactly what timmie reported ("enters a b into the tui").
- /voice emits "voice: on" with no hint, so the user has no way to know Ctrl+B is the recording toggle.

Introduces isVoiceToggleKey(key, ch) in lib/platform.ts that matches raw Ctrl+B on every platform (mirrors tips.py and config.yaml's voice.record_key default) and additionally accepts Cmd+B on macOS so existing muscle memory keeps working. Wires it into useInputHandlers, adds Ctrl+B to TextInput's pass-through list so the global handler actually receives the chord, and appends "press Ctrl+B to record" to the /voice on message.

Empirically verified with hermes --tui: Ctrl+B no longer leaks 'b' into the composer and now dispatches the voice.record RPC (the downstream ImportError for hermes_cli.voice is a separate upstream bug, addressed in a follow-up patch).

tui_gateway/server.py:3486/3491/3509 imports start_recording, stop_and_transcribe, and speak_text from hermes_cli.voice, but the module never existed (not in git history: never shipped, never deleted). Every voice.record / voice.tts RPC call hit the ImportError branch and the TUI surfaced it as "voice module not available — install audio dependencies" even on boxes with sounddevice / faster-whisper / numpy installed.

Adds a thin wrapper on top of tools.voice_mode (recording + transcription) and tools.tts_tool (text-to-speech):

- start_recording(): idempotent; stores the active AudioRecorder in a module-global guarded by a Lock so repeat Ctrl+B presses don't fight over the mic.
- stop_and_transcribe(): returns None for no-op / no-speech / Whisper-hallucination cases so the TUI's existing "no speech detected" path keeps working unchanged.
- speak_text(text): lazily imports tts_tool (optional provider SDKs stay unloaded until the first /voice tts call), parses the tool's JSON result, and plays the audio via play_audio_file.

Paired with the Ctrl+B keybinding fix in the prior commit, the TUI voice pipeline now works end-to-end for the first time.
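
The idempotent, lock-guarded start described in this commit might look roughly like the sketch below. `StubRecorder` is a hypothetical stand-in for the real `AudioRecorder`, and `stop_recording` is a simplified placeholder that skips transcription; neither is the wrapper's actual code.

```python
import threading
from typing import Optional


class StubRecorder:
    """Hypothetical stand-in for the real AudioRecorder."""

    def __init__(self) -> None:
        self.running = False

    def start(self) -> None:
        self.running = True

    def stop(self) -> str:
        self.running = False
        return "recording.wav"  # placeholder path, never opened here


_lock = threading.Lock()
_active: Optional[StubRecorder] = None


def start_recording() -> bool:
    """Start recording; return False if a recorder is already active."""
    global _active
    with _lock:
        if _active is not None:
            return False  # repeat key presses are a no-op
        rec = StubRecorder()
        rec.start()
        _active = rec
        return True


def stop_recording() -> Optional[str]:
    """Stop the active recorder; return None when nothing was recording."""
    global _active
    with _lock:
        rec, _active = _active, None  # claim the recorder under the lock
    return rec.stop() if rec is not None else None
```

Swapping the global out while holding the lock means two concurrent stop calls cannot both stop the same recorder.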

The TUI had drifted from the CLI's voice model in two ways:

- /voice on was lighting up the microphone immediately and Ctrl+B was interpreted as a mode toggle. The CLI separates the two: /voice on just flips the umbrella bit, recording only starts once the user presses Ctrl+B, which also sets _voice_continuous so the VAD loop auto-restarts until the user presses Ctrl+B again or three silent cycles pass.
- /voice tts was missing entirely, so users couldn't turn agent reply speech on/off from inside the TUI.

This commit brings the TUI to parity.

Python

- hermes_cli/voice.py: continuous-mode API (start_continuous, stop_continuous, is_continuous_active) layered on the existing PTT wrappers. The silence callback transcribes, fires on_transcript, tracks consecutive no-speech cycles, and auto-restarts, mirroring cli.py:_voice_stop_and_transcribe + _restart_recording.
- tui_gateway/server.py:
  - voice.toggle now supports on / off / tts / status. The umbrella bit lives in HERMES_VOICE + display.voice_enabled; tts lives in HERMES_VOICE_TTS + display.voice_tts. /voice off also tears down any active continuous loop so a toggle-off really releases the microphone.
  - voice.record start/stop now drives start_continuous/stop_continuous. start is refused with a clear error when the mode is off, matching cli.py:handle_voice_record's early return on `not _voice_mode`.
  - New voice.transcript / voice.status events emit through _voice_emit (remembers the sid that last enabled the mode so events land in the right session).

TypeScript

- gatewayTypes.ts: voice.status + voice.transcript event discriminants; VoiceToggleResponse gains tts; VoiceRecordResponse gains status for the new "started/stopped" responses.
- interfaces.ts: GatewayEventHandlerContext gains composer.setInput + submission.submitRef + voice.{setRecording, setProcessing, setVoiceEnabled}; InputHandlerContext.voice gains enabled + setVoiceEnabled for the mode-aware Ctrl+B handler.
- createGatewayEventHandler.ts: voice.status drives REC/STT badges; voice.transcript auto-submits when the composer is empty (CLI _pending_input.put parity) and appends when a draft is in flight. no_speech_limit flips voice off + sys line.
- useInputHandlers.ts: Ctrl+B now calls voice.record (start/stop), not voice.toggle, and nudges the user with a sys line when the mode is off instead of silently flipping it on.
- useMainApp.ts: wires the new event-handler context fields.
- slash/commands/session.ts: /voice handles on / off / tts / status with CLI-matching output ("voice: mode on · tts off").

Backward compat preserved for voice.record (was always PTT shape; gateway still honours start/stop with mode-gating added).
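
The "three consecutive silent cycles halt the loop" rule from the continuous-mode API can be sketched synchronously. This is an illustration only: `run_cycles` and `MAX_SILENT_CYCLES` are hypothetical names, and the real loop is driven by VAD silence callbacks rather than a list of results.

```python
from typing import Callable, Iterable, Optional

MAX_SILENT_CYCLES = 3  # consecutive no-speech cycles before the loop stops


def run_cycles(
    results: Iterable[Optional[str]],
    on_transcript: Callable[[str], None],
    on_silent_limit: Callable[[], None],
) -> int:
    """Feed one transcription result per cycle; return how many cycles ran.

    A non-empty transcript resets the silent counter and "restarts" the loop.
    The third consecutive empty result fires on_silent_limit and stops.
    """
    silent = 0
    cycles = 0
    for transcript in results:
        cycles += 1
        if transcript:
            silent = 0
            on_transcript(transcript)
            continue
        silent += 1
        if silent >= MAX_SILENT_CYCLES:
            on_silent_limit()
            break
    return cycles
```

Because the counter resets on any speech, isolated pauses never stop the loop; only an unbroken run of silent cycles does.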

Three issues surfaced during end-to-end testing of the CLI-parity voice loop and are fixed together because they all blocked "speak → agent responds → TTS reads it back" from working at all:

1. Wrong result key (hermes_cli/voice.py)

transcribe_recording() returns {"success": bool, "transcript": str}, matching cli.py:_voice_stop_and_transcribe. The wrapper was reading result.get("text"), which is None, so every successful Groq / local STT response was thrown away and the 3-strikes halt fired after three silent-looking cycles. Fixed by reading "transcript" and also honouring "success" like the CLI does. Updated the loop simulation tests to return the correct shape.

2. TTS speak-back was missing (tui_gateway/server.py + hermes_cli/voice.py)

The TUI had a voice.toggle "tts" subcommand but nothing downstream actually read the flag, so agent replies never spoke. Mirrored cli.py:8747-8754's dispatch: on message.complete with status == "complete", if _voice_tts_enabled() is true, spawn a daemon thread running speak_text(response). Rewrote speak_text as a full port of cli.py:_voice_speak_response: same markdown-strip regex pipeline (code blocks, links, bold/italic, inline code, headers, list bullets, horizontal rules, excessive newlines), same 4000-char cap, same explicit mp3 output path, same MP3-over-OGG playback choice (afplay misbehaves on OGG), same cleanup of both extensions. Keeps TUI TTS audible output byte-for-byte identical to the classic CLI.

3. Auto-submit swallowed on non-empty composer (createGatewayEventHandler.ts)

The voice.transcript handler branched on prev input via a setInput updater and fired submitRef.current inside the updater when prev was empty. React strict mode double-invokes state updaters, which would queue the submit twice; and when the composer had any content the transcript was merely appended, so the agent never saw it. CLI _pending_input.put(transcript) unconditionally feeds the transcript as the next turn, so match that: always clear the composer and setTimeout(() => submitRef.current(text), 0) outside any updater. Side effect can't run twice this way, and a half-typed draft on the rare occasion is a fair trade vs. silently dropping the turn.

Also added peak_rms to the rec.stop debug line so "recording too quiet" is diagnosable at a glance when HERMES_VOICE_DEBUG=1.
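
To make the markdown-strip step concrete, here is a rough sketch of such a pipeline. The regexes are illustrative approximations written for this example, not the ones in `cli.py`; only the list of stripped constructs and the 4000-character cap come from the commit message, and the names `strip_markdown_for_tts` and `MAX_TTS_CHARS` are hypothetical.

```python
import re

MAX_TTS_CHARS = 4000  # cap taken from the commit message


def strip_markdown_for_tts(text: str) -> str:
    """Remove common markdown so a TTS engine reads plain prose."""
    text = re.sub(r"```.*?```", " ", text, flags=re.DOTALL)              # fenced code blocks
    text = re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", text)                  # [label](url) -> label
    text = re.sub(r"^#{1,6}[ \t]+", "", text, flags=re.MULTILINE)         # headers
    text = re.sub(r"^[ \t]*-{3,}[ \t]*$", "", text, flags=re.MULTILINE)   # horizontal rules
    text = re.sub(r"^[ \t]*[-*][ \t]+", "", text, flags=re.MULTILINE)     # list bullets
    text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)                          # bold
    text = re.sub(r"\*(.+?)\*", r"\1", text)                              # italic
    text = re.sub(r"`([^`]*)`", r"\1", text)                              # inline code
    text = re.sub(r"\n{3,}", "\n\n", text)                                # excessive newlines
    return text.strip()[:MAX_TTS_CHARS]
```

Bullets and rules are handled before emphasis in this sketch so a leading `*` bullet marker is not mistaken for the start of an italic span.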

TTS feedback loop (hermes_cli/voice.py)

The VAD loop kept the microphone live while speak_text played the agent's reply over the speakers, so the reply itself was picked up, transcribed, and submitted. The agent then replied to its own echo ("Ha, looks like we're in a loop"). Ported cli.py:_voice_tts_done synchronisation:

- _tts_playing: threading.Event (initially set = "not playing").
- speak_text cancels the active recorder before opening the speakers, clears _tts_playing, and on exit waits 300 ms before re-starting the recorder, long enough for the OS audio device to settle so afplay and sounddevice don't race for it.
- _continuous_on_silence now waits on _tts_playing (up to 60 s) before re-arming the mic with another 300 ms gap, mirroring cli.py:10619-10621. If the user flips voice off during the wait the loop exits cleanly instead of fighting for the device.

Without both halves the loop races: if the silence callback fires before TTS starts it re-arms immediately; if TTS is already playing the pause-and-resume path catches it.

Red REC badge (ui-tui appChrome + useMainApp)

Classic CLI (cli.py:_get_voice_status_fragments) renders "● REC" in red and "◉ STT" in amber. TUI was showing a dim "REC" with no dot, making it hard to spot at a glance. voiceLabel now emits the same glyphs and appChrome colours them via t.color.error / t.color.warn, falling back to dim for the idle label.
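
The Event-based gating between TTS playback and mic re-arming can be sketched as follows. The function names `speak` and `wait_until_mic_safe` are hypothetical and recorder cancellation is omitted; the "set means not playing" convention, the 300 ms settle gap, and the 60 s wait follow the description above.

```python
import threading
import time
from typing import Callable

# Set means "not playing", matching the convention described above.
_tts_playing = threading.Event()
_tts_playing.set()


def speak(play: Callable[[], None], settle_s: float = 0.3) -> None:
    """Run a blocking playback callable while marking TTS as active."""
    _tts_playing.clear()  # the mic must not re-arm from here on
    try:
        play()
    finally:
        time.sleep(settle_s)  # let the audio device settle before re-arming
        _tts_playing.set()


def wait_until_mic_safe(timeout_s: float = 60.0) -> bool:
    """Block until playback has finished; False means the wait timed out."""
    return _tts_playing.wait(timeout=timeout_s)
```

A silence callback would call `wait_until_mic_safe()` before starting the next recording, so a reply that is still playing cannot be picked up by the microphone.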

…rash.log

When the gateway subprocess raises an unhandled exception during a voice-mode turn, nothing survives: stdout is the JSON-RPC pipe, stderr flushes but the process is already exiting, and no log file catches Python's default traceback print. The user is left with an undiagnosable "gateway exited" banner.

Install:

- sys.excepthook → write full traceback to tui_gateway_crash.log + echo the first line to stderr (which the TUI pumps into Activity as a gateway.stderr event). Chains to the default hook so the process still terminates.
- threading.excepthook → same, tagged with the thread name so it's clear when the crash came from a daemon thread (beep playback, TTS, silence callback, etc.).
- Turn-dispatcher except block now also appends a traceback to the crash log before emitting the user-visible error event. str(e) alone was too terse to identify where in the voice pipeline the failure happened.

Zero behavioural change on the happy path; purely forensics.
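
A minimal sketch of the two hooks, assuming a caller-supplied log path. The helper names (`install_crash_hooks`, `format_crash`, `_append`) and the `[gateway-crash]` prefix are hypothetical; `sys.excepthook`, `threading.excepthook`, and `traceback.format_exception` are standard library.

```python
import sys
import threading
import traceback
from pathlib import Path


def format_crash(exc_type, exc, tb, thread_name: str = "MainThread") -> str:
    """Render a crash-log entry: a thread tag followed by the full traceback."""
    body = "".join(traceback.format_exception(exc_type, exc, tb))
    return f"[crash] thread={thread_name}\n{body}"


def _append(log_path: Path, text: str) -> None:
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(text + "\n")


def install_crash_hooks(log_path: Path) -> None:
    """Log unhandled exceptions from the main thread and from worker threads."""
    previous_hook = sys.excepthook

    def _sys_hook(exc_type, exc, tb):
        _append(log_path, format_crash(exc_type, exc, tb))
        print(f"[gateway-crash] {exc_type.__name__}: {exc}", file=sys.stderr)
        previous_hook(exc_type, exc, tb)  # chain so default handling still runs

    def _thread_hook(args):
        name = args.thread.name if args.thread is not None else "unknown"
        _append(log_path, format_crash(args.exc_type, args.exc_value, args.exc_traceback, name))
        print(f"[gateway-crash] thread={name} {args.exc_type.__name__}: {args.exc_value}",
              file=sys.stderr)

    sys.excepthook = _sys_hook
    threading.excepthook = _thread_hook
```

Tagging each entry with the thread name is what makes a crash in a daemon thread distinguishable from one on the main thread.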

Gateway exits weren't reaching the panic hook because entry.py calls sys.exit(0) on broken stdout: clean termination, no exception. That left "gateway exited" in the TUI with zero forensic trail when pipe breaks happened mid-turn.

Entry.py now tags each exit path (startup-write failure, parse-error-response write failure, per-method response write failure, stdin EOF) with a one-line entry in ~/.hermes/logs/tui_gateway_crash.log and a gateway.stderr breadcrumb. Includes the JSON-RPC method name on the dispatch path, which is the only way to tell "died right after handling voice.toggle on" from "died emitting the second message.complete".

SIG_DFL for SIGPIPE means the kernel reaps the gateway subprocess the instant a background thread (TTS playback, silence callback, voice status emitter) writes to a stdout the TUI stopped reading, before the Python interpreter can run excepthook, threading.excepthook, atexit, or the entry.py post-loop _log_exit.

Replace the three SIG_DFL / SIG_IGN bindings with a _log_signal handler that:

- records which signal (SIGPIPE / SIGTERM / SIGHUP) fired and when;
- dumps the main-thread stack at signal delivery AND every live thread's stack via sys._current_frames (the background-thread write that provoked SIGPIPE is almost always visible here);
- writes everything to ~/.hermes/logs/tui_gateway_crash.log and prints a [gateway-signal] breadcrumb to stderr so the TUI Activity surfaces it as well.

SIGINT stays ignored (TUI handles Ctrl+C for the user).
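
The per-thread stack dump that such a handler relies on can be sketched with the standard library alone. `dump_all_thread_stacks` is a hypothetical name; a signal handler in the style described above would write its return value to the crash log.

```python
import sys
import threading
import traceback


def dump_all_thread_stacks() -> str:
    """Return the current stack of every live thread, labelled by thread name."""
    names = {t.ident: t.name for t in threading.enumerate()}
    parts = []
    for ident, frame in sys._current_frames().items():
        parts.append(f"--- thread {names.get(ident, '?')} (ident={ident}) ---")
        parts.append("".join(traceback.format_stack(frame)))
    return "\n".join(parts)
```

`sys._current_frames()` returns a snapshot keyed by thread identifier, so mapping identifiers back to names through `threading.enumerate()` is what makes the dump readable.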

Crash-log stack trace (tui_gateway_crash.log) from the user's session pinned the regression: SIGPIPE arrived while main thread was blocked on for-raw-in-sys.stdin, i.e., a background thread (debug print to stderr, most likely from HERMES_VOICE_DEBUG=1) wrote to a pipe whose buffer the TUI hadn't drained yet, and SIG_DFL promptly killed the process.

Two fixes that together restore CLI parity:

- entry.py: SIGPIPE → SIG_IGN instead of the _log_signal handler that then exited. With SIG_IGN, Python raises BrokenPipeError on the offending write, which write_json already handles with a clean exit via _log_exit. SIGTERM / SIGHUP still route through _log_signal so real termination signals remain diagnosable.
- hermes_cli/voice.py:_debug: wrap the stderr print in a BrokenPipeError / OSError try/except. This runs from daemon threads (silence callback, TTS playback, beep), so a broken stderr must not escape and ride up into the main event loop.

Verified by spawning the gateway subprocess locally: voice.toggle status → 200 OK, process stays alive, clean exit on stdin close logs "reason=stdin EOF" instead of a silent reap.
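
Both fixes can be sketched together. `ignore_sigpipe` and `safe_debug` are hypothetical names; the sketch assumes a POSIX platform for the signal part and simply skips it elsewhere or when called off the main thread, since Python only allows handlers to be installed from the main thread.

```python
import signal
import sys
import threading


def ignore_sigpipe() -> bool:
    """Ignore SIGPIPE so a write to a closed pipe raises BrokenPipeError.

    Returns False when the signal is unavailable or we are not on the
    main thread, in which case nothing is changed.
    """
    if not hasattr(signal, "SIGPIPE"):
        return False
    if threading.current_thread() is not threading.main_thread():
        return False
    signal.signal(signal.SIGPIPE, signal.SIG_IGN)
    return True


def safe_debug(msg: str, stream=None) -> bool:
    """Print a debug line; swallow pipe errors instead of raising.

    Returns True when the line was written and flushed.
    """
    target = sys.stderr if stream is None else stream
    try:
        print(msg, file=target, flush=True)
        return True
    except (BrokenPipeError, OSError):
        return False
```

With the signal ignored, the failure surfaces as an ordinary exception at the write site, which a daemon thread can catch locally instead of taking the whole process down.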

The voice.toggle handler was persisting display.voice_enabled / display.voice_tts to config.yaml, so a TUI session that ever turned voice on would re-open with it already on (and the mic badge lit) on every subsequent launch. cli.py treats voice strictly as runtime state: _voice_mode = False at __init__, only /voice on flips it, and nothing writes it back to disk. Drop the _write_config_key calls in voice.toggle on/off/tts and the config.yaml fallback in _voice_mode_enabled / _voice_tts_enabled. State is now env-var-only (HERMES_VOICE / HERMES_VOICE_TTS), scoped to the live gateway subprocess — the next launch starts clean.
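
Env-var-only state could be modelled as in the sketch below. The variable names HERMES_VOICE and HERMES_VOICE_TTS come from the commit message; the "1"/"0" encoding, the set of truthy strings, and the helper names are assumptions made for illustration.

```python
import os
from typing import MutableMapping, Optional

VOICE_ENV = "HERMES_VOICE"
TTS_ENV = "HERMES_VOICE_TTS"
_TRUTHY = {"1", "true", "on", "yes"}  # assumed encoding, not confirmed by the PR


def _resolve(env: Optional[MutableMapping[str, str]]) -> MutableMapping[str, str]:
    return os.environ if env is None else env


def set_flag(name: str, enabled: bool,
             env: Optional[MutableMapping[str, str]] = None) -> None:
    """Record a runtime-only flag; nothing is written to config.yaml."""
    _resolve(env)[name] = "1" if enabled else "0"


def flag_enabled(name: str,
                 env: Optional[MutableMapping[str, str]] = None) -> bool:
    """An unset variable reads as off, so every fresh process starts clean."""
    return _resolve(env).get(name, "").strip().lower() in _TRUTHY
```

Because the state lives only in the process environment, it disappears when the gateway subprocess exits, which matches the "next launch starts clean" behaviour described above.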

teknium1 · 2026-04-23T23:18:29Z

Merged via PR #14810. Your 10 commits were cherry-picked onto current main with authorship preserved — see 3504bd4..44a0cbe in git log. Thanks for the thorough fix (Ctrl+B keybind + missing hermes_cli.voice module + VAD-continuous parity + SIGPIPE crash fix — all three root causes). timmie's Discord report is resolved.

alt-glitch · 2026-04-23T23:26:51Z

Superseded by #14810 (merged salvage of this PR). Closing is appropriate.

0xbyt4 added 10 commits April 24, 2026 00:19

teknium1 mentioned this pull request Apr 23, 2026

feat(tui): voice mode CLI parity (VAD loop + TTS + crash forensics) #14810

Merged

teknium1 closed this in #14810 Apr 23, 2026

alt-glitch added labels type/feature (New feature or request), P3 (Low: cosmetic, nice to have), comp/tui (Terminal UI: ui-tui/ + tui_gateway/), tool/tts (Text-to-speech and transcription) Apr 23, 2026

0xbyt4 wants to merge 10 commits into NousResearch:main from 0xbyt4:feat/tui-voice-mode
