feat: Voice Mode — CLI, Telegram, Discord (text + VC), and Web UI with full voice support (Issue #314) by 0xbyt4 · Pull Request #327 · NousResearch/hermes-agent

0xbyt4 · 2026-03-03T18:05:34Z

Summary

Scope expanded since initial PR: Now includes Discord voice channels
(join/listen/speak), Telegram/Discord auto voice reply, Web UI gateway
with browser voice chat, and cross-platform double TTS prevention.
See comments below for incremental updates.

Implements Voice Mode for the Hermes CLI (Issue #314, Phases 2-5). Users can speak to the agent via microphone and optionally hear responses read aloud via TTS — with sentence-by-sentence streaming for ElevenLabs.

Note: Phase 1 (Gateway voice messages) was already implemented — Telegram, Discord, WhatsApp, and Slack all handle incoming voice messages with auto-transcription.

What's New

Phase 2: CLI Voice Input

/voice slash command to toggle voice mode on/off
Ctrl+R to start/stop recording (toggle, not hold-to-talk)
Audio capture via sounddevice + numpy (optional deps via pip install hermes-agent[voice])
Multi-provider STT: OpenAI Whisper (VOICE_TOOLS_OPENAI_KEY) and Groq Whisper (GROQ_API_KEY) with automatic model correction per provider
Visual recording indicator: Real-time audio level bar in prompt (● ▃ ❯)
Transcribed text is submitted as a normal user message — agent processes it identically to typed input

Phase 3: TTS Response Output

/voice tts sub-toggle to read agent responses aloud
Uses existing text_to_speech tool infrastructure
Markdown stripping for TTS (removes code blocks, URLs, formatting)
Voice system prompt appended when voice mode is active — instructs the model to keep responses concise and conversational (2-3 sentences max)

Phase 4: Low-Latency Features

Silence detection: Auto-stops recording after configurable seconds of silence (default 3s). Uses RMS-based speech detection with micro-pause tolerance for natural speech patterns
Continuous mode: After the agent responds, recording auto-restarts so the user can keep talking hands-free. Ctrl+R exits continuous mode
Audio cues: 880Hz beep on record start, 660Hz double-beep on stop, 1200Hz tick on tool execution
TTS interrupt: Pressing Ctrl+R while TTS is playing stops playback and starts recording
Interruptable playback: TTS uses subprocess.Popen (not run) so stop_playback() can terminate it
Configurable params: voice.silence_threshold and voice.silence_duration in config.yaml
Whisper hallucination guard: Two-layer protection — peak RMS check rejects silent recordings before STT, then known hallucination phrases ("Thank you.", "Bye.", etc.) are filtered after STT
Peak RMS check: Uses maximum chunk RMS instead of overall average to avoid discarding recordings where short speech is diluted by trailing silence

Phase 5: Streaming TTS (ElevenLabs)

Sentence-by-sentence audio streaming — audio starts playing within ~1-2s of agent starting to respond, instead of waiting for the full response
Architecture: LLM tokens → stream_callback → text_queue → sentence buffer → ElevenLabs pcm_24000 → sounddevice.OutputStream → speaker
Sentence buffering: Accumulates tokens until sentence boundary (. ! ? \n\n), with 20-char minimum to merge short fragments and 100-char timeout flush for long sentences without punctuation
Think block filtering: <think>...</think> content is stripped in real-time so reasoning tokens are never spoken
Markdown stripping: Code blocks, URLs, bold, italic, headers, list items cleaned before TTS
Streaming API integration: run_agent._interruptible_api_call() uses stream=True when callback is set, accumulates chunks into a mock ChatCompletion response (same interface as non-streaming)
Graceful fallback: Only ElevenLabs gets streaming. Edge TTS and OpenAI TTS keep batch behavior. When elevenlabs or sounddevice is not installed, falls back to batch TTS automatically
Zero impact on non-voice mode: stream_callback defaults to None, API calls stay non-streaming
Low-latency model: Uses eleven_flash_v2_5 (~75ms latency) by default, configurable via tts.elevenlabs.streaming_model_id in config.yaml

Design Decisions

Why Ctrl+R toggle instead of Space-bar hold-to-talk?

The issue suggested hold-Space, but prompt_toolkit doesn't support key-up events, making hold-to-talk infeasible. Ctrl+R toggle combined with silence detection provides a better UX — the user presses once, speaks naturally, and recording auto-stops when they're done. No need to hold anything.

Why not streaming STT?

OpenAI Whisper API and Groq Whisper API don't support streaming transcription. Adding a streaming provider (Deepgram/AssemblyAI) would require new service dependencies. The current approach (record → auto-stop on silence → transcribe) is reliable and keeps the dependency footprint small.

Why only ElevenLabs for streaming TTS?

ElevenLabs returns raw PCM chunks that can be written directly to sounddevice.OutputStream for zero-copy playback. Edge TTS is async and outputs MP3 (needs decoding), OpenAI TTS returns complete files. The streaming architecture requires chunk-by-chunk audio iteration which only ElevenLabs supports natively.

CoreAudio safety

On macOS, sd.play() (beep) and sd.InputStream (recording) conflict when running simultaneously (PaMacCore error). Beeps are played synchronously BEFORE starting the recording stream to avoid this.

Quick Usage

/voice on          # Enable voice mode
Ctrl+R             # Start recording (one press, no need to hold)
                   # Speak naturally — recording auto-stops on 3s silence
                   # Transcript is submitted as text, agent responds
                   # In continuous mode, recording auto-restarts after response
Ctrl+R             # Stop recording & exit continuous mode
/voice tts         # Toggle TTS (agent reads responses aloud)
/voice off         # Disable voice mode

Streaming TTS Setup (optional):

pip install hermes-agent[voice,tts-premium]  # adds elevenlabs + sounddevice
export ELEVENLABS_API_KEY="sk-..."

In ~/.hermes/config.yaml:

tts:
  provider: elevenlabs
  elevenlabs:
    voice_id: pNInz6obpgDQGcFmaJgB          # Adam (default)
    streaming_model_id: eleven_flash_v2_5     # low-latency model

STT Provider: Tested with Groq Whisper (GROQ_API_KEY), also supports OpenAI Whisper (VOICE_TOOLS_OPENAI_KEY). Groq's free tier works well for this.

Install: pip install hermes-agent[voice] (adds sounddevice + numpy)

Files Changed

File	Change
`tools/voice_mode.py`	New — AudioRecorder, silence detection, beep generation, hallucination filter, interruptable playback
`tools/tts_tool.py`	Added `stream_tts_to_speaker()` — sentence buffer, think block filter, markdown stripping, ElevenLabs PCM streaming to sounddevice
`tools/transcription_tools.py`	Added Groq STT provider, auto model correction, multi-provider resolution
`run_agent.py`	Added `stream_callback` param to `run_conversation()`/`chat()`, streaming path in `_interruptible_api_call()` with mock ChatCompletion assembly
`cli.py`	Voice mode integration — key bindings, continuous mode, audio level UI, streaming TTS pipeline wiring, batch TTS skip when streaming active
`hermes_cli/config.py`	Voice config defaults (silence_threshold, silence_duration)
`hermes_cli/commands.py`	`/voice` command registration
`pyproject.toml`	`[voice]` optional dependency group
`.env.example`	`GROQ_API_KEY` documentation
`tests/tools/test_voice_mode.py`	34 tests — recorder, silence detection, beep, hallucination filter, playback, cleanup
`tests/tools/test_transcription_tools.py`	12 tests — multi-provider STT, model correction
`tests/tools/test_voice_cli_integration.py`	14 tests — markdown stripping, command parsing, thread safety

Test Plan

Closes #314

teknium1 · 2026-03-09T06:04:21Z

Really cool feature concept — voice mode would be a great addition to Hermes. A few concerns about cross-platform/environment compatibility before this could be merged:

Audio dependency fragility (sounddevice + PortAudio):

SSH sessions — no audio device available, PortAudio will crash/error on import
WSL2 — no native audio subsystem; needs a PulseAudio bridge to Windows that most users won't have
PuTTY / headless servers / Docker containers — no audio devices, sd.InputStream() and sd.play() will throw
No mic plugged in — PortAudio can fail even on desktop machines without input devices
This is the biggest concern — Hermes runs in all of these environments daily. If sounddevice is imported at module level or eagerly, it would break Hermes for anyone not on a local desktop with audio hardware

Key binding conflict:

Ctrl+R is the standard reverse-history-search binding in readline/prompt_toolkit. Overriding it would break muscle memory for a lot of CLI users. This should be configurable, or use a different binding

Core agent loop changes:

The PR modifies _interruptible_api_call() in run_agent.py to add streaming support. Changing the core agent loop for a feature that only works in specific environments is risky — any bug here affects every user, not just voice mode users

What we'd need to feel comfortable merging:

Fully lazy imports — sounddevice, numpy, elevenlabs must never be imported until voice mode is explicitly activated. Any import failure should be caught and surface a friendly message, not crash
Graceful degradation at every audio touchpoint — every call to sd.play(), sd.InputStream(), sd.OutputStream() needs to be wrapped so it fails silently or with a warning in non-audio environments
No core agent loop changes — the streaming path should be implemented without modifying _interruptible_api_call(). Consider wrapping the response after the fact rather than changing how the API call works
Configurable key binding — don't override Ctrl+R by default
Environment detection — auto-disable voice features when running over SSH, in containers, or without audio devices, rather than crashing

The feature itself is genuinely exciting — just needs to be bulletproof for the environments Hermes runs in. Happy to review again once these are addressed!

0xbyt4 · 2026-03-09T10:16:09Z

hi @teknium1 thank you for review and addressed issues solved:

Lazy imports — sounddevice, numpy, elevenlabs, edge_tts, openai are never imported at module level. Each has a lazy helper
(_import_audio(), _import_edge_tts(), etc.) called only when voice mode is activated.
Graceful degradation — Every audio call (sd.InputStream, sd.play, sd.OutputStream, sd.stop, sd.query_devices) is wrapped in try-except
with friendly error messages.
No core agent loop changes — _interruptible_api_call() is untouched. Streaming lives in a separate _streaming_api_call() method that only
runs when voice TTS is active.
Configurable key binding — Default is Ctrl+B (not Ctrl+R). Configurable via voice.record_key in config.yaml.
Environment detection — detect_audio_environment() auto-detects SSH, Docker, WSL, and missing audio devices. Voice features are disabled with
warnings instead of crashing.

teknium1 · 2026-03-10T00:17:10Z

Thanks for addressing the original feedback — lazy imports, graceful degradation, Ctrl+B default, and environment detection all look good. A few remaining issues before we can merge:

Bug A: Streaming TTS can never activate (stale imports)

cli.py:3447-3448 imports _HAS_ELEVENLABS and _HAS_AUDIO from tools.tts_tool:

from tools.tts_tool import (
    _load_tts_config as _load_tts_cfg,
    _get_provider as _get_prov,
    _HAS_ELEVENLABS as _el_ok,
    _HAS_AUDIO as _audio_ok,
    stream_tts_to_speaker,
)

These module-level booleans no longer exist — they were removed when you switched to lazy import functions (_import_elevenlabs(), etc.). The try/except Exception: pass wrapper silently swallows the ImportError, so use_streaming_tts is always False. The entire Phase 5 streaming TTS pipeline is dead code right now.

Fix: Replace the boolean checks with calls to the lazy import helpers, e.g.:

try:
    from tools.tts_tool import (
        _load_tts_config as _load_tts_cfg,
        _get_provider as _get_prov,
        _import_elevenlabs,
        _import_sounddevice,
        stream_tts_to_speaker,
    )
    _tts_cfg = _load_tts_cfg()
    _el_ok = False
    _audio_ok = False
    try:
        _import_elevenlabs()
        _el_ok = True
    except ImportError:
        pass
    try:
        _import_sounddevice()
        _audio_ok = True
    except (ImportError, OSError):
        pass
    if _get_prov(_tts_cfg) == "elevenlabs" and _el_ok and _audio_ok:
        use_streaming_tts = True
except Exception:
    pass

Bug B: Voice mode system prompt is a no-op

_enable_voice_mode() appends the "[Voice mode active] keep responses concise..." instruction to self.system_prompt on HermesCLI. But the agent's ephemeral_system_prompt is set once during _init_agent() and is never re-read from the CLI object. Changing self.system_prompt mid-session has no effect on the agent's behavior — the concise-response instruction never reaches the model.

Important: Even if you fix the propagation, modifying the system prompt mid-conversation would break prompt caching (the cache prefix becomes invalid). This is a core policy — see AGENTS.md.

Suggested fix: Instead of modifying the system prompt, inject the voice mode instruction as a user message prefix when voice input is submitted. Something like prepending [Voice input] to the transcribed text, or adding a brief instruction in the user message itself. This keeps the system prompt stable and avoids cache invalidation.

Bug C: Branch needs rebase

The branch is ~20 commits behind main (merge base is c21d77c). Please rebase onto current main before resubmitting.

Minor concern: _vprint suppresses error messages

The _vprint() changes in run_agent.py suppress ALL console output when streaming TTS is active — including API error messages, retry info, and context limit warnings. Consider only suppressing informational/progress prints, not error-level messages. For example, keep direct print() for lines containing ❌ or ⚠️ error conditions.

0xbyt4 · 2026-03-10T19:59:47Z

Update: Telegram Gateway Voice Mode + Critical Bug Fix

Bug Fix: `_keep_typing` session deadlock (`c785253`)

Found and fixed a critical bug in BasePlatformAdapter._process_message_background():

_keep_typing() was called with metadata=_thread_metadata but didn't accept that parameter
The TypeError crashed before the try-finally block, so _active_sessions was never cleaned up
Every subsequent message saw the session as "active" and went into the interrupt path — effectively deadlocking the entire chat
Fix: added metadata parameter to _keep_typing(), send_typing() base class, and SignalAdapter.send_typing()

Feature: `/voice` command for Telegram gateway (`3f63462`)

Auto voice reply mode for the Telegram bot:

/voice on — voice reply only when user sends voice messages
/voice tts — voice reply to all messages (text + voice)
/voice off — disable, text-only replies
/voice status — show current mode
/voice (no args) — toggle on/off
Per-chat state persisted to ~/.hermes/gateway_voice_mode.json
Dedup: skips auto-reply if agent already called text_to_speech tool
drop_pending_updates=True added to ignore stale Telegram messages on restart
25 tests, all passing (518 total gateway tests, 0 regressions)

0xbyt4 · 2026-03-10T20:37:57Z

## Update: Discord Voice Mode + Cross-Platform Fix

Discord `/voice` slash command (`d79a8e6`)

Registered /voice as a Discord slash command with dropdown choices (on, tts, off, status)
Same voice reply logic as Telegram — no code duplication

Cross-platform `send_voice` fix

_send_voice_reply() was passing metadata= kwarg to all adapters, but Discord's send_voice() doesn't accept it
Now inspects the adapter method signature at runtime and only passes metadata if supported
Works correctly on Telegram (metadata supported), Discord (metadata skipped), and Slack (metadata supported)

TTS provider note

ElevenLabs free tier blocks requests through VPN — switched to edge-tts (free, no API key, no VPN issues) as a fallback provider

0xbyt4 · 2026-03-11T12:53:37Z

## Update: Discord Voice Channel Support + Documentation

Phase 1: Bot joins VC and speaks replies (`f83b1f4`)

/voice join — bot joins the user's current voice channel
/voice channel — alias for join
/voice leave — bot disconnects from VC
TTS replies are played directly in the voice channel via Opus encoding
Echo prevention: audio listener pauses while bot is speaking
Only DISCORD_ALLOWED_USERS can interact via voice

Phase 2: Bot listens in VC — full STT pipeline (`a5a0ded`)

Complete voice-to-voice loop: user speaks in VC → STT → agent → TTS → VC playback

VoiceReceiver class captures per-user RTP audio packets
Decrypts NaCl transport encryption + DAVE E2E encryption
Per-user Opus decoders (48kHz stereo → PCM)
Silence detection: 1.5s silence after 0.5s speech triggers processing
PCM → 16kHz mono WAV conversion via ffmpeg
Whisper STT transcription (Groq or OpenAI)
Transcripts appear in text channel: [Voice] @user: what they said
Agent response sent as text AND spoken in VC

Bug fixes during integration:

Adapter dict key: "discord" → Platform.DISCORD enum
Local import shadowing top-level Platform causing UnboundLocalError
Synthetic voice events missing raw_message.guild_id for _get_guild_id()

Documentation (`1175f16`, `44d661f`)

New comprehensive voice mode doc: website/docs/user-guide/features/voice-mode.md

Prerequisites — hermes install, LLM config, first run
CLI Voice Mode — hermes startup, /voice commands, Ctrl+B flow, silence detection, streaming TTS, hallucination filter
Gateway Voice Reply — Telegram & Discord /voice commands, modes, platform delivery formats
Discord Voice Channels — full setup guide (bot permissions with OAuth2 URL, privileged intents, opus codec, env vars), commands, 10-step pipeline explanation, text channel integration, echo prevention, access control
Configuration Reference — config.yaml, env vars, STT/TTS provider comparisons
Troubleshooting — common issues and fixes

Rebase

Rebased onto latest main (75 commits), resolved 4 conflict areas (commands registry, test expected commands, run.py gateway commands + voice reply). All tests passing.

0xbyt4 · 2026-03-11T19:20:20Z

## Update: Web UI Gateway + Double TTS Fixes

Web Gateway — Browser-based Chat UI

Full-featured browser chat interface accessible from any device on the network:

WebSocket-based real-time messaging over ws://
Token authentication — configurable via WEB_UI_TOKEN env var
Voice conversation — browser mic recording with VAD silence detection
Invisible TTS playback — audio plays without chat bubble
Futuristic UI — glassmorphism design, purple theme, glow effects
Media support — images, voice bubbles with waveform player
/remote-control command — start Web UI on demand from any platform
LAN access — auto-detects local IPs, shows all access URLs on startup
Toolset — hermes-web registered with full tool access

Double TTS Prevention

Fixed duplicate audio playback across all platforms. Two independent TTS paths were firing for the same message:

Base adapter auto-TTS (play_tts) — for voice input messages
Gateway runner _send_voice_reply — for voice mode enabled chats

Fixes:

send_voice(**kwargs) — Discord and Slack adapters now accept extra keyword arguments
skip_double guard — runner skips voice reply for voice input (base already handled it)
Discord VC exception — when bot is in voice channel, runner handles VC playback directly
Discord play_tts override — skips file attachment when connected to voice channel

Platform	Voice Input	Text + `/voice tts`	Discord VC
Base auto-TTS	fires	skip	skip (VC override)
Runner voice reply	skip	fires	fires (VC playback)
Result	1 audio	1 audio	1 audio (VC)

Documentation Updates

Discord DMs — DM vs server channel interaction, @mention requirement, DISCORD_REQUIRE_MENTION config
macOS firewall — allow Python through firewall for LAN access
Mobile HTTPS — mic requires HTTPS on mobile; documented workarounds (Android Chrome flag, mkcert, Caddy, SSH tunnel)

Tests

32 tests for Web adapter (config, auth, messaging, media, LAN IP)
32 tests for voice command (full platform x input x mode matrix, Discord VC skip, Web play_audio)
3489 tests passing, 0 failures

teknium1 · 2026-03-13T08:54:24Z

Code Review — Round 3

First off, really impressive work here — the scope and quality of the CLI voice integration, streaming TTS pipeline, and Discord VC implementation show real engineering skill. The thread safety patterns, lazy imports, and graceful degradation are all well done. Bugs A and B from the previous review are properly fixed with regression tests. 👏

That said, there are several issues that need to be addressed before this can merge:

🔴 Blocking Issues

1. Rebase Regression: _interruptible_api_call lost Anthropic interrupt support

During the rebase, the Anthropic-aware interrupt handler was accidentally moved FROM _interruptible_api_call INTO _streaming_api_call. The result is that _interruptible_api_call now has a simplified handler that:

Calls self.client.close() (OpenAI only)
Rebuilds self.client = OpenAI(...)
Never checks for api_mode == "anthropic_messages"
Never closes self._anthropic_client

This affects ALL Anthropic users on interrupt (not just voice users). On interrupt: wrong client closed, wrong client rebuilt, token generation continues on the Anthropic side. This is a critical regression.

2. Web Gateway — Path Traversal in File Uploads

The upload handler uses the user-supplied filename without sanitization:

orig_name = field.filename or "file"
filename = f"upload_{uuid.uuid4().hex[:8]}_{orig_name}"
dest = self._media_dir / filename

If orig_name contains ../, files can be written outside media_dir. Fix: use Path(orig_name).name or os.path.basename().

3. Web Gateway — Unauthenticated /media Route

The static file serving via aiohttp.add_static() has no auth check. Anyone on the network can access uploaded files, voice recordings, and images by guessing the UUID-prefixed filenames. Media should be served through an authenticated handler.

4. Web Gateway — XSS via innerHTML

Bot messages are rendered via marked.parse() + innerHTML without sanitization. If the LLM response contains HTML/JS, it executes in the browser. Needs DOMPurify or equivalent before inserting into the DOM.

5. Web Gateway — Token Exposed via /remote-control

The /remote-control slash command echoes the auth token into the chat response. In Discord servers or group chats, any participant who can read the channel gets full web UI access.

6. Branch is 358 commits behind main

The merge base is c21d77c. run_agent.py, cli.py, gateway/, and discord.py have all changed significantly since then. This is practically unmergeable as-is — even the rebase that was done introduced the Anthropic regression in issue #1 above.

🟡 Should-Fix

Web server binds 0.0.0.0 by default over plaintext HTTP — tokens are sniffable on LAN. Should default to 127.0.0.1 with explicit opt-in for LAN binding.
Token comparison uses == instead of hmac.compare_digest() — timing side-channel.
Hardcoded macOS Opus path (/opt/homebrew/lib/libopus.dylib) loaded before Linux path — should use ctypes.util.find_library().
_vprint suppresses some interrupt confirmation messages (3 instances of "interrupt detected during retry" lack force=True), so users get no feedback their interrupt was processed during voice mode.
Discord VC debug logging — first 5-10 RTP packets log raw hex at INFO level, should be DEBUG.
_keep_typing fix: the PR description mentions a critical _keep_typing deadlock fix, but no changes to _keep_typing or its lock/session coordination appear in the diff.
All web sessions share chat_id="web" — no per-user session isolation; multiple simultaneous web users would share conversation context.

🟢 What's Done Well

CLI voice integration is excellent — proper thread safety with _voice_lock, key bindings dispatched to daemon threads, atomic guards against double-start/stop, real-time audio level bar, continuous mode with 3-strike safety valve
Streaming TTS pipeline is well-architected — queue-based sentence buffering, dual cleanup paths, think-block filtering, graceful ElevenLabs fallback
Import safety is clean — zero module-level audio imports, everything lazily loaded
Discord VC implementation is solid — VoiceReceiver with proper lifecycle, echo prevention, inactivity auto-disconnect
Bug A and B fixes are clean with regression tests
browser_tool.py signal handler fix is a legitimate improvement
60+ new tests with good coverage

Path Forward

Given the 358-commit gap and the security issues in the web gateway, a full rebase would be very challenging. One option would be to split this into smaller, focused PRs:

CLI Voice Mode (voice_mode.py, transcription_tools.py, tts_tool.py, cli.py voice integration, config) — this is the strongest part and closest to merge-ready
Gateway Voice Reply (Telegram + Discord /voice command) — relatively self-contained
Discord Voice Channels — separate feature, can stand alone
Web UI Gateway — needs the security fixes and is effectively a new subsystem

This would make each piece easier to rebase, review, and merge independently. Happy to help with any of this!

0xbyt4 · 2026-03-13T14:36:37Z

Round 3 Review — All Issues Addressed

Thanks for the thorough review @teknium1 All blocking and should-fix items have been resolved.

Blocking Issues (6/6 Fixed)

#	Issue	Fix	Commit
1	`_interruptible_api_call` lost Anthropic interrupt support	Restored Anthropic-aware handler: checks `api_mode == "anthropic_messages"`, closes `_anthropic_client`, rebuilds both clients	`45baa4f`
2	Path traversal in file uploads	`Path(field.filename).name` strips `../` sequences	`aed9e28`
3	Unauthenticated `/media` route	Replaced `add_static()` with authenticated `_handle_media` handler — requires `?token=` query param, validates with `hmac.compare_digest`, applies `Path(filename).name` sanitization	`aed9e28`
4	XSS via `marked.parse()` + `innerHTML`	Added DOMPurify CDN — all bot message HTML is sanitized via `DOMPurify.sanitize(marked.parse(text))`	`aed9e28`
5	Token exposed via `/remote-control`	Token only shown in DM; group chats show "(hidden — check DM)"	`aed9e28`
6	Branch 358 commits behind main	Fully rebased onto current main	rebased

Should-Fix Items (7/7 Fixed)

#	Issue	Fix	Commit
1	Web server binds `0.0.0.0` by default	Default changed to `127.0.0.1` in config, adapter, and `/remote-control`. Startup message shows only reachable URLs with hint for LAN opt-in	`aed9e28`, `327f881`
2	Token comparison uses `==`	All 3 token checks replaced with `hmac.compare_digest()`	`aed9e28`
3	Hardcoded macOS Opus path	Primary: `ctypes.util.find_library("opus")`. Fallback: Homebrew paths on macOS only, guarded by `sys.platform == "darwin"` and `os.path.isfile()`	`9e91937`
4	`_vprint` suppresses interrupt messages	Added `force=True` to all 5 interrupt confirmation `_vprint` calls	`45baa4f`
5	RTP packet logging at INFO level	Demoted raw UDP, non-RTP skip, and RTP packet logs to `logger.debug`. SPEAKING events remain at INFO	`c32fc7e`
6	`_keep_typing` deadlock fix not in diff	Already in branch — `metadata` param added to `_keep_typing()`, `send_typing()` base class, and `SignalAdapter.send_typing()`	pre-existing
7	All web sessions share `chat_id="web"`	Changed to `chat_id=f"web_{session_id}"` — each WebSocket connection gets isolated conversation context	`c32fc7e`

Additional Fixes (this session)

Gateway shutdown crash: RuntimeError: dictionary changed size during iteration in stop() — iterate over list(self.adapters.items()) copy (9e91937)
Web UI token exposure in logs: Configured tokens are no longer printed to console; only auto-generated tokens are shown (0c87dfa)
Empty WEB_UI_HOST env var: Falls back to 127.0.0.1 instead of binding to empty string (5fdf0e3)
Web UI env vars missing from docs: Added WEB_UI_ENABLED, WEB_UI_PORT, WEB_UI_HOST, WEB_UI_TOKEN to environment-variables.md reference (7936d33)

Test Coverage

All fixes have corresponding tests:

196 tests in test_web.py + test_discord_opus.py + test_run_agent.py — all passing
TestPathTraversalSanitization (3) — Path.name strips traversal, upload produces safe filename
TestMediaEndpointAuth (4) — 401 without/wrong token, 200 with valid token, traversal blocked
TestHmacTokenComparison (2) — no ==/!= for token, hmac.compare_digest present
TestDomPurifyPresent (2) — DOMPurify script tag, sanitize(marked.parse()) pattern
TestDefaultBindLocalhost (2) — adapter and config default to 127.0.0.1
TestRemoteControlTokenHiding (2) — token visible in DM, hidden in group
TestVpnAndMultiInterfaceIp (7) — LAN preferred over VPN, fallbacks, loopback filtering
TestStartupTokenExposure (4) — auto-generated flag, configured token hidden
TestOpusFindLibrary (3) — find_library first, Homebrew fallback conditional, decode errors logged
TestInterruptVprintForceTrue (1) — all interrupt _vprint calls have force=True
TestAnthropicInterruptHandler (3) — Anthropic branch present, client rebuilt

Rebase

Rebased onto latest main
Resolved 9 conflict files (slack.py, toolsets.py, pyproject.toml, base.py, config.py, run.py, test_run_agent.py, cli.py, run_agent.py)
Verified all main changes preserved: parallel tool execution, Honcho manager params, Anthropic adapter, secret state

PR Splitting

Considered splitting into 4 PRs as suggested, but decided against it , the features are tightly coupled:

Gateway voice reply depends on CLI voice/TTS infrastructure
Discord VC reuses the same STT/TTS pipeline and voice_mode state
Web UI shares the gateway voice reply system and media handling

Splitting would mean duplicating shared code or creating artificial boundaries. All commits are already logically grouped and the rebase is clean.

teknium1 · 2026-03-14T04:04:03Z

Review — Round 4

Great work addressing all the Round 3 feedback — the security fixes, Anthropic interrupt handler, and overall code quality are solid. The lazy imports, thread safety, and streaming TTS architecture are genuinely well-engineered.

However, we need to make some scope changes before this can merge:

Web UI Gateway — Please Remove

We are building our own official chat UI and dashboard for Hermes Agent. We cannot accept the web gateway (gateway/platforms/web.py, Platform.WEB, /remote-control command, hermes-web toolset) in this PR.

We should have been clearer about this in Round 3 — we suggested splitting the web UI into a separate PR but then gave detailed security feedback on it, which sent mixed signals. Apologies for that.

Please remove all web UI related code from this PR:

gateway/platforms/web.py
tests/gateway/test_web.py
website/docs/user-guide/messaging/web.md
Platform.WEB enum addition in gateway/config.py
hermes-web toolset in toolsets.py
/remote-control command in gateway/run.py
WEB_UI_* env var handling in gateway/config.py
Any web-related imports/references in gateway/run.py

If you want the web UI considered separately, feel free to open a new PR for it — but it will likely conflict with our own UI plans.

Remaining Issues to Fix (voice mode code)

With the web UI removed, these items remain:

1. sd.wait() in play_audio_file() can hang forever (tools/voice_mode.py)
Your play_beep() correctly avoids sd.wait() with a polling loop + 2s timeout (and the comments even explain why). But play_audio_file() still uses sd.wait(), which can block indefinitely if the audio device stalls. Please use the same polling pattern for consistency.

2. transcription_tools.py imports faster_whisper at module level
voice_mode.py is fully lazy (excellent work there), but transcription_tools.py does a module-level try: from faster_whisper import WhisperModel that runs at import time. If faster_whisper triggers a heavy native library load or crashes, it affects all code that imports the module. Please use the same lazy import pattern.

3. inspect.signature() in _send_voice_reply (gateway/run.py)
Checking if adapter.send_voice supports metadata by inspecting its signature at each call is fragile. Please use **kwargs pattern instead, or just ensure all adapters accept metadata (which they should after PR #1178 fixed Discord's signatures).

4. Unrelated SessionResetPolicy null-handling fix
The bugfix in gateway/config.py for SessionResetPolicy null handling is unrelated to voice mode. Please either remove it from this PR (we can merge it separately as a one-liner) or at minimum make it a separate commit so it's clear in git history.

Once these are addressed and the web UI code is removed, this should be ready to merge. The CLI voice mode, gateway voice reply, and Discord VC features are strong work. 🎉

0xbyt4 · 2026-03-14T06:22:24Z

Thank you @teknium1 for reviewing !

Round 4 — All Issues Addressed

Web UI — Removed

Completely removed all web UI code from the PR:

gateway/platforms/web.py, tests/gateway/test_web.py, website/docs/user-guide/messaging/web.md deleted
Platform.WEB enum, WEB_UI_* env handling, /remote-control command, hermes-web toolset removed
All references cleaned from docs (index.md, voice-mode.md, environment-variables.md, .env.example)
Session loading made resilient to removed platform values (skips unknown entries instead of crashing)

Fix 1: `sd.wait()` hang in `play_audio_file()`

Replaced with polling pattern + timeout, consistent with play_beep() which already had this fix with a comment explaining why sd.wait() is unsafe.

Fix 2: `faster_whisper` module-level import

Changed to importlib.util.find_spec() for availability checks — no module loading at import time. Actual from faster_whisper import WhisperModel and from openai import OpenAI now happen inside the transcription functions only when needed.

Fix 3: `inspect.signature()` in `_send_voice_reply`

Removed the inspect.signature() hack. Added **kwargs to TelegramAdapter.send_voice() — all adapters now uniformly accept metadata.

Fix 4: `SessionResetPolicy` null-handling

This fix is already in main (PR #1194). Not present in our diff against main — no action needed.

Rebase

Rebased onto latest main (23 new commits). Resolved 3 conflict areas in docs. All tests passing locally (3824 passed).

teknium1 · 2026-03-14T07:03:59Z

Thanks — this is much closer now, and removing the web UI scope was the right call. I re-reviewed the current branch against main and there are still a few required fixes before we can merge:

run_agent.py: stream_callback is still OpenAI-chat-only

run_conversation() routes any non-None stream_callback into _streaming_api_call().
_streaming_api_call() still unconditionally calls self.client.chat.completions.create(..., stream=True).
In anthropic_messages mode, self.client is None, so this still breaks for Anthropic.
It also skips the normal provider-specific streaming paths.

Required fix:

Either gate streaming TTS to providers that actually support the current _streaming_api_call() implementation, or implement provider-correct streaming for Anthropic/Codex there.
Also preserve Anthropic base_url whenever rebuilding the client after interrupt/fallback. The constructor passes base_url into build_anthropic_client(...), but the interrupt/fallback rebuild paths currently drop it.

Discord VC synthetic events are still keyed/authenticated like DMs

_handle_voice_channel_input() posts the transcript into the text channel before gateway auth.
It then constructs a synthetic SessionSource without chat_type / server-channel context, so it falls back to the default chat_type="dm".
build_session_key() then collapses those into the shared Discord DM session key instead of a server/channel-scoped session.
That can cause session/context bleed, and unauthorized VC users can still get transcript text echoed publicly before the normal auth flow runs.

Required fix:

Build the synthetic VC source with the correct server/channel context (not DM defaults).
Run authorization before echoing the transcript publicly.
Make sure VC traffic cannot fall into the DM pairing path.
When you do echo transcript text, do not send raw mentionable content directly.

The CLI voice prefix is not actually turn-local

In cli.py, the voice path prepends the concise-response instruction to agent_message.
That prefixed message is then persisted by run_conversation() and written back into self.conversation_history.
The comment says the original history stays clean, but the current flow does not keep it clean.

Required fix:

Keep the voice instruction API-call-local only.
Do not let the synthetic [Voice input ...] prefix get persisted into conversation history / session DB / resumed sessions.

/voice off still disagrees with runtime behavior

The command/status/docs say off means text-only.
But the base adapter still auto-generates TTS for voice inputs unconditionally.

Required fix:

Either make off truly text-only, or change the product semantics/docs/tests to match the intended behavior.
We should not merge while the user-facing contract and actual behavior disagree.

Once those are fixed, this looks very close. The core CLI voice work is strong — just need these last correctness issues cleaned up.

- Patch WEB_UI_HOST in test_web_defaults to avoid env leak - Handle empty WEB_UI_HOST string in config (fall back to 127.0.0.1)

- Change RTP packet logging from INFO to DEBUG level to reduce noise (SPEAKING events remain at INFO as they are important lifecycle events) - Use per-session chat_id (web_{session_id}) instead of shared "web" to isolate conversation context between simultaneous web users

Merge main's faster-whisper (local, free) with our Groq support into a unified three-provider STT pipeline: local > groq > openai. Provider priority ensures free options are tried first. Each provider has its own transcriber function with model auto-correction, env- overridable endpoints, and proper error handling. 74 tests cover the full provider matrix, fallback chains, model correction, config loading, validation edge cases, and dispatch.

Voice status was hardcoded to check API keys only. Now uses the actual provider resolution (local/groq/openai) so it correctly shows "local faster-whisper" when installed instead of "Groq" or "MISSING".

Move stream close outside the lock in shutdown() to prevent deadlock when audio callback tries to acquire the same lock. Replace single t.join(timeout) with a polling loop (0.1s intervals) so KeyboardInterrupt is not blocked during stream cleanup.

…ider key - web.py: pass stt_model from config like discord.py and run.py do - run.py: match new error messages (No STT provider / not set) - _transcribe_local: add missing "provider": "local" to return dict

…tate into agent context

…rface issues Remove web UI gateway (web.py, tests, docs, toolset, env vars, Platform.WEB enum) per maintainer request — Nous is building their own official chat UI. Fix 1: Replace sd.wait() with polling pattern in play_audio_file() to prevent indefinite hang when audio device stalls (consistent with play_beep()). Fix 2: Use importlib.util.find_spec() for faster_whisper/openai availability checks instead of module-level imports that trigger heavy native library loading (CUDA/cuDNN) at import time. Fix 3: Remove inspect.signature() hack in _send_voice_reply() — add **kwargs to Telegram send_voice() so all adapters accept metadata uniformly. Fix 4: Make session loading resilient to removed platform enum values — skip entries with unknown platforms instead of crashing the entire gateway.

@here

…efix, auto-TTS control 1. Gate _streaming_api_call to chat_completions mode only — Anthropic and Codex fall back to _interruptible_api_call. Preserve Anthropic base_url across all client rebuild paths (interrupt, fallback, 401 refresh). 2. Discord VC synthetic events now use chat_type="channel" instead of defaulting to "dm" — prevents session bleed into DM context. Authorization runs before echoing transcript. Sanitize @everyone/@here in voice transcripts. 3. CLI voice prefix ("[Voice input...]") is now API-call-local only — stripped from returned history so it never persists to session DB or resumed sessions. 4. /voice off now disables base adapter auto-TTS via _auto_tts_disabled_chats set — voice input no longer triggers TTS when voice mode is off.

…response The mock's app_commands SimpleNamespace lacked choices and Choice attrs, causing xdist test ordering failures when this mock loaded before test_discord_slash_commands.

1. Anthropic + ElevenLabs TTS silence: forward full response to TTS callback for non-streaming providers (choices first, then native content blocks fallback). 2. Subprocess timeout kill: play_audio_file now kills the process on TimeoutExpired instead of leaving zombie processes. 3. Discord disconnect cleanup: leave all voice channels before closing the client to prevent leaked state. 4. Audio stream leak: close InputStream if stream.start() fails. 5. Race condition: read/write _on_silence_stop under lock in audio callback thread. 6. _vprint force=True: show API error, retry, and truncation messages even during streaming TTS. 7. _refresh_level lock: read _voice_recording under _voice_lock.

The rebase added voice prompt checks to _get_tui_prompt_fragments but the test stub was missing _voice_recording, _voice_processing and _voice_mode attributes, causing AttributeError.

fix: salvage PR #327 voice mode onto current main

Ported the proven UI and voice logic from the original Web UI (PR NousResearch#327) adapted for the REST API transport: UI: - Glassmorphism theme (purple accent, grid background, glass effects) - Centered chat container with desktop borders - Voice waveform bubble player with seek and progress bars - Markdown rendering with syntax highlighting (marked.js + highlight.js) - Message animations, typing indicator, auto-scroll - Mobile responsive design Voice mode (from old VAD implementation): - Press mic to enter voice mode (input bar hides, big mic shows) - VAD silence detection (AnalyserNode, 1.5s silence auto-sends) - TTS response plays invisibly, then auto-restarts recording - Echo prevention: stop recording during TTS playback - Press mic again to exit voice mode - echoCancellation + noiseSuppression on getUserMedia Adapted for REST API (was WebSocket): - ws.send({type:'message'}) -> POST /v1/chat - ws.send({type:'voice', b64}) -> POST /v1/chat/voice (FormData) - play_audio event -> response.media[].url - File upload via POST /v1/upload

andrueandersoncs · 2026-04-01T08:45:32Z

Implementation Complete ✅

Changes Made

Celebration & Animation:

Added confetti animation on first-run profile save using canvas-confetti
Respects prefers-reduced-motion for accessibility
Dual confetti bursts from left and right with green/gold color scheme

Enhanced Success Alert:

New gradient background (green to emerald) with left border accent
Party popper icon with bounce animation
"Welcome to Vantage!" headline (was: "Your profile is set...")
Clearer copy: "Your AI manager is ready to create personalized meal and training plans"
Primary CTA: "Build my first week" → links to /weekly-plan (was: "Review Weekly Plan")
Secondary CTA: "Go to Today" → links to /

First-Run Form Header:

Sparkles icon with amber gradient background
"Welcome to Vantage" title (was: "Your baseline profile")
Descriptive subtitle: "Tell us a bit about yourself so your AI manager can create personalized plans"

Behavior Changes:

First-run saves no longer auto-redirect (users see the celebration)
Form title changes to "Your profile is saved" after save (behind the alert)
Edit mode unchanged (shows "Edit profile" title)

Files Modified:

components/profile-form.tsx - UI enhancements and confetti integration
components/profile-form.test.tsx - Updated test assertions
components/profile-screen.test.tsx - Updated test assertions

Verification

✅ All 36 profile-related tests passing
✅ Build compiles successfully
✅ Confetti respects reduced-motion preferences

Deployed to Railway: https://vantage-production-b8d9.up.railway.app

Merge contributor branch feature/voice-mode onto current main for follow-up fixes.

…f5fb1d3b fix: salvage PR NousResearch#327 voice mode onto current main

Merge contributor branch feature/voice-mode onto current main for follow-up fixes.

…f5fb1d3b fix: salvage PR NousResearch#327 voice mode onto current main

Merge contributor branch feature/voice-mode onto current main for follow-up fixes.

…f5fb1d3b fix: salvage PR NousResearch#327 voice mode onto current main

Merge contributor branch feature/voice-mode onto current main for follow-up fixes.

…f5fb1d3b fix: salvage PR NousResearch#327 voice mode onto current main

0xbyt4 changed the title ~~feat: Voice Mode for CLI — Speech Input/Output (Issue #314)~~ feat: Voice Mode for CLI — Speech Input/Output + Streaming TTS (Issue #314) Mar 3, 2026

0xbyt4 force-pushed the feature/voice-mode branch 2 times, most recently from e03e481 to b16527c Compare March 10, 2026 15:22

0xbyt4 force-pushed the feature/voice-mode branch 3 times, most recently from 11d431f to 44d661f Compare March 11, 2026 12:50

0xbyt4 force-pushed the feature/voice-mode branch 2 times, most recently from dfb8595 to deaf36e Compare March 11, 2026 17:37

0xbyt4 changed the title ~~feat: Voice Mode for CLI — Speech Input/Output + Streaming TTS (Issue #314)~~ feat: Voice Mode — CLI, Telegram, Discord (text + VC), and Web UI with full voice support (Issue #314) Mar 11, 2026

0xbyt4 force-pushed the feature/voice-mode branch 4 times, most recently from fd10f94 to 9837473 Compare March 13, 2026 01:08

0xbyt4 force-pushed the feature/voice-mode branch from 9837473 to 5fdf0e3 Compare March 13, 2026 14:26

0xbyt4 force-pushed the feature/voice-mode branch 2 times, most recently from 522494e to d9df64c Compare March 13, 2026 21:03

0xbyt4 force-pushed the feature/voice-mode branch from 0e45dcd to d43ef9c Compare March 14, 2026 06:12

0xbyt4 force-pushed the feature/voice-mode branch from f487cdd to f7b3411 Compare March 14, 2026 07:45

0xbyt4 added 14 commits March 14, 2026 14:27

fix: isolate WEB_UI_HOST env var in test and handle empty string

fa2c825

- Patch WEB_UI_HOST in test_web_defaults to avoid env leak - Handle empty WEB_UI_HOST string in config (fall back to 127.0.0.1)

fix: add explicit metadata param to Discord send_voice signature

eb052b1

fix: update /voice status to show correct STT provider

69cb373

Voice status was hardcoded to check API keys only. Now uses the actual provider resolution (local/groq/openai) so it correctly shows "local faster-whisper" when installed instead of "Groq" or "MISSING".

fix: STT consistency — web.py model param, error matching, local prov…

e3126ae

…ider key - web.py: pass stt_model from config like discord.py and run.py do - run.py: match new error messages (No STT provider / not set) - _transcribe_local: add missing "provider": "local" to return dict

fix: add choices/Choice to discord mock for /voice slash command test

49f3f0f

feat: add voice channel awareness — inject participant and speaking s…

1ad5e0e

…tate into agent context

fix: add missing choices/Choice to discord mock in test_discord_free_…

7a24168

…response The mock's app_commands SimpleNamespace lacked choices and Choice attrs, causing xdist test ordering failures when this mock loaded before test_discord_slash_commands.

fix(test): add missing voice state attrs to CLI stub in skin tests

92c14ec

The rebase added voice prompt checks to _get_tui_prompt_fragments but the test stub was missing _voice_recording, _voice_processing and _voice_mode attributes, causing AttributeError.

0xbyt4 force-pushed the feature/voice-mode branch from a7f86ca to 92c14ec Compare March 14, 2026 12:07

teknium1 mentioned this pull request Mar 14, 2026

fix: salvage PR #327 voice mode onto current main #1299

Merged

teknium1 added a commit that referenced this pull request Mar 14, 2026

Merge pull request #1299 from NousResearch/hermes/hermes-f5fb1d3b

95c0bee

fix: salvage PR #327 voice mode onto current main

teknium1 merged commit 523a1b6 into NousResearch:main Mar 14, 2026
1 check passed

angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 27, 2026

merge: salvage PR NousResearch#327 voice mode branch

5fac070

Merge contributor branch feature/voice-mode onto current main for follow-up fixes.

angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 27, 2026

Merge pull request NousResearch#1299 from NousResearch/hermes/hermes-…

f6ae1cd

…f5fb1d3b fix: salvage PR NousResearch#327 voice mode onto current main

twinkpig mentioned this pull request May 10, 2026

feat(cli): add /voice tts_interrupt to stop TTS on new input #23237

Open

02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026

merge: salvage PR NousResearch#327 voice mode branch

7bfe4b7

Merge contributor branch feature/voice-mode onto current main for follow-up fixes.

02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026

Merge pull request NousResearch#1299 from NousResearch/hermes/hermes-…

e3950be

…f5fb1d3b fix: salvage PR NousResearch#327 voice mode onto current main

olympus-terminal pushed a commit to olympus-terminal/hermes-agent that referenced this pull request May 16, 2026

merge: salvage PR NousResearch#327 voice mode branch

7414c22

Merge contributor branch feature/voice-mode onto current main for follow-up fixes.

olympus-terminal pushed a commit to olympus-terminal/hermes-agent that referenced this pull request May 16, 2026

Merge pull request NousResearch#1299 from NousResearch/hermes/hermes-…

0a18c51

…f5fb1d3b fix: salvage PR NousResearch#327 voice mode onto current main

Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026

merge: salvage PR NousResearch#327 voice mode branch

01f6670

Merge contributor branch feature/voice-mode onto current main for follow-up fixes.

Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026

Merge pull request NousResearch#1299 from NousResearch/hermes/hermes-…

409f473

…f5fb1d3b fix: salvage PR NousResearch#327 voice mode onto current main

Conversation

0xbyt4 commented Mar 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What's New

Phase 2: CLI Voice Input

Phase 3: TTS Response Output

Phase 4: Low-Latency Features

Phase 5: Streaming TTS (ElevenLabs)

Design Decisions

Why Ctrl+R toggle instead of Space-bar hold-to-talk?

Why not streaming STT?

Why only ElevenLabs for streaming TTS?

CoreAudio safety

Quick Usage

Files Changed

Test Plan

Uh oh!

teknium1 commented Mar 9, 2026

Uh oh!

0xbyt4 commented Mar 9, 2026

Uh oh!

teknium1 commented Mar 10, 2026

Bug A: Streaming TTS can never activate (stale imports)

Bug B: Voice mode system prompt is a no-op

Bug C: Branch needs rebase

Minor concern: _vprint suppresses error messages

Uh oh!

0xbyt4 commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Update: Telegram Gateway Voice Mode + Critical Bug Fix

Bug Fix: _keep_typing session deadlock (c785253)

Feature: /voice command for Telegram gateway (3f63462)

Uh oh!

0xbyt4 commented Mar 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Discord /voice slash command (d79a8e6)

Cross-platform send_voice fix

TTS provider note

Uh oh!

0xbyt4 commented Mar 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Phase 1: Bot joins VC and speaks replies (f83b1f4)

Phase 2: Bot listens in VC — full STT pipeline (a5a0ded)

Documentation (1175f16, 44d661f)

Rebase

Uh oh!

0xbyt4 commented Mar 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Web Gateway — Browser-based Chat UI

Double TTS Prevention

Documentation Updates

Tests

Uh oh!

teknium1 commented Mar 13, 2026

Code Review — Round 3

🔴 Blocking Issues

🟡 Should-Fix

🟢 What's Done Well

Path Forward

Uh oh!

0xbyt4 commented Mar 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Round 3 Review — All Issues Addressed

Blocking Issues (6/6 Fixed)

Should-Fix Items (7/7 Fixed)

Additional Fixes (this session)

Test Coverage

Rebase

PR Splitting

Uh oh!

teknium1 commented Mar 14, 2026

Review — Round 4

Web UI Gateway — Please Remove

Remaining Issues to Fix (voice mode code)

Uh oh!

0xbyt4 commented Mar 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Round 4 — All Issues Addressed

Web UI — Removed

0xbyt4 commented Mar 3, 2026 •

edited

Loading

0xbyt4 commented Mar 10, 2026 •

edited

Loading

Bug Fix: `_keep_typing` session deadlock (`c785253`)

Feature: `/voice` command for Telegram gateway (`3f63462`)

0xbyt4 commented Mar 10, 2026 •

edited

Loading

Discord `/voice` slash command (`d79a8e6`)

Cross-platform `send_voice` fix

0xbyt4 commented Mar 11, 2026 •

edited

Loading

Phase 1: Bot joins VC and speaks replies (`f83b1f4`)

Phase 2: Bot listens in VC — full STT pipeline (`a5a0ded`)

Documentation (`1175f16`, `44d661f`)

0xbyt4 commented Mar 11, 2026 •

edited

Loading

0xbyt4 commented Mar 13, 2026 •

edited

Loading

0xbyt4 commented Mar 14, 2026 •

edited

Loading

Fix 1: `sd.wait()` hang in `play_audio_file()`

Fix 2: `faster_whisper` module-level import

Fix 3: `inspect.signature()` in `_send_voice_reply`

Fix 4: `SessionResetPolicy` null-handling