feat(tts): add local_command provider by versun · Pull Request #17211 · NousResearch/hermes-agent

versun · 2026-04-29T01:58:32Z

What does this PR do?

Adds tts.provider: local_command, a generic bridge for running a user-configured local text-to-speech command from Hermes.

The provider lets users keep engine-specific dependencies outside Hermes core while still using local or experimental TTS engines such as Piper, VoxCPM, Qwen/MLX wrappers, or any script that can read text from a file and write audio to a file.

This is intentionally a small Phase 1 bridge, not a full TTS plugin registry. It gives Hermes a stable local command path now while leaving room for a future register_tts_provider(...) plugin API.

Related Issue

Related to #11688 and #8508.

Fixes: N/A

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
🔒 Security fix
📝 Documentation update
✅ Tests (adding or improving test coverage)
♻️ Refactor (no behavior change)
🎯 New skill (bundled or hub)

Changes Made

Local Command TTS

Adds local_command as a TTS provider in tools/tts_tool.py.
Requires an explicit tts.local_command.command config value.
Writes TTS input to a temporary UTF-8 text file.
Supports {input_path}, {text_path}, {output_path}, {format}, {voice}, {model}, and {speed} placeholders.
Preserves literal braces in command templates and safely quotes placeholder values for their shell context.
Supports mp3, wav, ogg, and flac output via output_format / format.
Validates that the command produced a non-empty output file.
Adds configurable timeout handling and process-tree cleanup.
Adds a voice_compatible opt-in for voice-bubble delivery, including ffmpeg conversion to Opus/OGG when needed.
Enforces the local_command max text length through the existing TTS length resolver.
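
To make the placeholder behavior above concrete, here is a minimal sketch of template rendering. It is not the code from tools/tts_tool.py: the names render_command and PLACEHOLDERS are hypothetical, and it assumes every placeholder sits in a bare (unquoted) shell position, whereas the PR describes quoting that adapts to the surrounding shell context.

```python
import re
import shlex

# Placeholder names taken from the PR description.
PLACEHOLDERS = ("input_path", "text_path", "output_path", "format", "voice", "model", "speed")
_PLACEHOLDER_RE = re.compile(r"\{(" + "|".join(PLACEHOLDERS) + r")\}")


def render_command(template: str, values: dict) -> str:
    """Substitute known {placeholders} with shell-quoted values.

    Braces that do not enclose a known placeholder name are left untouched,
    so literal JSON or shell brace syntax in the template survives.
    Assumption: placeholders appear outside of any quotes in the template.
    """
    def substitute(match: re.Match) -> str:
        value = values.get(match.group(1), "")
        return shlex.quote(str(value))

    return _PLACEHOLDER_RE.sub(substitute, template)
```

For example, rendering `my-tts --input {input_path}` with an input path containing a space yields a single-quoted argument, while an unrelated `{...}` elsewhere in the template is passed through unchanged.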

Audio Delivery Routing

Centralizes platform-aware audio routing with should_send_media_as_audio(...).
Adds .flac to recognized audio media.
Keeps Telegram voice bubbles limited to .ogg / .opus when the media is explicitly voice-compatible.
Sends Telegram .mp3 / .m4a as audio attachments.
Falls back to document/file delivery for Telegram audio formats that Telegram cannot send as voice/audio.
Applies the same routing rules to gateway replies, auto-TTS, extracted MEDIA: files, scheduled job delivery, and send_message media routing.
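
The Telegram rules listed above can be sketched as a small decision function. This is an illustration only: pick_telegram_send_method is a hypothetical name, not the should_send_media_as_audio(...) helper itself, and sending an .ogg file that is not flagged as voice-compatible as a document is an assumption of this sketch rather than something the PR states.

```python
VOICE_EXTS = {".ogg", ".opus"}   # eligible for voice bubbles when flagged as voice
AUDIO_EXTS = {".mp3", ".m4a"}    # sent as regular audio attachments


def pick_telegram_send_method(ext: str, is_voice: bool) -> str:
    """Return the Telegram Bot API method a file extension would be routed to."""
    ext = ext.lower()
    if is_voice and ext in VOICE_EXTS:
        return "sendVoice"
    if ext in AUDIO_EXTS:
        return "sendAudio"
    # Everything else (.wav, .flac, and, by assumption, unflagged .ogg)
    # falls back to plain file delivery.
    return "sendDocument"
```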

Configuration, Setup, and Docs

Adds tts.local_command defaults to hermes_cli/config.py.
Adds Local Command TTS to setup, tools configuration, dashboard config schema, and CLI tips.
Updates voice/TTS documentation and provider lists.
Documents Local Command configuration, placeholders, supported formats, timeout, and voice_compatible behavior.

Tests

Adds focused Local Command TTS coverage for command rendering, placeholders, literal braces, quoted paths, output formats, missing commands, failures, timeouts, child-process cleanup, stale output cleanup, JSON response shape, requirement detection, and voice compatibility.
Adds gateway/platform routing coverage for Telegram and non-Telegram audio delivery.
Adds setup/tools/dashboard coverage for the new provider option.
Updates scheduler, voice command, and send-message tests for the shared audio routing behavior.

How to Test

Run the focused Local Command and related audio-routing tests:

scripts/run_tests.sh tests/tools/test_tts_local_command.py tests/tools/test_tts_max_text_length.py tests/gateway/test_tts_media_routing.py tests/gateway/test_voice_command.py tests/gateway/test_telegram_documents.py tests/cron/test_scheduler.py tests/hermes_cli/test_setup.py tests/hermes_cli/test_tools_config.py tests/tools/test_send_message_tool.py -q

Run the dashboard schema test that covers the new provider option:

scripts/run_tests.sh tests/hermes_cli/test_web_server.py::TestBuildSchemaFromConfig::test_tts_provider_options_include_local_command -q

Configure a local bridge command:

tts:
  provider: local_command
  local_command:
    command: 'my-tts --input {input_path} --output {output_path} --format {format}'
    timeout: 120
    output_format: mp3
    voice_compatible: false

Generate TTS through Hermes and verify that the output audio file is created and playable.
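
If no real engine is at hand, a stand-in for the my-tts command can exercise the bridge end to end. The sketch below is a hypothetical stub, not part of the PR: it reads the UTF-8 text file, writes a silent WAV whose length scales with the text, and accepts the same --input / --output / --format flags as the example config. It only writes WAV, so output_format would need to be set to wav when using it.

```python
import argparse
import wave
from pathlib import Path

SAMPLE_RATE = 16000  # 16 kHz mono, 16-bit samples


def synthesize(input_path: str, output_path: str) -> int:
    """Write silence proportional to the text length; return the frame count."""
    text = Path(input_path).read_text(encoding="utf-8")
    # 50 ms of silence per character, with a floor of 100 ms.
    n_frames = max(SAMPLE_RATE // 10, len(text) * SAMPLE_RATE // 20)
    with wave.open(output_path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(b"\x00\x00" * n_frames)
    return n_frames


def main(argv=None) -> None:
    parser = argparse.ArgumentParser(description="Stub TTS bridge that writes silent WAV audio")
    parser.add_argument("--input", required=True, help="UTF-8 text file to read")
    parser.add_argument("--output", required=True, help="audio file to write")
    parser.add_argument("--format", default="wav", help="only 'wav' is supported by this stub")
    args = parser.parse_args(argv)
    if args.format != "wav":
        parser.error("this stub only writes wav")
    synthesize(args.input, args.output)
```

To use it as a command, save it as an executable script and call main() from an `if __name__ == "__main__":` guard.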

Verification

Focused related tests pass:

$ scripts/run_tests.sh tests/tools/test_tts_local_command.py tests/tools/test_tts_max_text_length.py tests/gateway/test_tts_media_routing.py tests/gateway/test_voice_command.py tests/gateway/test_telegram_documents.py tests/cron/test_scheduler.py tests/hermes_cli/test_setup.py tests/hermes_cli/test_tools_config.py tests/tools/test_send_message_tool.py -q
518 passed, 21 skipped, 3 warnings in 9.51s

Dashboard provider-option coverage passes:

$ scripts/run_tests.sh tests/hermes_cli/test_web_server.py::TestBuildSchemaFromConfig::test_tts_provider_options_include_local_command -q
1 passed, 1 warning in 3.67s

Known local verification note:

$ scripts/run_tests.sh tests/tools/test_tts_local_command.py tests/tools/test_tts_max_text_length.py tests/gateway/test_tts_media_routing.py tests/gateway/test_voice_command.py tests/gateway/test_telegram_documents.py tests/cron/test_scheduler.py tests/hermes_cli/test_setup.py tests/hermes_cli/test_tools_config.py tests/hermes_cli/test_web_server.py tests/tools/test_send_message_tool.py -q
637 passed, 21 skipped, 4 failed, 3 warnings in 10.10s

The four failures are in the broader dashboard/web test file:

TestBuildSchemaFromConfig.test_no_single_field_categories (prompt_caching has one field)
TestPtyWebSocket.test_streams_child_stdout_to_client
TestPtyWebSocket.test_client_input_reaches_child_stdin
TestPtyWebSocket.test_resize_escape_is_forwarded

Those failures are visible when the full tests/hermes_cli/test_web_server.py file is included, but the Local Command provider-option dashboard test passes separately.

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix/feature (no unrelated commits)
I've run pytest tests/ -q and all tests pass
I've added tests for my changes (required for bug fixes, strongly encouraged for features)
I've tested on my platform: macOS 15 / Apple Silicon

Documentation & Housekeeping

I've updated relevant documentation (README, docs/, docstrings) — or N/A
I've updated cli-config.yaml.example if I added/changed config keys — or N/A
I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Security / Robustness Notes

This feature intentionally runs a command configured by the local user. It should be treated as a trusted local command path, not as a sandbox for untrusted command templates.

Hermes limits its side of the bridge by requiring explicit config, passing text through a temporary UTF-8 file, using explicit input/output placeholders, quoting placeholder values, validating the output file, and cleaning up timed-out process trees.
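
The timeout and process-tree cleanup mentioned above can be sketched as follows. This is a POSIX-only illustration under stated assumptions, not the PR's implementation: run_with_timeout is a hypothetical helper, and it relies on starting the command in its own session so that os.killpg reaches any children the shell spawned.

```python
import os
import signal
import subprocess


def run_with_timeout(command: str, timeout: float) -> subprocess.CompletedProcess:
    """Run a shell command; on timeout, kill its whole process group (POSIX only)."""
    proc = subprocess.Popen(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        start_new_session=True,  # child becomes leader of a new process group
    )
    try:
        stdout, stderr = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        try:
            # With start_new_session=True the group id equals the child's pid.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process exited between the timeout and the kill
        proc.communicate()  # reap the child and drain its pipes
        raise
    return subprocess.CompletedProcess(command, proc.returncode, stdout, stderr)
```

A caller would catch subprocess.TimeoutExpired, report the failure, and remove any partial output file before returning.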

Updated the number of TTS provider options from nine to ten.

@versun

… fallback (#17833)

Extracted from PR #17211 (@versun) so it can land independently of the local_command TTS provider redesign.

- Add should_send_media_as_audio(platform, ext, is_voice) in gateway/platforms/base.py; single source of truth for audio routing.
- Add .flac to recognized audio extensions (MEDIA regex, weixin audio set, send_message audio set).
- Telegram send_voice() now falls back to send_document for formats Telegram's Bot API can't play natively (.wav, .flac, ...) instead of raising; MP3/M4A still go to sendAudio, Opus/OGG still go to sendVoice.
- Route _send_telegram() in send_message_tool through a narrower _TELEGRAM_SEND_AUDIO_EXTS = {.mp3, .m4a} set.
- cron.scheduler._send_media_via_adapter now delegates the audio decision to should_send_media_as_audio so it matches the gateway.
- Update the cron live-adapter ogg test to flag [[audio_as_voice]] so it still routes to sendVoice under the new Telegram-specific policy.
- Tests: unit coverage for should_send_media_as_audio across platforms, end-to-end MEDIA routing via _process_message_background and GatewayRunner._deliver_media_from_response, TelegramAdapter.send_voice fallback for FLAC/WAV.

Co-authored-by: Versun <me+github7604@versun.org>

@versun

…me> (#17843)

Reshape of PR #17211 (@versun). Lets users wire any local or external TTS CLI into Hermes without adding engine-specific Python code. Users declare any number of named providers in config.yaml and switch between them with tts.provider: <name>, alongside the built-ins (edge, openai, elevenlabs, …).

Config shape:

tts:
  provider: piper-en
  providers:
    piper-en:
      type: command
      command: 'piper -m ~/model.onnx -f {output_path} < {input_path}'
      output_format: wav

Placeholders: {input_path}, {text_path}, {output_path}, {format}, {voice}, {model}, {speed}. Use {{ / }} for literal braces.

Key behavior:

- Built-in provider names always win: a tts.providers.openai entry cannot shadow the native OpenAI provider.
- type: command is the default when command: is set.
- Placeholder values are shell-quote-aware (bare / single / double context), so paths with spaces and shell metacharacters are safe.
- Default delivery is a regular audio attachment. voice_compatible: true opts in to Telegram voice-bubble delivery via ffmpeg Opus conversion.
- Command failures (non-zero exit, timeout, empty output) surface to the agent with stderr/stdout included so you can debug from chat.
- Process-tree kill on timeout (Unix killpg, Windows taskkill /T).
- max_text_length defaults to 5000 for command providers; override under tts.providers.<name>.max_text_length.

Tests: tests/tools/test_tts_command_providers.py adds 42 new tests covering provider resolution, shell-quote context, placeholder rendering with injection payloads, timeout, non-zero exit, empty output, voice_compatible opt-in, and end-to-end dispatch through text_to_speech_tool. All 88 pre-existing TTS tests still pass.

Docs: new "Custom command providers" section in website/docs/user-guide/features/tts.md with three worked examples (Piper, VoxCPM, MLX-Kokoro), placeholder reference, optional keys, behavior notes, and security caveat.

E2E-verified live: isolated HERMES_HOME, command provider declared in config.yaml, text_to_speech_tool dispatches through the registered shell command and the output file is produced as expected.

Co-authored-by: Versun <me+github7604@versun.org>

teknium1 · 2026-04-30T10:03:25Z

Hey @versun — thanks for PR #17211. We thought hard about this one and ended up reshaping it into three separate PRs so the parts could land cleanly. All three are now on main:

1. Audio routing cleanup → PR #17833 (merged)
Covers the should_send_media_as_audio() helper, .flac support, and the Telegram send_voice fallback to send_document for formats the Bot API can't play natively. This was the cleanest part of your PR and shipped standalone. Your authorship is preserved via Co-authored-by: in the commit.

2. Command-provider registry → PR #17843 (merged)
Instead of tts.provider: local_command as a hardcoded single provider, we reshaped it as a named registry:

tts:
  provider: my-voxcpm
  providers:
    my-voxcpm:
      type: command
      command: "voxcpm --ref ~/voice.wav --text-file {input_path} --out {output_path}"
      output_format: mp3
      voice_compatible: true

Users can declare any number of command providers and switch between them with tts.provider: <name>, same as built-ins. Built-in names always win: a tts.providers.openai entry can't shadow the native OpenAI provider. Your core logic (placeholder rendering, shell-quote-awareness, process-tree timeout, voice_compatible opt-in) is all intact, just named differently. You're credited via Co-authored-by: on the commit.
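
The lookup order described here can be sketched roughly as below. It is an illustration, not the merged code: resolve_tts_provider is a hypothetical name, and BUILTIN_PROVIDERS lists only the three built-ins named in this thread, not the full set.

```python
# Illustrative subset only; the real list of built-in providers is longer.
BUILTIN_PROVIDERS = {"edge", "openai", "elevenlabs"}


def resolve_tts_provider(tts_config: dict) -> tuple:
    """Return ("builtin", name) or ("command", entry) for the selected provider."""
    name = tts_config.get("provider", "")
    # Built-in names always win, even if tts.providers has an entry with the same name.
    if name in BUILTIN_PROVIDERS:
        return ("builtin", name)

    entry = tts_config.get("providers", {}).get(name)
    if entry is None:
        raise KeyError(f"unknown TTS provider: {name!r}")

    # type: command is implied when a command is set and no type is given.
    kind = entry.get("type") or ("command" if entry.get("command") else None)
    if kind != "command":
        raise ValueError(f"provider {name!r} has unsupported type: {kind!r}")
    return ("command", entry)
```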

3. Native Piper → PR #17885 (merged, closes #8508)
Piper deserved first-class treatment for the "I just want local TTS in my language" use case that motivated both #11688 and #8508. One keystroke via hermes tools → Voice & TTS → Piper installs piper-tts and the default voice auto-downloads on first call. 44 languages. Works alongside the command registry, so if you want to wire VoxCPM-CLI or your own cloned-voice script, that path is still there.

Config migration for you
If you're using your original local_command shape, here's the equivalent in the new registry:

# Before (your PR #17211)
tts:
  provider: local_command
  local_command:
    command: 'my-tts --input {input_path} --output {output_path} --format {format}'
    timeout: 120
    output_format: mp3
    voice_compatible: false

# After (current main)
tts:
  provider: my-local-tts
  providers:
    my-local-tts:
      type: command
      command: 'my-tts --input {input_path} --output {output_path} --format {format}'
      timeout: 120
      output_format: mp3
      voice_compatible: false

Everything under tts.providers.<name> uses the same keys you had under tts.local_command, plus type: command.
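
For anyone scripting the change, the before/after shapes above amount to a small dictionary rewrite. The helper below is a hypothetical sketch, not a tool shipped with Hermes; it assumes the config has already been loaded into a plain dict, and the provider name my-local-tts is just the example name used above.

```python
import copy


def migrate_local_command(config: dict, name: str = "my-local-tts") -> dict:
    """Return a copy of config with tts.local_command moved to tts.providers.<name>."""
    new_config = copy.deepcopy(config)
    tts = new_config.get("tts", {})
    if tts.get("provider") != "local_command" or "local_command" not in tts:
        return new_config  # nothing to migrate

    settings = tts.pop("local_command")
    # Same keys as before, with the provider type made explicit.
    tts.setdefault("providers", {})[name] = {"type": "command", **settings}
    tts["provider"] = name
    return new_config
```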

Docs
The new shape is fully documented at https://hermes-agent.nousresearch.com/docs/user-guide/features/tts — the Piper (local, 44 languages) and Custom command providers sections cover both paths.

Real appreciation for the work on #17211. The core mechanics (shell quoting per context, process-tree kill, voice_compatible flag) were solid — we kept all of that, just reshaped the surface so the door stays open for more custom engines without hardcoding each one.

teknium1 · 2026-04-30T10:03:38Z

Closing — reshaped into three merged PRs (#17833 / #17843 / #17885) with your authorship credited. See above comment for details + config migration.

@versun

… fallback (#17833) Extracted from PR #17211 (@versun) so it can land independently of the local_command TTS provider redesign. - Add should_send_media_as_audio(platform, ext, is_voice) in gateway/platforms/base.py; single source of truth for audio routing. - Add .flac to recognized audio extensions (MEDIA regex, weixin audio set, send_message audio set). - Telegram send_voice() now falls back to send_document for formats Telegram's Bot API can't play natively (.wav, .flac, ...) instead of raising; MP3/M4A still go to sendAudio, Opus/OGG still go to sendVoice. - Route _send_telegram() in send_message_tool through a narrower _TELEGRAM_SEND_AUDIO_EXTS = {.mp3, .m4a} set. - cron.scheduler._send_media_via_adapter now delegates the audio decision to should_send_media_as_audio so it matches the gateway. - Update the cron live-adapter ogg test to flag [[audio_as_voice]] so it still routes to sendVoice under the new Telegram-specific policy. - Tests: unit coverage for should_send_media_as_audio across platforms, end-to-end MEDIA routing via _process_message_background and GatewayRunner._deliver_media_from_response, TelegramAdapter.send_voice fallback for FLAC/WAV. Co-authored-by: Versun <me+github7604@versun.org>

@versun

… fallback (NousResearch#17833) Extracted from PR NousResearch#17211 (@versun) so it can land independently of the local_command TTS provider redesign. - Add should_send_media_as_audio(platform, ext, is_voice) in gateway/platforms/base.py; single source of truth for audio routing. - Add .flac to recognized audio extensions (MEDIA regex, weixin audio set, send_message audio set). - Telegram send_voice() now falls back to send_document for formats Telegram's Bot API can't play natively (.wav, .flac, ...) instead of raising; MP3/M4A still go to sendAudio, Opus/OGG still go to sendVoice. - Route _send_telegram() in send_message_tool through a narrower _TELEGRAM_SEND_AUDIO_EXTS = {.mp3, .m4a} set. - cron.scheduler._send_media_via_adapter now delegates the audio decision to should_send_media_as_audio so it matches the gateway. - Update the cron live-adapter ogg test to flag [[audio_as_voice]] so it still routes to sendVoice under the new Telegram-specific policy. - Tests: unit coverage for should_send_media_as_audio across platforms, end-to-end MEDIA routing via _process_message_background and GatewayRunner._deliver_media_from_response, TelegramAdapter.send_voice fallback for FLAC/WAV. Co-authored-by: Versun <me+github7604@versun.org>

@versun

…me> (NousResearch#17843) Reshape of PR NousResearch#17211 (@versun). Lets users wire any local or external TTS CLI into Hermes without adding engine-specific Python code. Users declare any number of named providers in config.yaml and switch between them with tts.provider: <name>, alongside the built-ins (edge, openai, elevenlabs, …). Config shape: tts: provider: piper-en providers: piper-en: type: command command: 'piper -m ~/model.onnx -f {output_path} < {input_path}' output_format: wav Placeholders: {input_path}, {text_path}, {output_path}, {format}, {voice}, {model}, {speed}. Use {{ / }} for literal braces. Key behavior: - Built-in provider names always win — a tts.providers.openai entry cannot shadow the native OpenAI provider. - type: command is the default when command: is set. - Placeholder values are shell-quote-aware (bare / single / double context), so paths with spaces and shell metacharacters are safe. - Default delivery is a regular audio attachment. voice_compatible: true opts in to Telegram voice-bubble delivery via ffmpeg Opus conversion. - Command failures (non-zero exit, timeout, empty output) surface to the agent with stderr/stdout included so you can debug from chat. - Process-tree kill on timeout (Unix killpg, Windows taskkill /T). - max_text_length defaults to 5000 for command providers; override under tts.providers.<name>.max_text_length. Tests: tests/tools/test_tts_command_providers.py — 42 new tests cover provider resolution, shell-quote context, placeholder rendering with injection payloads, timeout, non-zero exit, empty output, voice_compatible opt-in, and end-to-end dispatch through text_to_speech_tool. All 88 pre-existing TTS tests still pass. Docs: new "Custom command providers" section in website/docs/user-guide/features/tts.md with three worked examples (Piper, VoxCPM, MLX-Kokoro), placeholder reference, optional keys, behavior notes, and security caveat. 
E2E-verified live: isolated HERMES_HOME, command provider declared in config.yaml, text_to_speech_tool dispatches through the registered shell command and the output file is produced as expected. Co-authored-by: Versun <me+github7604@versun.org>

@versun

… fallback (NousResearch#17833) Extracted from PR NousResearch#17211 (@versun) so it can land independently of the local_command TTS provider redesign. - Add should_send_media_as_audio(platform, ext, is_voice) in gateway/platforms/base.py; single source of truth for audio routing. - Add .flac to recognized audio extensions (MEDIA regex, weixin audio set, send_message audio set). - Telegram send_voice() now falls back to send_document for formats Telegram's Bot API can't play natively (.wav, .flac, ...) instead of raising; MP3/M4A still go to sendAudio, Opus/OGG still go to sendVoice. - Route _send_telegram() in send_message_tool through a narrower _TELEGRAM_SEND_AUDIO_EXTS = {.mp3, .m4a} set. - cron.scheduler._send_media_via_adapter now delegates the audio decision to should_send_media_as_audio so it matches the gateway. - Update the cron live-adapter ogg test to flag [[audio_as_voice]] so it still routes to sendVoice under the new Telegram-specific policy. - Tests: unit coverage for should_send_media_as_audio across platforms, end-to-end MEDIA routing via _process_message_background and GatewayRunner._deliver_media_from_response, TelegramAdapter.send_voice fallback for FLAC/WAV. Co-authored-by: Versun <me+github7604@versun.org>

@versun

…me> (NousResearch#17843) Reshape of PR NousResearch#17211 (@versun). Lets users wire any local or external TTS CLI into Hermes without adding engine-specific Python code. Users declare any number of named providers in config.yaml and switch between them with tts.provider: <name>, alongside the built-ins (edge, openai, elevenlabs, …). Config shape: tts: provider: piper-en providers: piper-en: type: command command: 'piper -m ~/model.onnx -f {output_path} < {input_path}' output_format: wav Placeholders: {input_path}, {text_path}, {output_path}, {format}, {voice}, {model}, {speed}. Use {{ / }} for literal braces. Key behavior: - Built-in provider names always win — a tts.providers.openai entry cannot shadow the native OpenAI provider. - type: command is the default when command: is set. - Placeholder values are shell-quote-aware (bare / single / double context), so paths with spaces and shell metacharacters are safe. - Default delivery is a regular audio attachment. voice_compatible: true opts in to Telegram voice-bubble delivery via ffmpeg Opus conversion. - Command failures (non-zero exit, timeout, empty output) surface to the agent with stderr/stdout included so you can debug from chat. - Process-tree kill on timeout (Unix killpg, Windows taskkill /T). - max_text_length defaults to 5000 for command providers; override under tts.providers.<name>.max_text_length. Tests: tests/tools/test_tts_command_providers.py — 42 new tests cover provider resolution, shell-quote context, placeholder rendering with injection payloads, timeout, non-zero exit, empty output, voice_compatible opt-in, and end-to-end dispatch through text_to_speech_tool. All 88 pre-existing TTS tests still pass. Docs: new "Custom command providers" section in website/docs/user-guide/features/tts.md with three worked examples (Piper, VoxCPM, MLX-Kokoro), placeholder reference, optional keys, behavior notes, and security caveat. 
E2E-verified live: isolated HERMES_HOME, command provider declared in config.yaml, text_to_speech_tool dispatches through the registered shell command and the output file is produced as expected. Co-authored-by: Versun <me+github7604@versun.org>

@versun

… fallback (NousResearch#17833) Extracted from PR NousResearch#17211 (@versun) so it can land independently of the local_command TTS provider redesign. - Add should_send_media_as_audio(platform, ext, is_voice) in gateway/platforms/base.py; single source of truth for audio routing. - Add .flac to recognized audio extensions (MEDIA regex, weixin audio set, send_message audio set). - Telegram send_voice() now falls back to send_document for formats Telegram's Bot API can't play natively (.wav, .flac, ...) instead of raising; MP3/M4A still go to sendAudio, Opus/OGG still go to sendVoice. - Route _send_telegram() in send_message_tool through a narrower _TELEGRAM_SEND_AUDIO_EXTS = {.mp3, .m4a} set. - cron.scheduler._send_media_via_adapter now delegates the audio decision to should_send_media_as_audio so it matches the gateway. - Update the cron live-adapter ogg test to flag [[audio_as_voice]] so it still routes to sendVoice under the new Telegram-specific policy. - Tests: unit coverage for should_send_media_as_audio across platforms, end-to-end MEDIA routing via _process_message_background and GatewayRunner._deliver_media_from_response, TelegramAdapter.send_voice fallback for FLAC/WAV. Co-authored-by: Versun <me+github7604@versun.org>

@versun

…me> (NousResearch#17843) Reshape of PR NousResearch#17211 (@versun). Lets users wire any local or external TTS CLI into Hermes without adding engine-specific Python code. Users declare any number of named providers in config.yaml and switch between them with tts.provider: <name>, alongside the built-ins (edge, openai, elevenlabs, …). Config shape: tts: provider: piper-en providers: piper-en: type: command command: 'piper -m ~/model.onnx -f {output_path} < {input_path}' output_format: wav Placeholders: {input_path}, {text_path}, {output_path}, {format}, {voice}, {model}, {speed}. Use {{ / }} for literal braces. Key behavior: - Built-in provider names always win — a tts.providers.openai entry cannot shadow the native OpenAI provider. - type: command is the default when command: is set. - Placeholder values are shell-quote-aware (bare / single / double context), so paths with spaces and shell metacharacters are safe. - Default delivery is a regular audio attachment. voice_compatible: true opts in to Telegram voice-bubble delivery via ffmpeg Opus conversion. - Command failures (non-zero exit, timeout, empty output) surface to the agent with stderr/stdout included so you can debug from chat. - Process-tree kill on timeout (Unix killpg, Windows taskkill /T). - max_text_length defaults to 5000 for command providers; override under tts.providers.<name>.max_text_length. Tests: tests/tools/test_tts_command_providers.py — 42 new tests cover provider resolution, shell-quote context, placeholder rendering with injection payloads, timeout, non-zero exit, empty output, voice_compatible opt-in, and end-to-end dispatch through text_to_speech_tool. All 88 pre-existing TTS tests still pass. Docs: new "Custom command providers" section in website/docs/user-guide/features/tts.md with three worked examples (Piper, VoxCPM, MLX-Kokoro), placeholder reference, optional keys, behavior notes, and security caveat. 
E2E-verified live: isolated HERMES_HOME, command provider declared in config.yaml, text_to_speech_tool dispatches through the registered shell command and the output file is produced as expected. Co-authored-by: Versun <me+github7604@versun.org>

@versun

… fallback (NousResearch#17833) Extracted from PR NousResearch#17211 (@versun) so it can land independently of the local_command TTS provider redesign. - Add should_send_media_as_audio(platform, ext, is_voice) in gateway/platforms/base.py; single source of truth for audio routing. - Add .flac to recognized audio extensions (MEDIA regex, weixin audio set, send_message audio set). - Telegram send_voice() now falls back to send_document for formats Telegram's Bot API can't play natively (.wav, .flac, ...) instead of raising; MP3/M4A still go to sendAudio, Opus/OGG still go to sendVoice. - Route _send_telegram() in send_message_tool through a narrower _TELEGRAM_SEND_AUDIO_EXTS = {.mp3, .m4a} set. - cron.scheduler._send_media_via_adapter now delegates the audio decision to should_send_media_as_audio so it matches the gateway. - Update the cron live-adapter ogg test to flag [[audio_as_voice]] so it still routes to sendVoice under the new Telegram-specific policy. - Tests: unit coverage for should_send_media_as_audio across platforms, end-to-end MEDIA routing via _process_message_background and GatewayRunner._deliver_media_from_response, TelegramAdapter.send_voice fallback for FLAC/WAV. Co-authored-by: Versun <me+github7604@versun.org>

@versun

…me> (NousResearch#17843) Reshape of PR NousResearch#17211 (@versun). Lets users wire any local or external TTS CLI into Hermes without adding engine-specific Python code. Users declare any number of named providers in config.yaml and switch between them with tts.provider: <name>, alongside the built-ins (edge, openai, elevenlabs, …). Config shape: tts: provider: piper-en providers: piper-en: type: command command: 'piper -m ~/model.onnx -f {output_path} < {input_path}' output_format: wav Placeholders: {input_path}, {text_path}, {output_path}, {format}, {voice}, {model}, {speed}. Use {{ / }} for literal braces. Key behavior: - Built-in provider names always win — a tts.providers.openai entry cannot shadow the native OpenAI provider. - type: command is the default when command: is set. - Placeholder values are shell-quote-aware (bare / single / double context), so paths with spaces and shell metacharacters are safe. - Default delivery is a regular audio attachment. voice_compatible: true opts in to Telegram voice-bubble delivery via ffmpeg Opus conversion. - Command failures (non-zero exit, timeout, empty output) surface to the agent with stderr/stdout included so you can debug from chat. - Process-tree kill on timeout (Unix killpg, Windows taskkill /T). - max_text_length defaults to 5000 for command providers; override under tts.providers.<name>.max_text_length. Tests: tests/tools/test_tts_command_providers.py — 42 new tests cover provider resolution, shell-quote context, placeholder rendering with injection payloads, timeout, non-zero exit, empty output, voice_compatible opt-in, and end-to-end dispatch through text_to_speech_tool. All 88 pre-existing TTS tests still pass. Docs: new "Custom command providers" section in website/docs/user-guide/features/tts.md with three worked examples (Piper, VoxCPM, MLX-Kokoro), placeholder reference, optional keys, behavior notes, and security caveat. 
E2E-verified live: isolated HERMES_HOME, command provider declared in config.yaml, text_to_speech_tool dispatches through the registered shell command and the output file is produced as expected. Co-authored-by: Versun <me+github7604@versun.org>

@versun

… fallback (NousResearch#17833) Extracted from PR NousResearch#17211 (@versun) so it can land independently of the local_command TTS provider redesign. - Add should_send_media_as_audio(platform, ext, is_voice) in gateway/platforms/base.py; single source of truth for audio routing. - Add .flac to recognized audio extensions (MEDIA regex, weixin audio set, send_message audio set). - Telegram send_voice() now falls back to send_document for formats Telegram's Bot API can't play natively (.wav, .flac, ...) instead of raising; MP3/M4A still go to sendAudio, Opus/OGG still go to sendVoice. - Route _send_telegram() in send_message_tool through a narrower _TELEGRAM_SEND_AUDIO_EXTS = {.mp3, .m4a} set. - cron.scheduler._send_media_via_adapter now delegates the audio decision to should_send_media_as_audio so it matches the gateway. - Update the cron live-adapter ogg test to flag [[audio_as_voice]] so it still routes to sendVoice under the new Telegram-specific policy. - Tests: unit coverage for should_send_media_as_audio across platforms, end-to-end MEDIA routing via _process_message_background and GatewayRunner._deliver_media_from_response, TelegramAdapter.send_voice fallback for FLAC/WAV. Co-authored-by: Versun <me+github7604@versun.org>

@versun

…me> (NousResearch#17843) Reshape of PR NousResearch#17211 (@versun). Lets users wire any local or external TTS CLI into Hermes without adding engine-specific Python code. Users declare any number of named providers in config.yaml and switch between them with tts.provider: <name>, alongside the built-ins (edge, openai, elevenlabs, …). Config shape: tts: provider: piper-en providers: piper-en: type: command command: 'piper -m ~/model.onnx -f {output_path} < {input_path}' output_format: wav Placeholders: {input_path}, {text_path}, {output_path}, {format}, {voice}, {model}, {speed}. Use {{ / }} for literal braces. Key behavior: - Built-in provider names always win — a tts.providers.openai entry cannot shadow the native OpenAI provider. - type: command is the default when command: is set. - Placeholder values are shell-quote-aware (bare / single / double context), so paths with spaces and shell metacharacters are safe. - Default delivery is a regular audio attachment. voice_compatible: true opts in to Telegram voice-bubble delivery via ffmpeg Opus conversion. - Command failures (non-zero exit, timeout, empty output) surface to the agent with stderr/stdout included so you can debug from chat. - Process-tree kill on timeout (Unix killpg, Windows taskkill /T). - max_text_length defaults to 5000 for command providers; override under tts.providers.<name>.max_text_length. Tests: tests/tools/test_tts_command_providers.py — 42 new tests cover provider resolution, shell-quote context, placeholder rendering with injection payloads, timeout, non-zero exit, empty output, voice_compatible opt-in, and end-to-end dispatch through text_to_speech_tool. All 88 pre-existing TTS tests still pass. Docs: new "Custom command providers" section in website/docs/user-guide/features/tts.md with three worked examples (Piper, VoxCPM, MLX-Kokoro), placeholder reference, optional keys, behavior notes, and security caveat. 
E2E-verified live: isolated HERMES_HOME, command provider declared in config.yaml, text_to_speech_tool dispatches through the registered shell command and the output file is produced as expected. Co-authored-by: Versun <me+github7604@versun.org>

@versun
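
The "process-tree kill on timeout (Unix killpg, Windows taskkill /T)" behavior can be sketched as follows. This is an illustration under stated assumptions, not the commit's implementation: run_with_tree_kill is a hypothetical helper, and it assumes the command runs through the shell in its own session on POSIX so that the child's process group ID equals its PID.

```python
import os
import signal
import subprocess


def run_with_tree_kill(command: str, timeout: float) -> subprocess.CompletedProcess:
    """Run a shell command; on timeout, kill it together with its descendants."""
    popen_kwargs = {}
    if os.name == "posix":
        # A new session makes the child the leader of its own process group.
        popen_kwargs["start_new_session"] = True
    proc = subprocess.Popen(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        **popen_kwargs,
    )
    try:
        stdout, stderr = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        if os.name == "posix":
            try:
                os.killpg(proc.pid, signal.SIGKILL)  # whole group, not just the shell
            except ProcessLookupError:
                pass  # the process exited between the timeout and the kill
        else:
            # /T kills the child process tree, /F forces termination.
            subprocess.run(
                ["taskkill", "/T", "/F", "/PID", str(proc.pid)],
                capture_output=True,
                check=False,
            )
        proc.communicate()  # reap the process and drain the pipes
        raise
    return subprocess.CompletedProcess(command, proc.returncode, stdout, stderr)
```

Killing the group rather than only the direct child matters because the configured command is a shell line: a wrapper script may spawn the actual TTS engine as a grandchild, which would otherwise keep running after the timeout.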

… fallback (NousResearch#17833)

Extracted from PR NousResearch#17211 (@versun) so it can land independently of the local_command TTS provider redesign.

- Add should_send_media_as_audio(platform, ext, is_voice) in gateway/platforms/base.py as the single source of truth for audio routing.
- Add .flac to recognized audio extensions (MEDIA regex, weixin audio set, send_message audio set).
- Telegram send_voice() now falls back to send_document for formats Telegram's Bot API can't play natively (.wav, .flac, ...) instead of raising; MP3/M4A still go to sendAudio, Opus/OGG still go to sendVoice.
- Route _send_telegram() in send_message_tool through a narrower _TELEGRAM_SEND_AUDIO_EXTS = {.mp3, .m4a} set.
- cron.scheduler._send_media_via_adapter now delegates the audio decision to should_send_media_as_audio so it matches the gateway.
- Update the cron live-adapter ogg test to flag [[audio_as_voice]] so it still routes to sendVoice under the new Telegram-specific policy.
- Tests: unit coverage for should_send_media_as_audio across platforms, end-to-end MEDIA routing via _process_message_background and GatewayRunner._deliver_media_from_response, TelegramAdapter.send_voice fallback for FLAC/WAV.

Co-authored-by: Versun <me+github7604@versun.org>

@versun
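
The Telegram side of the routing policy described in the commit above can be sketched as a small decision function. pick_telegram_method is a hypothetical name and is not should_send_media_as_audio itself, whose body is not shown on this page. Sending an .ogg or .opus file that is not flagged as voice as a document is an assumption; the commit only says such files need the voice flag to reach sendVoice.

```python
from pathlib import PurePath

# Extension sets taken from the commit message; the names are illustrative.
VOICE_EXTS = {".ogg", ".opus"}
SEND_AUDIO_EXTS = {".mp3", ".m4a"}


def pick_telegram_method(filename: str, is_voice: bool = False) -> str:
    """Return the Telegram Bot API method to use for an audio file."""
    ext = PurePath(filename).suffix.lower()
    if ext in VOICE_EXTS and is_voice:
        return "sendVoice"  # voice bubble, only when explicitly flagged
    if ext in SEND_AUDIO_EXTS:
        return "sendAudio"  # formats Telegram plays natively as audio
    # Assumption: everything else (.wav, .flac, unflagged .ogg) is a document.
    return "sendDocument"


for name, voice in [("a.ogg", True), ("a.mp3", False), ("a.flac", False)]:
    print(name, voice, pick_telegram_method(name, voice))
```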

alt-glitch added the labels type/feature (New feature or request), tool/tts (Text-to-speech and transcription), and P3 (Low: cosmetic, nice to have) Apr 29, 2026

versun marked this pull request as draft April 29, 2026 04:35

versun added 6 commits April 29, 2026 12:41

test: cover local command tts provider

dddc10f

feat: add local command tts provider

6ce0167

fix: refine local command tts output handling

9940c56

Add local command TTS setup docs

98d6fab

fix: tighten local command tts setup and voice opt-in

9d647f8

fix: route flac tts media as audio

1bd32d7

versun force-pushed the feat/tts-local-command branch from 5907aa5 to 1bd32d7 April 29, 2026 05:35

Route audio media by platform and tighten Telegram delivery

a4bfbbc

versun marked this pull request as ready for review April 29, 2026 12:19

versun added 3 commits April 30, 2026 08:48

Merge branch 'main' into feat/tts-local-command

5c3b632

Merge branch 'main' into feat/tts-local-command

13bc4df

Correct number of TTS provider options in documentation

f6ecd4a

Updated the number of TTS provider options from nine to ten.

teknium1 mentioned this pull request Apr 30, 2026

feat(gateway): centralize audio routing + FLAC support + Telegram doc fallback #17833

Merged

teknium1 mentioned this pull request Apr 30, 2026

feat(tts): add command-type provider registry under tts.providers.<name> #17843

Merged

teknium1 mentioned this pull request Apr 30, 2026

feat(tts): add Piper as a native local TTS provider (closes #8508) #17885

Merged

teknium1 closed this Apr 30, 2026

versun deleted the feat/tts-local-command branch May 1, 2026 01:11

versun wants to merge 10 commits into NousResearch:main from versun:feat/tts-local-command
