Skip to content

feat(tts): add local_command provider#17211

Closed
versun wants to merge 10 commits into
NousResearch:mainfrom
versun:feat/tts-local-command
Closed

feat(tts): add local_command provider#17211
versun wants to merge 10 commits into
NousResearch:mainfrom
versun:feat/tts-local-command

Conversation

@versun

@versun versun commented Apr 29, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

Adds tts.provider: local_command, a generic bridge for running a user-configured local text-to-speech command from Hermes.

The provider lets users keep engine-specific dependencies outside Hermes core while still using local or experimental TTS engines such as Piper, VoxCPM, Qwen/MLX wrappers, or any script that can read text from a file and write audio to a file.

This is intentionally a small Phase 1 bridge, not a full TTS plugin registry. It gives Hermes a stable local command path now while leaving room for a future register_tts_provider(...) plugin API.

Related Issue

Related to #11688 and #8508.

Fixes: N/A

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

Local Command TTS

  • Adds local_command as a TTS provider in tools/tts_tool.py.
  • Requires an explicit tts.local_command.command config value.
  • Writes TTS input to a temporary UTF-8 text file.
  • Supports {input_path}, {text_path}, {output_path}, {format}, {voice}, {model}, and {speed} placeholders.
  • Preserves literal braces in command templates and safely quotes placeholder values for their shell context.
  • Supports mp3, wav, ogg, and flac output via output_format / format.
  • Validates that the command produced a non-empty output file.
  • Adds configurable timeout handling and process-tree cleanup.
  • Adds a voice_compatible opt-in for voice-bubble delivery, including ffmpeg conversion to Opus/OGG when needed.
  • Enforces the local_command max text length through the existing TTS length resolver.

Audio Delivery Routing

  • Centralizes platform-aware audio routing with should_send_media_as_audio(...).
  • Adds .flac to recognized audio media.
  • Keeps Telegram voice bubbles limited to .ogg / .opus when the media is explicitly voice-compatible.
  • Sends Telegram .mp3 / .m4a as audio attachments.
  • Falls back to document/file delivery for Telegram audio formats that Telegram cannot send as voice/audio.
  • Applies the same routing rules to gateway replies, auto-TTS, extracted MEDIA: files, scheduled job delivery, and send_message media routing.

Configuration, Setup, and Docs

  • Adds tts.local_command defaults to hermes_cli/config.py.
  • Adds Local Command TTS to setup, tools configuration, dashboard config schema, and CLI tips.
  • Updates voice/TTS documentation and provider lists.
  • Documents Local Command configuration, placeholders, supported formats, timeout, and voice_compatible behavior.

Tests

  • Adds focused Local Command TTS coverage for command rendering, placeholders, literal braces, quoted paths, output formats, missing commands, failures, timeouts, child-process cleanup, stale output cleanup, JSON response shape, requirement detection, and voice compatibility.
  • Adds gateway/platform routing coverage for Telegram and non-Telegram audio delivery.
  • Adds setup/tools/dashboard coverage for the new provider option.
  • Updates scheduler, voice command, and send-message tests for the shared audio routing behavior.

How to Test

  1. Run the focused Local Command and related audio-routing tests:

    scripts/run_tests.sh tests/tools/test_tts_local_command.py tests/tools/test_tts_max_text_length.py tests/gateway/test_tts_media_routing.py tests/gateway/test_voice_command.py tests/gateway/test_telegram_documents.py tests/cron/test_scheduler.py tests/hermes_cli/test_setup.py tests/hermes_cli/test_tools_config.py tests/tools/test_send_message_tool.py -q
  2. Run the dashboard schema test that covers the new provider option:

    scripts/run_tests.sh tests/hermes_cli/test_web_server.py::TestBuildSchemaFromConfig::test_tts_provider_options_include_local_command -q
  3. Configure a local bridge command:

    tts:
      provider: local_command
      local_command:
        command: 'my-tts --input {input_path} --output {output_path} --format {format}'
        timeout: 120
        output_format: mp3
        voice_compatible: false
  4. Generate TTS through Hermes and verify that the output audio file is created and playable.

Verification

Focused related tests pass:

$ scripts/run_tests.sh tests/tools/test_tts_local_command.py tests/tools/test_tts_max_text_length.py tests/gateway/test_tts_media_routing.py tests/gateway/test_voice_command.py tests/gateway/test_telegram_documents.py tests/cron/test_scheduler.py tests/hermes_cli/test_setup.py tests/hermes_cli/test_tools_config.py tests/tools/test_send_message_tool.py -q
518 passed, 21 skipped, 3 warnings in 9.51s

Dashboard provider-option coverage passes:

$ scripts/run_tests.sh tests/hermes_cli/test_web_server.py::TestBuildSchemaFromConfig::test_tts_provider_options_include_local_command -q
1 passed, 1 warning in 3.67s

Known local verification note:

$ scripts/run_tests.sh tests/tools/test_tts_local_command.py tests/tools/test_tts_max_text_length.py tests/gateway/test_tts_media_routing.py tests/gateway/test_voice_command.py tests/gateway/test_telegram_documents.py tests/cron/test_scheduler.py tests/hermes_cli/test_setup.py tests/hermes_cli/test_tools_config.py tests/hermes_cli/test_web_server.py tests/tools/test_send_message_tool.py -q
637 passed, 21 skipped, 4 failed, 3 warnings in 10.10s

The four failures are in the broader dashboard/web test file:

  • TestBuildSchemaFromConfig.test_no_single_field_categories (prompt_caching has one field)
  • TestPtyWebSocket.test_streams_child_stdout_to_client
  • TestPtyWebSocket.test_client_input_reaches_child_stdin
  • TestPtyWebSocket.test_resize_escape_is_forwarded

Those failures are visible when the full tests/hermes_cli/test_web_server.py file is included, but the Local Command provider-option dashboard test passes separately.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: macOS 15 / Apple Silicon

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Security / Robustness Notes

This feature intentionally runs a command configured by the local user. It should be treated as a trusted local command path, not as a sandbox for untrusted command templates.

Hermes limits its side of the bridge by requiring explicit config, passing text through a temporary UTF-8 file, using explicit input/output placeholders, quoting placeholder values, validating the output file, and cleaning up timed-out process trees.

@alt-glitch alt-glitch added type/feature New feature or request tool/tts Text-to-speech and transcription P3 Low — cosmetic, nice to have labels Apr 29, 2026
@versun versun marked this pull request as draft April 29, 2026 04:35
@versun versun force-pushed the feat/tts-local-command branch from 5907aa5 to 1bd32d7 Compare April 29, 2026 05:35
@versun versun marked this pull request as ready for review April 29, 2026 12:19
teknium1 added a commit that referenced this pull request Apr 30, 2026
… fallback (#17833)

Extracted from PR #17211 (@versun) so it can land independently of the
local_command TTS provider redesign.

- Add should_send_media_as_audio(platform, ext, is_voice) in
  gateway/platforms/base.py; single source of truth for audio routing.
- Add .flac to recognized audio extensions (MEDIA regex, weixin audio
  set, send_message audio set).
- Telegram send_voice() now falls back to send_document for formats
  Telegram's Bot API can't play natively (.wav, .flac, ...) instead of
  raising; MP3/M4A still go to sendAudio, Opus/OGG still go to sendVoice.
- Route _send_telegram() in send_message_tool through a narrower
  _TELEGRAM_SEND_AUDIO_EXTS = {.mp3, .m4a} set.
- cron.scheduler._send_media_via_adapter now delegates the audio
  decision to should_send_media_as_audio so it matches the gateway.
- Update the cron live-adapter ogg test to flag [[audio_as_voice]] so
  it still routes to sendVoice under the new Telegram-specific policy.
- Tests: unit coverage for should_send_media_as_audio across platforms,
  end-to-end MEDIA routing via _process_message_background and
  GatewayRunner._deliver_media_from_response, TelegramAdapter.send_voice
  fallback for FLAC/WAV.

Co-authored-by: Versun <me+github7604@versun.org>
teknium1 added a commit that referenced this pull request Apr 30, 2026
…me> (#17843)

Reshape of PR #17211 (@versun). Lets users wire any local or external
TTS CLI into Hermes without adding engine-specific Python code. Users
declare any number of named providers in config.yaml and switch between
them with tts.provider: <name>, alongside the built-ins (edge, openai,
elevenlabs, …).

Config shape:

  tts:
    provider: piper-en
    providers:
      piper-en:
        type: command
        command: 'piper -m ~/model.onnx -f {output_path} < {input_path}'
        output_format: wav

Placeholders: {input_path}, {text_path}, {output_path}, {format},
{voice}, {model}, {speed}. Use {{ / }} for literal braces.

Key behavior:
- Built-in provider names always win — a tts.providers.openai entry
  cannot shadow the native OpenAI provider.
- type: command is the default when command: is set.
- Placeholder values are shell-quote-aware (bare / single / double
  context), so paths with spaces and shell metacharacters are safe.
- Default delivery is a regular audio attachment. voice_compatible: true
  opts in to Telegram voice-bubble delivery via ffmpeg Opus conversion.
- Command failures (non-zero exit, timeout, empty output) surface to
  the agent with stderr/stdout included so you can debug from chat.
- Process-tree kill on timeout (Unix killpg, Windows taskkill /T).
- max_text_length defaults to 5000 for command providers; override
  under tts.providers.<name>.max_text_length.

Tests: tests/tools/test_tts_command_providers.py — 42 new tests cover
provider resolution, shell-quote context, placeholder rendering with
injection payloads, timeout, non-zero exit, empty output, voice_compatible
opt-in, and end-to-end dispatch through text_to_speech_tool. All 88
pre-existing TTS tests still pass.

Docs: new "Custom command providers" section in
website/docs/user-guide/features/tts.md with three worked examples
(Piper, VoxCPM, MLX-Kokoro), placeholder reference, optional keys,
behavior notes, and security caveat.

E2E-verified live: isolated HERMES_HOME, command provider declared in
config.yaml, text_to_speech_tool dispatches through the registered
shell command and the output file is produced as expected.

Co-authored-by: Versun <me+github7604@versun.org>
@teknium1

Copy link
Copy Markdown
Contributor

Hey @versun — thanks for PR #17211. We thought hard about this one and ended up reshaping it into three separate PRs so the parts could land cleanly. All three are now on main:

1. Audio routing cleanup → PR #17833 (merged)
The should_send_media_as_audio() helper, .flac support, and the Telegram send_voice fallback to send_document for formats Bot-API can't play natively. This was the cleanest part of your PR and shipped standalone. Your authorship is preserved via Co-authored-by: in the commit.

2. Command-provider registry → PR #17843 (merged)
Instead of tts.provider: local_command as a hardcoded single provider, we reshaped it as a named registry:

tts:
  provider: my-voxcpm
  providers:
    my-voxcpm:
      type: command
      command: "voxcpm --ref ~/voice.wav --text-file {input_path} --out {output_path}"
      output_format: mp3
      voice_compatible: true

Users can declare any number of command providers and switch between them with tts.provider: <name>, same as built-ins. Built-in names always win — a tts.providers.openai entry can't shadow the native OpenAI provider. Your core logic (placeholder rendering, shell-quote-awareness, process-tree timeout, voice_compatible opt-in) is all intact — just named differently. Co-authored-by: you on the commit.

3. Native Piper → PR #17885 (merged, closes #8508)
Piper deserved first-class treatment for the "I just want local TTS in my language" use case that motivated both #11688 and #8508. One keystroke via hermes tools → Voice & TTS → Piper installs piper-tts and the default voice auto-downloads on first call. 44 languages. Works alongside the command registry, so if you want to wire VoxCPM-CLI or your own cloned-voice script, that path is still there.

Config migration for you
If you're using your original local_command shape, here's the equivalent in the new registry:

# Before (your PR #17211)
tts:
  provider: local_command
  local_command:
    command: 'my-tts --input {input_path} --output {output_path} --format {format}'
    timeout: 120
    output_format: mp3
    voice_compatible: false

# After (current main)
tts:
  provider: my-local-tts
  providers:
    my-local-tts:
      type: command
      command: 'my-tts --input {input_path} --output {output_path} --format {format}'
      timeout: 120
      output_format: mp3
      voice_compatible: false

Everything under tts.providers.<name> is the same keys you had under tts.local_command.

Docs
The new shape is fully documented at https://hermes-agent.nousresearch.com/docs/user-guide/features/tts — the Piper (local, 44 languages) and Custom command providers sections cover both paths.

Real appreciation for the work on #17211. The core mechanics (shell quoting per context, process-tree kill, voice_compatible flag) were solid — we kept all of that, just reshaped the surface so the door stays open for more custom engines without hardcoding each one.

@teknium1

Copy link
Copy Markdown
Contributor

Closing — reshaped into three merged PRs (#17833 / #17843 / #17885) with your authorship credited. See above comment for details + config migration.

@teknium1 teknium1 closed this Apr 30, 2026
cluricaun28 referenced this pull request in cluricaun28/Logos Apr 30, 2026
… fallback (#17833)

Extracted from PR #17211 (@versun) so it can land independently of the
local_command TTS provider redesign.

- Add should_send_media_as_audio(platform, ext, is_voice) in
  gateway/platforms/base.py; single source of truth for audio routing.
- Add .flac to recognized audio extensions (MEDIA regex, weixin audio
  set, send_message audio set).
- Telegram send_voice() now falls back to send_document for formats
  Telegram's Bot API can't play natively (.wav, .flac, ...) instead of
  raising; MP3/M4A still go to sendAudio, Opus/OGG still go to sendVoice.
- Route _send_telegram() in send_message_tool through a narrower
  _TELEGRAM_SEND_AUDIO_EXTS = {.mp3, .m4a} set.
- cron.scheduler._send_media_via_adapter now delegates the audio
  decision to should_send_media_as_audio so it matches the gateway.
- Update the cron live-adapter ogg test to flag [[audio_as_voice]] so
  it still routes to sendVoice under the new Telegram-specific policy.
- Tests: unit coverage for should_send_media_as_audio across platforms,
  end-to-end MEDIA routing via _process_message_background and
  GatewayRunner._deliver_media_from_response, TelegramAdapter.send_voice
  fallback for FLAC/WAV.

Co-authored-by: Versun <me+github7604@versun.org>
@versun versun deleted the feat/tts-local-command branch May 1, 2026 01:11
donald131 pushed a commit to donald131/hermes-agent that referenced this pull request May 2, 2026
… fallback (NousResearch#17833)

Extracted from PR NousResearch#17211 (@versun) so it can land independently of the
local_command TTS provider redesign.

- Add should_send_media_as_audio(platform, ext, is_voice) in
  gateway/platforms/base.py; single source of truth for audio routing.
- Add .flac to recognized audio extensions (MEDIA regex, weixin audio
  set, send_message audio set).
- Telegram send_voice() now falls back to send_document for formats
  Telegram's Bot API can't play natively (.wav, .flac, ...) instead of
  raising; MP3/M4A still go to sendAudio, Opus/OGG still go to sendVoice.
- Route _send_telegram() in send_message_tool through a narrower
  _TELEGRAM_SEND_AUDIO_EXTS = {.mp3, .m4a} set.
- cron.scheduler._send_media_via_adapter now delegates the audio
  decision to should_send_media_as_audio so it matches the gateway.
- Update the cron live-adapter ogg test to flag [[audio_as_voice]] so
  it still routes to sendVoice under the new Telegram-specific policy.
- Tests: unit coverage for should_send_media_as_audio across platforms,
  end-to-end MEDIA routing via _process_message_background and
  GatewayRunner._deliver_media_from_response, TelegramAdapter.send_voice
  fallback for FLAC/WAV.

Co-authored-by: Versun <me+github7604@versun.org>
donald131 pushed a commit to donald131/hermes-agent that referenced this pull request May 2, 2026
…me> (NousResearch#17843)

Reshape of PR NousResearch#17211 (@versun). Lets users wire any local or external
TTS CLI into Hermes without adding engine-specific Python code. Users
declare any number of named providers in config.yaml and switch between
them with tts.provider: <name>, alongside the built-ins (edge, openai,
elevenlabs, …).

Config shape:

  tts:
    provider: piper-en
    providers:
      piper-en:
        type: command
        command: 'piper -m ~/model.onnx -f {output_path} < {input_path}'
        output_format: wav

Placeholders: {input_path}, {text_path}, {output_path}, {format},
{voice}, {model}, {speed}. Use {{ / }} for literal braces.

Key behavior:
- Built-in provider names always win — a tts.providers.openai entry
  cannot shadow the native OpenAI provider.
- type: command is the default when command: is set.
- Placeholder values are shell-quote-aware (bare / single / double
  context), so paths with spaces and shell metacharacters are safe.
- Default delivery is a regular audio attachment. voice_compatible: true
  opts in to Telegram voice-bubble delivery via ffmpeg Opus conversion.
- Command failures (non-zero exit, timeout, empty output) surface to
  the agent with stderr/stdout included so you can debug from chat.
- Process-tree kill on timeout (Unix killpg, Windows taskkill /T).
- max_text_length defaults to 5000 for command providers; override
  under tts.providers.<name>.max_text_length.

Tests: tests/tools/test_tts_command_providers.py — 42 new tests cover
provider resolution, shell-quote context, placeholder rendering with
injection payloads, timeout, non-zero exit, empty output, voice_compatible
opt-in, and end-to-end dispatch through text_to_speech_tool. All 88
pre-existing TTS tests still pass.

Docs: new "Custom command providers" section in
website/docs/user-guide/features/tts.md with three worked examples
(Piper, VoxCPM, MLX-Kokoro), placeholder reference, optional keys,
behavior notes, and security caveat.

E2E-verified live: isolated HERMES_HOME, command provider declared in
config.yaml, text_to_speech_tool dispatches through the registered
shell command and the output file is produced as expected.

Co-authored-by: Versun <me+github7604@versun.org>
nickdlkk pushed a commit to nickdlkk/hermes-agent that referenced this pull request May 11, 2026
… fallback (NousResearch#17833)

Extracted from PR NousResearch#17211 (@versun) so it can land independently of the
local_command TTS provider redesign.

- Add should_send_media_as_audio(platform, ext, is_voice) in
  gateway/platforms/base.py; single source of truth for audio routing.
- Add .flac to recognized audio extensions (MEDIA regex, weixin audio
  set, send_message audio set).
- Telegram send_voice() now falls back to send_document for formats
  Telegram's Bot API can't play natively (.wav, .flac, ...) instead of
  raising; MP3/M4A still go to sendAudio, Opus/OGG still go to sendVoice.
- Route _send_telegram() in send_message_tool through a narrower
  _TELEGRAM_SEND_AUDIO_EXTS = {.mp3, .m4a} set.
- cron.scheduler._send_media_via_adapter now delegates the audio
  decision to should_send_media_as_audio so it matches the gateway.
- Update the cron live-adapter ogg test to flag [[audio_as_voice]] so
  it still routes to sendVoice under the new Telegram-specific policy.
- Tests: unit coverage for should_send_media_as_audio across platforms,
  end-to-end MEDIA routing via _process_message_background and
  GatewayRunner._deliver_media_from_response, TelegramAdapter.send_voice
  fallback for FLAC/WAV.

Co-authored-by: Versun <me+github7604@versun.org>
nickdlkk pushed a commit to nickdlkk/hermes-agent that referenced this pull request May 11, 2026
…me> (NousResearch#17843)

Reshape of PR NousResearch#17211 (@versun). Lets users wire any local or external
TTS CLI into Hermes without adding engine-specific Python code. Users
declare any number of named providers in config.yaml and switch between
them with tts.provider: <name>, alongside the built-ins (edge, openai,
elevenlabs, …).

Config shape:

  tts:
    provider: piper-en
    providers:
      piper-en:
        type: command
        command: 'piper -m ~/model.onnx -f {output_path} < {input_path}'
        output_format: wav

Placeholders: {input_path}, {text_path}, {output_path}, {format},
{voice}, {model}, {speed}. Use {{ / }} for literal braces.

Key behavior:
- Built-in provider names always win — a tts.providers.openai entry
  cannot shadow the native OpenAI provider.
- type: command is the default when command: is set.
- Placeholder values are shell-quote-aware (bare / single / double
  context), so paths with spaces and shell metacharacters are safe.
- Default delivery is a regular audio attachment. voice_compatible: true
  opts in to Telegram voice-bubble delivery via ffmpeg Opus conversion.
- Command failures (non-zero exit, timeout, empty output) surface to
  the agent with stderr/stdout included so you can debug from chat.
- Process-tree kill on timeout (Unix killpg, Windows taskkill /T).
- max_text_length defaults to 5000 for command providers; override
  under tts.providers.<name>.max_text_length.

Tests: tests/tools/test_tts_command_providers.py — 42 new tests cover
provider resolution, shell-quote context, placeholder rendering with
injection payloads, timeout, non-zero exit, empty output, voice_compatible
opt-in, and end-to-end dispatch through text_to_speech_tool. All 88
pre-existing TTS tests still pass.

Docs: new "Custom command providers" section in
website/docs/user-guide/features/tts.md with three worked examples
(Piper, VoxCPM, MLX-Kokoro), placeholder reference, optional keys,
behavior notes, and security caveat.

E2E-verified live: isolated HERMES_HOME, command provider declared in
config.yaml, text_to_speech_tool dispatches through the registered
shell command and the output file is produced as expected.

Co-authored-by: Versun <me+github7604@versun.org>
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
… fallback (NousResearch#17833)

Extracted from PR NousResearch#17211 (@versun) so it can land independently of the
local_command TTS provider redesign.

- Add should_send_media_as_audio(platform, ext, is_voice) in
  gateway/platforms/base.py; single source of truth for audio routing.
- Add .flac to recognized audio extensions (MEDIA regex, weixin audio
  set, send_message audio set).
- Telegram send_voice() now falls back to send_document for formats
  Telegram's Bot API can't play natively (.wav, .flac, ...) instead of
  raising; MP3/M4A still go to sendAudio, Opus/OGG still go to sendVoice.
- Route _send_telegram() in send_message_tool through a narrower
  _TELEGRAM_SEND_AUDIO_EXTS = {.mp3, .m4a} set.
- cron.scheduler._send_media_via_adapter now delegates the audio
  decision to should_send_media_as_audio so it matches the gateway.
- Update the cron live-adapter ogg test to flag [[audio_as_voice]] so
  it still routes to sendVoice under the new Telegram-specific policy.
- Tests: unit coverage for should_send_media_as_audio across platforms,
  end-to-end MEDIA routing via _process_message_background and
  GatewayRunner._deliver_media_from_response, TelegramAdapter.send_voice
  fallback for FLAC/WAV.

Co-authored-by: Versun <me+github7604@versun.org>
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
…me> (NousResearch#17843)

Reshape of PR NousResearch#17211 (@versun). Lets users wire any local or external
TTS CLI into Hermes without adding engine-specific Python code. Users
declare any number of named providers in config.yaml and switch between
them with tts.provider: <name>, alongside the built-ins (edge, openai,
elevenlabs, …).

Config shape:

  tts:
    provider: piper-en
    providers:
      piper-en:
        type: command
        command: 'piper -m ~/model.onnx -f {output_path} < {input_path}'
        output_format: wav

Placeholders: {input_path}, {text_path}, {output_path}, {format},
{voice}, {model}, {speed}. Use {{ / }} for literal braces.

Key behavior:
- Built-in provider names always win — a tts.providers.openai entry
  cannot shadow the native OpenAI provider.
- type: command is the default when command: is set.
- Placeholder values are shell-quote-aware (bare / single / double
  context), so paths with spaces and shell metacharacters are safe.
- Default delivery is a regular audio attachment. voice_compatible: true
  opts in to Telegram voice-bubble delivery via ffmpeg Opus conversion.
- Command failures (non-zero exit, timeout, empty output) surface to
  the agent with stderr/stdout included so you can debug from chat.
- Process-tree kill on timeout (Unix killpg, Windows taskkill /T).
- max_text_length defaults to 5000 for command providers; override
  under tts.providers.<name>.max_text_length.

Tests: tests/tools/test_tts_command_providers.py — 42 new tests cover
provider resolution, shell-quote context, placeholder rendering with
injection payloads, timeout, non-zero exit, empty output, voice_compatible
opt-in, and end-to-end dispatch through text_to_speech_tool. All 88
pre-existing TTS tests still pass.

Docs: new "Custom command providers" section in
website/docs/user-guide/features/tts.md with three worked examples
(Piper, VoxCPM, MLX-Kokoro), placeholder reference, optional keys,
behavior notes, and security caveat.

E2E-verified live: isolated HERMES_HOME, command provider declared in
config.yaml, text_to_speech_tool dispatches through the registered
shell command and the output file is produced as expected.

Co-authored-by: Versun <me+github7604@versun.org>
jsboige pushed a commit to jsboige/hermes-agent that referenced this pull request May 14, 2026
… fallback (NousResearch#17833)

Extracted from PR NousResearch#17211 (@versun) so it can land independently of the
local_command TTS provider redesign.

- Add should_send_media_as_audio(platform, ext, is_voice) in
  gateway/platforms/base.py; single source of truth for audio routing.
- Add .flac to recognized audio extensions (MEDIA regex, weixin audio
  set, send_message audio set).
- Telegram send_voice() now falls back to send_document for formats
  Telegram's Bot API can't play natively (.wav, .flac, ...) instead of
  raising; MP3/M4A still go to sendAudio, Opus/OGG still go to sendVoice.
- Route _send_telegram() in send_message_tool through a narrower
  _TELEGRAM_SEND_AUDIO_EXTS = {.mp3, .m4a} set.
- cron.scheduler._send_media_via_adapter now delegates the audio
  decision to should_send_media_as_audio so it matches the gateway.
- Update the cron live-adapter ogg test to flag [[audio_as_voice]] so
  it still routes to sendVoice under the new Telegram-specific policy.
- Tests: unit coverage for should_send_media_as_audio across platforms,
  end-to-end MEDIA routing via _process_message_background and
  GatewayRunner._deliver_media_from_response, TelegramAdapter.send_voice
  fallback for FLAC/WAV.

Co-authored-by: Versun <me+github7604@versun.org>
jsboige pushed a commit to jsboige/hermes-agent that referenced this pull request May 14, 2026
…me> (NousResearch#17843)

Reshape of PR NousResearch#17211 (@versun). Lets users wire any local or external
TTS CLI into Hermes without adding engine-specific Python code. Users
declare any number of named providers in config.yaml and switch between
them with tts.provider: <name>, alongside the built-ins (edge, openai,
elevenlabs, …).

Config shape:

  tts:
    provider: piper-en
    providers:
      piper-en:
        type: command
        command: 'piper -m ~/model.onnx -f {output_path} < {input_path}'
        output_format: wav

Placeholders: {input_path}, {text_path}, {output_path}, {format},
{voice}, {model}, {speed}. Use {{ / }} for literal braces.

Key behavior:
- Built-in provider names always win — a tts.providers.openai entry
  cannot shadow the native OpenAI provider.
- type: command is the default when command: is set.
- Placeholder values are shell-quote-aware (bare / single / double
  context), so paths with spaces and shell metacharacters are safe.
- Default delivery is a regular audio attachment. voice_compatible: true
  opts in to Telegram voice-bubble delivery via ffmpeg Opus conversion.
- Command failures (non-zero exit, timeout, empty output) surface to
  the agent with stderr/stdout included so you can debug from chat.
- Process-tree kill on timeout (Unix killpg, Windows taskkill /T).
- max_text_length defaults to 5000 for command providers; override
  under tts.providers.<name>.max_text_length.

Tests: tests/tools/test_tts_command_providers.py — 42 new tests cover
provider resolution, shell-quote context, placeholder rendering with
injection payloads, timeout, non-zero exit, empty output, voice_compatible
opt-in, and end-to-end dispatch through text_to_speech_tool. All 88
pre-existing TTS tests still pass.

Docs: new "Custom command providers" section in
website/docs/user-guide/features/tts.md with three worked examples
(Piper, VoxCPM, MLX-Kokoro), placeholder reference, optional keys,
behavior notes, and security caveat.

E2E-verified live: isolated HERMES_HOME, command provider declared in
config.yaml, text_to_speech_tool dispatches through the registered
shell command and the output file is produced as expected.

Co-authored-by: Versun <me+github7604@versun.org>
dannyJ848 pushed a commit to dannyJ848/hermes-agent that referenced this pull request May 17, 2026
… fallback (NousResearch#17833)

Extracted from PR NousResearch#17211 (@versun) so it can land independently of the
local_command TTS provider redesign.

- Add should_send_media_as_audio(platform, ext, is_voice) in
  gateway/platforms/base.py; single source of truth for audio routing.
- Add .flac to recognized audio extensions (MEDIA regex, weixin audio
  set, send_message audio set).
- Telegram send_voice() now falls back to send_document for formats
  Telegram's Bot API can't play natively (.wav, .flac, ...) instead of
  raising; MP3/M4A still go to sendAudio, Opus/OGG still go to sendVoice.
- Route _send_telegram() in send_message_tool through a narrower
  _TELEGRAM_SEND_AUDIO_EXTS = {.mp3, .m4a} set.
- cron.scheduler._send_media_via_adapter now delegates the audio
  decision to should_send_media_as_audio so it matches the gateway.
- Update the cron live-adapter ogg test to flag [[audio_as_voice]] so
  it still routes to sendVoice under the new Telegram-specific policy.
- Tests: unit coverage for should_send_media_as_audio across platforms,
  end-to-end MEDIA routing via _process_message_background and
  GatewayRunner._deliver_media_from_response, TelegramAdapter.send_voice
  fallback for FLAC/WAV.

Co-authored-by: Versun <me+github7604@versun.org>
dannyJ848 pushed a commit to dannyJ848/hermes-agent that referenced this pull request May 17, 2026
…me> (NousResearch#17843)

Reshape of PR NousResearch#17211 (@versun). Lets users wire any local or external
TTS CLI into Hermes without adding engine-specific Python code. Users
declare any number of named providers in config.yaml and switch between
them with tts.provider: <name>, alongside the built-ins (edge, openai,
elevenlabs, …).

Config shape:

  tts:
    provider: piper-en
    providers:
      piper-en:
        type: command
        command: 'piper -m ~/model.onnx -f {output_path} < {input_path}'
        output_format: wav

Placeholders: {input_path}, {text_path}, {output_path}, {format},
{voice}, {model}, {speed}. Use {{ / }} for literal braces.

Key behavior:
- Built-in provider names always win — a tts.providers.openai entry
  cannot shadow the native OpenAI provider.
- type: command is the default when command: is set.
- Placeholder values are shell-quote-aware (bare / single / double
  context), so paths with spaces and shell metacharacters are safe.
- Default delivery is a regular audio attachment. voice_compatible: true
  opts in to Telegram voice-bubble delivery via ffmpeg Opus conversion.
- Command failures (non-zero exit, timeout, empty output) surface to
  the agent with stderr/stdout included so you can debug from chat.
- Process-tree kill on timeout (Unix killpg, Windows taskkill /T).
- max_text_length defaults to 5000 for command providers; override
  under tts.providers.<name>.max_text_length.

Tests: tests/tools/test_tts_command_providers.py — 42 new tests cover
provider resolution, shell-quote context, placeholder rendering with
injection payloads, timeout, non-zero exit, empty output, voice_compatible
opt-in, and end-to-end dispatch through text_to_speech_tool. All 88
pre-existing TTS tests still pass.

Docs: new "Custom command providers" section in
website/docs/user-guide/features/tts.md with three worked examples
(Piper, VoxCPM, MLX-Kokoro), placeholder reference, optional keys,
behavior notes, and security caveat.

E2E-verified live: isolated HERMES_HOME, command provider declared in
config.yaml, text_to_speech_tool dispatches through the registered
shell command and the output file is produced as expected.

Co-authored-by: Versun <me+github7604@versun.org>
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
… fallback (NousResearch#17833)

Extracted from PR NousResearch#17211 (@versun) so it can land independently of the
local_command TTS provider redesign.

- Add should_send_media_as_audio(platform, ext, is_voice) in
  gateway/platforms/base.py; single source of truth for audio routing.
- Add .flac to recognized audio extensions (MEDIA regex, weixin audio
  set, send_message audio set).
- Telegram send_voice() now falls back to send_document for formats
  Telegram's Bot API can't play natively (.wav, .flac, ...) instead of
  raising; MP3/M4A still go to sendAudio, Opus/OGG still go to sendVoice.
- Route _send_telegram() in send_message_tool through a narrower
  _TELEGRAM_SEND_AUDIO_EXTS = {.mp3, .m4a} set.
- cron.scheduler._send_media_via_adapter now delegates the audio
  decision to should_send_media_as_audio so it matches the gateway.
- Update the cron live-adapter ogg test to flag [[audio_as_voice]] so
  it still routes to sendVoice under the new Telegram-specific policy.
- Tests: unit coverage for should_send_media_as_audio across platforms,
  end-to-end MEDIA routing via _process_message_background and
  GatewayRunner._deliver_media_from_response, TelegramAdapter.send_voice
  fallback for FLAC/WAV.

Co-authored-by: Versun <me+github7604@versun.org>
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…me> (NousResearch#17843)

Reshape of PR NousResearch#17211 (@versun). Lets users wire any local or external
TTS CLI into Hermes without adding engine-specific Python code. Users
declare any number of named providers in config.yaml and switch between
them with tts.provider: <name>, alongside the built-ins (edge, openai,
elevenlabs, …).

Config shape:

  tts:
    provider: piper-en
    providers:
      piper-en:
        type: command
        command: 'piper -m ~/model.onnx -f {output_path} < {input_path}'
        output_format: wav

Placeholders: {input_path}, {text_path}, {output_path}, {format},
{voice}, {model}, {speed}. Use {{ / }} for literal braces.

Key behavior:
- Built-in provider names always win — a tts.providers.openai entry
  cannot shadow the native OpenAI provider.
- type: command is the default when command: is set.
- Placeholder values are shell-quote-aware (bare / single / double
  context), so paths with spaces and shell metacharacters are safe.
- Default delivery is a regular audio attachment. voice_compatible: true
  opts in to Telegram voice-bubble delivery via ffmpeg Opus conversion.
- Command failures (non-zero exit, timeout, empty output) surface to
  the agent with stderr/stdout included so you can debug from chat.
- Process-tree kill on timeout (Unix killpg, Windows taskkill /T).
- max_text_length defaults to 5000 for command providers; override
  under tts.providers.<name>.max_text_length.

Tests: tests/tools/test_tts_command_providers.py — 42 new tests cover
provider resolution, shell-quote context, placeholder rendering with
injection payloads, timeout, non-zero exit, empty output, voice_compatible
opt-in, and end-to-end dispatch through text_to_speech_tool. All 88
pre-existing TTS tests still pass.

Docs: new "Custom command providers" section in
website/docs/user-guide/features/tts.md with three worked examples
(Piper, VoxCPM, MLX-Kokoro), placeholder reference, optional keys,
behavior notes, and security caveat.

E2E-verified live: isolated HERMES_HOME, command provider declared in
config.yaml, text_to_speech_tool dispatches through the registered
shell command and the output file is produced as expected.

Co-authored-by: Versun <me+github7604@versun.org>
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
… fallback (NousResearch#17833)

Extracted from PR NousResearch#17211 (@versun) so it can land independently of the
local_command TTS provider redesign.

- Add should_send_media_as_audio(platform, ext, is_voice) in
  gateway/platforms/base.py; single source of truth for audio routing.
- Add .flac to recognized audio extensions (MEDIA regex, weixin audio
  set, send_message audio set).
- Telegram send_voice() now falls back to send_document for formats
  Telegram's Bot API can't play natively (.wav, .flac, ...) instead of
  raising; MP3/M4A still go to sendAudio, Opus/OGG still go to sendVoice.
- Route _send_telegram() in send_message_tool through a narrower
  _TELEGRAM_SEND_AUDIO_EXTS = {.mp3, .m4a} set.
- cron.scheduler._send_media_via_adapter now delegates the audio
  decision to should_send_media_as_audio so it matches the gateway.
- Update the cron live-adapter ogg test to flag [[audio_as_voice]] so
  it still routes to sendVoice under the new Telegram-specific policy.
- Tests: unit coverage for should_send_media_as_audio across platforms,
  end-to-end MEDIA routing via _process_message_background and
  GatewayRunner._deliver_media_from_response, TelegramAdapter.send_voice
  fallback for FLAC/WAV.

Co-authored-by: Versun <me+github7604@versun.org>
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
…me> (NousResearch#17843)

Reshape of PR NousResearch#17211 (@versun). Lets users wire any local or external
TTS CLI into Hermes without adding engine-specific Python code. Users
declare any number of named providers in config.yaml and switch between
them with tts.provider: <name>, alongside the built-ins (edge, openai,
elevenlabs, …).

Config shape:

  tts:
    provider: piper-en
    providers:
      piper-en:
        type: command
        command: 'piper -m ~/model.onnx -f {output_path} < {input_path}'
        output_format: wav

Placeholders: {input_path}, {text_path}, {output_path}, {format},
{voice}, {model}, {speed}. Use {{ / }} for literal braces.

Key behavior:
- Built-in provider names always win — a tts.providers.openai entry
  cannot shadow the native OpenAI provider.
- type: command is the default when command: is set.
- Placeholder values are shell-quote-aware (bare / single / double
  context), so paths with spaces and shell metacharacters are safe.
- Default delivery is a regular audio attachment. voice_compatible: true
  opts in to Telegram voice-bubble delivery via ffmpeg Opus conversion.
- Command failures (non-zero exit, timeout, empty output) surface to
  the agent with stderr/stdout included so you can debug from chat.
- Process-tree kill on timeout (Unix killpg, Windows taskkill /T).
- max_text_length defaults to 5000 for command providers; override
  under tts.providers.<name>.max_text_length.

Tests: tests/tools/test_tts_command_providers.py — 42 new tests cover
provider resolution, shell-quote context, placeholder rendering with
injection payloads, timeout, non-zero exit, empty output, voice_compatible
opt-in, and end-to-end dispatch through text_to_speech_tool. All 88
pre-existing TTS tests still pass.

Docs: new "Custom command providers" section in
website/docs/user-guide/features/tts.md with three worked examples
(Piper, VoxCPM, MLX-Kokoro), placeholder reference, optional keys,
behavior notes, and security caveat.

E2E-verified live: isolated HERMES_HOME, command provider declared in
config.yaml, text_to_speech_tool dispatches through the registered
shell command and the output file is produced as expected.

Co-authored-by: Versun <me+github7604@versun.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

P3 Low — cosmetic, nice to have tool/tts Text-to-speech and transcription type/feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants