feat(tts): add register_tts_provider() plugin hook (closes #30398) #30420

Closed
kshitijk4poor wants to merge 1 commit into NousResearch:main from kshitijk4poor:feat/tts-plugin-hook-30398

Conversation

@kshitijk4poor

Collaborator

What this PR does

Adds a TTSProvider(ABC) + register_tts_provider() extension point to the plugin context API, alongside the existing config-driven tts.providers.<name>: type: command registry from #17843.

This is additive, not a replacement — the command-provider surface remains the primary way to add a TTS backend. The new hook covers cases the shell-template grammar can't reasonably express:

  • Native Python SDKs without a CLI (Cartesia, Fish Audio, etc.)
  • Streaming synthesis (chunked Opus → voice-bubble delivery)
  • Voice metadata API for the hermes tools picker
  • OAuth-refreshing auth flows

None of the 10 inline built-in providers (edge, openai, elevenlabs, minimax, gemini, mistral, xai, piper, kittentts, neutts) are migrated to plugins. They stay inline. The hook is for new engines.

Closes #30398.

Resolution order

The dispatcher's resolution order is the load-bearing invariant:

  1. tts.provider is a built-in name → built-in dispatch. Always wins.
  2. tts.provider matches tts.providers.<name> with command: set → command-provider dispatch (PR #17843, "feat(tts): add command-type provider registry under tts.providers.<name>").
  3. tts.provider matches a plugin-registered TTSProvider → plugin dispatch (new).
  4. No match → falls through to Edge TTS default (legacy behavior preserved).

Built-ins-always-win is enforced at THREE layers (registry rejection at registration time + dispatcher short-circuit at dispatch time + picker filter for hermes tools). Command-providers-win-over-plugins is enforced at TWO layers (the caller in text_to_speech_tool + a defensive re-check in _dispatch_to_plugin_provider).
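The four-step order above can be sketched as a small pure function. This is only an illustration of the precedence, not the code in tools/tts_tool.py: `resolve_route`, its parameters, and the returned labels are hypothetical names, and treating an unset provider as `edge` is an assumption based on the `unset-defaults-to-edge` parity scenario.

```python
from typing import Any, Mapping, Optional, Set

# The 10 built-in names listed in this PR description.
BUILTIN_TTS_PROVIDERS = frozenset({
    "edge", "openai", "elevenlabs", "minimax", "gemini",
    "mistral", "xai", "piper", "kittentts", "neutts",
})


def resolve_route(
    provider_name: Optional[str],
    command_providers: Mapping[str, Any],
    plugin_names: Set[str],
) -> str:
    """Return which dispatch path would handle `provider_name` (hypothetical helper)."""
    # Assumption: an unset provider behaves like "edge".
    name = (provider_name or "edge").strip().lower()

    # 1. Built-in names always win, even if a plugin or command entry shares the name.
    if name in BUILTIN_TTS_PROVIDERS:
        return "builtin"

    # 2. A tts.providers.<name> entry with `command` set beats a same-name plugin.
    cfg = command_providers.get(name)
    if isinstance(cfg, dict) and cfg.get("command"):
        return "command"

    # 3. Plugin-registered providers come next.
    if name in plugin_names:
        return "plugin"

    # 4. Anything else falls through to the Edge TTS default.
    return "fallback_edge"
```

For example, `resolve_route("mytts", {"mytts": {"command": "my-tts-cli"}}, {"mytts"})` resolves to the command path, because the command entry is checked before the plugin set.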

Files

New:

  • agent/tts_provider.py — TTSProvider(ABC) mirroring image_gen_provider.py shape. synthesize() required; list_voices(), list_models(), get_setup_schema(), stream(), voice_compatible all optional with sane defaults.
  • agent/tts_registry.py — registry mirroring image_gen_registry.py shape, with _BUILTIN_NAMES reject-shadowing invariant.
  • tests/plugins/tts/ — empty directory ready for community plugins.
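To make the shape of the ABC concrete, here is a minimal sketch of what a `TTSProvider` with one required method and optional hooks could look like. It is not a copy of agent/tts_provider.py: the `name` property, every method signature, the return types, and the default values are assumptions chosen for illustration, and `EchoProvider` is a hypothetical subclass.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, List


class TTSProvider(ABC):
    """Illustrative stand-in for the provider ABC (signatures are assumed)."""

    @property
    @abstractmethod
    def name(self) -> str:
        """Identifier matched against tts.provider."""

    @abstractmethod
    def synthesize(self, text: str, output_path: str, **kwargs: Any) -> str:
        """Write audio for `text` to `output_path` and return the path."""

    # Optional hooks: subclasses override only what they support.
    def list_voices(self) -> List[Dict[str, Any]]:
        return []

    def list_models(self) -> List[Dict[str, Any]]:
        return []

    def get_setup_schema(self) -> Dict[str, Any]:
        return {}

    def stream(self, text: str, **kwargs: Any) -> Iterator[bytes]:
        raise NotImplementedError(f"{self.name} does not support streaming")

    @property
    def voice_compatible(self) -> bool:
        return False


class EchoProvider(TTSProvider):
    """Hypothetical provider that writes the input text as bytes."""

    @property
    def name(self) -> str:
        return "echo"

    def synthesize(self, text: str, output_path: str, **kwargs: Any) -> str:
        with open(output_path, "wb") as fh:
            fh.write(text.encode("utf-8"))
        return output_path


provider = EchoProvider()
with tempfile.TemporaryDirectory() as tmp:
    out = provider.synthesize("hello", os.path.join(tmp, "out.bin"))
    with open(out, "rb") as fh:
        data = fh.read()
```

Because only `name` and `synthesize()` are abstract in this sketch, a provider that cannot list voices or stream still instantiates and inherits the defaults.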

Modified:

  • hermes_cli/plugins.py (+38 LoC) — register_tts_provider() method on PluginContext. Matches the gating shape of register_image_gen_provider() / register_browser_provider().
  • tools/tts_tool.py (+148 LoC) — _dispatch_to_plugin_provider() + _plugin_provider_is_voice_compatible() + walrus-elif wiring into the main dispatcher. Built-in elif chain is untouched.
  • hermes_cli/tools_config.py (+62 LoC) — _plugin_tts_providers() injects plugin rows into the Text-to-Speech picker alongside the 10 hardcoded built-in rows.
  • website/docs/user-guide/features/tts.md (+79 LoC) — new "Python plugin providers" section with decision table, minimal plugin example, optional-hook reference.
  • website/docs/user-guide/features/plugins.md (+1 LoC) — TTS row updated to mention both surfaces.

Tests

| File | Tests | Coverage |
| --- | --- | --- |
| tests/agent/test_tts_registry.py | 47 | Registration happy path, type rejection, empty-name rejection, built-in shadow rejection (10 names), case-insensitive lookup, ABC contract, helpers, TestBuiltinSync regression test that fails if _BUILTIN_NAMES drifts from BUILTIN_TTS_PROVIDERS |
| tests/tools/test_tts_plugin_dispatch.py | 35 | Built-in always wins (10 parametrized), command wins over plugin, plugin dispatch happy path, kwargs forwarding, exception passthrough, voice_compatible helper |
| tests/hermes_cli/test_tts_picker.py | 10 | Picker integration, built-in shadow defense, schema-raising defense, _visible_providers injection |
| tests/hermes_cli/test_plugins_tts_registration.py | 3 | End-to-end via PluginManager.discover_and_load() |
| tests/plugins/tts/check_parity_vs_main.py | 9 scenarios | Subprocess parity harness vs origin/main |
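
The TestBuiltinSync idea (two duplicated name lists that must not drift apart) can be illustrated with a small comparison helper. This is a sketch under assumptions: the two sets below stand in for `_BUILTIN_NAMES` and `BUILTIN_TTS_PROVIDERS`, and `name_drift` is a hypothetical function, not the actual test body.

```python
from typing import Dict, FrozenSet, List

# Stand-ins for the two duplicated constants (contents taken from the PR description).
REGISTRY_BUILTIN_NAMES: FrozenSet[str] = frozenset({
    "edge", "openai", "elevenlabs", "minimax", "gemini",
    "mistral", "xai", "piper", "kittentts", "neutts",
})
TOOL_BUILTIN_PROVIDERS: FrozenSet[str] = frozenset(REGISTRY_BUILTIN_NAMES)


def name_drift(registry: FrozenSet[str], tool: FrozenSet[str]) -> Dict[str, List[str]]:
    """Report names present on one side only; both lists are empty when in sync."""
    return {
        "missing_from_registry": sorted(tool - registry),
        "missing_from_tool": sorted(registry - tool),
    }


in_sync = name_drift(REGISTRY_BUILTIN_NAMES, TOOL_BUILTIN_PROVIDERS)
# Simulate someone adding a built-in to the tool module but not the registry.
drifted = name_drift(REGISTRY_BUILTIN_NAMES, TOOL_BUILTIN_PROVIDERS | {"newengine"})
```

A regression test built on this would assert that both lists are empty, so adding a built-in in only one place fails loudly instead of silently letting a plugin shadow it.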

Test plan:

bash scripts/run_tests.sh \
  tests/tools/test_tts_command_providers.py \
  tests/tools/test_tts_dotenv_fallback.py \
  tests/tools/test_tts_gemini.py \
  tests/tools/test_tts_kittentts.py \
  tests/tools/test_tts_max_text_length.py \
  tests/tools/test_tts_mistral.py \
  tests/tools/test_tts_opus_routing.py \
  tests/tools/test_tts_piper.py \
  tests/tools/test_tts_plugin_dispatch.py \
  tests/tools/test_tts_speed.py \
  tests/tools/test_tts_xai_speech_tags.py \
  tests/agent/test_tts_registry.py \
  tests/hermes_cli/test_tts_picker.py \
  tests/hermes_cli/test_plugins_tts_registration.py

# 265 passed, 0 failed

Plus the parity harness:

$ python tests/plugins/tts/check_parity_vs_main.py
  [OK]   unset-defaults-to-edge: builtin_edge
  [OK]   explicit-edge: builtin_edge
  [OK]   explicit-openai: builtin_openai
  [OK]   explicit-elevenlabs: builtin_elevenlabs
  [OK]   command-provider: command
  [OK]   unknown-no-plugin: fallback_edge
  [DIFF] plugin-installed: fallback_edge → plugin — expected
  [OK]   explicit-edge-with-plugin-registered: builtin_edge
  [OK]   mistral-quarantine: error
PARITY OK across 9 scenarios.

The single [DIFF] is the intentional behavior change of the PR: when a plugin is registered, tts.provider: <plugin-name> routes to the plugin instead of silently falling through to Edge default.
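
The comparison step of such a parity check can be sketched as follows. The real harness runs subprocesses against origin/main; this sketch only shows how per-scenario route labels might be compared against an allow-list of expected differences. `compare_routes` and the `expected_diffs` mapping are hypothetical, and the scenario data is copied from the output above.

```python
from typing import Dict, List, Tuple


def compare_routes(
    baseline: Dict[str, str],
    candidate: Dict[str, str],
    expected_diffs: Dict[str, Tuple[str, str]],
) -> Tuple[bool, List[str]]:
    """Compare route labels per scenario; only allow-listed differences pass."""
    ok = True
    lines = []
    for scenario, old in baseline.items():
        new = candidate[scenario]
        if old == new:
            lines.append(f"  [OK]   {scenario}: {new}")
        elif expected_diffs.get(scenario) == (old, new):
            lines.append(f"  [DIFF] {scenario}: {old} -> {new} (expected)")
        else:
            ok = False
            lines.append(f"  [FAIL] {scenario}: {old} -> {new} (unexpected)")
    return ok, lines


baseline = {
    "unset-defaults-to-edge": "builtin_edge",
    "explicit-edge": "builtin_edge",
    "explicit-openai": "builtin_openai",
    "explicit-elevenlabs": "builtin_elevenlabs",
    "command-provider": "command",
    "unknown-no-plugin": "fallback_edge",
    "plugin-installed": "fallback_edge",
    "explicit-edge-with-plugin-registered": "builtin_edge",
    "mistral-quarantine": "error",
}
# The branch differs from main in exactly one scenario.
candidate = dict(baseline, **{"plugin-installed": "plugin"})
expected = {"plugin-installed": ("fallback_edge", "plugin")}

parity_ok, report = compare_routes(baseline, candidate, expected)
```

Keeping the expected differences in an explicit allow-list means any other behavior change, including a change in the opposite direction for the same scenario, is reported as a failure.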

E2E smoke

A registered plugin's synthesize() is reached via the real text_to_speech_tool entry point with the standard JSON envelope:

$ python -c '<smoke script — see commit message>'
Tool result: {"success": true, "file_path": "/tmp/.../out.mp3",
              "media_tag": "MEDIA:/tmp/.../out.mp3",
              "provider": "testplug", "voice_compatible": false}
File contents: b'AUDIODATA'
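
A plugin of the kind exercised by this smoke test might look roughly like the sketch below. Everything here is an assumption made for illustration: the `register(ctx)` entry-point name, the `FakePluginContext` stand-in, and the duck-typed `DemoProvider` (a real plugin would subclass `TTSProvider`, whose exact signatures are not shown in this PR description). Only the provider name `testplug` and the `b'AUDIODATA'` payload come from the output above.

```python
import os
import tempfile
from typing import Any, Dict


class DemoProvider:
    """Hypothetical provider; a real one would subclass TTSProvider."""

    name = "testplug"
    voice_compatible = False

    def synthesize(self, text: str, output_path: str, **kwargs: Any) -> str:
        # Write placeholder bytes instead of calling a real TTS SDK.
        with open(output_path, "wb") as fh:
            fh.write(b"AUDIODATA")
        return output_path


class FakePluginContext:
    """Stand-in for PluginContext that only records TTS registrations."""

    def __init__(self) -> None:
        self.tts_providers: Dict[str, Any] = {}

    def register_tts_provider(self, provider: Any) -> None:
        self.tts_providers[provider.name] = provider


def register(ctx: Any) -> None:
    """Assumed plugin entry point: hand the provider to the context."""
    ctx.register_tts_provider(DemoProvider())


ctx = FakePluginContext()
register(ctx)
with tempfile.TemporaryDirectory() as tmp:
    path = ctx.tts_providers["testplug"].synthesize("hi", os.path.join(tmp, "out.mp3"))
    with open(path, "rb") as fh:
        payload = fh.read()
```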

Design rationale

A previous version of website/docs/user-guide/features/plugins.md claimed TTS "isn't plugin-extensible", which @teknium1 corrected in commit a401f81 noting TTS is extensible via command-providers and that register_tts_provider() would be a future "nice-to-have for SDK/streaming cases, not as the primary story."

This PR realizes that nice-to-have without displacing command-providers as the primary surface. The decision table in the updated docs explicitly tells users when to pick command-providers (single CLI → use command) vs plugin (Python SDK / streaming → use plugin).

The 10 built-in providers stay inline because each has SDK-specific quirks (async edge_tts, ElevenLabs WebSocket streaming, Gemini PCM-with-WAV-header, OAuth-refreshing xAI, model-aware MiniMax) that don't compose into a clean uniform ABC shape. Migrating them would be a separate, much larger discussion.

Out of scope

  • Porting any built-in to a plugin.
  • Deprecating tts.providers.<name>: type: command.
  • Streaming-Opus implementation for existing built-ins (the stream() ABC method is there for new providers; built-ins keep using synthesize()).
  • register_stt_provider() for transcription — separate concern, separate issue.
  • Voice-bubble delivery refactor.

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update
  • Code refactor / cleanup
  • Performance improvement
  • Test coverage improvement

Related

…ch#30398)

Adds a `TTSProvider(ABC)` + `register_tts_provider()` extension point
to the plugin context API, **alongside** the existing config-driven
`tts.providers.<name>: type: command` registry from PR NousResearch#17843. This is
additive — the command-provider surface stays as the primary way to
add a TTS backend.

The hook covers cases the shell-template grammar can't reasonably
express:

- Native Python SDKs without a CLI (Cartesia, Fish Audio, etc.)
- Streaming synthesis (chunked Opus → voice-bubble delivery)
- Voice metadata API for the `hermes tools` picker
- OAuth-refreshing auth flows

None of the 10 inline built-in providers (`edge`, `openai`,
`elevenlabs`, `minimax`, `gemini`, `mistral`, `xai`, `piper`,
`kittentts`, `neutts`) are migrated to plugins. They stay inline. The
hook is for *new* engines that aren't built-in.

## Resolution order

The dispatcher's resolution order is the load-bearing invariant:

1. `tts.provider` is a built-in name → built-in dispatch. **Always wins.**
2. `tts.provider` matches `tts.providers.<name>` with `command:` set
   → command-provider dispatch (PR NousResearch#17843).
3. `tts.provider` matches a plugin-registered `TTSProvider`
   → plugin dispatch (new).
4. No match → falls through to Edge TTS default (legacy behavior).

Built-ins-always-win is enforced at THREE layers:
- Registry: `register_provider()` rejects shadowing names with a warning.
- Dispatcher: `_dispatch_to_plugin_provider()` short-circuits built-in
  names defensively before consulting the registry.
- Picker: `_plugin_tts_providers()` filters built-in shadows out of
  the `hermes tools` row list defensively.

Command-providers-win-over-plugins is enforced at TWO layers:
- The caller in `text_to_speech_tool` checks
  `_resolve_command_provider_config` first.
- `_dispatch_to_plugin_provider` re-checks for a same-name command
  config defensively so a refactor of the caller can't silently break
  the invariant.
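
A defensive re-check of this kind, combined with the walrus-elif wiring mentioned elsewhere in this message, could look like the following sketch. The function name, its keyword parameters, the `None`-means-not-handled convention, and `StubProvider` are all assumptions made for illustration; they are not taken from `tools/tts_tool.py`.

```python
from typing import Any, Mapping, Optional


class StubProvider:
    """Hypothetical plugin provider used only for this illustration."""

    def synthesize(self, text: str, output_path: str, **kwargs: Any) -> str:
        return output_path


def dispatch_to_plugin_provider(
    name: Optional[str],
    text: str,
    output_path: str,
    *,
    builtin_names: frozenset,
    command_providers: Mapping[str, Any],
    plugin_registry: Mapping[str, Any],
) -> Optional[str]:
    """Return the plugin's output path, or None when a plugin must not handle `name`."""
    key = (name or "").strip().lower()
    if key in builtin_names:
        return None  # built-ins always win, even if the registry was bypassed
    cfg = command_providers.get(key)
    if isinstance(cfg, dict) and cfg.get("command"):
        return None  # a same-name command provider wins over the plugin
    provider = plugin_registry.get(key)
    if provider is None:
        return None
    return provider.synthesize(text, output_path)


BUILTINS = frozenset({"edge", "openai"})
REGISTRY = {"cartesia": StubProvider(), "edge": StubProvider()}
COMMANDS = {"cartesia": {"type": "command", "command": "my-tts-cli"}}


def route(name: str, commands: Mapping[str, Any]) -> str:
    """Toy dispatcher showing the walrus-elif fall-through."""
    if name in BUILTINS:
        return "builtin"
    elif (path := dispatch_to_plugin_provider(
        name, "hi", "/tmp/out.mp3",
        builtin_names=BUILTINS, command_providers=commands, plugin_registry=REGISTRY,
    )) is not None:
        return f"plugin:{path}"
    return "fallback_edge"
```

Returning `None` for "not mine" lets the caller keep a flat `if`/`elif` chain: the plugin branch is taken only when the helper actually produced a result.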

## New files

- `agent/tts_provider.py` — `TTSProvider(ABC)` with `synthesize()` (required),
  `list_voices()`, `list_models()`, `get_setup_schema()`, `stream()`,
  `voice_compatible` (all optional with sane defaults). Mirrors
  `agent/image_gen_provider.py` shape.
- `agent/tts_registry.py` — `register_provider`/`get_provider`/`list_providers`
  with `_BUILTIN_NAMES` reject-shadowing invariant. Mirrors
  `agent/image_gen_registry.py` shape.
- `plugins/tts/...` directory ready for community plugins (none shipped).

## Modified files

- `hermes_cli/plugins.py` — `register_tts_provider()` method on
  `PluginContext`. Matches the gating shape of
  `register_image_gen_provider()` / `register_browser_provider()`.
- `tools/tts_tool.py` — `_dispatch_to_plugin_provider()` +
  `_plugin_provider_is_voice_compatible()` + walrus-elif wiring into
  the main dispatcher. Built-in elif chain untouched.
- `hermes_cli/tools_config.py` — `_plugin_tts_providers()` injects
  plugin rows into the Text-to-Speech picker category alongside the
  10 hardcoded built-in rows.
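
The picker-side defense might be sketched like this. It is illustrative only: `plugin_picker_rows`, the row dictionary shape, and the stub providers are hypothetical, and the schema-raising guard is inferred from the "schema-raising defense" test coverage rather than from the actual `_plugin_tts_providers()` code.

```python
from typing import Any, Dict, Iterable, List

BUILTIN_NAMES = frozenset({
    "edge", "openai", "elevenlabs", "minimax", "gemini",
    "mistral", "xai", "piper", "kittentts", "neutts",
})


def plugin_picker_rows(providers: Iterable[Any]) -> List[Dict[str, Any]]:
    """Build picker rows for plugins, skipping built-in shadows and bad schemas."""
    rows = []
    for provider in providers:
        key = str(getattr(provider, "name", "") or "").strip().lower()
        if not key or key in BUILTIN_NAMES:
            continue  # never show a plugin row that shadows a built-in
        try:
            schema = provider.get_setup_schema() or {}
        except Exception:
            schema = {}  # a broken plugin should not break the whole picker
        rows.append({"name": key, "schema": schema})
    return rows


class _Good:
    name = "Cartesia"

    def get_setup_schema(self) -> Dict[str, Any]:
        return {"label": "Cartesia"}


class _Shadow:
    name = "openai"

    def get_setup_schema(self) -> Dict[str, Any]:
        return {}


class _Broken:
    name = "fishaudio"

    def get_setup_schema(self) -> Dict[str, Any]:
        raise RuntimeError("schema unavailable")


rows = plugin_picker_rows([_Good(), _Shadow(), _Broken()])
```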

## Tests

- `tests/agent/test_tts_registry.py` — 47 tests covering registration,
  lookup, ABC contract, helpers, AND a `TestBuiltinSync` regression
  test that fails if `agent.tts_registry._BUILTIN_NAMES` drifts from
  `tools.tts_tool.BUILTIN_TTS_PROVIDERS` (kept duplicated due to
  circular import constraints).
- `tests/tools/test_tts_plugin_dispatch.py` — 35 tests covering
  built-in-always-wins, command-wins-over-plugin, plugin dispatch,
  exception passthrough, voice_compatible helper.
- `tests/hermes_cli/test_tts_picker.py` — 10 tests covering the
  picker surface, builtin shadowing defense, integration with
  `_visible_providers`.
- `tests/hermes_cli/test_plugins_tts_registration.py` — 3 end-to-end
  tests via `PluginManager.discover_and_load()`.
- `tests/plugins/tts/check_parity_vs_main.py` — 9-scenario subprocess
  parity harness vs `origin/main`. The only intentional diff is
  `fallback_edge → plugin` for the `plugin-installed` scenario.

## Verification

- 95/95 new tests pass.
- 170/170 pre-existing TTS tests (test_tts_command_providers,
  test_tts_max_text_length, test_tts_speed, etc.) pass unchanged.
- Parity harness against `origin/main`: 8 OK + 1 expected DIFF.
- E2E smoke: a registered plugin's `synthesize()` is called via
  `text_to_speech_tool` with the standard JSON envelope returned.
- Ruff clean on all touched files.

## Docs

- `website/docs/user-guide/features/tts.md` — new "Python plugin
  providers" section with a decision table (command-provider vs
  plugin), minimal plugin example, and the optional-hook reference.
- `website/docs/user-guide/features/plugins.md` — TTS row updated to
  mention both surfaces (command-provider primary, plugin for
  SDK/streaming).

Closes NousResearch#30398
kshitijk4poor added the type/feature, comp/plugins, tool/tts, and P3 labels on May 22, 2026
@teknium1

Contributor

Merged via PR #31745 — your commit was cherry-picked onto current main with your authorship preserved (commit 00ec0b617 on main). The branch was 195 commits behind so a direct merge would have been messy, but the salvage path kept your work intact.

Thanks for filing #30398 yourself before implementing — having the design committed in writing made the review straightforward, and Teknium had already flagged register_tts_provider() as a future nice-to-have in commit a401f8172. The defense-in-depth (3-layer built-ins-always-win, 2-layer command-wins-over-plugins, regression test guarding name-list drift) and the subprocess parity harness were both noticed and appreciated.

Looking forward to #30493 (STT plugin hook) next.

teknium1 closed this on May 25, 2026

Labels

comp/plugins Plugin system and bundled plugins P3 Low — cosmetic, nice to have tool/tts Text-to-speech and transcription type/feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add register_tts_provider() plugin hook for Python-SDK and streaming TTS engines

2 participants