feat(tts): add register_tts_provider() plugin hook (closes #30398) by kshitijk4poor · Pull Request #30420 · NousResearch/hermes-agent

kshitijk4poor · 2026-05-22T12:29:20Z

What this PR does

Adds a TTSProvider(ABC) + register_tts_provider() extension point to the plugin context API, alongside the existing config-driven tts.providers.<name>: type: command registry from #17843.

This is additive, not a replacement — the command-provider surface remains the primary way to add a TTS backend. The new hook covers cases the shell-template grammar can't reasonably express:

Native Python SDKs without a CLI (Cartesia, Fish Audio, etc.)
Streaming synthesis (chunked Opus → voice-bubble delivery)
Voice metadata API for the hermes tools picker
OAuth-refreshing auth flows

None of the 10 inline built-in providers (edge, openai, elevenlabs, minimax, gemini, mistral, xai, piper, kittentts, neutts) are migrated to plugins. They stay inline. The hook is for new engines.

Closes #30398.

Resolution order

The dispatcher's resolution order is the load-bearing invariant:

1. tts.provider is a built-in name → built-in dispatch. Always wins.
2. tts.provider matches tts.providers.<name> with command: set → command-provider dispatch (PR #17843).
3. tts.provider matches a plugin-registered TTSProvider → plugin dispatch (new).
4. No match → falls through to the Edge TTS default (legacy behavior preserved).

Built-ins-always-win is enforced at THREE layers (registry rejection at registration time + dispatcher short-circuit at dispatch time + picker filter for hermes tools). Command-providers-win-over-plugins is enforced at TWO layers (the caller in text_to_speech_tool + a defensive re-check in _dispatch_to_plugin_provider).
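The four-step order above can be sketched as a single pure function. This is an illustration only: `resolve_tts_route`, its arguments, and the returned route labels are hypothetical names, not code from the PR, and the treatment of an unset provider and of name casing are assumptions.

```python
from typing import FrozenSet, Mapping, Optional, Tuple

# The 10 built-in names listed in this PR description.
BUILTIN_TTS_PROVIDERS: FrozenSet[str] = frozenset({
    "edge", "openai", "elevenlabs", "minimax", "gemini",
    "mistral", "xai", "piper", "kittentts", "neutts",
})


def resolve_tts_route(
    provider: Optional[str],
    command_configs: Mapping[str, Mapping[str, object]],
    plugin_names: FrozenSet[str],
) -> Tuple[str, str]:
    """Return (route, name) following the resolution order (hypothetical helper)."""
    # Assumption: an unset provider means "edge", and names are compared lowercased.
    name = (provider or "edge").strip().lower()

    # 1. Built-in names always win, even if a command config or plugin shares the name.
    if name in BUILTIN_TTS_PROVIDERS:
        return ("builtin", name)

    # 2. A tts.providers.<name> entry with `command:` set beats a plugin of the same name.
    if command_configs.get(name, {}).get("command"):
        return ("command", name)

    # 3. Plugin-registered providers come next.
    if name in plugin_names:
        return ("plugin", name)

    # 4. No match: fall through to the Edge TTS default.
    return ("fallback", "edge")
```

Keeping the order in one place like this is what the layered defensive checks protect: a same-name plugin can never be reached before the built-in or command-provider branches.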

Files

New:

agent/tts_provider.py — TTSProvider(ABC) mirroring image_gen_provider.py shape. synthesize() required; list_voices(), list_models(), get_setup_schema(), stream(), voice_compatible all optional with sane defaults.
agent/tts_registry.py — registry mirroring image_gen_registry.py shape, with _BUILTIN_NAMES reject-shadowing invariant.
plugins/tts/ — directory ready for community plugins (none shipped).
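As a rough picture of the two new modules, the sketch below pairs an abstract provider with a registry that refuses built-in names. Only the method names and `_BUILTIN_NAMES` come from the description above; the signatures, the `name` property, the default return values, raising on bad input, and the `EchoProvider` toy class are all assumptions made for illustration.

```python
import logging
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, List, Optional

logger = logging.getLogger(__name__)

_BUILTIN_NAMES = frozenset({
    "edge", "openai", "elevenlabs", "minimax", "gemini",
    "mistral", "xai", "piper", "kittentts", "neutts",
})


class TTSProvider(ABC):
    """Sketch of the provider contract; every signature here is an assumption."""

    voice_compatible: bool = False  # assumed default

    @property
    @abstractmethod
    def name(self) -> str:
        """Provider name matched against tts.provider."""

    @abstractmethod
    def synthesize(self, text: str, output_path: str, **kwargs: Any) -> str:
        """Write audio for `text` to `output_path` and return the path."""

    # Optional hooks with harmless defaults (assumed).
    def list_voices(self) -> List[Dict[str, Any]]:
        return []

    def list_models(self) -> List[Dict[str, Any]]:
        return []

    def get_setup_schema(self) -> Dict[str, Any]:
        return {}

    def stream(self, text: str, **kwargs: Any) -> Iterator[bytes]:
        raise NotImplementedError(f"{self.name} does not support streaming")


_providers: Dict[str, TTSProvider] = {}


def register_provider(provider: TTSProvider) -> bool:
    """Register a provider; reject wrong types, empty names, and built-in shadows."""
    if not isinstance(provider, TTSProvider):
        raise TypeError("provider must be a TTSProvider instance")
    key = provider.name.strip().lower()
    if not key:
        raise ValueError("provider name must be non-empty")
    if key in _BUILTIN_NAMES:
        logger.warning("TTS plugin %r shadows a built-in name; ignoring", key)
        return False
    _providers[key] = provider
    return True


def get_provider(name: str) -> Optional[TTSProvider]:
    """Case-insensitive lookup."""
    return _providers.get(name.strip().lower())


class EchoProvider(TTSProvider):
    """Toy provider that writes the input text as bytes; for illustration only."""

    def __init__(self, name: str) -> None:
        self._name = name

    @property
    def name(self) -> str:
        return self._name

    def synthesize(self, text: str, output_path: str, **kwargs: Any) -> str:
        with open(output_path, "wb") as fh:
            fh.write(text.encode("utf-8"))
        return output_path
```

Registering `EchoProvider("MyTTS")` succeeds and is then found under `mytts`, while `EchoProvider("edge")` is logged and dropped, which is the reject-shadowing behavior the registry is meant to guarantee.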

Modified:

hermes_cli/plugins.py (+38 LoC) — register_tts_provider() method on PluginContext. Matches the gating shape of register_image_gen_provider() / register_browser_provider().
tools/tts_tool.py (+148 LoC) — _dispatch_to_plugin_provider() + _plugin_provider_is_voice_compatible() + walrus-elif wiring into the main dispatcher. Built-in elif chain is untouched.
hermes_cli/tools_config.py (+62 LoC) — _plugin_tts_providers() injects plugin rows into the Text-to-Speech picker alongside the 10 hardcoded built-in rows.
website/docs/user-guide/features/tts.md (+79 LoC) — new "Python plugin providers" section with decision table, minimal plugin example, optional-hook reference.
website/docs/user-guide/features/plugins.md (+1 LoC) — TTS row updated to mention both surfaces.
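The picker change can be pictured as a small filter over registered providers. The sketch below is an assumption-heavy illustration: `plugin_picker_rows` and the row dictionary layout are invented here, and the only behaviors taken from this PR are that built-in shadows are filtered out and that a provider whose `get_setup_schema()` raises does not break the picker.

```python
from typing import Any, Dict, Iterable, List

BUILTIN_NAMES = frozenset({
    "edge", "openai", "elevenlabs", "minimax", "gemini",
    "mistral", "xai", "piper", "kittentts", "neutts",
})


def plugin_picker_rows(providers: Iterable[Any]) -> List[Dict[str, Any]]:
    """Build picker rows for plugin providers (hypothetical row format)."""
    rows: List[Dict[str, Any]] = []
    for provider in providers:
        name = str(getattr(provider, "name", "")).strip().lower()
        # Defensive filter: never show a plugin row that shadows a built-in.
        if not name or name in BUILTIN_NAMES:
            continue
        try:
            schema = provider.get_setup_schema()
        except Exception:
            # A misbehaving plugin should not take the whole picker down.
            schema = {}
        rows.append({"name": name, "source": "plugin", "setup": schema})
    return rows
```

The filter duplicates a check the registry already performs, which matches the defense-in-depth approach described under "Resolution order".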

Tests

| File | Tests | Coverage |
| --- | --- | --- |
| `tests/agent/test_tts_registry.py` | 47 | Registration happy path, type rejection, empty-name rejection, built-in shadow rejection (10 names), case-insensitive lookup, ABC contract, helpers, `TestBuiltinSync` regression test that fails if `_BUILTIN_NAMES` drifts from `BUILTIN_TTS_PROVIDERS` |
| `tests/tools/test_tts_plugin_dispatch.py` | 35 | Built-in always wins (10 parametrized), command wins over plugin, plugin dispatch happy path, kwargs forwarding, exception passthrough, voice_compatible helper |
| `tests/hermes_cli/test_tts_picker.py` | 10 | Picker integration, built-in shadow defense, schema-raising defense, `_visible_providers` injection |
| `tests/hermes_cli/test_plugins_tts_registration.py` | 3 | End-to-end via `PluginManager.discover_and_load()` |
| `tests/plugins/tts/check_parity_vs_main.py` | 9 scenarios | Subprocess parity harness vs `origin/main` |

Test plan:

bash scripts/run_tests.sh \
  tests/tools/test_tts_command_providers.py \
  tests/tools/test_tts_dotenv_fallback.py \
  tests/tools/test_tts_gemini.py \
  tests/tools/test_tts_kittentts.py \
  tests/tools/test_tts_max_text_length.py \
  tests/tools/test_tts_mistral.py \
  tests/tools/test_tts_opus_routing.py \
  tests/tools/test_tts_piper.py \
  tests/tools/test_tts_plugin_dispatch.py \
  tests/tools/test_tts_speed.py \
  tests/tools/test_tts_xai_speech_tags.py \
  tests/agent/test_tts_registry.py \
  tests/hermes_cli/test_tts_picker.py \
  tests/hermes_cli/test_plugins_tts_registration.py

# 265 passed, 0 failed

Plus the parity harness:

$ python tests/plugins/tts/check_parity_vs_main.py
  [OK]   unset-defaults-to-edge: builtin_edge
  [OK]   explicit-edge: builtin_edge
  [OK]   explicit-openai: builtin_openai
  [OK]   explicit-elevenlabs: builtin_elevenlabs
  [OK]   command-provider: command
  [OK]   unknown-no-plugin: fallback_edge
  [DIFF] plugin-installed: fallback_edge → plugin — expected
  [OK]   explicit-edge-with-plugin-registered: builtin_edge
  [OK]   mistral-quarantine: error
PARITY OK across 9 scenarios.

The single [DIFF] is the intentional behavior change of the PR: when a plugin is registered, tts.provider: <plugin-name> routes to the plugin instead of silently falling through to Edge default.
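The comparison step of such a harness might look like the sketch below. It assumes the per-scenario routes have already been collected from each checkout (the real harness does that via subprocesses against `origin/main`, which is not reproduced here); `compare_routes` and the `[FAIL]` label are hypothetical, and the scenario data is copied from the output above.

```python
from typing import Dict, List, Tuple

MAIN: Dict[str, str] = {
    "unset-defaults-to-edge": "builtin_edge",
    "explicit-edge": "builtin_edge",
    "explicit-openai": "builtin_openai",
    "explicit-elevenlabs": "builtin_elevenlabs",
    "command-provider": "command",
    "unknown-no-plugin": "fallback_edge",
    "plugin-installed": "fallback_edge",
    "explicit-edge-with-plugin-registered": "builtin_edge",
    "mistral-quarantine": "error",
}
# On the PR branch only the plugin-installed scenario changes route.
BRANCH: Dict[str, str] = {**MAIN, "plugin-installed": "plugin"}
EXPECTED_DIFFS: Dict[str, Tuple[str, str]] = {
    "plugin-installed": ("fallback_edge", "plugin"),
}


def compare_routes(
    main: Dict[str, str],
    branch: Dict[str, str],
    expected_diffs: Dict[str, Tuple[str, str]],
) -> Tuple[List[str], bool]:
    """Return report lines and whether every difference was an expected one."""
    lines: List[str] = []
    ok = True
    for scenario, before in main.items():
        after = branch[scenario]
        if before == after:
            lines.append(f"  [OK]   {scenario}: {after}")
        elif expected_diffs.get(scenario) == (before, after):
            lines.append(f"  [DIFF] {scenario}: {before} → {after} (expected)")
        else:
            lines.append(f"  [FAIL] {scenario}: {before} → {after}")
            ok = False
    return lines, ok


if __name__ == "__main__":
    report, parity_ok = compare_routes(MAIN, BRANCH, EXPECTED_DIFFS)
    print("\n".join(report))
    print(f"PARITY {'OK' if parity_ok else 'BROKEN'} across {len(report)} scenarios.")
```

Listing the expected differences explicitly means any other change in routing, including a plugin accidentally overriding a built-in, is reported as a failure rather than silently accepted.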

E2E smoke

A registered plugin's synthesize() is reached via the real text_to_speech_tool entry point with the standard JSON envelope:

$ python -c '<smoke script — see commit message>'
Tool result: {"success": true, "file_path": "/tmp/.../out.mp3",
              "media_tag": "MEDIA:/tmp/.../out.mp3",
              "provider": "testplug", "voice_compatible": false}
File contents: b'AUDIODATA'

Design rationale

A previous version of website/docs/user-guide/features/plugins.md claimed TTS "isn't plugin-extensible", which @teknium1 corrected in commit a401f81 noting TTS is extensible via command-providers and that register_tts_provider() would be a future "nice-to-have for SDK/streaming cases, not as the primary story."

This PR realizes that nice-to-have without displacing command-providers as the primary surface. The decision table in the updated docs explicitly tells users when to pick command-providers (single CLI → use command) vs plugin (Python SDK / streaming → use plugin).

The 10 built-in providers stay inline because each has SDK-specific quirks (async edge_tts, ElevenLabs WebSocket streaming, Gemini PCM-with-WAV-header, OAuth-refreshing xAI, model-aware MiniMax) that don't compose into a clean uniform ABC shape. Migrating them would be a separate, much larger discussion.

Out of scope

Porting any built-in to a plugin.
Deprecating tts.providers.<name>: type: command.
Streaming-Opus implementation for existing built-ins (the stream() ABC method is there for new providers; built-ins keep using synthesize()).
register_stt_provider() for transcription — separate concern, separate issue.
Voice-bubble delivery refactor.

Type of Change

Related

Closes #30398 (Add register_tts_provider() plugin hook for Python-SDK and streaming TTS engines), the design issue I filed earlier today
PR #17843 (feat(tts): add command-type provider registry under tts.providers.<name>) — command-provider registry (the existing TTS extension surface this PR coexists with)
Commit a401f81 — Teknium's "TTS is a plugin via command-providers" doc correction, mentioning register_tts_provider() as a future nice-to-have
PR #30380 (feat(image_gen): port FAL backend to plugins/image_gen/fal (salvage #27966)) — recent FAL image-gen plugin migration (architectural template used for the parity harness pattern)
PR #25214 (Mirror web-provider plugin migration for browser providers) — browser plugin migration (closest analog for the plugin-context API addition)

…ch#30398)

Adds a `TTSProvider(ABC)` + `register_tts_provider()` extension point to the plugin context API, **alongside** the existing config-driven `tts.providers.<name>: type: command` registry from PR NousResearch#17843.

This is additive — the command-provider surface stays as the primary way to add a TTS backend. The hook covers cases the shell-template grammar can't reasonably express:

- Native Python SDKs without a CLI (Cartesia, Fish Audio, etc.)
- Streaming synthesis (chunked Opus → voice-bubble delivery)
- Voice metadata API for the `hermes tools` picker
- OAuth-refreshing auth flows

None of the 10 inline built-in providers (`edge`, `openai`, `elevenlabs`, `minimax`, `gemini`, `mistral`, `xai`, `piper`, `kittentts`, `neutts`) are migrated to plugins. They stay inline. The hook is for *new* engines that aren't built-in.

## Resolution order

The dispatcher's resolution order is the load-bearing invariant:

1. `tts.provider` is a built-in name → built-in dispatch. **Always wins.**
2. `tts.provider` matches `tts.providers.<name>` with `command:` set → command-provider dispatch (PR NousResearch#17843).
3. `tts.provider` matches a plugin-registered `TTSProvider` → plugin dispatch (new).
4. No match → falls through to Edge TTS default (legacy behavior).

Built-ins-always-win is enforced at THREE layers:

- Registry: `register_provider()` rejects shadowing names with a warning.
- Dispatcher: `_dispatch_to_plugin_provider()` short-circuits built-in names defensively before consulting the registry.
- Picker: `_plugin_tts_providers()` filters built-in shadows out of the `hermes tools` row list defensively.

Command-providers-win-over-plugins is enforced at TWO layers:

- The caller in `text_to_speech_tool` checks `_resolve_command_provider_config` first.
- `_dispatch_to_plugin_provider` re-checks for a same-name command config defensively so a refactor of the caller can't silently break the invariant.

## New files

- `agent/tts_provider.py` — `TTSProvider(ABC)` with `synthesize()` (required), `list_voices()`, `list_models()`, `get_setup_schema()`, `stream()`, `voice_compatible` (all optional with sane defaults). Mirrors `agent/image_gen_provider.py` shape.
- `agent/tts_registry.py` — `register_provider`/`get_provider`/`list_providers` with `_BUILTIN_NAMES` reject-shadowing invariant. Mirrors `agent/image_gen_registry.py` shape.
- `plugins/tts/...` directory ready for community plugins (none shipped).

## Modified files

- `hermes_cli/plugins.py` — `register_tts_provider()` method on `PluginContext`. Matches the gating shape of `register_image_gen_provider()` / `register_browser_provider()`.
- `tools/tts_tool.py` — `_dispatch_to_plugin_provider()` + `_plugin_provider_is_voice_compatible()` + walrus-elif wiring into the main dispatcher. Built-in elif chain untouched.
- `hermes_cli/tools_config.py` — `_plugin_tts_providers()` injects plugin rows into the Text-to-Speech picker category alongside the 10 hardcoded built-in rows.

## Tests

- `tests/agent/test_tts_registry.py` — 47 tests covering registration, lookup, ABC contract, helpers, AND a `TestBuiltinSync` regression test that fails if `agent.tts_registry._BUILTIN_NAMES` drifts from `tools.tts_tool.BUILTIN_TTS_PROVIDERS` (kept duplicated due to circular import constraints).
- `tests/tools/test_tts_plugin_dispatch.py` — 35 tests covering built-in-always-wins, command-wins-over-plugin, plugin dispatch, exception passthrough, voice_compatible helper.
- `tests/hermes_cli/test_tts_picker.py` — 10 tests covering the picker surface, builtin shadowing defense, integration with `_visible_providers`.
- `tests/hermes_cli/test_plugins_tts_registration.py` — 3 end-to-end tests via `PluginManager.discover_and_load()`.
- `tests/plugins/tts/check_parity_vs_main.py` — 9-scenario subprocess parity harness vs `origin/main`. The only intentional diff is `fallback_edge → plugin` for the `plugin-installed` scenario.

## Verification

- 95/95 new tests pass.
- 170/170 pre-existing TTS tests (test_tts_command_providers, test_tts_max_text_length, test_tts_speed, etc.) pass unchanged.
- Parity harness against `origin/main`: 8 OK + 1 expected DIFF.
- E2E smoke: a registered plugin's `synthesize()` is called via `text_to_speech_tool` with the standard JSON envelope returned.
- Ruff clean on all touched files.

## Docs

- `website/docs/user-guide/features/tts.md` — new "Python plugin providers" section with a decision table (command-provider vs plugin), minimal plugin example, and the optional-hook reference.
- `website/docs/user-guide/features/plugins.md` — TTS row updated to mention both surfaces (command-provider primary, plugin for SDK/streaming).

Closes NousResearch#30398

teknium1 · 2026-05-25T01:05:13Z

Merged via PR #31745 — your commit was cherry-picked onto current main with your authorship preserved (commit 00ec0b617 on main). The branch was 195 commits behind so a direct merge would have been messy, but the salvage path kept your work intact.

Thanks for filing #30398 yourself before implementing — having the design committed in writing made the review straightforward, and Teknium had already flagged register_tts_provider() as a future nice-to-have in commit a401f8172. The defense-in-depth (3-layer built-ins-always-win, 2-layer command-wins-over-plugins, regression test guarding name-list drift) and the subprocess parity harness were both noticed and appreciated.

Looking forward to #30493 (STT plugin hook) next.

kshitijk4poor added the labels type/feature (New feature or request), comp/plugins (Plugin system and bundled plugins), tool/tts (Text-to-speech and transcription), and P3 (Low: cosmetic, nice to have) on May 22, 2026.

kshitijk4poor mentioned this pull request on May 22, 2026, in feat(stt): add register_transcription_provider() plugin hook #30493 (closed, 7 tasks).

teknium1 mentioned this pull request on May 25, 2026, in feat(tts): add register_tts_provider() plugin hook (salvage of #30420) #31745 (merged).

teknium1 closed this on May 25, 2026.

kshitijk4poor wants to merge 1 commit into NousResearch:main from kshitijk4poor:feat/tts-plugin-hook-30398
