feat(tts): add register_tts_provider() plugin hook (closes #30398) #30420
Closed
kshitijk4poor wants to merge 1 commit into
…ch#30398)

Adds a `TTSProvider(ABC)` + `register_tts_provider()` extension point to the plugin context API, **alongside** the existing config-driven `tts.providers.<name>: type: command` registry from PR NousResearch#17843.

This is additive: the command-provider surface stays as the primary way to add a TTS backend. The hook covers cases the shell-template grammar can't reasonably express:

- Native Python SDKs without a CLI (Cartesia, Fish Audio, etc.)
- Streaming synthesis (chunked Opus → voice-bubble delivery)
- Voice metadata API for the `hermes tools` picker
- OAuth-refreshing auth flows

None of the 10 inline built-in providers (`edge`, `openai`, `elevenlabs`, `minimax`, `gemini`, `mistral`, `xai`, `piper`, `kittentts`, `neutts`) are migrated to plugins. They stay inline. The hook is for *new* engines that aren't built-in.

## Resolution order

The dispatcher's resolution order is the load-bearing invariant:

1. `tts.provider` is a built-in name → built-in dispatch. **Always wins.**
2. `tts.provider` matches `tts.providers.<name>` with `command:` set → command-provider dispatch (PR NousResearch#17843).
3. `tts.provider` matches a plugin-registered `TTSProvider` → plugin dispatch (new).
4. No match → falls through to Edge TTS default (legacy behavior).

Built-ins-always-win is enforced at THREE layers:

- Registry: `register_provider()` rejects shadowing names with a warning.
- Dispatcher: `_dispatch_to_plugin_provider()` short-circuits built-in names defensively before consulting the registry.
- Picker: `_plugin_tts_providers()` filters built-in shadows out of the `hermes tools` row list defensively.

Command-providers-win-over-plugins is enforced at TWO layers:

- The caller in `text_to_speech_tool` checks `_resolve_command_provider_config` first.
- `_dispatch_to_plugin_provider` re-checks for a same-name command config defensively so a refactor of the caller can't silently break the invariant.

## New files

- `agent/tts_provider.py`: `TTSProvider(ABC)` with `synthesize()` (required), `list_voices()`, `list_models()`, `get_setup_schema()`, `stream()`, `voice_compatible` (all optional with sane defaults). Mirrors `agent/image_gen_provider.py` shape.
- `agent/tts_registry.py`: `register_provider`/`get_provider`/`list_providers` with `_BUILTIN_NAMES` reject-shadowing invariant. Mirrors `agent/image_gen_registry.py` shape.
- `plugins/tts/...` directory ready for community plugins (none shipped).

## Modified files

- `hermes_cli/plugins.py`: `register_tts_provider()` method on `PluginContext`. Matches the gating shape of `register_image_gen_provider()` / `register_browser_provider()`.
- `tools/tts_tool.py`: `_dispatch_to_plugin_provider()` + `_plugin_provider_is_voice_compatible()` + walrus-elif wiring into the main dispatcher. Built-in elif chain untouched.
- `hermes_cli/tools_config.py`: `_plugin_tts_providers()` injects plugin rows into the Text-to-Speech picker category alongside the 10 hardcoded built-in rows.

## Tests

- `tests/agent/test_tts_registry.py`: 47 tests covering registration, lookup, ABC contract, helpers, AND a `TestBuiltinSync` regression test that fails if `agent.tts_registry._BUILTIN_NAMES` drifts from `tools.tts_tool.BUILTIN_TTS_PROVIDERS` (kept duplicated due to circular import constraints).
- `tests/tools/test_tts_plugin_dispatch.py`: 35 tests covering built-in-always-wins, command-wins-over-plugin, plugin dispatch, exception passthrough, voice_compatible helper.
- `tests/hermes_cli/test_tts_picker.py`: 10 tests covering the picker surface, builtin shadowing defense, integration with `_visible_providers`.
- `tests/hermes_cli/test_plugins_tts_registration.py`: 3 end-to-end tests via `PluginManager.discover_and_load()`.
- `tests/plugins/tts/check_parity_vs_main.py`: 9-scenario subprocess parity harness vs `origin/main`. The only intentional diff is `fallback_edge → plugin` for the `plugin-installed` scenario.

## Verification

- 95/95 new tests pass.
- 170/170 pre-existing TTS tests (test_tts_command_providers, test_tts_max_text_length, test_tts_speed, etc.) pass unchanged.
- Parity harness against `origin/main`: 8 OK + 1 expected DIFF.
- E2E smoke: a registered plugin's `synthesize()` is called via `text_to_speech_tool` with the standard JSON envelope returned.
- Ruff clean on all touched files.

## Docs

- `website/docs/user-guide/features/tts.md`: new "Python plugin providers" section with a decision table (command-provider vs plugin), minimal plugin example, and the optional-hook reference.
- `website/docs/user-guide/features/plugins.md`: TTS row updated to mention both surfaces (command-provider primary, plugin for SDK/streaming).

Closes NousResearch#30398
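The resolution order in the commit message above can be sketched as a small pure function. This is only an illustration under stated assumptions: `resolve_tts_route`, its arguments, and the route labels `"builtin"`, `"command"`, and `"plugin"` are hypothetical names chosen here (only `fallback_edge` appears in the PR text), and the real dispatcher in `tools/tts_tool.py` may be structured differently.

```python
from typing import Collection, Mapping

# The ten built-in names listed in the PR. The real constant lives in
# tools/tts_tool.py; this copy is only for the sketch.
BUILTIN_TTS_PROVIDERS = frozenset({
    "edge", "openai", "elevenlabs", "minimax", "gemini",
    "mistral", "xai", "piper", "kittentts", "neutts",
})


def resolve_tts_route(
    provider: str,
    tts_config: Mapping,
    plugin_names: Collection[str],
) -> str:
    """Return the dispatch path a configured `tts.provider` would take.

    Hypothetical helper: it mirrors the four-step order described in the
    PR but is not the project's actual code.
    """
    # 1. Built-in names always win, even if a command entry or a plugin
    #    uses the same name.
    if provider in BUILTIN_TTS_PROVIDERS:
        return "builtin"

    # 2. A tts.providers.<name> entry with `command:` set beats plugins.
    entry = (tts_config.get("providers") or {}).get(provider) or {}
    if entry.get("command"):
        return "command"

    # 3. A plugin-registered provider with that name.
    if provider in plugin_names:
        return "plugin"

    # 4. No match: legacy fallback to the Edge TTS default.
    return "fallback_edge"
```

Keeping the decision in one function like this would make the invariant easy to test in isolation, which is roughly what the parity harness checks from the outside.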
7 tasks
Contributor
Merged via PR #31745: your commit was cherry-picked onto current main with your authorship preserved (commit […]). Thanks for filing #30398 yourself before implementing; having the design committed in writing made the review straightforward, and Teknium had already flagged […]. Looking forward to #30493 (STT plugin hook) next.
## What this PR does

Adds a `TTSProvider(ABC)` + `register_tts_provider()` extension point to the plugin context API, alongside the existing config-driven `tts.providers.<name>: type: command` registry from #17843.

This is additive, not a replacement: the command-provider surface remains the primary way to add a TTS backend. The new hook covers cases the shell-template grammar can't reasonably express:

- Native Python SDKs without a CLI (Cartesia, Fish Audio, etc.)
- Streaming synthesis (chunked Opus → voice-bubble delivery)
- Voice metadata API for the `hermes tools` picker
- OAuth-refreshing auth flows

None of the 10 inline built-in providers (`edge`, `openai`, `elevenlabs`, `minimax`, `gemini`, `mistral`, `xai`, `piper`, `kittentts`, `neutts`) are migrated to plugins. They stay inline. The hook is for new engines.

Closes #30398.
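For illustration, a plugin built on this hook might look roughly like the sketch below. Everything here is an assumption made for the example: the `TTSProvider` stand-in only imitates the shape the PR describes (required `synthesize()`, optional hooks with defaults), the `synthesize(text, output_path, **kwargs)` signature, the `register(ctx)` entry point, and `FakePluginContext` are hypothetical, and the provider writes silence with the standard-library `wave` module instead of calling a real SDK.

```python
import wave
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class TTSProvider(ABC):
    """Stand-in for the ABC described in the PR (not the real class)."""

    name: str = ""
    voice_compatible: bool = False  # assumed default, for the sketch only

    @abstractmethod
    def synthesize(self, text: str, output_path: str, **kwargs: Any) -> str:
        """Write audio for `text` to `output_path` and return the path."""

    def list_voices(self) -> List[Dict[str, Any]]:
        # Optional hook: providers without voice metadata inherit this.
        return []


class SilenceProvider(TTSProvider):
    """Toy provider that writes silent 16 kHz mono WAV audio."""

    name = "silence-demo"

    def synthesize(self, text: str, output_path: str, **kwargs: Any) -> str:
        # About 0.05 s of silence per character, at least one character.
        frames = max(1, len(text)) * 800
        with wave.open(output_path, "wb") as wav:
            wav.setnchannels(1)
            wav.setsampwidth(2)       # 16-bit samples
            wav.setframerate(16000)
            wav.writeframes(b"\x00\x00" * frames)
        return output_path


class FakePluginContext:
    """Hypothetical context exposing only the hook this PR adds."""

    def __init__(self) -> None:
        self.tts_providers: Dict[str, TTSProvider] = {}

    def register_tts_provider(self, provider: TTSProvider) -> None:
        self.tts_providers[provider.name] = provider


def register(ctx: FakePluginContext) -> None:
    # Assumed plugin entry point; the real loader may use another name.
    ctx.register_tts_provider(SilenceProvider())
```

A real plugin would replace the body of `synthesize()` with calls into its SDK; the updated `tts.md` docs are the authoritative reference for the actual signatures.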
## Resolution order

The dispatcher's resolution order is the load-bearing invariant:

1. `tts.provider` is a built-in name → built-in dispatch. Always wins.
2. `tts.provider` matches `tts.providers.<name>` with `command:` set → command-provider dispatch (PR #17843).
3. `tts.provider` matches a plugin-registered `TTSProvider` → plugin dispatch (new).

Built-ins-always-win is enforced at THREE layers (registry rejection at registration time + dispatcher short-circuit at dispatch time + picker filter for `hermes tools`). Command-providers-win-over-plugins is enforced at TWO layers (the caller in `text_to_speech_tool` + a defensive re-check in `_dispatch_to_plugin_provider`).

## Files
New:

- `agent/tts_provider.py`: `TTSProvider(ABC)` mirroring `image_gen_provider.py` shape. `synthesize()` required; `list_voices()`, `list_models()`, `get_setup_schema()`, `stream()`, `voice_compatible` all optional with sane defaults.
- `agent/tts_registry.py`: registry mirroring `image_gen_registry.py` shape, with `_BUILTIN_NAMES` reject-shadowing invariant.
- `tests/plugins/tts/`: empty directory ready for community plugins.

Modified:

- `hermes_cli/plugins.py` (+38 LoC): `register_tts_provider()` method on `PluginContext`. Matches the gating shape of `register_image_gen_provider()` / `register_browser_provider()`.
- `tools/tts_tool.py` (+148 LoC): `_dispatch_to_plugin_provider()` + `_plugin_provider_is_voice_compatible()` + walrus-elif wiring into the main dispatcher. Built-in elif chain is untouched.
- `hermes_cli/tools_config.py` (+62 LoC): `_plugin_tts_providers()` injects plugin rows into the Text-to-Speech picker alongside the 10 hardcoded built-in rows.
- `website/docs/user-guide/features/tts.md` (+79 LoC): new "Python plugin providers" section with decision table, minimal plugin example, optional-hook reference.
- `website/docs/user-guide/features/plugins.md` (+1 LoC): TTS row updated to mention both surfaces.

## Tests
| File | Coverage |
|---|---|
| `tests/agent/test_tts_registry.py` | Registration, lookup, ABC contract, helpers, and a `TestBuiltinSync` regression test that fails if `_BUILTIN_NAMES` drifts from `BUILTIN_TTS_PROVIDERS` |
| `tests/tools/test_tts_plugin_dispatch.py` | Built-in-always-wins, command-wins-over-plugin, plugin dispatch, exception passthrough, `voice_compatible` helper |
| `tests/hermes_cli/test_tts_picker.py` | Picker surface, built-in shadowing defense, `_visible_providers` injection |
| `tests/hermes_cli/test_plugins_tts_registration.py` | End-to-end registration via `PluginManager.discover_and_load()` |
| `tests/plugins/tts/check_parity_vs_main.py` | Subprocess parity harness vs `origin/main` |

Test plan:

```bash
bash scripts/run_tests.sh \
  tests/tools/test_tts_command_providers.py \
  tests/tools/test_tts_dotenv_fallback.py \
  tests/tools/test_tts_gemini.py \
  tests/tools/test_tts_kittentts.py \
  tests/tools/test_tts_max_text_length.py \
  tests/tools/test_tts_mistral.py \
  tests/tools/test_tts_opus_routing.py \
  tests/tools/test_tts_piper.py \
  tests/tools/test_tts_plugin_dispatch.py \
  tests/tools/test_tts_speed.py \
  tests/tools/test_tts_xai_speech_tags.py \
  tests/agent/test_tts_registry.py \
  tests/hermes_cli/test_tts_picker.py \
  tests/hermes_cli/test_plugins_tts_registration.py
# 265 passed, 0 failed
```

Plus the parity harness against `origin/main`: 8 OK + 1 expected DIFF.
The single `[DIFF]` is the intentional behavior change of the PR: when a plugin is registered, `tts.provider: <plugin-name>` routes to the plugin instead of silently falling through to Edge default.

## E2E smoke
A registered plugin's `synthesize()` is reached via the real `text_to_speech_tool` entry point, which returns the standard JSON envelope.

## Design rationale
A previous version of `website/docs/user-guide/features/plugins.md` claimed TTS "isn't plugin-extensible", which @teknium1 corrected in commit a401f81, noting that TTS is extensible via command-providers and that `register_tts_provider()` would be a future "nice-to-have for SDK/streaming cases, not as the primary story."

This PR realizes that nice-to-have without displacing command-providers as the primary surface. The decision table in the updated docs explicitly tells users when to pick command-providers (single CLI → use command) vs plugin (Python SDK / streaming → use plugin).
The 10 built-in providers stay inline because each has SDK-specific quirks (async edge_tts, ElevenLabs WebSocket streaming, Gemini PCM-with-WAV-header, OAuth-refreshing xAI, model-aware MiniMax) that don't compose into a clean uniform ABC shape. Migrating them would be a separate, much larger discussion.
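Because the built-ins stay inline, the registry has to refuse plugin names that collide with them. A minimal sketch of such a guard follows, assuming a module-level dict and a boolean return value; both are choices made for this example, and the actual `agent/tts_registry.py` may differ in signature and behavior beyond "rejects shadowing names with a warning".

```python
import logging
from typing import Any, Dict, List, Optional

logger = logging.getLogger(__name__)

# Duplicated from the dispatcher's list of built-ins. The PR keeps two
# copies because of circular imports and guards them with a sync test.
_BUILTIN_NAMES = frozenset({
    "edge", "openai", "elevenlabs", "minimax", "gemini",
    "mistral", "xai", "piper", "kittentts", "neutts",
})

_providers: Dict[str, Any] = {}


def register_provider(provider: Any) -> bool:
    """Register a plugin provider; refuse empty names and built-in shadows.

    Returning a bool is an assumption of this sketch.
    """
    name = getattr(provider, "name", "")
    if not name:
        logger.warning("TTS provider without a name ignored")
        return False
    if name in _BUILTIN_NAMES:
        logger.warning("TTS plugin %r shadows a built-in provider; ignored", name)
        return False
    _providers[name] = provider
    return True


def get_provider(name: str) -> Optional[Any]:
    return _providers.get(name)


def list_providers() -> List[str]:
    return sorted(_providers)
```

A drift check in the spirit of `TestBuiltinSync` would then assert that `_BUILTIN_NAMES` equals the set built from the dispatcher's own list of built-in names.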
## Out of scope

- […] `tts.providers.<name>: type: command`.
- […] (`stream()` ABC method is there for new providers; built-ins keep using `synthesize()`).
- `register_stt_provider()` for transcription: separate concern, separate issue.

## Type of Change

## Related

- […] `register_tts_provider()` as a future nice-to-have