feat(stt): add register_transcription_provider() plugin hook #30493
kshitijk4poor wants to merge 1 commit into
keegoid-codex left a comment
Bugs
- [CAT-1] `language` argument never reaches plugins from public API at `tools/transcription_tools.py:1025`.
- [CAT-1] `is_available()` ignored, routing explicit configs to unavailable plugins at `tools/transcription_tools.py:385`.
Severity
- medium. Runtime plugin contract drops parameters and bypasses availability guard.
VERDICT: request_changes
codex-review posting override: forced to --comment because reviewer lacks verified write permission (viewerPermission=READ; was --request-changes). GitHub only counts approvals from WRITE, MAINTAIN, or ADMIN reviewers.
[DEV SecOps] verdict: PASS Categories:
Read-only checks used:
Residual risk:
[DEV SecOps] verdict: PASS
Read-only checks used:
Add an opt-in Python plugin surface for speech-to-text backends,
mirroring the TTS hook pattern. New backends (OpenRouter, SenseAudio,
Gemini-STT, custom proprietary engines) can be implemented as plugins
without modifying tools/transcription_tools.py.
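
A plugin written against this hook might look roughly like the sketch below. It is self-contained for illustration only: the `TranscriptionProvider` stand-in, the `name` property, the `transcribe()` signature, the envelope keys, the `FakeContext` class and the `register(ctx)` entry point are assumptions, not the PR's actual definitions. Only the names `transcribe()`, `is_available()` and `register_transcription_provider()` come from this description.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class TranscriptionProvider(ABC):
    """Stand-in for the ABC in agent/transcription_provider.py (shape assumed)."""

    @property
    @abstractmethod
    def name(self) -> str:
        """Provider name matched against stt.provider (assumed attribute)."""

    def is_available(self) -> bool:
        # Optional hook with a permissive default (assumed default).
        return True

    @abstractmethod
    def transcribe(self, file_path: str, *, model: Optional[str] = None,
                   language: Optional[str] = None) -> Dict[str, Any]:
        """Return a result envelope (keys assumed for illustration)."""


class EchoProvider(TranscriptionProvider):
    """Hypothetical backend that fakes a transcript instead of calling an API."""

    @property
    def name(self) -> str:
        return "echo"

    def transcribe(self, file_path, *, model=None, language=None):
        return {"success": True, "transcript": f"<audio:{file_path}>",
                "provider": self.name, "model": model, "language": language}


class FakeContext:
    """Hypothetical stand-in for PluginContext; it only records registrations."""

    def __init__(self) -> None:
        self.providers: Dict[str, TranscriptionProvider] = {}

    def register_transcription_provider(self, provider: TranscriptionProvider) -> None:
        self.providers[provider.name] = provider


def register(ctx) -> None:
    # Plugin entry point (name assumed): hand the provider to the plugin context.
    ctx.register_transcription_provider(EchoProvider())
```

Keeping `transcribe()` as the only abstract method besides the name means a new backend needs one class and one registration call, which is the property the hook is aiming for.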
Built-ins always win
--------------------
The 6 built-in STT providers (local/faster-whisper, local_command,
groq, openai, mistral, xai) keep their native handlers. Plugins
attempting to register under a built-in name are rejected at
registration time with a warning and re-checked defensively at
dispatch.
Resolution order
----------------
1. stt.provider matches a built-in → built-in dispatch (unchanged)
2. stt.provider matches a registered plugin →
   a. if plugin.is_available() returns False → unavailability envelope
      identifying the plugin (not the generic "No STT provider"
      message; the user explicitly opted into this plugin)
   b. otherwise plugin.transcribe() with model + language forwarded
      from stt.<provider>.{model,language} config
3. No match → legacy "No STT provider available" error (unchanged)
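
The resolution order above can be sketched as a single function. The `BUILTINS` table, the `Plugin` class, the `dispatch()` name and the envelope keys are hypothetical stand-ins; the real dispatcher lives in `tools/transcription_tools.py` and may differ in every detail except the ordering.

```python
from typing import Any, Callable, Dict, Optional

BUILTINS: Dict[str, Callable[[str], Dict[str, Any]]] = {
    # Hypothetical built-in handler standing in for the native branches.
    "openai": lambda path: {"success": True, "transcript": "builtin",
                            "provider": "openai"},
}


class Plugin:
    """Hypothetical plugin exposing the two hooks the resolution order uses."""

    def __init__(self, name: str, available: bool = True) -> None:
        self.name = name
        self._available = available

    def is_available(self) -> bool:
        return self._available

    def transcribe(self, path, *, model=None, language=None):
        return {"success": True, "transcript": "plugin", "provider": self.name,
                "model": model, "language": language}


def dispatch(provider: str, path: str, plugins: Dict[str, Plugin],
             model: Optional[str] = None,
             language: Optional[str] = None) -> Dict[str, Any]:
    # 1. Built-in names always take the native path.
    if provider in BUILTINS:
        return BUILTINS[provider](path)
    plugin = plugins.get(provider)
    if plugin is not None:
        # 2a. Explicitly configured but unavailable: name the plugin in the error.
        if not plugin.is_available():
            return {"success": False, "transcript": "",
                    "error": f"STT plugin '{provider}' is not available"}
        # 2b. Available: forward model and language.
        return plugin.transcribe(path, model=model, language=language)
    # 3. Nothing matched: legacy error.
    return {"success": False, "transcript": "",
            "error": "No STT provider available"}
```

Note that a plugin registered under `"openai"` would never be reached here, because step 1 returns before the plugin table is consulted.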
Per-provider config namespace
-----------------------------
Plugins read their config from stt.<provider> in config.yaml, mirroring
how built-ins read stt.openai.model / stt.mistral.model. The dispatcher
forwards `model` and `language` from this section. Caller's explicit
`model=` argument overrides the config-set model.
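
As a rough illustration of the forwarding rule, the sketch below resolves `model` and `language` from an already-parsed config dict. The helper name `resolve_plugin_kwargs`, the `deepgram` section and the model names are hypothetical, and tolerating a malformed section is an assumed behaviour rather than something the PR states.

```python
from typing import Any, Dict, Optional


def resolve_plugin_kwargs(config: Dict[str, Any], provider: str,
                          model: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Pick model/language from stt.<provider>; an explicit model= wins."""
    stt = config.get("stt") or {}
    section = stt.get(provider) or {}
    if not isinstance(section, dict):
        section = {}  # tolerate a malformed section (assumed behaviour)
    return {
        "model": model or section.get("model"),
        "language": section.get("language"),
    }


# Parsed equivalent of a config.yaml with a hypothetical plugin section.
config = {
    "stt": {
        "provider": "deepgram",
        "deepgram": {"model": "example-model-v1", "language": "en"},
    }
}

from_config = resolve_plugin_kwargs(config, "deepgram")
overridden = resolve_plugin_kwargs(config, "deepgram", model="example-model-v2")
```

Here `from_config` carries both values from the `stt.deepgram` section, while `overridden` keeps the configured language but takes the caller's model.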
Files
-----
- agent/transcription_provider.py: TranscriptionProvider ABC
- agent/transcription_registry.py: register/get/list providers,
  built-in shadow guard, _reset_for_tests
- hermes_cli/plugins.py: register_transcription_provider() on
  PluginContext
- tools/transcription_tools.py: BUILTIN_STT_PROVIDERS frozenset,
  _dispatch_to_plugin_provider() with availability gate, wire-in
  after xai branch and before "No STT provider" error
- tests/agent/test_transcription_registry.py: 27 tests
- tests/hermes_cli/test_plugins_transcription_registration.py: 3 tests
- tests/tools/test_transcription_plugin_dispatch.py: 28 tests
  (covering built-in short-circuit, plugin dispatch, exception
  envelope, non-dict guard, availability gate, language forwarding)
- tests/plugins/transcription/check_parity_vs_main.py: 10-scenario
  subprocess-pinned parity harness vs origin/main
- website/docs/user-guide/features/{tts,plugins}.md: docs
Behavior parity
---------------
10 scenarios, 8 OK + 2 expected DIFFs:
  no_provider_error → plugin (plugin-installed scenario)
  no_provider_error → plugin_unavailable (plugin-installed-unavailable
    scenario; PR returns cleaner envelope)
Zero behavior change for users not opting into a plugin.
Issue follow-up to NousResearch#30398.
Force-pushed from 303728e to a658b7e
Thanks for the review @keegoid-codex — both findings were valid. Addressed in force-pushed commit CAT-1 fix #1:
Your commit was cherry-picked onto current main with your authorship preserved in On top of your plugin hook, the salvage PR also adds the symmetric
This unblocks the 4 in-flight community STT PRs (#25721, #24703, #9380, #21540), 3 of which (plain HTTPS) can now ship as config-only entries, with the Python hook reserved for the one PR (#21540, Gemini SDK) that genuinely needs it.
What this PR does
Adds a `TranscriptionProvider(ABC)` + `register_transcription_provider()` extension point to the plugin context API, mirroring the `register_tts_provider()` hook from #30420. New STT backends (OpenRouter, SenseAudio, Gemini-STT, Deepgram, custom proprietary engines) can now be added as `plugins/transcription/<vendor>/` without modifying `tools/transcription_tools.py`.

This is additive: the 6 built-in STT backends (`local`, `local_command`, `groq`, `openai`, `mistral`, `xai`) keep their native implementations and always win on name collision. The hook is for new engines.

Why this is needed now

4 open community PRs are adding new STT backends inline, each touching `tools/transcription_tools.py` directly:

This is the "multiple PRs solving the same problem" pattern the hermes-agent-dev skill flags as the cue to extract an ABC. After this hook lands, each can be salvaged as a ~80 LoC plugin under `plugins/transcription/<vendor>/`.

Resolution order

1. `stt.provider` is a built-in name → built-in dispatch. Always wins.
2. `stt.provider` matches a plugin-registered `TranscriptionProvider` → plugin dispatch (new).

Built-ins-always-win is enforced at TWO layers (registry rejection at registration time + dispatcher short-circuit at dispatch time).
Files
New:

- `agent/transcription_provider.py`: `TranscriptionProvider(ABC)` mirroring `tts_provider.py` / `image_gen_provider.py` shape. `transcribe()` required; everything else optional with sane defaults.
- `agent/transcription_registry.py`: registry mirroring `tts_registry.py` shape, with `_BUILTIN_NAMES` reject-shadowing invariant.

Modified:

- `hermes_cli/plugins.py` (+37 LoC): `register_transcription_provider()` method on `PluginContext`. Matches the gating shape of `register_image_gen_provider()`.
- `tools/transcription_tools.py` (+115 LoC): `BUILTIN_STT_PROVIDERS` frozenset + `_dispatch_to_plugin_provider()` + single-line wiring into `transcribe_audio()` (after the 6 built-in elif branches, before the legacy error). Built-in elif chain, `_get_provider`, `_validate_audio_file`, and all helpers are unchanged.
- `website/docs/user-guide/features/tts.md` (+80 LoC): new "Python plugin providers (STT)" section with decision table, minimal plugin example, optional-hook reference.
- `website/docs/user-guide/features/plugins.md` (+1 LoC): STT row updated.

Tests

- `tests/agent/test_transcription_registry.py`: `TestBuiltinSync` regression test that fails if `_BUILTIN_NAMES` drifts from `BUILTIN_STT_PROVIDERS`
- `tests/tools/test_transcription_plugin_dispatch.py`: `transcribe_audio()`
- `tests/hermes_cli/test_plugins_transcription_registration.py`: `PluginManager.discover_and_load()`
- `tests/plugins/transcription/check_parity_vs_main.py`: `origin/main`

Test plan:

    bash scripts/run_tests.sh \
        tests/tools/test_transcription_tools.py \
        tests/tools/test_transcription_dotenv_fallback.py \
        tests/tools/test_transcription.py \
        tests/tools/test_transcription_plugin_dispatch.py \
        tests/agent/test_transcription_registry.py \
        tests/hermes_cli/test_plugins_transcription_registration.py
    # 180 passed, 0 failed (131 pre-existing untouched + 49 new)

Plus the parity harness:
The single `[DIFF]` is the intentional behavior change: when a plugin is registered, `stt.provider: <plugin-name>` routes to the plugin instead of falling through to the "No STT provider available" error.

Broader sweep: `tests/tools/` 5395/5395 pass. Zero regressions across 222 test files.

E2E smoke

A registered plugin's `transcribe()` is reached via the real `transcribe_audio()` with the standard envelope:

Out of scope
- `TOOL_CATEGORIES["transcription"]`: STT isn't surfaced in `hermes tools` today. Adding it would require deciding STT defaults, plumbing into `hermes setup`, etc. Deferred to a follow-up that lands alongside the first community plugin.
- `gateway/run.py` and `gateway/platforms/discord.py` call `transcribe_audio()`, which retains its signature.
- `HERMES_LOCAL_STT_COMMAND`: preserved via the built-in `local_command` path.

How this unblocks the 4 in-flight PRs

After this lands, each becomes a focused ~80 LoC plugin instead of a 300-line patch to a 963-line dispatcher:

- `plugins/transcription/openrouter/__init__.py`: author credited
- `plugins/transcription/sensaudio/`
- `plugins/transcription/gemini/`. TTS + WhatsApp PTT split into separate issues

Type of Change
Related
`register_tts_provider()` hook (symmetric TTS surface)