feat(stt): add register_transcription_provider() hook + stt.providers command-provider registry (salvage of #30493) #31907
Merged
🔎 Lint report:
| Rule | Count |
|---|---|
| invalid-argument-type | 5 |
| unresolved-import | 3 |
| not-subscriptable | 1 |
First entries
tests/agent/test_transcription_registry.py:134: [invalid-argument-type] invalid-argument-type: Argument to function `get_provider` is incorrect: Expected `str`, found `None`
tests/agent/test_transcription_registry.py:135: [invalid-argument-type] invalid-argument-type: Argument to function `get_provider` is incorrect: Expected `str`, found `Literal[123]`
tests/tools/test_transcription_command_providers.py:115: [invalid-argument-type] invalid-argument-type: Argument to function `_resolve_command_stt_provider_config` is incorrect: Expected `str`, found `None`
tests/tools/test_transcription_plugin_dispatch.py:462: [not-subscriptable] not-subscriptable: Cannot subscript object of type `None` with no `__getitem__` method
tests/tools/test_transcription_plugin_dispatch.py:176: [invalid-argument-type] invalid-argument-type: Argument to `_FakeProvider.__init__` is incorrect: Expected `dict[Unknown, Unknown] | None`, found `Literal["weird string"]`
tests/agent/test_transcription_registry.py:20: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/tools/test_transcription_command_providers.py:30: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/tools/test_transcription_plugin_dispatch.py:21: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/agent/test_transcription_registry.py:77: [invalid-argument-type] invalid-argument-type: Argument to function `register_provider` is incorrect: Expected `TranscriptionProvider`, found `Literal["not a provider"]`
✅ Fixed issues: none
Unchanged: 4872 pre-existing issues carried over.
Diagnostics are surfaced as warnings — this check never fails the build.
Add an opt-in Python plugin surface for speech-to-text backends,
mirroring the TTS hook pattern. New backends (OpenRouter, SenseAudio,
Gemini-STT, custom proprietary engines) can be implemented as plugins
without modifying tools/transcription_tools.py.
Built-ins always win
--------------------
The 6 built-in STT providers (local/faster-whisper, local_command,
groq, openai, mistral, xai) keep their native handlers. Plugins
attempting to register under a built-in name are rejected at
registration time with a warning and re-checked defensively at
dispatch.
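A minimal sketch of the reject-at-registration guard described above. It is not the code in agent/transcription_registry.py: the function names, the boolean return value, and the lower-casing of names are assumptions made for illustration. Only the six built-in names come from this commit message.

```python
import logging
from typing import Dict, Optional

logger = logging.getLogger(__name__)

# The six built-in provider names listed in this commit message.
BUILTIN_NAMES = frozenset(
    {"local", "local_command", "groq", "openai", "mistral", "xai"}
)

# Hypothetical module-level registry: normalized name -> provider object.
_providers: Dict[str, object] = {}


def register_provider(name: str, provider: object) -> bool:
    """Register a plugin provider; refuse names that shadow a built-in."""
    key = name.strip().lower()  # assumption: names are compared case-insensitively
    if key in BUILTIN_NAMES:
        logger.warning("STT plugin %r shadows a built-in provider; rejected", name)
        return False
    _providers[key] = provider
    return True


def get_provider(name: object) -> Optional[object]:
    """Return the registered provider, or None for unknown or non-string names."""
    if not isinstance(name, str):
        return None
    return _providers.get(name.strip().lower())
```

With this shape, a plugin that tries to register as "openai" never reaches the registry, so the dispatch-time re-check is purely defensive.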
Resolution order
----------------
1. stt.provider matches a built-in → built-in dispatch (unchanged)
2. stt.provider matches a registered plugin →
   a. if plugin.is_available() returns False → unavailability envelope
      identifying the plugin (not the generic "No STT provider"
      message; the user explicitly opted into this plugin)
   b. otherwise plugin.transcribe() with model + language forwarded
      from stt.<provider>.{model,language} config
3. No match → legacy "No STT provider available" error (unchanged)
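Step 2 of the order above can be sketched as follows. The envelope shape (a dict with "success" and "error" keys), the transcribe() signature, and the FakeProvider class are assumptions for illustration; the real _dispatch_to_plugin_provider() may differ.

```python
from typing import Any, Dict, Optional


class FakeProvider:
    """Hypothetical stand-in for a registered TranscriptionProvider."""

    def __init__(self, available: bool) -> None:
        self._available = available

    def is_available(self) -> bool:
        return self._available

    def transcribe(
        self,
        file_path: str,
        model: Optional[str] = None,
        language: Optional[str] = None,
    ) -> Dict[str, Any]:
        return {"success": True, "transcript": f"PLUGIN:{file_path}"}


def dispatch_to_plugin(
    name: str,
    provider: Any,
    file_path: str,
    model: Optional[str] = None,
    language: Optional[str] = None,
) -> Dict[str, Any]:
    """Availability gate first, then transcribe with errors wrapped."""
    if not provider.is_available():
        # Name the plugin instead of returning the generic message.
        return {
            "success": False,
            "error": f"STT plugin '{name}' is registered but unavailable",
        }
    try:
        result = provider.transcribe(file_path, model=model, language=language)
    except Exception as exc:  # exception envelope
        return {"success": False, "error": f"STT plugin '{name}' failed: {exc}"}
    if not isinstance(result, dict):  # non-dict guard
        return {
            "success": False,
            "error": f"STT plugin '{name}' returned a non-dict result",
        }
    return result
```

The point of the gate is that a user who explicitly selected a plugin gets an error that names it, rather than the legacy "No STT provider available" message.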
Per-provider config namespace
-----------------------------
Plugins read their config from stt.<provider> in config.yaml, mirroring
how built-ins read stt.openai.model / stt.mistral.model. The dispatcher
forwards `model` and `language` from this section. Caller's explicit
`model=` argument overrides the config-set model.
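A rough sketch of how the forwarded arguments could be derived from the stt.<provider> section, with the caller's explicit model winning over the configured one. The helper name resolve_plugin_kwargs and the plain-dict config are hypothetical; the provider name "my-asr" is an invented example.

```python
from typing import Any, Dict, Optional


def resolve_plugin_kwargs(
    config: Dict[str, Any],
    provider: str,
    model: Optional[str] = None,
) -> Dict[str, Optional[str]]:
    """Build model/language kwargs from the stt.<provider> config section."""
    section = (config.get("stt") or {}).get(provider) or {}
    if not isinstance(section, dict):
        section = {}  # tolerate a malformed section
    return {
        # An explicit model= from the caller overrides the config-set model.
        "model": model or section.get("model"),
        "language": section.get("language"),
    }


# Equivalent of a config.yaml with an stt.my-asr section (hypothetical values).
example_config = {
    "stt": {
        "provider": "my-asr",
        "my-asr": {"model": "small", "language": "de"},
    }
}
```

Calling resolve_plugin_kwargs(example_config, "my-asr") yields the configured model and language, while passing model="large" replaces only the model.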
Files
-----
- agent/transcription_provider.py: TranscriptionProvider ABC
- agent/transcription_registry.py: register/get/list providers,
built-in shadow guard, _reset_for_tests
- hermes_cli/plugins.py: register_transcription_provider() on
PluginContext
- tools/transcription_tools.py: BUILTIN_STT_PROVIDERS frozenset,
_dispatch_to_plugin_provider() with availability gate, wire-in
after xai branch and before "No STT provider" error
- tests/agent/test_transcription_registry.py: 27 tests
- tests/hermes_cli/test_plugins_transcription_registration.py: 3 tests
- tests/tools/test_transcription_plugin_dispatch.py: 28 tests
(covering built-in short-circuit, plugin dispatch, exception
envelope, non-dict guard, availability gate, language forwarding)
- tests/plugins/transcription/check_parity_vs_main.py: 10-scenario
subprocess-pinned parity harness vs origin/main
- website/docs/user-guide/features/{tts,plugins}.md: docs
Behavior parity
---------------
10 scenarios, 8 OK + 2 expected DIFFs:
  no_provider_error → plugin (plugin-installed scenario)
  no_provider_error → plugin_unavailable (plugin-installed-unavailable
    scenario; PR returns cleaner envelope)
Zero behavior change for users not opting into a plugin.
Issue follow-up to #30398.
Mirror of the TTS command-provider registry (PR #17843) for STT. Lets
any shell-driven ASR engine (Doubao ASR, NVIDIA Parakeet, whisper.cpp
builds, SenseVoice, curl pipelines) become an STT backend with zero
Python. Complements the legacy HERMES_LOCAL_STT_COMMAND escape hatch
(preserved untouched via the built-in local_command path) and the
register_transcription_provider() Python plugin hook also shipped in
this PR.
Resolution order (mirrors TTS exactly):
1. Built-in (local, local_command, groq, openai, mistral, xai) →
   native handler. Always wins.
2. stt.providers.<name>: type: command → command-provider runner.
3. Plugin-registered TranscriptionProvider → plugin dispatch.
4. No match → 'No STT provider available'.
Files
-----
- tools/transcription_tools.py: BUILTIN_STT_PROVIDERS frozenset
  retained; added _resolve_command_stt_provider_config,
  _transcribe_command_stt, and local helpers for template rendering,
  shell-quote context, and process-tree termination. Helpers are
  documented as mirrors of their tts_tool.py counterparts (kept local
  to avoid cross-tool private import). Wire-in is one insertion point
  in transcribe_audio() after the xai elif and before the plugin
  dispatcher. Plugin dispatcher additionally defensively
  short-circuits when a same-name command config exists
  (command-wins-over-plugin invariant).
- tests/tools/test_transcription_command_providers.py: 50 new tests
  covering resolution (builtin precedence, type/command gating,
  case-insensitive lookup, legacy stt.<name> back-compat), helpers
  (timeout fallback, format validation, iter, has-any), template
  rendering (shell-quote contexts, doubled-brace preservation),
  end-to-end via _transcribe_command_stt (output_path read, stdout
  fallback, timeout, nonzero exit envelope, model override, language
  precedence), and dispatcher integration via the real
  transcribe_audio() including command-wins-over-plugin and
  builtin-shadow-rejection.
- tests/plugins/transcription/check_parity_vs_main.py: extended from
  10 to 13 scenarios. New cases: command-provider-installed,
  command-vs-plugin-same-name (verifies command wins precedence),
  explicit-openai-with-command-shadow (verifies built-in wins). Adds
  command_provider dispatch_kind detection via transcript prefix
  (CMD: vs PLUGIN:) so command-provider scenarios can be
  distinguished from plugin scenarios even when sharing a provider
  name.
- website/docs/user-guide/features/tts.md: new 'STT custom command
  providers' section symmetric to the TTS section: example config,
  placeholder grammar table (input_path / output_path / output_dir /
  format / language / model), transcript-read-back semantics (file
  first, then stdout fallback), optional keys table, behavior notes,
  security note. Updated 'Python plugin providers (STT)' to include
  the new 'When to pick which (STT)' decision table and updated
  resolution-order section (now 4 layers instead of 3).
Verification
------------
189/189 STT targeted tests + 50/50 new command-provider tests pass.
Combined sweep: tests/tools/ 5576/5576, tests/agent/ +
tests/hermes_cli/ 8623/8623; zero regressions across 14,199 tests.
Parity harness: 13 scenarios, 9 OK + 4 expected diffs
(no_provider_error → plugin, plugin_unavailable, command_provider × 2).
E2E live-verified in an isolated HERMES_HOME with a real .wav file:
  command: → dispatched to stt.providers.my-fake-cli
  plugin: → dispatched to registered TranscriptionProvider
  command-wins-over-plugin: → command provider beats same-name plugin
  builtin-wins-over-command: → built-in OpenAI handler fires;
    stt.providers.openai: type: command does NOT hijack it.
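To make the placeholder grammar and the read-back rule concrete, here is a simplified sketch. It is not the helper code in tools/transcription_tools.py: it quotes every placeholder with shlex.quote (a single bare-word context rather than the several shell-quote contexts the tests mention), it assumes doubled braces render as literal braces, and it assumes an empty output file falls through to stdout. The function names and the asr-cli command are hypothetical; the placeholder names come from the commit message.

```python
import re
import shlex
from pathlib import Path
from typing import Mapping

# Placeholder names listed in the docs section described above.
PLACEHOLDERS = ("input_path", "output_path", "output_dir", "format", "language", "model")

# Matches "{{", "}}", or a single-brace {name} token.
_TOKEN = re.compile(r"\{\{|\}\}|\{(\w+)\}")


def render_command(template: str, values: Mapping[str, str]) -> str:
    """Substitute known placeholders, shell-quoting each value."""

    def _sub(match: "re.Match[str]") -> str:
        token = match.group(0)
        if token == "{{":
            return "{"  # assumption: doubled braces become literal braces
        if token == "}}":
            return "}"
        name = match.group(1)
        if name not in PLACEHOLDERS:
            return token  # leave unknown tokens untouched
        return shlex.quote(str(values.get(name, "")))

    return _TOKEN.sub(_sub, template)


def read_transcript(output_path: str, stdout: str) -> str:
    """File first, then stdout fallback."""
    path = Path(output_path)
    if path.is_file():
        text = path.read_text(encoding="utf-8").strip()
        if text:
            return text
    return stdout.strip()


# Hypothetical command template; "asr-cli" is not a real tool.
rendered = render_command(
    "asr-cli --in {input_path} --lang {language}",
    {"input_path": "/tmp/my file.wav", "language": "en"},
)
```

Quoting at render time means a path containing spaces or shell metacharacters is passed to the engine as a single argument.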
feat(stt): add register_transcription_provider() plugin hook + stt.providers command-provider registry
Salvages PR #30493 (@kshitijk4poor) onto current main and adds the symmetric STT command-provider registry that #31745 just established for TTS. Both new STT extension surfaces ship together so the precedence rules are symmetric across the two tools from day one.
What this PR makes true that wasn't before
New STT engines can plug into Hermes without modifying `tools/transcription_tools.py`. Two coexisting surfaces, picked per use case:
- `stt.providers.<name>: type: command`: any shell-driven ASR engine (Doubao ASR, NVIDIA Parakeet, whisper.cpp builds, SenseVoice CLI, `curl … | jq -r .text` pipelines). Zero Python.
- `ctx.register_transcription_provider()`: Python-SDK engines (OpenRouter, SenseAudio, Gemini-STT, Deepgram) that can't be expressed as a single shell command.

The 6 built-in providers (`local`, `local_command`, `groq`, `openai`, `mistral`, `xai`) keep their native handlers and always win on name collision. `HERMES_LOCAL_STT_COMMAND` is preserved untouched via the built-in `local_command` path.

Resolution order (mirrors TTS exactly after #31745)
1. `stt.provider` is a built-in name → built-in dispatch. Always wins.
2. `stt.provider` matches `stt.providers.<name>` with `command:` set → command-provider runner.
3. `stt.provider` matches a plugin-registered `TranscriptionProvider` → plugin dispatch.

Same precedence as TTS: config more local than plugin install, built-ins always win.
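The precedence above can be sketched as a small pure function. This is an illustration only: resolve_dispatch_kind, its return strings, and the plain-dict config are assumptions, and the real wire-in lives inside `transcribe_audio()`.

```python
from typing import Any, Dict, Iterable

# Built-in names as listed in this PR description.
BUILTIN_STT_PROVIDERS = frozenset(
    {"local", "local_command", "groq", "openai", "mistral", "xai"}
)


def resolve_dispatch_kind(
    provider_name: str,
    stt_config: Dict[str, Any],
    plugin_names: Iterable[str],
) -> str:
    """Return "builtin", "command", "plugin", or "none" for a provider name."""
    name = (provider_name or "").strip().lower()
    if name in BUILTIN_STT_PROVIDERS:
        return "builtin"  # built-ins always win

    # Case-insensitive lookup in stt.providers.<name>.
    providers = stt_config.get("providers") or {}
    entries = {str(k).strip().lower(): v for k, v in providers.items()}
    entry = entries.get(name)
    if isinstance(entry, dict) and entry.get("type") == "command" and entry.get("command"):
        return "command"  # config beats a same-name plugin

    if name in {p.strip().lower() for p in plugin_names}:
        return "plugin"
    return "none"


# Hypothetical stt section: one command provider, plus an attempted shadow of openai.
stt_section = {
    "providers": {
        "my-fake-cli": {"type": "command", "command": "fake-asr {input_path}"},
        "openai": {"type": "command", "command": "hijack {input_path}"},
    }
}
```

With stt_section as above, "my-fake-cli" resolves to the command runner even if a plugin of the same name is installed, and "openai" still resolves to the built-in handler.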
Files
New
- `agent/transcription_provider.py`: `TranscriptionProvider(ABC)` mirroring `tts_provider.py` shape (from #30493).
- `agent/transcription_registry.py`: registry mirroring `tts_registry.py` shape, `_BUILTIN_NAMES` reject-shadowing invariant (from #30493).

Modified
- `hermes_cli/plugins.py`: `register_transcription_provider()` on `PluginContext`. Docstring updated to spell out both invariants (built-ins-always-win + command-wins-over-plugin).
- `tools/transcription_tools.py`: `BUILTIN_STT_PROVIDERS` frozenset + `_dispatch_to_plugin_provider()` (from #30493) plus `_resolve_command_stt_provider_config()` / `_transcribe_command_stt()` / local helpers for shell-quote-aware template rendering and process-tree termination on timeout (new in this PR). Wire-in is one insertion point in `transcribe_audio()`: command runs after built-in elif, before plugin dispatch. Plugin dispatcher additionally short-circuits when a same-name command config exists (defense in depth).
- `website/docs/user-guide/features/tts.md`: new "STT custom command providers" section + "When to pick which (STT)" decision table + updated resolution order (4 layers instead of 3).
- `website/docs/user-guide/features/plugins.md`: STT row updated to describe both surfaces with anchors to each docs section.

Tests
- `tests/agent/test_transcription_registry.py`: … `TestBuiltinSync` regression test if `_BUILTIN_NAMES` drifts from `BUILTIN_STT_PROVIDERS`
- `tests/hermes_cli/test_plugins_transcription_registration.py`: … `PluginManager.discover_and_load()`
- `tests/tools/test_transcription_plugin_dispatch.py`: … `transcribe_audio()`
- `tests/tools/test_transcription_command_providers.py` (new): … `stt.<name>` back-compat), helpers (timeout fallback, format validation, iter, has-any), template rendering (3 shell-quote contexts, doubled-brace preservation), `_transcribe_command_stt` end-to-end (file write, stdout fallback, timeout envelope, nonzero exit envelope, model override, language precedence chain), `transcribe_audio()` integration including command-wins-over-plugin and built-in-rejects-command-shadow
- `tests/plugins/transcription/check_parity_vs_main.py`: … `origin/main`. New scenarios: `command-provider-installed`, `command-vs-plugin-same-name`, `explicit-openai-with-command-shadow`

14,199 tests pass across the affected surfaces, with zero regressions.
E2E live verification
Real `config.yaml` + real `.wav` + isolated `HERMES_HOME` + real `transcribe_audio()` (no mocks):

Why both surfaces in one PR
The TTS hook (#31745, merged today) shipped on top of an existing TTS command-provider registry (#17843, May 2026). For symmetry, the STT story needs both surfaces. 3 of the 4 in-flight community STT PRs (OpenRouter STT #25721/#24703, SenseAudio #9380) are plain HTTPS APIs that could be served by a shell command. Only Gemini-STT #21540 genuinely needs the Python hook. Shipping both at once means no second PR cycle and no contributor gets told "your work isn't useful, we picked the other path."
How this unblocks the 4 in-flight community PRs
After this lands, each becomes a small focused contribution:
- `stt.providers.openrouter` config block (or a thin Python plugin); author credited
- `stt.providers.sensaudio` config block; author credited
- `plugins/transcription/gemini/__init__.py`; author credited. TTS + WhatsApp PTT split into separate issues

Out of scope
- … `TOOL_CATEGORIES["transcription"]`): STT isn't surfaced in `hermes tools` today. Deferred to a follow-up that lands alongside the first community plugin.
- `gateway/run.py` and `gateway/platforms/discord.py` call `transcribe_audio()`, signature unchanged.
- `HERMES_LOCAL_STT_COMMAND`: preserved via the built-in `local_command` path.

Related
- `register_tts_provider()` hook (symmetric TTS surface, just merged)