feat(stt): add register_transcription_provider() hook + stt.providers command-provider registry (salvage of #30493) by teknium1 · Pull Request #31907 · NousResearch/hermes-agent

teknium1 · 2026-05-25T06:24:08Z

feat(stt): add register_transcription_provider() plugin hook + stt.providers command-provider registry

Salvages PR #30493 (@kshitijk4poor) onto current main and adds an STT command-provider registry symmetric to the existing TTS one (#17843). With the TTS plugin hook (#31745) now merged, shipping both STT extension surfaces together keeps the precedence rules identical across the two tools from day one.

What this PR makes true that wasn't before

New STT engines can plug into Hermes without modifying tools/transcription_tools.py. Two coexisting surfaces, picked per use case:

- `stt.providers.<name>: type: command`: any shell-driven ASR engine (Doubao ASR, NVIDIA Parakeet, whisper.cpp builds, SenseVoice CLI, `curl … | jq -r .text` pipelines). Zero Python.
- `ctx.register_transcription_provider()`: Python-SDK engines (OpenRouter, SenseAudio, Gemini-STT, Deepgram) that can't be expressed as a single shell command.
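To make the Python surface concrete, here is a minimal sketch of a plugin-side provider. It assumes a `TranscriptionProvider` ABC exposing `name`, `is_available()` and `transcribe(file_path, model=None, language=None)`, matching the availability gate and model/language forwarding this PR describes. The stand-in ABC, the `FakeContext` recorder, the `register(ctx)` entry point and the result-dict keys are all hypothetical, not the real agent/transcription_provider.py or `PluginContext` API.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class TranscriptionProvider(ABC):
    """Hypothetical stand-in for the ABC in agent/transcription_provider.py."""

    name: str = ""

    @abstractmethod
    def is_available(self) -> bool:
        """Return False when credentials or SDK dependencies are missing."""

    @abstractmethod
    def transcribe(self, file_path: str, model: Optional[str] = None,
                   language: Optional[str] = None) -> dict[str, Any]:
        """Return a result envelope such as {"success": True, "transcript": ...}."""


class EchoProvider(TranscriptionProvider):
    """Toy engine; a real plugin would call an SDK here instead."""

    name = "echo-stt"

    def is_available(self) -> bool:
        return True

    def transcribe(self, file_path: str, model: Optional[str] = None,
                   language: Optional[str] = None) -> dict[str, Any]:
        return {"success": True, "transcript": f"PLUGIN:{file_path}",
                "model": model, "language": language}


class FakeContext:
    """Hypothetical PluginContext stand-in that records registrations."""

    def __init__(self) -> None:
        self.providers: dict[str, TranscriptionProvider] = {}

    def register_transcription_provider(self, provider: TranscriptionProvider) -> None:
        self.providers[provider.name.lower()] = provider


def register(ctx: FakeContext) -> None:
    """Hypothetical plugin entry point: hand one provider instance to the host."""
    ctx.register_transcription_provider(EchoProvider())


ctx = FakeContext()
register(ctx)
result = ctx.providers["echo-stt"].transcribe("clip.wav", language="en")
```

The plugin only declares what it can do; deciding whether it is ever called (built-in names, same-name command configs) stays with the host.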

The 6 built-in providers (local, local_command, groq, openai, mistral, xai) keep their native handlers and always win on name collision. HERMES_LOCAL_STT_COMMAND is preserved untouched via the built-in local_command path.

Resolution order (mirrors TTS exactly after #31745)

1. `stt.provider` is a built-in name → built-in dispatch. Always wins.
2. `stt.provider` matches `stt.providers.<name>` with `command:` set → command-provider runner.
3. `stt.provider` matches a plugin-registered `TranscriptionProvider` → plugin dispatch.
4. No match → "No STT provider available" error.

Same precedence as TTS: built-ins always win, and a command provider declared in config.yaml (the more local choice) beats a plugin registered under the same name.
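The four layers can be sketched as a single routing function. This is an illustration under assumptions, not the code in tools/transcription_tools.py: `resolve_stt_route` is a hypothetical name, the config is a plain dict standing in for the parsed `stt:` section of config.yaml, and plugins are a name-to-object dict with lower-case keys.

```python
from typing import Any

BUILTIN_STT_PROVIDERS = frozenset(
    {"local", "local_command", "groq", "openai", "mistral", "xai"})


def resolve_stt_route(provider: str, stt_config: dict[str, Any],
                      plugins: dict[str, Any]) -> str:
    """Return which layer would handle `provider`: builtin, command, plugin or none."""
    name = (provider or "").strip().lower()
    # 1. Built-ins always win, even if config or a plugin reuses the name.
    if name in BUILTIN_STT_PROVIDERS:
        return "builtin"
    # 2. A `type: command` entry with a non-empty command beats a same-name plugin.
    providers = {k.lower(): v for k, v in (stt_config.get("providers") or {}).items()}
    entry = providers.get(name)
    if isinstance(entry, dict) and entry.get("type") == "command" and entry.get("command"):
        return "command"
    # 3. Plugin-registered provider (plugin keys assumed to be lower-case).
    if name in plugins:
        return "plugin"
    # 4. No match: the caller reports "No STT provider available".
    return "none"


# Example inputs: a dict standing in for the parsed `stt:` section of config.yaml.
config = {"providers": {
    "openai": {"type": "command", "command": "my-asr {input_path}"},
    "My-Fake-CLI": {"type": "command", "command": "fake-asr {input_path}"},
}}
plugins = {"my-fake-cli": object(), "gemini": object()}
```

With these inputs, `openai` still routes to the built-in even though a `type: command` block reuses the name, and `my-fake-cli` routes to the command runner even though a plugin with the same name is registered.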

Files

New

- agent/transcription_provider.py: `TranscriptionProvider(ABC)` mirroring the shape of tts_provider.py (from #30493).
- agent/transcription_registry.py: registry mirroring the shape of tts_registry.py, with the `_BUILTIN_NAMES` reject-shadowing invariant (from #30493).
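A minimal sketch of the registry invariants: built-in shadow rejection, case-insensitive lookup, and non-string names simply missing. The function names mirror those that appear in this PR's tests and lint output (`register_provider`, `get_provider`, `_BUILTIN_NAMES`), but the bodies are assumptions. The real registry also type-checks against the `TranscriptionProvider` ABC; this duck-typed version approximates that by requiring a non-empty `name` attribute.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)

# Must stay in sync with the built-in list in tools/transcription_tools.py.
_BUILTIN_NAMES = frozenset({"local", "local_command", "groq", "openai", "mistral", "xai"})
_providers: dict[str, Any] = {}


def register_provider(provider: Any) -> bool:
    """Register `provider`; return False (and log a warning) if it is rejected."""
    name = getattr(provider, "name", None)
    if not isinstance(name, str) or not name.strip():
        logger.warning("Rejecting STT provider %r: missing or empty name", provider)
        return False
    key = name.strip().lower()
    if key in _BUILTIN_NAMES:
        logger.warning("Rejecting STT provider %r: shadows a built-in", name)
        return False
    _providers[key] = provider
    return True


def get_provider(name: Any) -> Optional[Any]:
    """Case-insensitive lookup; non-string names simply miss."""
    if not isinstance(name, str):
        return None
    return _providers.get(name.strip().lower())


def list_providers() -> list[str]:
    return sorted(_providers)
```

Rejecting at registration time gives plugin authors an early warning; the dispatcher's own built-in check is what actually guarantees built-ins win.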

Modified

- hermes_cli/plugins.py: `register_transcription_provider()` on `PluginContext`. Docstring updated to spell out both invariants (built-ins always win; command wins over plugin).
- tools/transcription_tools.py: `BUILTIN_STT_PROVIDERS` frozenset and `_dispatch_to_plugin_provider()` (from #30493), plus `_resolve_command_stt_provider_config()`, `_transcribe_command_stt()`, and local helpers for shell-quote-aware template rendering and process-tree termination on timeout (new in this PR). The wire-in is a single insertion point in `transcribe_audio()`: the command runner is tried after the built-in `elif` chain and before plugin dispatch. As defense in depth, the plugin dispatcher also short-circuits when a same-name command config exists.
- website/docs/user-guide/features/tts.md: new "STT custom command providers" section, a "When to pick which (STT)" decision table, and an updated resolution order (4 layers instead of 3).
- website/docs/user-guide/features/plugins.md: STT row updated to describe both surfaces, with anchors to each docs section.
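The shell-quote-aware template rendering in tools/transcription_tools.py is not reproduced here; the sketch below shows one way to do it under stated assumptions. Placeholders such as `{input_path}` are substituted with quoting chosen by the surrounding shell context (unquoted, inside '...', inside "..."), and doubled braces are treated as escapes for literal braces, following Python's `str.format` convention. `render_command` and `_quote_for_context` are hypothetical names, and the real helper may handle doubled braces or unknown placeholders differently.

```python
import shlex


def _quote_for_context(value: str, context: str) -> str:
    """Quote `value` so it stays a single word in the given shell context."""
    if context == "single":
        # Inside '...': close the quote, emit an escaped quote, reopen.
        return value.replace("'", "'\\''")
    if context == "double":
        # Inside "...": escape what the shell still expands (backslash first).
        for ch in ("\\", '"', "$", "`"):
            value = value.replace(ch, "\\" + ch)
        return value
    return shlex.quote(value)  # unquoted position


def render_command(template: str, values: dict[str, str]) -> str:
    """Fill {placeholders} in a shell template; unknown ones are left as-is."""
    out: list[str] = []
    context = "none"  # "none", "single" or "double"
    i = 0
    while i < len(template):
        ch = template[i]
        if template.startswith("{{", i) or template.startswith("}}", i):
            out.append(ch)  # doubled brace becomes one literal brace
            i += 2
            continue
        if ch == "{":
            end = template.find("}", i)
            key = template[i + 1:end] if end != -1 else None
            if key in values:
                out.append(_quote_for_context(str(values[key]), context))
                i = end + 1
                continue
        # Track quote context (backslash-escaped quotes ignored for brevity).
        if ch == "'" and context != "double":
            context = "none" if context == "single" else "single"
        elif ch == '"' and context != "single":
            context = "none" if context == "double" else "double"
        out.append(ch)
        i += 1
    return "".join(out)
```

Context tracking matters because `shlex.quote` output is only correct in unquoted position: placed inside an existing '...' it would close and reopen the quotes in the wrong places.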

Tests

| File | Tests | Coverage |
|---|---|---|
| `tests/agent/test_transcription_registry.py` | 27 | Registry: register/get/list, type/empty-name rejection, built-in shadow rejection (6 names), case-insensitive lookup, ABC contract, `TestBuiltinSync` regression test that fails if `_BUILTIN_NAMES` drifts from `BUILTIN_STT_PROVIDERS` |
| `tests/hermes_cli/test_plugins_transcription_registration.py` | 3 | End-to-end via `PluginManager.discover_and_load()` |
| `tests/tools/test_transcription_plugin_dispatch.py` | 28 | Built-in-always-wins (6 parametrized), unknown-name-no-plugin returns None, plugin dispatch, model/language kwargs, exception/non-dict envelopes, availability gate, provider field stamping, end-to-end via `transcribe_audio()` |
| `tests/tools/test_transcription_command_providers.py` (new) | 50 | Resolution (built-in precedence, type/command gating, case-insensitive lookup, legacy `stt.<name>` back-compat), helpers (timeout fallback, format validation, iter, has-any), template rendering (3 shell-quote contexts, doubled-brace preservation), `_transcribe_command_stt` end-to-end (file write, stdout fallback, timeout envelope, nonzero exit envelope, model override, language precedence chain), `transcribe_audio()` integration including command-wins-over-plugin and built-in-rejects-command-shadow |
| `tests/plugins/transcription/check_parity_vs_main.py` | 13 scenarios | Subprocess-pinned parity harness vs `origin/main`. New scenarios: `command-provider-installed`, `command-vs-plugin-same-name`, `explicit-openai-with-command-shadow` |

```
$ bash scripts/run_tests.sh tests/tools/test_transcription* \
                            tests/agent/test_transcription_registry.py \
                            tests/hermes_cli/test_plugins_transcription_registration.py \
                            tests/tools/test_tts_command_providers.py \
                            tests/tools/test_tts_plugin_dispatch.py
=== Summary: 11 files, 376 tests passed, 0 failed ===
```

```
$ python tests/plugins/transcription/check_parity_vs_main.py
  [OK]   stt-disabled                          [OK]   explicit-mistral-quarantine
  [OK]   explicit-groq                         [OK]   unknown-no-plugin
  [OK]   explicit-openai                       [DIFF] plugin-installed: → plugin (expected)
  [OK]   explicit-local                        [DIFF] plugin-installed-unavailable: → plugin_unavailable (expected)
  [OK]   explicit-xai                          [OK]   explicit-openai-with-plugin-registered
  [DIFF] command-provider-installed: → command_provider (expected)
  [DIFF] command-vs-plugin-same-name: → command_provider (expected — command wins over same-name plugin)
  [OK]   explicit-openai-with-command-shadow (built-in still wins)
INTENTIONAL DIFFS (4): expected new behavior from this PR.
PARITY OK across 13 scenarios.
```
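The parity harness tells command-provider runs apart from plugin runs by transcript prefix (`CMD:` vs `PLUGIN:`), which is what lets `command-vs-plugin-same-name` report `command_provider` even though both share one provider name. A hypothetical helper for that check; the function name, the `transcript` key and the `other` fallback are assumptions.

```python
from typing import Any


def classify_dispatch(result: dict[str, Any]) -> str:
    """Guess which layer produced `result` from its transcript prefix."""
    transcript = str(result.get("transcript") or "")
    if transcript.startswith("CMD:"):
        return "command_provider"  # fake command provider echoes a CMD: prefix
    if transcript.startswith("PLUGIN:"):
        return "plugin"  # fake plugin echoes a PLUGIN: prefix
    return "other"  # built-ins and error envelopes need separate checks
```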

```
$ bash scripts/run_tests.sh tests/tools/
=== Summary: 227 files, 5576 tests passed, 0 failed ===
```

```
$ bash scripts/run_tests.sh tests/agent/ tests/hermes_cli/
=== Summary: 367 files, 8623 tests passed, 0 failed ===
```

14,199 tests pass across the affected surfaces — zero regressions.

E2E live verification

Real config.yaml + real .wav + isolated HERMES_HOME + real transcribe_audio() (no mocks):

```
command:                    → dispatched to stt.providers.my-fake-cli (transcript: E2E_CMD_TRANSCRIPT)
plugin:                     → dispatched to registered TranscriptionProvider (transcript: E2E_PLUGIN_TRANSCRIPT)
command-wins-over-plugin:   → command provider beat the same-name plugin
builtin-wins-over-command:  → built-in OpenAI handler fires; stt.providers.openai: type: command
                              does NOT hijack it
```

Why both surfaces in one PR

The TTS hook (#31745, merged today) shipped on top of an existing TTS command-provider registry (#17843, May 2026). For symmetry, the STT story needs both surfaces. 3 of the 4 in-flight community STT PRs (OpenRouter STT #25721/#24703, SenseAudio #9380) are plain HTTPS APIs that could be served by a shell command. Only Gemini-STT #21540 genuinely needs the Python hook. Shipping both at once means no second PR cycle and no contributor gets told "your work isn't useful, we picked the other path."

How this unblocks the 4 in-flight community PRs

After this lands, each becomes a small focused contribution:

| PR | Becomes |
|---|---|
| #25721 (RemyFevry, OpenRouter STT) | `stt.providers.openrouter` config block (or a thin Python plugin); author credited |
| #24703 (xxxigm, duplicate OpenRouter STT) | Closed with credit, pointing to #25721's resolution |
| #9380 (Fl0rencess720, SenseAudio STT) | `stt.providers.senseaudio` config block; author credited |
| #21540 (mrlufepines, Gemini STT) | `plugins/transcription/gemini/__init__.py`; author credited. TTS + WhatsApp PTT split into separate issues |

Out of scope

- No picker integration (`TOOL_CATEGORIES["transcription"]`): STT isn't surfaced in `hermes tools` today. Deferred to a follow-up that lands alongside the first community plugin.
- No changes to the 6 built-in STT backends; they stay inline.
- No changes to gateway voice-message auto-transcription: gateway/run.py and gateway/platforms/discord.py call `transcribe_audio()`, whose signature is unchanged.
- No deprecation of HERMES_LOCAL_STT_COMMAND; it is preserved via the built-in local_command path.

Related

- #31745 feat(tts): add register_tts_provider() plugin hook (salvage of #30420): the symmetric TTS surface, just merged.
- #30493 feat(stt): add register_transcription_provider() plugin hook: the original STT plugin-hook PR by kshitijk4poor, salvaged into this PR.
- #17843 feat(tts): add command-type provider registry under tts.providers.<name>: the TTS command-provider registry, architectural template for the new STT registry.
- Issue #30398 Add register_tts_provider() plugin hook for Python-SDK and streaming TTS engines: the TTS plugin-hook design issue.
- Community PRs unblocked: #25721 feat(stt): add OpenRouter speech-to-text provider; #24703 Feat/openrouter stt provider; #9380 feat(stt): add SenseAudio STT provider; #21540 feat(voice): add Gemini STT and WhatsApp PTT support.


github-actions · 2026-05-25T08:23:47Z

🔎 Lint report: `hermes/hermes-6b8f4c13` vs `origin/main`

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9153 on HEAD, 9136 on base (🆕 +17)

🆕 New issues (9):

| Rule | Count |
|---|---|
| `invalid-argument-type` | 5 |
| `unresolved-import` | 3 |
| `not-subscriptable` | 1 |

First entries

```
tests/agent/test_transcription_registry.py:134: [invalid-argument-type] invalid-argument-type: Argument to function `get_provider` is incorrect: Expected `str`, found `None`
tests/agent/test_transcription_registry.py:135: [invalid-argument-type] invalid-argument-type: Argument to function `get_provider` is incorrect: Expected `str`, found `Literal[123]`
tests/tools/test_transcription_command_providers.py:115: [invalid-argument-type] invalid-argument-type: Argument to function `_resolve_command_stt_provider_config` is incorrect: Expected `str`, found `None`
tests/tools/test_transcription_plugin_dispatch.py:462: [not-subscriptable] not-subscriptable: Cannot subscript object of type `None` with no `__getitem__` method
tests/tools/test_transcription_plugin_dispatch.py:176: [invalid-argument-type] invalid-argument-type: Argument to `_FakeProvider.__init__` is incorrect: Expected `dict[Unknown, Unknown] | None`, found `Literal["weird string"]`
tests/agent/test_transcription_registry.py:20: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/tools/test_transcription_command_providers.py:30: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/tools/test_transcription_plugin_dispatch.py:21: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/agent/test_transcription_registry.py:77: [invalid-argument-type] invalid-argument-type: Argument to function `register_provider` is incorrect: Expected `TranscriptionProvider`, found `Literal["not a provider"]`
```

✅ Fixed issues: none

Unchanged: 4872 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

Add an opt-in Python plugin surface for speech-to-text backends, mirroring the TTS hook pattern. New backends (OpenRouter, SenseAudio, Gemini-STT, custom proprietary engines) can be implemented as plugins without modifying tools/transcription_tools.py.

Built-ins always win
--------------------
The 6 built-in STT providers (local/faster-whisper, local_command, groq, openai, mistral, xai) keep their native handlers. Plugins attempting to register under a built-in name are rejected at registration time with a warning and re-checked defensively at dispatch.

Resolution order
----------------
1. stt.provider matches a built-in → built-in dispatch (unchanged)
2. stt.provider matches a registered plugin →
   a. if plugin.is_available() returns False → unavailability envelope identifying the plugin (not the generic "No STT provider" message, since the user explicitly opted into this plugin)
   b. otherwise plugin.transcribe() with model + language forwarded from stt.<provider>.{model,language} config
3. No match → legacy "No STT provider available" error (unchanged)

Per-provider config namespace
-----------------------------
Plugins read their config from stt.<provider> in config.yaml, mirroring how built-ins read stt.openai.model / stt.mistral.model. The dispatcher forwards `model` and `language` from this section. The caller's explicit `model=` argument overrides the config-set model.

Files
-----
- agent/transcription_provider.py: TranscriptionProvider ABC
- agent/transcription_registry.py: register/get/list providers, built-in shadow guard, _reset_for_tests
- hermes_cli/plugins.py: register_transcription_provider() on PluginContext
- tools/transcription_tools.py: BUILTIN_STT_PROVIDERS frozenset, _dispatch_to_plugin_provider() with availability gate, wire-in after xai branch and before "No STT provider" error
- tests/agent/test_transcription_registry.py: 27 tests
- tests/hermes_cli/test_plugins_transcription_registration.py: 3 tests
- tests/tools/test_transcription_plugin_dispatch.py: 28 tests (covering built-in short-circuit, plugin dispatch, exception envelope, non-dict guard, availability gate, language forwarding)
- tests/plugins/transcription/check_parity_vs_main.py: 10-scenario subprocess-pinned parity harness vs origin/main
- website/docs/user-guide/features/{tts,plugins}.md: docs

Behavior parity
---------------
10 scenarios, 8 OK + 2 expected DIFFs:
- no_provider_error → plugin (plugin-installed scenario)
- no_provider_error → plugin_unavailable (plugin-installed-unavailable scenario; PR returns cleaner envelope)

Zero behavior change for users not opting into a plugin.

Issue follow-up to #30398.
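The availability gate and the forwarding of `model`/`language` from `stt.<provider>` can be sketched as follows. `dispatch_to_plugin` is a hypothetical stand-in for `_dispatch_to_plugin_provider()`; the envelope keys (`success`, `error`, `provider`), the error wording, and overwriting any `provider` field the plugin set are all assumptions.

```python
from typing import Any, Optional


def dispatch_to_plugin(name: str, provider: Any, file_path: str,
                       stt_config: dict[str, Any],
                       model: Optional[str] = None) -> dict[str, Any]:
    """Hypothetical plugin dispatch with an availability gate."""
    section = stt_config.get(name) or {}  # per-provider namespace: stt.<name>
    if not provider.is_available():
        # Name the plugin instead of the generic "No STT provider" message.
        return {"success": False, "provider": name,
                "error": f"STT provider '{name}' is installed but not available"}
    try:
        result = provider.transcribe(
            file_path,
            model=model or section.get("model"),  # explicit model= wins
            language=section.get("language"),
        )
    except Exception as exc:  # plugin bugs become an error envelope
        return {"success": False, "provider": name,
                "error": f"{type(exc).__name__}: {exc}"}
    if not isinstance(result, dict):
        return {"success": False, "provider": name,
                "error": "provider returned a non-dict result"}
    return {**result, "provider": name}  # stamp which provider answered
```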

Mirror of the TTS command-provider registry (PR #17843) for STT. Lets any shell-driven ASR engine (Doubao ASR, NVIDIA Parakeet, whisper.cpp builds, SenseVoice, curl pipelines) become an STT backend with zero Python. Complements the legacy HERMES_LOCAL_STT_COMMAND escape hatch (preserved untouched via the built-in local_command path) and the register_transcription_provider() Python plugin hook also shipped in this PR.

Resolution order (mirrors TTS exactly):
1. Built-in (local, local_command, groq, openai, mistral, xai) → native handler. Always wins.
2. stt.providers.<name>: type: command → command-provider runner.
3. Plugin-registered TranscriptionProvider → plugin dispatch.
4. No match → 'No STT provider available'.

Files
-----
- tools/transcription_tools.py: BUILTIN_STT_PROVIDERS frozenset retained; added _resolve_command_stt_provider_config, _transcribe_command_stt, and local helpers for template rendering, shell-quote context, and process-tree termination. Helpers are documented as mirrors of their tts_tool.py counterparts (kept local to avoid a cross-tool private import). Wire-in is one insertion point in transcribe_audio() after the xai elif and before the plugin dispatcher. The plugin dispatcher additionally short-circuits, defensively, when a same-name command config exists (command-wins-over-plugin invariant).
- tests/tools/test_transcription_command_providers.py: 50 new tests covering resolution (builtin precedence, type/command gating, case-insensitive lookup, legacy stt.<name> back-compat), helpers (timeout fallback, format validation, iter, has-any), template rendering (shell-quote contexts, doubled-brace preservation), end-to-end via _transcribe_command_stt (output_path read, stdout fallback, timeout, nonzero exit envelope, model override, language precedence), and dispatcher integration via the real transcribe_audio() including command-wins-over-plugin and builtin-shadow-rejection.
- tests/plugins/transcription/check_parity_vs_main.py: extended from 10 to 13 scenarios. New cases: command-provider-installed, command-vs-plugin-same-name (verifies command wins precedence), explicit-openai-with-command-shadow (verifies built-in wins). Adds command_provider dispatch_kind detection via transcript prefix (CMD: vs PLUGIN:) so command-provider scenarios can be distinguished from plugin scenarios even when sharing a provider name.
- website/docs/user-guide/features/tts.md: new 'STT custom command providers' section symmetric to the TTS section: example config, placeholder grammar table (input_path / output_path / output_dir / format / language / model), transcript-read-back semantics (file first, then stdout fallback), optional keys table, behavior notes, security note. Updated 'Python plugin providers (STT)' to include the new 'When to pick which (STT)' decision table and an updated resolution-order section (now 4 layers instead of 3).

Verification
------------
189/189 STT targeted tests + 50/50 new command-provider tests pass. Combined sweep: tests/tools/ 5576/5576, tests/agent/ + tests/hermes_cli/ 8623/8623, zero regressions across 14,199 tests. Parity harness: 13 scenarios, 9 OK + 4 expected diffs (no_provider_error → plugin, plugin_unavailable, command_provider × 2). E2E live-verified in an isolated HERMES_HOME with a real .wav file:
- command: → dispatched to stt.providers.my-fake-cli
- plugin: → dispatched to registered TranscriptionProvider
- command-wins-over-plugin: → command provider beats same-name plugin
- builtin-wins-over-command: → built-in OpenAI handler fires; stt.providers.openai: type: command does NOT hijack it.
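A sketch of the command runner's timeout and read-back behavior: run the rendered command in its own process group, kill the whole group on timeout, and read the transcript from the output file first, falling back to stdout. `run_command_stt`, the 30-second default timeout and the envelope keys are assumptions, not the actual `_transcribe_command_stt()`; `start_new_session` and `os.killpg` make this version POSIX-only.

```python
import os
import signal
import subprocess
from pathlib import Path
from typing import Any


def run_command_stt(command: str, output_path: str,
                    timeout: float = 30.0) -> dict[str, Any]:
    """Run a rendered STT command; read the transcript from file, else stdout."""
    proc = subprocess.Popen(command, shell=True, text=True,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            start_new_session=True)  # own process group (POSIX)
    try:
        stdout, stderr = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        try:
            # Kill the whole process tree, not just the wrapping shell.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # group already exited
        proc.communicate()  # reap the child and drain the pipes
        return {"success": False, "error": f"STT command timed out after {timeout}s"}
    if proc.returncode != 0:
        return {"success": False,
                "error": f"STT command exited {proc.returncode}: {stderr.strip()}"}
    out = Path(output_path)
    transcript = out.read_text().strip() if out.is_file() else ""
    if not transcript:
        transcript = stdout.strip()  # stdout fallback
    if not transcript:
        return {"success": False, "error": "STT command produced no transcript"}
    return {"success": True, "transcript": transcript}
```

Killing the process group rather than `proc.pid` alone matters for pipelines like `curl … | jq`, where the shell's children would otherwise outlive the timeout.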

alt-glitch added the type/feature (New feature or request), comp/plugins (Plugin system and bundled plugins), tool/tts (Text-to-speech and transcription), and P3 (Low: cosmetic, nice to have) labels May 25, 2026

teknium1 force-pushed the hermes/hermes-6b8f4c13 branch from e2241b1 to 4851648 May 25, 2026 08:22

kshitijk4poor and others added 2 commits May 25, 2026 01:33

teknium1 force-pushed the hermes/hermes-6b8f4c13 branch from 4851648 to e8fa061 May 25, 2026 08:33

teknium1 merged commit d3ffbc6 into main May 25, 2026
27 checks passed

teknium1 deleted the hermes/hermes-6b8f4c13 branch May 25, 2026 08:41

teknium1 mentioned this pull request May 25, 2026 in feat(stt): add register_transcription_provider() plugin hook #30493 (Closed)
