
feat(stt): add register_transcription_provider() hook + stt.providers command-provider registry (salvage of #30493)#31907

Merged
teknium1 merged 2 commits into main from hermes/hermes-6b8f4c13
May 25, 2026


Conversation

@teknium1

@teknium1 teknium1 commented May 25, 2026

Contributor

feat(stt): add register_transcription_provider() plugin hook + stt.providers command-provider registry

Salvages PR #30493 (@kshitijk4poor) onto current main and adds the symmetric STT command-provider registry that #31745 just established for TTS. Both new STT extension surfaces ship together so the precedence rules are symmetric across the two tools from day one.

What this PR makes true that wasn't before

New STT engines can plug into Hermes without modifying tools/transcription_tools.py. Two coexisting surfaces, picked per use case:

  1. stt.providers.<name>: type: command — any shell-driven ASR engine (Doubao ASR, NVIDIA Parakeet, whisper.cpp builds, SenseVoice CLI, curl … | jq -r .text pipelines). Zero Python.
  2. ctx.register_transcription_provider() — Python-SDK engines (OpenRouter, SenseAudio, Gemini-STT, Deepgram) that can't be expressed as a single shell command.

The 6 built-in providers (local, local_command, groq, openai, mistral, xai) keep their native handlers and always win on name collision. HERMES_LOCAL_STT_COMMAND is preserved untouched via the built-in local_command path.
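To make the plugin surface concrete, here is a minimal, self-contained sketch of a registry that rejects built-in names at registration time. The `TranscriptionProvider` shape, the `register_provider` / `get_provider` signatures, the return values, and the result-dict keys are assumptions for illustration; the real definitions live in agent/transcription_provider.py and agent/transcription_registry.py and may differ.

```python
from __future__ import annotations

import logging
from abc import ABC, abstractmethod
from typing import Optional

logger = logging.getLogger(__name__)

# The six built-in names listed in this PR.
BUILTIN_STT_PROVIDERS = frozenset(
    {"local", "local_command", "groq", "openai", "mistral", "xai"}
)


class TranscriptionProvider(ABC):
    """Stand-in for the real ABC; the method set here is assumed."""

    name: str = ""

    def is_available(self) -> bool:
        return True

    @abstractmethod
    def transcribe(self, file_path: str, model: Optional[str] = None,
                   language: Optional[str] = None) -> dict:
        ...


_REGISTRY: dict[str, TranscriptionProvider] = {}


def register_provider(provider: TranscriptionProvider) -> bool:
    """Register a plugin provider; refuse anything shadowing a built-in."""
    if not isinstance(provider, TranscriptionProvider):
        raise TypeError("provider must be a TranscriptionProvider")
    key = (provider.name or "").strip().lower()
    if not key:
        raise ValueError("provider name must be non-empty")
    if key in BUILTIN_STT_PROVIDERS:
        # Built-ins always win: warn and skip instead of overriding.
        logger.warning("STT plugin %r shadows a built-in; ignored", key)
        return False
    _REGISTRY[key] = provider
    return True


def get_provider(name: str) -> Optional[TranscriptionProvider]:
    """Case-insensitive lookup of a registered plugin provider."""
    return _REGISTRY.get(name.strip().lower())


class EchoProvider(TranscriptionProvider):
    """Hypothetical plugin that returns a fixed transcript."""

    name = "echo-stt"

    def transcribe(self, file_path, model=None, language=None):
        return {"success": True, "transcript": f"heard {file_path}",
                "provider": self.name}
```

In a real plugin the registration call would go through `ctx.register_transcription_provider()` on `PluginContext` rather than a module-level function.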

Resolution order (mirrors TTS exactly after #31745)

  1. stt.provider is a built-in name → built-in dispatch. Always wins.
  2. stt.provider matches stt.providers.<name> with command: set → command-provider runner.
  3. stt.provider matches a plugin-registered TranscriptionProvider → plugin dispatch.
  4. No match → "No STT provider available" error.

Same precedence as TTS: config in config.yaml is more local than a plugin install, so a command provider beats a same-name plugin, and built-ins always win.
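The four-step resolution order can be sketched as a pure function. This is an illustration only: `resolve_dispatch` is a hypothetical name, the `"builtin"` label is made up, and the sketch covers only the case where `stt.provider` is set explicitly. The other labels reuse the dispatch kinds reported by the parity harness.

```python
from __future__ import annotations

BUILTIN_STT_PROVIDERS = frozenset(
    {"local", "local_command", "groq", "openai", "mistral", "xai"}
)


def resolve_dispatch(stt_config: dict, plugin_names: set[str]) -> str:
    """Return which layer would handle the configured stt.provider."""
    name = str(stt_config.get("provider") or "").strip().lower()

    # 1. Built-in names always win, even if a command block shadows them.
    if name in BUILTIN_STT_PROVIDERS:
        return "builtin"

    # 2. stt.providers.<name> with type: command and a non-empty command.
    providers = stt_config.get("providers") or {}
    by_lower = {str(key).lower(): value for key, value in providers.items()}
    entry = by_lower.get(name)
    if (isinstance(entry, dict)
            and entry.get("type") == "command"
            and entry.get("command")):
        return "command_provider"

    # 3. A plugin-registered provider of the same name.
    if name in plugin_names:
        return "plugin"

    # 4. Nothing matched.
    return "no_provider_error"
```

For example, a config that sets `provider: openai` and also defines `providers.openai` with `type: command` still resolves to the built-in handler.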

Files

New

Modified

  • hermes_cli/plugins.py: register_transcription_provider() on PluginContext. Docstring updated to spell out both invariants (built-ins-always-win + command-wins-over-plugin).
  • tools/transcription_tools.py: BUILTIN_STT_PROVIDERS frozenset + _dispatch_to_plugin_provider() (from #30493), plus _resolve_command_stt_provider_config() / _transcribe_command_stt() / local helpers for shell-quote-aware template rendering and process-tree termination on timeout (new in this PR). Wire-in is one insertion point in transcribe_audio(): the command provider runs after the built-in elif chain and before plugin dispatch. The plugin dispatcher additionally short-circuits when a same-name command config exists (defense in depth).
  • website/docs/user-guide/features/tts.md — new "STT custom command providers" section + "When to pick which (STT)" decision table + updated resolution order (4 layers instead of 3).
  • website/docs/user-guide/features/plugins.md — STT row updated to describe both surfaces with anchors to each docs section.
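As a rough illustration of the template-rendering helper mentioned above, the sketch below substitutes `{placeholder}` tokens with shell-quoted values. It is deliberately simpler than what the PR describes: it quotes every value with `shlex.quote` as if the placeholder sat in an unquoted position, and it assumes doubled braces collapse to literal single braces (as in `str.format`). The function name `render_command` is hypothetical, and the real helper additionally tracks the surrounding quote context.

```python
import re
import shlex

# Matches "{{", "}}", or a "{name}" placeholder, in that priority order.
_TOKEN = re.compile(r"\{\{|\}\}|\{(\w+)\}")


def render_command(template: str, values: dict) -> str:
    """Fill {placeholders} in a command template with shell-quoted values."""

    def _substitute(match: re.Match) -> str:
        token = match.group(0)
        if token == "{{":
            return "{"  # assumed: doubled brace means a literal brace
        if token == "}}":
            return "}"
        name = match.group(1)
        if name not in values:
            return token  # leave unknown placeholders untouched
        return shlex.quote(str(values[name]))

    return _TOKEN.sub(_substitute, template)
```

Because this version always emits a self-contained quoted word, a template that already wraps a placeholder in quotes would end up double-quoted; handling that case is what a quote-context-aware renderer is for.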

Tests

| File | Tests | Coverage |
|---|---|---|
| tests/agent/test_transcription_registry.py | 27 | Registry: register/get/list, type/empty-name rejection, built-in shadow rejection (6 names), case-insensitive lookup, ABC contract, TestBuiltinSync regression test if _BUILTIN_NAMES drifts from BUILTIN_STT_PROVIDERS |
| tests/hermes_cli/test_plugins_transcription_registration.py | 3 | End-to-end via PluginManager.discover_and_load() |
| tests/tools/test_transcription_plugin_dispatch.py | 28 | Built-in-always-wins (6 parametrized), unknown-name-no-plugin returns None, plugin dispatch, model/language kwargs, exception/non-dict envelopes, availability gate, provider field stamping, end-to-end via transcribe_audio() |
| tests/tools/test_transcription_command_providers.py (new) | 50 | Resolution (built-in precedence, type/command gating, case-insensitive lookup, legacy stt.<name> back-compat), helpers (timeout fallback, format validation, iter, has-any), template rendering (3 shell-quote contexts, doubled-brace preservation), _transcribe_command_stt end-to-end (file write, stdout fallback, timeout envelope, nonzero exit envelope, model override, language precedence chain), transcribe_audio() integration including command-wins-over-plugin and built-in-rejects-command-shadow |
| tests/plugins/transcription/check_parity_vs_main.py | 13 scenarios | Subprocess-pinned parity harness vs origin/main. New scenarios: command-provider-installed, command-vs-plugin-same-name, explicit-openai-with-command-shadow |
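The plugin-dispatch behaviors covered by these tests (availability gate, exception and non-dict envelopes, provider field stamping, model override) might look roughly like the sketch below. The function name `dispatch_to_plugin`, the envelope keys, and the error strings are assumptions for illustration and are not copied from tools/transcription_tools.py.

```python
from __future__ import annotations

from typing import Any, Optional


def _error(name: str, message: str) -> dict:
    """Build a failure envelope (key names are assumed)."""
    return {"success": False, "transcript": "", "provider": name,
            "error": message}


def dispatch_to_plugin(provider: Any, file_path: str, provider_cfg: dict,
                       model: Optional[str] = None) -> dict:
    name = getattr(provider, "name", "unknown")

    # Availability gate: name the plugin instead of a generic error.
    if not provider.is_available():
        return _error(name, f"STT plugin '{name}' is not available")

    # The caller's explicit model wins over stt.<provider>.model.
    chosen_model = model or provider_cfg.get("model")
    language = provider_cfg.get("language")

    try:
        result = provider.transcribe(file_path, model=chosen_model,
                                     language=language)
    except Exception as exc:  # a plugin failure must not crash the tool
        return _error(name, f"STT plugin '{name}' failed: {exc}")

    if not isinstance(result, dict):
        return _error(name, f"STT plugin '{name}' returned a non-dict result")

    result.setdefault("provider", name)  # stamp the provider field
    return result


class DemoProvider:
    """Hypothetical plugin used only to exercise the dispatcher."""

    name = "demo"

    def __init__(self, available: bool = True, result: Any = None):
        self._available = available
        self._result = result

    def is_available(self) -> bool:
        return self._available

    def transcribe(self, file_path, model=None, language=None):
        if self._result is not None:
            return self._result
        return {"success": True, "transcript": f"{model}:{language}"}
```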
$ bash scripts/run_tests.sh tests/tools/test_transcription* \
                            tests/agent/test_transcription_registry.py \
                            tests/hermes_cli/test_plugins_transcription_registration.py \
                            tests/tools/test_tts_command_providers.py \
                            tests/tools/test_tts_plugin_dispatch.py
=== Summary: 11 files, 376 tests passed, 0 failed ===

$ python tests/plugins/transcription/check_parity_vs_main.py
  [OK]   stt-disabled                          [OK]   explicit-mistral-quarantine
  [OK]   explicit-groq                         [OK]   unknown-no-plugin
  [OK]   explicit-openai                       [DIFF] plugin-installed: → plugin (expected)
  [OK]   explicit-local                        [DIFF] plugin-installed-unavailable: → plugin_unavailable (expected)
  [OK]   explicit-xai                          [OK]   explicit-openai-with-plugin-registered
  [DIFF] command-provider-installed: → command_provider (expected)
  [DIFF] command-vs-plugin-same-name: → command_provider (expected — command wins over same-name plugin)
  [OK]   explicit-openai-with-command-shadow (built-in still wins)
INTENTIONAL DIFFS (4): expected new behavior from this PR.
PARITY OK across 13 scenarios.

$ bash scripts/run_tests.sh tests/tools/
=== Summary: 227 files, 5576 tests passed, 0 failed ===

$ bash scripts/run_tests.sh tests/agent/ tests/hermes_cli/
=== Summary: 367 files, 8623 tests passed, 0 failed ===

14,199 tests pass across the affected surfaces — zero regressions.

E2E live verification

Real config.yaml + real .wav + isolated HERMES_HOME + real transcribe_audio() (no mocks):

command:                    → dispatched to stt.providers.my-fake-cli (transcript: E2E_CMD_TRANSCRIPT)
plugin:                     → dispatched to registered TranscriptionProvider (transcript: E2E_PLUGIN_TRANSCRIPT)
command-wins-over-plugin:   → command provider beat the same-name plugin
builtin-wins-over-command:  → built-in OpenAI handler fires; stt.providers.openai: type: command
                              does NOT hijack it
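For readers who want a feel for the command path exercised above, here is a minimal sketch of a command runner with a timeout, process-group termination, and the file-first, stdout-fallback read-back the PR describes. It is POSIX-only (`os.killpg`), and the name `run_command_stt`, the envelope keys, and the error strings are assumptions, not the actual `_transcribe_command_stt` implementation.

```python
from __future__ import annotations

import os
import signal
import subprocess


def run_command_stt(command: str, output_path: str,
                    timeout: float = 60.0) -> dict:
    """Run a shell ASR command and read the transcript back."""
    proc = subprocess.Popen(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
        start_new_session=True,  # child leads its own process group
    )
    try:
        stdout, stderr = proc.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Kill the whole group so helper processes do not linger.
        try:
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the process exited between the timeout and the kill
        proc.communicate()  # reap the child and close the pipes
        return {"success": False, "transcript": "",
                "error": f"command timed out after {timeout}s"}

    if proc.returncode != 0:
        return {"success": False, "transcript": "",
                "error": f"exit code {proc.returncode}: {stderr.strip()}"}

    # Read-back order: the output file first, then stdout as a fallback.
    transcript = ""
    if os.path.isfile(output_path):
        with open(output_path, encoding="utf-8") as handle:
            transcript = handle.read().strip()
    if not transcript:
        transcript = stdout.strip()

    if not transcript:
        return {"success": False, "transcript": "",
                "error": "command produced no transcript"}
    return {"success": True, "transcript": transcript, "error": ""}
```

Starting the child in its own session makes its process-group ID equal to its PID, which is why `os.killpg(proc.pid, ...)` reaches any grandchildren the shell spawned.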

Why both surfaces in one PR

The TTS hook (#31745, merged today) shipped on top of an existing TTS command-provider registry (#17843, May 2026). For symmetry, the STT story needs both surfaces. 3 of the 4 in-flight community STT PRs (OpenRouter STT #25721/#24703, SenseAudio #9380) are plain HTTPS APIs that could be served by a shell command. Only Gemini-STT #21540 genuinely needs the Python hook. Shipping both at once means no second PR cycle and no contributor gets told "your work isn't useful, we picked the other path."

How this unblocks the 4 in-flight community PRs

After this lands, each becomes a small focused contribution:

| PR | Becomes |
|---|---|
| #25721 (RemyFevry, OpenRouter STT) | stt.providers.openrouter config block (or a thin Python plugin); author credited |
| #24703 (xxxigm, duplicate OpenRouter STT) | Closed with credit, pointing to #25721's resolution |
| #9380 (Fl0rencess720, SenseAudio STT) | stt.providers.sensaudio config block; author credited |
| #21540 (mrlufepines, Gemini STT) | plugins/transcription/gemini/__init__.py; author credited. TTS + WhatsApp PTT split into separate issues |

Out of scope

  • No picker integration (TOOL_CATEGORIES["transcription"]) — STT isn't surfaced in hermes tools today. Deferred to a follow-up that lands alongside the first community plugin.
  • No changes to the 6 built-in STT backends — they stay inline.
  • No changes to gateway voice-message auto-transcription: gateway/run.py and gateway/platforms/discord.py call transcribe_audio(), whose signature is unchanged.
  • No deprecation of HERMES_LOCAL_STT_COMMAND — preserved via the built-in local_command path.

Related

Infographic

stt-plugin-hook-and-command-provider-registry

@alt-glitch alt-glitch added type/feature New feature or request comp/plugins Plugin system and bundled plugins tool/tts Text-to-speech and transcription P3 Low — cosmetic, nice to have labels May 25, 2026
@teknium1 teknium1 force-pushed the hermes/hermes-6b8f4c13 branch from e2241b1 to 4851648 Compare May 25, 2026 08:22
@github-actions

github-actions Bot commented May 25, 2026

Contributor

🔎 Lint report: hermes/hermes-6b8f4c13 vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9153 on HEAD, 9136 on base (🆕 +17)

🆕 New issues (9):

| Rule | Count |
|---|---|
| invalid-argument-type | 5 |
| unresolved-import | 3 |
| not-subscriptable | 1 |

First entries
tests/agent/test_transcription_registry.py:134: [invalid-argument-type] invalid-argument-type: Argument to function `get_provider` is incorrect: Expected `str`, found `None`
tests/agent/test_transcription_registry.py:135: [invalid-argument-type] invalid-argument-type: Argument to function `get_provider` is incorrect: Expected `str`, found `Literal[123]`
tests/tools/test_transcription_command_providers.py:115: [invalid-argument-type] invalid-argument-type: Argument to function `_resolve_command_stt_provider_config` is incorrect: Expected `str`, found `None`
tests/tools/test_transcription_plugin_dispatch.py:462: [not-subscriptable] not-subscriptable: Cannot subscript object of type `None` with no `__getitem__` method
tests/tools/test_transcription_plugin_dispatch.py:176: [invalid-argument-type] invalid-argument-type: Argument to `_FakeProvider.__init__` is incorrect: Expected `dict[Unknown, Unknown] | None`, found `Literal["weird string"]`
tests/agent/test_transcription_registry.py:20: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/tools/test_transcription_command_providers.py:30: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/tools/test_transcription_plugin_dispatch.py:21: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/agent/test_transcription_registry.py:77: [invalid-argument-type] invalid-argument-type: Argument to function `register_provider` is incorrect: Expected `TranscriptionProvider`, found `Literal["not a provider"]`

✅ Fixed issues: none

Unchanged: 4872 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

kshitijk4poor and others added 2 commits May 25, 2026 01:33
Add an opt-in Python plugin surface for speech-to-text backends,
mirroring the TTS hook pattern. New backends (OpenRouter, SenseAudio,
Gemini-STT, custom proprietary engines) can be implemented as plugins
without modifying tools/transcription_tools.py.

Built-ins always win
--------------------
The 6 built-in STT providers (local/faster-whisper, local_command,
groq, openai, mistral, xai) keep their native handlers. Plugins
attempting to register under a built-in name are rejected at
registration time with a warning and re-checked defensively at
dispatch.

Resolution order
----------------
1. stt.provider matches a built-in → built-in dispatch (unchanged)
2. stt.provider matches a registered plugin →
   a. if plugin.is_available() returns False → unavailability envelope
      identifying the plugin (not the generic "No STT provider"
      message — the user explicitly opted into this plugin)
   b. otherwise plugin.transcribe() with model + language forwarded
      from stt.<provider>.{model,language} config
3. No match → legacy "No STT provider available" error (unchanged)

Per-provider config namespace
-----------------------------
Plugins read their config from stt.<provider> in config.yaml, mirroring
how built-ins read stt.openai.model / stt.mistral.model. The dispatcher
forwards `model` and `language` from this section. Caller's explicit
`model=` argument overrides the config-set model.

Files
-----
- agent/transcription_provider.py: TranscriptionProvider ABC
- agent/transcription_registry.py: register/get/list providers,
  built-in shadow guard, _reset_for_tests
- hermes_cli/plugins.py: register_transcription_provider() on
  PluginContext
- tools/transcription_tools.py: BUILTIN_STT_PROVIDERS frozenset,
  _dispatch_to_plugin_provider() with availability gate, wire-in
  after xai branch and before "No STT provider" error
- tests/agent/test_transcription_registry.py: 27 tests
- tests/hermes_cli/test_plugins_transcription_registration.py: 3 tests
- tests/tools/test_transcription_plugin_dispatch.py: 28 tests
  (covering built-in short-circuit, plugin dispatch, exception
  envelope, non-dict guard, availability gate, language forwarding)
- tests/plugins/transcription/check_parity_vs_main.py: 10-scenario
  subprocess-pinned parity harness vs origin/main
- website/docs/user-guide/features/{tts,plugins}.md: docs

Behavior parity
---------------
10 scenarios, 8 OK + 2 expected DIFFs:
  no_provider_error → plugin (plugin-installed scenario)
  no_provider_error → plugin_unavailable (plugin-installed-unavailable
  scenario; PR returns cleaner envelope)
Zero behavior change for users not opting into a plugin.

Issue follow-up to #30398.

Mirror of the TTS command-provider registry (PR #17843) for STT. Lets any
shell-driven ASR engine — Doubao ASR, NVIDIA Parakeet, whisper.cpp builds,
SenseVoice, curl pipelines — become an STT backend with zero Python.
Complements the legacy HERMES_LOCAL_STT_COMMAND escape hatch (preserved
untouched via the built-in local_command path) and the
register_transcription_provider() Python plugin hook also shipped in this
PR.

Resolution order (mirrors TTS exactly):

  1. Built-in (local, local_command, groq, openai, mistral, xai)
     → native handler. Always wins.
  2. stt.providers.<name>: type: command  → command-provider runner.
  3. Plugin-registered TranscriptionProvider → plugin dispatch.
  4. No match → 'No STT provider available'.

Files
-----
- tools/transcription_tools.py: BUILTIN_STT_PROVIDERS frozenset retained;
  added _resolve_command_stt_provider_config, _transcribe_command_stt,
  and local helpers for template rendering, shell-quote context, and
  process-tree termination. Helpers are documented as mirrors of their
  tts_tool.py counterparts (kept local to avoid cross-tool private
  import). Wire-in is one insertion point in transcribe_audio() after
  the xai elif and before the plugin dispatcher. Plugin dispatcher
  additionally defensively short-circuits when a same-name command
  config exists (command-wins-over-plugin invariant).

- tests/tools/test_transcription_command_providers.py: 50 new tests
  covering resolution (builtin precedence, type/command gating,
  case-insensitive lookup, legacy stt.<name> back-compat), helpers
  (timeout fallback, format validation, iter, has-any), template
  rendering (shell-quote contexts, doubled-brace preservation),
  end-to-end via _transcribe_command_stt (output_path read, stdout
  fallback, timeout, nonzero exit envelope, model override,
  language precedence), and dispatcher integration via the real
  transcribe_audio() including command-wins-over-plugin and
  builtin-shadow-rejection.

- tests/plugins/transcription/check_parity_vs_main.py: extended from
  10 to 13 scenarios. New cases: command-provider-installed,
  command-vs-plugin-same-name (verifies command wins precedence),
  explicit-openai-with-command-shadow (verifies built-in wins).
  Adds command_provider dispatch_kind detection via transcript prefix
  (CMD: vs PLUGIN:) so command-provider scenarios can be distinguished
  from plugin scenarios even when sharing a provider name.

- website/docs/user-guide/features/tts.md: new 'STT custom command
  providers' section symmetric to the TTS section — example config,
  placeholder grammar table (input_path / output_path / output_dir /
  format / language / model), transcript-read-back semantics (file
  first, then stdout fallback), optional keys table, behavior notes,
  security note. Updated 'Python plugin providers (STT)' to include
  the new 'When to pick which (STT)' decision table and updated
  resolution-order section (now 4 layers instead of 3).

Verification
------------
189/189 STT targeted tests + 50/50 new command-provider tests pass.
Combined sweep: tests/tools/ 5576/5576, tests/agent/ + tests/hermes_cli/
8623/8623 — zero regressions across 14,199 tests.

Parity harness: 13 scenarios, 9 OK + 4 expected diffs
(no_provider_error → plugin, plugin_unavailable, command_provider × 2).

E2E live-verified in an isolated HERMES_HOME with a real .wav file:

  command:                    → dispatched to stt.providers.my-fake-cli
  plugin:                     → dispatched to registered TranscriptionProvider
  command-wins-over-plugin:   → command provider beats same-name plugin
  builtin-wins-over-command:  → built-in OpenAI handler fires;
                                stt.providers.openai: type: command
                                does NOT hijack it.
@teknium1 teknium1 force-pushed the hermes/hermes-6b8f4c13 branch from 4851648 to e8fa061 Compare May 25, 2026 08:33
@teknium1 teknium1 merged commit d3ffbc6 into main May 25, 2026
27 checks passed
@teknium1 teknium1 deleted the hermes/hermes-6b8f4c13 branch May 25, 2026 08:41

Labels

comp/plugins Plugin system and bundled plugins P3 Low — cosmetic, nice to have tool/tts Text-to-speech and transcription type/feature New feature or request


3 participants