feat(stt): add register_transcription_provider() plugin hook #30493

Closed
kshitijk4poor wants to merge 1 commit into NousResearch:main from kshitijk4poor:feat/transcription-plugin-migration

@kshitijk4poor

Collaborator

What this PR does

Adds a TranscriptionProvider(ABC) + register_transcription_provider() extension point to the plugin context API, mirroring the register_tts_provider() hook from #30420. New STT backends (OpenRouter, SenseAudio, Gemini-STT, Deepgram, custom proprietary engines) can now be added as plugins/transcription/<vendor>/ without modifying tools/transcription_tools.py.

This is additive — the 6 built-in STT backends (local, local_command, groq, openai, mistral, xai) keep their native implementations and always win on name collision. The hook is for new engines.
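
For orientation, a plugin under plugins/transcription/<vendor>/ might look roughly like the sketch below. The ABC here is a local stand-in, not the real class from agent/transcription_provider.py: the `name` attribute, the keyword-only `model` / `language` parameters, the `is_available()` default, the `register(ctx)` entry point, and the `EchoProvider` backend are all assumptions made for illustration. Only "transcribe() is required, everything else is optional" comes from this PR.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class TranscriptionProvider(ABC):
    """Local stand-in for the ABC added by this PR (exact shape assumed)."""

    # Assumed: the value users put in stt.provider to select this backend.
    name: str = ""

    def is_available(self) -> bool:
        # Optional hook; assumed to default to True.
        return True

    @abstractmethod
    def transcribe(
        self,
        file_path: str,
        *,
        model: Optional[str] = None,
        language: Optional[str] = None,
    ) -> Dict[str, Any]:
        """Return an envelope dict with at least 'success' and 'transcript'."""


class EchoProvider(TranscriptionProvider):
    """Hypothetical backend that 'transcribes' by echoing the file name."""

    name = "echo-stt"

    def transcribe(self, file_path, *, model=None, language=None):
        return {"success": True, "transcript": f"heard {file_path}", "model": model}


def register(ctx) -> None:
    # Assumed plugin entry point; ctx stands for the PluginContext object.
    ctx.register_transcription_provider(EchoProvider())
```

With a plugin like this enabled, setting stt.provider to echo-stt would be expected to route transcribe_audio() to `EchoProvider.transcribe()`.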

Why this is needed now

4 open community PRs are adding new STT backends inline, each touching tools/transcription_tools.py directly:

| PR | Backend | Author | LoC | Status |
|---|---|---|---|---|
| #25721 | OpenRouter STT | @RemyFevry | +341 / -5 | OPEN |
| #24703 | OpenRouter STT (duplicate) | @xxxigm | +718 / -11 | OPEN |
| #9380 | SenseAudio STT | @Fl0rencess720 | +83 / -3 | OPEN (since April) |
| #21540 | Gemini STT (+ TTS + WhatsApp PTT) | @mrlufepines | +302 / -56 | OPEN |

This is the "multiple PRs solving the same problem" pattern the hermes-agent-dev skill flags as the cue to extract an ABC. After this hook lands, each can be salvaged as a ~80 LoC plugin under plugins/transcription/<vendor>/.

Resolution order

  1. stt.provider is a built-in name → built-in dispatch. Always wins.
  2. stt.provider matches a plugin-registered TranscriptionProvider → plugin dispatch (new).
  3. No match → falls through to the legacy "No STT provider available" error.

Built-ins-always-win is enforced at TWO layers (registry rejection at registration time + dispatcher short-circuit at dispatch time).

Files

New:

  • agent/transcription_provider.py: TranscriptionProvider(ABC) mirroring tts_provider.py / image_gen_provider.py shape. transcribe() required; everything else optional with sane defaults.
  • agent/transcription_registry.py — registry mirroring tts_registry.py shape, with _BUILTIN_NAMES reject-shadowing invariant.

Modified:

  • hermes_cli/plugins.py (+37 LoC) — register_transcription_provider() method on PluginContext. Matches the gating shape of register_image_gen_provider().
  • tools/transcription_tools.py (+115 LoC) — BUILTIN_STT_PROVIDERS frozenset + _dispatch_to_plugin_provider() + single-line wiring into transcribe_audio() (after the 6 built-in elif branches, before the legacy error). Built-in elif chain, _get_provider, _validate_audio_file, and all helpers are unchanged.
  • website/docs/user-guide/features/tts.md (+80 LoC) — new "Python plugin providers (STT)" section with decision table, minimal plugin example, optional-hook reference.
  • website/docs/user-guide/features/plugins.md (+1 LoC) — STT row updated.

Tests

| File | Tests | Coverage |
|---|---|---|
| tests/agent/test_transcription_registry.py | 27 | Registration happy path, type rejection, empty-name rejection, built-in shadow rejection (6 names), case-insensitive lookup, ABC contract, TestBuiltinSync regression test that fails if _BUILTIN_NAMES drifts from BUILTIN_STT_PROVIDERS |
| tests/tools/test_transcription_plugin_dispatch.py | 19 | Built-in-always-wins (6 parametrized), unknown-name-no-plugin returns None, plugin dispatch happy path, model/language kwargs forwarding, exception → error envelope, non-dict result → error envelope, provider field stamping, end-to-end via transcribe_audio() |
| tests/hermes_cli/test_plugins_transcription_registration.py | 3 | End-to-end via PluginManager.discover_and_load() |
| tests/plugins/transcription/check_parity_vs_main.py | 9 scenarios | Subprocess parity harness vs origin/main |
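
The "exception → error envelope", "non-dict result → error envelope", and "provider field stamping" rows describe containment behavior that could look like the sketch below. `call_plugin_safely` is a hypothetical helper name, the error wording is invented, and whether the real dispatcher overwrites or only fills in a missing `provider` field is an assumption (this sketch only fills it in).

```python
from typing import Any, Dict


def call_plugin_safely(provider: Any, key: str, file_path: str, **kwargs: Any) -> Dict[str, Any]:
    """Run a plugin's transcribe() and always return a well-formed envelope."""
    try:
        result = provider.transcribe(file_path, **kwargs)
    except Exception as exc:
        # A crashing plugin becomes an error envelope instead of propagating.
        return {
            "success": False,
            "transcript": "",
            "error": f"STT plugin '{key}' failed: {exc}",
            "provider": key,
        }
    if not isinstance(result, dict):
        # Guard against plugins that return a bare string or None.
        return {
            "success": False,
            "transcript": "",
            "error": f"STT plugin '{key}' returned {type(result).__name__}, expected dict",
            "provider": key,
        }
    # Stamp the provider name only if the plugin did not set one (assumed policy).
    result.setdefault("provider", key)
    return result
```

Callers of transcribe_audio() then see the same envelope keys whether the backend is built in or a plugin.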

Test plan:

bash scripts/run_tests.sh \
  tests/tools/test_transcription_tools.py \
  tests/tools/test_transcription_dotenv_fallback.py \
  tests/tools/test_transcription.py \
  tests/tools/test_transcription_plugin_dispatch.py \
  tests/agent/test_transcription_registry.py \
  tests/hermes_cli/test_plugins_transcription_registration.py

# 180 passed, 0 failed (131 pre-existing untouched + 49 new)

Plus the parity harness:

$ python tests/plugins/transcription/check_parity_vs_main.py
  [OK]   stt-disabled: stt_disabled
  [OK]   explicit-groq: builtin_groq
  [OK]   explicit-openai: builtin_openai
  [OK]   explicit-local: builtin_local
  [OK]   explicit-xai: builtin_xai
  [OK]   explicit-mistral-quarantine: builtin_mistral
  [OK]   unknown-no-plugin: no_provider_error
  [DIFF] plugin-installed: no_provider_error → plugin — expected
  [OK]   explicit-openai-with-plugin-registered: builtin_openai
PARITY OK across 9 scenarios.

The single [DIFF] is the intentional behavior change: when a plugin is registered, stt.provider: <plugin-name> routes to the plugin instead of falling through to the "No STT provider available" error.

Broader sweep: tests/tools/ 5395/5395 pass. Zero regressions across 222 test files.

E2E smoke

A registered plugin's transcribe() is reached via the real transcribe_audio() with the standard envelope:

Result: {'success': True, 'transcript': 'PLUGIN_TRANSCRIPT_OK', 'provider': 'testplug'}
PASS

Out of scope

  • No picker integration (TOOL_CATEGORIES["transcription"]) — STT isn't surfaced in hermes tools today. Adding it would require deciding STT defaults, plumbing into hermes setup, etc. Deferred to a follow-up that lands alongside the first community plugin.
  • No changes to the 6 built-in STT backends — they stay inline.
  • No changes to gateway voice-message auto-transcription: gateway/run.py and gateway/platforms/discord.py call transcribe_audio(), which retains its signature.
  • No deprecation of HERMES_LOCAL_STT_COMMAND — preserved via the built-in local_command path.

How this unblocks the 4 in-flight PRs

After this lands, each becomes a focused ~80 LoC plugin instead of a 300-line patch to a 963-line dispatcher:

| PR | Becomes |
|---|---|
| #25721 (RemyFevry) | Salvage as plugins/transcription/openrouter/__init__.py, author credited |
| #24703 (xxxigm, duplicate) | Closed with credit, pointing to #25721's salvage |
| #9380 (SenseAudio) | Salvage as plugins/transcription/sensaudio/ |
| #21540 (Gemini STT half) | Salvage as plugins/transcription/gemini/. TTS + WhatsApp PTT split into separate issues |

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update
  • Code refactor / cleanup
  • Performance improvement
  • Test coverage improvement

Related

kshitijk4poor added the type/feature, comp/plugins, tool/tts, and P2 labels on May 22, 2026
alt-glitch added the P3 label and removed the P2 label on May 22, 2026

@keegoid-codex keegoid-codex left a comment

Bugs

  • [CAT-1] language argument never reaches plugins from public API at tools/transcription_tools.py:1025.
  • [CAT-1] is_available() ignored, routing explicit configs to unavailable plugins at tools/transcription_tools.py:385.

Severity

  • medium. Runtime plugin contract drops parameters and bypasses availability guard.

VERDICT: request_changes


codex-review posting override: forced to --comment because reviewer lacks verified write permission (viewerPermission=READ; was --request-changes). GitHub only counts approvals from WRITE, MAINTAIN, or ADMIN reviewers.

@keegoid-cc

[DEV SecOps] verdict: PASS
verified at 303728e579461ff7e2475875075307b84c26cc75

Categories:

  • prompt-injection: no findings. Scanned changed docs, code comments/string literals, test fixtures, and commit title/body against DEV trigger fixture plus hidden Unicode controls.
  • dependency / lockfile: no findings. No package manifests or lockfiles changed; no new third-party dependency introduced.
  • CI / workflow injection: no findings. No workflow/CI files changed; no new pull_request_target, secrets exposure, direct github.event/input interpolation, or runner changes.
  • generated/copied code provenance: no blocker. New code is small in-repo Python API/registry/dispatch/tests/docs; no vendored/minified/copied blob observed. The parity helper is test-only and not production dispatch code.
  • secret exposure: no findings. Diff scan found no tokens, private keys, API-key assignments, or credential material.
  • ops/security regression: no findings. Built-in STT provider names are reserved in both registry and dispatcher, plugin exceptions/non-dict results are converted to error envelopes, and plugin dispatch only uses already enabled/discovered local plugins for non-built-in provider names.

Read-only checks used:

  • pinned PR metadata with gh-as cc and verified head SHA before inspection
  • read PR diff, changed-file list, commit metadata/message body
  • static diff scans for prompt-injection triggers, hidden Unicode steering controls, secrets, risky network/shell/eval patterns
  • git diff --check and file-category inventory against base 1e71b71

Residual risk:

  • Plugin providers can execute arbitrary local plugin code once installed/enabled; that is existing Hermes plugin trust model, not introduced as remote code execution by this PR.

@keegoid-cc

[DEV SecOps] verdict: PASS
verified at 303728e579461ff7e2475875075307b84c26cc75

  • Prompt-injection: no findings. Scanned PR body, commit message, changed docs/tests/comments/string literals against DEV trigger fixture plus hidden Unicode controls; no instruction-bearing payloads found.
  • Dependencies/lockfiles: no findings. No package manifests or lockfiles changed; no new dependency registry verification required.
  • CI/workflow injection: no findings. No workflow/CI files changed; no new pull_request_target, secrets.*, self-hosted runner labels, or ${{ inputs.* }}/${{ github.event.* }} in run: blocks.
  • Generated/copied code provenance: no findings. New provider/registry/tests/docs are first-party plugin-hook code patterned after existing Hermes provider/registry architecture; no vendored blobs, minified payloads, or unexplained copied code detected.
  • Secret exposure: no findings. Added examples mention environment variable names (*_API_KEY) only; no literal credentials or private key material found.
  • Ops/security regression: no blocking finding. Runtime addition is an opt-in STT plugin dispatch path for non-built-in provider names; built-in names are guarded at registration and dispatch, plugin exceptions/non-dict returns are contained in error envelopes. Residual risk: future concrete STT plugin implementations remain arbitrary enabled plugin code and need their own review.

Read-only checks used: gh-as cc pr view, gh-as cc pr diff, pinned-object git fetch/git diff --check, static prompt-injection/hidden-Unicode/secret/risky-pattern scan. Did not execute PR code or tests.

Add an opt-in Python plugin surface for speech-to-text backends,
mirroring the TTS hook pattern. New backends (OpenRouter, SenseAudio,
Gemini-STT, custom proprietary engines) can be implemented as plugins
without modifying tools/transcription_tools.py.

Built-ins always win
--------------------
The 6 built-in STT providers (local/faster-whisper, local_command,
groq, openai, mistral, xai) keep their native handlers. Plugins
attempting to register under a built-in name are rejected at
registration time with a warning and re-checked defensively at
dispatch.

Resolution order
----------------
1. stt.provider matches a built-in → built-in dispatch (unchanged)
2. stt.provider matches a registered plugin →
   a. if plugin.is_available() returns False → unavailability envelope
      identifying the plugin (not the generic "No STT provider"
      message — the user explicitly opted into this plugin)
   b. otherwise plugin.transcribe() with model + language forwarded
      from stt.<provider>.{model,language} config
3. No match → legacy "No STT provider available" error (unchanged)

Per-provider config namespace
-----------------------------
Plugins read their config from stt.<provider> in config.yaml, mirroring
how built-ins read stt.openai.model / stt.mistral.model. The dispatcher
forwards `model` and `language` from this section. Caller's explicit
`model=` argument overrides the config-set model.

Files
-----
- agent/transcription_provider.py: TranscriptionProvider ABC
- agent/transcription_registry.py: register/get/list providers,
  built-in shadow guard, _reset_for_tests
- hermes_cli/plugins.py: register_transcription_provider() on
  PluginContext
- tools/transcription_tools.py: BUILTIN_STT_PROVIDERS frozenset,
  _dispatch_to_plugin_provider() with availability gate, wire-in
  after xai branch and before "No STT provider" error
- tests/agent/test_transcription_registry.py: 27 tests
- tests/hermes_cli/test_plugins_transcription_registration.py: 3 tests
- tests/tools/test_transcription_plugin_dispatch.py: 28 tests
  (covering built-in short-circuit, plugin dispatch, exception
  envelope, non-dict guard, availability gate, language forwarding)
- tests/plugins/transcription/check_parity_vs_main.py: 10-scenario
  subprocess-pinned parity harness vs origin/main
- website/docs/user-guide/features/{tts,plugins}.md: docs

Behavior parity
---------------
10 scenarios, 8 OK + 2 expected DIFFs:
  no_provider_error → plugin (plugin-installed scenario)
  no_provider_error → plugin_unavailable (plugin-installed-unavailable
  scenario; PR returns cleaner envelope)
Zero behavior change for users not opting into a plugin.

Issue follow-up to NousResearch#30398.
kshitijk4poor force-pushed the feat/transcription-plugin-migration branch from 303728e to a658b7e on May 23, 2026 09:28
@kshitijk4poor

Collaborator Author

Thanks for the review @keegoid-codex — both findings were valid. Addressed in force-pushed commit a658b7e5d (rebased onto current main first; clean rebase, zero per-file overlap with the 35 commits that landed since branch point).

CAT-1 fix #1: language argument never reaches plugins

Added a per-provider config namespace stt.<provider> that mirrors how built-ins read stt.openai.model / stt.mistral.model. The dispatcher now forwards both model and language from this section. Caller's explicit model= argument still wins.

stt:
  provider: my-stt
  my-stt:
    model: whisper-large-v3
    language: ja          # forwarded as language= to transcribe()

Implementation in tools/transcription_tools.py (the public entry point):

plugin_cfg = stt_config.get(provider, {}) if isinstance(stt_config.get(provider), dict) else {}
plugin_language = plugin_cfg.get("language")
plugin_model = model or plugin_cfg.get("model")
plugin_result = _dispatch_to_plugin_provider(
    file_path, provider,
    model=plugin_model,
    language=plugin_language,
)
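
Pulled out as a standalone function, the config lookup above can be exercised without the rest of the dispatcher. `resolve_plugin_kwargs` is a hypothetical name used only for this sketch; the logic follows the snippet above, including the guard for a non-dict stt.<provider> value that test_non_dict_provider_namespace_does_not_crash refers to.

```python
from typing import Any, Dict, Optional, Tuple


def resolve_plugin_kwargs(
    stt_config: Dict[str, Any], provider: str, model: Optional[str] = None
) -> Tuple[Optional[str], Optional[str]]:
    """Return (model, language) for a plugin from the stt.<provider> section."""
    section = stt_config.get(provider)
    # A missing or malformed section (e.g. a bare string) is treated as empty.
    plugin_cfg = section if isinstance(section, dict) else {}
    # An explicit caller-supplied model wins over the configured one.
    return model or plugin_cfg.get("model"), plugin_cfg.get("language")


cfg = {"provider": "my-stt", "my-stt": {"model": "whisper-large-v3", "language": "ja"}}
print(resolve_plugin_kwargs(cfg, "my-stt"))              # ('whisper-large-v3', 'ja')
print(resolve_plugin_kwargs(cfg, "my-stt", model="x"))   # ('x', 'ja')
print(resolve_plugin_kwargs({"my-stt": "oops"}, "my-stt"))  # (None, None)
```

One consequence of `model or ...` is that an empty-string `model=""` from the caller falls back to the configured model rather than overriding it.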

CAT-1 fix #2: is_available() ignored

Added an availability gate in _dispatch_to_plugin_provider between provider lookup and the transcribe() call. When is_available() returns False (or raises — defensively treated as False so a buggy plugin can't break dispatch for everyone), the dispatcher returns a clean unavailability envelope identifying the plugin instead of falling through to the generic "No STT provider available" message. The user explicitly opted into this plugin via stt.provider — surfacing the plugin's own unavailability is more actionable.

try:
    available = plugin_provider.is_available()
except Exception as exc:
    logger.warning("STT plugin provider '%s' is_available() raised: %s — treating as unavailable", key, exc, exc_info=True)
    available = False
if not available:
    return {
        "success": False,
        "transcript": "",
        "error": f"STT plugin '{key}' is not available — check that its required credentials / dependencies are configured.",
        "provider": key,
    }

Test coverage added

9 new tests in tests/tools/test_transcription_plugin_dispatch.py:

  • TestAvailabilityGate (4 tests):
    • test_unavailable_plugin_returns_envelope_not_none
    • test_available_plugin_dispatches_normally
    • test_is_available_raising_treated_as_unavailable
    • test_unavailable_plugin_at_transcribe_audio_level (end-to-end via the public transcribe_audio())
  • TestLanguageForwardingFromConfig (5 tests):
    • test_language_read_from_provider_namespaced_config
    • test_model_from_provider_namespaced_config
    • test_caller_model_overrides_config_model
    • test_missing_provider_namespace_passes_none
    • test_non_dict_provider_namespace_does_not_crash

Plus a new parity scenario plugin-installed-unavailable in the subprocess-pinned harness: on origin/main it returns no_provider_error (no plugin hook); on this PR it returns the cleaner plugin_unavailable envelope.

Verification

  • tests/agent/test_transcription_registry.py tests/hermes_cli/test_plugins_transcription_registration.py tests/tools/test_transcription_plugin_dispatch.py → 58 / 58 pass (was 49; +9 new for the codex findings)
  • tests/tools/test_transcription_tools.py → 95 / 95 pass (no regressions in the existing built-in tests)
  • scripts/run_tests.sh against the affected files → 153 / 153 pass in CI-parity mode
  • Parity harness → 10 scenarios, 8 OK + 2 expected DIFFs (no_provider_error → plugin, no_provider_error → plugin_unavailable)
  • Ruff on all touched files → clean
  • E2E with real transcribe_audio() calls confirms unavailability envelope, language forwarding, caller-model-override, and is_available() exception path
  • Git identity verified: commits authored by kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>

Rebase note

Rebased onto origin/main (35 commits ahead; zero overlap with files in this PR — none of the recent commits touched tools/transcription_tools.py, agent/transcription_registry.py, agent/transcription_provider.py, hermes_cli/plugins.py, or the new test files). Clean rebase, no conflicts.

The previously-failing CI test (tests/hermes_cli/test_web_server.py::TestPtyWebSocket::test_pub_broadcasts_to_events_subscribers) was a pre-existing PTY-WebSocket flake — passes locally on origin/main in isolation. Hoping the new CI run is greener.

Updated docs in website/docs/user-guide/features/tts.md to document the resolution order, the per-provider config namespace, and the unavailability behavior.

@teknium1

Contributor

Merged via PR #31907

Your commit was cherry-picked onto current main with your authorship preserved in git log (commit 2cd952e11). Thanks for the design and the careful test coverage — the parity harness was particularly useful as a template.

On top of your plugin hook, the salvage PR also adds the symmetric stt.providers.<name>: type: command registry mirroring the TTS registry from #17843, so the STT and TTS extension surfaces are now structurally identical:

  1. Built-in dispatch (always wins)
  2. <tool>.providers.<name>: type: command runner
  3. Plugin-registered provider via register_<tool>_provider()
  4. Error
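
As a rough illustration of that four-step order, the sketch below classifies a provider name into the tier that would handle it. It is not the merged implementation: `resolve_tier` is a hypothetical helper, and the config shape (a `providers` mapping under `stt` whose entries carry `type: command`) is inferred from the `stt.providers.<name>: type: command` wording above.

```python
from typing import Any, Dict, Iterable

BUILTIN_STT_PROVIDERS = frozenset({"local", "local_command", "groq", "openai", "mistral", "xai"})


def resolve_tier(provider: str, stt_config: Dict[str, Any], plugin_names: Iterable[str]) -> str:
    """Return which tier would handle `provider`: builtin, command, plugin, or error."""
    key = (provider or "").strip().lower()
    # 1. Built-in dispatch always wins.
    if key in BUILTIN_STT_PROVIDERS:
        return "builtin"
    # 2. Config-only command runner declared under stt.providers.<name>.
    providers = stt_config.get("providers")
    entry = providers.get(key) if isinstance(providers, dict) else None
    if isinstance(entry, dict) and entry.get("type") == "command":
        return "command"
    # 3. Provider registered from Python via the plugin hook.
    if key in {n.lower() for n in plugin_names}:
        return "plugin"
    # 4. Nothing matched.
    return "error"
```

Under this ordering a config-only command entry shadows a plugin of the same name, which matches the list above.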

This unblocks the 4 in-flight community STT PRs (#25721, #24703, #9380, #21540) — 3 of which (plain HTTPS) can now ship as config-only entries, with the Python hook reserved for the one PR (#21540, Gemini SDK) that genuinely needs it.

teknium1 closed this on May 25, 2026
Labels

comp/plugins Plugin system and bundled plugins P3 Low — cosmetic, nice to have tool/tts Text-to-speech and transcription type/feature New feature or request

5 participants