
Feat/openrouter stt provider #24703

Open
xxxigm wants to merge 4 commits into NousResearch:main from xxxigm:feat/openrouter-stt-provider

xxxigm (Contributor) commented May 13, 2026

What does this PR do?

Implements #24415 — adds OpenRouter as a first-class STT provider alongside the existing five (local, groq, openai, mistral, xai).

Hermes today supports five STT providers but has no way to reuse OPENROUTER_API_KEY for transcription — users with OpenRouter set up for LLM access have to spin up a separate Groq / OpenAI / xAI account just to transcribe a voice message. OpenRouter exposes an OpenAI-compatible /api/v1/audio/transcriptions endpoint that routes to Whisper-family models (openai/whisper-1, openai/gpt-4o-mini-transcribe, openai/gpt-4o-transcribe) billed through the same credit pool as LLM access. Adding openrouter as a native provider closes that gap with zero new dependencies — the existing openai SDK with a custom base_url handles the wire format.

The implementation lives in tools/transcription_tools.py:

  1. Provider detection — explicit stt.provider: openrouter branch + auto-detect chain entry. Auto-detect order is local > groq > openai > xai > openrouter. OpenRouter sits last in the cloud chain because it's a catch-all — if the user also has a dedicated STT key (Groq / OpenAI / xAI), the dedicated provider wins so they keep that provider's pricing and routing.
  2. Client routing — _transcribe_openrouter() mirrors the _transcribe_groq() shape: OpenAI(api_key=..., base_url="https://openrouter.ai/api/v1", timeout=30, max_retries=0) then audio.transcriptions.create(model=..., file=..., response_format="text"). Same error envelope, same _extract_transcript_text post-processing.
  3. Model selection — default openai/whisper-1, overridable via stt.openrouter.model or the STT_OPENROUTER_MODEL env var. The known-models set is informational only — Hermes does not block other model IDs (OpenRouter routinely adds new ones).
  4. Auth — reuses OPENROUTER_API_KEY from .env (no new env var). stt.openrouter.api_key in config wins over the env var so managed deployments can pin per-deployment credentials.
  5. No new dependency — the existing openai SDK does the work; the issue's "no new SDK required" promise is honoured.
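The provider-resolution rules in item 1 can be sketched as a small pure function. This is only an illustration under stated assumptions, not the code in tools/transcription_tools.py: `resolve_provider`, `CLOUD_CHAIN`, and the `has_local` flag are hypothetical names, `GROQ_API_KEY` is an assumed env var name, and the mistral provider and the openai-SDK availability check are left out for brevity.

```python
from typing import Mapping, Optional

# Cloud providers in auto-detect order. openrouter is the catch-all, so it is last.
# GROQ_API_KEY is an assumed name; the other env vars appear in the PR text.
CLOUD_CHAIN = (
    ("groq", ("GROQ_API_KEY",)),
    ("openai", ("VOICE_TOOLS_OPENAI_KEY", "OPENAI_API_KEY")),
    ("xai", ("XAI_API_KEY",)),
    ("openrouter", ("OPENROUTER_API_KEY",)),
)


def resolve_provider(
    configured: Optional[str],
    env: Mapping[str, str],
    has_local: bool = False,
) -> str:
    """Return the provider name to use, or "none" if nothing is usable."""
    key_names = dict(CLOUD_CHAIN)

    if configured:
        # Explicit stt.provider: honour it or fail, never fall back silently.
        if configured == "local":
            return "local" if has_local else "none"
        names = key_names.get(configured, ())
        return configured if any(env.get(n) for n in names) else "none"

    # Auto-detect: local > groq > openai > xai > openrouter.
    if has_local:
        return "local"
    for name, names in CLOUD_CHAIN:
        if any(env.get(n) for n in names):
            return name
    return "none"
```

With only `OPENROUTER_API_KEY` present the sketch lands on `openrouter`; adding any dedicated key moves the result to that dedicated provider, which is the precedence the PR describes.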

Related Issue

Closes #24415 ([Feature]: Add OpenRouter as an STT provider).

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • tools/transcription_tools.py (+109 / −8):
    • New constants DEFAULT_OPENROUTER_STT_MODEL = "openai/whisper-1", OPENROUTER_STT_BASE_URL = "https://openrouter.ai/api/v1", OPENROUTER_MODELS = {…} (informational only).
    • _get_provider() — explicit openrouter branch (requires OPENROUTER_API_KEY + the openai SDK) + auto-detect chain entry after xai.
    • _transcribe_openrouter() — new function modeled on _transcribe_groq(), with config/env key resolution, base-url override, full error-envelope coverage (PermissionError, APIConnectionError, APITimeoutError, APIError, Exception).
    • transcribe_audio() — new dispatch arm + per-provider model resolution (stt.openrouter.model wins, default openai/whisper-1).
    • Module docstring updated to "seven providers"; the "no provider available" hint mentions OPENROUTER_API_KEY so users discover it without reading source.
  • cli-config.yaml.example (+4 / −2) — new commented stt.openrouter block with model / api_key / base_url examples.
  • tests/tools/test_transcription_openrouter.py (new, +378 lines) — 18 unit tests across three classes:
    • TestGetProviderOpenRouter (7) — explicit-config and auto-detect resolution, including the precedence rule from the issue (groq / openai / xai win over openrouter in auto-detect).
    • TestTranscribeOpenRouter (7) — env-key resolution, config overrides, OpenAI-SDK call shape (response_format="text" pinned), error-path wrappers.
    • TestTranscribeAudioDispatchOpenRouter (4) — end-to-end transcribe_audio() dispatch + the model-default chain.
  • tests/tools/test_transcription_openrouter_feature_anchor.py (new, +208 lines) — 2 feature-shape regression anchors that drive the issue's full "Proposed Solution" contract end-to-end:
    • test_full_user_round_trip_honours_issue_contract — asserts bullets 1–4 (provider routes, base URL, env key, default model) in a single round trip with per-bullet #24415 anchor messages.
    • test_auto_detect_lights_up_with_only_openrouter_key — asserts bullet 5 (auto-detect with only OPENROUTER_API_KEY set must land on openrouter).
  • website/docs/user-guide/features/voice-mode.md (+15 / −3) — added OPENROUTER_API_KEY to the API-Keys section with a tip explaining the "reuse your LLM key" value proposition + the auto-detect rationale; troubleshooting bullet updated; Configuration Reference section gets STT_OPENROUTER_MODEL and STT_OPENROUTER_BASE_URL overrides.

No files other than those listed above are touched; existing provider code paths (_transcribe_groq, _transcribe_openai, _transcribe_xai, _transcribe_mistral, _transcribe_local, _transcribe_local_command) are byte-identical.
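As a rough illustration of the settings resolution and the SDK call shape described above, here is a minimal sketch. It is not the PR's implementation: `resolve_settings` and `transcribe` are hypothetical names, the returned dict shape is an assumption, the precedence of config over env for the base URL and model is assumed, and the error handling is collapsed to two cases so the sketch does not need to import the openai package.

```python
import os
from typing import Any, Mapping, Optional

DEFAULT_MODEL = "openai/whisper-1"
DEFAULT_BASE_URL = "https://openrouter.ai/api/v1"


def resolve_settings(
    stt_cfg: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
    model: Optional[str] = None,
) -> dict:
    """Pick api_key, base_url and model; config values win over env vars."""
    env = os.environ if env is None else env
    cfg = stt_cfg.get("openrouter") or {}
    api_key = cfg.get("api_key") or env.get("OPENROUTER_API_KEY", "")
    base_url = cfg.get("base_url") or env.get("STT_OPENROUTER_BASE_URL") or DEFAULT_BASE_URL
    # An explicit model argument wins, then config, then env, then the default.
    chosen = model or cfg.get("model") or env.get("STT_OPENROUTER_MODEL") or DEFAULT_MODEL
    # Strip the trailing slash so the SDK's URL builder does not double it.
    return {"api_key": api_key, "base_url": base_url.rstrip("/"), "model": chosen}


def transcribe(client: Any, file_path: str, model: str) -> dict:
    """Call an OpenAI-shaped client; `client` is injected so tests can fake it."""
    try:
        with open(file_path, "rb") as audio:
            text = client.audio.transcriptions.create(
                model=model,
                file=audio,
                response_format="text",  # plain text, not JSON
            )
        return {"success": True, "transcript": str(text).strip(), "provider": "openrouter"}
    except PermissionError:
        return {"success": False, "error": f"Permission denied: {file_path}"}
    except Exception as exc:  # catch-all arm of the error envelope
        return {"success": False, "error": f"Transcription failed: {exc}"}
```

In real use the client would presumably be built as `OpenAI(api_key=s["api_key"], base_url=s["base_url"], timeout=30, max_retries=0)` from the resolved settings `s`. Injecting it keeps the sketch runnable without the SDK installed.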

How to Test

  1. Check out this branch and ensure .venv is set up: python3 -m venv .venv && source .venv/bin/activate && pip install -e ".[all,dev]"
  2. Run the new + adjacent test suites:
    scripts/run_tests.sh \
      tests/tools/test_transcription_openrouter.py \
      tests/tools/test_transcription_openrouter_feature_anchor.py \
      tests/tools/test_transcription_tools.py \
      tests/tools/test_transcription.py \
      tests/tools/test_transcription_dotenv_fallback.py
    
    Expected: 147 passed (20 new + 127 pre-existing).
  3. Verify the feature-shape regression anchor really catches the missing feature:
    git checkout upstream/main -- tools/transcription_tools.py
    pytest tests/tools/test_transcription_openrouter_feature_anchor.py -v
    
    Expected: 2 failed with the anchor messages
    #24415 anchor: stt.provider=openrouter with OPENROUTER_API_KEY set must succeed -- got error: 'No STT provider available...'
    and
    #24415 anchor: with only OPENROUTER_API_KEY set and no other STT backend available, auto-detect MUST land on the openrouter provider....
    Restore the fix (git checkout HEAD -- tools/transcription_tools.py) → both pass.
  4. (Optional, manual) Drop your real OPENROUTER_API_KEY into ~/.hermes/.env, set stt.provider: openrouter in ~/.hermes/config.yaml, then send a voice message via the gateway:
    • Discord/Telegram: voice-note → Hermes transcribes via https://openrouter.ai/api/v1/audio/transcriptions and replies. OpenRouter dashboard should show a hit on openai/whisper-1.
    • CLI voice mode: start hermes, run /voice on, then use push-to-talk; the transcript flows through the same code path.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (feat(stt): ..., test(stt): ..., docs(stt): ...)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this feature (4 commits, all scoped to OpenRouter STT + its tests + its docs)
  • I've run scripts/run_tests.sh on the affected test files and all 147 tests pass
  • I've added tests for my changes (18 unit tests + 2 feature-shape regression anchors)
  • I've tested on my platform: macOS 15.2 (Darwin 24.6.0), Python 3.12

Documentation & Housekeeping

  • I've updated relevant documentation — website/docs/user-guide/features/voice-mode.md got an OpenRouter tip, troubleshooting hint, and env-var reference rows
  • I've updated cli-config.yaml.example if I added/changed config keys — added a commented stt.openrouter block
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — feature is platform-agnostic; relies only on the openai SDK + OPENROUTER_API_KEY env var, both of which work identically on Windows / macOS / Linux. Tests are hermetic (mocked openai module via patch.dict("sys.modules", ...), no real network or filesystem outside tmp_path).
  • I've updated tool descriptions/schemas if I changed tool behavior — N/A (internal STT pipeline; the transcribe_audio public surface keeps the exact same signature and return shape)
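The hermetic-test approach mentioned in the checklist (a mocked openai module via patch.dict("sys.modules", ...)) might look roughly like the sketch below. The helper name `call_with_fake_openai` is hypothetical and the body is a simplified stand-in for the PR's actual tests. It also clears the environment for the duration of the call, in the spirit of the autouse fixture the commit message describes.

```python
import os
from unittest.mock import MagicMock, patch


def call_with_fake_openai(api_key: str) -> MagicMock:
    """Run a lazy `from openai import OpenAI` against a stand-in module."""
    fake_openai = MagicMock(name="openai")
    fake_openai.OpenAI.return_value.audio.transcriptions.create.return_value = "hello"

    # Replace the openai module and wipe every env var except the one under test.
    with patch.dict("sys.modules", {"openai": fake_openai}), patch.dict(
        "os.environ", {"OPENROUTER_API_KEY": api_key}, clear=True
    ):
        # Resolves to the fake, whether or not openai is installed.
        from openai import OpenAI

        client = OpenAI(
            api_key=os.environ["OPENROUTER_API_KEY"],
            base_url="https://openrouter.ai/api/v1",
        )
        client.audio.transcriptions.create(
            model="openai/whisper-1", file=b"", response_format="text"
        )
    return fake_openai
```

Both patches are undone when the `with` block exits, so the real environment and any real openai installation are untouched afterwards. The returned mock can then be inspected for the call shape, for example that `response_format="text"` was passed.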

Screenshots / Logs

$ scripts/run_tests.sh tests/tools/test_transcription_openrouter.py tests/tools/test_transcription_openrouter_feature_anchor.py tests/tools/test_transcription_tools.py tests/tools/test_transcription.py tests/tools/test_transcription_dotenv_fallback.py
4 workers [147 items]
============================== 147 passed in 1.17s =============================

$ ruff check tools/transcription_tools.py tests/tools/test_transcription_openrouter.py tests/tools/test_transcription_openrouter_feature_anchor.py
All checks passed!

Feature-shape repro on upstream/main (anchor proves the feature is genuinely missing pre-PR):

$ git checkout upstream/main -- tools/transcription_tools.py
$ pytest tests/tools/test_transcription_openrouter_feature_anchor.py -v
FAILED tests/tools/test_transcription_openrouter_feature_anchor.py::TestOpenRouterSttFeatureContract::test_full_user_round_trip_honours_issue_contract
  AssertionError: #24415 anchor: stt.provider=openrouter with OPENROUTER_API_KEY set
    must succeed -- got error: 'No STT provider available. Install faster-whisper for
    free local transcription, ... or set VOICE_TOOLS_OPENAI_KEY or OPENAI_API_KEY for
    the OpenAI Whisper API.'.  This means _get_provider does not recognise
    'openrouter' or the dispatch in transcribe_audio is missing -- the feature is not
    wired.
FAILED tests/tools/test_transcription_openrouter_feature_anchor.py::TestOpenRouterSttFeatureContract::test_auto_detect_lights_up_with_only_openrouter_key
  AssertionError: #24415 anchor: with only OPENROUTER_API_KEY set and no other STT
    backend available, auto-detect MUST land on the openrouter provider...

xxxigm added 4 commits May 13, 2026 08:24
Adds openrouter alongside the existing five STT providers (local /
groq / openai / mistral / xai).  Lets users with OPENROUTER_API_KEY
already configured for LLM access reuse the same key & billing for
voice transcription -- no separate Groq / OpenAI / xAI account
needed.

OpenRouter exposes POST /api/v1/audio/transcriptions with the same
wire format as OpenAI, so the OpenAI SDK with a custom base_url +
api_key is the cleanest route -- no new dependency.  The new
_transcribe_openrouter() function mirrors the _transcribe_groq()
shape (same SDK call, same error envelope).

Wiring:
- Constants: DEFAULT_OPENROUTER_STT_MODEL ("openai/whisper-1"),
  OPENROUTER_STT_BASE_URL (https://openrouter.ai/api/v1), and an
  informational OPENROUTER_MODELS set used for documentation only
  (we do NOT block other model IDs -- OpenRouter routinely adds
  new ones).
- _get_provider() explicit branch for "openrouter" (requires
  OPENROUTER_API_KEY + the openai SDK).
- Auto-detect chain extended: local > groq > openai > xai >
  openrouter.  openrouter is last in the cloud chain because it's
  a catch-all -- if the user ALSO has a dedicated STT key we want
  to honour their explicit preference for that provider's pricing
  and routing.
- transcribe_audio() dispatch + per-provider model resolution
  (stt.openrouter.model wins, default openai/whisper-1).
- Updated module docstring + the no-provider error hint.

Config:
- stt.openrouter section added to cli-config.yaml.example with
  api_key / base_url overrides documented (both optional --
  defaults to OPENROUTER_API_KEY env + the standard base URL).

No new Python dependency: the existing openai package handles
OpenRouter natively via base_url override.

18 unit tests across three classes that mock the OpenAI SDK and
exercise the three observable surfaces of the new provider:

TestGetProviderOpenRouter (7 tests):
- explicit "openrouter" + key  -> "openrouter"
- explicit "openrouter" no key -> "none"  (no silent fallback,
  mirrors the NousResearchGH-1774 contract for groq / openai)
- explicit "openrouter" without the openai SDK installed -> "none"
- auto-detect picks openrouter when only OPENROUTER_API_KEY is set
- auto-detect prefers groq / openai / xai over openrouter (the
  precedence rule from the issue: openrouter is a catch-all, so
  any dedicated STT key wins)

TestTranscribeOpenRouter (7 tests):
- no key returns the documented error message
- openai SDK missing returns the documented error message
- happy path: env key + default base URL reach OpenAI(); model and
  response_format="text" pinned (whisper-1 must NOT be sent JSON)
- stt.openrouter.api_key in config wins over the env var
- stt.openrouter.base_url override is respected (trailing slash
  stripped to match the OpenAI SDK's URL builder)
- APIError surface routes to "API error: ..." not the catch-all
- PermissionError surfaces "Permission denied" cleanly

TestTranscribeAudioDispatchOpenRouter (4 tests):
- transcribe_audio routes to _transcribe_openrouter with the
  default model when stt.openrouter.model is unset
- stt.openrouter.model override flows through
- explicit model= argument wins over config
- the no-provider error hint mentions OPENROUTER_API_KEY +
  OpenRouter so users discover it without reading source

Mocks the openai package via patch.dict("sys.modules", ...) so the
test runs on every CI box regardless of whether openai is
installed.  An autouse fixture clears every API-key env var before
each test to keep cases hermetic.

…search#24415)

Two end-to-end anchors that drive the exact contract from the
issue's "Proposed Solution" section so a future refactor that
drops any link in the chain fails with an explicit "NousResearch#24415 anchor"
message.

The five contract bullets from the issue:
  1. stt.provider: openrouter is honoured.
  2. OpenAI SDK called with base_url=https://openrouter.ai/api/v1.
  3. Authorisation reuses OPENROUTER_API_KEY from env (the same
     env var the LLM stack reads).
  4. Default model is openai/whisper-1.
  5. Provider lives in the auto-detect chain so a user with ONLY
     OPENROUTER_API_KEY set still gets STT.

Test 1 (test_full_user_round_trip_honours_issue_contract) asserts
bullets 1-4 in a single end-to-end call.  Each assertion carries
its own NousResearch#24415 message identifying which contract bullet broke.

Test 2 (test_auto_detect_lights_up_with_only_openrouter_key)
asserts bullet 5: with only OPENROUTER_API_KEY set, no
stt.provider, no other STT key, transcribe_audio() must still
land on the openrouter provider.

Verified on upstream/main: both fail with
  AssertionError: NousResearch#24415 anchor: stt.provider=openrouter with
    OPENROUTER_API_KEY set must succeed -- got error: 'No STT
    provider available...'.  This means _get_provider does not
    recognise 'openrouter' or the dispatch in transcribe_audio
    is missing -- the feature is not wired.
and
  AssertionError: NousResearch#24415 anchor: with only OPENROUTER_API_KEY
    set and no other STT backend available, auto-detect MUST
    land on the openrouter provider...
-- exactly the "feature not implemented" signal we want.  After
the feature lands both pass.

Updates voice-mode.md so users discover the OpenRouter STT path
without spelunking the source:

- API Keys section: add OPENROUTER_API_KEY alongside the existing
  STT keys (also lists XAI_API_KEY which the section was missing).
  Adds an OpenRouter-specific tip explaining the "reuse your LLM
  key" value proposition + the auto-detect order rationale.
- Troubleshooting bullet: extend the "verify STT is available"
  hint to mention OPENROUTER_API_KEY as a viable fallback.
- Configuration Reference > Environment Variables: list
  OPENROUTER_API_KEY, STT_OPENROUTER_MODEL, and
  STT_OPENROUTER_BASE_URL alongside the existing STT_*
  overrides.

Each change tagged with (NousResearch#24415) so future grep-archeology lands
on the issue + this PR.

Labels

  • comp/tools: Tool registry, model_tools, toolsets
  • P3: Low, cosmetic, nice to have
  • provider/openrouter: OpenRouter aggregator
  • tool/tts: Text-to-speech and transcription
  • type/feature: New feature or request

Development

Successfully merging this pull request may close these issues.

[Feature]: Add OpenRouter as an STT provider
