
feat(stt): add OpenRouter speech-to-text provider #25721

Open
RemyFevry wants to merge 2 commits into NousResearch:main from RemyFevry:feat/openrouter-stt-provider


RemyFevry commented May 14, 2026


Summary

Adds openrouter as a speech-to-text (STT) provider via the OpenRouter /api/v1/audio/transcriptions endpoint.

Benchmark (68KB OGG voice message)

| Model | Time | Cost | Quality |
|---|---|---|---|
| whisper-large-v3-turbo | 0.49s | $0.0004 | Good |
| gpt-4o-mini-transcribe | 1.81s | ~$0 | Best |
| google/chirp-3 | 1.81s | $0.0088 | Poor |
| whisper-1 | 2.45s | ~$0 | Poor |
| local (base) | 3.64s | $0 (CPU) | Slowest |

whisper-large-v3-turbo is roughly 7x faster than local (base) (0.49s vs. 3.64s), with better accuracy.
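The speedup figure follows directly from the Time column. A quick check in Python, using only the timings from the benchmark table:

```python
# Wall-clock times (seconds) for the 68KB OGG voice message, copied from the benchmark table.
timings = {
    "whisper-large-v3-turbo": 0.49,
    "gpt-4o-mini-transcribe": 1.81,
    "google/chirp-3": 1.81,
    "whisper-1": 2.45,
    "local (base)": 3.64,
}


def speedup_vs_baseline(timings: dict[str, float], baseline: str = "local (base)") -> dict[str, float]:
    """Return how many times faster each model is than the baseline, rounded to 2 decimals."""
    base = timings[baseline]
    return {model: round(base / seconds, 2) for model, seconds in timings.items()}


speedups = speedup_vs_baseline(timings)
for model, factor in speedups.items():
    print(f"{model}: {factor}x")
```

That works out to about 7.4x for whisper-large-v3-turbo, in line with the rounded "7x".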

Changes

  • tools/transcription_tools.py — _transcribe_openrouter(), constants, dispatch
  • hermes_cli/config.py — openrouter config section
  • hermes_cli/web_server.py — "openrouter" option

Testing

  • Live transcription
  • 5-model benchmark
  • Auto-detection
  • Error handling

Adds openrouter as a seventh STT provider to transcription_tools.py.
Uses the OpenRouter /api/v1/audio/transcriptions endpoint with
base64-encoded JSON payloads. Supports all Whisper models routed
through OpenRouter (openai/whisper-1, groq/whisper-large-v3-turbo,
etc.) with unified billing through the existing OPENROUTER_API_KEY.

Changes:
- tools/transcription_tools.py: add _transcribe_openrouter(),
  constants, provider selection, and auto-detect fallback
- hermes_cli/config.py: add openrouter to default config template
- hermes_cli/web_server.py: add openrouter to stt.provider options
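The auto-detect fallback can be sketched as below. This is a hypothetical illustration, not the PR's _get_provider: the priority order, the rule that an explicitly configured provider wins, and the OPENAI_API_KEY / GROQ_API_KEY variable names are assumptions; only OPENROUTER_API_KEY comes from this PR.

```python
import os
from typing import Mapping, Optional

# Hypothetical priority order; the real order in transcription_tools.py is not shown in this PR.
# Each entry pairs a provider name with the environment variable whose presence enables it.
PROVIDER_PRIORITY = [
    ("openai", "OPENAI_API_KEY"),          # assumed variable name
    ("groq", "GROQ_API_KEY"),              # assumed variable name
    ("openrouter", "OPENROUTER_API_KEY"),  # variable name taken from this PR
]


def pick_stt_provider(configured: Optional[str], env: Optional[Mapping[str, str]] = None) -> str:
    """Return the configured provider, else the first provider with a non-blank key, else 'local'."""
    env = os.environ if env is None else env
    if configured:
        return configured
    for name, key_var in PROVIDER_PRIORITY:
        if env.get(key_var, "").strip():
            return name
    return "local"


print(pick_stt_provider(None, {"OPENROUTER_API_KEY": "sk-or-example"}))  # openrouter
```

Placing openrouter after the direct-provider keys in this sketch means existing setups keep their current provider when an OpenRouter key is added; that ordering is a design choice for the example, not something the PR states.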
alt-glitch added the labels type/feature (New feature or request), P3 (Low: cosmetic, nice to have), tool/tts (Text-to-speech and transcription), provider/openrouter (OpenRouter aggregator), comp/cli (CLI entry point, hermes_cli/, setup wizard) and duplicate (This issue or pull request already exists) on May 14, 2026
alt-glitch (Collaborator) commented

Duplicate of #21799, which adds OpenRouter as both a transcription and a speech provider, with 19 tests. This PR also competes with #24703. The feature was requested in #24415.

16 tests covering _transcribe_openrouter, _get_provider, and
transcribe_audio dispatch. Follows existing xAI test patterns.
Tests: key handling, success, whitespace stripping, API errors,
empty transcripts, permission errors, network errors, JSON body
verification, custom base URL, auto-detect priority, and model
override passthrough.
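Tests like "JSON body verification" and "custom base URL" typically mock the HTTP call and inspect what was sent. A minimal sketch of that pattern with unittest.mock, where fake_transcribe is a hypothetical stand-in for _transcribe_openrouter and the HTTP call is injected as a post argument (an assumption; the real tests may patch requests.post instead). Inspecting call_args is also how a test would catch a JSON-vs-multipart mismatch like the one raised in review below.

```python
import tempfile
from pathlib import Path
from unittest import mock


def fake_transcribe(audio_path, api_key, model, base_url, post):
    """Hypothetical stand-in for _transcribe_openrouter: uploads the file and strips the transcript."""
    with open(audio_path, "rb") as audio_file:
        response = post(
            f"{base_url}/audio/transcriptions",
            headers={"Authorization": f"Bearer {api_key}"},
            files={"file": (Path(audio_path).name, audio_file)},
            data={"model": model},
            timeout=120,
        )
    return response.json()["text"].strip()


def test_custom_base_url_and_whitespace_stripping():
    post = mock.Mock()  # records every call made to it
    post.return_value.json.return_value = {"text": "  hello world \n"}
    with tempfile.TemporaryDirectory() as tmp_dir:
        audio_path = Path(tmp_dir) / "clip.ogg"
        audio_path.write_bytes(b"OggS")  # placeholder bytes; nothing decodes them
        text = fake_transcribe(str(audio_path), "sk-test", "openai/whisper-1", "https://example.test/v1", post)

    assert text == "hello world"
    args, kwargs = post.call_args
    assert args[0] == "https://example.test/v1/audio/transcriptions"
    assert kwargs["headers"] == {"Authorization": "Bearer sk-test"}
    assert kwargs["data"] == {"model": "openai/whisper-1"}
    assert "json" not in kwargs  # multipart upload, so no JSON body


test_custom_base_url_and_whitespace_stripping()
```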

liuhao1024 (Contributor) left a comment


Bug: _transcribe_openrouter uses JSON body instead of multipart/form-data

The function sends audio as JSON with base64-encoded input_audio, but OpenRouter's /audio/transcriptions endpoint follows the OpenAI Whisper API format, which expects multipart/form-data with a file upload. This will likely return a 400/422 at runtime.

Every other provider in this file (OpenAI, Groq, xAI) uses multipart/form-data. Compare with the xAI implementation ~40 lines above:

```python
import requests
from pathlib import Path

# xAI (correct: multipart/form-data)
# file_path, base_url, api_key and model_name come from the enclosing function
with open(file_path, "rb") as audio_file:
    response = requests.post(
        f"{base_url}/stt",
        headers={"Authorization": f"Bearer {api_key}"},
        files={"file": (Path(file_path).name, audio_file)},
        data={"model": model_name},
        timeout=120,
    )
```

The OpenRouter implementation should use the same pattern instead of json={"input_audio": {"data": b64, ...}}:

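```python
import requests
from pathlib import Path

# Suggested OpenRouter call: same multipart pattern, different endpoint
# file_path, base_url, api_key and model_name come from the enclosing function
with open(file_path, "rb") as audio_file:
    response = requests.post(
        f"{base_url}/audio/transcriptions",
        headers={"Authorization": f"Bearer {api_key}"},
        files={"file": (Path(file_path).name, audio_file)},
        data={"model": model_name},
        timeout=120,
    )
```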
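Filling in the rest of that suggestion (missing-key check, HTTP errors, empty transcripts, whitespace stripping) might look like the sketch below. transcribe_openrouter_multipart is a hypothetical name, not the PR's function; the post parameter exists only so it can be exercised without network access; the default base URL is an assumption; and reading {"text": ...} from the response assumes OpenRouter mirrors OpenAI's transcription response shape.

```python
from pathlib import Path
from typing import Any, Callable, Optional

# Assumption: OpenRouter's API base; the PR targets <base>/audio/transcriptions.
DEFAULT_OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"


def transcribe_openrouter_multipart(
    file_path: str,
    api_key: str,
    model_name: str,
    base_url: str = DEFAULT_OPENROUTER_BASE_URL,
    post: Optional[Callable[..., Any]] = None,
) -> str:
    """Upload audio as multipart/form-data and return the stripped transcript (hypothetical helper)."""
    if not api_key:
        raise ValueError("OPENROUTER_API_KEY is not set")
    if post is None:
        import requests  # imported lazily so a stubbed `post` works without requests installed
        post = requests.post
    with open(file_path, "rb") as audio_file:
        response = post(
            f"{base_url.rstrip('/')}/audio/transcriptions",
            headers={"Authorization": f"Bearer {api_key}"},
            files={"file": (Path(file_path).name, audio_file)},
            data={"model": model_name},
            timeout=120,
        )
    if response.status_code >= 400:
        raise RuntimeError(f"OpenRouter STT failed ({response.status_code}): {response.text[:200]}")
    # Assumption: the body looks like OpenAI's {"text": "..."} transcription response.
    text = (response.json().get("text") or "").strip()
    if not text:
        raise RuntimeError("OpenRouter returned an empty transcript")
    return text
```

Raising on an empty transcript, rather than returning "", keeps a silent failure from looking like a successful transcription of silence.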

Minor: OPENROUTER_STT_MODELS is defined but never used

The OPENROUTER_STT_MODELS set (~lines 97-101) is never referenced for model validation. Either remove it or add a guard in _transcribe_openrouter that warns when the model isn't in the set, as the OpenAI provider does with GROQ_MODELS.
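A warn-but-allow guard along those lines could be as small as the sketch below. The set contents are placeholders (the two model IDs named in the PR description), not the PR's actual OPENROUTER_STT_MODELS, and check_openrouter_model is a hypothetical helper name.

```python
import logging

logger = logging.getLogger(__name__)

# Placeholder contents: the PR's real OPENROUTER_STT_MODELS set is not reproduced here.
OPENROUTER_STT_MODELS = {"openai/whisper-1", "groq/whisper-large-v3-turbo"}


def check_openrouter_model(model_name: str) -> bool:
    """Warn (but do not fail) when model_name is not a known OpenRouter STT model."""
    if model_name in OPENROUTER_STT_MODELS:
        return True
    logger.warning(
        "Model %r is not in OPENROUTER_STT_MODELS; passing it through to OpenRouter anyway",
        model_name,
    )
    return False
```

Warning instead of rejecting preserves the model override passthrough the tests describe, so models OpenRouter adds later still work without a code change.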
