feat(audio): add OpenRouter as a transcription + speech provider by saurabh · Pull Request #21799 · NousResearch/hermes-agent

saurabh · 2026-05-08T10:06:53Z

Summary

Adds openrouter as a unified provider for both transcription (STT) and speech synthesis (TTS), alongside the existing local/groq/openai/mistral/xai (STT) and edge/elevenlabs/openai/minimax/mistral/gemini/xai/neutts/kittentts/piper (TTS) sets.

OpenRouter announced their audio APIs on May 1, 2026, exposing both /v1/audio/transcriptions and /v1/audio/speech that route across providers (OpenAI Whisper, GPT-4o-transcribe, Google Chirp 3, Groq Whisper for STT; OpenAI TTS, Google Gemini Flash TTS, Mistral Voxtral Mini TTS for speech) under one API key.

"Text-to-speech and transcription are now live on OpenRouter. Two new endpoints give you access to speech synthesis and audio transcription across multiple providers, under one API."

Why this is useful

One key (OPENROUTER_API_KEY) covers chat + STT + TTS for users already on OpenRouter for LLMs.
OpenRouter handles fallback across providers automatically.
Lets users try premium audio models (openai/gpt-4o-mini-tts, google/chirp-3, etc.) without separate accounts.

STT — `tools/transcription_tools.py`

OR's transcription endpoint takes a JSON body with base64-encoded audio (not OpenAI-style multipart), so it can't reuse the existing openai client path.

New _transcribe_openrouter() — JSON body with input_audio.data (base64) + format string.
Format whitelist (wav/mp3/flac/m4a/ogg/webm/aac); rejects unsupported extensions early. Aliases: .oga/.opus → ogg, .mpeg → mp3.
25 MiB size cap matching the documented limit.
Typed exception branches for Timeout / ConnectionError so transient failures surface cleanly.
_get_provider() — explicit openrouter routing + auto-detect entry that ranks below local/groq and above openai.
tools/voice_mode.py — voice_doctor now reports the openrouter / mistral / xai branches.
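
For reviewers who want to see the wire format at a glance, here is a minimal sketch of how such a request body could be assembled. It is not the code in this PR: the helper names (`resolve_format`, `build_transcription_body`) and constant names are hypothetical, and placing `format` inside `input_audio` and sending a `model` field in the body are assumptions about the payload layout. The HTTP call itself is left out.

```python
import base64
from pathlib import Path

# Formats and aliases as listed in this PR; the constant names are illustrative.
SUPPORTED_FORMATS = {"wav", "mp3", "flac", "m4a", "ogg", "webm", "aac"}
FORMAT_ALIASES = {"oga": "ogg", "opus": "ogg", "mpeg": "mp3"}
MAX_AUDIO_BYTES = 25 * 1024 * 1024  # 25 MiB cap


def resolve_format(path: str) -> str:
    """Map a file extension to an accepted format string, or raise ValueError."""
    ext = Path(path).suffix.lower().lstrip(".")
    fmt = FORMAT_ALIASES.get(ext, ext)
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported audio format: .{ext}")
    return fmt


def build_transcription_body(path: str, model: str) -> dict:
    """Build a JSON-serializable body with base64 audio and a format string."""
    fmt = resolve_format(path)  # reject bad extensions before reading the file
    raw = Path(path).read_bytes()
    if len(raw) > MAX_AUDIO_BYTES:
        raise ValueError("Audio file exceeds the 25 MiB limit")
    return {
        "model": model,  # assumption: the model slug travels in the body
        "input_audio": {
            "data": base64.b64encode(raw).decode("ascii"),
            "format": fmt,  # assumption: format sits next to data
        },
    }
```

Checking the extension before reading the file keeps the early-reject path cheap for unsupported inputs.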

TTS — `tools/tts_tool.py`

OR's speech endpoint is OpenAI-shape JSON, so this provider reuses the OpenAI client with a swapped base_url + key.

New _generate_openrouter_tts() — uses the OpenAI SDK against OR's base URL, default model openai/gpt-4o-mini-tts-2025-12-15.
response_format is hard-coded to mp3 — OR only accepts mp3 / pcm. Telegram-bound .ogg outputs go through the existing ffmpeg-to-Opus path (same as Edge TTS).
BUILTIN_TTS_PROVIDERS includes openrouter so a user's tts.providers.openrouter command block can never shadow it.
PROVIDER_MAX_TEXT_LENGTH["openrouter"] = 4096 (follows the underlying OpenAI cap).
Configurable base_url via tts.openrouter.base_url or TTS_OPENROUTER_BASE_URL.
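
The points above can be illustrated with a rough sketch, assuming the `openai` Python SDK is installed. The helper names are hypothetical, the default voice is a placeholder (this description does not state one), the config-before-env precedence is an assumption, and the 0.25 to 4.0 speed range is borrowed from OpenAI's documented limits rather than confirmed for this PR.

```python
import os

DEFAULT_BASE_URL = "https://openrouter.ai/api/v1"
DEFAULT_MODEL = "openai/gpt-4o-mini-tts-2025-12-15"
DEFAULT_VOICE = "alloy"  # assumption: placeholder default voice


def resolve_base_url(tts_config: dict) -> str:
    """Pick the base URL from config, then env, then the default (order assumed)."""
    configured = tts_config.get("openrouter", {}).get("base_url")
    return configured or os.environ.get("TTS_OPENROUTER_BASE_URL") or DEFAULT_BASE_URL


def build_speech_kwargs(text, model=DEFAULT_MODEL, voice=DEFAULT_VOICE, speed=None) -> dict:
    """Build request arguments; response_format is always mp3."""
    kwargs = {"model": model, "voice": voice, "input": text, "response_format": "mp3"}
    if speed is not None:
        # Clamp range is an assumption based on OpenAI's documented limits.
        kwargs["speed"] = max(0.25, min(4.0, float(speed)))
    return kwargs


def synthesize_mp3(text: str, tts_config: dict) -> bytes:
    """Call the speech endpoint through the OpenAI SDK and return MP3 bytes."""
    from openai import OpenAI  # lazy import so the helpers above work without the SDK

    client = OpenAI(
        api_key=os.environ["OPENROUTER_API_KEY"],
        base_url=resolve_base_url(tts_config),
    )
    response = client.audio.speech.create(**build_speech_kwargs(text))
    return response.read()
```

Omitting `speed` when the caller does not set it leaves the provider's own default in effect.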

Tests

19 STT tests + 13 TTS tests = 32 new tests total. All pass.
Full suite (pytest tests/tools/test_transcription_tools.py tests/tools/test_tts_*.py): 265 passed, 7 skipped.

Live verification

Direct curl to OR's /v1/audio/transcriptions with a real .wav → correct transcript + usage metadata.
Direct curl to OR's /v1/audio/speech with model openai/gpt-4o-mini-tts-2025-12-15 → valid audio/mpeg.
_transcribe_openrouter() end-to-end → {"success": true, "transcript": "...", "provider": "openrouter"}.
_generate_openrouter_tts() end-to-end → 75 KiB MP3.
Full TTS dispatcher with HERMES_SESSION_PLATFORM=telegram → valid Opus OGG (24 kHz mono) via ffmpeg conversion, marked voice_compatible: true.

(A bug was caught during live verification: my first TTS impl asked for response_format: opus when the output path was .ogg, which OR rejects with ZodError. Fixed by always requesting mp3 and routing through the existing ffmpeg conversion path.)
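
For context, the MP3-to-Opus step could look roughly like the sketch below. This is not the repository's existing conversion helper; the function names are hypothetical, and the ffmpeg flags are one plausible way to get a 24 kHz mono Opus stream in an OGG container. It assumes an ffmpeg build with libopus on the PATH.

```python
import subprocess


def build_opus_command(mp3_path: str, ogg_path: str) -> list[str]:
    """Return an ffmpeg argv that re-encodes an MP3 as 24 kHz mono Opus in OGG."""
    return [
        "ffmpeg",
        "-y",                # overwrite the output file if it exists
        "-i", mp3_path,      # input: the MP3 returned by the speech endpoint
        "-c:a", "libopus",   # encode audio with the Opus codec
        "-ar", "24000",      # 24 kHz sample rate
        "-ac", "1",          # mono
        ogg_path,
    ]


def convert_mp3_to_opus(mp3_path: str, ogg_path: str) -> None:
    """Run ffmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(build_opus_command(mp3_path, ogg_path), check=True, capture_output=True)
```

Always requesting mp3 and converting locally keeps the provider call identical regardless of the target platform.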

Test plan

Unit tests pass (pytest tests/tools/test_transcription_tools.py tests/tools/test_tts_openrouter.py)
STT: manually verified against the live /v1/audio/transcriptions endpoint
TTS: manually verified against the live /v1/audio/speech endpoint, including the Telegram Opus path
Reviewer can verify config: set stt.provider: openrouter + tts.provider: openrouter + OPENROUTER_API_KEY and exchange a voice note

Adds `openrouter` to the STT provider list, alongside local/groq/openai/mistral/xai. OpenRouter exposes /v1/audio/transcriptions with a JSON+base64 protocol (not OpenAI-multipart), so it needs its own client path rather than re-using the openai-compatible code.

Why this is useful

- One key (OPENROUTER_API_KEY) covers chat + STT, which simplifies configuration for users already on OpenRouter for LLMs.
- OpenRouter routes requests across providers (Groq Whisper, OpenAI Whisper, etc.), giving fallback for free.

Provider selection

- Explicit: stt.provider: openrouter (or STT_PROVIDER=openrouter).
- Auto-detect: chosen when local + groq are unavailable but OPENROUTER_API_KEY is set; ranks below local/groq, above openai.

Implementation notes

- _transcribe_openrouter() in tools/transcription_tools.py:
  - JSON body with base64-encoded input_audio.data + format string.
  - Format whitelist (wav/mp3/flac/m4a/ogg/webm/aac); rejects unsupported extensions early, with .oga/.opus → ogg and .mpeg → mp3 aliases.
  - 25 MiB size cap matching the documented limit.
  - Typed exception branches for Timeout / ConnectionError so transient failures surface cleanly instead of as tracebacks.
- voice_mode.py: voice_doctor now reports the openrouter/mistral/xai branches (previously only listed local/groq/openai).
- cli-config.yaml.example: documents the new provider + model + optional base_url override.

Tests

- 19 new tests in TestTranscribeOpenRouter / TestGetProviderOpenRouter / TestTranscribeAudioOpenRouterDispatch covering: missing key, success, whitespace/empty handling, HTTP errors, permission errors, format whitelist + aliases, JSON-not-multipart wire format, base_url override, header shape, connection errors, and provider auto-detect ranking.
- pytest tests/tools/test_transcription_tools.py: 106 passed, 7 skipped.
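
The provider selection rules in this commit message can be pictured as a small ordered lookup. The sketch below is illustrative only and is not the actual `_get_provider()`: the function name and signature are hypothetical, `GROQ_API_KEY` and `OPENAI_API_KEY` are assumed environment variable names, and the positions of mistral and xai are left out because this description does not state them.

```python
from typing import Mapping, Optional


def detect_stt_provider(
    env: Mapping[str, str],
    local_available: bool,
    explicit: Optional[str] = None,
) -> Optional[str]:
    """Return the STT provider name, honoring an explicit choice first."""
    # Explicit selection (stt.provider or STT_PROVIDER) wins over auto-detect.
    chosen = explicit or env.get("STT_PROVIDER")
    if chosen:
        return chosen
    # Auto-detect order: local, groq, openrouter, openai.
    if local_available:
        return "local"
    if env.get("GROQ_API_KEY"):  # assumed variable name
        return "groq"
    if env.get("OPENROUTER_API_KEY"):
        return "openrouter"
    if env.get("OPENAI_API_KEY"):  # assumed variable name
        return "openai"
    return None
```

Passing the environment as a mapping, rather than reading `os.environ` directly, makes the ranking easy to exercise in unit tests.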

Adds `openrouter` to the TTS provider list, alongside edge/elevenlabs/openai/minimax/mistral/gemini/xai/neutts/kittentts/piper. Mirrors the STT provider added in the prior commit on this branch: the same OPENROUTER_API_KEY covers chat + STT + TTS.

OpenRouter exposes /v1/audio/speech with the OpenAI request shape ({model, voice, input, response_format, speed}), so this provider reuses the OpenAI client with a swapped base_url + key. The model slug selects the underlying provider (openai/google/mistral/...).

Implementation

- _generate_openrouter_tts() in tools/tts_tool.py: OpenAI-client based, default model openai/gpt-4o-mini-tts-2025-12-15 (the slug OR returns for the OpenAI route as of the May 2026 audio API launch).
- response_format is hard-coded to "mp3" because OR's endpoint only accepts mp3 / pcm. Telegram-bound .ogg outputs go through the existing ffmpeg conversion path (same path Edge TTS takes).
- openrouter is in BUILTIN_TTS_PROVIDERS so a user's tts.providers.openrouter command block can never shadow it.
- PROVIDER_MAX_TEXT_LENGTH["openrouter"] = 4096 (follows the underlying OpenAI cap).
- Configurable base_url via tts.openrouter.base_url or TTS_OPENROUTER_BASE_URL env (matches the xAI / OpenAI patterns).

Tests

- 13 new tests in test_tts_openrouter.py covering missing key, success, default + custom model/voice, default + config + env base_url, the always-mp3 invariant (via .ogg output_path), speed clamp + omitted default, dispatcher routing, BUILTIN_TTS_PROVIDERS membership, and PROVIDER_MAX_TEXT_LENGTH.
- pytest tests/tools/test_tts_openrouter.py: 13 passed.
- Full TTS + STT suite: 265 passed, 7 skipped (one collection error in test_tts_kittentts is a local-only missing-numpy issue, unrelated).

Live verification

- Direct curl against POST https://openrouter.ai/api/v1/audio/speech with model openai/gpt-4o-mini-tts-2025-12-15: returns audio/mpeg.
- _generate_openrouter_tts() against the live endpoint: produces a valid 75 KiB MP3.
- Full dispatcher via text_to_speech_tool with HERMES_SESSION_PLATFORM=telegram: produces a valid Opus OGG (24 kHz mono) via ffmpeg.

Docs

- website/docs/user-guide/features/tts.md: added the provider row, config block, and per-request input cap entry.

alt-glitch added the labels type/feature (New feature or request), provider/openrouter (OpenRouter aggregator), tool/tts (Text-to-speech and transcription), and P3 (Low: cosmetic, nice to have) on May 8, 2026

saurabh changed the title ~~feat(stt): add OpenRouter as a transcription provider~~ feat(audio): add OpenRouter as a transcription + speech provider May 8, 2026

This was referenced May 14, 2026

Feature Request: Add OpenRouter as STT provider #25722

Open

feat(stt): add OpenRouter speech-to-text provider #25721

Open

feat: add OpenRouter as STT transcription provider #28848

Open

saurabh wants to merge 2 commits into NousResearch:main from saurabh:feat/stt-openrouter-provider
