feat(audio): add OpenRouter as a transcription + speech provider #21799
Open
saurabh wants to merge 2 commits into
Adds `openrouter` to the STT provider list, alongside local/groq/openai/
mistral/xai. OpenRouter exposes /v1/audio/transcriptions with a
JSON+base64 protocol (not OpenAI-multipart), so it needs its own client
path rather than re-using the openai-compatible code.
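A minimal sketch of the JSON+base64 wire format described above. The helper name, the `model` argument, and the exact nesting of `format` under `input_audio` are assumptions for illustration; the commit only names `input_audio.data` and a format string.

```python
import base64
import json


def build_transcription_body(audio_bytes: bytes, audio_format: str, model: str) -> str:
    """Serialize raw audio into a JSON request body (hypothetical helper)."""
    # Base64 keeps the binary audio safe inside a JSON string, which is why
    # no multipart form encoding is involved.
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    payload = {
        "model": model,
        # Assumed layout: data and format grouped under `input_audio`.
        "input_audio": {"data": encoded, "format": audio_format},
    }
    return json.dumps(payload)
```

The result would be sent with `Content-Type: application/json`, which is the main difference from the multipart upload used by the OpenAI-compatible path.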
Why this is useful
- One key (OPENROUTER_API_KEY) covers chat + STT, which simplifies
configuration for users already on OpenRouter for LLMs.
- OpenRouter routes requests across providers (Groq Whisper, OpenAI
Whisper, etc.), giving fallback for free.
Provider selection
- Explicit: stt.provider: openrouter (or STT_PROVIDER=openrouter).
- Auto-detect: chosen when local + groq are unavailable but
OPENROUTER_API_KEY is set; ranks below local/groq, above openai.
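The selection order above can be sketched roughly as follows. This is not the PR's `_get_provider()`; the function name, the `local_available` flag, and the `GROQ_API_KEY` / `OPENAI_API_KEY` variable names are assumptions, and only the four providers whose ranking is stated here are modeled.

```python
from typing import Mapping, Optional


def pick_stt_provider(
    explicit: Optional[str],
    local_available: bool,
    env: Mapping[str, str],
) -> Optional[str]:
    """Return the STT provider to use, or None if nothing is usable."""
    # An explicit stt.provider / STT_PROVIDER setting always wins.
    if explicit:
        return explicit
    # Auto-detect: local, then groq, then openrouter, then openai.
    if local_available:
        return "local"
    if env.get("GROQ_API_KEY"):  # assumed variable name
        return "groq"
    if env.get("OPENROUTER_API_KEY"):
        return "openrouter"
    if env.get("OPENAI_API_KEY"):  # assumed variable name
        return "openai"
    return None
```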
Implementation notes
- _transcribe_openrouter() in tools/transcription_tools.py:
  - JSON body with base64-encoded input_audio.data + format string.
  - Format whitelist (wav/mp3/flac/m4a/ogg/webm/aac); rejects unsupported
    extensions early, with .oga/.opus → ogg and .mpeg → mp3 aliases.
  - 25 MiB size cap matching the documented limit.
  - Typed exception branches for Timeout / ConnectionError so transient
    failures surface cleanly instead of as tracebacks.
- voice_mode.py: voice_doctor now reports the openrouter/mistral/xai
branches (previously only listed local/groq/openai).
- cli-config.yaml.example: documents the new provider + model + optional
base_url override.
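For illustration, the format whitelist, aliases, and size cap from the notes above might look like this. The constant and function names are hypothetical, and whether a file of exactly 25 MiB is accepted is an assumption.

```python
import os

SUPPORTED_FORMATS = {"wav", "mp3", "flac", "m4a", "ogg", "webm", "aac"}
FORMAT_ALIASES = {"oga": "ogg", "opus": "ogg", "mpeg": "mp3"}
MAX_AUDIO_BYTES = 25 * 1024 * 1024  # 25 MiB


def resolve_audio_format(path: str) -> str:
    """Map a file extension to a whitelisted format string, or raise."""
    ext = os.path.splitext(path)[1].lower().lstrip(".")
    ext = FORMAT_ALIASES.get(ext, ext)
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported audio format: {ext or '(none)'}")
    return ext


def check_audio_size(num_bytes: int) -> None:
    """Reject payloads above the cap before any network call is made."""
    # Assumption: the cap is inclusive, so exactly 25 MiB passes.
    if num_bytes > MAX_AUDIO_BYTES:
        raise ValueError(f"Audio is {num_bytes} bytes; limit is {MAX_AUDIO_BYTES}")
```

Running both checks before encoding means an unsupported or oversized file fails locally instead of costing a round trip.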
Tests
- 19 new tests in TestTranscribeOpenRouter / TestGetProviderOpenRouter /
TestTranscribeAudioOpenRouterDispatch covering: missing key, success,
whitespace/empty handling, HTTP errors, permission errors, format
whitelist + aliases, JSON-not-multipart wire format, base_url override,
header shape, connection errors, and provider auto-detect ranking.
- pytest tests/tools/test_transcription_tools.py: 106 passed, 7 skipped.
Adds `openrouter` to the TTS provider list, alongside edge/elevenlabs/
openai/minimax/mistral/gemini/xai/neutts/kittentts/piper. Mirrors the
STT provider added in the prior commit on this branch — same
OPENROUTER_API_KEY covers chat + STT + TTS.
OpenRouter exposes /v1/audio/speech with the OpenAI request shape
({model, voice, input, response_format, speed}), so this provider
reuses the OpenAI client with a swapped base_url + key. The model slug
selects the underlying provider (openai/google/mistral/...).
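To make the request shape concrete, here is a sketch that builds (but does not send) the equivalent HTTP request with the standard library. The PR itself goes through the OpenAI SDK; `build_speech_request` and its parameters are hypothetical and only mirror the fields listed above.

```python
import json
import urllib.request
from typing import Optional


def build_speech_request(
    api_key: str,
    text: str,
    model: str,
    voice: str,
    base_url: str = "https://openrouter.ai/api/v1",
    speed: Optional[float] = None,
) -> urllib.request.Request:
    """Build a POST to <base_url>/audio/speech in the OpenAI request shape."""
    body = {"model": model, "voice": voice, "input": text, "response_format": "mp3"}
    # Leave `speed` out entirely when unset so the server default applies.
    if speed is not None:
        body["speed"] = speed
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/speech",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Only `base_url` and the key differ from a plain OpenAI call, which is what lets the provider reuse the existing client.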
Implementation
- _generate_openrouter_tts() in tools/tts_tool.py: OpenAI-client based,
default model openai/gpt-4o-mini-tts-2025-12-15 (the slug OR returns
for the OpenAI route as of the May 2026 audio API launch).
- response_format is hard-coded to "mp3" — OR's endpoint only accepts
mp3 / pcm. Telegram-bound .ogg outputs go through the existing ffmpeg
conversion path (same path Edge TTS takes).
- openrouter is in BUILTIN_TTS_PROVIDERS so a user's
tts.providers.openrouter command block can never shadow it.
- PROVIDER_MAX_TEXT_LENGTH["openrouter"] = 4096 (follows the underlying
OpenAI cap).
- Configurable base_url via tts.openrouter.base_url or
TTS_OPENROUTER_BASE_URL env (matches the xAI / OpenAI patterns).
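A rough sketch of the base_url resolution and speed handling listed above. The helper names are hypothetical; the precedence (config before env) and the 0.25 to 4.0 clamp range (taken from the OpenAI speech API) are assumptions, since this commit message does not state them.

```python
from typing import Mapping, Optional

DEFAULT_BASE_URL = "https://openrouter.ai/api/v1"


def resolve_base_url(tts_config: Mapping[str, object], env: Mapping[str, str]) -> str:
    """Pick the base URL from tts.openrouter.base_url, then env, then default."""
    section = tts_config.get("openrouter")
    configured = section.get("base_url") if isinstance(section, dict) else None
    # Assumed precedence: config value first, then TTS_OPENROUTER_BASE_URL.
    chosen = configured or env.get("TTS_OPENROUTER_BASE_URL") or DEFAULT_BASE_URL
    return str(chosen).rstrip("/")


def clamp_speed(speed: Optional[float]) -> Optional[float]:
    """Clamp speed into the assumed 0.25..4.0 range; None means 'omit'."""
    if speed is None:
        return None
    return max(0.25, min(4.0, float(speed)))
```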
Tests
- 13 new tests in test_tts_openrouter.py covering missing key, success,
default + custom model/voice, default + config + env base_url, the
always-mp3 invariant (via .ogg output_path), speed clamp + omitted
default, dispatcher routing, BUILTIN_TTS_PROVIDERS membership, and
PROVIDER_MAX_TEXT_LENGTH.
- pytest tests/tools/test_tts_openrouter.py: 13 passed.
- Full TTS + STT suite: 265 passed, 7 skipped (one collection error in
test_tts_kittentts is a local-only missing-numpy issue, unrelated).
Live verification
- Direct curl against POST https://openrouter.ai/api/v1/audio/speech
with model openai/gpt-4o-mini-tts-2025-12-15 — returns audio/mpeg.
- _generate_openrouter_tts() against the live endpoint — produces a
valid 75 KiB MP3.
- Full dispatcher via text_to_speech_tool with HERMES_SESSION_PLATFORM=
telegram — produces a valid Opus OGG (24kHz mono) via ffmpeg.
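The MP3-to-Opus step in the last check can be approximated with an ffmpeg invocation like the one below. This is an assumed command line, not the project's actual conversion code; `ffmpeg_opus_args` is a hypothetical helper, and the 24 kHz mono settings simply match the output reported above.

```python
from typing import List


def ffmpeg_opus_args(src_mp3: str, dst_ogg: str) -> List[str]:
    """Build an ffmpeg argument list that re-encodes MP3 as mono Opus in OGG."""
    return [
        "ffmpeg",
        "-y",               # overwrite the destination if it exists
        "-i", src_mp3,      # input file
        "-c:a", "libopus",  # encode audio with the Opus codec
        "-ar", "24000",     # 24 kHz sample rate
        "-ac", "1",         # single (mono) channel
        dst_ogg,
    ]
```

Passing the list to `subprocess.run(args, check=True)` would perform the conversion on a machine with ffmpeg installed.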
Docs
- website/docs/user-guide/features/tts.md: added the provider row,
config block, and per-request input cap entry.
This was referenced May 14, 2026
Summary
Adds `openrouter` as a unified provider for both transcription (STT) and speech synthesis (TTS), alongside the existing local/groq/openai/mistral/xai (STT) and edge/elevenlabs/openai/minimax/mistral/gemini/xai/neutts/kittentts/piper (TTS) sets.

OpenRouter announced their audio APIs on May 1, 2026, exposing both `/v1/audio/transcriptions` and `/v1/audio/speech`, which route across providers (OpenAI Whisper, GPT-4o-transcribe, Google Chirp 3, Groq Whisper for STT; OpenAI TTS, Google Gemini Flash TTS, Mistral Voxtral Mini TTS for speech) under one API key.

Why this is useful
- One key (`OPENROUTER_API_KEY`) covers chat + STT + TTS for users already on OpenRouter for LLMs.
- The model slug selects the underlying provider (`openai/gpt-4o-mini-tts`, `google/chirp-3`, etc.) without separate accounts.

STT: `tools/transcription_tools.py`
OR's transcription endpoint takes a JSON body with base64-encoded audio (not OpenAI-style multipart), so it can't reuse the existing `openai` client path.
- `_transcribe_openrouter()`: JSON body with `input_audio.data` (base64) + `format` string.
- Format whitelist (`wav`/`mp3`/`flac`/`m4a`/`ogg`/`webm`/`aac`); rejects unsupported extensions early. Aliases: `.oga`/`.opus` → `ogg`, `.mpeg` → `mp3`.
- Typed exception branches for `Timeout` / `ConnectionError` so transient failures surface cleanly.
- `_get_provider()`: explicit `openrouter` routing + auto-detect entry that ranks below `local`/`groq` and above `openai`.
- `tools/voice_mode.py`: `voice_doctor` now reports the openrouter / mistral / xai branches.

TTS: `tools/tts_tool.py`
OR's speech endpoint is OpenAI-shape JSON, so this provider reuses the OpenAI client with a swapped base_url + key.
- `_generate_openrouter_tts()`: uses the OpenAI SDK against OR's base URL, default model `openai/gpt-4o-mini-tts-2025-12-15`.
- `response_format` is hard-coded to `mp3`; OR only accepts `mp3`/`pcm`. Telegram-bound `.ogg` outputs go through the existing ffmpeg-to-Opus path (same as Edge TTS).
- `BUILTIN_TTS_PROVIDERS` includes `openrouter` so a user's `tts.providers.openrouter` command block can never shadow it.
- `PROVIDER_MAX_TEXT_LENGTH["openrouter"] = 4096` (follows the underlying OpenAI cap).
- Configurable `base_url` via `tts.openrouter.base_url` or `TTS_OPENROUTER_BASE_URL`.

Tests
- Full TTS + STT suite (`pytest tests/tools/test_transcription_tools.py tests/tools/test_tts_*.py`): 265 passed, 7 skipped.

Live verification
- `curl` to OR's `/v1/audio/transcriptions` with a real `.wav` → correct transcript + usage metadata.
- `curl` to OR's `/v1/audio/speech` with model `openai/gpt-4o-mini-tts-2025-12-15` → valid audio/mpeg.
- `_transcribe_openrouter()` end-to-end → `{"success": true, "transcript": "...", "provider": "openrouter"}`.
- `_generate_openrouter_tts()` end-to-end → 75 KiB MP3.
- Full dispatcher with `HERMES_SESSION_PLATFORM=telegram` → valid Opus OGG (24 kHz mono) via ffmpeg conversion, marked `voice_compatible: true`.

(A bug was caught during live verification: my first TTS impl asked for `response_format: opus` when the output path was `.ogg`, which OR rejects with `ZodError`. Fixed by always requesting `mp3` and routing through the existing ffmpeg conversion path.)

Test plan
- Unit tests (`pytest tests/tools/test_transcription_tools.py tests/tools/test_tts_openrouter.py`)
- Live check of the `/v1/audio/transcriptions` endpoint
- Live check of the `/v1/audio/speech` endpoint, including the Telegram Opus path
- Configure `stt.provider: openrouter` + `tts.provider: openrouter` + `OPENROUTER_API_KEY` and exchange a voice note