feat(stt): add OpenRouter speech-to-text provider #25721
Adds openrouter as a seventh STT provider to transcription_tools.py. Uses the OpenRouter /api/v1/audio/transcriptions endpoint with base64-encoded JSON payloads. Supports all whisper models routed through OpenRouter (openai/whisper-1, groq/whisper-large-v3-turbo, etc.) with unified billing through existing OPENROUTER_API_KEY.
Changes:
- tools/transcription_tools.py: add _transcribe_openrouter(), constants, provider selection, and auto-detect fallback
- hermes_cli/config.py: add openrouter to default config template
- hermes_cli/web_server.py: add openrouter to stt.provider options
16 tests covering _transcribe_openrouter, _get_provider, and transcribe_audio dispatch. Follows existing xAI test patterns. Tests: key handling, success, whitespace stripping, API errors, empty transcripts, permission errors, network errors, JSON body verification, custom base URL, auto-detect priority, and model override passthrough.
liuhao1024 left a comment
Bug: _transcribe_openrouter uses JSON body instead of multipart/form-data
The function sends audio as JSON with base64-encoded input_audio, but OpenRouter's /audio/transcriptions endpoint follows the OpenAI Whisper API format, which expects multipart/form-data with a file upload. This will likely return a 400/422 at runtime.
Every other provider in this file (OpenAI, Groq, xAI) uses multipart/form-data. Compare with the xAI implementation ~40 lines above:
# xAI (correct: multipart/form-data)
with open(file_path, "rb") as audio_file:
    response = requests.post(
        f"{base_url}/stt",
        headers={"Authorization": f"Bearer {api_key}"},
        files={"file": (Path(file_path).name, audio_file)},
        data={"model": model_name},
        timeout=120,
    )
The OpenRouter implementation should use the same pattern instead of json={"input_audio": {"data": b64, ...}}:
with open(file_path, "rb") as audio_file:
    response = requests.post(
        f"{base_url}/audio/transcriptions",
        headers={"Authorization": f"Bearer {api_key}"},
        files={"file": (Path(file_path).name, audio_file)},
        data={"model": model_name},
        timeout=120,
    )
Minor: OPENROUTER_STT_MODELS is defined but never used
The OPENROUTER_STT_MODELS set (~line 97-101) is never referenced for model validation. Either remove it or add a guard in _transcribe_openrouter that warns when the model isn't in the set (like the OpenAI provider does with GROQ_MODELS).
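The guard option could look roughly like the sketch below. It is an illustration, not the PR's code: the helper name `warn_if_unknown_model` is hypothetical, and the set contents are assumed from the two model names mentioned in the PR description, since the actual set is not shown in the review.

```python
import logging

logger = logging.getLogger(__name__)

# Assumed contents: the real set in the PR is not shown in the review.
OPENROUTER_STT_MODELS = {"openai/whisper-1", "groq/whisper-large-v3-turbo"}


def warn_if_unknown_model(model_name, known_models=OPENROUTER_STT_MODELS):
    """Return True if model_name is in known_models; otherwise log a warning and return False.

    Hypothetical helper: the request is not blocked, the caller only gets a
    warning that the model is outside the list of models known to work.
    """
    if model_name in known_models:
        return True
    logger.warning(
        "OpenRouter STT model %r is not in OPENROUTER_STT_MODELS; sending it anyway",
        model_name,
    )
    return False
```

Warning rather than raising keeps model override passthrough working for models that OpenRouter adds later, at the cost of not catching typos before the request is sent.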
Summary
Adds openrouter as STT provider via OpenRouter /api/v1/audio/transcriptions.
Benchmark (68KB OGG voice message)
Models: whisper-large-v3-turbo, gpt-4o-mini-transcribe, google/chirp-3, whisper-1
turbo is 7x faster than local with better accuracy.
Changes
- tools/transcription_tools.py: _transcribe_openrouter(), constants, dispatch
- hermes_cli/config.py: openrouter config section
- hermes_cli/web_server.py: "openrouter" option
Testing
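A minimal sketch of how the multipart request shape raised in the review could be checked without network access. The helper `post_audio_multipart` and its injected `post` parameter are hypothetical stand-ins (the PR's `_transcribe_openrouter` is assumed to call `requests.post` directly), and the base URL and API key are placeholders.

```python
import tempfile
from pathlib import Path
from unittest.mock import Mock


def post_audio_multipart(post, base_url, api_key, file_path, model_name):
    """Send audio as a multipart/form-data upload through the injected `post` callable."""
    with open(file_path, "rb") as audio_file:
        return post(
            f"{base_url}/audio/transcriptions",
            headers={"Authorization": f"Bearer {api_key}"},
            files={"file": (Path(file_path).name, audio_file)},
            data={"model": model_name},
            timeout=120,
        )


def capture_request_kwargs():
    """Call the helper with a mock `post` and return the keyword arguments it received."""
    fake_post = Mock(return_value=Mock(status_code=200))
    # Write a small placeholder file; its bytes are never decoded as audio.
    with tempfile.NamedTemporaryFile(suffix=".ogg", delete=False) as tmp:
        tmp.write(b"fake-audio")
    try:
        post_audio_multipart(
            fake_post,
            "https://example.invalid/api/v1",  # placeholder base URL
            "test-key",                        # placeholder API key
            tmp.name,
            "openai/whisper-1",
        )
    finally:
        Path(tmp.name).unlink()
    _, kwargs = fake_post.call_args
    return kwargs
```

A test can then assert that `files` is present and `json` is absent in the captured keyword arguments, which is the distinction the review comment draws between the two request formats.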