fix(tts): rename .ogg->mp3 before opus conversion for mp3-only providers (edge-tts, minimax, xai) #20882
The gateway's `_send_voice_reply()` hardcoded `.mp3` as the output path extension, so ElevenLabs and OpenAI TTS produced MP3 even on Telegram. Telegram requires Opus/OGG for native voice bubbles; MP3 files are sent as audio file attachments instead. It now detects the platform from session context and uses `.ogg` for Telegram and `.mp3` for everything else. The TTS tool already checks the extension to select the appropriate codec (`opus_48000_64` vs `mp3_44100_128`).
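A minimal sketch of the extension selection described above, for illustration only. The function names `voice_reply_extension` and `output_format_for` are hypothetical and are not taken from the codebase; only the `.ogg`/`.mp3` split and the two format strings come from the commit message.

```python
def voice_reply_extension(platform: str) -> str:
    """Pick the output extension for an auto-TTS voice reply.

    Hypothetical helper: Telegram gets .ogg (Opus) so the reply renders as a
    native voice bubble; every other platform keeps .mp3.
    """
    return ".ogg" if platform.strip().lower() == "telegram" else ".mp3"


def output_format_for(path: str) -> str:
    """Map the requested file extension to a codec/format string.

    Hypothetical helper mirroring the extension check the TTS tool is said
    to perform already.
    """
    return "opus_48000_64" if path.endswith(".ogg") else "mp3_44100_128"
```

Keeping the platform decision in the gateway and the codec decision in the TTS tool means the extension is the only contract between them, which is why a provider that ignores the extension breaks the Telegram path.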
…oviders

When `_send_voice_reply()` passes a `.ogg` path (new in PR NousResearch#20878) to `text_to_speech_tool`, mp3-only providers like edge-tts write raw MP3 bytes into the `.ogg`-named file. The pre-existing opus-conversion guard `elif provider in ("edge", ...) and not file_str.endswith(".ogg"):` evaluated to `False` (the path ends in `.ogg`), so `_convert_to_opus` was skipped, leaving a `.ogg` file containing MP3 bytes. Telegram then received a corrupted audio file that couldn't play.

Fix: remove the `.ogg` guard; instead rename the mislabeled file to `.mp3` before calling `_convert_to_opus`, then clean up the intermediate `.mp3`. Non-Telegram paths (`file_str` ends in `.mp3`) are unaffected: the rename block is only reached when the caller explicitly requested `.ogg`.
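To confirm the failure mode locally, one option is to look at the first bytes of the generated file rather than its name. The sketch below is not part of this PR; `sniff_audio_container` is a hypothetical diagnostic helper. It relies only on well-known signatures: Ogg pages begin with `OggS`, while MP3 files begin with an `ID3` tag or an MPEG frame sync (eleven set bits).

```python
def sniff_audio_container(path: str) -> str:
    """Return "ogg", "mp3", or "unknown" based on the file's leading bytes.

    Hypothetical diagnostic helper; it ignores the file extension entirely.
    """
    with open(path, "rb") as fh:
        head = fh.read(4)
    if head == b"OggS":
        return "ogg"
    if head[:3] == b"ID3":
        return "mp3"
    # MPEG audio frame sync: 0xFF followed by a byte whose top three bits are set.
    if len(head) >= 2 and head[0] == 0xFF and (head[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"
```

A file named `reply.ogg` for which this returns `"mp3"` is the mislabeled output described above.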
I dug into this locally with Telegram auto-TTS and I think there’s a slightly cleaner fix than forcing …

Current root causes I found:
I tested this by adding regression coverage:
Suggested approach:
This avoids hardcoding …

I can open a PR with the two regression tests and the small implementation change if that would be useful.
## What this fixes

PR #20878 (fix(tts): use .ogg extension for Telegram auto-TTS voice replies) introduces platform-aware extension selection in `_send_voice_reply()`. The change is correct for ElevenLabs/OpenAI/Mistral/Gemini: those providers honour the `.ogg` extension and output native Opus.

Bug introduced by #20878 (mp3-only providers: edge-tts, minimax, xai, neutts, kittentts, piper):

`_generate_edge_tts()` (and the other mp3-only providers) call `communicate.save(output_path)` unconditionally; they write raw MP3 bytes regardless of the path extension. After generation the existing opus-conversion block was:

```python
elif provider in ("edge", "neutts", ...) and not file_str.endswith(".ogg"):
    opus_path = _convert_to_opus(file_str)
```

With a `.ogg` path the guard `not file_str.endswith(".ogg")` evaluates to `False`, so `_convert_to_opus` is skipped entirely. The result is a `.ogg` file containing MP3 bytes. Telegram rejects it and the voice bubble silently fails.

## Fix

Before calling `_convert_to_opus`, rename the mislabeled file from `.ogg` → `.mp3` so ffmpeg receives a correctly-named source. The intermediate `.mp3` is cleaned up immediately after conversion. The original `.mp3` path is unaffected (rename branch only reached when caller requested `.ogg`).

## Verification

- Non-Telegram path (`.mp3`): rename block not entered, no behaviour change.
- Telegram + ElevenLabs/OpenAI: `voice_compatible` branch untouched, still correct.
- Telegram + edge-tts: `.ogg` renamed to `.mp3`, `_convert_to_opus` produces real Opus, `voice_compatible = True`, Telegram renders voice bubble.

Companion fix to #20878: handles the mp3-only provider edge case that #20878 leaves broken.
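
For illustration, the rename-then-convert step can be sketched in isolation as follows. This is not the PR's diff: `convert_mislabeled_to_opus` is a hypothetical wrapper, and the `convert_to_opus` argument stands in for the project's `_convert_to_opus`, which is assumed here to return the Opus path on success and `None` on failure. Keeping the intermediate `.mp3` when conversion fails is also an assumption of this sketch.

```python
import os
from typing import Callable, Optional


def convert_mislabeled_to_opus(
    file_str: str,
    convert_to_opus: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Convert provider output to Opus, renaming a mislabeled .ogg first.

    Hypothetical helper: convert_to_opus is an injected stand-in for the
    project's _convert_to_opus (assumed to return the output path or None).
    """
    source = file_str
    renamed = False
    if file_str.endswith(".ogg"):
        # The caller requested .ogg but the provider wrote MP3 bytes:
        # rename so the converter receives a correctly named source.
        source = file_str[: -len(".ogg")] + ".mp3"
        os.replace(file_str, source)
        renamed = True

    opus_path = convert_to_opus(source)

    # Remove the intermediate .mp3 only if this function created it
    # and the conversion reported success.
    if renamed and opus_path and os.path.exists(source):
        os.remove(source)
    return opus_path
```

Injecting the converter keeps the sketch testable without ffmpeg; a caller that requested `.mp3` passes through the same function without the rename or cleanup being triggered.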