fix(tts): rename .ogg->mp3 before opus conversion for mp3-only providers (edge-tts, minimax, xai) by Warrenpoobear · Pull Request #20882 · NousResearch/hermes-agent

Warrenpoobear · 2026-05-06T19:56:35Z

## What this fixes

PR #20878 (`fix(tts): use .ogg extension for Telegram auto-TTS voice replies`) introduces platform-aware extension selection in `_send_voice_reply()`. The change is correct for ElevenLabs/OpenAI/Mistral/Gemini: those providers honour the `.ogg` extension and output native Opus.

Bug introduced by #20878 (mp3-only providers: edge-tts, minimax, xai, neutts, kittentts, piper):

`_generate_edge_tts()` (and the other mp3-only providers) call `communicate.save(output_path)` unconditionally, so they write raw MP3 bytes regardless of the path extension. After generation the existing opus-conversion block was:

```python
elif provider in ("edge", "neutts", ...) and not file_str.endswith(".ogg"):
    opus_path = _convert_to_opus(file_str)
```

With a `.ogg` path the guard `not file_str.endswith(".ogg")` evaluates to `False`, so `_convert_to_opus` is skipped entirely. The result is a `.ogg` file containing MP3 bytes; Telegram rejects it and the voice bubble silently fails.

## Fix

Before calling `_convert_to_opus`, rename the mislabeled file from `.ogg` → `.mp3` so ffmpeg receives a correctly-named source. The intermediate `.mp3` is cleaned up immediately after conversion. The original `.mp3` path is unaffected (the rename branch is only reached when the caller requested `.ogg`).

## Verification

- Non-Telegram path (`.mp3`): rename block not entered, no behaviour change.
- Telegram + ElevenLabs/OpenAI: `voice_compatible` branch untouched, still correct.
- Telegram + edge-tts: `.ogg` renamed to `.mp3`, `_convert_to_opus` produces real Opus, `voice_compatible = True`, Telegram renders voice bubble.

Companion fix to #20878: handles the mp3-only provider edge case that #20878 leaves broken.
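For reviewers, the rename-then-convert step can be sketched roughly as below. This is a minimal illustration, not the diff itself: `convert_mislabeled_ogg` is a hypothetical helper name, the converter is passed in as a parameter so the snippet runs on its own, and it assumes the converter returns the path of the Opus file it wrote (or `None` on failure).

```python
import os
from typing import Callable, Optional


def convert_mislabeled_ogg(
    file_str: str,
    convert_to_opus: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Rename a .ogg file that holds MP3 bytes to .mp3, then convert it.

    Hypothetical helper: `convert_to_opus` stands in for the real
    `_convert_to_opus`, whose exact return contract is assumed here.
    """
    src = file_str
    renamed = False
    if file_str.endswith(".ogg"):
        # The provider wrote MP3 bytes into a .ogg-named file; give the
        # converter a source whose extension matches its contents.
        src = file_str[: -len(".ogg")] + ".mp3"
        os.replace(file_str, src)
        renamed = True
    try:
        return convert_to_opus(src)
    finally:
        # Remove only the intermediate file this function created.
        if renamed and os.path.exists(src):
            os.remove(src)
```

When the caller passes a `.mp3` path, the rename branch is skipped and the original file is left in place, which matches the "no behaviour change" case under Verification.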

The gateway's _send_voice_reply() hardcoded .mp3 as the output path extension, which caused ElevenLabs and OpenAI TTS to output mp3 format even on Telegram. Telegram requires Opus/OGG for native voice bubbles — mp3 files are sent as audio file attachments instead. Now detects the platform from session context and uses .ogg for Telegram, .mp3 for everything else. The TTS tool already checks the extension to select the appropriate codec (opus_48000_64 vs mp3_44100_128).
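A minimal sketch of the selection logic described above, assuming the platform arrives as a plain string. Both function names are hypothetical; the two format strings are the ones named in this commit message.

```python
def voice_reply_extension(platform: str) -> str:
    """Pick the output extension for an auto-TTS reply (hypothetical helper)."""
    # Telegram only renders a native voice bubble for Opus in an Ogg container.
    return ".ogg" if platform.strip().lower() == "telegram" else ".mp3"


def output_format_for(path: str) -> str:
    """Map the requested extension to a codec string (hypothetical helper)."""
    return "opus_48000_64" if path.endswith(".ogg") else "mp3_44100_128"
```

For example, `voice_reply_extension("telegram")` yields `.ogg`, and a path ending in `.ogg` then selects `opus_48000_64`.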

…oviders

When `_send_voice_reply()` passes a `.ogg` path (new in PR NousResearch#20878) to `text_to_speech_tool`, mp3-only providers like edge-tts write raw MP3 bytes into the `.ogg`-named file. The pre-existing opus-conversion guard `elif provider in ("edge", ...) and not file_str.endswith(".ogg"):` evaluated to `False` (path ends in `.ogg`), so `_convert_to_opus` was skipped, leaving a `.ogg` file containing MP3 bytes. Telegram then received a corrupted audio file that couldn't play.

Fix: remove the `.ogg` guard; instead rename the mislabeled file to `.mp3` before calling `_convert_to_opus`, then clean up the intermediate `.mp3`. Non-Telegram paths (`file_str` ends in `.mp3`) are unaffected; the rename block is only reached when the caller explicitly requested `.ogg`.
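One way to confirm the mismatch locally is to look at the first bytes of the generated file rather than its extension. The helper below is a hypothetical diagnostic, not part of this change. It relies on two format facts: Ogg streams start with the capture pattern `OggS`, and MP3 files start with either an `ID3` tag or an MPEG frame sync (eleven set bits). The frame-sync test is a heuristic and can match other MPEG-style streams.

```python
def sniff_audio_container(path: str) -> str:
    """Guess the container from the file header (hypothetical diagnostic)."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head.startswith(b"OggS"):
        return "ogg"
    if head.startswith(b"ID3"):
        return "mp3"
    # MPEG audio frame sync: 0xFF followed by a byte whose top 3 bits are set.
    if len(head) >= 2 and head[0] == 0xFF and (head[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"
```

A `.ogg` path for which this returns `"mp3"` is the broken case described in this commit message.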

achhabra2 · 2026-05-08T21:48:13Z

I dug into this locally with Telegram auto-TTS and I think there’s a slightly cleaner fix than forcing .ogg from the gateway caller.

Current root causes I found:

- `GatewayRunner._send_voice_reply()` forces `output_path=...mp3`, so `text_to_speech_tool()` cannot use its Telegram-aware default output selection. This causes Telegram replies to be delivered as MP3/audio attachments instead of native voice bubbles.
- There is a second path in `BasePlatformAdapter._process_message_background()` where auto-TTS runs after `GatewayRunner` has already cleared the session context. In that path, `HERMES_SESSION_PLATFORM` / `get_session_env("HERMES_SESSION_PLATFORM")` is blank, so the TTS tool again defaults to MP3 even though the source platform is Telegram.

I tested this by adding regression coverage:

Against origin/main, the tests fail:
- `_send_voice_reply()` passes an explicit `.mp3` output path
- base adapter auto-TTS sees platform `""` instead of `"telegram"`

With the fix, they pass.

Suggested approach:

- In `GatewayRunner._send_voice_reply()`, do not pass an explicit `output_path`; call:
  ```
  text_to_speech_tool(text=tts_text)
  ```
  so the TTS tool can choose `.ogg`/Opus for Telegram-capable providers and keep normal defaults elsewhere.
- In `BasePlatformAdapter._process_message_background()`, re-establish session context around the `text_to_speech_tool()` call using `set_session_vars(...)` / `clear_session_vars(...)`, because this auto-TTS path runs after the runner’s handler context has been cleared.

This avoids hardcoding .ogg at the gateway layer and should also avoid the mp3-only-provider edge case that this PR is handling: the TTS tool remains responsible for provider-specific behavior and Opus conversion.
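To make the second point concrete, here is a self-contained sketch of binding the platform around the TTS call and clearing it afterwards. Everything prefixed `fake_` is a stand-in I wrote for illustration; the real `set_session_vars` / `clear_session_vars` / `text_to_speech_tool` signatures may well differ, and `bound_session` and `auto_tts` are hypothetical names.

```python
from contextlib import contextmanager

_SESSION: dict = {}  # stand-in for the real session storage


def fake_set_session_vars(**values: str) -> None:
    _SESSION.update(values)


def fake_clear_session_vars() -> None:
    _SESSION.clear()


def fake_get_session_env(name: str) -> str:
    return _SESSION.get(name, "")


def fake_text_to_speech_tool(text: str) -> str:
    # Stand-in: choose the default output extension from the session platform.
    platform = fake_get_session_env("HERMES_SESSION_PLATFORM")
    return "reply.ogg" if platform == "telegram" else "reply.mp3"


@contextmanager
def bound_session(platform: str):
    """Bind the platform for the duration of the block, then clear it."""
    fake_set_session_vars(HERMES_SESSION_PLATFORM=platform)
    try:
        yield
    finally:
        fake_clear_session_vars()


def auto_tts(text: str, platform: str) -> str:
    # No explicit output_path: the TTS tool picks the format itself.
    with bound_session(platform):
        return fake_text_to_speech_tool(text=text)
```

With the context bound, `auto_tts("hi", "telegram")` gets the `.ogg` default; calling the stand-in tool outside the block sees a blank platform and falls back to `.mp3`, which mirrors the failure I saw in the base adapter path.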

I can open a PR with the two regression tests and the small implementation change if that would be useful.

tarekskr and others added 2 commits May 6, 2026 23:32

alt-glitch added the type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), comp/gateway (Gateway runner, session dispatch, delivery), tool/tts (Text-to-speech and transcription), and platform/telegram (Telegram bot adapter) labels May 6, 2026

This was referenced May 15, 2026

Telegram: TTS voice bubbles not delivered despite valid OGG Opus + #26355

Open

fix(tts): preserve native audio outside Telegram voice delivery #26406

Closed

andrepia mentioned this pull request May 18, 2026

[Bug]: Telegram auto voice replies can fall back to MP3 attachments and silent final notifications #27970

Closed

1 task

Lbatson mentioned this pull request May 29, 2026

fix(gateway): bind session context for auto tts #34779

Closed

19 tasks

Warrenpoobear wants to merge 2 commits into NousResearch:main from Warrenpoobear:fix/pr-20878-edge-tts-ogg-path
