[Bug]: Unnecessary MP3→OGG conversion in Edge TTS breaks local CLI playback on macOS #26404

@jagaliano

Bug Description

When Hermes uses Edge TTS, audio is generated natively as MP3 but is then unconditionally converted to OGG/Opus after generation. On macOS CLI this causes broken playback via afplay — the returned file is .ogg instead of .mp3, and OGG/Opus output sounds cut off or distorted locally.

Key facts:

  • Edge TTS already outputs MP3 directly (await communicate.save(output_path)).
  • Even when passing an explicit .mp3 output path, Hermes returns an .ogg file.
  • The OGG conversion is only needed for Telegram voice-message delivery, but it fires for all platforms.
  • WAV/MP3 playback works reliably on macOS; OGG does not.

Steps to Reproduce

  1. Set tts.provider: edge in config.yaml.
  2. Call text_to_speech in CLI mode with any text (e.g., "Hola, cómo estás").
  3. Optionally pass an explicit .mp3 output path.
  4. Observe the returned file_path ends in .ogg, not .mp3.
  5. Play with afplay: afplay /path/to/file.ogg — audio is cut off or broken.
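To confirm step 4 independently of the file extension, the first bytes of the returned file can be inspected. The helper below is a minimal sketch and not part of Hermes. The name sniff_audio_container is hypothetical, and the MP3 check is only a heuristic (an ID3 tag or an MPEG frame-sync pattern), so treat an "mp3" result as "probably MP3".

```python
def sniff_audio_container(path: str) -> str:
    """Guess the container from the file header: 'ogg', 'mp3', or 'unknown'.

    Hypothetical diagnostic helper, not a Hermes API.
    """
    with open(path, "rb") as fh:
        header = fh.read(4)

    # Every Ogg page starts with the capture pattern "OggS".
    if header.startswith(b"OggS"):
        return "ogg"
    # MP3 files often start with an ID3v2 tag...
    if header.startswith(b"ID3"):
        return "mp3"
    # ...or directly with an MPEG audio frame (11 sync bits set).
    # Heuristic only: some other formats share this sync pattern.
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mp3"
    return "unknown"
```

Running this on the path returned by text_to_speech should show whether the file on disk is really Ogg, regardless of the extension that was requested.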

Expected Behavior

Hermes should preserve the native MP3 output for CLI/local mode and only convert to OGG when delivering to Telegram (or another platform requiring Opus).

Actual Behavior

MP3→OGG conversion happens unconditionally after generation, breaking local playback.

Affected Component

CLI (interactive chat), Other

Messaging Platform (if gateway-related)

N/A (CLI only)

Debug Report

Report       https://paste.rs/83g8S
agent.log    https://paste.rs/jFcWk
gateway.log  https://paste.rs/MLCE3

Operating System

macOS 15.7.7

Python Version

3.14.5

Hermes Version

Hermes Agent v0.13.0 (2026.5.7)

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

In tools/tts_tool.py, the post-generation conversion block (~line 1843) converts Edge/NeuTTS/MiniMax/xAI/KittenTTS/Piper output to OGG regardless of whether the platform needs Opus:

elif provider in {"edge", "neutts", "minimax", "xai", "kittentts", "piper"} and not file_str.endswith(".ogg"):
    opus_path = _convert_to_opus(file_str)
    if opus_path:
        file_str = opus_path
        voice_compatible = True

This should be gated by want_opus (which is already computed from platform == "telegram" earlier in the function) so conversion only happens when Telegram delivery is active.
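The gating described above could look roughly like the sketch below. It is a standalone illustration, not the actual code in tools/tts_tool.py. The function name finalize_audio, the injected convert_to_opus callable (standing in for _convert_to_opus), and the default value of voice_compatible are all assumptions made so the example runs on its own.

```python
from typing import Callable, Optional, Tuple

# Providers listed in the issue whose native output is not OGG/Opus.
NON_OPUS_PROVIDERS = {"edge", "neutts", "minimax", "xai", "kittentts", "piper"}


def finalize_audio(
    provider: str,
    file_str: str,
    want_opus: bool,
    convert_to_opus: Callable[[str], Optional[str]],
) -> Tuple[str, bool]:
    """Return (path, voice_compatible), converting only when Opus is wanted.

    Hypothetical helper: in Hermes this logic is inline in the TTS tool.
    """
    # Assumption: a file that is already .ogg counts as voice-compatible.
    voice_compatible = file_str.endswith(".ogg")

    if (
        want_opus  # e.g. computed from platform == "telegram"
        and provider in NON_OPUS_PROVIDERS
        and not file_str.endswith(".ogg")
    ):
        opus_path = convert_to_opus(file_str)
        if opus_path:  # keep the original file if conversion failed
            return opus_path, True

    return file_str, voice_compatible


if __name__ == "__main__":
    # Stand-in converter that only rewrites the extension.
    fake_convert = lambda p: p.rsplit(".", 1)[0] + ".ogg"
    print(finalize_audio("edge", "/tmp/out.mp3", False, fake_convert))
    print(finalize_audio("edge", "/tmp/out.mp3", True, fake_convert))
```

With want_opus false (CLI/local mode) the MP3 path is returned untouched. With want_opus true (Telegram delivery) the converter runs and the .ogg path is returned.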

Proposed Fix (optional)

No response

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

Metadata

Assignees

No one assigned

    Labels

    P2 (Medium: degraded but workaround exists)
    comp/cli (CLI entry point, hermes_cli/, setup wizard)
    tool/tts (Text-to-speech and transcription)
    type/bug (Something isn't working)
