
fix(tools): gate OGG conversion on platform and strip markdown from media paths#11457

Open
nsyring wants to merge 2 commits into NousResearch:main from nsyring:fix/tts-voice-delivery

nsyring commented Apr 17, 2026

What does this PR do?

Fixes two TTS delivery issues:

  1. OGG/Opus conversion was applied on every platform, but only Telegram requires Opus for voice bubbles. Other platforms (Discord, Nextcloud Talk, etc.) work better with the original MP3/WAV. The fix adds a want_opus flag derived from the platform.

  2. Markdown artifacts in MEDIA paths — some models (e.g., Mistral) wrap MEDIA tags in bold/italic markdown (**MEDIA:/tmp/file.mp3**). The fix strips * and _ from media path extraction.
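A minimal sketch of the platform gate described in item 1. The helper name `pick_output_extension` and its parameters are hypothetical and only illustrate the idea; the actual decision points in tools/tts_tool.py may be structured differently.

```python
from typing import Optional


def pick_output_extension(platform: Optional[str], source_ext: str) -> str:
    """Choose the delivered audio extension (hypothetical helper, not the PR's code).

    Only Telegram needs OGG/Opus for voice bubbles, so every other platform
    keeps whatever format the TTS provider produced.
    """
    # The flag mirrors the check named in the PR: platform == "telegram".
    want_opus = platform == "telegram"
    return ".ogg" if want_opus else source_ext
```

With this gate, a Discord or Nextcloud Talk reply keeps its `.mp3` or `.wav` file, and a missing platform value falls through to the original format as well.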

Related Issue

No existing issue.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)

Changes Made

  • tools/tts_tool.py: Add want_opus flag gated on platform == "telegram", applied at all three OGG conversion decision points
  • gateway/platforms/base.py: Add * and _ to the characters removed by lstrip/rstrip in extract_media path parsing
  • tests/gateway/test_send_image_file.py: 4 tests for markdown artifact stripping (bold, italic, underscore, mixed)
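The path cleanup can be sketched roughly as follows. The regex and the function name `extract_media_paths` are assumptions made for illustration; the real extract_media logic in gateway/platforms/base.py may use a different pattern and strip a different set of surrounding characters.

```python
import re
from typing import List

# Hypothetical pattern: capture everything after "MEDIA:" up to whitespace.
MEDIA_TAG = re.compile(r"MEDIA:(\S+)")


def extract_media_paths(text: str) -> List[str]:
    """Return media paths with quotes, backticks, and markdown emphasis removed."""
    paths = []
    for match in MEDIA_TAG.finditer(text):
        # str.strip takes a set of characters; adding * and _ drops the
        # bold/italic markers some models wrap around the tag.
        path = match.group(1).strip("`\"'*_")
        if path:
            paths.append(path)
    return paths
```

One trade-off of stripping by character set: a file name that legitimately ends in an underscore would lose it, which is presumably acceptable for temporary TTS output paths.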

How to Test

  1. Generate TTS on a non-Telegram platform (e.g., Nextcloud Talk)
  2. Audio should be delivered as MP3/WAV, not converted to OGG
  3. Send a message with **MEDIA:/tmp/test.mp3** — path should extract cleanly
  4. Run pytest tests/gateway/test_send_image_file.py -v — all tests pass

Checklist

  • I have read the Contributing Guide
  • My commits follow the Conventional Commits format
  • I have searched for existing PRs to avoid duplicates
  • This PR contains only related changes
  • pytest tests/ -q passes
  • I have added tests for my changes
  • I have tested on: Debian 13 (LXC, amd64)
  • N/A — no new docs needed
  • N/A — no config key changes
  • N/A — no architecture changes
  • Cross-platform: this IS a cross-platform fix
  • N/A — no tool schema changes

nsyring force-pushed the fix/tts-voice-delivery branch 4 times, most recently from f91c10a to 0e4f75a on April 21, 2026 09:29
alt-glitch added labels on Apr 24, 2026: type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), tool/tts (Text-to-speech and transcription), comp/gateway (Gateway runner, session dispatch, delivery)
nsyring force-pushed the fix/tts-voice-delivery branch 7 times, most recently from 492ef8c to 774a574 on May 1, 2026 06:18
nsyring force-pushed the fix/tts-voice-delivery branch 9 times, most recently from 5139e1d to 4d6fad3 on May 9, 2026 05:47
nsyring force-pushed the fix/tts-voice-delivery branch 6 times, most recently from a504be9 to d9d9cfb on May 14, 2026 06:20
nsyring force-pushed the fix/tts-voice-delivery branch 10 times, most recently from 92f23e0 to f175194 on May 30, 2026 06:19
nsyring force-pushed the fix/tts-voice-delivery branch 12 times, most recently from a842a2c to 9aa8bd4 on June 6, 2026 21:20
nsyring force-pushed the fix/tts-voice-delivery branch 6 times, most recently from a279e45 to 537eff0 on June 13, 2026 06:20
nsyring added 2 commits June 14, 2026 06:21
…edia paths

- Only convert MP3 to OGG when the target platform wants Opus (Telegram).
  Other platforms (e.g. Nextcloud Talk) need the original MP3 for proper
  voice-message rendering.
- Strip *_ markdown artifacts from MEDIA tag paths. Some LLMs wrap
  MEDIA tags in bold (**MEDIA:path**) causing path extraction to include
  trailing asterisks.

The agent-callable text_to_speech_tool() did not strip markdown before
provider dispatch. Result: Edge TTS (and other providers) verbalized raw
markdown artifacts like **Bold** ("asterisk asterisk Bold asterisk asterisk"),
## headers ("hash hash Summary"), and `code` ("backtick code backtick").

Two other TTS call sites already strip markdown:
- gateway/run.py:_send_voice_reply via _strip_markdown_for_tts()
- gateway/platforms/base.py Auto-TTS via re.sub regex

This brings the third call site (the model-callable tool) in line with
them. Strip happens after empty-check and before max_len truncation, so
the per-provider character budget applies to spoken length, not raw
markdown length.

Command-providers can opt out via tts.providers.<name>.skip_markdown_strip
for SSML-aware CLIs that need raw markup passed through.

Tests: tests/tools/test_tts_markdown_strip.py — 7 cases covering bold,
headers, inline code, list markers, truncation interaction, and the
skip-opt-out flag for command providers.
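
The ordering described in the second commit message (empty-check, then strip, then truncate, with an opt-out) can be sketched as below. The function names and the regex set are illustrative assumptions, not the project's _strip_markdown_for_tts() implementation; only the order of operations and the skip_markdown_strip flag come from the commit message.

```python
import re


def strip_markdown(text: str) -> str:
    """Rough markdown stripper for speech (illustrative regexes only)."""
    text = re.sub(r"`([^`]*)`", r"\1", text)                         # inline code
    text = re.sub(r"^[ \t]{0,3}#{1,6}[ \t]+", "", text, flags=re.M)  # headers
    text = re.sub(r"^[ \t]*[-*+][ \t]+", "", text, flags=re.M)       # list markers
    text = re.sub(r"(\*{1,2})(.+?)\1", r"\2", text)                  # bold / italic
    return text.strip()


def prepare_tts_text(text: str, max_len: int, skip_markdown_strip: bool = False) -> str:
    """Empty-check first, strip second, truncate last (hypothetical helper)."""
    if not text or not text.strip():
        raise ValueError("text is empty")
    if not skip_markdown_strip:
        # Stripping before truncation spends the character budget on
        # spoken text rather than on markup.
        text = strip_markdown(text)
    return text[:max_len]
```

For example, `prepare_tts_text("**Bold** text", 4)` keeps the whole word "Bold", whereas truncating first would have spent two of the four characters on asterisks. Passing `skip_markdown_strip=True` leaves the markup untouched for SSML-aware command providers.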