fix(tools): gate OGG conversion on platform and strip markdown from media paths#11457
Open
nsyring wants to merge 2 commits into
…edia paths
- Only convert MP3 to OGG when the target platform wants Opus (Telegram). Other platforms (e.g. Nextcloud Talk) need the original MP3 for proper voice-message rendering.
- Strip *_ markdown artifacts from MEDIA tag paths. Some LLMs wrap MEDIA tags in bold (**MEDIA:path**), causing path extraction to include trailing asterisks.
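The platform gate described in this commit can be illustrated with a minimal sketch. It is not the code from tools/tts_tool.py: the names OPUS_PLATFORMS, want_opus(), and delivery_path() are hypothetical, and the sketch only decides which file to deliver without performing any audio conversion.

```python
from pathlib import PurePosixPath
from typing import Optional

# Assumption: Telegram is the only platform that needs Opus, per the commit message.
OPUS_PLATFORMS = frozenset({"telegram"})


def want_opus(platform: Optional[str]) -> bool:
    """Return True only when the target platform expects OGG/Opus audio."""
    return (platform or "").strip().lower() in OPUS_PLATFORMS


def delivery_path(audio_path: str, platform: Optional[str]) -> str:
    """Pick the file to deliver: an .ogg sibling for Opus platforms, else the original."""
    if want_opus(platform) and audio_path.lower().endswith(".mp3"):
        # Hypothetical: a real implementation would run the conversion here.
        return str(PurePosixPath(audio_path).with_suffix(".ogg"))
    return audio_path
```

With this shape, an unknown or missing platform falls through to the original file, so a new platform never gets an unexpected format by default.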
The agent-callable text_to_speech_tool() did not strip markdown before
provider dispatch. Result: Edge TTS (and other providers) verbalized raw
markdown artifacts like **bold** ("asterisk asterisk Bold asterisk asterisk"),
## headers ("hash hash Summary"), and `code` ("backtick code backtick").
Two other TTS call sites already strip markdown:
- gateway/run.py:_send_voice_reply via _strip_markdown_for_tts()
- gateway/platforms/base.py Auto-TTS via re.sub regex
This brings the third call site (the model-callable tool) in line with
them. Strip happens after empty-check and before max_len truncation, so
the per-provider character budget applies to spoken length, not raw
markdown length.
Command-providers can opt out via tts.providers.<name>.skip_markdown_strip
for SSML-aware CLIs that need raw markup passed through.
Tests: tests/tools/test_tts_markdown_strip.py — 7 cases covering bold,
headers, inline code, list markers, truncation interaction, and the
skip-opt-out flag for command providers.
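The ordering described above (empty-check, then strip, then truncate) can be sketched as follows. This is an illustration under assumptions, not the project's implementation: strip_markdown_for_speech() and prepare_tts_text() are hypothetical names, the regexes cover only the cases the commit message lists, and the skip_markdown_strip argument stands in for the tts.providers.<name>.skip_markdown_strip setting.

```python
import re


def strip_markdown_for_speech(text: str) -> str:
    """Remove common markdown markers so a TTS engine does not read them aloud."""
    text = re.sub(r"`([^`]*)`", r"\1", text)                            # inline code
    text = re.sub(r"^\s{0,3}#{1,6}\s+", "", text, flags=re.MULTILINE)   # headers
    text = re.sub(r"^\s*[-*+]\s+", "", text, flags=re.MULTILINE)        # list markers
    text = re.sub(r"\*\*(.+?)\*\*", r"\1", text)                        # bold
    text = re.sub(r"\*(.+?)\*", r"\1", text)                            # italic (asterisk)
    text = re.sub(r"(?<!\w)_(.+?)_(?!\w)", r"\1", text)                 # italic (underscore)
    return text.strip()


def prepare_tts_text(text: str, max_len: int, skip_markdown_strip: bool = False) -> str:
    """Empty-check first, strip second, truncate last."""
    if not text or not text.strip():
        raise ValueError("text is empty")
    if not skip_markdown_strip:
        text = strip_markdown_for_speech(text)
    # Truncating after the strip means the budget counts spoken characters only.
    return text[:max_len]
```

For example, prepare_tts_text("**abcdef**", 4) yields "abcd", while the same call with skip_markdown_strip=True yields "**ab", which shows why the strip has to run before truncation.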
What does this PR do?
Fixes two TTS delivery issues:
- OGG/Opus conversion was applied to all platforms: only Telegram requires Opus for voice bubbles. Other platforms (Discord, Nextcloud Talk, etc.) work better with the original MP3/WAV. The fix adds a want_opus flag derived from the platform.
- Markdown artifacts in MEDIA paths: some models (e.g., Mistral) wrap MEDIA tags in bold/italic markdown (**MEDIA:/tmp/file.mp3**). The fix strips * and _ from media path extraction.
Related Issue
No existing issue.
Type of Change
Changes Made
- tools/tts_tool.py: Add want_opus flag gated on platform == "telegram", applied at all three OGG conversion decision points
- gateway/platforms/base.py: Add *_ to lstrip/rstrip in extract_media path parsing
- tests/gateway/test_send_image_file.py: 4 tests for markdown artifact stripping (bold, italic, underscore, mixed)
How to Test
- **MEDIA:/tmp/test.mp3**: path should extract cleanly
- pytest tests/gateway/test_send_image_file.py -v: all tests pass
Checklist
- pytest tests/ -q passes
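The markdown-artifact fix for MEDIA paths can be approximated with the sketch below. It assumes a simple MEDIA:<path> tag with no spaces in the path; the regex and the function name extract_media_paths_sketch() are hypothetical and do not reproduce extract_media in gateway/platforms/base.py.

```python
import re
from typing import List

# Assumption: a MEDIA tag is "MEDIA:" followed by a path without whitespace.
MEDIA_TAG = re.compile(r"MEDIA:\s*(\S+)")


def extract_media_paths_sketch(text: str) -> List[str]:
    """Return media paths with surrounding quotes and markdown emphasis removed."""
    paths = []
    for match in MEDIA_TAG.finditer(text):
        # Strip quotes and backticks plus the * and _ left over from bold/italic wrapping.
        path = match.group(1).strip("`\"'*_")
        if path:
            paths.append(path)
    return paths
```

One trade-off of stripping by character set: a path that legitimately ends in an underscore would lose it. That seems acceptable for generated temp files, but it is worth keeping in mind if user-supplied paths ever pass through the same parser.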