fix(gateway): deliver .json and .md files sent via MEDIA: tags #33089
ZMGID wants to merge 3 commits.
extract_media regex (base.py) and the two GatewayRunner tool-result MEDIA
regexes (run.py) listed common document extensions but were missing .json
and .md, so MEDIA:/path/x.{json,md} emitted by the agent was never
extracted and got delivered as raw text instead of as a native attachment.
The whitelist already contained txt/csv/docx/pdf/zip/etc., so other document
types worked; only json and md were affected.
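A minimal reproduction, assuming only the tool-result pattern shape from `run.py` that appears in this PR's diff: the pattern is compiled before and after adding `json|md` and tried on a few tags. The names `OLD_TOOL_RESULT_RE`, `NEW_TOOL_RESULT_RE`, and `extracted_path` are illustrative, not from the codebase.

```python
import re

# Tool-result MEDIA pattern before this PR: the extension list lacks json|md.
OLD_TOOL_RESULT_RE = re.compile(
    r'MEDIA:((?:/|~\/)\S+\.(?:png|jpe?g|gif|webp|'
    r'mp4|mov|avi|mkv|webm|ogg|opus|mp3|wav|m4a|'
    r'flac|epub|pdf|zip|rar|7z|docx?|xlsx?|pptx?|'
    r'txt|csv|apk|ipa))'
)

# Same pattern with json|md added, as in this PR.
NEW_TOOL_RESULT_RE = re.compile(
    r'MEDIA:((?:/|~\/)\S+\.(?:png|jpe?g|gif|webp|'
    r'mp4|mov|avi|mkv|webm|ogg|opus|mp3|wav|m4a|'
    r'flac|epub|pdf|zip|rar|7z|docx?|xlsx?|pptx?|'
    r'txt|csv|json|md|apk|ipa))'
)


def extracted_path(pattern, text):
    """Return the captured file path, or None if the tag is not recognized."""
    m = pattern.search(text)
    return m.group(1) if m else None


# The old pattern returns None for .json/.md; the new one returns the path.
for tag in ("MEDIA:/tmp/x.csv", "MEDIA:/tmp/x.json", "MEDIA:/tmp/x.md"):
    print(tag, extracted_path(OLD_TOOL_RESULT_RE, tag), extracted_path(NEW_TOOL_RESULT_RE, tag))
```

A non-match is what caused the literal `MEDIA:` text to be delivered instead of an attachment.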
Regression test for extract_media now accepting MEDIA:/path.{json,md}.
Pull request overview
Note: Copilot was unable to run its full agentic suite in this review.
Adds support for treating .json and .md file paths as extractable MEDIA: attachments, preventing them from being forwarded as raw text.
Changes:
- Extend the `MEDIA:` extension whitelist to include `json` and `md`
- Update two tool-related media path regexes to recognize `MEDIA:` `.json`/`.md` paths
- Add a regression test validating extraction/cleaning for `.json` and `.md`
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/gateway/test_platform_base.py | Adds regression coverage for .json/.md extraction via BasePlatformAdapter.extract_media. |
| gateway/run.py | Updates two regex patterns used to detect MEDIA: file paths to allow .json/.md. |
| gateway/platforms/base.py | Extends extract_media() regex whitelist to recognize .json/.md attachments. |
```diff
 # and quoted/backticked paths for LLM-formatted outputs.
 media_pattern = re.compile(
-    r'''[`"']?MEDIA:\s*(?P<path>`[^`\n]+`|"[^"\n]+"|'[^'\n]+'|(?:~/|/)\S+(?:[^\S\n]+\S+)*?\.(?:png|jpe?g|gif|webp|mp4|mov|avi|mkv|webm|ogg|opus|mp3|wav|m4a|flac|epub|pdf|zip|rar|7z|docx?|xlsx?|pptx?|txt|csv|apk|ipa)(?=[\s`"',;:)\]}]|$))[`"']?'''
+    r'''[`"']?MEDIA:\s*(?P<path>`[^`\n]+`|"[^"\n]+"|'[^'\n]+'|(?:~/|/)\S+(?:[^\S\n]+\S+)*?\.(?:png|jpe?g|gif|webp|mp4|mov|avi|mkv|webm|ogg|opus|mp3|wav|m4a|flac|epub|pdf|zip|rar|7z|docx?|xlsx?|pptx?|txt|csv|json|md|apk|ipa)(?=[\s`"',;:)\]}]|$))[`"']?'''
```

```diff
 r'MEDIA:((?:/|~\/)\S+\.(?:png|jpe?g|gif|webp|'
 r'mp4|mov|avi|mkv|webm|ogg|opus|mp3|wav|m4a|'
 r'flac|epub|pdf|zip|rar|7z|docx?|xlsx?|pptx?|'
-r'txt|csv|apk|ipa))',
+r'txt|csv|json|md|apk|ipa))',
```
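To make the extraction/cleaning step concrete, here is a sketch that applies the updated `media_pattern` verbatim. The wrapper `extract_media_sketch` is hypothetical: it returns the captured paths plus the text with the tags removed, which approximates, but is not, the real `BasePlatformAdapter.extract_media`.

```python
import re

# The extract_media pattern with json|md added, copied from the diff.
MEDIA_PATTERN = re.compile(
    r'''[`"']?MEDIA:\s*(?P<path>`[^`\n]+`|"[^"\n]+"|'[^'\n]+'|(?:~/|/)\S+(?:[^\S\n]+\S+)*?\.(?:png|jpe?g|gif|webp|mp4|mov|avi|mkv|webm|ogg|opus|mp3|wav|m4a|flac|epub|pdf|zip|rar|7z|docx?|xlsx?|pptx?|txt|csv|json|md|apk|ipa)(?=[\s`"',;:)\]}]|$))[`"']?'''
)


def extract_media_sketch(text):
    """Hypothetical helper: return (paths, text with MEDIA tags removed)."""
    paths = []
    for m in MEDIA_PATTERN.finditer(text):
        # Quoted/backticked paths keep their delimiters in the group; strip them.
        paths.append(m.group("path").strip("`\"'"))
    # Remove every matched tag, then trim the leftover whitespace at the ends.
    cleaned = MEDIA_PATTERN.sub("", text).strip()
    return paths, cleaned


print(extract_media_sketch("Here is the report MEDIA:/tmp/report.json"))
print(extract_media_sketch("Notes attached: `MEDIA:/tmp/notes.md`"))
```

Note that the quoted and backticked alternatives accept any path without checking its extension; only bare `/…` or `~/…` paths depend on the whitelist.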
Related: #30588 adds the same missing extensions and is itself a duplicate of #29609, which takes the preferred approach of dynamically deriving extensions from …
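The thread does not show where #29609 derives its extensions from, so the following only sketches the general idea under that assumption: keep the extensions in one collection and generate the regex alternation from it, so separate regexes cannot drift apart. `MEDIA_EXTENSIONS` and `build_media_regex` are hypothetical; the set simply mirrors this PR's whitelist with `jpe?g`, `docx?`, etc. expanded.

```python
import re

# Hypothetical single source of truth (here: this PR's whitelist, expanded).
MEDIA_EXTENSIONS = frozenset({
    "png", "jpg", "jpeg", "gif", "webp", "mp4", "mov", "avi", "mkv", "webm",
    "ogg", "opus", "mp3", "wav", "m4a", "flac", "epub", "pdf", "zip", "rar",
    "7z", "doc", "docx", "xls", "xlsx", "ppt", "pptx", "txt", "csv",
    "json", "md", "apk", "ipa",
})


def build_media_regex(extensions):
    """Compile a simplified MEDIA:<path> regex from a collection of extensions."""
    # Sort (longest first, then alphabetically) so the generated pattern is
    # deterministic regardless of set iteration order.
    ordered = sorted(extensions, key=lambda e: (-len(e), e))
    alternation = "|".join(re.escape(e) for e in ordered)
    # The lookahead requires the extension to end at a delimiter or end of text.
    return re.compile(
        r"MEDIA:((?:~/|/)\S+\.(?:" + alternation + r"))(?=[\s`\"',;:)\]}]|$)"
    )


rx = build_media_regex(MEDIA_EXTENSIONS)
print(rx.search("MEDIA:/tmp/x.json").group(1))
```

With this shape, covering `.html`/`.htm` (the gap raised in this thread) would be a change to the set rather than to each regex.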
Same gap as json/md: .html and .htm were missing from the extract_media regex (base.py) and the two GatewayRunner tool-result MEDIA regexes (run.py), so MEDIA:/path/file.html was delivered as raw text instead of a native attachment. extract_local_files already accepted .html/.htm for the platform-reply path, so this aligns the MEDIA-tag path with it.
Superseded by #34844, which consolidates this cluster. This PR widens the …

Closing as superseded. Thanks for surfacing and helping pin down this bug; it was part of getting the full fix right. See #34844.
Summary
`MEDIA:<path>` tags emitted by the agent are only turned into native file attachments if the path ends in a whitelisted extension. The whitelist listed common document/data types (`txt`, `csv`, `docx`, `xlsx`, `pdf`, `zip`, …) but was missing `.json` and `.md`. As a result, when the agent produced a JSON or Markdown file and sent `MEDIA:/path/file.json` / `MEDIA:/path/file.md`, the tag was never matched, so the literal text was delivered to the chat instead of the file. Every other document type worked, which made this look type-specific.
The extension list appears in three places (one `extract_media` regex plus two `GatewayRunner` tool-result MEDIA regexes); `json|md` is added to all three so behavior is consistent across the CLI/tool and platform-reply paths.
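A small drift check, sketched under the assumption that each regex can be probed with a synthetic `MEDIA:/tmp/file.<ext>` tag: for each extension it lists which patterns fail to match. `missing_extensions` and `tool_result_regex` are hypothetical helpers; the two alternations are the tool-result extension lists from this PR's diff, before and after the change.

```python
import re

# Tool-result extension alternation before and after this PR.
OLD_EXTS = ("png|jpe?g|gif|webp|mp4|mov|avi|mkv|webm|ogg|opus|mp3|wav|m4a|"
            "flac|epub|pdf|zip|rar|7z|docx?|xlsx?|pptx?|txt|csv|apk|ipa")
NEW_EXTS = OLD_EXTS.replace("txt|csv|", "txt|csv|json|md|")


def tool_result_regex(exts):
    """Build a regex with the same shape as the GatewayRunner tool-result pattern."""
    return re.compile(r"MEDIA:((?:/|~\/)\S+\.(?:" + exts + r"))")


def missing_extensions(patterns, extensions):
    """Map each extension to the names of patterns that fail on MEDIA:/tmp/file.<ext>."""
    report = {}
    for ext in extensions:
        tag = f"MEDIA:/tmp/file.{ext}"
        misses = [name for name, rx in patterns.items() if not rx.search(tag)]
        if misses:
            report[ext] = misses
    return report


report = missing_extensions(
    {"before": tool_result_regex(OLD_EXTS), "after": tool_result_regex(NEW_EXTS)},
    ["csv", "json", "md", "html"],
)
print(report)
```

Run against the before/after lists, it flags `json` and `md` (missing only before) and `html`, which neither list covers; that is the same kind of gap noted for `.html`/`.htm` in the thread.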
Changes
- `gateway/platforms/base.py`: add `json|md` to the `extract_media` extension whitelist
- `gateway/run.py`: add `json|md` to the two `GatewayRunner` MEDIA tool-result regexes
- `tests/gateway/test_platform_base.py`: regression test asserting `MEDIA:/…/.json` and `/…/.md` are extracted

Reproduced live on Telegram (personal WeChat is affected identically): with the fix, freshly produced `.json`/`.md` files are now delivered as native attachments. Files outside the cache allowlist and older than the recency window are still gated by `validate_media_delivery_path`, unchanged.

Test plan
- `pytest tests/gateway/test_platform_base.py::TestExtractMedia -q` → 15 passed
- `send_message(MEDIA:/tmp/x.json)` and `MEDIA:/tmp/x.md` to a Telegram chat deliver as document attachments
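The summary notes that delivery is still gated by `validate_media_delivery_path`. Its real signature and rules are not part of this PR, so the following only illustrates the rule as stated: a file is refused when it is both outside the cache allowlist and older than the recency window. `is_deliverable_sketch`, `allowed_roots`, and `recency_window_s` are hypothetical names (requires Python 3.9+ for `Path.is_relative_to`).

```python
import time
from pathlib import Path


def is_deliverable_sketch(path, allowed_roots, recency_window_s, now=None):
    """Hypothetical gate: allow files under an allowlisted root or modified recently."""
    p = Path(path).expanduser().resolve()
    # Files inside any allowlisted directory are always eligible.
    if any(p.is_relative_to(Path(r).expanduser().resolve()) for r in allowed_roots):
        return True
    try:
        mtime = p.stat().st_mtime
    except FileNotFoundError:
        return False  # nothing on disk to deliver
    now = time.time() if now is None else now
    # Outside the allowlist, only freshly produced files (within the window) pass.
    return now - mtime <= recency_window_s
```

Under this reading, a freshly written `/tmp/x.json` passes on recency alone, which matches the live Telegram result described above.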