fix(gateway): accept text/config extensions in MEDIA tag regex#32751
Closed
briandevans wants to merge 1 commit into
`BasePlatformAdapter.extract_media()` uses a regex whitelist of file extensions to recognize `MEDIA:<path>` tags emitted by the model. The whitelist covers images, video, audio, archives and office documents but omits common text/config file types (`.md`, `.json`, `.yaml`, `.yml`, `.toml`, `.log`). For those, the regex does not match, so the MEDIA tag survives into the cleaned message body and is rendered as raw text on every platform (WeChat, Feishu, Slack, Telegram, ...) instead of being routed to the file-attachment dispatch path.

Add `md|json|ya?ml|toml|log` to the extension alternation. The existing wrapping (`(?:~/|/)\S+...`, quote/backtick stripping, trailing-delimiter lookahead) all still applies. Parametrized tests cover the six new extensions, including the standard wrapping that already worked for `.png` etc.

Fixes Bug 1 of NousResearch#32601. Bug 2 (session-expiry retry path inside `send_weixin_direct()`) lives in `gateway/platforms/weixin.py` and is a distinct fix; left for a follow-up.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds support for attaching additional text/config file types via MEDIA: tags in the platform adapter.
Changes:
- Extend the `BasePlatformAdapter.extract_media` `MEDIA:` regex to include additional extensions (`md`, `json`, `yaml`/`yml`, `toml`, `log`).
- Add a parametrized test to ensure these extensions are parsed into media attachments and removed from the cleaned message body.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tests/gateway/test_platform_base.py | Adds coverage for extracting MEDIA: paths with new text/config extensions. |
| gateway/platforms/base.py | Expands the MEDIA: extraction regex to recognize additional text/config extensions. |
Comment on lines 2415 to 2417
      media_pattern = re.compile(
    -     r'''[`"']?MEDIA:\s*(?P<path>`[^`\n]+`|"[^"\n]+"|'[^'\n]+'|(?:~/|/)\S+(?:[^\S\n]+\S+)*?\.(?:png|jpe?g|gif|webp|mp4|mov|avi|mkv|webm|ogg|opus|mp3|wav|m4a|flac|epub|pdf|zip|rar|7z|docx?|xlsx?|pptx?|txt|csv|apk|ipa)(?=[\s`"',;:)\]}]|$))[`"']?'''
    +     r'''[`"']?MEDIA:\s*(?P<path>`[^`\n]+`|"[^"\n]+"|'[^'\n]+'|(?:~/|/)\S+(?:[^\S\n]+\S+)*?\.(?:png|jpe?g|gif|webp|mp4|mov|avi|mkv|webm|ogg|opus|mp3|wav|m4a|flac|epub|pdf|zip|rar|7z|docx?|xlsx?|pptx?|txt|csv|md|json|ya?ml|toml|log|apk|ipa)(?=[\s`"',;:)\]}]|$))[`"']?'''
      )
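The effect of this one-line change can be checked in isolation. The following is a minimal sketch, not the project's code: `build_pattern`, `BASE_EXTS`, and `TEXT_EXTS` are hypothetical names introduced here, and the new extensions are appended at the end of the alternation rather than inserted before `apk|ipa` as in the diff (the trailing-delimiter lookahead makes the ordering irrelevant for these inputs). The rest of the pattern is copied from the diff.

```python
import re

# Extensions accepted before the change (copied from the diff).
BASE_EXTS = (
    "png|jpe?g|gif|webp|mp4|mov|avi|mkv|webm|ogg|opus|mp3|wav|m4a|flac|"
    "epub|pdf|zip|rar|7z|docx?|xlsx?|pptx?|txt|csv|apk|ipa"
)
# Extensions added by this PR.
TEXT_EXTS = "md|json|ya?ml|toml|log"


def build_pattern(exts: str) -> "re.Pattern[str]":
    """Hypothetical helper: assemble the MEDIA pattern around an extension list."""
    return re.compile(
        r'''[`"']?MEDIA:\s*(?P<path>`[^`\n]+`|"[^"\n]+"|'[^'\n]+'|'''
        r'''(?:~/|/)\S+(?:[^\S\n]+\S+)*?\.(?:''' + exts + r''')'''
        r'''(?=[\s`"',;:)\]}]|$))[`"']?'''
    )


old = build_pattern(BASE_EXTS)
new = build_pattern(BASE_EXTS + "|" + TEXT_EXTS)

# Unquoted .md path: missed by the old pattern, captured by the new one.
assert old.search("MEDIA:/tmp/notes.md") is None
assert new.search("MEDIA:/tmp/notes.md").group("path") == "/tmp/notes.md"

# The trailing-delimiter lookahead keeps the comma out of the captured path.
match = new.search("see MEDIA:/tmp/config.yaml, thanks")
assert match.group("path") == "/tmp/config.yaml"

# Extensions that already worked are unaffected.
assert old.search("MEDIA:/tmp/shot.png").group("path") == "/tmp/shot.png"
```

Only unquoted paths depend on the extension list; the quoted and backtick-wrapped alternatives in the pattern accept any path content.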
Closing: superseded by @hanhan-tg's #32604, which is the better fix here.
Thanks @hanhan-tg, your PR is the right one to land. Closing this to keep the queue clean.
hanhan-tg pushed a commit to hanhan-tg/hermes-agent that referenced this pull request on May 27, 2026:
…sions Cover each new extension (md, json, yaml, yml, toml, log) with a parametrized test verifying extract_media correctly parses the MEDIA: tag and strips it from the cleaned content. Refs: NousResearch#32601, credit to @briandevans for the test template from NousResearch#32751.
What does this PR do?
`BasePlatformAdapter.extract_media()` recognizes `MEDIA:<path>` tags via a regex extension whitelist. The whitelist covers media/archive/office formats but omits common text and config types (`.md`, `.json`, `.yaml`, `.yml`, `.toml`, `.log`). Today, when a model emits `MEDIA:/tmp/notes.md`, the regex misses it and the raw `MEDIA:` text survives into the cleaned body, so WeChat/Feishu/Telegram/etc. all render the tag literally instead of routing the file through their document-upload paths.
This PR adds `md|json|ya?ml|toml|log` to the regex alternation. All other parts of the pattern (path wrapping, quote/backtick stripping, trailing-delimiter lookahead, `[[audio_as_voice]]`/`[[as_document]]` directives) are unchanged.
Related Issue
Fixes #32601 (Bug 1 only; see Positioning below).
Type of Change
Changes Made
- `gateway/platforms/base.py`: extend the MEDIA tag regex alternation with `md|json|ya?ml|toml|log` (single-line change).
- `tests/gateway/test_platform_base.py`: add a parametrized test (`TestExtractMedia.test_media_tag_accepts_text_config_extensions`) covering each new extension.
How to Test
Without the fix, `BasePlatformAdapter.extract_media('MEDIA:/tmp/notes.md')` returns `([], 'MEDIA:/tmp/notes.md')`: the tag leaks as text.
With the fix, it returns `([('/tmp/notes.md', False)], '')`.
```bash
uv run --with pytest --with pytest-xdist --with pytest-asyncio python3 -m pytest tests/gateway/test_platform_base.py -v -k TestExtractMedia
```
20 passed (6 new parametrized cases for md/json/yaml/yml/toml/log + 14 existing).
Adjacent suites (`tests/gateway/test_media_extraction.py`, `test_weixin.py`, `test_wecom.py`, `test_feishu.py`, `test_dingtalk.py`) also pass: 326 passed, 45 skipped.
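For readers without the repository checked out, the contract those six new cases exercise can be sketched with a stand-in. Everything below is an assumption for illustration: `extract_media_sketch` and `_SKETCH_PATTERN` are hypothetical names, the pattern is deliberately simplified (unquoted `/` or `~/` paths only, a shortened extension list), and the second tuple element is hard-coded to `False` to mirror the return shape shown above. A plain loop stands in for the pytest parametrization.

```python
import re
from typing import List, Tuple

# Simplified stand-in for the real pattern; NOT the regex from gateway/platforms/base.py.
_SKETCH_PATTERN = re.compile(
    r"MEDIA:\s*(?P<path>(?:~/|/)\S+?\.(?:png|txt|md|json|ya?ml|toml|log))(?=\s|$)"
)


def extract_media_sketch(content: str) -> Tuple[List[Tuple[str, bool]], str]:
    """Return ([(path, flag), ...], cleaned_text) in the shape shown above."""
    # The flag is always False here; the real method may set it from directives.
    media = [(m.group("path"), False) for m in _SKETCH_PATTERN.finditer(content)]
    # Remove every recognized tag so it cannot leak into the rendered message.
    cleaned = _SKETCH_PATTERN.sub("", content).strip()
    return media, cleaned


# One case per new extension, mirroring the parametrized test's intent.
for ext in ["md", "json", "yaml", "yml", "toml", "log"]:
    path = f"/tmp/sample.{ext}"
    media, cleaned = extract_media_sketch(f"MEDIA:{path}")
    assert media == [(path, False)]
    assert cleaned == ""

# Surrounding text is preserved once the tag is stripped.
media, cleaned = extract_media_sketch("Report attached.\nMEDIA:/tmp/run.log")
assert media == [("/tmp/run.log", False)]
assert cleaned == "Report attached."
```

The real test calls `BasePlatformAdapter.extract_media` directly; the stand-in only illustrates the two things each case checks, namely that the path is captured and that the tag is removed from the body.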
Checklist
Code
Documentation & Housekeeping
Related / Positioning
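Issue #32601 lists two bugs:
- Bug 1: the MEDIA tag regex in `BasePlatformAdapter.extract_media()` omits text/config extensions (fixed by this PR).
- Bug 2: the session-expiry retry path inside `send_weixin_direct()` in `gateway/platforms/weixin.py` (a distinct fix, not covered here).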
Keeping the two as separate PRs lets the regex change land independently of any WeChat session/retry semantics review. Happy to widen this PR to cover Bug 2 if preferred, or leave Bug 2 to another contributor.
For New Skills
N/A.