fix: add missing extensions (.md, .yaml, .json, etc.) to extract_media() MEDIA: regex#30588
Closed
bunnyfu wants to merge 1 commit into
…a() MEDIA: regex
The MEDIA:<path> tag regex in extract_media() was missing several common document/data extensions that extract_local_files() already recognizes: .md, .yaml/.yml, .json, .xml, .html/.htm, .odt, .rtf, .ods, .tsv
This caused MEDIA:/path/to/file.md (and similar) to be silently ignored: the tag was never parsed, so no file upload was attempted. The file content was either lost or the raw MEDIA: tag leaked into the message.
The fix aligns extract_media()'s extension list with extract_local_files() so both methods support the same set of document types.
This was referenced May 23, 2026
Superseded by #34844, which consolidates this cluster. This PR widens the
Closing as superseded. Thanks for surfacing and helping pin down this bug; it was part of getting the full fix right. See #34844.
Problem
`extract_media()` in `gateway/platforms/base.py` uses a regex to parse `MEDIA:<path>` tags from agent responses. The regex has a hardcoded extension whitelist.
Several common document/data formats are missing, most notably `.md`, `.yaml`/`.yml`, `.json`, `.xml`, `.html`/`.htm`, `.odt`, `.rtf`, `.ods`, and `.tsv`.
This causes `MEDIA:/path/to/file.md` to be silently ignored: the tag is never parsed, no upload is attempted, and the raw `MEDIA:` tag may leak into the user-visible message.
Notably, the sister method `extract_local_files()` in the same file already includes all of these extensions in its `_LOCAL_MEDIA_EXTS` tuple (line ~2204). The two lists should be consistent.
Impact
- `.md` file attachments fail on Matrix (and all other platforms) when sent via the `send_message` tool with a `MEDIA:` tag
- `.yaml`, `.json`, `.xml`, `.html`, and other common data/doc formats
- `extract_local_files()` (bare path detection in gateway responses) works fine for these extensions; only the explicit `MEDIA:` tag path is broken
Fix
Added the missing extensions to the `extract_media()` regex to align with `extract_local_files()`.
Testing
Verified the regex change matches `MEDIA:/tmp/report.md`, `MEDIA:/tmp/data.yaml`, `MEDIA:/tmp/config.json`, etc., where previously they were silently dropped.
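For reviewers who want to reproduce the check, here is a minimal sketch of the idea. It is not the code from `gateway/platforms/base.py`: the pattern shape, the helper names (`build_media_regex`, `extract_media_paths`), and the stand-in "old" whitelist `(".png", ".pdf")` are all assumptions made for illustration. Only the list of added extensions comes from this PR.

```python
import re

# Extensions this PR reports as missing from the MEDIA: regex.
ADDED_EXTS = (
    ".md", ".yaml", ".yml", ".json", ".xml", ".html", ".htm",
    ".odt", ".rtf", ".ods", ".tsv",
)

# Hypothetical stand-in for the pre-existing whitelist (the real one is not shown here).
OLD_EXTS = (".png", ".pdf")


def build_media_regex(exts):
    """Build a MEDIA:<path> pattern from one extension list (illustrative shape only)."""
    # Try longer extensions first so ".html" is preferred over ".htm".
    ordered = sorted(exts, key=len, reverse=True)
    alternation = "|".join(re.escape(e.lstrip(".")) for e in ordered)
    # The path is a run of non-whitespace ending in a listed extension,
    # followed by whitespace or the end of the string.
    return re.compile(rf"MEDIA:(\S+?\.(?:{alternation}))(?=\s|$)", re.IGNORECASE)


def extract_media_paths(text, pattern):
    """Return every path captured by the MEDIA: pattern, in order of appearance."""
    return pattern.findall(text)


old_pattern = build_media_regex(OLD_EXTS)
new_pattern = build_media_regex(OLD_EXTS + ADDED_EXTS)

message = "Report ready MEDIA:/tmp/report.md and MEDIA:/tmp/data.yaml"
print(extract_media_paths(message, old_pattern))  # tags are not recognized
print(extract_media_paths(message, new_pattern))  # tags are parsed
```

Building the pattern from a single tuple is one way to keep the two extension lists from drifting apart again. Whether the real code can share `_LOCAL_MEDIA_EXTS` directly depends on details not shown in this PR.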