fix: add missing extensions (.md, .yaml, .json, etc.) to extract_media() MEDIA: regex #30588

Closed
bunnyfu wants to merge 1 commit into NousResearch:main from bunnyfu:fix/extract-media-missing-extensions

bunnyfu commented May 22, 2026

Problem

extract_media() in gateway/platforms/base.py uses a regex to parse MEDIA:<path> tags from agent responses. The regex has a hardcoded extension whitelist:

png|jpe?g|gif|webp|mp4|mov|avi|mkv|webm|ogg|opus|mp3|wav|m4a|flac|epub|pdf|zip|rar|7z|docx?|xlsx?|pptx?|txt|csv|apk|ipa

Several common document/data formats are missing — most notably .md, .yaml/.yml, .json, .xml, .html/.htm, .odt, .rtf, .ods, and .tsv.

This causes MEDIA:/path/to/file.md to be silently ignored: the tag is never parsed, no upload is attempted, and the raw MEDIA: tag may leak into the user-visible message.

Notably, the sister method extract_local_files() in the same file already includes all of these extensions in its _LOCAL_MEDIA_EXTS tuple (line ~2204). The two lists should be consistent.
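
The silent-drop behavior described above can be reproduced with a minimal sketch. The extension alternation is copied from this description, but the surrounding pattern shape and the `find_media` helper are assumptions for illustration; the actual regex in gateway/platforms/base.py may differ.

```python
import re

# Extension alternation copied from the PR description.
OLD_EXTS = (
    r"png|jpe?g|gif|webp|mp4|mov|avi|mkv|webm|ogg|opus|mp3|wav|m4a|flac|"
    r"epub|pdf|zip|rar|7z|docx?|xlsx?|pptx?|txt|csv|apk|ipa"
)

# Hypothetical pattern shape (assumption): a MEDIA: prefix, then a path
# that must end in one of the allowlisted extensions.
MEDIA_RE = re.compile(r"MEDIA:\s*(\S+\.(?:" + OLD_EXTS + r"))(?!\w)", re.IGNORECASE)


def find_media(text):
    """Return every path captured from a MEDIA:<path> tag in text."""
    return MEDIA_RE.findall(text)


# An allowlisted extension is captured; a missing one yields no match at all,
# so the caller never learns that a tag was present.
png_hits = find_media("Here is the chart MEDIA:/tmp/chart.png")
md_hits = find_media("MEDIA:/tmp/report.md")
```

With an allowlist embedded in the pattern, an unlisted extension is indistinguishable from "no tag present", which is why no upload is attempted and no error is raised.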

Impact

  • .md file attachments fail on Matrix (and all other platforms) when sent via the send_message tool with a MEDIA: tag
  • Same issue for .yaml, .json, .xml, .html, and other common data/doc formats
  • extract_local_files() (bare path detection in gateway responses) works fine for these extensions — only the explicit MEDIA: tag path is broken

Fix

Added the missing extensions to the extract_media() regex to align with extract_local_files():

md|ya?ml|json|xml|html?|odt|rtf|ods|tsv

Testing

Verified the regex change matches MEDIA:/tmp/report.md, MEDIA:/tmp/data.yaml, MEDIA:/tmp/config.json, etc. where previously they were silently dropped.
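
A check along these lines could look like the following sketch. The two extension strings come from this description; the pattern shape, `build_media_re`, and `matched_paths` are hypothetical helpers, not code from the repository.

```python
import re

# Original allowlist and the additions, both copied from the PR description.
OLD_EXTS = (
    "png|jpe?g|gif|webp|mp4|mov|avi|mkv|webm|ogg|opus|mp3|wav|m4a|flac|"
    "epub|pdf|zip|rar|7z|docx?|xlsx?|pptx?|txt|csv|apk|ipa"
)
ADDED_EXTS = "md|ya?ml|json|xml|html?|odt|rtf|ods|tsv"


def build_media_re(exts):
    # Hypothetical pattern shape (assumption); the real regex may differ.
    return re.compile(r"MEDIA:\s*(\S+\.(?:" + exts + r"))(?!\w)", re.IGNORECASE)


old_re = build_media_re(OLD_EXTS)
new_re = build_media_re(OLD_EXTS + "|" + ADDED_EXTS)

# The three tags named in the Testing section.
SAMPLES = ["MEDIA:/tmp/report.md", "MEDIA:/tmp/data.yaml", "MEDIA:/tmp/config.json"]


def matched_paths(pattern, samples):
    """Return the captured path for each sample the pattern matches."""
    return [m.group(1) for s in samples if (m := pattern.search(s))]


before = matched_paths(old_re, SAMPLES)  # nothing matches the old list
after = matched_paths(new_re, SAMPLES)   # all three match the widened list
```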

…a() MEDIA: regex

The MEDIA:<path> tag regex in extract_media() was missing several common
document/data extensions that extract_local_files() already recognizes:
.md, .yaml/.yml, .json, .xml, .html/.htm, .odt, .rtf, .ods, .tsv

This caused MEDIA:/path/to/file.md (and similar) to be silently ignored
— the tag was never parsed, so no file upload was attempted. The file
content was either lost or the raw MEDIA: tag leaked into the message.

The fix aligns extract_media()'s extension list with extract_local_files()
so both methods support the same set of document types.

alt-glitch added the type/bug (Something isn't working), P2 Medium (degraded but workaround exists), and comp/gateway (Gateway runner, session dispatch, delivery) labels May 22, 2026
@alt-glitch (Collaborator)

Duplicate of #29609, which takes the preferred approach of dynamically deriving the extension set from SUPPORTED_DOCUMENT_TYPES rather than hardcoding additions. Also overlaps with #22492.
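
As a rough illustration of the "derive rather than hardcode" idea, the sketch below builds the regex alternation from a shared collection of extensions. `DOC_EXTS` is a hypothetical stand-in: the contents and type of the real SUPPORTED_DOCUMENT_TYPES are not shown in this thread, and the helper names and pattern shape are assumptions as well.

```python
import re

# Hypothetical stand-in for a shared extension collection (assumption).
DOC_EXTS = (".md", ".yaml", ".yml", ".json", ".pdf", ".txt")


def build_ext_alternation(exts):
    """Turn dotted extensions into a regex alternation, longest first."""
    names = sorted({e.lstrip(".").lower() for e in exts}, key=lambda s: (-len(s), s))
    return "|".join(re.escape(n) for n in names)


def build_media_re(exts):
    # Hypothetical pattern shape (assumption); the real regex may differ.
    return re.compile(
        r"MEDIA:\s*(\S+\.(?:" + build_ext_alternation(exts) + r"))(?!\w)",
        re.IGNORECASE,
    )


media_re = build_media_re(DOC_EXTS)
hits = media_re.findall("MEDIA:/tmp/a.yml and MEDIA:/tmp/b.exe")
```

Because the pattern is generated from one collection, adding an extension there updates every consumer at once, so the two lists cannot drift apart the way the hardcoded ones did.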

@teknium1 (Contributor)

Superseded by #34844, which consolidates this cluster.

This PR widens the extract_media extension allowlist, which is the right direction — but on its own it leaves the unconditional MEDIA:\s*\S+ strip in place, so a MEDIA: tag with any extension still outside the (now wider) list keeps getting deleted from the body before extract_local_files can pick up the bare path. #34844 fixes both halves: it unifies the two extractors onto a single shared extension set (MEDIA_DELIVERY_EXTS) AND replaces the loose strip with an extension-anchored one, so an unknown-extension path survives in the text instead of vanishing.
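
The difference between the two strip strategies can be sketched as follows. The loose pattern is the one quoted in this comment; `EXTS` is a small hypothetical subset (the contents of MEDIA_DELIVERY_EXTS are not shown here), and the anchored pattern and `strip_tags` helper are assumptions, not the code from #34844.

```python
import re

# Hypothetical subset of a shared extension set (assumption).
EXTS = "png|pdf|md|json"

# Unconditional strip, as quoted in the comment: removes any MEDIA: token.
LOOSE_STRIP = re.compile(r"MEDIA:\s*\S+")

# Extension-anchored strip (assumed shape): removes a tag only when its
# path ends in a known extension.
ANCHORED_STRIP = re.compile(r"MEDIA:\s*\S+\.(?:" + EXTS + r")(?!\w)", re.IGNORECASE)


def strip_tags(pattern, text):
    """Remove every match of pattern from text and trim outer whitespace."""
    return pattern.sub("", text).strip()


text = "See MEDIA:/tmp/model.gguf for details"
loose_result = strip_tags(LOOSE_STRIP, text)        # the path is deleted
anchored_result = strip_tags(ANCHORED_STRIP, text)  # the path survives
```

With the loose strip, the unknown-extension path is gone before any later pass can see it; with the anchored strip, the text is left untouched, so a bare-path detector still has something to work with.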

Closing as superseded — thanks for surfacing and helping pin down this bug; it was part of getting the full fix right. See #34844.

teknium1 closed this May 29, 2026