fix(gateway): unify media extension list between extract_media and extract_local_files #31150

Closed
ilonagaja509-glitch wants to merge 1 commit into NousResearch:main from ilonagaja509-glitch:fix/media-extract-extensions

@ilonagaja509-glitch

Contributor

The media_pattern regex in extract_media() had a hardcoded extension list that drifted from the _LOCAL_MEDIA_EXTS tuple in extract_local_files(). Extensions like .html, .md, .json, .svg, .tar, .gz were missing from the MEDIA: directive regex, causing silent data loss when agents emitted explicit MEDIA:/path/file.html tags.

Fix by extracting the extension list to a module-level _MEDIA_EXTS constant and using it in both places. The regex is now built dynamically from the same source of truth.
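A minimal sketch of the approach described above, not the actual patch: one module-level tuple feeds both the MEDIA: directive regex and the suffix check used for local files. The tuple contents, the exact regex shape, and the helper names `extract_media_paths` and `has_media_ext` are assumptions made for illustration; only the names `_MEDIA_EXTS` and `media_pattern` come from this description.

```python
import re

# Assumed contents: the real _MEDIA_EXTS tuple is not shown in this PR.
_MEDIA_EXTS = (
    ".png", ".jpg", ".mp3", ".mp4", ".pdf",
    ".html", ".md", ".json", ".svg", ".tar", ".gz",
)

# Build the alternation from the tuple so the regex cannot drift from it.
# re.escape guards against extensions that contain regex metacharacters.
_EXT_ALTERNATION = "|".join(re.escape(ext.lstrip(".")) for ext in _MEDIA_EXTS)

# Hypothetical pattern shape: "MEDIA:" followed by a whitespace-free path
# that ends in one of the known extensions.
media_pattern = re.compile(
    rf"MEDIA:\s*(?P<path>\S+\.(?:{_EXT_ALTERNATION}))(?=\s|$)",
    re.IGNORECASE,
)


def extract_media_paths(text: str) -> list[str]:
    """Return every path named by a MEDIA: directive in text (hypothetical helper)."""
    return [m.group("path") for m in media_pattern.finditer(text)]


def has_media_ext(path: str) -> bool:
    """Suffix check for bare local paths, driven by the same tuple (hypothetical helper)."""
    return path.lower().endswith(_MEDIA_EXTS)
```

With this layout, adding an extension to the tuple updates both code paths at once, which is the property the hardcoded regex lacked.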

Fixes #31137

Checklist

  • I have read the CONTRIBUTING.md guidelines
  • Code compiles without warnings
  • Tests pass locally
  • New tests added for the fix

@alt-glitch added the following labels on May 23, 2026: type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), comp/gateway (Gateway runner, session dispatch, delivery), duplicate (This issue or pull request already exists)
@alt-glitch

Collaborator

Duplicate of #31138 (same fix, same files). Both are part of a saturated cluster: #29609 (dynamic allowlist sync from SUPPORTED_DOCUMENT_TYPES, the preferred approach), #22492, and #30588. All of them address the extract_media() regex/whitelist drifting from _LOCAL_MEDIA_EXTS.

Development

Successfully merging this pull request may close these issues.

fix(gateway): MEDIA: directive silently drops .html and other extensions due to regex/whitelist drift

2 participants