fix(gateway): align MEDIA: regex whitelist with SUPPORTED_DOCUMENT_TYPES #32995
Closed
echo26max wants to merge 1 commit into
The MEDIA:<path> regex in extract_media() carried its own hard-coded
extension whitelist that drifted out of sync with SUPPORTED_DOCUMENT_TYPES
(line ~1023). Extensions registered as deliverable documents but missing
from the regex were silently stripped from response text, with no WARNING
logged — the gateway just never saw a media directive.
Surfaced when an agent emitted MEDIA:/path/to/report.md on Telegram. The
.md path vanished from the rendered message and no document was attached;
gateway.log showed 'Sending response (M chars)' where M < 'response=N
chars' but no 'Skipping unsafe MEDIA directive' warning, because the path
never reached validate_media_delivery_path().
Add the missing extensions to the regex (md, log, json, xml, yaml/yml,
toml, ini, cfg, ts, py, sh) and a comment documenting the alignment
contract with SUPPORTED_DOCUMENT_TYPES so the next person who edits
either side knows to update the other.
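A minimal sketch of what the widened alternation might look like. The pattern below is a hypothetical stand-in, not the gateway's actual regex (which this PR text does not reproduce); `MEDIA_RE` and `find_media_paths` are illustrative names, and entries that were already on the whitelist are omitted because they are not listed here.

```python
import re

# Hypothetical stand-in for the pattern used by extract_media().
# ALIGNMENT CONTRACT: every extension in SUPPORTED_DOCUMENT_TYPES must also
# appear in this alternation; update both sides together.
MEDIA_RE = re.compile(
    r"MEDIA:(?P<path>\S+\.(?:md|log|json|xml|ya?ml|toml|ini|cfg|ts|py|sh))(?=\s|$)"
)


def find_media_paths(text: str) -> list[str]:
    """Return every path carried by a MEDIA:<path> directive in text."""
    return [m.group("path") for m in MEDIA_RE.finditer(text)]
```

The trailing lookahead keeps `ts` from matching a `.tsx` path and `py` from matching a longer suffix, so the alternation order does not matter in this sketch.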
Tests:
* Parametrize over every newly-allowed extension.
* Add a regression test for the original bug report (.md path with
leading non-ASCII context line).
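The two test bullets could be sketched roughly as follows. This uses the standard library's `unittest` with `subTest` so that it runs without third-party packages; the PR's own tests may well use a different framework. `extract_paths` and `_MEDIA_RE` are hypothetical stand-ins for `extract_media()` and its regex, and the non-ASCII context line is an invented example.

```python
import re
import unittest

# Hypothetical stand-in for the gateway's MEDIA: pattern (assumption).
_MEDIA_RE = re.compile(
    r"MEDIA:(\S+\.(?:md|log|json|xml|ya?ml|toml|ini|cfg|ts|py|sh))(?=\s|$)"
)


def extract_paths(text):
    """Hypothetical stand-in for extract_media(): return the matched paths."""
    return _MEDIA_RE.findall(text)


# The extensions named in the commit message, with yaml/yml spelled out.
NEW_EXTENSIONS = [
    "md", "log", "json", "xml", "yaml", "yml",
    "toml", "ini", "cfg", "ts", "py", "sh",
]


class TestNewExtensions(unittest.TestCase):
    def test_each_new_extension_is_recognized(self):
        # One sub-test per extension, so a failure names the offending one.
        for ext in NEW_EXTENSIONS:
            with self.subTest(ext=ext):
                path = f"/tmp/report.{ext}"
                self.assertEqual(extract_paths(f"MEDIA:{path}"), [path])

    def test_markdown_after_non_ascii_context_line(self):
        # Mirrors the shape of the original report: context line, then directive.
        text = "Отчёт готов\nMEDIA:/path/to/report.md"
        self.assertEqual(extract_paths(text), ["/path/to/report.md"])
```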
This was referenced May 27, 2026
Superseded by #34844, which consolidates this cluster. This PR widens the […] Closing as superseded. Thanks for surfacing and helping pin down this bug; it was part of getting the full fix right. See #34844.
Summary
The MEDIA:<path> regex in extract_media() (gateway/platforms/base.py) carries its own hard-coded extension whitelist that has drifted out of sync with SUPPORTED_DOCUMENT_TYPES. Extensions registered as deliverable documents but missing from the regex are silently stripped from response text; no WARNING is logged because the path never reaches validate_media_delivery_path().

This PR aligns the two whitelists and adds a regression test.
Reproduction (before the fix)
Same failure mode for .json, .yaml, .toml, .ini, .cfg, .log, .ts, .py, .sh, .xml: all listed in SUPPORTED_DOCUMENT_TYPES (line ~1023) but absent from the regex (line ~2416).

Real-world impact
Surfaced when delivering a .md research report on Telegram. The MEDIA: line vanished from the rendered message; no document was attached; gateway.log showed:

…with no 'Skipping unsafe MEDIA directive path outside allowed roots' warning that normally accompanies allowlist rejection. The 206-char delta corresponds to the three stripped MEDIA: lines.

Fix
Add the missing extensions to the regex alternation:

md | log | json | xml | ya?ml | toml | ini | cfg | ts | py | sh

…and a comment block above the pattern documenting the alignment contract so the next person who edits either side knows to update the other.
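The alignment contract could also be checked mechanically rather than only documented. The sketch below is an illustration under stated assumptions: it assumes SUPPORTED_DOCUMENT_TYPES is a mapping keyed by dotted extension, uses a hypothetical stand-in regex, and introduces an invented helper `unmatched_extensions`. None of these are taken from the gateway code.

```python
import re

# Assumed shape: dotted extension -> MIME type. The real mapping may differ.
SUPPORTED_DOCUMENT_TYPES = {
    ".md": "text/markdown",
    ".json": "application/json",
    ".yaml": "application/yaml",
    ".log": "text/plain",
}

# Hypothetical stand-in regex; ".log" is deliberately missing to show drift.
MEDIA_RE = re.compile(r"MEDIA:(\S+\.(?:md|json|ya?ml))(?=\s|$)")


def unmatched_extensions(doc_types, pattern):
    """Return registered extensions that the MEDIA: pattern fails to match."""
    return sorted(
        ext for ext in doc_types
        if not pattern.search(f"MEDIA:/tmp/probe{ext}")
    )


# A non-empty result means the two whitelists have drifted apart.
drift = unmatched_extensions(SUPPORTED_DOCUMENT_TYPES, MEDIA_RE)
```

A test asserting that `drift` is empty would fail as soon as someone registers a document type without extending the regex.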
Tests
* Parametrized test (test_media_tag_recognizes_document_extensions) covers every newly-allowed extension.
* Regression test (test_media_tag_recognizes_markdown_with_quoted_path): .md path with a leading non-ASCII context line, mirroring the original failure.

Verification
Reverted the gateway/platforms/base.py hunk and re-ran the new tests against the un-fixed code: 13 failures, exactly as expected (12 parametrized + 1 real-world). Restored the fix → 27 passed in TestExtractMedia. Full tests/gateway/test_platform_base.py suite: 114 passed, 2 skipped.

Scope
Minimal, additive: only adds extensions already enumerated as deliverable per SUPPORTED_DOCUMENT_TYPES. Does not touch extract_local_files() (which already supports these via its own broader list), validate_media_delivery_path(), or the document dispatch path.

A follow-up PR could refactor the regex to construct its alternation from SUPPORTED_DOCUMENT_TYPES programmatically, eliminating the drift class entirely. That's deliberately out of scope here to keep this fix small and reviewable.
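One way the suggested follow-up might look, as a rough sketch only. It assumes SUPPORTED_DOCUMENT_TYPES is a mapping keyed by dotted extension; `build_media_regex` and the surrounding pattern are hypothetical and do not reflect the gateway's actual regex.

```python
import re

# Assumed shape: dotted extension -> MIME type. The real mapping may differ.
SUPPORTED_DOCUMENT_TYPES = {
    ".md": "text/markdown",
    ".yaml": "application/yaml",
    ".yml": "application/yaml",
    ".json": "application/json",
}


def build_media_regex(doc_types):
    """Compile a MEDIA:<path> pattern whose alternation comes from doc_types."""
    # Sort for a deterministic pattern (longest first, then alphabetical);
    # re.escape guards against extensions containing regex metacharacters.
    exts = sorted((ext.lstrip(".") for ext in doc_types), key=lambda e: (-len(e), e))
    alternation = "|".join(re.escape(e) for e in exts)
    return re.compile(rf"MEDIA:(\S+\.(?:{alternation}))(?=\s|$)")


MEDIA_RE = build_media_regex(SUPPORTED_DOCUMENT_TYPES)
```

Because the alternation is derived from the mapping, registering a new document type would extend the regex automatically, which is the drift class the follow-up aims to eliminate.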