fix(gateway): match extract_media extension list to extract_local_files #31138
Closed
Razultull wants to merge 1 commit into
The MEDIA: directive parser and the bare-path detector use independent extension whitelists. extract_local_files accepts .html/.md/.svg/.json and many other doc/data/web extensions; extract_media did not.

When the model returned 'MEDIA:/abs/path/file.html', the regex skipped it, the file was never attached, and the parent send_message tool still reported success because the text portion shipped fine: silent data loss with no log line. The (?<![/:\\w.]) anti-URL guard on extract_local_files also disqualifies the path (the ':' in 'MEDIA:' defeats it), so the path falls between both detectors.

Extend the extract_media regex extension list to be a superset of _LOCAL_MEDIA_EXTS and add a NOTE comment binding the two lists together so future drift is caught in review.

Regression test: TestExtractMediaExtensionCoverage in tests/gateway/test_platform_base.py, with 43 parametrized extensions plus the literal user-reported failure path.
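To make the "falls between both detectors" failure concrete, here is a minimal sketch using simplified stand-in patterns. These are not the actual regexes in gateway/platforms/base.py: only the `(?<![/:\w.])` guard is taken from the description above, while the names `MEDIA_RE_NARROW`, `MEDIA_RE_WIDE`, and `BARE_PATH_RE`, the shortened extension lists, and the path shape are assumptions made for illustration.

```python
import re

# Hypothetical, shortened extension lists; the real ones are longer.
NARROW_EXTS = ("mp3", "pdf", "docx", "png")
BROAD_EXTS = NARROW_EXTS + ("html", "md", "svg", "json")


def alternation(exts):
    """Build a regex alternation such as 'mp3|pdf|docx' from extension names."""
    return "|".join(re.escape(e) for e in exts)


# Stand-in for the MEDIA: directive parser before the fix (narrow list).
MEDIA_RE_NARROW = re.compile(r"MEDIA:(/\S+\.(?:%s))\b" % alternation(NARROW_EXTS))

# Stand-in for the same parser after the fix (list widened to the broad set).
MEDIA_RE_WIDE = re.compile(r"MEDIA:(/\S+\.(?:%s))\b" % alternation(BROAD_EXTS))

# Stand-in for the bare-path detector; the lookbehind is the guard quoted above.
BARE_PATH_RE = re.compile(r"(?<![/:\w.])(/\S+\.(?:%s))\b" % alternation(BROAD_EXTS))

text = "Report ready. MEDIA:/abs/path/file.html"

media_hits_before = MEDIA_RE_NARROW.findall(text)  # .html is not in the narrow list
bare_hits = BARE_PATH_RE.findall(text)             # ':' before '/' trips the guard
media_hits_after = MEDIA_RE_WIDE.findall(text)     # widened list picks the path up

print(media_hits_before, bare_hits, media_hits_after)
```

In this sketch both of the first two lists come back empty, which mirrors the silent drop: neither detector claims the path, and nothing in the flow raises an error. The same path written as bare text after a space would pass the guard, so only the `MEDIA:` form is affected.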
This was referenced May 23, 2026
Superseded by #34844, which consolidates this cluster. This PR widens the
Closing as superseded; thanks for surfacing and helping pin down this bug; it was part of getting the full fix right. See #34844.
Summary
gateway/platforms/base.py had two file-extension whitelists that drifted out of sync:

- extract_media() parses explicit MEDIA:<path> directives.
- extract_local_files() finds bare local paths in response text.

extract_local_files's _LOCAL_MEDIA_EXTS includes .html, .htm, .md, .svg, .bmp, .tiff, .json, .xml, .yaml, .yml, .tsv, .ods, .odp, .odt, .rtf, .key, .tar, .gz, .tgz, .bz2, .xz. The regex inside extract_media did not. A MEDIA:/abs/path/file.html directive was silently dropped; the parent send_message tool still reported success: true because the text portion shipped fine.

Closes #31137.
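One way to see this kind of drift directly is to probe the directive regex with each extension from the bare-path set and list the ones it rejects. The sketch below is illustrative only: `LOCAL_MEDIA_EXTS`, `MEDIA_RE`, and `missing_extensions` are hypothetical stand-ins with shortened lists, not the definitions in gateway/platforms/base.py.

```python
import re

# Hypothetical stand-ins: the real _LOCAL_MEDIA_EXTS and the extract_media regex
# live in gateway/platforms/base.py and are longer than what is shown here.
LOCAL_MEDIA_EXTS = {".pdf", ".mp3", ".html", ".md", ".svg", ".json"}
MEDIA_RE = re.compile(r"MEDIA:(\S+\.(?:pdf|mp3|docx))\b")


def missing_extensions(media_re, exts):
    """Return the extensions in `exts` that a MEDIA: directive would not match."""
    return sorted(
        ext for ext in exts
        if media_re.search(f"MEDIA:/tmp/probe{ext}") is None
    )


drift = missing_extensions(MEDIA_RE, LOCAL_MEDIA_EXTS)
print(drift)  # ['.html', '.json', '.md', '.svg']
```

An empty result would indicate that the directive regex accepts everything the bare-path set does, which is the superset property this PR aims for.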
Changes
- gateway/platforms/base.py (+4): extension list in the extract_media regex is now a superset of _LOCAL_MEDIA_EXTS. Added a NOTE comment binding the two together so future drift is caught in review.
- tests/gateway/test_platform_base.py (+36): TestExtractMediaExtensionCoverage, a 43-case parametrized regression test covering every previously-missing extension + every previously-working one, plus the literal user-reported failure path (MEDIA:/root/.hermes/media_cache/00-Visual-Report.html).

Validation
- MEDIA:/tmp/report.html
- MEDIA:/tmp/notes.md
- MEDIA:/tmp/icon.svg
- MEDIA:/tmp/data.json
- MEDIA:/tmp/foo.mp3, .pdf, .docx, etc.
- send_message success reporting on dropped files: success: true (misleading) before; success: true and file actually ships after

Targeted suite:
Broader media-related suite (no regressions):
Tested manually on Telegram (Ubuntu, Python 3.11, gateway under user systemd unit) — same path that produced the silent drop now ships the document natively.
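The shape of the regression test described under Changes can be sketched as follows. This is a hedged approximation, not the test in tests/gateway/test_platform_base.py: the class name and the user-reported path come from this PR, but `EXTS`, `MEDIA_RE`, and `extract_media_paths` are hypothetical stand-ins, the extension list is shortened, and `unittest.subTest` is used here in place of whatever parametrization the real suite uses.

```python
import re
import unittest

# Hypothetical stand-in for the widened pattern; not the gateway's actual regex.
EXTS = ["html", "htm", "md", "svg", "json", "mp3", "pdf", "docx"]
MEDIA_RE = re.compile(
    # Longer extensions first; (?!\w) stops a match from ending mid-word.
    r"MEDIA:(\S+\.(?:%s))(?!\w)" % "|".join(sorted(EXTS, key=len, reverse=True))
)


def extract_media_paths(text):
    """Return every path that follows a MEDIA: directive in `text`."""
    return MEDIA_RE.findall(text)


class TestExtractMediaExtensionCoverage(unittest.TestCase):
    def test_every_extension_is_extracted(self):
        # One sub-test per extension, so a single gap is reported by name.
        for ext in EXTS:
            with self.subTest(ext=ext):
                path = f"/tmp/file.{ext}"
                self.assertEqual(extract_media_paths(f"MEDIA:{path}"), [path])

    def test_user_reported_path(self):
        path = "/root/.hermes/media_cache/00-Visual-Report.html"
        self.assertEqual(extract_media_paths(f"Done. MEDIA:{path}"), [path])
```

Saved as a module, the sketch can be run with `python -m unittest`. Keeping one sub-test per extension means a future gap in the list shows up as a named failure rather than a single opaque assertion.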
Why
This is a top-priority class of bug per CONTRIBUTING.md: silent data loss (file attachments dropped without warning) affecting every messaging platform that goes through BasePlatformAdapter.extract_media. The fix surface is tiny and well-tested.