fix(gateway): define missing _MEDIA_EXTS — MEDIA tag and local file extraction#34115
Closed
proactive-assistant wants to merge 3 commits into
added 3 commits
May 28, 2026 23:32
…on regexes

The MEDIA:<path> tag regex in extract_media() was hardcoded with an incomplete extension list, missing .md, .json, .yaml, .html, .svg, and 15 other extensions that extract_local_files() already recognised via its own _LOCAL_MEDIA_EXTS tuple.

Consolidate both code paths onto a single module-level _MEDIA_EXTS frozenset built from the union of all four support dicts (_AUDIO_EXTS, SUPPORTED_VIDEO_TYPES, SUPPORTED_DOCUMENT_TYPES, SUPPORTED_IMAGE_DOCUMENT_TYPES). This means:

- Adding a new supported document type automatically picks up MEDIA-tag delivery and local-file-path extraction without touching the regex.
- extract_media() and extract_local_files() can never drift apart again, because they share the same source of truth.

The hardcoded extensions already in the regex but missing from the dicts (.epub, .rar, .7z, .apk, .ipa) are intentionally dropped because the gateway has no MIME type / delivery support for them.
…ts in non-streaming responses

Three related fixes in run.py:

1. Replace two hardcoded MEDIA regexes (lines 16982, 17288). Both used the same incomplete list (png|jpe?g|...|apk|ipa), missing .md and 20+ other extensions. Both now build the regex dynamically from _MEDIA_EXTS, imported from gateway.platforms.base.
2. The non-streaming response path returned raw response text with MEDIA: tags still embedded, so the adapter sent them as literal text and the files were never extracted or delivered. Only the streaming path had media delivery (via _deliver_media_from_response). Fix: call _deliver_media_from_response + extract_media before returning the response in the non-streaming path. Files are delivered and MEDIA tags are stripped from the text the adapter receives.
3. Added _MEDIA_EXTS to the top-level import from gateway.platforms.base so both regex sites and future consumers can use it.
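As an illustration of building the regex dynamically from the shared extension set, here is a minimal sketch. The exact pattern used in run.py is not shown in this PR, so the pattern shape, the `build_media_re` helper, and the sample extension set are all assumptions made for the example.

```python
import re

# Hypothetical subset of the shared extension set (assumed to hold
# dot-prefixed, lowercase extensions).
_MEDIA_EXTS = frozenset({".md", ".pdf", ".png", ".mp4", ".mp3", ".json"})


def build_media_re(exts):
    """Compile a MEDIA:<path> matcher from a set of extensions (illustrative)."""
    # Sort for a deterministic pattern; re.escape guards against any
    # regex metacharacters appearing in an extension.
    ext_part = "|".join(
        re.escape(e.lstrip(".")) for e in sorted(exts, key=lambda e: (-len(e), e))
    )
    # Capture a non-whitespace path ending in a known extension, followed
    # by whitespace or the end of the text.
    return re.compile(r"MEDIA:(\S+?\.(?:" + ext_part + r"))(?=\s|$)", re.IGNORECASE)


# Compiled once at import time rather than on every use.
MEDIA_RE = build_media_re(_MEDIA_EXTS)


def find_media_paths(text):
    """Return every tagged path whose extension is in the shared set."""
    return MEDIA_RE.findall(text)
```

For example, `find_media_paths("see MEDIA:/tmp/report.md and MEDIA:/tmp/a.png")` yields both paths, while a tag ending in an extension outside the set is ignored.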
- Compute _ext_part and _TOOL_MEDIA_RE once before each loop instead of recompiling on every iteration. No behavioral change.
- Fix misleading "directly above" comment on _MEDIA_EXTS: the source dicts are spread across the module, not directly above.
Superseded by #34844, which consolidates this cluster. This PR widens the … Closing as superseded. Thanks for surfacing and helping pin down this bug; it was part of getting the full fix right. See #34844.
Summary
Three bugs across two files, all related to MEDIA tag / file attachment handling:
1. `base.py`: `_MEDIA_EXTS` never defined (crash on use)

   `extract_media()` and `extract_local_files()` both referenced `BasePlatformAdapter._MEDIA_EXTS`, but the attribute was never defined, so any call would crash with `AttributeError`. The hardcoded regex in `extract_media()` also missed `.md` and 20+ other extensions that `extract_local_files()` supported.

   Fix: Define `_MEDIA_EXTS` as the union of `_AUDIO_EXTS` + `SUPPORTED_VIDEO_TYPES` + `SUPPORTED_DOCUMENT_TYPES` + `SUPPORTED_IMAGE_DOCUMENT_TYPES`. Use it in both methods, so adding a new doc type automatically enables MEDIA-tag and local-file delivery.

2. `run.py`: two more hardcoded MEDIA regexes

   The TTS/tool-result scanning code had the same hardcoded incomplete regex duplicated at two sites (lines 16982, 17288), missing the same 20+ extensions.

   Fix: Import `_MEDIA_EXTS` and build the regex dynamically.

3. `run.py`: non-streaming responses drop MEDIA attachments

   The streaming path called `_deliver_media_from_response()` to extract and deliver files. The non-streaming path returned the raw response as-is, so MEDIA tags appeared as literal text in chat and files were never delivered.

   Fix: Call `_deliver_media_from_response()` + `extract_media()` in the non-streaming path before returning. Files are delivered and tags are stripped from the text.

Test plan

- `test_media_extraction.py` (4/4), `test_platform_base.py` (101/101)
- `.md`, `.pdf`, `.png`, `.mp4`, `.mp3` all extracted from MEDIA tags and local file paths
- `_MEDIA_EXTS` available
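The non-streaming fix described in the summary (deliver the files first, then hand the adapter text with the tags stripped) can be sketched as follows. The real `_deliver_media_from_response()` and `extract_media()` signatures are not shown in this PR, so `finalize_response`, its `deliver` callback, and the simplified tag pattern are hypothetical names chosen for illustration only.

```python
import re

# Simplified, hypothetical tag pattern: any non-whitespace run after "MEDIA:".
# The gateway's real pattern restricts matches to known extensions.
_TAG_RE = re.compile(r"MEDIA:(\S+)")


def finalize_response(text, deliver):
    """Deliver each MEDIA-tagged file, then return the text with tags removed.

    `deliver` is a hypothetical callback standing in for the gateway's
    media delivery step; it is called once per tagged path.
    """
    paths = _TAG_RE.findall(text)
    for path in paths:
        deliver(path)
    # Strip the tags so the adapter never sends them as literal text.
    cleaned = _TAG_RE.sub("", text)
    # Collapse the extra spaces left behind where a tag was removed.
    cleaned = re.sub(r"[ \t]{2,}", " ", cleaned).strip()
    return cleaned, paths
```

A caller would pass whatever delivery function the platform adapter provides; in a test, `list.append` works as a stand-in to record which paths would have been sent.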