fix(gateway): derive MEDIA extensions dynamically from SUPPORTED_DOCUMENT_TYPES #31159
mohamedorigami-jpg wants to merge 2 commits into
…A_EXTS

The extract_media() regex was missing 20+ extensions that extract_local_files() already accepts: svg, bmp, tiff, md, odt, rtf, ods, tsv, json, xml, yaml, yml, ppt, odp, key, tar, gz, tgz, bz2, xz, html, htm.

When the model emits MEDIA:/path/to/file.html (or any other missing extension), the regex silently drops the file while send_message still reports success: true. The path falls between both detectors: extract_media rejects it because the extension is not in its list, and the anti-URL guard in extract_local_files disqualifies it because of the MEDIA: prefix.

Fixes NousResearch#31137
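To make the failure mode concrete, here is a minimal sketch of how a hardcoded allow-list drops a tag without raising. The extension tuple, the pattern, and the `extract_media_old` helper are hypothetical stand-ins for illustration, not the gateway's actual code.

```python
import re

# Hypothetical, shortened stand-in for the old hardcoded extension list;
# the real pattern in the gateway is not reproduced here.
OLD_MEDIA_EXTS = ("png", "jpg", "jpeg", "gif", "pdf", "mp3", "mp4")
OLD_MEDIA_TAG_RE = re.compile(
    r"MEDIA:(\S+\.(?:" + "|".join(OLD_MEDIA_EXTS) + r"))(?!\w)",
    re.IGNORECASE,
)


def extract_media_old(text: str) -> list[str]:
    """Return paths from MEDIA: tags whose extension is in the allow-list."""
    return OLD_MEDIA_TAG_RE.findall(text)


reply = "Here you go. MEDIA:/tmp/chart.png MEDIA:/tmp/report.html"
found = extract_media_old(reply)

# The .html tag does not match, so it is skipped without any error.
print(found)
```

Because a non-matching tag produces no match rather than an exception, the caller has nothing to report, which is consistent with the success: true behaviour described above.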
…MENT_TYPES

Instead of maintaining separate hardcoded extension lists for the extract_media regex and _LOCAL_MEDIA_EXTS, build a single set at module level derived from SUPPORTED_DOCUMENT_TYPES + SUPPORTED_IMAGE_DOCUMENT_TYPES + known audio/video/archive types.

- extract_media() now uses the precompiled module-level _MEDIA_TAG_RE
- extract_local_files() builds _LOCAL_MEDIA_EXTS from _MEDIA_EXTS_SET
- 60 extensions covered (was ~30 in the original regex)
- Adding new types to SUPPORTED_DOCUMENT_TYPES auto-propagates

Closes NousResearch#29609 (preferred dynamic approach)
Fixes NousResearch#31137
Fair call. I updated the PR to use the dynamic approach from #29609: the regex and _LOCAL_MEDIA_EXTS are now both derived from a single set built from SUPPORTED_DOCUMENT_TYPES + SUPPORTED_IMAGE_DOCUMENT_TYPES + known audio/video/archive types. 60 extensions total, 95 tests pass.

Superseded by #34844, which consolidates this cluster. This PR widens the … Closing as superseded. Thanks for surfacing and helping pin down this bug; it was part of getting the full fix right. See #34844.
Instead of maintaining separate hardcoded extension lists for extract_media and extract_local_files, which keep drifting out of sync, this PR builds a single set at module level derived from SUPPORTED_DOCUMENT_TYPES + SUPPORTED_IMAGE_DOCUMENT_TYPES + known audio/video/archive types.
Changes:
95 tests pass in tests/gateway/test_platform_base.py.
Closes #29609 (preferred dynamic approach per alt-glitch's review)
Fixes #31137