fix(gateway): unify extract_media + extract_local_files extension lists (#34321)#34345
Closed
Bartok9 wants to merge 1 commit into
Closed
fix(gateway): unify extract_media + extract_local_files extension lists (#34321)#34345Bartok9 wants to merge 1 commit into
Bartok9 wants to merge 1 commit into
Conversation
Collaborator
4 tasks
…ts (NousResearch#34321) BasePlatformAdapter.extract_media() rejects .md (and 22 other text-based extensions) that extract_local_files() correctly accepts. This causes MEDIA:<path> tags pointing to those files to fall through to plain text on every platform, while a bare path in the same response would deliver as a native attachment. Commit ea49b38 narrowed the extract_media regex (to fix Mattermost false positives) but left out: .md .doc .odt .rtf .ods .tsv .json .xml .yaml .yml .ppt .odp .key .tar .gz .tgz .bz2 .xz .html .htm .tiff .svg .bmp Reproduction from issue: send_message(message='MEDIA:~/.hermes/cache/documents/report.md', target='weixin') → before: literal 'MEDIA:~/.hermes/...' delivered as text → after: report.md uploaded as native attachment Fix: Extract the extension list into a class-level constant `BasePlatformAdapter._MEDIA_DELIVERY_EXTS` and use it in BOTH extract_media() and extract_local_files(). The single source of truth eliminates the drift class. Implementation notes: - _MEDIA_DELIVERY_EXTS is a class attribute (Tuple[str, ...]) so it participates in subclass overrides naturally if a platform ever needs to narrow it (Mattermost / WhatsApp text-only fallback, etc.) without forking extract_media() itself. - Regex alternation is sorted by length descending so longer extensions like `.tgz` win against shorter aliases — necessary when the same suffix could otherwise short-circuit a longer one. - re.IGNORECASE added on extract_media's regex to match the case insensitivity already present on extract_local_files. Real responses include `MEDIA:/path/foo.PDF` from skill authors who uppercase extensions. - .epub and .opus were already in extract_media but not extract_local_files; .apk/.ipa were in extract_media but not extract_local_files. Both sets are now in the shared constant, which is the union of both prior lists. Tests (10 new in TestExtractMediaExtensionParity): - Shared constant exists, is non-empty, contains .md (the issue's example). - Parametrized across all 23 previously-missing extensions: each is now in the shared constant AND now matches via extract_media(). - Real-world NousResearch#34321 reproduction case (`MEDIA:~/.hermes/cache/ documents/report.md`) extracts correctly. - Cross-method probe: both extract_media and extract_local_files accept the same set on real files at /tmp. - [[audio_as_voice]] still applies to .md (behavior preserved). - Unknown extensions like .exe still rejected (guardrail: this fix expands the allow-list, it does not open the gate to anything). All 209 tests in test_platform_base.py + test_extract_local_files.py + test_media_extraction.py pass. Refs: NousResearch#34321 Closes: NousResearch#34321 Co-authored-by: Cursor <cursoragent@cursor.com>
5fb1364 to
496c879
Compare
Contributor
|
Merged via #34844, which consolidates the MEDIA: extension-allowlist cluster. Your contribution was used directly and your authorship is preserved via a Thanks for the work here. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #34321
Problem
BasePlatformAdapter.extract_media()rejects.md(and 22 other text-based extensions) thatextract_local_files()correctly accepts. MEDIA: tags pointing to those files fall through to plain text on every platform, while a bare path in the same response delivers as a native attachment.Commit
ea49b3862narrowed theextract_mediaregex to fix Mattermost false positives, but left out:Reproduction (from issue)
MEDIA:~/.hermes/...delivered as textFix
Extract the extension list into a class-level constant
BasePlatformAdapter._MEDIA_DELIVERY_EXTSand use it in bothextract_media()andextract_local_files(). Single source of truth eliminates the drift class.Implementation notes:
_MEDIA_DELIVERY_EXTSis a class attribute (Tuple[str, ...]) so subclass overrides work naturally if a platform ever needs to narrow it..tgzwins against.gz(would short-circuit otherwise).re.IGNORECASEon extract_media to match the case-insensitivity already in extract_local_files. Real responses includeMEDIA:/path/foo.PDF..epub/.opus/.apk/.ipawere in extract_media but missing from extract_local_files. Both sets now in the shared constant.Tests
10 new tests in
TestExtractMediaExtensionParity:.mdextract_media()extract_media()regex missing.mdand other text-based extensions, mismatch withextract_local_files()#34321 reproduction case (MEDIA:~/.hermes/cache/documents/report.md)[[audio_as_voice]]still applies to .md (behavior preserved).exestill rejected (this fix expands the allow-list, doesn't open the gate)$ python -m pytest tests/gateway/test_platform_base.py tests/gateway/test_extract_local_files.py tests/gateway/test_media_extraction.py === 209 passed, 2 skipped in 1.58s ===Files changed
gateway/platforms/base.pytests/gateway/test_platform_base.py🎻 Co-authored-by: Cursor cursoragent@cursor.com