fix(gateway): recognize Windows drive-letter paths in extract_local_files() bare-path uploads#28991
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Updates media/local-path extraction to properly detect Windows drive-letter absolute paths (e.g., C:\..., C:/...) and adds regression tests to ensure relative paths are still rejected.
Changes:
- Expand
extract_media()regex anchor to include Windows drive-letter absolute paths. - Expand
extract_local_files()regex anchor and separators to support Windows paths. - Add/adjust pytest coverage for Windows absolute-path detection and relative-path rejection.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tests/gateway/test_signal.py | Adds parametrized tests ensuring MEDIA: tags extract Windows drive-letter paths and still reject relative tokens. |
| tests/gateway/test_extract_local_files.py | Replaces the “not matched” Windows test with parametrized coverage asserting Windows drive-letter paths are detected; adds a relative Windows-path regression test. |
| gateway/platforms/base.py | Broadens regex anchors to match Windows drive-letter absolute paths in both extract_media() and extract_local_files(). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Related: #24049 (same Windows drive-letter MEDIA fix for Also related: #24132, #24217 (competing fixes for the same regex), #6742 (same bug class in MCP |
|
Thanks @alt-glitch — agreed. Narrowed this PR in c7436c74f: reverted the |
|
CI audit — the only failure on this PR's last run is Reproduces on every recent open PR's CI without this change (e.g. #29048, #29049). PR #29056 covers a separate baseline failure on |
c7436c7 to
f59afd6
Compare
4dc7c56 to
d7554f9
Compare
NousResearch#28989) The MEDIA: tag regex in `extract_media()` and the bare-path regex in `extract_local_files()` (both in `gateway/platforms/base.py`) anchored on `(?:~/|/)` — Unix absolute and home-relative paths only. On Windows, `MEDIA:C:/Users/.../image.jpg` and `MEDIA:C:\Users\...\image.jpg` were silently dropped, so the raw text was sent to the messaging platform instead of the actual file. Widen both anchors to `(?:~/|/|[A-Za-z]:[/\\])` so Windows drive-letter paths with either separator are recognized. In `extract_local_files` the inner path-segment separator `[\w.\-]+/` also becomes `[\w.\-]+[/\\]` so backslash-separated Windows paths walk segment-by-segment. Tests: - `test_extract_media_recognizes_windows_paths` (parametrized 4 cases): C:/ and C:\ forms, non-C: drives, lowercase drive letters. - `test_extract_media_still_rejects_relative_paths`: regression guard ensuring the widened regex did not re-introduce the `\S+` fallback removed in ea49b38. - `test_windows_drive_letter_paths_matched` (parametrized 4 cases) in test_extract_local_files.py flips the prior `test_windows_path_not_matched` (which documented the old broken behavior) to assert the correct one. - `test_relative_windows_path_not_matched`: bare `foo\bar.png` without a drive letter still doesn't match, mirroring the Unix relative-path exclusion.
Revert the extract_media() regex change and its tests after @alt-glitch flagged partial overlap with NousResearch#24049 (which covers the extract_media() half plus GIS extensions). This PR now narrows to the parallel-but-distinct bug in extract_local_files(), where the same Unix-only path anchor (?:~/|/) silently drops Windows drive-letter paths from bare-path uploads. NousResearch#24049 remains the canonical fix for the MEDIA: tag regex.
d7554f9 to
392f150
Compare
|
Closing as a duplicate — the Windows MEDIA-path fix landed in #35388 (#35388), salvaged from #35161. We picked the implementation that patched all four regex sites (MEDIA_TAG_CLEANUP_RE + extract_local_files in base.py, both _TOOL_MEDIA_RE in run.py) against current main. Thanks for tackling the same bug — your approach was correct, it just overlapped. |
What does this PR do?
gateway.platforms.base.BasePlatformAdapter.extract_local_files()anchors its bare-path regex on(?:~/|/), which only matches Unix absolute/home paths. Windows drive-letter paths (C:/Users/...,C:\\Users\\...,D:/..., etc.) silently fail to match, so the gateway never recognizes them as native uploads and the raw text is sent through to the messaging platform instead.The fix widens the path anchor to
(?:~/|/|[A-Za-z]:[/\\])and allows\\as an in-path separator. A(?<![/:\w.])negative lookbehind is already present so URLs and relative paths (./foo.png) continue to be rejected.Related Issue
Discovered while reviewing #28989 (which reports the
extract_media()half). This PR does not close #28989 — #24049 is the canonical fix there.Type of Change
Changes Made
gateway/platforms/base.py— widen theextract_local_files()path anchor to accept[A-Za-z]:[/\\]drive-letter prefixes and\\as an in-path separator.tests/gateway/test_extract_local_files.py— replace the now-obsoletetest_windows_path_not_matchedassertion with a parametrized positive test across four drive-letter shapes (uppercaseC:, lowercasee:, forward and back slashes, non-C:drive) plus a regression guard confirming a bare relativefoo\\bar.pngstill does NOT match.How to Test
origin/main:uv run --with pytest --with pytest-xdist --with pytest-asyncio python3 -m pytest tests/gateway/test_extract_local_files.py -v→ 48 passed locally.Checklist
Code
fix(scope):,feat(scope):, etc.)Documentation & Housekeeping
docs/, docstrings) — N/A (the function-level comment above the regex is updated to describe the new anchor)cli-config.yaml.exampleif I added/changed config keys — N/ACONTRIBUTING.mdorAGENTS.mdif I changed architecture or workflows — N/ARelated / Positioning
extract_media()Windows-path anchor + GIS extensions. Canonical fix for [Bug]: MEDIA: tags with Windows paths not parsed — images sent as plain text #28989. No overlap with this PR after the narrowing in $SHA.extract_media()prefix-glue, space-truncation, and allowlist narrowing. Orthogonal to this PR.extract_local_files()CALL from_deliver_media_from_response(). If that PR lands first, this PR becomes a defense-in-depth improvement to the function for callers that still invoke it directly.CI failures on this PR are pre-existing baselines on
origin/main(test_restart_resume_pending, test_anthropic_error_handling 429/500 timeouts, test_file_operations missing_check_git_baseline, etc.) — none touchgateway/platforms/base.pyor theextract_local_filestest file.