Skip to content

fix(gateway): recognize Windows drive-letter paths in extract_local_files() bare-path uploads#28991

Closed
briandevans wants to merge 2 commits into
NousResearch:mainfrom
briandevans:fix/gateway-media-windows-paths-28989
Closed

fix(gateway): recognize Windows drive-letter paths in extract_local_files() bare-path uploads#28991
briandevans wants to merge 2 commits into
NousResearch:mainfrom
briandevans:fix/gateway-media-windows-paths-28989

Conversation

@briandevans

@briandevans briandevans commented May 20, 2026

Copy link
Copy Markdown
Contributor

What does this PR do?

gateway.platforms.base.BasePlatformAdapter.extract_local_files() anchors its bare-path regex on (?:~/|/), which only matches Unix absolute/home paths. Windows drive-letter paths (C:/Users/..., C:\\Users\\..., D:/..., etc.) silently fail to match, so the gateway never recognizes them as native uploads and the raw text is sent through to the messaging platform instead.

The fix widens the path anchor to (?:~/|/|[A-Za-z]:[/\\]) and allows \\ as an in-path separator. A (?<![/:\w.]) negative lookbehind is already present so URLs and relative paths (./foo.png) continue to be rejected.

Audited siblings: extract_media() in the same file has the same Unix-only anchor, but @liuhao1024 already covered it in #24049 (plus GIS extension additions). After @alt-glitch's partial-overlap flag I reverted the extract_media() half of this PR so the two PRs no longer compete on the same surface — this PR is now scoped to the parallel extract_local_files() bug only.

Related Issue

Discovered while reviewing #28989 (which reports the extract_media() half). This PR does not close #28989#24049 is the canonical fix there.

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • gateway/platforms/base.py — widen the extract_local_files() path anchor to accept [A-Za-z]:[/\\] drive-letter prefixes and \\ as an in-path separator.
  • tests/gateway/test_extract_local_files.py — replace the now-obsolete test_windows_path_not_matched assertion with a parametrized positive test across four drive-letter shapes (uppercase C:, lowercase e:, forward and back slashes, non-C: drive) plus a regression guard confirming a bare relative foo\\bar.png still does NOT match.

How to Test

  1. Reproduce the bug on origin/main:
    uv run --with pytest --with pytest-xdist --with pytest-asyncio python3 -m pytest tests/gateway/test_extract_local_files.py::TestEdgeCases::test_windows_drive_letter_paths_matched -v
    
    → fails (anchor doesn't match Windows paths).
  2. Apply this PR and rerun the same command → 4/4 pass.
  3. Full file regression: uv run --with pytest --with pytest-xdist --with pytest-asyncio python3 -m pytest tests/gateway/test_extract_local_files.py -v → 48 passed locally.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run focused tests for the touched code and all pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: macOS 15.x

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — N/A (the function-level comment above the regex is updated to describe the new anchor)
  • I've updated cli-config.yaml.example if I added/changed config keys — N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A
  • I've considered cross-platform impact (Windows, macOS) — this IS the cross-platform fix; the new regex is OS-neutral
  • I've updated tool descriptions/schemas if I changed tool behavior — N/A

Related / Positioning

CI failures on this PR are pre-existing baselines on origin/main (test_restart_resume_pending, test_anthropic_error_handling 429/500 timeouts, test_file_operations missing _check_git_baseline, etc.) — none touch gateway/platforms/base.py or the extract_local_files test file.

Copilot AI review requested due to automatic review settings May 20, 2026 00:15

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Updates media/local-path extraction to properly detect Windows drive-letter absolute paths (e.g., C:\..., C:/...) and adds regression tests to ensure relative paths are still rejected.

Changes:

  • Expand extract_media() regex anchor to include Windows drive-letter absolute paths.
  • Expand extract_local_files() regex anchor and separators to support Windows paths.
  • Add/adjust pytest coverage for Windows absolute-path detection and relative-path rejection.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
tests/gateway/test_signal.py Adds parametrized tests ensuring MEDIA: tags extract Windows drive-letter paths and still reject relative tokens.
tests/gateway/test_extract_local_files.py Replaces the “not matched” Windows test with parametrized coverage asserting Windows drive-letter paths are detected; adds a relative Windows-path regression test.
gateway/platforms/base.py Broadens regex anchors to match Windows drive-letter absolute paths in both extract_media() and extract_local_files().

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/gateway Gateway runner, session dispatch, delivery labels May 20, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Related: #24049 (same Windows drive-letter MEDIA fix for extract_media()). This PR additionally fixes extract_local_files() and removes the \S+ fallback; #24049 additionally adds GIS file extensions. Partial overlap — maintainer should pick one and cherry-pick unique parts from the other.

Also related: #24132, #24217 (competing fixes for the same regex), #6742 (same bug class in MCP _extract_attachments).

@briandevans briandevans changed the title fix(gateway): recognize Windows drive-letter paths in MEDIA extraction (#28989) fix(gateway): recognize Windows drive-letter paths in extract_local_files() bare-path uploads May 20, 2026
@briandevans

Copy link
Copy Markdown
Contributor Author

Thanks @alt-glitch — agreed. Narrowed this PR in c7436c74f: reverted the extract_media() half so it no longer competes with #24049 on the same regex. The PR now covers only the parallel extract_local_files() bug (same root cause, different function used for bare-path uploads), which #24049 doesn't touch. Updated title and description to match.

@briandevans

Copy link
Copy Markdown
Contributor Author

CI audit — the only failure on this PR's last run is tests/hermes_cli/test_update_hangup_protection.py::TestInstallHangupProtection::test_wraps_stdout_and_stderr_with_mirror, which is unrelated to this Windows-path fix: it's a baseline flake from a module-reload ordering issue in xdist (the captured _UpdateOutputStream reference at the test file's top-level becomes stale after test_skills_subparser.py does del sys.modules['hermes_cli.main']).

Reproduces on every recent open PR's CI without this change (e.g. #29048, #29049). PR #29056 covers a separate baseline failure on main (xAI plugin contract mismatch); the hangup_protection flake will need its own follow-up.

NousResearch#28989)

The MEDIA: tag regex in `extract_media()` and the bare-path regex in
`extract_local_files()` (both in `gateway/platforms/base.py`) anchored
on `(?:~/|/)` — Unix absolute and home-relative paths only. On Windows,
`MEDIA:C:/Users/.../image.jpg` and `MEDIA:C:\Users\...\image.jpg` were
silently dropped, so the raw text was sent to the messaging platform
instead of the actual file.

Widen both anchors to `(?:~/|/|[A-Za-z]:[/\\])` so Windows drive-letter
paths with either separator are recognized. In `extract_local_files`
the inner path-segment separator `[\w.\-]+/` also becomes `[\w.\-]+[/\\]`
so backslash-separated Windows paths walk segment-by-segment.

Tests:
- `test_extract_media_recognizes_windows_paths` (parametrized 4 cases):
  C:/ and C:\ forms, non-C: drives, lowercase drive letters.
- `test_extract_media_still_rejects_relative_paths`: regression guard
  ensuring the widened regex did not re-introduce the `\S+` fallback
  removed in ea49b38.
- `test_windows_drive_letter_paths_matched` (parametrized 4 cases) in
  test_extract_local_files.py flips the prior
  `test_windows_path_not_matched` (which documented the old broken
  behavior) to assert the correct one.
- `test_relative_windows_path_not_matched`: bare `foo\bar.png` without
  a drive letter still doesn't match, mirroring the Unix relative-path
  exclusion.
Revert the extract_media() regex change and its tests after
@alt-glitch flagged partial overlap with NousResearch#24049 (which covers the
extract_media() half plus GIS extensions). This PR now narrows to
the parallel-but-distinct bug in extract_local_files(), where the
same Unix-only path anchor (?:~/|/) silently drops Windows
drive-letter paths from bare-path uploads.

NousResearch#24049 remains the canonical fix for the MEDIA: tag regex.
@teknium1

Copy link
Copy Markdown
Contributor

Closing as a duplicate — the Windows MEDIA-path fix landed in #35388 (#35388), salvaged from #35161. We picked the implementation that patched all four regex sites (MEDIA_TAG_CLEANUP_RE + extract_local_files in base.py, both _TOOL_MEDIA_RE in run.py) against current main. Thanks for tackling the same bug — your approach was correct, it just overlapped.

@teknium1 teknium1 closed this May 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/gateway Gateway runner, session dispatch, delivery P2 Medium — degraded but workaround exists type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: MEDIA: tags with Windows paths not parsed — images sent as plain text

4 participants