fix(gateway): support Windows drive-letter paths and GIS extensions in MEDIA: regex #24049
Open
liuhao1024 wants to merge 1 commit into
…n MEDIA: regex

The MEDIA: tag extractor in gateway/platforms/base.py fails on Windows absolute paths containing spaces (e.g. C:\Users\Foo\OneDrive\My Folder\file.pdf). The path is silently truncated at the first whitespace because the spaced-path branch only matches POSIX paths starting with ~/ or /.

Additionally, common GIS/structured-data extensions (kmz, kml, geojson, gpx, json, xml, html) are absent from the spaced-path extension allowlist, so even POSIX-style spaced paths fail for those types.

Changes:
- Add [A-Za-z]: to the spaced-path prefix group to match Windows drive-letter paths
- Add kmz, kml, geojson, gpx, json, xml, html to the extension allowlist
- Add 9 regression tests covering Windows paths, GIS extensions, and combinations

Fixes [bug] extract_media regex truncates Windows spaced paths and rejects GIS extensions NousResearch#24032
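The truncation described in this commit message can be reproduced with a minimal sketch. It assumes a simplified stand-in for the MEDIA: pattern (a spaced-path branch followed by a \S+ fallback); the actual regex in gateway/platforms/base.py is not reproduced here, and `build_media_pattern` and the extension subset are hypothetical names chosen for illustration.

```python
import re

# Illustrative subset only; NOT the real allowlist from base.py.
EXTENSIONS = r"(?:pdf|png|jpe?g)"


def build_media_pattern(prefix: str) -> re.Pattern:
    """Hypothetical helper: spaced-path branch first, whitespace-free fallback second."""
    return re.compile(
        r"MEDIA:\s*(?P<path>"
        # Spaced-path branch: anchored prefix, then words separated by
        # non-newline whitespace, ending in an allowlisted extension.
        + prefix + r"\S+(?:[^\S\n]+\S+)*?\." + EXTENSIONS + r"(?=\s|$)"
        # Fallback branch: stops at the first whitespace character.
        + r"|\S+)"
    )


# POSIX-only anchor, as described in the commit message.
old_pattern = build_media_pattern(r"(?:~/|/)")
# Same anchor with a drive-letter alternative added.
new_pattern = build_media_pattern(r"(?:~/|/|[A-Za-z]:)")

text = r"MEDIA: C:\Users\Foo\OneDrive\My Folder\file.pdf"
old_path = old_pattern.search(text).group("path")
new_path = new_pattern.search(text).group("path")

print(old_path)  # falls through to the fallback and stops at the space
print(new_path)  # matched by the spaced-path branch
```

In this sketch the drive-letter alternative only extends the anchor, so POSIX paths such as /tmp/my file.pdf take the same branch as before.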
This was referenced May 19, 2026
fix(gateway): recognize Windows drive-letter paths in extract_local_files() bare-path uploads
#28991
Closed
briandevans added a commit to briandevans/hermes-agent that referenced this pull request on May 29, 2026
Revert the extract_media() regex change and its tests after @alt-glitch flagged partial overlap with NousResearch#24049 (which covers the extract_media() half plus GIS extensions). This PR now narrows to the parallel-but-distinct bug in extract_local_files(), where the same Unix-only path anchor (?:~/|/) silently drops Windows drive-letter paths from bare-path uploads. NousResearch#24049 remains the canonical fix for the MEDIA: tag regex.
This was referenced May 30, 2026
fix(gateway): support Windows drive letters and backslashes in file path regex (fixes #35270)
#35328
Closed
What does this PR do?
The MEDIA: tag extractor in gateway/platforms/base.py (extract_media) fails on Windows absolute paths containing spaces (e.g. C:\Users\Foo\OneDrive\My Folder\file.pdf). The path is silently truncated at the first whitespace because the spaced-path branch only matches POSIX paths starting with ~/ or /.

Additionally, common GIS/structured-data extensions (kmz, kml, geojson, gpx, json, xml, html) are absent from the spaced-path extension allowlist, so even POSIX-style spaced paths fail for those file types.

Root Cause
The media_pattern regex at line 2066 has a spaced-path branch anchored to (?:~/|/), which only matches POSIX-style paths. Windows drive-letter paths (C:\...) and UNC paths (\\server\share\...) fall through to the \S+ fallback branch, which stops at the first whitespace.

The extension allowlist also omits GIS and structured-data formats, causing .kmz, .kml, .geojson, .gpx, .json, .xml, and .html files to fail the spaced-path match even on POSIX systems.

Related Issue
N/A
Type of Change
Changes Made
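A minimal sketch of the extension-allowlist half of the change, written as table-driven checks. It assumes a simplified stand-in pattern rather than the actual media_pattern in gateway/platforms/base.py; `spaced_path_pattern`, `extract`, and the base extension subset are hypothetical. Only the seven added extensions are taken from the description above.

```python
import re


def spaced_path_pattern(extensions):
    """Hypothetical helper: build a simplified MEDIA: matcher from an extension list."""
    allow = "|".join(re.escape(ext) for ext in extensions)
    return re.compile(
        r"MEDIA:\s*(?P<path>"
        r"(?:~/|/|[A-Za-z]:)\S+(?:[^\S\n]+\S+)*?\.(?:" + allow + r")(?=\s|$)"
        r"|\S+)"  # fallback: stops at the first whitespace
    )


def extract(pattern, text):
    """Return the captured path, or None when no MEDIA: tag is present."""
    match = pattern.search(text)
    return match.group("path") if match else None


BASE_EXTS = ["pdf", "png", "jpg"]  # illustrative subset, not the real allowlist
GIS_EXTS = ["kmz", "kml", "geojson", "gpx", "json", "xml", "html"]  # from the description

before = spaced_path_pattern(BASE_EXTS)
after = spaced_path_pattern(BASE_EXTS + GIS_EXTS)

# (input text, expected full path) pairs, in the spirit of regression tests.
CASES = [
    ("MEDIA: /home/user/My Maps/route.kmz", "/home/user/My Maps/route.kmz"),
    ("MEDIA: ~/field data/track.gpx", "~/field data/track.gpx"),
    (r"MEDIA: D:\GIS Data\parcels.geojson", r"D:\GIS Data\parcels.geojson"),
    ("MEDIA: /tmp/report.pdf", "/tmp/report.pdf"),
]

for text, expected in CASES:
    assert extract(after, text) == expected, (text, extract(after, text))

# Without the added extensions, the spaced .kmz path is cut at the first space.
truncated = extract(before, "MEDIA: /home/user/My Maps/route.kmz")
print(truncated)
```

Building the alternation from a list keeps the allowlist readable and makes it easy to add one test case per new extension.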
How to Test
pytest tests/ -q (all tests should pass)

Checklist
Code
… fix(scope):, feat(scope):, etc.)
… pytest tests/ -q and all tests pass

Documentation & Housekeeping
… docs/, docstrings), or N/A
… cli-config.yaml.example if I added/changed config keys, or N/A
… CONTRIBUTING.md or AGENTS.md if I changed architecture and workflows, or N/A