fix: parse MEDIA paths with spaced filenames by williamumu · Pull Request #31035 · NousResearch/hermes-agent

williamumu · 2026-05-23T16:41:06Z

Summary

Handle unquoted MEDIA: file paths that contain spaces before the file extension, e.g. V1.2 .docx
Ensure streamed gateway display cleanup removes the full MEDIA: tag for those paths
Add regression coverage for both media extraction and stream display cleanup

Test Plan

python -m pytest -o 'addopts=' tests/gateway/test_platform_base.py -q

jsboige · 2026-05-23T16:45:39Z

Thanks for this fix. The core change from \S+(?:[^\S\n]+\S+)*?\.<ext> to [^\n]+?\.<ext>|\S+ is a clear improvement — the original pattern couldn't handle spaces immediately before the dot extension (e.g. V1.2 .docx), and the lazy quantifier with the extension-specific lookahead keeps the match well-constrained.
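To illustrate the difference described above, here is a minimal sketch that compares the two pattern shapes on the reported input. The extension list `EXTS`, the `(?=\s|$)` lookahead, and the `|\S+` fallback on the old pattern are assumptions made for the example; the actual patterns in base.py may differ.

```python
import re

# Hypothetical extension list; the real list in the gateway may differ.
EXTS = r"(?:docx|pdf|png)"

# Shape of the previous pattern as quoted in the review. The lookahead and
# the \S+ fallback are assumed here for illustration.
OLD = re.compile(rf"MEDIA:(\S+(?:[^\S\n]+\S+)*?\.{EXTS}(?=\s|$)|\S+)")

# Shape of the new pattern: lazily consume any non-newline characters
# until a known extension is reached.
NEW = re.compile(rf"MEDIA:([^\n]+?\.{EXTS}(?=\s|$)|\S+)")

text = "MEDIA:/tmp/report V1.2 .docx"

# The old shape requires a non-space character directly before the dot,
# so it falls back to the first whitespace-free token.
old_path = OLD.search(text).group(1)

# The new shape allows the space before ".docx" and captures the full path.
new_path = NEW.search(text).group(1)

print(repr(old_path))
print(repr(new_path))
```

Under these assumptions, the old shape truncates the path at the first space, while the new shape keeps everything through the extension.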

A few observations:

Regex duplication: The extraction pattern in base.py and the cleanup pattern in stream_consumer.py are now near-identical. If the extension list diverges between the two files, media tags could be extracted but not cleaned (or vice versa). Consider extracting the pattern or extension list into a shared constant.
Over-matching with [^\n]+?: While the lazy quantifier + lookahead is correct here, a pathological input like MEDIA:/a/b c d e f g h i j k.pdf unwanted text here would match everything up to .pdf, treating all the intermediate text as part of the path. The lookahead prevents capturing past the extension, so this is acceptable in practice.
Tests cover the reported case well (CJK characters + space-before-dot). Consider also adding a test for a path with multiple internal spaces (e.g. /tmp/my file name.docx) to confirm the pattern handles repeated whitespace segments.
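The observations above can be sketched together: one shared pattern used for both extraction and display cleanup, exercised against the multi-space and trailing-text inputs mentioned in the review. All names here (`MEDIA_EXTENSIONS`, `MEDIA_DIRECTIVE_RE`, `extract_media_paths`, `strip_media_directives`) and the extension list are hypothetical and not taken from the repository.

```python
import re

# Hypothetical shared definitions; names and the extension list are
# illustrative only and do not come from the hermes-agent codebase.
MEDIA_EXTENSIONS = ("docx", "pdf", "png", "mp3")
_EXT_ALT = "|".join(MEDIA_EXTENSIONS)

# A single compiled pattern, so extraction and cleanup cannot drift apart.
MEDIA_DIRECTIVE_RE = re.compile(
    rf"MEDIA:(?P<path>[^\n]+?\.(?:{_EXT_ALT})(?=\s|$)|\S+)"
)


def extract_media_paths(text: str) -> list[str]:
    """Return every path that follows a MEDIA: directive."""
    return [m.group("path") for m in MEDIA_DIRECTIVE_RE.finditer(text)]


def strip_media_directives(text: str) -> str:
    """Remove whole MEDIA: directives from text shown to the user."""
    return MEDIA_DIRECTIVE_RE.sub("", text).strip()


# Path with multiple internal spaces.
multi_space = extract_media_paths("MEDIA:/tmp/my file name.docx")

# Trailing text after the extension is left out of the captured path.
trailing = "MEDIA:/a/b c d e f g h i j k.pdf unwanted text here"
trailing_paths = extract_media_paths(trailing)
trailing_clean = strip_media_directives(trailing)

# CJK characters plus a space before the dot, as in the reported case.
reported = "Here is the file\nMEDIA:/tmp/报告 V1.2 .docx"
reported_paths = extract_media_paths(reported)
reported_clean = strip_media_directives(reported)
```

Because both helpers read the same compiled pattern, any change to the extension list applies to extraction and cleanup at once, which is the maintainability concern raised in the first observation.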

The fix is correct and targeted. Ship-ready with the minor maintainability note above.

alt-glitch · 2026-05-23T16:52:37Z

Part of the MEDIA path parsing cluster: related to #26407 (spaced unicode paths from tool output), #26368 (Windows paths with spaces), #24132 (spaced file paths). This PR specifically handles the edge case of spaces before the file extension (e.g. V1.2 .docx).

fix: parse MEDIA paths with spaced filenames

d057972

alt-glitch added labels on May 23, 2026: type/bug (Something isn't working), comp/gateway (Gateway runner, session dispatch, delivery), P2 (Medium: degraded but workaround exists)

test: share media directive parsing pattern

e562fce

banditburai mentioned this pull request May 29, 2026

fix(gateway): unify MEDIA extraction onto one curated extension set #34656

Closed

williamumu wants to merge 2 commits into NousResearch:main from williamumu:fix/media-path-spaces