Thanks for this fix. The core change from \S+(?:[^\S\n]+\S+)*?\.<ext> to [^\n]+?\.<ext>|\S+ is a clear improvement: the original pattern could not match whitespace immediately before the extension dot (e.g. V1.2 .docx), and the lazy quantifier combined with the extension-specific lookahead keeps the match well-constrained.
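To make the difference concrete, here is a minimal sketch comparing the two path patterns. The extension list, the MEDIA: prefix handling, and the trailing lookahead (?=\s|$) are assumptions for illustration; the actual pattern in base.py may differ.

```python
import re

# Hypothetical extension list; the real list in base.py may differ.
EXT = r"(?:docx|pdf|png)"
# Assumed lookahead: the extension must be followed by whitespace or end of input.
END = r"(?=\s|$)"

# Old path pattern: every whitespace run must be followed by a non-space run,
# so "<space>.docx" can never be completed by \.<ext>.
OLD = re.compile(rf"MEDIA:\s*(?P<path>\S+(?:[^\S\n]+\S+)*?\.{EXT}{END})")
# New path pattern: any non-newline characters, lazily, up to the extension.
NEW = re.compile(rf"MEDIA:\s*(?P<path>[^\n]+?\.{EXT}{END}|\S+)")


def extract(pattern, text):
    """Return the captured path, or None when the pattern does not match."""
    match = pattern.search(text)
    return match.group("path") if match else None


sample = "MEDIA:/tmp/报告 V1.2 .docx"
plain = "MEDIA:/tmp/my file.docx"

print(extract(OLD, plain))   # the old pattern handles ordinary internal spaces
print(extract(OLD, sample))  # but fails on a space right before the dot
print(extract(NEW, sample))  # the new pattern captures the whole path
```

Under these assumptions the old pattern only fails when whitespace sits directly before the extension dot, which is exactly the reported case.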
A few observations:
Regex duplication: The extraction pattern in base.py and the cleanup pattern in stream_consumer.py are now near-identical. If the extension list diverges between the two files, media tags could be extracted but not cleaned (or vice versa). Consider extracting the pattern or extension list into a shared constant.
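One possible shape for that shared constant is sketched below. The module layout, the names MEDIA_EXTENSIONS, MEDIA_TAG_RE, extract_media_paths and strip_media_tags, and the extension list itself are all hypothetical; they are not taken from the PR.

```python
import re

# Hypothetical shared module, imported by both base.py and stream_consumer.py.
# The extension list here is illustrative only.
MEDIA_EXTENSIONS = ("docx", "pdf", "png", "mp3")

_EXT_ALT = "|".join(re.escape(ext) for ext in MEDIA_EXTENSIONS)

# Single source of truth for the tag pattern (assumed lookahead: whitespace or end).
MEDIA_TAG_RE = re.compile(
    rf"MEDIA:\s*(?P<path>[^\n]+?\.(?:{_EXT_ALT})(?=\s|$)|\S+)"
)


def extract_media_paths(text):
    """Extraction side: return every path found in a MEDIA: tag."""
    return [m.group("path") for m in MEDIA_TAG_RE.finditer(text)]


def strip_media_tags(text):
    """Cleanup side: remove the same tags that extraction would find."""
    return MEDIA_TAG_RE.sub("", text).strip()


message = "Here is the file\nMEDIA:/tmp/V1.2 .docx\nDone"
print(extract_media_paths(message))
print(repr(strip_media_tags(message)))
```

Because both functions use the same compiled pattern, adding an extension in one place changes extraction and cleanup together.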
Over-matching with [^\n]+?: While the lazy quantifier + lookahead is correct here, on a pathological input like MEDIA:/a/b c d e f g h i j k.pdf unwanted text here the match extends to the first .pdf that satisfies the lookahead and includes every intermediate space-separated segment as part of the path. The lookahead prevents capturing past the extension (unwanted text here is excluded), so this is acceptable in practice.
Tests cover the reported case well (CJK characters + space-before-dot). Consider also adding a test for a path with multiple internal spaces (e.g. /tmp/my file name.docx) to confirm the pattern handles repeated whitespace segments.
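A sketch of what such tests could look like follows. The helper extract_media_path and its pattern are hypothetical stand-ins for the code under review (assumed extension list and lookahead), so the real tests in tests/gateway/test_platform_base.py would call the project's own extraction function instead.

```python
import re

# Hypothetical stand-in for the pattern under review; extensions and
# lookahead are assumptions, not the project's actual values.
_EXT = r"(?:docx|pdf)"
_MEDIA_RE = re.compile(rf"MEDIA:\s*([^\n]+?\.{_EXT}(?=\s|$)|\S+)")


def extract_media_path(text):
    """Return the path from the first MEDIA: tag, or None if there is none."""
    match = _MEDIA_RE.search(text)
    return match.group(1) if match else None


def test_multiple_internal_spaces():
    assert extract_media_path("MEDIA:/tmp/my file name.docx") == "/tmp/my file name.docx"


def test_repeated_whitespace_runs():
    # Double spaces and a tab inside the file name should survive intact.
    text = "MEDIA:/tmp/my  file \t name.docx"
    assert extract_media_path(text) == "/tmp/my  file \t name.docx"


def test_trailing_text_is_not_captured():
    # The pathological input from the over-matching note above.
    text = "MEDIA:/a/b c d e f g h i j k.pdf unwanted text here"
    assert extract_media_path(text) == "/a/b c d e f g h i j k.pdf"
```

The third case also pins down the over-matching behaviour, so a later change to the quantifier or lookahead would be caught by the suite.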
The fix is correct and targeted. Ship-ready with the minor maintainability note above.
Part of the MEDIA path parsing cluster: related to #26407 (spaced unicode paths from tool output), #26368 (Windows paths with spaces), #24132 (spaced file paths). This PR specifically handles the edge case of spaces before the file extension (e.g. V1.2 .docx).
Labels: comp/gateway (Gateway runner, session dispatch, delivery); P2 (Medium: degraded but workaround exists); type/bug (Something isn't working)
Summary
MEDIA: file paths that contain spaces before the file extension, e.g. V1.2 .docx
MEDIA: tag for those paths
Test Plan
python -m pytest -o 'addopts=' tests/gateway/test_platform_base.py -q