fix(extract_media): restrict \S+ fallback to absolute paths only#24672
Closed
wesleysimplicio wants to merge 1 commit into
Closed
Conversation
## Problem extract_media() logs spurious "File not found" warnings when the agent includes instructional MEDIA: examples in its response text. The last alternative in the named-path regex group — a bare `\S+` — matches any non-whitespace token, including template placeholders and plain words. ## Root cause gateway/platforms/base.py regex at the `\S+` fallback inside `(?P<path>...)` would match `MEDIA:description`, `MEDIA:filename`, or any word-like token after `MEDIA:`, triggering a failed send_document() call and a WARNING log entry. ## Fix Changed the fallback alternative from `\S+` to `(?:~/|/)\S+` so only strings that begin with an absolute path character (`/` or `~/`) are captured. Relative tokens and plain words no longer match. ## Tests Added test_extract_media_ignores_relative_word_fallback in tests/gateway/test_signal.py — asserts that plain words and relative paths are not extracted, and that absolute paths still are.
4 tasks
Contributor
|
Automated hermes-sweeper review: this looks implemented on current main. Evidence:
Thanks for the focused fix and clear write-up; the current implementation covers the same behavior. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What does this PR do?
BasePlatformAdapter.extract_media()logs spurious File not found warnings whenever the agent explains theMEDIA:syntax in a response. The regex's last fallback alternative — a bare\S+— matches any non-whitespace token, so plain words, relative tokens, and template placeholders like<filename>are all captured as media paths.send_document()then fails withos.path.exists() == Falseand the gateway emits:Root cause
BasePlatformAdapter.extract_media()logs spurious File not found warningswhenever the agent explains the
MEDIA:syntax in a response. The regex'slast fallback alternative — a bare
\S+— matches any non-whitespace token,so plain words, relative tokens, and template placeholders like
<filename>are all captured as media paths.
send_document()then fails withos.path.exists() == Falseand the gateway emits:Fix
Change the fallback from
\S+to(?:~/|/)\S+so only strings beginningwith an absolute-path prefix are captured. Relative tokens and plain words no
longer trigger false-positive media sends.
Why this shape
This shape mirrors #29640 so reviewers can quickly compare scope, root cause, fix, tests, and related context without having to decode a custom PR description.
Tests
Original body
Related PRs / issues
Closes #24575
Original body
Summary
BasePlatformAdapter.extract_media()logs spurious File not found warnings whenever the agent explains theMEDIA:syntax in a response. The regex's last fallback alternative — a bare\S+— matches any non-whitespace token, so plain words, relative tokens, and template placeholders like<filename>are all captured as media paths.send_document()then fails withos.path.exists() == Falseand the gateway emits:What Changed
Fluxo
A mudança continua seguindo o fluxo original descrito na seção preservada abaixo, sem ampliar o escopo funcional deste PR.
Visão
A padronização melhora a revisão, reduz ruído e evita deriva de formatação entre PRs abertos.
Test Plan
Original body
What does this PR do?
Problem
BasePlatformAdapter.extract_media()logs spurious File not found warningswhenever the agent explains the
MEDIA:syntax in a response. The regex'slast fallback alternative — a bare
\S+— matches any non-whitespace token,so plain words, relative tokens, and template placeholders like
<filename>are all captured as media paths.
send_document()then fails withos.path.exists() == Falseand the gateway emits:Root cause
gateway/platforms/base.py— the named path group in theextract_mediaregex ends with
|\S+). This makes the fallback match anything:Fix
Change the fallback from
\S+to(?:~/|/)\S+so only strings beginningwith an absolute-path prefix are captured. Relative tokens and plain words no
longer trigger false-positive media sends.
Tests
Added
test_extract_media_ignores_relative_word_fallbackintests/gateway/test_signal.py:MEDIA:description→ empty (plain word, no leading/)MEDIA:relative/path.png→ empty (relative path, no leading/)MEDIA:/tmp/file.ogg→ extracted correctly (absolute path still works)All 3 new assertions pass. Existing
test_extract_media_finds_image_tagandtest_extract_media_finds_audio_tagcontinue to pass.Visual
flowchart LR BUG["❌ Bug: MEDIA:description → captured as media path"] --> WARN["WARNING: File not found"] FIX["✅ Fix: MEDIA:description → ignored (no leading /)"] --> OK["No spurious warning"]Closes #24575
Solution Sketch
Related Issue
Closes #24575
Type of Change
Changes Made
.github/PULL_REQUEST_TEMPLATE.mdHow to Test
Checklist
Code
fix(scope):,feat(scope):, etc.)pytest tests/ -qand all tests passDocumentation & Housekeeping
docs/, docstrings) — or N/Acli-config.yaml.exampleif I added/changed config keys — or N/ACONTRIBUTING.mdorAGENTS.mdif I changed architecture or workflows — or N/AScreenshots / Logs
Generated by Hermes Turbo
Generated by Hermes Turbo