extract_media() greedy \S+ fallback matches non-file text -> spurious 'File not found' warnings #24575

@vadim9712-coder

Bug

Version: hermes-agent 0.13.0 (commit dd0923b)
Severity: cosmetic — warnings only in gateway log, no functional impact on tool-calls or message delivery
Affected: confirmed in 0.13.0, likely also in 0.12.x (regex unchanged)

BasePlatformAdapter.extract_media() in gateway/platforms/base.py line 2067
uses a regex with a greedy \S+ fallback alternative that matches any
non-whitespace text after MEDIA: — including documentation examples and
instructional text in the agent's response. This causes spurious
"File file not found" warnings in gateway logs.

Reproducer

Agent response containing instructional text like:

Use `MEDIA:/home/hermes/outbox/file` to send files via the outbox directory.

The regex captures /home/hermes/outbox/file as a media path, send_document()
fails with os.path.exists() == False, and the gateway logs:

WARNING gateway.platforms.base: [Telegram] Failed to send media ():
  File file not found: /home/hermes/outbox/file

Logs from a real session (May 12, 2026):

15:21:56 File file not found: /home/hermes/outbox/<filename>
15:26:23 File file not found: /home/hermes/outbox/file
15:26:23 File file not found: /outbox/
16:43:31 File file not found: /home/hermes/outbox/file   (post-restart auto-resume)
16:43:31 File file not found: /outbox/

The <filename> case is a template placeholder from a backtick-quoted
MEDIA example — captured by the backtick-quoted alternative. /outbox/ is
captured by the \S+ fallback on a bare directory path.

Root cause

Regex line 2066–2067, last alternative \S+:

r'''[`"']?MEDIA:\s*(?P<path>`[^`\n]+`|"[^"\n]+"|'[^'\n]+'|(?:~/|/)\S+(?:[^\S\n]+\S+)*?\.(?:png|jpe?g|gif|webp|mp4|mov|avi|mkv|webm|ogg|opus|mp3|wav|m4a|flac|epub|pdf|zip|rar|7z|docx?|xlsx?|pptx?|txt|csv|apk|ipa)(?=[\s`"',;:)\]}]|$)|\S+)[`"']?'''
                                                                                                                         ^^^^

This matches any non-whitespace after MEDIA: — unlike alternatives 1–4
which require backtick/quoted-wrapping or a recognized file extension.
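The behavior can be checked in isolation. The sketch below compiles the pattern exactly as quoted above and runs it on the reproducer text. It assumes no regex flags, which the issue does not state, and `raw_captures` is a hypothetical helper, not a function from the codebase. The raw capture keeps the trailing backtick, so the clean path in the log implies some quote stripping downstream, which is also an assumption.

```python
import re

# Pattern copied verbatim from the issue (base.py, lines 2066-2067).
# Assumption: compiled without flags; the issue does not say which are used.
MEDIA_RE = re.compile(
    r'''[`"']?MEDIA:\s*(?P<path>`[^`\n]+`|"[^"\n]+"|'[^'\n]+'|(?:~/|/)\S+(?:[^\S\n]+\S+)*?\.(?:png|jpe?g|gif|webp|mp4|mov|avi|mkv|webm|ogg|opus|mp3|wav|m4a|flac|epub|pdf|zip|rar|7z|docx?|xlsx?|pptx?|txt|csv|apk|ipa)(?=[\s`"',;:)\]}]|$)|\S+)[`"']?'''
)


def raw_captures(text):
    """Hypothetical helper: return the raw `path` group of every match."""
    return [m.group("path") for m in MEDIA_RE.finditer(text)]


# Instructional text from the reproducer: only the \S+ fallback can match.
doc_example = "Use `MEDIA:/home/hermes/outbox/file` to send files via the outbox directory."
# A bare directory path, also caught only by the fallback.
bare_dir = "Files land in MEDIA:/outbox/ first."
# A bare path with a recognized extension, matched by the extension alternative.
real_file = "MEDIA:/tmp/report.pdf"

for sample in (doc_example, bare_dir, real_file):
    print(raw_captures(sample))
```

None of the three samples is backtick- or quote-wrapped at the path itself, so the first two reach the fallback only because the extension alternative fails on them.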

Fix

Add os.path.isfile() validation before media.append(), mirroring the
pattern already used in extract_local_files() at line 2131:

# Before (line 2074–2075):
if path:
    media.append((os.path.expanduser(path), has_voice_tag))

# After:
if path:
    expanded = os.path.expanduser(path)
    if not os.path.isfile(expanded):
        continue
    media.append((expanded, has_voice_tag))

Semantically this is a single added check; with the expanded variable the
diff adds three lines. The regex itself is not modified: only the consumer
validates that matched paths point to real files before attempting delivery.
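As a standalone illustration of the proposed check, here is a minimal sketch. `keep_existing_files` is a hypothetical function written for this example; the real change lives inside the matching loop of extract_media(), and the tuple shape `(path, has_voice_tag)` is taken from the snippet above.

```python
import os
import tempfile


def keep_existing_files(candidates, has_voice_tag=False):
    """Hypothetical stand-in for the loop body in extract_media().

    Keeps only candidates that resolve to an existing regular file,
    mirroring the proposed os.path.isfile() guard.
    """
    media = []
    for path in candidates:
        if not path:
            continue
        expanded = os.path.expanduser(path)
        if not os.path.isfile(expanded):
            # Placeholder paths and bare directories are dropped silently.
            continue
        media.append((expanded, has_voice_tag))
    return media


with tempfile.TemporaryDirectory() as outbox:
    real = os.path.join(outbox, "report.pdf")
    with open(real, "w") as fh:
        fh.write("demo")

    candidates = [
        real,                          # exists: kept
        os.path.join(outbox, "file"),  # documentation placeholder: dropped
        outbox + os.sep,               # directory, not a file: dropped
    ]
    kept = keep_existing_files(candidates)
    print(len(kept))
```

Because os.path.isfile() returns False for directories as well as for missing paths, the same guard covers both the placeholder case and the bare-directory case from the logs.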

Alternative considered

Removing the \S+ fallback from the regex entirely was considered but
rejected: some models emit bare MEDIA: paths (no quotes, no backticks), and
any such path that the recognized-extension alternative does not cover is
matched only by the fallback, so it would no longer be delivered.
The isfile() check is the safer fix: it validates what the regex
matched without changing the regex's matching scope.

Labels

P3 (Low: cosmetic, nice to have)
comp/gateway (Gateway runner, session dispatch, delivery)
sweeper:implemented-on-main (Sweeper: behavior already present on current main)
type/bug (Something isn't working)
