
Gateway MEDIA extraction can attach stale files from serialized tool/search-result text #34375

@leoge007

Bug Description

The gateway MEDIA: extraction path can treat a MEDIA:/... string embedded inside serialized tool/search-result text as a real outbound attachment directive. In a Telegram gateway turn, this can cause an unrelated/stale image from prior session-search context to be sent as a native photo even though the final assistant reply did not intentionally attach anything.

This is distinct from existing MEDIA delivery issues such as:

Here the problem runs in the opposite direction: an internal or quoted MEDIA: occurrence can be over-extracted and delivered.

Observed Behavior

In a Telegram gateway session, after a long diagnostic turn involving session_search, the final visible answer contained no MEDIA: tag and no markdown image. However, the gateway still sent one photo attachment immediately after the answer.

Sanitized log shape:

response ready: platform=telegram ... response=1278 chars
Suppressing normal final send ... final delivery already confirmed (streamed=True ... content_delivered=True)
Skipping unsafe MEDIA directive path outside allowed roots
Skipping unsafe MEDIA directive path outside allowed roots
Skipping unsafe MEDIA directive path outside allowed roots
[Telegram] Sending media group of 1 photo(s) (chunk 1/1)

The stored final assistant message contained no MEDIA: and no markdown image:

MEDIA occurrences []
markdown images []
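A check like this on the stored reply can be sketched with two simple scans. The regexes and the `audit_final_message` helper are hypothetical. They only approximate what counts as a MEDIA: tag or a markdown image, and they are not the gateway's extraction logic.

```python
import re

# Hypothetical audit patterns; they are not the gateway's own regexes.
MEDIA_RE = re.compile(r"MEDIA:\S+")
MARKDOWN_IMAGE_RE = re.compile(r"!\[[^\]]*\]\([^)]+\)")


def audit_final_message(text: str) -> dict[str, list[str]]:
    """Report every MEDIA: token and markdown image found in a stored reply."""
    return {
        "MEDIA occurrences": MEDIA_RE.findall(text),
        "markdown images": MARKDOWN_IMAGE_RE.findall(text),
    }


final_reply = "Diagnosis complete: the session_search results look consistent."
for label, hits in audit_final_message(final_reply).items():
    print(label, hits)  # prints "MEDIA occurrences []", then "markdown images []"
```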

But one of the tool results in the same turn contained a serialized historical search hit with an old media-delivery line, e.g.:

{"content":"... MEDIA:/Users/example/.hermes/media/generated/old-result.png\\n ..."}

That stale path was then interpreted as an attachment candidate and delivered if it passed media-path validation.
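The failure can be modeled as a two-step filter. This is a hypothetical sketch, not the gateway's code: `PERMISSIVE_MEDIA_RE`, `ALLOWED_ROOTS`, `is_under_allowed_root`, and `attachment_candidates` are invented for illustration. It shows why path validation alone does not prevent the bug. An allowed-roots check, presumably similar to the one behind the "Skipping unsafe MEDIA directive path" log lines, judges where a file lives, not where the MEDIA: text came from. A stale path under the media directory therefore still passes.

```python
import re
from pathlib import PurePosixPath

# Hypothetical permissive pattern: matches MEDIA: anywhere in the text,
# including inside serialized JSON. The path stops at whitespace, a quote,
# or a backslash, so a JSON-escaped newline sequence ends it.
PERMISSIVE_MEDIA_RE = re.compile(r'MEDIA:(/[^\s"\\]+)')

# Hypothetical allow-list; the real gateway's allowed roots may differ.
ALLOWED_ROOTS = [PurePosixPath("/Users/example/.hermes/media")]


def is_under_allowed_root(path: str) -> bool:
    """Location-only check: says nothing about where the MEDIA: text came from."""
    candidate = PurePosixPath(path)
    return any(candidate.is_relative_to(root) for root in ALLOWED_ROOTS)


def attachment_candidates(turn_text: str) -> list[str]:
    """Collect MEDIA: paths from the whole turn, then keep the 'safe' ones."""
    paths = PERMISSIVE_MEDIA_RE.findall(turn_text)
    return [p for p in paths if is_under_allowed_root(p)]


# Serialized search hits quoting old replies; one path is outside the roots.
tool_result = (
    r'{"content":"... MEDIA:/Users/example/.hermes/media/generated/old-result.png\\n ..."}'
    "\n"
    r'{"content":"... MEDIA:/tmp/scratch/elsewhere.png\\n ..."}'
)
final_reply = "Here is the diagnosis you asked for."

# The /tmp path is filtered out, but the stale media file passes validation.
print(attachment_candidates(tool_result + "\n" + final_reply))
# → ['/Users/example/.hermes/media/generated/old-result.png']
```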

Minimal Reproduction Shape

The current extractor is too permissive: it can match MEDIA: inside serialized JSON or other text, not just standalone directives in the final response.

from gateway.platforms.base import BasePlatformAdapter

content = r'{"content":"previous reply MEDIA:/Users/example/.hermes/media/generated/stale.png\\nnot an attachment"}'
media, cleaned = BasePlatformAdapter.extract_media(content)
print(media)

Actual Behavior

extract_media() treats the embedded text as a real attachment directive and returns a media tuple for stale.png.

Expected Behavior

Only explicit outbound attachment directives should be extracted. A MEDIA: occurrence embedded inside JSON/tool results/quoted historical session text should remain plain text and should not trigger native upload.

At minimum, MEDIA: should require a safe directive boundary (for example beginning of line or whitespace boundary) rather than matching in arbitrary serialized payloads. Ideally, media extraction should run only against the final user-visible response text, not against tool result payloads or streamed/internal transcript material.
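The boundary requirement and the final-response-only scope can be combined in a small sketch. `DIRECTIVE_RE` and `extract_final_media` are hypothetical names. The line-anchored rule is one possible reading of "beginning of line", not the gateway's actual pattern.

```python
import re

# Hypothetical stricter pattern: a directive must occupy its own line
# (optionally indented), with the path running to the end of that line.
DIRECTIVE_RE = re.compile(r"^[ \t]*MEDIA:(\S+)[ \t]*$", re.MULTILINE)


def extract_final_media(final_response: str) -> tuple[list[str], str]:
    """Extract MEDIA: directives from the final user-visible reply only.

    Tool results and other transcript material are intentionally never passed
    in. Returns the directive paths and the reply with directive lines removed.
    """
    paths = DIRECTIVE_RE.findall(final_response)
    cleaned = DIRECTIVE_RE.sub("", final_response).strip()
    return paths, cleaned


# The serialized search hit from the report: one physical line starting with
# '{', so the line anchor never reaches the quoted MEDIA: text.
stale = r'{"content":"previous reply MEDIA:/Users/example/.hermes/media/generated/stale.png\\nnot an attachment"}'

# An intentional directive on its own line in the final reply.
reply = "Here is the chart.\nMEDIA:/Users/example/.hermes/media/generated/chart.png"

print(extract_final_media(stale)[0])  # → []
print(extract_final_media(reply))
# → (['/Users/example/.hermes/media/generated/chart.png'], 'Here is the chart.')
```

A whitespace-only lookbehind such as `(?<!\S)MEDIA:` would still match `stale`, because a space precedes the tag in `previous reply MEDIA:`. In this sketch it is the start-of-line anchor that rejects the quoted text, since the JSON-escaped `\\n` is a literal backslash sequence rather than a real newline. Restricting the call to the final reply is a separate, complementary safeguard: even a line-anchored pattern would fire on a tool result that happens to contain a real newline before MEDIA:.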

Impact

  • Telegram users can receive unrelated stale images/files from prior search results.
  • The assistant's final text can be correct while gateway media side-effects are wrong.
  • This is surprising and potentially sensitive if a stale MEDIA: path points at a deliverable local artifact.

Local Mitigation Tested

A local guard was tested by requiring that MEDIA: not be preceded by a non-space character (i.e. a whitespace or start-of-string left boundary) and by adding a regression test like:

def test_media_tag_ignores_json_escaped_tool_result_text():
    content = r'{"content":"previous reply MEDIA:/Users/example/.hermes/media/generated/stale.png\\nnot an attachment"}'
    media, cleaned = BasePlatformAdapter.extract_media(content)
    assert media == []
    assert "MEDIA:/Users/example/.hermes/media/generated/stale.png" in cleaned

Focused test run:

pytest tests/gateway/test_platform_base.py::TestExtractMedia -q -n 0 --tb=short
15 passed

Metadata

Assignees: No one assigned

Labels:
  • P2: Medium, degraded but workaround exists
  • comp/gateway: Gateway runner, session dispatch, delivery
  • type/bug: Something isn't working

Type: No type

Projects: No projects

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests
