Bug: extract_media() false-positives on example paths in quoted text / code blocks #35695

@shengting

Problem

extract_media() scans the full response text with a single regex pass
(MEDIA_TAG_CLEANUP_RE) without distinguishing live delivery tags from
example paths mentioned in prose.
This causes false positives when:

  1. A skill description, error message, or doc string contains a literal
    MEDIA tag example inside a code block or quote
  2. The agent explains how to use the MEDIA tag in its reply (e.g.
    "include MEDIA:/path/to/file in your response")
  3. A tool returns output that happens to contain a path matching the regex

Effect: the matching text is stripped from the user-visible response and
the path is added to the media list. validate_media_delivery_path then
either rejects it (a silent drop) or, if the path happens to exist,
delivers an unintended file.

Minimal Reproduction

Ask the agent to explain the MEDIA delivery syntax. Its reply will likely
contain something like:

To send an image, include `MEDIA:/path/to/image.jpg` in your response.
The backtick-wrapped example matches MEDIA_TAG_CLEANUP_RE, gets stripped
from the text, and an attempt is made to deliver /path/to/image.jpg.
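A minimal Python sketch of this reproduction, assuming a simplified stand-in for MEDIA_TAG_CLEANUP_RE (the real pattern is not shown in this issue; MEDIA_RE and naive_extract are hypothetical names). It models the context-blind loop shown under Root Cause, minus voice-tag handling: the backtick-wrapped example is both removed from the reply and collected as a delivery path.

```python
import os
import re

# Hypothetical stand-in for MEDIA_TAG_CLEANUP_RE: the real pattern is not
# shown in the issue, so this only assumes a named "path" group.
MEDIA_RE = re.compile(r"`?MEDIA:(?P<path>[^\s`]+)`?")


def naive_extract(content: str) -> tuple[str, list[str]]:
    """Context-blind scan modelled on the Root Cause loop (voice tags omitted)."""
    media = []
    for match in MEDIA_RE.finditer(content):
        path = match.group("path").strip()
        media.append(os.path.expanduser(path))
    # Every match is stripped from the visible text, wherever it appears.
    cleaned = MEDIA_RE.sub("", content)
    return cleaned, media


reply = "To send an image, include `MEDIA:/path/to/image.jpg` in your response."
cleaned, media = naive_extract(reply)
print(repr(cleaned))  # 'To send an image, include  in your response.'
print(media)          # ['/path/to/image.jpg']
```

Both outputs show the false positive: the example disappears from the reply and its path is queued for delivery.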

Root Cause

extract_media() in gateway/platforms/base.py (~line 2577):
for match in media_pattern.finditer(content):
    path = match.group("path").strip()
    ...
    media.append((os.path.expanduser(path), has_voice_tag))
The scan is context-blind — it does not skip:

  • Fenced code blocks (...)
  • Inline code spans (MEDIA:...)
  • Blockquotes (> ...)
  • Tool output embedded in the response
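The first three contexts listed above can be located with a few regular expressions. This is a simplified sketch, not a full CommonMark parser: protected_spans and the three patterns are hypothetical names, only backtick fences, single-backtick inline spans, and ">" line quotes are recognised, and embedded tool output is not covered because it has no general syntactic marker.

```python
import re

# Hypothetical, simplified detectors for the contexts listed above. This is
# not a CommonMark parser: ~~~ fences, indented code blocks, multi-backtick
# inline spans, and unclosed fences are not handled.
FENCED_RE = re.compile(r"^```[^\n]*\n.*?^```[ \t]*$", re.M | re.S)
INLINE_RE = re.compile(r"`[^`\n]+`")
QUOTE_RE = re.compile(r"^[ \t]{0,3}>.*$", re.M)


def protected_spans(text: str) -> list[tuple[int, int]]:
    """Return sorted, merged [start, end) ranges a MEDIA scan should skip."""
    spans = sorted(
        m.span()
        for pattern in (FENCED_RE, INLINE_RE, QUOTE_RE)
        for m in pattern.finditer(text)
    )
    merged: list[tuple[int, int]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Overlapping or touching: extend the previous range.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


sample = (
    "Live: MEDIA:/tmp/live.png\n"
    "Inline: `MEDIA:/ex/inline.png`\n"
    "> Quoted: MEDIA:/ex/quote.png\n"
    "```\n"
    "MEDIA:/ex/fenced.png\n"
    "```\n"
)
spans = protected_spans(sample)
# Expected: False for the live tag, True for the three examples.
for tag in ("MEDIA:/tmp/live.png", "MEDIA:/ex/inline.png",
            "MEDIA:/ex/quote.png", "MEDIA:/ex/fenced.png"):
    pos = sample.index(tag)
    print(tag, any(start <= pos < end for start, end in spans))
```

Merging overlapping ranges matters because an inline span inside a fenced block would otherwise be reported twice.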

Proposed Fix Direction

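Before running MEDIA_TAG_CLEANUP_RE, mask the content of fenced code
blocks, inline code spans, and blockquotes by replacing each protected
span with whitespace of equal length. Because the masked copy has the same
length as the original, every match offset found in it is also valid in
the original text, so the cleanup substitution can still be applied to the
unmasked response while false positives from those contexts are eliminated.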
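A minimal sketch of this masking direction, assuming the same kind of simplified stand-in for MEDIA_TAG_CLEANUP_RE; MEDIA_RE, mask_spans, and extract_media_masked are hypothetical names, voice-tag handling is omitted, and the protected spans are passed in (the demo only detects inline code). The regex runs over the masked copy, and each match's offsets are then used to cut the live tag out of the original text, leaving protected examples untouched.

```python
import re

# Hypothetical stand-in for MEDIA_TAG_CLEANUP_RE; only the named "path" group
# is taken from the issue. Voice-tag handling is omitted.
MEDIA_RE = re.compile(r"`?MEDIA:(?P<path>[^\s`]+)`?")
# Simplified inline-code detector used only for the demo below.
INLINE_CODE_RE = re.compile(r"`[^`\n]+`")


def mask_spans(text: str, spans: list[tuple[int, int]]) -> str:
    """Replace each [start, end) span with spaces, keeping newlines.

    The result has exactly the same length as text, so match offsets
    found in the masked copy are valid in the original.
    """
    chars = list(text)
    for start, end in spans:
        for i in range(start, end):
            if chars[i] != "\n":
                chars[i] = " "
    return "".join(chars)


def extract_media_masked(
    content: str, spans: list[tuple[int, int]]
) -> tuple[str, list[str]]:
    """Collect live MEDIA paths and strip them, ignoring protected spans."""
    masked = mask_spans(content, spans)
    media: list[str] = []
    pieces: list[str] = []
    last = 0
    for match in MEDIA_RE.finditer(masked):
        media.append(match.group("path").strip())
        # Remove the live tag from the ORIGINAL text using the same offsets.
        pieces.append(content[last:match.start()])
        last = match.end()
    pieces.append(content[last:])
    return "".join(pieces), media


reply = (
    "Here is your chart: MEDIA:/tmp/chart.png\n"
    "To send an image, include `MEDIA:/path/to/image.jpg` in your response."
)
spans = [m.span() for m in INLINE_CODE_RE.finditer(reply)]
text, media = extract_media_masked(reply, spans)
print(media)  # ['/tmp/chart.png']
print(text)   # the inline-code example is left intact
```

Keeping newlines while masking also preserves line boundaries, so any line-anchored parts of the real pattern would see the same line structure as in the unmasked text.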
Happy to submit a PR if the direction looks right — we have a working
patch in our deployment.

Environment

  • macOS 15, launchd-managed gateway
  • Feishu platform adapter
  • Triggered by: skills that document MEDIA syntax, tool error messages
    containing file paths

Metadata

Assignees

No one assigned

    Labels

    P2 (Medium: degraded but workaround exists)
    comp/gateway (Gateway runner, session dispatch, delivery)
    type/bug (Something isn't working)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests