[Bug] extract_media: Markdown bold ** in MEDIA paths causes file-not-found on all platforms #23759

@hsdwww

Bug Description

When an agent generates a file (e.g. a .docx report) and the LLM responds to the user, it often wraps the file path in Markdown bold syntax: **MEDIA:/path/to/file.docx**.

The extract_media() function in gateway/platforms/base.py uses a regex that captures the path group, but the subsequent rstrip() does NOT strip trailing asterisks (*). This causes the literal ** characters to remain in the extracted path, resulting in a file-not-found error.

Root Cause

File: gateway/platforms/base.py
Line: ~2020

Current code:

path = path.lstrip("`\"'").rstrip("`\"',.;:)}")

The rstrip character set does not include *, so when the LLM outputs **MEDIA:/path/to/file.docx**, the captured path becomes /path/to/file.docx** — a non-existent file.
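The strip behavior can be checked in isolation. The snippet below is a minimal sketch that applies the same two character sets quoted above to a sample captured value; the constant and variable names are illustrative and are not taken from base.py.

```python
# Character sets copied from the line quoted above.
LEADING = "`\"'"
TRAILING = "`\"',.;:)}"

# What the path group holds when the LLM wraps the tag in Markdown bold.
captured = "/path/to/file.docx**"

# rstrip() stops at the first trailing character that is not in TRAILING.
# The final character here is "*", which is not in the set, so nothing is removed.
cleaned = captured.lstrip(LEADING).rstrip(TRAILING)
print(cleaned)
```

Because the last character is already outside the set, rstrip() returns the string unchanged and the asterisks survive into the file lookup.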

Reproduction

  1. Use any platform that inherits BasePlatformAdapter.extract_media() (QQ, WeChat, Telegram, etc.)
  2. Have a Skill generate a document file (e.g. .docx, .xlsx)
  3. The LLM responds with **MEDIA:/临时/报告.docx** (Markdown bold wrapping the path)
  4. The extracted path becomes /临时/报告.docx** — file not found
  5. Send fails with error: Media file not found: /临时/报告.docx**
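
The steps above can be approximated without a running gateway. The following is a rough sketch, not the actual extract_media() implementation: MEDIA_RE is a hypothetical stand-in for the real pattern (which may differ), extract_path is an illustrative helper, and the sketch assumes the temporary directory path contains no whitespace. Only the strip calls are copied from this issue.

```python
import os
import re
import tempfile

# Hypothetical stand-in for the pattern used by extract_media(); the real regex may differ.
MEDIA_RE = re.compile(r"MEDIA:(\S+)")


def extract_path(reply: str) -> str:
    """Mimic the capture step plus the current strip calls quoted in this issue."""
    match = MEDIA_RE.search(reply)
    if match is None:
        raise ValueError("no MEDIA tag found")
    return match.group(1).lstrip("`\"'").rstrip("`\"',.;:)}")


with tempfile.TemporaryDirectory() as tmp:
    # Create a real file so the only reason for a failed lookup is the bad path.
    real_path = os.path.join(tmp, "report.docx")
    with open(real_path, "wb") as fh:
        fh.write(b"placeholder")

    # Simulated LLM reply with the MEDIA tag wrapped in Markdown bold.
    reply = f"Here is your report: **MEDIA:{real_path}**"
    extracted = extract_path(reply)

    file_exists = os.path.exists(real_path)
    extracted_exists = os.path.exists(extracted)
    has_trailing_stars = extracted.endswith("**")

print(file_exists, extracted_exists, has_trailing_stars)
```

The file exists on disk, but the extracted string still ends in ** and therefore does not resolve to it.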

Affected Platforms

All platforms that use BasePlatformAdapter.extract_media():

  • QQ (qqbot)
  • WeChat (weixin)
  • Telegram
  • And likely all other platform adapters

Why This Happens

The PLATFORM_HINTS for QQ/WeChat platforms encourage Markdown formatting:

"QQ supports markdown formatting and emoji"

This leads LLMs (especially Chinese-language models) to bold paths with **...**. The extract_media() regex correctly finds the MEDIA: tag, but its path group also captures the closing ** of the bold syntax, and the cleanup step leaves those characters in place because the rstrip character set does not include *.

Suggested Fix

Add * to the rstrip character set in base.py line ~2020:

path = path.lstrip("`\"'").rstrip("`\"',.;:)}*")

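This is a one-character change with minimal risk: * is extremely rare in valid file paths, and because rstrip() only removes trailing characters, asterisks elsewhere in a path are unaffected.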
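
A quick way to sanity-check the proposed character set is to run it against a few representative inputs. This is a minimal sketch: clean_media_path is a hypothetical helper that wraps the proposed line, and the sample paths are made up for illustration.

```python
def clean_media_path(path: str) -> str:
    """Hypothetical helper wrapping the proposed strip calls (with * added)."""
    return path.lstrip("`\"'").rstrip("`\"',.;:)}*")


# Raw captured value -> expected cleaned path.
cases = {
    "/path/to/file.docx**": "/path/to/file.docx",   # bold wrapper
    "/path/to/file.docx*": "/path/to/file.docx",    # italic wrapper
    "/path/to/file.docx**.": "/path/to/file.docx",  # bold followed by a full stop
    "/path/to/file.docx": "/path/to/file.docx",     # already clean
    "/path/to/a*b.docx": "/path/to/a*b.docx",       # interior asterisk is preserved
}

results = {raw: clean_media_path(raw) for raw in cases}
for raw, cleaned in results.items():
    print(f"{raw!r} -> {cleaned!r}")
```

Since rstrip() treats its argument as a set of characters, mixed trailing runs such as **. are removed in one pass regardless of order, while an asterisk in the middle of a file name is left alone.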

Impact

  • Severity: Medium — prevents file delivery on all platform adapters
  • Frequency: Any time the LLM uses bold syntax around a MEDIA path
  • Workaround: Manually edit base.py after each Hermes update to add * to rstrip

Labels

  • P2: Medium, degraded but workaround exists
  • comp/gateway: Gateway runner, session dispatch, delivery
  • type/bug: Something isn't working
