
fix(gateway,weixin): extend MEDIA regex whitelist + add retry on session errors #32604

Closed
hanhan-tg wants to merge 2 commits into NousResearch:main from hanhan-tg:fix/weixin-media-retry



hanhan-tg commented May 26, 2026


What does this PR do?

Fixes two bugs in the WeChat (weixin) message delivery path:

1. MEDIA tag leaks for common file types

The BasePlatformAdapter.extract_media() regex whitelist was missing .md, .json, .yaml, .yml, .toml, and .log. When MEDIA:/path/to/file.md was used, the regex did not match, so the MEDIA tag appeared as raw text on WeChat, Feishu, Telegram, and other platforms instead of the file being routed through the document-upload paths.

Fix: Added md|json|yaml|yml|toml|log to the regex alternation (single-line change). All other parts of the pattern (path wrapping, quote/backtick stripping, trailing-delimiter lookahead, [[audio_as_voice]] / [[as_document]] directives) are unchanged.
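To illustrate the whitelist change, here is a minimal, hypothetical sketch of an extension-anchored MEDIA matcher. It is not the actual pattern from gateway/platforms/base.py: the pre-existing extension subset is an assumption, and the quote/backtick stripping and [[audio_as_voice]] / [[as_document]] handling are left out. It shows why MEDIA:/path/to/file.md only matches once md is in the alternation, and that only absolute (/...) and tilde (~/...) paths are recognized.

```python
from __future__ import annotations

import re

# Hypothetical allowlist: the pre-existing subset is illustrative only.
_EXISTING_EXTS = "png|jpe?g|gif|webp|mp3|ogg|wav|mp4|pdf|txt"
_ADDED_EXTS = "md|json|yaml|yml|toml|log"  # the six extensions this PR adds

# MEDIA:<path>, where <path> starts with / or ~ and ends in an allowlisted
# extension followed by end of text, whitespace, or trailing punctuation.
MEDIA_RE = re.compile(
    rf"MEDIA:\s*(?P<path>[~/]\S*?\.(?:{_EXISTING_EXTS}|{_ADDED_EXTS}))"
    r"(?=$|\s|[.,;:!?)\]])"
)


def extract_media(text: str) -> tuple[list[str], str]:
    """Return (paths, cleaned_text); tags that do not match are left untouched."""
    paths = [m.group("path") for m in MEDIA_RE.finditer(text)]
    cleaned = MEDIA_RE.sub("", text).strip()
    return paths, cleaned


# Usage: each newly added extension is now extracted and stripped from the text.
for ext in ("md", "json", "yaml", "yml", "toml", "log"):
    found, body = extract_media(f"Here is the file MEDIA:/tmp/notes.{ext}")
    print(found, repr(body))
```

Because the extension must be followed by a delimiter, a near-miss such as file.mdx is not matched and stays in the text unchanged.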

2. No retry on session/rate-limit errors

The fallback path in send_weixin_direct() returned immediately on errors such as ret=-2 (rate limit) or errcode=-14 (session timeout), without retrying or refreshing the context_token. After a gateway restart, stale context_tokens caused persistent failures.

Fix: Added a retry loop (up to 2 attempts, i.e. one retry) that clears the stale context_token before retrying. On session errors, the retry is sent without a context token (tokenless fallback), with a 3-second backoff between attempts.
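The retry behaviour described above can be sketched as follows. This is a hedged illustration, not the code in gateway/platforms/weixin.py: send_once, is_retryable, and send_with_retry are hypothetical names, responses are modelled as plain dicts carrying ret/errcode, and "clearing" the token is shown simply as passing None on the retry (the real adapter may also evict a cached token). The sketch drops the token for both error kinds. The sleep function is injectable so the 3-second backoff can be skipped in tests.

```python
from typing import Callable, Optional

import time

# Error codes named in the PR description.
RATE_LIMIT_RET = -2
SESSION_TIMEOUT_ERRCODE = -14

MAX_ATTEMPTS = 2  # initial send + one retry
BACKOFF_SECONDS = 3.0


def is_retryable(resp: dict) -> bool:
    """Rate-limit or session-timeout responses are worth one more try."""
    return resp.get("ret") == RATE_LIMIT_RET or resp.get("errcode") == SESSION_TIMEOUT_ERRCODE


def send_with_retry(
    send_once: Callable[[str, Optional[str]], dict],
    text: str,
    context_token: Optional[str],
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Hypothetical sketch: on a retryable error, retry once without the token."""
    token = context_token
    resp: dict = {}
    for attempt in range(1, MAX_ATTEMPTS + 1):
        resp = send_once(text, token)
        if not is_retryable(resp):
            return resp  # success or a non-retryable error
        if attempt < MAX_ATTEMPTS:
            token = None  # drop the possibly stale token: tokenless fallback
            sleep(BACKOFF_SECONDS)
    return resp  # still failing after the last attempt


# Usage with a fake transport whose first call hits a session timeout.
seen_tokens = []


def fake_send(text, token):
    seen_tokens.append(token)
    return {"errcode": SESSION_TIMEOUT_ERRCODE} if token else {"ret": 0}


result = send_with_retry(fake_send, "hello", "stale-token", sleep=lambda _: None)
print(result, seen_tokens)  # {'ret': 0} ['stale-token', None]
```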

Related Issue

Closes #32601

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)

Changes Made

  • gateway/platforms/base.py — extend the MEDIA tag regex alternation with md|json|yaml|yml|toml|log (+1/-1 lines)
  • gateway/platforms/weixin.py — add retry loop in send_weixin_direct() fallback path for session/rate-limit errors (+29/-5 lines)
  • tests/gateway/test_platform_base.py — add parametrized test test_media_tag_accepts_text_config_extensions covering each new extension (+13 lines)

How to Test

# Test the regex fix (20 tests: 14 existing + 6 new parametrized)
python3 -m pytest tests/gateway/test_platform_base.py -v -k TestExtractMedia

# Test the retry logic
python3 -m pytest tests/gateway/test_weixin.py -v

Manual verification

  • Sent .md, .json, .yaml, .yml, .toml, .log files via MEDIA: tag → all delivered correctly
  • Restarted gateway → verified retry fires and uses tokenless fallback
  • Existing file types (.txt, .pdf, .png) still work correctly

Checklist

Code

  • I have read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(weixin): ...)
  • I searched for existing PRs to make sure this is not a duplicate
  • My PR contains only changes related to this fix (no unrelated commits)
  • I have run focused tests for the touched code and all pass
  • I have added tests for my changes
  • I have tested on my platform: macOS 15.x

Documentation & Housekeeping

  • I have updated relevant documentation — N/A (regex-internal + retry-internal change)
  • I have updated cli-config.yaml.example if I added/changed config keys — N/A
  • I have updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A
  • I have considered cross-platform impact — regex matches absolute (/...) and tilde (~/...) paths consistently with the existing pattern; no Windows-path coupling
  • I have updated tool descriptions/schemas if I changed tool behavior — N/A

Related / Positioning

Issue #32601 lists two bugs, and both are addressed in this single PR:

| Bug | Scope | File |
|---|---|---|
| 1. MEDIA tag leaks for text/config file types | Global (all platforms) | gateway/platforms/base.py |
| 2. send_weixin_direct() returns immediately on errors | WeChat-only | gateway/platforms/weixin.py |

Keeping these together makes sense because they were discovered together in a real deployment scenario (WeChat delivery path), and the retry logic in bug 2 is the fallback path that fires when file delivery (which bug 1 affects) hits transient errors.

This PR is part of a cluster of MEDIA regex fixes. Compared to siblings:

- Add md|json|yaml|yml|toml|log to extract_media regex whitelist
  so MEDIA:/path/to/file.md no longer leaks as raw text on WeChat
- Add retry loop in send_weixin_direct fallback path: on rate-limit
  or session errors, clear stale context_token and retry once

Closes NousResearch#32601
alt-glitch added the type/bug, P2, platform/wecom, and comp/gateway labels on May 26, 2026
…sions

Cover each new extension (md, json, yaml, yml, toml, log) with a
parametrized test verifying extract_media correctly parses the
MEDIA: tag and strips it from the cleaned content.

Refs: NousResearch#32601, credit to @briandevans for the test template from NousResearch#32751.
hanhan-tg changed the title from "fix(weixin): extend MEDIA regex whitelist + add retry on session errors" to "fix(gateway,weixin): extend MEDIA regex whitelist + add retry on session errors" on May 27, 2026
hanhan-tg (author) commented:

@briandevans Thanks for closing #32751 in favor of this PR and for the test template! I've added parametrized tests covering all six extensions (credit to your approach). Appreciate the collaboration 🙏

teknium1 (contributor) commented:

Superseded by #34844, which consolidates this cluster.

This PR widens the extract_media extension allowlist, which is the right direction — but on its own it leaves the unconditional MEDIA:\s*\S+ strip in place, so a MEDIA: tag with any extension still outside the (now wider) list keeps getting deleted from the body before extract_local_files can pick up the bare path. #34844 fixes both halves: it unifies the two extractors onto a single shared extension set (MEDIA_DELIVERY_EXTS) AND replaces the loose strip with an extension-anchored one, so an unknown-extension path survives in the text instead of vanishing.

Closing as superseded — thanks for surfacing and helping pin down this bug; it was part of getting the full fix right. See #34844.
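The difference the review describes can be shown with two simplified patterns. This is a hypothetical sketch, not the code from #34844: LOOSE_STRIP mirrors the MEDIA:\s*\S+ strip quoted in the review, while ANCHORED_STRIP and the contents of MEDIA_DELIVERY_EXTS are assumptions chosen for illustration. With the loose strip, a tag whose extension is outside the set is deleted outright; with the anchored strip it survives in the text, where a later pass (extract_local_files, per the review) could still pick up the bare path.

```python
import re

# Hypothetical stand-in for the shared MEDIA_DELIVERY_EXTS set introduced in
# #34844; its real contents are not shown in this thread.
MEDIA_DELIVERY_EXTS = frozenset(
    {"png", "jpg", "pdf", "txt", "md", "json", "yaml", "yml", "toml", "log"}
)

# Loose strip, as described in the review: deletes any "MEDIA:<token>".
LOOSE_STRIP = re.compile(r"MEDIA:\s*\S+")

# Extension-anchored strip: only deletes tags whose path ends in a known
# extension, so unknown-extension paths stay in the text.
_EXT_ALT = "|".join(sorted(MEDIA_DELIVERY_EXTS, key=lambda e: (-len(e), e)))
ANCHORED_STRIP = re.compile(rf"MEDIA:\s*\S+?\.(?:{_EXT_ALT})(?=$|\s)")

unknown = "Result: MEDIA:/tmp/data.parquet"
known = "Result: MEDIA:/tmp/notes.md"

print(repr(LOOSE_STRIP.sub("", unknown)))     # 'Result: ' (path is lost)
print(repr(ANCHORED_STRIP.sub("", unknown)))  # 'Result: MEDIA:/tmp/data.parquet'
print(repr(ANCHORED_STRIP.sub("", known)))    # 'Result: '
```

Deriving both the extraction regex and the strip regex from one set is what keeps them from drifting apart, which is the "both halves" point made in the review.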


Labels

comp/gateway Gateway runner, session dispatch, delivery P2 Medium — degraded but workaround exists platform/wecom WeCom / WeChat Work adapter type/bug Something isn't working


Development

Linked issue:

fix(weixin): MEDIA tag leaks for .md/.json/.yaml/.toml/.log files + no retry on session expiry
