fix(gateway,weixin): extend MEDIA regex whitelist + add retry on session errors #32604
hanhan-tg wants to merge 2 commits into
- Add md|json|yaml|yml|toml|log to extract_media regex whitelist so MEDIA:/path/to/file.md no longer leaks as raw text on WeChat
- Add retry loop in send_weixin_direct fallback path: on rate-limit or session errors, clear stale context_token and retry once
Closes NousResearch#32601
…sions
Cover each new extension (md, json, yaml, yml, toml, log) with a parametrized test verifying that extract_media correctly parses the MEDIA: tag and strips it from the cleaned content.
Refs: NousResearch#32601, credit to @briandevans for the test template from NousResearch#32751.
@briandevans Thanks for closing #32751 in favor of this PR and for the test template! I've added parametrized tests covering all six extensions (credit to your approach). Appreciate the collaboration 🙏
Superseded by #34844, which consolidates this cluster. This PR widens the … Closing as superseded. Thanks for surfacing and helping pin down this bug; it was part of getting the full fix right. See #34844.
What does this PR do?
Fixes two bugs in the WeChat (weixin) message delivery path:
1. MEDIA tag leaks for common file types
The BasePlatformAdapter.extract_media() regex whitelist was missing .md, .json, .yaml, .yml, .toml, and .log. When MEDIA:/path/to/file.md was used, the regex did not match, so the MEDIA tag appeared as raw text on WeChat, Feishu, Telegram, and other platforms instead of the file being routed through the document-upload paths.
Fix: Added md|json|yaml|yml|toml|log to the regex alternation (single-line change). All other parts of the pattern (path wrapping, quote/backtick stripping, trailing-delimiter lookahead, [[audio_as_voice]] / [[as_document]] directives) are unchanged.
2. No retry on session/rate-limit errors
The send_weixin_direct() fallback path returned immediately on errors like ret=-2 (rate limit) or errcode=-14 (session timeout) without any retry or context_token refresh. After a gateway restart, stale context_tokens caused persistent failures.
Fix: Added a retry loop (up to 2 attempts) that clears the stale context_token before retrying. On session errors, the retry sends without a context token (tokenless fallback), with a 3-second backoff between attempts.
Related Issue
Closes #32601
Type of Change
Changes Made
- gateway/platforms/base.py: extend the MEDIA tag regex alternation with md|json|yaml|yml|toml|log (+1/-1 lines)
- gateway/platforms/weixin.py: add retry loop in send_weixin_direct() fallback path for session/rate-limit errors (+29/-5 lines)
- tests/gateway/test_platform_base.py: add parametrized test test_media_tag_accepts_text_config_extensions covering each new extension (+13 lines)
How to Test
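A rough, self-contained approximation of the automated check is sketched below. It is not the code from gateway/platforms/base.py or tests/gateway/test_platform_base.py: MEDIA_RE, this extract_media stand-in, and check_extension are simplified, hypothetical names that only model the extension alternation (the real pattern also handles path wrapping, quote/backtick stripping, and directives). A pytest version would presumably wrap the check in a parametrize decorator rather than a plain loop.

```python
import re

# Hypothetical, simplified stand-in for the MEDIA tag pattern. Only the
# extension alternation is modelled; the last six entries are the new ones.
_EXTS = "txt|pdf|png|md|json|yaml|yml|toml|log"
MEDIA_RE = re.compile(rf"MEDIA:((?:~/|/)\S+?\.(?:{_EXTS}))(?=\s|$)")


def extract_media(content: str) -> tuple[list[str], str]:
    """Return (media_paths, cleaned_content) for a message body."""
    paths = MEDIA_RE.findall(content)
    cleaned = MEDIA_RE.sub("", content)
    # Collapse the whitespace left behind by the removed tags.
    cleaned = " ".join(cleaned.split())
    return paths, cleaned


NEW_EXTENSIONS = ["md", "json", "yaml", "yml", "toml", "log"]


def check_extension(ext: str) -> None:
    """Assert that a MEDIA tag with this extension is parsed and stripped."""
    paths, cleaned = extract_media(f"Report attached MEDIA:/tmp/report.{ext} thanks")
    assert paths == [f"/tmp/report.{ext}"]
    assert "MEDIA:" not in cleaned


if __name__ == "__main__":
    for ext in NEW_EXTENSIONS:
        check_extension(ext)
```

An extension outside the alternation does not match in this sketch, so the tag stays in the cleaned text, which mirrors the leak described in bug 1.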
Manual verification
- .md, .json, .yaml, .yml, .toml, .log files via MEDIA: tag → all delivered correctly
- … (.txt, .pdf, .png) still work correctly
Checklist
Code
- … fix(weixin): ...)
Documentation & Housekeeping
- … cli-config.yaml.example if I added/changed config keys (N/A)
- … CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows (N/A)
- … /...) and tilde (~/...) paths consistently with the existing pattern; no Windows-path coupling
Related / Positioning
Issue #32601 lists two bugs. Both are addressed in this single PR because:
- … gateway/platforms/base.py (global, all platforms)
- send_weixin_direct() returns immediately on errors: gateway/platforms/weixin.py (WeChat-only)
Keeping these together makes sense because they were discovered together in a real deployment scenario (the WeChat delivery path), and the retry logic for bug 2 is the fallback path that fires when file delivery (which bug 1 affects) hits transient errors.
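The retry behaviour described for bug 2 (up to 2 attempts, stale context_token cleared before the retry, 3-second backoff) could look roughly like the sketch below. This is an illustration under assumptions, not the code in gateway/platforms/weixin.py: send_with_retry, the send callable, and the dict-shaped response with ret / errcode keys are all hypothetical.

```python
import time
from typing import Callable, Optional

# Error codes taken from the PR description; the response shape is assumed.
RETRYABLE_CODES = {-2, -14}  # ret=-2 rate limit, errcode=-14 session timeout
MAX_ATTEMPTS = 2
BACKOFF_SECONDS = 3.0


def send_with_retry(
    send: Callable[[Optional[str]], dict],
    context_token: Optional[str],
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call send(token); on a retryable error, drop the token and retry once."""
    result: dict = {}
    for attempt in range(1, MAX_ATTEMPTS + 1):
        result = send(context_token)
        codes = {result.get("ret", 0), result.get("errcode", 0)}
        if not (codes & RETRYABLE_CODES):
            return result  # success, or an error that a retry would not fix
        if attempt == MAX_ATTEMPTS:
            break  # out of attempts; return the last error to the caller
        context_token = None  # tokenless fallback for the next attempt
        sleep(BACKOFF_SECONDS)
    return result
```

Passing sleep as a parameter is a choice made for this sketch so the backoff can be replaced with a no-op in tests; the PR description does not say how the real implementation waits.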
This PR is part of a cluster of MEDIA regex fixes. Compared to siblings:
- … SUPPORTED_DOCUMENT_TYPES): preferable long-term, but this PR also fixes the WeChat retry issue, which fix(gateway): sync MEDIA regex extension allowlist with SUPPORTED_DOCUMENT_TYPES #29609 does not cover
- … weixin.py is orthogonal to whichever regex approach lands, and can be cherry-picked independently