fix(weixin): retry send without context_token on iLink session expiry #9928
jinzheng8115 wants to merge 1 commit into
iLink context_token has a limited TTL. When no user message has arrived for an extended period (e.g. overnight), cron-initiated pushes fail with errcode -14 (session timeout). Tested that iLink accepts sends without context_token as a degraded fallback, so we now automatically strip the expired token and retry once. This keeps scheduled push messages (weather, digests, etc.) working reliably without requiring a user message to refresh the session first.

Changes:
- _send_text_chunk() catches iLinkDeliveryError with session-expired errcode (-14) and retries without context_token
- Stale tokens are cleared from ContextTokenStore on session expiry
- All 34 existing weixin tests pass
Pull request overview
This PR improves Weixin iLink outbound delivery resilience by detecting session-expiry (errcode/ret == -14) and retrying sends without a context_token, preventing cron-initiated pushes from being dropped after long idle periods.
Changes:
- Change `_send_message()` to return the raw iLink response dict so callers can inspect `ret`/`errcode`.
- In `_send_text_chunk()`, detect session-expired responses and retry once without `context_token`, attempting to clear the cached token before retrying.
Comments suppressed due to low confidence (1)
gateway/platforms/weixin.py:426
`_send_message()` now returns the raw response dict, but it still doesn't validate/raise on non-zero `ret`/`errcode`. This makes it easy for other call sites to ignore failures (there is at least one other `_send_message` call in this module that doesn't inspect the response). Consider centralizing response validation inside `_send_message` (or a helper) so errors can't be silently dropped, and let `_send_text_chunk` handle the special `-14` retry path.
    """Send a text message via iLink sendmessage API.

    Returns the raw API response dict (may contain error codes like
    ``errcode: -14`` for session expiry that the caller can inspect).
    """
    if not text or not text.strip():
        raise ValueError("_send_message: text must not be empty")
    message: Dict[str, Any] = {
        "from_user_id": "",
        "to_user_id": to,
        "client_id": client_id,
        "message_type": MSG_TYPE_BOT,
        "message_state": MSG_STATE_FINISH,
        "item_list": [{"type": ITEM_TEXT, "text_item": {"text": text}}],
    }
    if context_token:
        message["context_token"] = context_token
    return await _api_post(
        session,
        base_url=base_url,
        endpoint=EP_SEND_MESSAGE,
        payload={"msg": message},
        token=token,
        timeout_ms=API_TIMEOUT_MS,
    )
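One way to read the suggestion about centralizing validation is a small helper that every `_send_message` call site passes the response through. The following is only a sketch under assumptions: `ILinkResponseError`, `check_ilink_response`, and the `errmsg` field are hypothetical names for illustration, not part of the module or a confirmed part of the iLink response.

```python
from typing import Any, Dict, Optional

SESSION_EXPIRED_ERRCODE = -14  # value taken from the PR description


class ILinkResponseError(Exception):
    """Hypothetical error type that carries the iLink status code."""

    def __init__(self, code: int, message: str = "") -> None:
        super().__init__(f"iLink error {code}: {message}")
        self.code = code

    @property
    def is_session_expired(self) -> bool:
        return self.code == SESSION_EXPIRED_ERRCODE


def check_ilink_response(resp: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the response unchanged, or raise if ret/errcode is non-zero."""
    if not isinstance(resp, dict):
        # Treat a missing or non-dict body as an empty success response.
        return {}
    for key in ("ret", "errcode"):
        code = resp.get(key)
        if code not in (None, 0):
            # "errmsg" is an assumed field name; it falls back to "".
            raise ILinkResponseError(code, str(resp.get("errmsg", "")))
    return resp
```

With a helper like this, `_send_text_chunk` could catch the error, check `is_session_expired`, and take the tokenless retry path, while other call sites fail loudly instead of dropping the error.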
    self._token_store._cache.pop(
        self._token_store._key(self._account_id, chat_id), None
    )
Clearing the stale context token via self._token_store._cache.pop(...) only removes it from the in-memory cache and does not persist the deletion to disk (ContextTokenStore only persists on set()). After a restart, restore() will reload the expired token and the first send will fail again. Add a public delete()/clear() method on ContextTokenStore that pops the key and calls _persist(account_id), and use that here instead of reaching into _cache/_key private members.
Suggested change:
- self._token_store._cache.pop(
-     self._token_store._key(self._account_id, chat_id), None
- )
+ self._token_store.delete(self._account_id, chat_id)
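To make the persistence concern concrete, here is a minimal sketch of what a public `delete()` could look like. It assumes a simplified `ContextTokenStore` that keeps one JSON file per account. Only the names `_cache`, `_key`, `_persist`, `set`, and `restore` come from the comment above; the file layout, the `get()` method, and the constructor argument are assumptions made for illustration, and the real class may differ.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Optional


class ContextTokenStore:
    """Simplified stand-in; the real class in weixin.py may differ."""

    def __init__(self, root: Path) -> None:
        self._root = root  # assumed storage directory
        self._cache: Dict[str, str] = {}

    @staticmethod
    def _key(account_id: str, user_id: str) -> str:
        return f"{account_id}:{user_id}"

    def _path(self, account_id: str) -> Path:
        # Assumed file naming; one JSON file per account.
        return self._root / f"{account_id}.context-tokens.json"

    def _persist(self, account_id: str) -> None:
        prefix = f"{account_id}:"
        data = {
            key[len(prefix):]: token
            for key, token in self._cache.items()
            if key.startswith(prefix)
        }
        self._path(account_id).write_text(json.dumps(data), encoding="utf-8")

    def restore(self, account_id: str) -> None:
        path = self._path(account_id)
        if not path.exists():
            return
        stored = json.loads(path.read_text(encoding="utf-8"))
        for user_id, token in stored.items():
            self._cache[self._key(account_id, user_id)] = token

    def get(self, account_id: str, user_id: str) -> Optional[str]:
        return self._cache.get(self._key(account_id, user_id))

    def set(self, account_id: str, user_id: str, token: str) -> None:
        self._cache[self._key(account_id, user_id)] = token
        self._persist(account_id)

    def delete(self, account_id: str, user_id: str) -> None:
        """Drop a token and write the deletion to disk."""
        removed = self._cache.pop(self._key(account_id, user_id), None)
        if removed is not None:
            self._persist(account_id)
```

Because `delete()` calls `_persist()`, a store that is restored after a restart no longer reloads the expired token, which is the failure mode described in the comment.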
@@ -1429,6 +1441,31 @@ async def _send_text_chunk(
        context_token=context_token,
        client_id=client_id,
    )
    # Check iLink response for session-expired error
    if resp and isinstance(resp, dict):
        ret = resp.get("ret")
        errcode = resp.get("errcode")
        if (ret is not None and ret not in (0,)) or (errcode is not None and errcode not in (0,)):
            is_session_expired = (
                ret == SESSION_EXPIRED_ERRCODE
                or errcode == SESSION_EXPIRED_ERRCODE
            )
            # Session expired — strip token and retry once
            if is_session_expired and not retried_without_token and context_token:
                retried_without_token = True
                context_token = None
                self._token_store._cache.pop(
                    self._token_store._key(self._account_id, chat_id), None
                )
                logger.warning(
                    "[%s] session expired for %s; retrying without context_token",
                    self.name, _safe_id(chat_id),
                )
                continue
New behavior (detect SESSION_EXPIRED_ERRCODE and retry once without context_token, plus clearing the token store) isn't covered by tests. Since this module already has unit tests for chunk retry behavior, add a test that simulates a ret/errcode = -14 response on the first attempt and asserts: (1) _send_message is called again with context_token=None, and (2) the token is removed from ContextTokenStore (including persistence if you add a delete API).
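A test along these lines might look like the sketch below. It does not use the real adapter: `send_with_tokenless_retry` and `FakeTokenStore` are hypothetical stand-ins that mimic the retry path in `_send_text_chunk()` and a store with a `delete()` API, so that the two assertions from the comment can be shown in a self-contained form.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Optional, Tuple

SESSION_EXPIRED_ERRCODE = -14  # value taken from the PR description

SendFn = Callable[[str, str, Optional[str]], Awaitable[Dict[str, Any]]]


class FakeTokenStore:
    """Hypothetical in-memory store exposing get() and delete()."""

    def __init__(self) -> None:
        self.tokens: Dict[Tuple[str, str], str] = {}

    def get(self, account_id: str, chat_id: str) -> Optional[str]:
        return self.tokens.get((account_id, chat_id))

    def delete(self, account_id: str, chat_id: str) -> None:
        self.tokens.pop((account_id, chat_id), None)


async def send_with_tokenless_retry(
    send: SendFn, store: FakeTokenStore, account_id: str, chat_id: str, text: str
) -> Dict[str, Any]:
    """Simplified stand-in for the retry path in _send_text_chunk()."""
    context_token = store.get(account_id, chat_id)
    resp = await send(chat_id, text, context_token)
    codes = (resp.get("ret"), resp.get("errcode"))
    if context_token and SESSION_EXPIRED_ERRCODE in codes:
        # Drop the stale token, then retry exactly once without it.
        store.delete(account_id, chat_id)
        resp = await send(chat_id, text, None)
    return resp


def test_retries_without_token_on_session_expiry() -> None:
    store = FakeTokenStore()
    store.tokens[("acct", "chat")] = "stale-token"
    seen_tokens: List[Optional[str]] = []

    async def fake_send(to: str, text: str, context_token: Optional[str]) -> Dict[str, Any]:
        seen_tokens.append(context_token)
        # Fail with session expiry whenever a token is attached.
        return {"errcode": SESSION_EXPIRED_ERRCODE} if context_token else {}

    resp = asyncio.run(
        send_with_tokenless_retry(fake_send, store, "acct", "chat", "hello")
    )
    assert resp == {}
    assert seen_tokens == ["stale-token", None]
    assert store.get("acct", "chat") is None
```

In the real test suite the same assertions would presumably be made against a mocked `_send_message` and the actual `ContextTokenStore`, including a check of the persisted file if a `delete()` API is added.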
Merged as part of Batch-5 salvage: #11634. Your commit was cherry-picked; commit SHA on main: e105b7a
Problem
iLink `context_token` has a limited TTL. When no user message has arrived for an extended period (e.g. overnight), cron-initiated pushes fail with `errcode -14` (session timeout). All scheduled push messages (weather reports, digests, etc.) are silently dropped.

Root Cause
Every outbound message echoes the latest `context_token` for the peer. This token expires if no inbound message refreshes it. The `_send_text_chunk` method retries on generic errors but treats session expiry the same as any other failure: it retries with the same stale token, which keeps failing.

Fix
When `_send_text_chunk` catches an `iLinkDeliveryError` with session-expired errcode (-14), it now:
- Strips the expired `context_token`
- Clears the stale token from `ContextTokenStore`
- Retries once without `context_token`

Verified that iLink accepts sends without `context_token` as a degraded fallback: the message is delivered successfully.

Testing
- All 34 existing `test_weixin.py` tests pass
- iLink returns `{}` (success) for a send without `context_token`

Files Changed
- `gateway/platforms/weixin.py`: `_send_text_chunk()` adds session-expired detection and tokenless retry