fix(weixin): retry send without context_token on iLink session expiry #9928

Closed
jinzheng8115 wants to merge 1 commit into NousResearch:main from jinzheng8115:fix/ilink-session-expired-clean

@jinzheng8115 (Contributor)

Problem

iLink context_token has a limited TTL. When no user message has arrived for an extended period (e.g. overnight), cron-initiated pushes fail with errcode -14 (session timeout). All scheduled push messages (weather reports, digests, etc.) are silently dropped.

Root Cause

Every outbound message echoes the latest context_token for the peer. This token expires if no inbound message refreshes it. The _send_text_chunk method retries on generic errors but treats session expiry the same as any other failure — it retries with the same stale token, which keeps failing.

Fix

When _send_text_chunk catches an iLinkDeliveryError with the session-expired errcode (-14), it now:

  1. Strips the expired context_token
  2. Clears the stale token from ContextTokenStore
  3. Retries the send without context_token

Verified that iLink accepts sends without context_token as a degraded fallback — the message is delivered successfully.
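
The three steps above can be sketched in isolation as follows. This is a minimal illustration, not the code in this PR: `send_with_token_fallback`, `is_session_expired`, and the `send` / `on_expired` callables are hypothetical names, and the sketch assumes the send call returns the response dict with `ret` / `errcode` fields rather than raising.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Optional, Tuple

SESSION_EXPIRED_ERRCODE = -14  # session-timeout code reported in this PR

# Hypothetical send callable: takes the context_token (or None), returns the response dict.
SendFn = Callable[[Optional[str]], Awaitable[Dict[str, Any]]]


def is_session_expired(resp: Dict[str, Any]) -> bool:
    """True when either ``ret`` or ``errcode`` carries the session-expired code."""
    return SESSION_EXPIRED_ERRCODE in (resp.get("ret"), resp.get("errcode"))


async def send_with_token_fallback(
    send: SendFn,
    context_token: Optional[str],
    on_expired: Callable[[], None],
) -> Dict[str, Any]:
    """Send once; on session expiry drop the token, notify, and retry exactly once."""
    resp = await send(context_token)
    if context_token and is_session_expired(resp):
        on_expired()             # e.g. clear the cached token for this peer
        resp = await send(None)  # degraded fallback: no context_token
    return resp


async def _demo() -> Tuple[Dict[str, Any], List[Optional[str]], List[bool]]:
    calls: List[Optional[str]] = []
    cleared: List[bool] = []

    async def fake_send(token: Optional[str]) -> Dict[str, Any]:
        # Fake server: any token is treated as expired, tokenless sends succeed.
        calls.append(token)
        return {"errcode": SESSION_EXPIRED_ERRCODE} if token else {}

    resp = await send_with_token_fallback(
        fake_send, "stale-token", lambda: cleared.append(True)
    )
    return resp, calls, cleared


demo_result = asyncio.run(_demo())
```

The retry is skipped when no token was sent in the first place, so a tokenless send that still fails is not repeated.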

Testing

  • All 34 existing test_weixin.py tests pass
  • Manually verified: iLink sendmessage returns {} (success) without context_token
  • Manually verified: cron weather push now succeeds after overnight session expiry

Files Changed

  • gateway/platforms/weixin.py: _send_text_chunk() adds session-expired detection and a tokenless retry

iLink context_token has a limited TTL. When no user message has arrived
for an extended period (e.g. overnight), cron-initiated pushes fail with
errcode -14 (session timeout).

Tested that iLink accepts sends without context_token as a degraded
fallback, so we now automatically strip the expired token and retry
once. This keeps scheduled push messages (weather, digests, etc.)
working reliably without requiring a user message to refresh the
session first.

Changes:
- _send_text_chunk() catches iLinkDeliveryError with session-expired
  errcode (-14) and retries without context_token
- Stale tokens are cleared from ContextTokenStore on session expiry
- All 34 existing weixin tests pass
Copilot AI review requested due to automatic review settings April 14, 2026 23:48

Copilot AI (Contributor) left a comment

Pull request overview

This PR improves Weixin iLink outbound delivery resilience by detecting session-expiry (errcode/ret == -14) and retrying sends without a context_token, preventing cron-initiated pushes from being dropped after long idle periods.

Changes:

  • Change _send_message() to return the raw iLink response dict so callers can inspect ret/errcode.
  • In _send_text_chunk(), detect session-expired responses and retry once without context_token, attempting to clear the cached token before retrying.
Comments suppressed due to low confidence (1)

gateway/platforms/weixin.py:426

  • _send_message() now returns the raw response dict, but it still doesn't validate/raise on non-zero ret/errcode. This makes it easy for other call sites to ignore failures (there is at least one other _send_message call in this module that doesn't inspect the response). Consider centralizing response validation inside _send_message (or a helper) so errors can't be silently dropped, and let _send_text_chunk handle the special -14 retry path.
    """Send a text message via iLink sendmessage API.

    Returns the raw API response dict (may contain error codes like
    ``errcode: -14`` for session expiry that the caller can inspect).
    """
    if not text or not text.strip():
        raise ValueError("_send_message: text must not be empty")
    message: Dict[str, Any] = {
        "from_user_id": "",
        "to_user_id": to,
        "client_id": client_id,
        "message_type": MSG_TYPE_BOT,
        "message_state": MSG_STATE_FINISH,
        "item_list": [{"type": ITEM_TEXT, "text_item": {"text": text}}],
    }
    if context_token:
        message["context_token"] = context_token
    return await _api_post(
        session,
        base_url=base_url,
        endpoint=EP_SEND_MESSAGE,
        payload={"msg": message},
        token=token,
        timeout_ms=API_TIMEOUT_MS,
    )
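
One way to centralize the validation suggested above is a small helper that every `_send_message` caller passes the response through. The sketch below is an assumption about how that could look, not code from this module: `ILinkResponseError` and `check_ilink_response` are hypothetical names, and treating a non-dict response as success mirrors the `isinstance(resp, dict)` guard in the diff rather than any documented iLink behavior.

```python
from typing import Any, Dict

SESSION_EXPIRED_ERRCODE = -14  # session-timeout code reported in this PR


class ILinkResponseError(RuntimeError):
    """Hypothetical error raised for a non-zero ``ret`` / ``errcode``."""

    def __init__(self, code: int, resp: Dict[str, Any]) -> None:
        super().__init__(f"iLink send failed with code {code}")
        self.code = code
        self.resp = resp

    @property
    def session_expired(self) -> bool:
        # Lets the caller branch into the tokenless-retry path.
        return self.code == SESSION_EXPIRED_ERRCODE


def check_ilink_response(resp: Any) -> Dict[str, Any]:
    """Return the response on success; raise on a non-zero ret/errcode."""
    if not isinstance(resp, dict):
        # Assumption: a missing or non-dict body counts as success.
        return {}
    for field in ("ret", "errcode"):
        code = resp.get(field)
        if code not in (None, 0):
            raise ILinkResponseError(code, resp)
    return resp
```

With a helper like this, `_send_text_chunk` would catch the error and check `session_expired` for the -14 path, while other call sites would fail loudly instead of dropping the error.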

Comment on lines +1457 to +1459
    self._token_store._cache.pop(
        self._token_store._key(self._account_id, chat_id), None
    )

Copilot AI Apr 14, 2026

Clearing the stale context token via self._token_store._cache.pop(...) only removes it from the in-memory cache and does not persist the deletion to disk (ContextTokenStore only persists on set()). After a restart, restore() will reload the expired token and the first send will fail again. Add a public delete()/clear() method on ContextTokenStore that pops the key and calls _persist(account_id), and use that here instead of reaching into _cache/_key private members.

Suggested change
- self._token_store._cache.pop(
-     self._token_store._key(self._account_id, chat_id), None
- )
+ self._token_store.delete(self._account_id, chat_id)
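
A persisting `delete()` along these lines might look like the sketch below. `TokenStoreSketch` is a stand-in written for illustration: the method names `_key`, `_persist`, `set`, and `restore` come from the comment above, but the one-JSON-file-per-account layout and all method bodies are assumptions, not the real ContextTokenStore.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Optional


class TokenStoreSketch:
    """Stand-in for ContextTokenStore; the on-disk layout is an assumption."""

    def __init__(self, root: Path) -> None:
        self._root = root
        self._cache: Dict[str, str] = {}

    @staticmethod
    def _key(account_id: str, user_id: str) -> str:
        return f"{account_id}:{user_id}"

    def _persist(self, account_id: str) -> None:
        # Write every cached token for this account to <root>/<account_id>.json.
        prefix = f"{account_id}:"
        tokens = {
            key[len(prefix):]: value
            for key, value in self._cache.items()
            if key.startswith(prefix)
        }
        path = self._root / f"{account_id}.json"
        path.write_text(json.dumps(tokens), encoding="utf-8")

    def restore(self, account_id: str) -> None:
        path = self._root / f"{account_id}.json"
        if path.exists():
            saved = json.loads(path.read_text(encoding="utf-8"))
            for user_id, token in saved.items():
                self._cache[self._key(account_id, user_id)] = token

    def get(self, account_id: str, user_id: str) -> Optional[str]:
        return self._cache.get(self._key(account_id, user_id))

    def set(self, account_id: str, user_id: str, token: str) -> None:
        self._cache[self._key(account_id, user_id)] = token
        self._persist(account_id)

    def delete(self, account_id: str, user_id: str) -> None:
        # Pop from memory and persist, so restore() cannot revive the token.
        if self._cache.pop(self._key(account_id, user_id), None) is not None:
            self._persist(account_id)


def token_after_restart(delete_first: bool) -> Optional[str]:
    """Simulate set -> optional delete -> restart -> restore, return the token seen."""
    with tempfile.TemporaryDirectory() as tmp:
        store = TokenStoreSketch(Path(tmp))
        store.set("acct", "user-1", "stale-token")
        if delete_first:
            store.delete("acct", "user-1")
        restarted = TokenStoreSketch(Path(tmp))
        restarted.restore("acct")
        return restarted.get("acct", "user-1")
```

`token_after_restart` walks through the restart scenario described above: without the persisted delete the stale token is reloaded, with it the token stays gone.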

Comment on lines 1432 to +1464
@@ -1429,6 +1441,31 @@ async def _send_text_chunk(
        context_token=context_token,
        client_id=client_id,
    )
    # Check iLink response for session-expired error
    if resp and isinstance(resp, dict):
        ret = resp.get("ret")
        errcode = resp.get("errcode")
        if (ret is not None and ret not in (0,)) or (errcode is not None and errcode not in (0,)):
            is_session_expired = (
                ret == SESSION_EXPIRED_ERRCODE
                or errcode == SESSION_EXPIRED_ERRCODE
            )
            # Session expired — strip token and retry once
            if is_session_expired and not retried_without_token and context_token:
                retried_without_token = True
                context_token = None
                self._token_store._cache.pop(
                    self._token_store._key(self._account_id, chat_id), None
                )
                logger.warning(
                    "[%s] session expired for %s; retrying without context_token",
                    self.name, _safe_id(chat_id),
                )
                continue

Copilot AI Apr 14, 2026

New behavior (detect SESSION_EXPIRED_ERRCODE and retry once without context_token, plus clearing the token store) isn't covered by tests. Since this module already has unit tests for chunk retry behavior, add a test that simulates a ret/errcode = -14 response on the first attempt and asserts: (1) _send_message is called again with context_token=None, and (2) the token is removed from ContextTokenStore (including persistence if you add a delete API).
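
A test along those lines could be shaped like the sketch below. It exercises a small stand-in (`send_chunk`, hypothetical) rather than the real `_send_text_chunk`, and uses a plain dict in place of ContextTokenStore, so it only shows the assertions such a test would make, using `unittest.mock.AsyncMock` to script the first -14 response.

```python
import asyncio
from typing import Any, Dict
from unittest.mock import AsyncMock, call

SESSION_EXPIRED_ERRCODE = -14  # session-timeout code reported in this PR


async def send_chunk(send_message: Any, tokens: Dict[str, str], chat_id: str) -> Dict[str, Any]:
    """Stand-in for the retry path under test (not the real _send_text_chunk)."""
    token = tokens.get(chat_id)
    resp = await send_message(chat_id, context_token=token)
    if token and SESSION_EXPIRED_ERRCODE in (resp.get("ret"), resp.get("errcode")):
        tokens.pop(chat_id, None)  # clear the stale token
        resp = await send_message(chat_id, context_token=None)
    return resp


def test_retries_without_token_on_session_expiry() -> None:
    # First attempt reports session expiry, second attempt succeeds.
    send_message = AsyncMock(side_effect=[{"errcode": SESSION_EXPIRED_ERRCODE}, {}])
    tokens = {"chat-1": "stale-token"}

    resp = asyncio.run(send_chunk(send_message, tokens, "chat-1"))

    assert resp == {}
    # (1) the second call goes out with context_token=None
    assert send_message.await_args_list == [
        call("chat-1", context_token="stale-token"),
        call("chat-1", context_token=None),
    ]
    # (2) the stale token is gone from the store
    assert "chat-1" not in tokens
```

In the real suite the stand-in would be replaced by the adapter under test, with `_send_message` patched and the token store asserted through its public API.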

@teknium1 (Contributor)

Merged as part of Batch-5 salvage: #11634

Your commit cherry-picked onto main with authorship preserved. Silent Weixin cron drops after overnight idle should now recover automatically — on errcode: -14 the send path strips the stale context_token, clears it from ContextTokenStore, and retries once without the token. _send_message now returns the API response dict so callers can inspect errcodes. Thanks for the careful diagnosis!

Commit SHA on main: e105b7a

@teknium1 teknium1 closed this Apr 17, 2026