fix(weixin): retry send without context_token on iLink session expiry by jinzheng8115 · Pull Request #9928 · NousResearch/hermes-agent

jinzheng8115 · 2026-04-14T23:48:27Z

Problem

iLink context_token has a limited TTL. When no user message has arrived for an extended period (e.g. overnight), cron-initiated pushes fail with errcode -14 (session timeout). All scheduled push messages (weather reports, digests, etc.) are silently dropped.

Root Cause

Every outbound message echoes the latest context_token for the peer. This token expires if no inbound message refreshes it. The _send_text_chunk method retries on generic errors but treats session expiry the same as any other failure — it retries with the same stale token, which keeps failing.

Fix

When _send_text_chunk catches an iLinkDeliveryError with session-expired errcode (-14), it now:

Strips the expired context_token
Clears the stale token from ContextTokenStore
Retries the send without context_token

Verified that iLink accepts sends without context_token as a degraded fallback — the message is delivered successfully.

Testing

All 34 existing test_weixin.py tests pass
Manually verified: iLink sendmessage returns {} (success) without context_token
Manually verified: cron weather push now succeeds after overnight session expiry

Files Changed

gateway/platforms/weixin.py — _send_text_chunk() adds session-expired detection and tokenless retry

iLink context_token has a limited TTL. When no user message has arrived for an extended period (e.g. overnight), cron-initiated pushes fail with errcode -14 (session timeout).

Tested that iLink accepts sends without context_token as a degraded fallback, so we now automatically strip the expired token and retry once. This keeps scheduled push messages (weather, digests, etc.) working reliably without requiring a user message to refresh the session first.

Changes:
- _send_text_chunk() catches iLinkDeliveryError with session-expired errcode (-14) and retries without context_token
- Stale tokens are cleared from ContextTokenStore on session expiry
- All 34 existing weixin tests pass

Copilot

Pull request overview

This PR improves Weixin iLink outbound delivery resilience by detecting session-expiry (errcode/ret == -14) and retrying sends without a context_token, preventing cron-initiated pushes from being dropped after long idle periods.

Changes:

Change _send_message() to return the raw iLink response dict so callers can inspect ret/errcode.
In _send_text_chunk(), detect session-expired responses and retry once without context_token, attempting to clear the cached token before retrying.

Comments suppressed due to low confidence (1)

gateway/platforms/weixin.py:426

_send_message() now returns the raw response dict, but it still doesn't validate/raise on non-zero ret/errcode. This makes it easy for other call sites to ignore failures (there is at least one other _send_message call in this module that doesn't inspect the response). Consider centralizing response validation inside _send_message (or a helper) so errors can't be silently dropped, and let _send_text_chunk handle the special -14 retry path.

    """Send a text message via iLink sendmessage API.

    Returns the raw API response dict (may contain error codes like
    ``errcode: -14`` for session expiry that the caller can inspect).
    """
    if not text or not text.strip():
        raise ValueError("_send_message: text must not be empty")
    message: Dict[str, Any] = {
        "from_user_id": "",
        "to_user_id": to,
        "client_id": client_id,
        "message_type": MSG_TYPE_BOT,
        "message_state": MSG_STATE_FINISH,
        "item_list": [{"type": ITEM_TEXT, "text_item": {"text": text}}],
    }
    if context_token:
        message["context_token"] = context_token
    return await _api_post(
        session,
        base_url=base_url,
        endpoint=EP_SEND_MESSAGE,
        payload={"msg": message},
        token=token,
        timeout_ms=API_TIMEOUT_MS,
    )
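
One way to centralize the validation suggested in the comment above is a small helper that every caller of _send_message passes the response through. The sketch below is an assumption about how that could look, not code from this PR: `ILinkResponseError` and `check_ilink_response` are hypothetical names, and the only facts taken from the source are that the response is a dict and that a non-zero `ret` or `errcode` signals failure.

```python
from typing import Any, Dict


class ILinkResponseError(RuntimeError):
    """Hypothetical exception carrying the iLink error code and raw response."""

    def __init__(self, code: int, response: Dict[str, Any]) -> None:
        super().__init__(f"iLink send failed with code {code}")
        self.code = code
        self.response = response


def check_ilink_response(resp: Dict[str, Any]) -> Dict[str, Any]:
    """Return resp unchanged on success; raise if ret or errcode is non-zero."""
    for field in ("ret", "errcode"):
        code = resp.get(field)
        if code not in (None, 0):
            raise ILinkResponseError(code, resp)
    return resp
```

With a helper like this, _send_text_chunk could catch the exception, compare `exc.code` against the session-expired value, and take the tokenless retry path, while other call sites fail loudly instead of ignoring the response.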


Copilot · 2026-04-14T23:51:46Z

+                            self._token_store._cache.pop(
+                                self._token_store._key(self._account_id, chat_id), None
+                            )


Clearing the stale context token via self._token_store._cache.pop(...) only removes it from the in-memory cache and does not persist the deletion to disk (ContextTokenStore only persists on set()). After a restart, restore() will reload the expired token and the first send will fail again. Add a public delete()/clear() method on ContextTokenStore that pops the key and calls _persist(account_id), and use that here instead of reaching into _cache/_key private members.

Suggested change

-                            self._token_store._cache.pop(
-                                self._token_store._key(self._account_id, chat_id), None
-                            )
+                            self._token_store.delete(self._account_id, chat_id)
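
A persisted `delete()` along the lines the review asks for might look like the sketch below. The real ContextTokenStore is not shown in this PR, so `TokenStoreSketch` is a stand-in: the JSON-file-per-account layout and the bodies of `_key`, `_persist`, `restore`, `set`, and `get` are assumptions made only so the example runs. The point it illustrates is that `delete()` pops the key and then writes the change to disk, so a later `restore()` does not bring the expired token back.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Optional


class TokenStoreSketch:
    """Stand-in for ContextTokenStore; the on-disk layout here is assumed."""

    def __init__(self, root: Path) -> None:
        self._root = root
        self._cache: Dict[str, str] = {}

    def _key(self, account_id: str, user_id: str) -> str:
        return f"{account_id}:{user_id}"

    def _path(self, account_id: str) -> Path:
        return self._root / f"{account_id}.json"

    def _persist(self, account_id: str) -> None:
        # Write every cached token for this account to its JSON file.
        prefix = f"{account_id}:"
        data = {
            key[len(prefix):]: token
            for key, token in self._cache.items()
            if key.startswith(prefix)
        }
        self._path(account_id).write_text(json.dumps(data), encoding="utf-8")

    def restore(self, account_id: str) -> None:
        path = self._path(account_id)
        if path.exists():
            stored = json.loads(path.read_text(encoding="utf-8"))
            for user_id, token in stored.items():
                self._cache[self._key(account_id, user_id)] = token

    def set(self, account_id: str, user_id: str, token: str) -> None:
        self._cache[self._key(account_id, user_id)] = token
        self._persist(account_id)

    def get(self, account_id: str, user_id: str) -> Optional[str]:
        return self._cache.get(self._key(account_id, user_id))

    def delete(self, account_id: str, user_id: str) -> None:
        """Remove a token from memory and persist the removal."""
        if self._cache.pop(self._key(account_id, user_id), None) is not None:
            self._persist(account_id)
```

Keeping the removal behind a public method also means the adapter no longer needs to reach into `_cache` and `_key`.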

Copilot · 2026-04-14T23:51:47Z

@@ -1429,6 +1441,31 @@ async def _send_text_chunk(
                    context_token=context_token,
                    client_id=client_id,
                )
+                # Check iLink response for session-expired error
+                if resp and isinstance(resp, dict):
+                    ret = resp.get("ret")
+                    errcode = resp.get("errcode")
+                    if (ret is not None and ret not in (0,)) or (errcode is not None and errcode not in (0,)):
+                        is_session_expired = (
+                            ret == SESSION_EXPIRED_ERRCODE
+                            or errcode == SESSION_EXPIRED_ERRCODE
+                        )
+                        # Session expired — strip token and retry once
+                        if is_session_expired and not retried_without_token and context_token:
+                            retried_without_token = True
+                            context_token = None
+                            self._token_store._cache.pop(
+                                self._token_store._key(self._account_id, chat_id), None
+                            )
+                            logger.warning(
+                                "[%s] session expired for %s; retrying without context_token",
+                                self.name, _safe_id(chat_id),
+                            )
+                            continue


New behavior (detect SESSION_EXPIRED_ERRCODE and retry once without context_token, plus clearing the token store) isn't covered by tests. Since this module already has unit tests for chunk retry behavior, add a test that simulates a ret/errcode = -14 response on the first attempt and asserts: (1) _send_message is called again with context_token=None, and (2) the token is removed from ContextTokenStore (including persistence if you add a delete API).
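
The requested test could take roughly the following shape. Because the real adapter class and its fixtures are not shown here, `send_chunk` is a minimal stand-in for the retry path (not the actual _send_text_chunk), and a plain dict stands in for ContextTokenStore; both are assumptions. What carries over is the pattern: an `AsyncMock` whose first response reports -14 and whose second succeeds, followed by assertions on the second call's `context_token` and on the store.

```python
import asyncio
from unittest.mock import AsyncMock

SESSION_EXPIRED_ERRCODE = -14  # value taken from the PR description


async def send_chunk(send_message, token_store, chat_id, text):
    """Minimal stand-in for the retry path (not the real _send_text_chunk)."""
    context_token = token_store.get(chat_id)
    resp = await send_message(text=text, context_token=context_token)
    expired = SESSION_EXPIRED_ERRCODE in (resp.get("ret"), resp.get("errcode"))
    if context_token and expired:
        token_store.pop(chat_id, None)
        resp = await send_message(text=text, context_token=None)
    return resp


def test_session_expiry_retries_without_token():
    # First attempt reports session expiry, second attempt succeeds.
    send_message = AsyncMock(side_effect=[{"errcode": -14}, {}])
    token_store = {"chat-1": "stale-token"}

    resp = asyncio.run(send_chunk(send_message, token_store, "chat-1", "hello"))

    assert resp == {}
    assert send_message.await_count == 2
    first, second = send_message.await_args_list
    assert first.kwargs["context_token"] == "stale-token"
    assert second.kwargs["context_token"] is None
    assert "chat-1" not in token_store
```

In the real suite the same assertions would target the patched _send_message and the adapter's token store, plus a check on the persisted file if a delete API is added.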

teknium1 · 2026-04-17T13:51:06Z

Merged as part of Batch-5 salvage: #11634

Your commit cherry-picked onto main with authorship preserved. Silent Weixin cron drops after overnight idle should now recover automatically — on errcode: -14 the send path strips the stale context_token, clears it from ContextTokenStore, and retries once without the token. _send_message now returns the API response dict so callers can inspect errcodes. Thanks for the careful diagnosis!

Commit SHA on main: e105b7a

Copilot AI review requested due to automatic review settings April 14, 2026 23:48

Copilot started reviewing on behalf of jinzheng8115 April 14, 2026 23:48

Copilot AI reviewed Apr 14, 2026

teknium1 mentioned this pull request Apr 17, 2026

fix(weixin): Batch-5 polish — SSRF allowlist, qrcode dep, session retry, macOS SSL, signature alignment #11634

Merged

teknium1 closed this Apr 17, 2026

jinzheng8115 wants to merge 1 commit into NousResearch:main from jinzheng8115:fix/ilink-session-expired-clean
