fix: reduce WeChat channel cold-start delay from ~2min to ~5s by lefarcen · Pull Request #646 · nexu-io/nexu

lefarcen · 2026-03-28T15:11:02Z

What

Reduce WeChat channel first-response latency from ~2 minutes to ~5 seconds after connecting or restarting.

Why

After connecting a WeChat channel for the first time (or reconnecting after a desktop restart), the first message takes up to 2 minutes to get a response. Users perceive the bot as broken. The delay compounds from three independent causes:

Full gateway restart (~20-45s): Unlike Feishu (which has a prewarm account), WeChat had no config subtree until the first connect → OpenClaw couldn't match the new subtree to any loaded plugin's reload prefixes → full process restart instead of hot-reload.
35s initial long-poll timeout: After the extension started, the first getUpdates() call used the full 35s timeout, delaying message pickup.
No readiness wait: connectWechat() returned immediately after syncAll(), so the UI had no way to know the channel was actually ready.

Closes #610

How

Three compounding fixes:

WeChat prewarm account (channel-binding-compiler.ts): Always include openclaw-weixin in the compiled config with a disabled prewarm account (same pattern as Feishu at L140-149). When the user later connects a real WeChat account, it's an account-level change that triggers a fast hot-reload (~500ms) instead of a full gateway restart (~20-45s).
Short initial poll timeout (monitor.ts): The first 3 polls use a 3s timeout instead of 35s, which quickly picks up any messages queued during startup. After the initial phase, the monitor switches to the normal 35s long-poll. A server-suggested timeout overrides both.
Readiness wait (channel-service.ts): Added waitForWechatReady() (same pattern as existing waitForWhatsappReady()) that polls channel readiness for up to 30s after connect. The UI now shows accurate status.
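A minimal sketch of the prewarm idea, assuming a simplified config shape: the `WeixinAccountConfig`/`WeixinChannelConfig` types, the `compileWeixinChannel()` helper, and the nesting under `accounts` are hypothetical, not the real OpenClaw schema or the code in channel-binding-compiler.ts. It only illustrates why the top-level channel key set is the same at cold boot and after the first connect.

```typescript
// Hypothetical config shapes; the real OpenClaw schema is not shown in the PR.
interface WeixinAccountConfig {
  enabled: boolean;
}

interface WeixinChannelConfig {
  "openclaw-weixin": {
    accounts: Record<string, WeixinAccountConfig>;
  };
}

// Sentinel ID as quoted in the review thread.
const WECHAT_PREWARM_ACCOUNT_ID = "__nexu_internal_wechat_prewarm__";

// Always emit the openclaw-weixin subtree, seeded with a disabled prewarm
// account, so connecting a real account later only changes `accounts`.
function compileWeixinChannel(connectedAccountIds: readonly string[]): WeixinChannelConfig {
  const accounts: Record<string, WeixinAccountConfig> = {
    // enabled: false -> per the PR notes, ignored by the channel manager,
    // but the plugin stays loaded
    [WECHAT_PREWARM_ACCOUNT_ID]: { enabled: false },
  };
  for (const id of connectedAccountIds) {
    accounts[id] = { enabled: true };
  }
  return { "openclaw-weixin": { accounts } };
}

const coldBoot = compileWeixinChannel([]);
const afterConnect = compileWeixinChannel(["a4946e575b9e-im-bot"]);
// Same top-level channel keys in both; only the accounts map differs.
console.log(Object.keys(coldBoot), Object.keys(afterConnect));
```

Both configs expose the same `openclaw-weixin` key; the difference is confined to the `accounts` map, which is the kind of account-level change the description says OpenClaw can hot-reload.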

Affected areas

Controller (backend / API)
OpenClaw runtime

Checklist

pnpm typecheck passes
pnpm lint passes
pnpm test passes (519 passed)
pnpm generate-types run (if API routes/schemas changed) — N/A, no API route changes
No credentials or tokens in code or logs
No any types introduced (use unknown with narrowing)

Notes for reviewers

The prewarm pattern is identical to the existing Feishu prewarm at channel-binding-compiler.ts:140-149. The disabled prewarm account (enabled: false) is ignored by OpenClaw's channel manager but keeps the plugin loaded.
The WeChat plugin's startAccount() already gracefully handles unconfigured accounts (throws before starting the monitor), so the prewarm account won't cause any side effects.
The initial short-poll phase (3s × 3) adds minimal overhead (~9s of short polls) but dramatically reduces first-message latency.

https://claude.ai/code/session_01CSC1RKaRB9F7C4t3HQGpSs

Three changes that compound to eliminate the cold-start latency:

1. Add WeChat prewarm account (like Feishu) so openclaw-weixin is always in the config from first boot. When the user later connects WeChat, it's an account-level change → hot-reload (~500ms) instead of full gateway restart (~20-45s).
2. Use shorter initial poll timeout (3s × 3 polls) in the WeChat monitor before switching to the normal 35s long-poll. Picks up messages queued during startup within seconds instead of up to 35s.
3. Add waitForWechatReady() after connect (like WhatsApp) so the UI shows accurate status and users don't message a not-yet-started channel.

Closes #610
https://claude.ai/code/session_01CSC1RKaRB9F7C4t3HQGpSs

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9e450cca18

Change normalTimeoutMs from const to let so it stays in sync when the server returns a longpolling_timeout_ms value. Prevents a theoretical regression where the initial-poll phase end would fall back to the original default instead of the server-suggested value. https://claude.ai/code/session_01CSC1RKaRB9F7C4t3HQGpSs
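One way to read the timeout bookkeeping in this commit, as a hedged sketch: the `PollTimeouts` class and its member names are hypothetical, while the 3s / 3 polls / 35s values come from the PR description. Keeping `normalTimeoutMs` mutable means the end of the short-poll phase lands on the server-suggested value instead of the 35s default.

```typescript
// Values from the PR description.
const INITIAL_POLL_TIMEOUT_MS = 3_000;
const INITIAL_POLL_COUNT = 3;
const DEFAULT_LONG_POLL_TIMEOUT_MS = 35_000;

// Hypothetical wrapper around the monitor's timeout state.
class PollTimeouts {
  // Mutable on purpose: this mirrors the `const` -> `let` change in the commit.
  private normalTimeoutMs = DEFAULT_LONG_POLL_TIMEOUT_MS;
  private currentTimeoutMs = INITIAL_POLL_TIMEOUT_MS;
  private initialPollsRemaining = INITIAL_POLL_COUNT;

  /** Timeout to pass to the next getUpdates() call. */
  get timeoutMs(): number {
    return this.currentTimeoutMs;
  }

  /** Record a successful poll, optionally carrying longpolling_timeout_ms. */
  onSuccess(serverTimeoutMs?: number): void {
    if (serverTimeoutMs !== undefined && serverTimeoutMs > 0) {
      // A server suggestion overrides both the short and the default timeout
      // and becomes the new "normal" value.
      this.normalTimeoutMs = serverTimeoutMs;
      this.currentTimeoutMs = serverTimeoutMs;
    }
    if (this.initialPollsRemaining > 0) {
      this.initialPollsRemaining -= 1;
      if (this.initialPollsRemaining === 0) {
        // End of the short-poll phase: fall back to the (possibly updated)
        // normal value, not to the hard-coded 35s default.
        this.currentTimeoutMs = this.normalTimeoutMs;
      }
    }
  }
}

const t = new PollTimeouts();
t.onSuccess(); // short poll 1
t.onSuccess(20_000); // short poll 2, server suggests 20s
t.onSuccess(); // short poll 3 ends the initial phase
console.log(t.timeoutMs); // 20000
```

With a `const` default, the third success would have reset the timeout to 35s and discarded the server's 20s suggestion, which is the regression the commit guards against.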

When disconnecting a WeChat channel, remove the account's credential file, sync state, and accounts.json index entry from the OpenClaw state directory. Without this cleanup, old accounts accumulate across disconnect/reconnect cycles and all start on the next cold boot, wasting resources and causing session-expired errors.
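A sketch of per-account file cleanup, assuming a hypothetical layout of `<stateDir>/openclaw-weixin/accounts/<id>.json` for credentials and `<stateDir>/openclaw-weixin/sync/<id>.json` for sync state (the PR does not give the real file names). It follows the later revision of this change, in which `accounts.json` is left to the config writer instead of being edited here.

```typescript
import { existsSync, mkdirSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

// Hypothetical layout; the PR does not name the real files.
function accountStateFiles(stateDir: string, accountId: string): string[] {
  const root = join(stateDir, "openclaw-weixin");
  return [
    join(root, "accounts", `${accountId}.json`), // credentials
    join(root, "sync", `${accountId}.json`), // sync state
  ];
}

// Remove one account's credential and sync files. accounts.json is not
// touched: index reconciliation belongs to the config writer in syncAll().
function cleanupWechatAccountState(stateDir: string, accountId: string): void {
  for (const file of accountStateFiles(stateDir, accountId)) {
    rmSync(file, { force: true }); // force: no error if the file is already gone
  }
}

// Demo against a throwaway directory.
const demoDir = mkdtempSync(join(tmpdir(), "wechat-state-"));
for (const file of accountStateFiles(demoDir, "a4946e575b9e-im-bot")) {
  mkdirSync(dirname(file), { recursive: true });
  writeFileSync(file, "{}");
}
cleanupWechatAccountState(demoDir, "a4946e575b9e-im-bot");
console.log(accountStateFiles(demoDir, "a4946e575b9e-im-bot").some((f) => existsSync(f))); // false
rmSync(demoDir, { recursive: true, force: true });
```

Using `force: true` makes the cleanup idempotent, so repeated disconnect/reconnect cycles never fail on files that an earlier cycle already removed.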

Cover prewarm config compilation (4 tests), connect readiness polling (2 tests), disconnect state cleanup (5 tests) including multi-cycle accumulation regression.

Three fixes based on code review:

1. syncWeixinAccountIndex is now authoritative: only keeps account IDs present in the current config, and filters out __nexu_internal_* prewarm IDs. Prevents ghost accounts in accounts.json.
2. disconnectChannel cleanup runs BEFORE syncAll so the config writer never sees stale credential files during index sync.
3. connectWechat now rolls back (disconnect + cleanup) when readiness times out after 30s, matching the WhatsApp pattern. Previously it returned success even when the channel wasn't actually ready.
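The rollback in item 3, as a hedged sketch: `WechatConnectDeps` and its three functions are hypothetical stand-ins for the channel service internals. The point is the ordering: sync, wait with a bounded timeout, and on timeout disconnect before throwing, so the caller never receives a success for a channel that did not start. (Per the timeline below, a follow-up, #682, later removed the blocking readiness wait from connectWechat.)

```typescript
// Hypothetical stand-ins for the channel service internals.
interface WechatConnectDeps {
  syncAll(): Promise<void>;
  /** Resolves false if the channel is not ready before the deadline. */
  waitForReady(accountId: string): Promise<boolean>;
  /** Disconnect and clean up the failed account's files. */
  disconnect(accountId: string): Promise<void>;
}

// Connect, wait (bounded), and roll back on timeout so the caller never
// gets a success result for a channel that did not start.
async function connectWechat(deps: WechatConnectDeps, accountId: string): Promise<void> {
  await deps.syncAll();
  const ready = await deps.waitForReady(accountId);
  if (!ready) {
    await deps.disconnect(accountId);
    throw new Error(`WeChat account ${accountId} was not ready in time; connection rolled back`);
  }
}
```

Injecting the dependencies keeps the control flow testable: a fake `waitForReady` that resolves `false` should produce exactly one `disconnect` call followed by a thrown error.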

JiwaniZakir

The prewarm sentinel constant INTERNAL_WECHAT_PREWARM_ACCOUNT_ID is defined in channel-binding-compiler.ts but never imported by openclaw-config-writer.ts, which instead relies on the magic string prefix "__nexu_internal_" to filter it out. This implicit naming convention is fragile — a future internal account whose ID doesn't start with that prefix would silently leak into the persisted accounts.json index. Exporting a shared NEXU_INTERNAL_ACCOUNT_PREFIX constant (or an explicit set of reserved IDs) from a shared module would make this coupling explicit and type-safe.

Additionally, in the rollback path inside connectWechat, cleanupWechatAccountState directly mutates accounts.json on disk and then syncAll() is called immediately after. syncWeixinAccountIndex will rewrite the same file based on the authoritative config, making the manual index mutation in cleanupWechatAccountState redundant for the rollback case specifically. Clarifying which path is responsible for index cleanup — the direct file manipulation or the authoritative config writer — would reduce the risk of them diverging in future edits.

Address community review feedback:

1. Extract NEXU_INTERNAL_ACCOUNT_PREFIX as a shared exported constant from channel-binding-compiler.ts. The config writer now imports it instead of relying on a magic string prefix.
2. cleanupWechatAccountState only deletes credential/sync files; index reconciliation is exclusively owned by the authoritative config writer during syncAll(). This eliminates the dual-write ambiguity.
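A sketch of the shared-prefix and authoritative-index ideas, assuming the index is just an ordered list of account IDs. `NEXU_INTERNAL_ACCOUNT_PREFIX` and the prewarm ID value are taken from the thread (the exact template-literal expression is assumed); `isInternalAccountId()` and `computeWeixinAccountIndex()` are hypothetical helpers, not the real syncWeixinAccountIndex().

```typescript
// Exported from channel-binding-compiler.ts in the PR; local here for the demo.
const NEXU_INTERNAL_ACCOUNT_PREFIX = "__nexu_internal_";

// Derived via a template literal so it always carries the shared prefix.
const INTERNAL_WECHAT_PREWARM_ACCOUNT_ID = `${NEXU_INTERNAL_ACCOUNT_PREFIX}wechat_prewarm__`;

function isInternalAccountId(id: string): boolean {
  return id.startsWith(NEXU_INTERNAL_ACCOUNT_PREFIX);
}

// Hypothetical authoritative index: rebuilt only from the current config
// (the previous on-disk index is not an input), minus internal sentinels,
// de-duplicated in config order.
function computeWeixinAccountIndex(configAccountIds: readonly string[]): string[] {
  const kept = new Set<string>();
  for (const id of configAccountIds) {
    if (!isInternalAccountId(id)) kept.add(id);
  }
  return Array.from(kept);
}

console.log(computeWeixinAccountIndex([INTERNAL_WECHAT_PREWARM_ACCOUNT_ID, "a4946e575b9e-im-bot"]));
```

Because the previous on-disk index is not an input, a ghost entry cannot survive a sync: an ID reaches the index only if it is in the current config and does not carry the internal prefix.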

lefarcen · 2026-03-29T14:00:11Z

@JiwaniZakir Thanks for the thorough review — both points are spot-on.

On the magic prefix: Extracted NEXU_INTERNAL_ACCOUNT_PREFIX as a shared exported constant from channel-binding-compiler.ts. The config writer now imports it directly instead of relying on a hardcoded string. The prewarm account IDs are also derived from this prefix via template literals, so any future internal accounts will be automatically filtered.

On the dual index mutation: Agreed — the rollback path's cleanupWechatAccountState no longer touches accounts.json. Index reconciliation is now exclusively owned by the authoritative config writer during syncAll(). The cleanup method only removes credential and sync files (which is appropriate since a failed connect produces useless credentials). This makes the ownership boundary clear: files → direct cleanup, index → writer.

Both changes are in 7f4f126. Thanks for catching these!

JiwaniZakir · 2026-03-29T14:13:48Z

The three-pronged decomposition here is solid — each fix is independently correct and the combined effect makes sense. One thing worth confirming: the prewarm account approach for WeChat mirrors the Feishu pattern, so make sure the config subtree key used during prewarm doesn't collide with a real user account prefix if someone happens to register a WeChat ID matching it. Also, dropping the initial getUpdates() timeout to something short (< 5s) is the right call, but verify the fallback behavior if the first poll returns empty — you don't want a tight retry loop burning CPU before the channel is fully authenticated.

lefarcen · 2026-03-30T02:56:29Z

@JiwaniZakir Good points to double-check — both are covered:

Prewarm ID collision: The prewarm sentinel is __nexu_internal_wechat_prewarm__ (prefixed with __nexu_internal_), while real account IDs are cuid2-generated (e.g. a4946e575b9e-im-bot). No realistic collision risk, and the config writer now explicitly filters by the shared NEXU_INTERNAL_ACCOUNT_PREFIX constant as an extra guard.
Short-poll empty response: The monitor has a RETRY_DELAY_MS = 2_000 sleep between each poll iteration regardless of response content, and initialPollsRemaining only decrements on successful (non-error) responses. So an empty first poll just waits 2s and retries — no tight loop.
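A simplified model of the loop described above, for illustration only: `runMonitor()`, its injected `getUpdates`/`sleep` functions, and the `maxIterations` bound are hypothetical, while `RETRY_DELAY_MS = 2_000`, the 3s × 3 short polls, and the 35s long-poll come from the thread. It shows that a failed poll neither skips the delay nor consumes a short poll.

```typescript
// Values quoted in the PR and thread.
const RETRY_DELAY_MS = 2_000;
const SHORT_POLL_TIMEOUT_MS = 3_000;
const LONG_POLL_TIMEOUT_MS = 35_000;

type GetUpdates = (timeoutMs: number) => Promise<unknown[]>;
type Sleep = (ms: number) => Promise<void>;

// Hypothetical, simplified monitor loop. `maxIterations` exists only so the
// demo terminates; a real monitor would run until it is aborted.
async function runMonitor(
  getUpdates: GetUpdates,
  sleep: Sleep,
  maxIterations: number,
): Promise<number[]> {
  let initialPollsRemaining = 3;
  const timeoutsUsed: number[] = [];
  for (let i = 0; i < maxIterations; i++) {
    const timeoutMs = initialPollsRemaining > 0 ? SHORT_POLL_TIMEOUT_MS : LONG_POLL_TIMEOUT_MS;
    timeoutsUsed.push(timeoutMs);
    try {
      await getUpdates(timeoutMs); // an empty array still counts as success
      if (initialPollsRemaining > 0) initialPollsRemaining -= 1;
    } catch {
      // A failed poll does not consume one of the short polls.
    }
    await sleep(RETRY_DELAY_MS); // always wait, whatever the outcome
  }
  return timeoutsUsed;
}
```

With a transient error on the first call, five iterations use timeouts of 3s, 3s, 3s, 3s, then 35s, and every iteration sleeps 2s, so even during the short-poll phase the loop issues at most one request per 2s.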

Thanks for keeping an eye on these edge cases!

JiwaniZakir · 2026-03-30T02:57:43Z

The NEXU_INTERNAL_ACCOUNT_PREFIX extraction is the right call — relying on a hardcoded string in two separate places for collision avoidance was a latent bug waiting to happen. One thing worth double-checking: the readiness wait added to connectWechat() should have a bounded timeout rather than blocking indefinitely; otherwise a failed prewarm (e.g. a network issue during initial sync) would hang the connect flow instead of surfacing an error. If that's already handled upstream in the caller, ignore this.

lefarcen · 2026-03-30T02:58:14Z

@JiwaniZakir Already handled — waitForWechatReady() has a 30s deadline (Date.now() + 30_000). If it times out, connectWechat() rolls back the channel and throws, so the caller gets a clear error instead of a hang.
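A minimal sketch of a deadline-bounded readiness wait in the shape described here: `Date.now() + timeoutMs` as the deadline, returning `false` on timeout so the caller can roll back and throw. The probe function and the 500 ms poll interval are assumptions; the PR only states the 30s deadline.

```typescript
type ReadinessProbe = () => Promise<boolean>;

// Poll a readiness probe until it reports true or the deadline passes.
// Returns false on timeout so the caller can roll back and throw.
async function waitForWechatReady(
  isReady: ReadinessProbe,
  timeoutMs = 30_000,
  intervalMs = 500, // hypothetical; the PR does not state the poll interval
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    if (await isReady()) return true;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}
```

The loop can overshoot the deadline by up to one probe plus one interval; a caller that needs a hard bound would race the wait against a timer instead.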

JiwaniZakir · 2026-03-30T03:33:18Z

On the prewarm account leaking into syncWeixinAccountIndex(): the fix should filter it out at the merge boundary, similar to how internal sentinel values are excluded elsewhere. Adding a guard like if (accountId.startsWith('__nexu_internal_')) continue; in syncWeixinAccountIndex() before the merge would prevent it from persisting without requiring callers to be aware of the internal ID. Alternatively, if the prewarm account needs to stay out of the index entirely, the cleaner approach is to give it a dedicated config subtree that OpenClaw recognizes for hot-reload matching but never passes to the account sync path.

lefarcen · 2026-03-30T04:12:50Z

@chatgpt-codex-connector This is already addressed in the latest commits:

syncWeixinAccountIndex() is now authoritative — it only keeps IDs present in the current config, and explicitly filters out any ID starting with NEXU_INTERNAL_ACCOUNT_PREFIX (__nexu_internal_). The prewarm ID never lands in accounts.json.
The writer also runs an orphan file sweep on openclaw-weixin/accounts/, removing credential files for accounts no longer in the authoritative set — so listStoredWeixinAccountIds() can't resurrect them either.
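A sketch of an orphan sweep over the accounts directory, assuming one `<accountId>.json` credential file per account (a hypothetical layout; the real file names are not given). Files whose ID is not in the authoritative set are deleted, so a later directory listing cannot bring those accounts back.

```typescript
import { existsSync, mkdtempSync, readdirSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { basename, extname, join } from "node:path";

// Delete "<accountId>.json" files whose ID is not in the authoritative set.
// Returns the removed IDs. The one-file-per-account layout is an assumption.
function sweepOrphanAccountFiles(accountsDir: string, keep: ReadonlySet<string>): string[] {
  if (!existsSync(accountsDir)) return [];
  const removed: string[] = [];
  for (const name of readdirSync(accountsDir)) {
    if (extname(name) !== ".json") continue; // ignore unrelated files
    const accountId = basename(name, ".json");
    if (keep.has(accountId)) continue;
    rmSync(join(accountsDir, name), { force: true });
    removed.push(accountId);
  }
  return removed;
}

// Demo against a throwaway directory.
const accountsDir = mkdtempSync(join(tmpdir(), "wechat-accounts-"));
for (const id of ["live-bot", "stale-bot"]) writeFileSync(join(accountsDir, `${id}.json`), "{}");
console.log(sweepOrphanAccountFiles(accountsDir, new Set(["live-bot"]))); // [ 'stale-bot' ]
rmSync(accountsDir, { recursive: true, force: true });
```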

See commits f799947 and 7f4f126.

chatgpt-codex-connector · 2026-03-30T04:13:01Z

To use Codex here, create an environment for this repo.

JiwaniZakir · 2026-03-30T04:16:25Z

The bounded timeout on the readiness wait is the critical piece here — without it, a failed prewarm silently blocks connectWechat() indefinitely. Worth confirming the fallback path when the timeout fires actually surfaces an error to the caller rather than swallowing it and returning a false-ready state. Also, dropping the initial getUpdates() timeout from 35s to something shorter is a good fix, but make sure the reduced value is configurable or at least documented, since aggressive polling on reconnect could interact poorly with WeChat's rate limits in high-restart scenarios.

JiwaniZakir · 2026-03-30T07:13:41Z

The three-pronged approach makes sense — each fix is independently valid, but together they compound nicely just like the original delays did. One thing worth verifying: the reduced initial long-poll timeout (presumably dropping from 35s to something much shorter) should have a fallback or backoff strategy to avoid hammering the WeChat endpoint if the channel is genuinely idle at startup. Also worth confirming the prewarm account pattern mirrors exactly what Feishu does so the two code paths don't diverge in maintenance burden over time.

chatgpt-codex-connector Bot reviewed Mar 28, 2026

Comment thread apps/controller/src/lib/channel-binding-compiler.ts

claude and others added 5 commits March 29, 2026 06:31

test: add WeChat channel connect/disconnect lifecycle tests

f537191

fix: apply biome useLiteralKeys lint fix

d2d2c60

JiwaniZakir reviewed Mar 29, 2026

mrcfps approved these changes Mar 30, 2026

lefarcen merged commit 40605df into main Mar 30, 2026
11 checks passed

This was referenced Mar 30, 2026

fix: remove blocking readiness wait from connectWechat #682

Merged

🚀 release: v0.1.8 #690

Merged

4 participants