fix: reduce WeChat channel cold-start delay from ~2min to ~5s #646
Three changes that compound to eliminate the cold-start latency:
1. Add WeChat prewarm account (like Feishu) so openclaw-weixin is always in the config from first boot. When the user later connects WeChat, it's an account-level change → hot-reload (~500ms) instead of full gateway restart (~20-45s).
2. Use shorter initial poll timeout (3s × 3 polls) in the WeChat monitor before switching to the normal 35s long-poll. Picks up messages queued during startup within seconds instead of up to 35s.
3. Add waitForWechatReady() after connect (like WhatsApp) so the UI shows accurate status and users don't message a not-yet-started channel.
Closes #610
https://claude.ai/code/session_01CSC1RKaRB9F7C4t3HQGpSs
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9e450cca18
Change normalTimeoutMs from const to let so it stays in sync when the server returns a longpolling_timeout_ms value. Prevents a theoretical regression where the initial-poll phase end would fall back to the original default instead of the server-suggested value. https://claude.ai/code/session_01CSC1RKaRB9F7C4t3HQGpSs
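The timeout selection this commit describes can be sketched roughly as below. This is an illustration under stated assumptions, not the code from monitor.ts: the helper `createPollTimeoutTracker` and the constant names are hypothetical, while the 3s, 3-poll, and 35s values and the `longpolling_timeout_ms` field come from the PR text.

```typescript
// Hypothetical constants; the values are the ones quoted in the PR description.
const INITIAL_POLL_TIMEOUT_MS = 3_000;
const INITIAL_POLL_COUNT = 3;
const DEFAULT_LONG_POLL_TIMEOUT_MS = 35_000;

interface PollResponse {
  // Optional server hint, named after the field mentioned in the commit message.
  longpolling_timeout_ms?: number;
}

// Hypothetical helper that decides which timeout the next poll should use.
function createPollTimeoutTracker() {
  // `let`, not `const`: a server hint must be able to replace the default,
  // otherwise the end of the initial phase would fall back to 35s.
  let normalTimeoutMs = DEFAULT_LONG_POLL_TIMEOUT_MS;
  let serverSuggested = false;
  let pollsDone = 0;

  return {
    nextTimeoutMs(): number {
      // Assumption: a server-suggested value overrides both phases.
      if (serverSuggested) return normalTimeoutMs;
      return pollsDone < INITIAL_POLL_COUNT
        ? INITIAL_POLL_TIMEOUT_MS
        : normalTimeoutMs;
    },
    recordResponse(res: PollResponse): void {
      pollsDone += 1;
      const hint = res.longpolling_timeout_ms;
      if (typeof hint === "number" && hint > 0) {
        normalTimeoutMs = hint;
        serverSuggested = true;
      }
    },
  };
}
```

Keeping the default and the server hint in one mutable variable means there is only a single place to read the "normal" timeout from once the short polls are over.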
When disconnecting a WeChat channel, remove the account's credential file, sync state, and accounts.json index entry from the OpenClaw state directory. Without this cleanup, old accounts accumulate across disconnect/reconnect cycles and all start on the next cold boot, wasting resources and causing session-expired errors.
Cover prewarm config compilation (4 tests), connect readiness polling (2 tests), disconnect state cleanup (5 tests) including multi-cycle accumulation regression.
Three fixes based on code review:
1. syncWeixinAccountIndex is now authoritative: only keeps account IDs present in the current config, and filters out __nexu_internal_* prewarm IDs. Prevents ghost accounts in accounts.json.
2. disconnectChannel cleanup runs BEFORE syncAll so the config writer never sees stale credential files during index sync.
3. connectWechat now rolls back (disconnect + cleanup) when readiness times out after 30s, matching the WhatsApp pattern. Previously it returned success even when the channel wasn't actually ready.
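The readiness wait with rollback from this commit might look something like the sketch below. It is not the code from channel-service.ts: `connectWithReadinessWait`, `ReadinessDeps`, and the 500ms poll interval are assumptions made for illustration; only the 30s bound and the "roll back instead of reporting success" behavior come from the commit message.

```typescript
// Hypothetical dependencies, injected so the loop can be exercised without a gateway.
interface ReadinessDeps {
  isReady: () => Promise<boolean>; // assumed readiness probe
  rollback: () => Promise<void>; // assumed to perform disconnect + cleanup
  sleep?: (ms: number) => Promise<void>;
  now?: () => number;
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls until the channel reports ready or the deadline passes.
// On timeout it rolls back and reports failure instead of success.
async function connectWithReadinessWait(
  deps: ReadinessDeps,
  timeoutMs = 30_000,
  intervalMs = 500, // assumed interval, not stated in the PR
): Promise<{ ok: boolean }> {
  const sleep = deps.sleep ?? defaultSleep;
  const now = deps.now ?? Date.now;
  const deadline = now() + timeoutMs;

  while (now() < deadline) {
    if (await deps.isReady()) return { ok: true };
    await sleep(intervalMs);
  }

  await deps.rollback();
  return { ok: false };
}
```

Injecting `sleep` and `now` is a design choice for this sketch only; it lets a test drive the timeout path with a fake clock rather than waiting 30 real seconds.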
JiwaniZakir
left a comment
The prewarm sentinel constant INTERNAL_WECHAT_PREWARM_ACCOUNT_ID is defined in channel-binding-compiler.ts but never imported by openclaw-config-writer.ts, which instead relies on the magic string prefix "__nexu_internal_" to filter it out. This implicit naming convention is fragile — a future internal account whose ID doesn't start with that prefix would silently leak into the persisted accounts.json index. Exporting a shared NEXU_INTERNAL_ACCOUNT_PREFIX constant (or an explicit set of reserved IDs) from a shared module would make this coupling explicit and type-safe.
Additionally, in the rollback path inside connectWechat, cleanupWechatAccountState directly mutates accounts.json on disk and then syncAll() is called immediately after. syncWeixinAccountIndex will rewrite the same file based on the authoritative config, making the manual index mutation in cleanupWechatAccountState redundant for the rollback case specifically. Clarifying which path is responsible for index cleanup — the direct file manipulation or the authoritative config writer — would reduce the risk of them diverging in future edits.
Address community review feedback:
1. Extract NEXU_INTERNAL_ACCOUNT_PREFIX as a shared exported constant from channel-binding-compiler.ts. The config writer now imports it instead of relying on a magic string prefix.
2. cleanupWechatAccountState only deletes credential/sync files; index reconciliation is exclusively owned by the authoritative config writer during syncAll(). This eliminates the dual-write ambiguity.
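As a rough illustration of the authoritative index rule with a shared prefix constant: the function name `reconcileAccountIndex` and the sample account IDs are hypothetical, and the prefix value is the one quoted in the review comment. The real syncWeixinAccountIndex may differ in shape.

```typescript
// Value taken from the review discussion; in the PR it is exported from
// channel-binding-compiler.ts and imported by the config writer.
const NEXU_INTERNAL_ACCOUNT_PREFIX = "__nexu_internal_";

// Hypothetical pure helper: the index is derived only from the IDs in the
// current config, so whatever was previously on disk is ignored.
function reconcileAccountIndex(configuredIds: readonly string[]): string[] {
  const seen = new Set<string>();
  const index: string[] = [];
  for (const id of configuredIds) {
    // Internal prewarm accounts never reach the persisted index.
    if (id.startsWith(NEXU_INTERNAL_ACCOUNT_PREFIX)) continue;
    // Drop duplicates while keeping first-seen order.
    if (seen.has(id)) continue;
    seen.add(id);
    index.push(id);
  }
  return index;
}
```

Because the helper takes only the configured IDs as input, there is a single writer for the index and cleanup code has no reason to touch accounts.json itself.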
@JiwaniZakir Thanks for the thorough review; both points are spot-on.
On the magic prefix: Extracted
On the dual index mutation: Agreed; the rollback path's
Both changes are in 7f4f126. Thanks for catching these!
The three-pronged decomposition here is solid: each fix is independently correct and the combined effect makes sense. One thing worth confirming: the prewarm account approach for WeChat mirrors the Feishu pattern, so make sure the config subtree key used during prewarm doesn't collide with a real user account prefix if someone happens to register a WeChat ID matching it. Also, dropping the initial
@JiwaniZakir Good points to double-check — both are covered:
Thanks for keeping an eye on these edge cases!
@JiwaniZakir Already handled:
on the prewarm account leaking into
@chatgpt-codex-connector This is already addressed in the latest commits:
The bounded timeout on the readiness wait is the critical piece here; without it, a failed prewarm silently blocks
The three-pronged approach makes sense: each fix is independently valid, but together they compound nicely just like the original delays did. One thing worth verifying: the reduced initial long-poll timeout (presumably dropping from 35s to something much shorter) should have a fallback or backoff strategy to avoid hammering the WeChat endpoint if the channel is genuinely idle at startup. Also worth confirming the prewarm account pattern mirrors exactly what Feishu does so the two code paths don't diverge in maintenance burden over time.
What
Reduce WeChat channel first-response latency from ~2 minutes to ~5 seconds after connecting or restarting.
Why
After connecting a WeChat channel for the first time (or reconnecting after a desktop restart), the first message takes up to 2 minutes to get a response. Users perceive the bot as broken. The delay compounds from three independent causes:
- getUpdates() used the full 35s timeout, delaying message pickup.
- connectWechat() returned immediately after syncAll(), so the UI had no way to know the channel was actually ready.

Closes #610
How
Three compounding fixes:
1. WeChat prewarm account (channel-binding-compiler.ts): Always include openclaw-weixin in the compiled config with a disabled prewarm account (same pattern as Feishu at L140-149). When the user later connects a real WeChat account, it's an account-level change that triggers a fast hot-reload (~500ms) instead of a full gateway restart (~20-45s).
2. Short initial poll timeout (monitor.ts): First 3 polls use a 3s timeout instead of 35s. This quickly picks up any messages queued during startup. After the initial phase, switches to the normal 35s long-poll. Server-suggested timeout overrides both.
3. Readiness wait (channel-service.ts): Added waitForWechatReady() (same pattern as existing waitForWhatsappReady()) that polls channel readiness for up to 30s after connect. The UI now shows accurate status.

Affected areas
Checklist
- pnpm typecheck passes
- pnpm lint passes
- pnpm test passes (519 passed)
- pnpm generate-types run (if API routes/schemas changed): N/A, no API route changes
- No any types introduced (use unknown with narrowing)

Notes for reviewers
- channel-binding-compiler.ts:140-149. The disabled prewarm account (enabled: false) is ignored by OpenClaw's channel manager but keeps the plugin loaded.
- startAccount() already gracefully handles unconfigured accounts (throws before starting the monitor), so the prewarm account won't cause any side effects.

https://claude.ai/code/session_01CSC1RKaRB9F7C4t3HQGpSs