fix(whatsapp): bound connection startup waits #90486
If Baileys fails to emit a 'connection.update' event with either 'open' or 'close' status (e.g. due to network issues or internal errors), the waitForWaConnection promise hangs forever, blocking the entire monitor loop. Add a configurable timeout (default 60s) that rejects the promise and cleans up the event listener if no connection state is received in time. The timeout is backward-compatible as an optional parameter with a sensible default.
- Test that promise rejects with descriptive error after timeout
- Test that event listener is cleaned up after timeout
- Test that timer is cleared when connection opens before timeout
The 60s default broke the QR login flow in login-qr.ts, which calls waitForWaConnection without a timeout and expects to wait up to 3 minutes while the user scans. Change the default to 0 (wait forever, matching original behavior) and pass the 60s timeout explicitly at the monitor callsite where it's actually needed.
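The bounded wait described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `waitForConnectionSketch`, `ConnectionEvents`, and `TinyEmitter` are hypothetical names, the event shape is a simplified stand-in for the Baileys `connection.update` payload, and the error messages are invented for the example. It assumes `timeoutMs = 0` means "wait forever", as the last commit message states.

```typescript
// Hypothetical, simplified shapes; the real Baileys event types are richer.
type ConnectionUpdate = { connection?: "open" | "close" | "connecting" };
type Listener = (update: ConnectionUpdate) => void;

interface ConnectionEvents {
  on(event: "connection.update", listener: Listener): void;
  off(event: "connection.update", listener: Listener): void;
}

// Tiny in-memory emitter so the sketch runs without a real socket.
class TinyEmitter implements ConnectionEvents {
  listeners: Listener[] = [];
  on(_event: "connection.update", listener: Listener): void {
    this.listeners.push(listener);
  }
  off(_event: "connection.update", listener: Listener): void {
    this.listeners = this.listeners.filter((l) => l !== listener);
  }
  emit(update: ConnectionUpdate): void {
    // Iterate over a copy because listeners remove themselves while running.
    this.listeners.slice().forEach((l) => l(update));
  }
}

// timeoutMs = 0 (the default) waits forever, matching the pre-timeout behavior.
function waitForConnectionSketch(ev: ConnectionEvents, timeoutMs = 0): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    let timer: ReturnType<typeof setTimeout> | undefined;

    // Clear the timer and detach the listener on every exit path.
    const cleanup = (): void => {
      if (timer !== undefined) clearTimeout(timer);
      ev.off("connection.update", onUpdate);
    };

    const onUpdate: Listener = (update) => {
      if (update.connection === "open") {
        cleanup();
        resolve();
      } else if (update.connection === "close") {
        cleanup();
        reject(new Error("connection closed before it opened"));
      }
      // Any other state (e.g. "connecting") keeps waiting.
    };

    ev.on("connection.update", onUpdate);

    if (timeoutMs > 0) {
      timer = setTimeout(() => {
        cleanup();
        reject(new Error(`timed out after ${timeoutMs}ms waiting for connection`));
      }, timeoutMs);
    }
  });
}
```

Under these assumptions, a monitor-style caller would pass the bound explicitly, for example `waitForConnectionSketch(ev, 60_000)`, while a QR-login-style caller would omit the argument and keep waiting while the user scans.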
Codex review: needs maintainer review before merge. Reviewed June 4, 2026, 8:23 PM ET / 00:23 UTC.

Summary
PR surface: Source +79, Tests +119. Total +198 across 12 files.
Reproducibility: yes, from source inspection: current main and …
Review metrics: 1 noteworthy metric.
Merge readiness
Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Review details
Best possible solution: Land the bounded-wait implementation after maintainer acceptance of the startup-timeout semantics, keeping QR login/no-timeout compatibility and the existing …
Do we have a high-confidence way to reproduce the issue? Yes, from source inspection: current main and …
Is this the best way to solve the issue? Yes, this appears to be the best targeted fix shape: centralizing timeout behavior in the shared waiter is cleaner than keeping per-caller …
AGENTS.md: found and applied where relevant.
Codex review notes: model gpt-5.5, reasoning high; reviewed against 5ba4eeceac72.

Evidence reviewed
PR surface: Source +79, Tests +119. Total +198 across 12 files.
* fix: add timeout to waitForWaConnection to prevent indefinite hangs

  If Baileys fails to emit a 'connection.update' event with either 'open' or 'close' status (e.g. due to network issues or internal errors), the waitForWaConnection promise hangs forever, blocking the entire monitor loop. Add a configurable timeout (default 60s) that rejects the promise and cleans up the event listener if no connection state is received in time. The timeout is backward-compatible as an optional parameter with a sensible default.

* test: add coverage for waitForWaConnection timeout path

  - Test that promise rejects with descriptive error after timeout
  - Test that event listener is cleaned up after timeout
  - Test that timer is cleared when connection opens before timeout

* fix: default timeoutMs to 0 to preserve QR login behavior

  The 60s default broke the QR login flow in login-qr.ts, which calls waitForWaConnection without a timeout and expects to wait up to 3 minutes while the user scans. Change the default to 0 (wait forever, matching original behavior) and pass the 60s timeout explicitly at the monitor callsite where it's actually needed.

* fix: bound whatsapp connection startup waits
* fix: align web channel wait contract
* fix: retry whatsapp setup timeouts
* fix: satisfy whatsapp status lint
* fix: preserve whatsapp wait compatibility

---------

Co-authored-by: MMMMSSSS8899 <praelovk@gmail.com>
Summary
- … `main` branch after that PR was repeatedly closed.
- … `waitForWaConnection` accept an explicit wait policy while preserving old one-argument callers as no-timeout waits.

Real behavior proof
- Behavior addressed: WhatsApp Web startup could wait forever if the socket never reached `open` or `close`, preventing reconnect logic from running.
- Real environment tested: Blacksmith Testbox Ubuntu runner using this branch checkout.
- Exact steps or command run after this patch: `corepack pnpm test extensions/whatsapp/src/session.test.ts extensions/whatsapp/src/connection-controller.test.ts extensions/whatsapp/src/inbound.media.test.ts extensions/whatsapp/src/qa-driver.runtime.test.ts extensions/whatsapp/src/auto-reply.web-auto-reply.connection-and-logging.e2e.test.ts src/plugins/runtime/runtime-web-channel-plugin.test.ts`
- Evidence after fix: Testbox `tbx_01ktah9tw9jn2r5a74sgx72wrf` passed 3 Vitest shards: runtime wrapper, WhatsApp auto-reply e2e, and WhatsApp extension unit tests.
- Observed result after fix: Bounded startup waits reject with status `408`, reconnect setup timeouts retry through the existing controller, failed sockets are closed, and QR login / one-argument dynamic callers keep the old no-timeout behavior.
- What was not tested: A live WhatsApp account QR scan against the production WhatsApp service was not run in this branch.
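The observed behavior (timeouts carry status `408`, setup timeouts are retried, failed sockets are closed) could look roughly like the sketch below. Everything here is an assumption for illustration: `startupTimeoutError`, `SocketLike`, and `connectWithRetry` are hypothetical names, and the real connection controller's retry policy and backoff are not shown in this PR text.

```typescript
// Hypothetical error shape: a plain Error tagged with an HTTP-style status.
type StatusError = Error & { status?: number };

function startupTimeoutError(timeoutMs: number): StatusError {
  const err: StatusError = new Error(`startup wait timed out after ${timeoutMs}ms`);
  err.status = 408; // same status the bounded waits are reported to reject with
  return err;
}

// Hypothetical minimal socket surface used only by this sketch.
interface SocketLike {
  waitUntilOpen(): Promise<void>;
  close(): void;
}

// Retry only setup timeouts (status 408); always close the failed socket first.
async function connectWithRetry(
  createSocket: () => SocketLike,
  maxAttempts: number,
): Promise<SocketLike> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const sock = createSocket();
    try {
      await sock.waitUntilOpen();
      return sock;
    } catch (err) {
      sock.close(); // never leave a half-open socket behind
      lastError = err;
      if ((err as StatusError).status !== 408) throw err; // other failures are not retried here
    }
  }
  throw lastError ?? new Error("no connection attempts were made");
}
```

Tagging the error with a status rather than matching on its message lets the retry loop distinguish a bounded-wait timeout from other failures without parsing strings; whether the actual controller does this the same way is not confirmed by the source.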
Verification
- `git diff --check`
- `tbx_01ktah9tw9jn2r5a74sgx72wrf`: focused WhatsApp/runtime tests passed
- `tbx_01ktah9tt34vm7dmafwtx8v2y3`: `corepack pnpm lint:extensions` passed