fix: stop treating idle WhatsApp sessions as stale sockets #47513
jeffrey4341 wants to merge 2 commits into openclaw:main from
Greptile Summary
This PR fixes a deterministic false-positive reconnect loop affecting low-traffic WhatsApp accounts by removing two places where "inbox quiet" was incorrectly treated as "socket dead." The fix is conservative and well-scoped: it skips the stale-socket heuristic for WhatsApp at the gateway layer and makes the channel-layer no-message watchdog opt-in (defaulting to
Confidence Score: 4/5
Updated this PR branch to current
Local verification before push:
This matters because the prior failing
I’ll watch the rerun and only keep patching if the updated branch still shows failures that look specific to this PR.
Thanks for the PR and for working on this.

We checked the current main branch, and this fix is already in main via #60007, which landed in commit ff62705. The WhatsApp watchdog now resets

Because that has landed, I'm closing this PR as superseded by #60007. Thanks again for the work here.

If you think this closure is mistaken and your PR still fixes something meaningfully different on current main, feel free to open a new PR with that explanation.
Summary
Fixes #34155.
Helps with #46372 by preventing the false restart loop that was dropping outbound replies during unnecessary reconnects.
Root cause
`lastEventAt` for WhatsApp currently tracks inbound message flow, not a trustworthy socket-liveness signal. On low-traffic accounts that means `connected=true` + `lastEventAt` older than 30m is evaluated as `stale-socket`.

The local gateway logs from the affected deployment showed repeated `health-monitor: restarting (reason: stale-socket)` entries with no corresponding listener-side close / disconnect errors, which points to an idle false positive rather than a real dead-socket event.

The WhatsApp monitor also had a second false-positive path: after the first inbound message, a quiet inbox for 30 minutes could trigger the internal watchdog even if the socket was otherwise healthy.
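To make the false positive concrete, here is a minimal sketch of a staleness check of the kind described above. It is not the gateway's actual code: the `ChannelSnapshot` type, the `evaluateNaive` function, and the `"disconnected"` verdict are hypothetical names chosen for illustration. Only `connected`, `lastEventAt`, the 30-minute window, and the `stale-socket` reason come from this PR description.

```typescript
// Hypothetical shapes for illustration only; not the real gateway types.
interface ChannelSnapshot {
  connected: boolean;
  // Epoch milliseconds of the last inbound message (not a socket heartbeat).
  lastEventAt: number;
}

type HealthVerdict = "healthy" | "disconnected" | "stale-socket";

// The 30-minute window mentioned in the root-cause description.
const STALE_AFTER_MS = 30 * 60 * 1000;

// Naive policy: a connected channel with no recent inbound event is "stale".
function evaluateNaive(s: ChannelSnapshot, now: number): HealthVerdict {
  if (!s.connected) return "disconnected";
  return now - s.lastEventAt > STALE_AFTER_MS ? "stale-socket" : "healthy";
}

// A healthy socket whose inbox has simply been quiet for 45 minutes.
const now = Date.UTC(2025, 0, 1, 12, 0, 0);
const quietButHealthy: ChannelSnapshot = {
  connected: true,
  lastEventAt: now - 45 * 60 * 1000,
};

// The naive policy cannot tell "quiet" from "dead" and flags a restart.
const naiveVerdict = evaluateNaive(quietButHealthy, now);
```

Because `lastEventAt` only advances on inbound messages, every quiet period longer than the window yields the same verdict again after each restart, which is what makes the loop deterministic.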
Why this approach
A quiet WhatsApp inbox is normal. Until the channel exposes a real liveness proof, treating missing inbound messages as proof of socket death is worse than the failure mode it tries to catch: it creates deterministic restart churn and can drop outbound replies during the restart window.
This change takes the conservative route:
That preserves real close/disconnect handling and reconnect logic, while removing the deterministic false-positive restart loop.
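The conservative route can be sketched roughly as follows, based on the two changes named in the Greptile summary (skip the stale-socket heuristic for WhatsApp at the gateway layer, and make the channel-layer no-message watchdog opt-in). All identifiers here (`ChannelState`, `SKIP_STALE_CHECK`, `shouldRestartForStaleSocket`, `watchdogExpired`, and the optional `timeoutMs` parameter) are assumptions for illustration and do not mirror the actual diff.

```typescript
// Hypothetical shapes for illustration only; not the real gateway types.
interface ChannelState {
  channel: string;
  connected: boolean;
  lastEventAt: number; // epoch ms of the last inbound message
}

const STALE_WINDOW_MS = 30 * 60 * 1000;

// Assumption: channels whose lastEventAt is not a liveness signal are exempt.
const SKIP_STALE_CHECK = new Set<string>(["whatsapp"]);

// Gateway layer: never report stale-socket for exempt channels.
function shouldRestartForStaleSocket(state: ChannelState, now: number): boolean {
  // This sketch models only the staleness heuristic; a channel that is not
  // connected is assumed to be handled elsewhere (close/disconnect handling).
  if (!state.connected) return false;
  if (SKIP_STALE_CHECK.has(state.channel)) return false;
  return now - state.lastEventAt > STALE_WINDOW_MS;
}

// Channel layer: the no-message watchdog only fires when a timeout is
// explicitly configured (opt-in); leaving it unset disables the check.
function watchdogExpired(lastMessageAt: number, now: number, timeoutMs?: number): boolean {
  if (timeoutMs === undefined) return false;
  return now - lastMessageAt > timeoutMs;
}

const t = Date.UTC(2025, 0, 1, 12, 0, 0);
const quietFor45m = t - 45 * 60 * 1000;

const whatsappRestart = shouldRestartForStaleSocket(
  { channel: "whatsapp", connected: true, lastEventAt: quietFor45m },
  t,
);
const otherRestart = shouldRestartForStaleSocket(
  { channel: "other", connected: true, lastEventAt: quietFor45m },
  t,
);
```

In this sketch a quiet WhatsApp session no longer triggers a restart, while other channels keep the existing heuristic and real disconnects remain the responsibility of the normal close/reconnect handling.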
Test plan
- `corepack pnpm exec vitest run src/gateway/channel-health-policy.test.ts src/gateway/channel-health-monitor.test.ts`
- `corepack pnpm exec vitest run --config <temp config> extensions/whatsapp/src/auto-reply.web-auto-reply.connection-and-logging.e2e.test.ts`
- `PATH="/tmp/openclaw-pr/node_modules/.bin:$PATH" corepack pnpm check`