fix(whatsapp): retry reconnect loop on initial connection failure #9727
luizlf wants to merge 2 commits into openclaw:main
When DNS or network errors occur during initial WhatsApp connection (e.g., ENOTFOUND web.whatsapp.com), the reconnect loop now catches the error and retries with backoff, instead of exiting entirely. Fixes openclaw#2198
76011d5 to 465acbc
CI Failure Note
The failing check ( This test was introduced by commit All tests from this PR pass on all platforms, including the new
Fixes #13371
bfc1ccb to f92900f
This comment was marked as spam.
Agree @nikolasdehor. Great job bringing more attention to this.
nikolasdehor left a comment
I previously reviewed and approved PR #14484 by @onthway, which was closed as a duplicate of this PR. Both address the same root cause — #13371 (WhatsApp permanent disconnect on DNS/timeout), which I've been tracking since it was filed.
Core fix comparison:
Both PRs take the same fundamental approach: wrap the listenerFactory/monitorWebInbox() call inside the reconnect while(true) loop with a try-catch, so connection-phase errors (DNS failures, TLS handshake errors, etc.) flow into the existing backoff/retry path instead of crashing the loop. This is the correct fix.
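The approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from either PR: `runReconnectLoop`, `ReconnectPolicy`, `backoffDelay`, and the `Listener` shape are hypothetical names, and the exponential backoff formula is an assumption about how the existing backoff might behave.

```typescript
// Hypothetical types and names; only the control flow mirrors the review comment.
type Listener = { onClose: Promise<void> };
type ListenerFactory = () => Promise<Listener>;

interface ReconnectPolicy {
  initialMs: number;
  maxMs: number;
  factor: number;
  maxAttempts: number; // 0 is treated as "retry without limit" in this sketch
}

// Exponential backoff capped at maxMs (assumed formula).
function backoffDelay(policy: ReconnectPolicy, attempt: number): number {
  return Math.min(policy.maxMs, policy.initialMs * policy.factor ** (attempt - 1));
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runReconnectLoop(
  listenerFactory: ListenerFactory,
  policy: ReconnectPolicy,
  signal: AbortSignal,
): Promise<"aborted" | "gave-up"> {
  let attempts = 0;
  while (true) {
    // The abort signal is only checked between attempts in this sketch.
    if (signal.aborted) return "aborted";
    try {
      // Connection-phase errors (DNS, TLS, timeouts) are thrown here.
      const listener = await listenerFactory();
      attempts = 0; // a successful connection resets the counter
      await listener.onClose;
    } catch {
      // Swallow the startup error so it reaches the retry path below
      // instead of escaping the loop.
    }
    attempts += 1;
    if (policy.maxAttempts > 0 && attempts >= policy.maxAttempts) return "gave-up";
    await sleep(backoffDelay(policy, attempts));
  }
}
```

The key design point is that the `try` covers the factory call itself, so a failure on the very first attempt is handled exactly like a later disconnect.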
What #14484 had that this PR is missing:
The socket leak fix in src/web/inbound/monitor.ts. When waitForWaConnection(sock) throws, the already-created socket from createWaSocket() is never closed. #14484 addressed this with:
    try {
      await waitForWaConnection(sock);
    } catch (err) {
      try {
        sock.ws?.close();
      } catch {}
      throw err;
    }
Without this, each failed connection retry accumulates a dangling socket/FD. Under sustained DNS/network failures with the reconnect loop now correctly retrying, this could mean dozens of leaked sockets before maxAttempts is reached (or unlimited leaks if maxAttempts: 0).
I flagged this in my earlier comment, and I think it should be included before merge. It's a 5-line change in a separate file so it won't conflict with anything here.
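To make the leak concrete, here is a small self-contained sketch using a fake socket that counts open handles. `SocketLike`, `connectOrClose`, `createFakeSocket`, and `demo` are hypothetical stand-ins; the real socket type and the actual functions in src/web/inbound/monitor.ts may look different.

```typescript
// Hypothetical minimal socket shape; the real socket object is assumed to expose ws.close().
interface SocketLike {
  ws?: { close(): void };
}

// Create a socket, wait for it to connect, and close it if the wait fails.
async function connectOrClose<S extends SocketLike>(
  createSocket: () => S,
  waitForConnection: (sock: S) => Promise<void>,
): Promise<S> {
  const sock = createSocket();
  try {
    await waitForConnection(sock);
    return sock;
  } catch (err) {
    try {
      sock.ws?.close(); // best-effort cleanup; never mask the original error
    } catch {
      /* ignore close failures */
    }
    throw err;
  }
}

// Fake socket that tracks how many handles are still open.
let openSockets = 0;
function createFakeSocket(): SocketLike {
  openSockets += 1;
  let closed = false;
  return {
    ws: {
      close() {
        if (!closed) {
          closed = true;
          openSockets -= 1;
        }
      },
    },
  };
}

// Simulate three consecutive failed connection attempts.
async function demo(): Promise<number> {
  for (let i = 0; i < 3; i++) {
    try {
      await connectOrClose(createFakeSocket, async () => {
        throw new Error("ENOTFOUND web.whatsapp.com");
      });
    } catch {
      /* the reconnect loop would back off and retry here */
    }
  }
  return openSockets;
}
```

With the cleanup in place, the open-handle count returns to zero after every failed attempt; removing the inner `close()` call would leave one handle open per retry.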
Minor observations on the catch block:
- emitStatus() is called twice in succession (lines ~210 and ~213); the first emit is immediately superseded by the second. Could consolidate to a single emit after all status fields are set.
- The log level is error for the initial failure message. #14484 used warn, which feels more appropriate since this is now a retryable condition, not a terminal error. Minor style point.
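One way the single-emit suggestion could look, as a sketch only: `ChannelStatus`, its `connected` and `lastError` fields, and `recordConnectFailure` are hypothetical, and the PR's actual status object may carry different fields.

```typescript
// Hypothetical status shape; only reconnectAttempts is named in the PR discussion.
interface ChannelStatus {
  connected: boolean;
  reconnectAttempts: number;
  lastError: string | null;
}

// Build the complete next status first, then emit exactly once.
function recordConnectFailure(
  status: ChannelStatus,
  err: unknown,
  emitStatus: (next: ChannelStatus) => void,
): ChannelStatus {
  const next: ChannelStatus = {
    ...status,
    connected: false,
    reconnectAttempts: status.reconnectAttempts + 1,
    lastError: err instanceof Error ? err.message : String(err),
  };
  emitStatus(next);
  return next;
}
```

Subscribers then never observe an intermediate status in which only some of the fields have been updated.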
Summary: The core fix is solid and correct. I'd like to see the socket leak fix from #14484 folded in before this merges — otherwise we're solving the reconnect problem but introducing a resource leak under the exact conditions the fix enables (repeated connection failures). Happy to approve once that's added.
Closing as AI-assisted stale-fix triage. Linked issue #13506 ("[Bug]: WhatsApp reconnect loop exits on initial connection failure (DNS/network errors)") is currently CLOSED and was closed on 2026-02-13T03:23:29Z with state reason NOT_PLANNED. If the underlying bug is still reproducible on current main, please reopen this PR (or open a new focused fix PR) and reference both #13506 and #9727 for fast re-triage.
Closed after AI-assisted stale-fix triage (closed issue duplicate/stale fix).
Summary
- monitorWebChannel using the existing reconnect backoff instead of exiting.
- maxAttempts.
- ENOTFOUND and verifies the reconnect loop retries.
Why
- ENOTFOUND web.whatsapp.com) previously escaped the reconnect loop, causing the gateway to stop. This change makes initial connection failures behave like later reconnects and fixes #13506 ([Bug]: WhatsApp reconnect loop exits on initial connection failure (DNS/network errors)).
Log Evidence
Testing
- pnpm vitest run --config vitest.unit.config.ts "src/web/auto-reply.reconnects" (1 test passed in 17ms)
- src/web/auto-reply.reconnects-after-initial-connection-failure.test.ts uses a mocked listenerFactory that throws ENOTFOUND on the first attempt, asserts a second attempt happens without propagating the error, then aborts and closes cleanly.
- pnpm build && pnpm check && pnpm test
AI Assistance
- monitorWebChannel (the initial await listenerFactory() call lacked a try/catch).
- ENOTFOUND.
Greptile Overview
Greptile Summary
This PR updates the WhatsApp Web reconnect logic so that failures during the initial listener startup are handled by the same reconnect/backoff loop as later disconnects, rather than escaping and stopping the gateway. Concretely, monitorWebChannel now wraps the initial listenerFactory/monitorWebInbox startup in a try/catch, records the error in channel status, increments reconnectAttempts, applies maxAttempts, waits using the configured backoff, and retries.
It also adds a regression test that simulates a first-attempt DNS failure (ENOTFOUND) from the listener factory and asserts that the reconnect loop performs a second startup attempt without propagating the initial error, then aborts cleanly.
Confidence Score: 4/5