Skip to content

fix(whatsapp): catch connection-phase errors in reconnect loop#14484

Closed
onthway wants to merge 5 commits intoopenclaw:mainfrom
onthway:fix/whatsapp-reconnect-econnreset
Closed

fix(whatsapp): catch connection-phase errors in reconnect loop#14484
onthway wants to merge 5 commits intoopenclaw:mainfrom
onthway:fix/whatsapp-reconnect-econnreset

Conversation

@onthway
Copy link

@onthway onthway commented Feb 12, 2026

Summary

  • monitorWebInbox() inside the while(true) reconnect loop was not wrapped in try-catch
  • If the connection attempt itself failed (e.g. ECONNRESET during TLS handshake to web.whatsapp.com), the exception propagated out of the loop, permanently killing the WhatsApp channel — even with maxAttempts: 0 (unlimited retries)
  • Wrap listener creation in try-catch so connection-phase errors enter the same backoff/retry path as post-connection closes

Root cause

The reconnect loop handles two phases differently:

Phase Error handling Result
After connection (line 337, listener.onClose) .catch() → retryable Reconnects correctly
During connection (line 192, monitorWebInbox()) No try-catch Exception crashes the loop

Real-world failure sequence observed:

  1. Status 428 disconnect → retry starts
  2. Retry calls monitorWebInbox()waitForWaConnection() → TLS handshake
  3. ECONNRESET on web.whatsapp.com:443 → promise rejects
  4. Exception propagates out of while(true) → channel exits permanently

Test plan

  • pnpm build passes
  • pnpm test passes (6492 passed, 4 pre-existing failures unrelated)
  • Deployed to local Docker gateway, WhatsApp connects normally

🤖 Generated with Claude Code

Greptile Overview

Greptile Summary

This change wraps WhatsApp Web listener creation inside the reconnect while (true) loop with a try/catch, so connection-phase failures (e.g. errors thrown by monitorWebInbox() / waitForWaConnection()) flow through the same backoff/retry path as post-connection closes.

The logic fits the existing reconnect design (backoff via computeBackoff, status emission via tuning.statusSink, and disconnect events via enqueueSystemEvent).

Confidence Score: 3/5

  • Mostly safe to merge, but repeated connection failures can now leak sockets during retries.
  • The reconnect-loop try/catch is correct for preventing permanent shutdowns, but monitorWebInbox allocates a socket before awaiting connection and does not close it on connection-phase rejection; with the new retry behavior that can accumulate resources in real-world flaky-network scenarios.
  • src/web/inbound/monitor.ts (connection setup cleanup path)

(2/5) Greptile learns from your feedback when you react with thumbs up/down!

@openclaw-barnacle openclaw-barnacle bot added channel: whatsapp-web Channel integration: whatsapp-web size: S labels Feb 12, 2026
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 12, 2026

Additional Comments (1)

src/web/inbound/monitor.ts
Leaked socket on connect failure

monitorWebInbox() creates sock and then await waitForWaConnection(sock) (src/web/inbound/monitor.ts:40-43). If waitForWaConnection rejects (e.g. TLS/handshake errors), the function throws without closing sock.ws, and after this PR those errors will be retried in a loop. That can accumulate dangling connections/FDs during repeated failures; consider ensuring the socket is closed in a catch/finally path when connection setup fails.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/web/inbound/monitor.ts
Line: 40:43

Comment:
**Leaked socket on connect failure**

`monitorWebInbox()` creates `sock` and then `await waitForWaConnection(sock)` (`src/web/inbound/monitor.ts:40-43`). If `waitForWaConnection` rejects (e.g. TLS/handshake errors), the function throws without closing `sock.ws`, and after this PR those errors will be retried in a loop. That can accumulate dangling connections/FDs during repeated failures; consider ensuring the socket is closed in a `catch`/`finally` path when connection setup fails.

How can I resolve this? If you propose a fix, please make it concise.

monitorWebInbox() was called without try-catch inside the while(true)
reconnect loop. If the connection attempt itself failed (e.g. ECONNRESET
during TLS handshake to web.whatsapp.com), the exception propagated out
of the loop and killed the channel permanently — even with maxAttempts:0
(unlimited retries).

Wrap the listener creation in try-catch so connection-phase errors are
treated as retryable disconnects with the same backoff/maxAttempts logic
as post-connection closes.

Co-Authored-By: onthway <hruicn@gmail.com>
@nikolasdehor

This comment was marked as spam.

@onthway
Copy link
Author

onthway commented Feb 17, 2026

Thanks @nikolasdehor for the thorough comparison and for flagging the test from #9727!

I've addressed both points:

  1. Socket leak fix — added sock.ws?.close() in a catch path inside monitorWebInbox() so connection-phase failures no longer accumulate dangling FDs (this was also the Greptile review feedback).
  2. Test coverage — ported the reconnects-after-initial-connection-failure test from fix(whatsapp): retry reconnect loop on initial connection failure #9727 (@luizlf) into this PR. All existing reconnect tests continue to pass as well.
  3. Build fix — also fixed a missing formatDurationMsformatDurationPrecise reference that was breaking the build.

Should be ready for review now. cc @tyler6204 @gumadeiras @sebslight

Copy link

@nikolasdehor nikolasdehor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thorough re-review after the latest commits (3fce4bd, 85c0b18, 7accb75, 0590b23). Everything looks good — approving.

Socket leak fix (src/web/inbound/monitor.ts)sock.ws?.close() in the catch path after waitForWaConnection rejects is exactly the right fix. The inner try/catch around the close call is good defensive coding since the socket may be partially initialized. This directly addresses the Greptile feedback and prevents FD accumulation during repeated connection failures.

Reconnect loop fix (src/web/auto-reply/monitor.ts) — The try-catch around listenerFactory invocation correctly routes connection-phase errors (ECONNRESET, ENOTFOUND, etc.) into the existing backoff/retry path. The catch block properly:

  • Increments reconnectAttempts and updates status
  • Respects maxAttempts limit
  • Computes backoff via computeBackoff
  • Handles abort signal during sleep
  • Uses formatDurationPrecise (not the old formatDurationMs)

Test (reconnects-after-initial-connection-failure) — Good port from #9727. The mock correctly throws on first attempt, succeeds on second, and verifies both the retry and the backoff sleep. The sendMessage/sendPoll/sendReaction/sendComposingTo stubs (commit 7accb75) properly match the expected return type.

All CI checks passing. This fixes the root cause of #13371 — connection-phase exceptions no longer crash the reconnect loop. Ship it!

@sebslight
Copy link
Member

Closing as duplicate of #9727. If this is incorrect, please contact us.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

channel: whatsapp-web Channel integration: whatsapp-web size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants