[Bug] WhatsApp channel exits permanently after DNS failure instead of retrying #13371

@nikolasdehor

Description

When the WhatsApp Web connection drops due to a transient DNS failure (ENOTFOUND web.whatsapp.com), the channel exits permanently instead of continuing the retry loop. The gateway stays running but without a WhatsApp listener, so all messages fail with "No active WhatsApp Web listener" until a manual restart.

Environment

  • OpenClaw v2026.2.9
  • Platform: macOS (Darwin 25.2.0)
  • WhatsApp provider: personal Web

Steps to reproduce

  1. Have a running gateway with an active WhatsApp Web connection
  2. Experience a brief internet/DNS outage (even 30-60 seconds)
  3. The WhatsApp connection drops and the channel exits permanently
  4. Internet connectivity recovers, but WhatsApp never reconnects

Logs

This happened on two consecutive days at the same time (~10:00 UTC):

Feb 9:

2026-02-09T10:00:51.938Z [whatsapp] Web connection closed (status 428). Retry 1/12 in 2.4s…
2026-02-09T10:00:54.387Z [whatsapp] [default] channel exited: {"error":{"data":{"errno":-3008,"code":"ENOTFOUND","syscall":"getaddrinfo","hostname":"web.whatsapp.com"}}}

Feb 10:

2026-02-10T10:00:59.535Z [whatsapp] Web connection closed (status 428). Retry 1/12 in 2.31s…
2026-02-10T10:01:01.893Z [whatsapp] [default] channel exited: {"error":{"data":{"errno":-3008,"code":"ENOTFOUND","syscall":"getaddrinfo","hostname":"web.whatsapp.com"}}}

After the channel exits, the WhatsApp listener is dead:

2026-02-10T10:07:31Z Delivery failed: No active WhatsApp Web listener
2026-02-10T11:00:16Z Delivery failed: No active WhatsApp Web listener
2026-02-10T12:00:19Z Delivery failed: No active WhatsApp Web listener

The gateway process remains running (PID alive, cron jobs executing, heartbeat active) — only the WhatsApp channel is dead.
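
The `channel exited` payloads above nest the Node.js resolver error under `error.data`, so a reconnect path that only inspects a top-level `code` field could miss it. The sketch below shows one way such a payload might be classified as transient. The names `findErrorCode`, `isTransientNetworkError`, and `TRANSIENT_CODES` are hypothetical and not taken from OpenClaw; the set of codes is an assumption about what should count as retryable.

```typescript
// Hypothetical helper, not OpenClaw's actual code: decide whether a logged
// channel error looks like a transient network failure worth retrying.

// Assumed set of Node.js error codes typical of short DNS/network outages.
const TRANSIENT_CODES = new Set(["ENOTFOUND", "EAI_AGAIN", "ECONNRESET", "ETIMEDOUT"]);

// Walk nested objects (e.g. { error: { data: { code } } }) and return the
// first string-valued `code` property found, or undefined if there is none.
function findErrorCode(value: unknown, depth = 0): string | undefined {
  if (typeof value !== "object" || value === null || depth > 5) return undefined;
  const record = value as Record<string, unknown>;
  if (typeof record.code === "string") return record.code;
  for (const child of Object.values(record)) {
    const code = findErrorCode(child, depth + 1);
    if (code !== undefined) return code;
  }
  return undefined;
}

function isTransientNetworkError(err: unknown): boolean {
  const code = findErrorCode(err);
  return code !== undefined && TRANSIENT_CODES.has(code);
}

// Payload copied from the Feb 10 "channel exited" log line.
const logged = JSON.parse(
  '{"error":{"data":{"errno":-3008,"code":"ENOTFOUND","syscall":"getaddrinfo","hostname":"web.whatsapp.com"}}}'
);
console.log(isTransientNetworkError(logged)); // true
```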

Expected behavior

After ENOTFOUND on retry, the channel should:

  1. Complete the full 12-retry cycle (currently the channel exits as soon as retry 1 fails with ENOTFOUND)
  2. If all 12 retries fail, schedule a longer-term reconnection (e.g., retry every 5 minutes)
  3. Never permanently give up on reconnecting — the gateway has KeepAlive: true specifically to stay running
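
The three points above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not OpenClaw's actual reconnect code: `reconnectUntilConnected`, `delayBeforeRetry`, and `RetryPolicy` are hypothetical names, the 12 fast retries come from the `Retry 1/12` log lines, the 5-minute slow interval comes from the suggestion in point 2, and the 2 s base delay and 30 s cap are assumed values.

```typescript
// Hypothetical sketch: a retry loop that survives failed attempts and falls
// back to a slow, indefinite retry once the fast retries are used up.

interface RetryPolicy {
  maxFastRetries: number; // 12 in the logs ("Retry 1/12")
  baseDelayMs: number;    // assumed first backoff delay
  maxFastDelayMs: number; // assumed cap for the fast phase
  slowRetryMs: number;    // long-term interval, e.g. 5 minutes
}

// Delay to wait before retry number `attempt` (1-based).
function delayBeforeRetry(attempt: number, p: RetryPolicy): number {
  if (attempt > p.maxFastRetries) return p.slowRetryMs;
  return Math.min(p.baseDelayMs * 2 ** (attempt - 1), p.maxFastDelayMs);
}

// Keeps retrying until `connect` succeeds. A failed attempt is logged and
// retried instead of ending the loop; only errors the caller marks as
// non-retryable are rethrown. Resolves with the attempt number that succeeded.
async function reconnectUntilConnected(
  connect: () => Promise<void>,
  policy: RetryPolicy,
  sleep: (ms: number) => Promise<void>,
  isRetryable: (err: unknown) => boolean,
  log: (msg: string) => void,
): Promise<number> {
  for (let attempt = 1; ; attempt++) {
    const waitMs = delayBeforeRetry(attempt, policy);
    log(`Retry ${attempt} in ${waitMs}ms`);
    await sleep(waitMs);
    try {
      await connect();
      return attempt;
    } catch (err) {
      if (!isRetryable(err)) throw err;
      log(`Retry ${attempt} failed, will try again`);
    }
  }
}

// Simulated outage: the first 14 attempts fail with ENOTFOUND, then DNS recovers.
const policy: RetryPolicy = {
  maxFastRetries: 12,
  baseDelayMs: 2_000,
  maxFastDelayMs: 30_000,
  slowRetryMs: 5 * 60_000,
};
let remainingFailures = 14;
const waits: number[] = [];

reconnectUntilConnected(
  async () => {
    if (remainingFailures-- > 0) {
      throw Object.assign(new Error("getaddrinfo ENOTFOUND web.whatsapp.com"), { code: "ENOTFOUND" });
    }
  },
  policy,
  async (ms) => { waits.push(ms); }, // fake sleep: record the delay, do not wait
  (err) => (err as { code?: string }).code === "ENOTFOUND",
  () => {},
).then((attempt) => console.log(`connected on attempt ${attempt}`, waits));
```

Passing `sleep` and `isRetryable` as parameters keeps the loop testable without real timers and leaves room for errors that should still end the channel, such as an explicit logout.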

Workaround

Manual restart of the gateway:

launchctl bootout gui/501/ai.openclaw.gateway
launchctl bootstrap gui/501 ~/Library/LaunchAgents/ai.openclaw.gateway.plist

Impact

This causes hours of downtime after a brief internet hiccup. The gateway appears healthy (process running, cron executing) but can't communicate via WhatsApp, which is the primary channel. Users don't realize the agent is unreachable until they notice messages aren't being delivered.
