[Bug]: WhatsApp gateway fails to reconnect after DNS resolution failure #2198

@26tajeen

Bug Report: WhatsApp gateway fails to reconnect after DNS resolution failure

Summary

When the WhatsApp gateway encounters a DNS resolution failure (ENOTFOUND), it exits the channel and does not attempt automatic reconnection, even after DNS is restored. The gateway process continues running but the WhatsApp listener remains dead until manual intervention (gateway restart or relink).

Environment

  • Clawdbot version: 2026.1.24-3
  • Node version: v25.4.0
  • OS: macOS Darwin 22.6.0 (x64)
  • Channel: WhatsApp

Steps to Reproduce

  1. Have WhatsApp gateway connected and working
  2. DNS resolution fails for web.whatsapp.com (e.g., due to DNS server issues)
  3. Gateway logs the DNS error and exits the WhatsApp channel
  4. DNS is restored and working
  5. Observe that WhatsApp does not automatically reconnect

Expected Behavior

After a DNS resolution failure, the gateway should:

  1. Retry DNS resolution with exponential backoff
  2. Automatically reconnect the WhatsApp listener once DNS is available again
  3. Not require manual restart or relink
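To make "exponential backoff" concrete, here is a minimal sketch of a delay schedule. It is not Clawdbot's actual retry code: the function name, option names, and the values (2 s base, doubling, 30 s cap, 25% jitter) are assumptions chosen for illustration. The roughly 2 s first delay only loosely mirrors the "Retry 1/12 in 2.06s" lines in the logs.

```typescript
// Hypothetical helper; names and defaults are assumptions, not Clawdbot's API.
interface BackoffOptions {
  baseMs: number; // delay before the first retry
  factor: number; // multiplier applied on each further attempt
  maxMs: number;  // cap applied before jitter is added
  jitter: number; // fraction of the delay to randomize, 0..1
}

const defaultBackoff: BackoffOptions = { baseMs: 2000, factor: 2, maxMs: 30000, jitter: 0.25 };

// attempt is 1-based: attempt 1 is the first retry.
function backoffDelayMs(
  attempt: number,
  opts: BackoffOptions = defaultBackoff,
  random: () => number = Math.random,
): number {
  const exponential = opts.baseMs * Math.pow(opts.factor, Math.max(0, attempt - 1));
  const capped = Math.min(exponential, opts.maxMs);
  // Spread the delay over roughly [capped * (1 - jitter), capped * (1 + jitter)].
  const spread = capped * opts.jitter * (random() * 2 - 1);
  return Math.round(capped + spread);
}
```

The random source is injectable so the schedule can be checked deterministically. With the jitter centered, the delays run 2000, 4000, 8000 ms and so on, up to the 30000 ms cap.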

Actual Behavior

The WhatsApp channel exits permanently after a DNS failure. The gateway process continues running, but:

  • WhatsApp listener is marked as inactive
  • No automatic reconnection attempts are made
  • Manual clawdbot gateway restart is required to restore connectivity
  • Attempting to relink via UI/tool times out ("Timed out waiting for WhatsApp QR") even though whatsapp_login reports "WhatsApp is already linked"

Timeline from Logs

2026-01-26T02:19:40.197Z [whatsapp] Web connection closed (status 408). Retry 1/12 in 2.06s…
2026-01-26T02:19:45.865Z [whatsapp] Listening for personal WhatsApp inbound messages.  ← reconnected OK

2026-01-26T04:20:32.556Z [whatsapp] Web connection closed (status 408). Retry 1/12 in 2.12s…
2026-01-26T04:20:34.866Z [whatsapp] [default] channel exited: {
  "error": {
    "data": {
      "errno": -3008,
      "code": "ENOTFOUND",
      "syscall": "getaddrinfo",
      "hostname": "web.whatsapp.com"
    },
    "output": {
      "statusCode": 408,
      "payload": {
        "error": "Request Time-out",
        "message": "WebSocket Error (getaddrinfo ENOTFOUND web.whatsapp.com)"
      }
    }
  }
}

After 04:20:34, no further reconnection attempts were logged. The gateway continued running (PID unchanged) but WhatsApp remained dead.
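The logged payload carries enough information to tell a transient network problem from a fatal one. The sketch below classifies an error of that shape. The interface mirrors only the fields visible in the log above. The function name and the choice of which codes count as transient are assumptions, not the gateway's actual logic.

```typescript
// Shape copied from the logged "channel exited" payload; every field is optional.
interface ChannelExitError {
  data?: { errno?: number; code?: string; syscall?: string; hostname?: string };
  output?: { statusCode?: number; payload?: { error?: string; message?: string } };
}

// Node.js system error codes that usually clear up on their own.
// Treating exactly this set as transient is an assumption.
const TRANSIENT_CODES = new Set(["ENOTFOUND", "EAI_AGAIN", "ECONNRESET", "ECONNREFUSED", "ETIMEDOUT"]);

// Hypothetical classifier: true means "keep retrying", false means "give up".
function isTransientNetworkError(err: ChannelExitError | undefined): boolean {
  if (!err) return false;
  const code = err.data?.code;
  if (code !== undefined && TRANSIENT_CODES.has(code)) return true;
  // A plain 408 without a system error code is the case that already retries today.
  return err.output?.statusCode === 408;
}
```

Under this classification, the ENOTFOUND payload from 04:20:34 would count as retryable rather than as a reason to exit the channel.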

At 07:20, DNS was confirmed working:

$ nslookup web.whatsapp.com 8.8.8.8
Server:     8.8.8.8
Address:    8.8.8.8#53
web.whatsapp.com    canonical name = mmx-ds.cdn.whatsapp.net.
Name:   mmx-ds.cdn.whatsapp.net
Address: 57.144.239.32

Yet at the same time (07:20), attempts to send messages still failed:

[tools] message failed: Error: No active WhatsApp Web listener (account: default). 
Start the gateway, then link WhatsApp with: clawdbot channels login --channel whatsapp --account default.

Attempts to generate a new QR code also failed:

Failed to get QR: Error: Timed out waiting for WhatsApp QR

Resolution

Only a full clawdbot gateway restart restored WhatsApp connectivity:

2026-01-26T11:15:46.170Z [gateway] signal SIGTERM received
2026-01-26T11:15:57.919Z [gateway] listening on ws://0.0.0.0:18789 (PID 13538)
2026-01-26T11:15:58.020Z [whatsapp] [default] starting provider (+447580380000)
2026-01-26T11:15:59.454Z [whatsapp] Listening for personal WhatsApp inbound messages.

Analysis

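The retry logic handles a plain status 408 (timeout) close correctly: it retries and reconnects, as the 02:19 entries show. However, when the retry attempt itself fails with a DNS error (ENOTFOUND), the channel exits completely instead of continuing to retry. In the 04:20 entries the channel exited about two seconds after "Retry 1/12" was logged, with the remaining retries unused.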

The WhatsApp provider enters a dead state where:

  • It's not listening for messages
  • It won't respond to relink requests (times out)
  • The only recovery is a full gateway restart
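The failure mode described in this analysis can be contrasted with a retry loop that only stops early for errors that retrying cannot fix. This is a sketch under assumptions: `reconnectWithRetry`, its options, and the injected `connect` function are hypothetical and do not reflect how the gateway is actually structured.

```typescript
type Sleep = (ms: number) => Promise<void>;

interface ReconnectOptions {
  maxAttempts: number;                    // e.g. 12, matching "Retry 1/12" in the logs
  delayMs: (attempt: number) => number;   // backoff schedule, 1-based attempt
  isRetryable: (err: unknown) => boolean; // should return true for DNS failures
  sleep?: Sleep;                          // injectable so tests need not wait
}

const realSleep: Sleep = (ms) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Resolves with the number of attempts used; rejects with the last error
// once attempts run out or a non-retryable error occurs.
async function reconnectWithRetry(
  connect: () => Promise<void>,
  opts: ReconnectOptions,
): Promise<number> {
  const sleep = opts.sleep ?? realSleep;
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    await sleep(opts.delayMs(attempt));
    try {
      await connect();
      return attempt;
    } catch (err) {
      lastError = err;
      // A failed attempt consumes one retry; it does not end the loop
      // unless the error is one that retrying cannot fix.
      if (!opts.isRetryable(err)) throw err;
    }
  }
  throw lastError;
}
```

The key difference from the observed behavior is in the `catch` branch: an ENOTFOUND raised while reconnecting moves the loop on to the next attempt instead of tearing the channel down.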

Suggested Fix

  1. DNS failures during reconnection should not exit the channel; they should count as a failed attempt and trigger continued retries with backoff
  2. Consider a "channel health check" that can restart dead channels without requiring a full gateway restart
  3. The whatsapp_login tool should be able to force-restart a dead channel, not just generate a QR for an unlinked account
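As a rough illustration of the "channel health check" idea in item 2, the sketch below restarts any channel that reports it is not listening. `ChannelHandle`, `checkChannels`, and `startHealthCheck` are hypothetical names; how the gateway really tracks channel state is not shown in this report, so the interface is an assumption.

```typescript
// Hypothetical view of a channel as seen by a supervisor.
interface ChannelHandle {
  name: string;
  isListening: () => boolean;   // false for the "dead" state described above
  restart: () => Promise<void>; // restarts only this channel, not the gateway
}

// One pass of the health check; resolves with the names of channels it restarted.
async function checkChannels(channels: ChannelHandle[]): Promise<string[]> {
  const restarted: string[] = [];
  for (const ch of channels) {
    try {
      if (ch.isListening()) continue;
      await ch.restart();
      restarted.push(ch.name);
    } catch {
      // A failed restart is left for the next pass and must not stop the loop.
    }
  }
  return restarted;
}

// Runs checkChannels on an interval; returns a function that stops the timer.
function startHealthCheck(channels: ChannelHandle[], intervalMs = 60000): () => void {
  let running = false;
  const timer = setInterval(() => {
    if (running) return; // skip this tick if the previous pass is still in progress
    running = true;
    void checkChannels(channels).finally(() => {
      running = false;
    });
  }, intervalMs);
  return () => clearInterval(timer);
}
```

A periodic check like this would bound how long a channel can stay dead to roughly one interval after DNS recovers, and the same `restart` hook could back the force-restart path suggested for whatsapp_login in item 3.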
