WhatsApp provider: stale-socket every ~35 minutes on idle connections (keepalive regression) #34155

@haydenmckay

Summary

WhatsApp connections consistently go stale after ~35 minutes of no message traffic, triggering a health-monitor: restarting (reason: stale-socket) cycle. This happens with clockwork precision regardless of network conditions.

Environment

  • OpenClaw version: 2026.3.2
  • OS: Arch Linux, kernel 6.18.13-arch1-1 (x64)
  • Node: v25.1.0
  • Channel: channels.whatsapp (personal, not Business API)
  • Network: WiFi 5GHz, -61 dBm signal, power save OFF

Behaviour

Every 35 minutes (±30s), the health-monitor detects the WhatsApp socket as stale and restarts the provider. Recovery is fast (<5 seconds), but the pattern repeats indefinitely.

Today's log (20 stale-socket restarts in 18 hours):

00:31 CONNECTED
01:06 STALE → RECONNECTED (2s)
01:41 STALE → RECONNECTED (2s)
02:16 STALE → RECONNECTED (2s)
02:51 STALE → RECONNECTED (2s)
03:26 STALE → RECONNECTED (2s)
... repeating every 35 min ...
14:11 STALE → RECONNECTED (2s)
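
For anyone wanting to check the cadence themselves, here is a minimal sketch that computes the gaps between the consecutive timestamps in the excerpt above. The helper name `gaps_in_minutes` is made up for illustration, and the timestamps are assumed to fall on the same day.

```python
from datetime import datetime, timedelta

# Consecutive timestamps copied from the log excerpt (HH:MM, same day).
# The final 14:11 entry is left out because the entries before it are elided.
events = ["00:31", "01:06", "01:41", "02:16", "02:51", "03:26"]


def gaps_in_minutes(stamps):
    """Return the gap in whole minutes between each pair of adjacent timestamps."""
    times = [datetime.strptime(s, "%H:%M") for s in stamps]
    return [int((b - a) / timedelta(minutes=1)) for a, b in zip(times, times[1:])]


gaps = gaps_in_minutes(events)
print(gaps)  # [35, 35, 35, 35, 35]
```

Every listed gap is exactly 35 minutes at minute resolution, which fits the ±30s figure quoted above.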

Health-monitor config (from logs):

interval: 300s, startup-grace: 60s, channel-connect-grace: 120s
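
One piece of arithmetic that may be useful when triaging: the sketch below parses the config line above and compares the 35-minute cadence against the health-monitor interval. The helper `parse_seconds` is hypothetical. The result says nothing by itself about the cause; it only shows how the two numbers relate.

```python
import re

# Config line as it appears in the logs quoted in this issue.
config_line = "interval: 300s, startup-grace: 60s, channel-connect-grace: 120s"


def parse_seconds(line):
    """Parse 'name: <n>s' pairs into a dict mapping name to seconds."""
    return {name: int(value) for name, value in re.findall(r"([\w-]+):\s*(\d+)s", line)}


config = parse_seconds(config_line)
observed_gap_s = 35 * 60  # restart cadence reported in this issue

# How many health-monitor intervals fit into one restart gap, and what is left over?
ticks, remainder = divmod(observed_gap_s, config["interval"])
print(ticks, remainder)  # 7 0
```

The 35-minute gap is exactly 7 health-monitor intervals. Whether that is a coincidence or related to how the monitor measures staleness is not something the logs in this issue establish.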

Log entries at each event:

{"subsystem":"gateway/health-monitor","message":"[whatsapp:default] health-monitor: restarting (reason: stale-socket)"}
{"subsystem":"gateway/channels/whatsapp","message":"[default] starting provider (+614XXXXXXXX)"}
{"subsystem":"gateway/channels/whatsapp","message":"Listening for personal WhatsApp inbound messages."}
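
To count these events in a longer log, something like the following could work. It is a rough sketch that assumes one JSON object per line with only the `subsystem` and `message` fields shown above (real log lines may carry more fields); the function name `restart_events` and the regular expression are illustrative, not part of OpenClaw.

```python
import json
import re

# The three log entries quoted above, one JSON object per line.
log_lines = [
    '{"subsystem":"gateway/health-monitor","message":"[whatsapp:default] health-monitor: restarting (reason: stale-socket)"}',
    '{"subsystem":"gateway/channels/whatsapp","message":"[default] starting provider (+614XXXXXXXX)"}',
    '{"subsystem":"gateway/channels/whatsapp","message":"Listening for personal WhatsApp inbound messages."}',
]

# Matches "[<channel>] health-monitor: restarting (reason: <reason>)".
RESTART = re.compile(
    r"\[(?P<channel>[^\]]+)\] health-monitor: restarting \(reason: (?P<reason>[^)]+)\)"
)


def restart_events(lines):
    """Return (channel, reason) for every health-monitor restart entry."""
    found = []
    for line in lines:
        record = json.loads(line)
        if record.get("subsystem") != "gateway/health-monitor":
            continue
        match = RESTART.search(record.get("message", ""))
        if match:
            found.append((match["channel"], match["reason"]))
    return found


events = restart_events(log_lines)
print(events)  # [('whatsapp:default', 'stale-socket')]
```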

Additional context

  • March 1-3 (pre-2026.3.x): A different and worse failure mode, where the connection got stuck with reconnectAttempts: 1 indefinitely. The 2026.3.x health-monitor improvement is appreciated, but the underlying keepalive issue remains.
  • Network ruled out: Zero packet errors, power save off, stable signal. The 35-minute interval is too precise to be network noise.
  • 35-min interval: This is consistent with a server-side idle timeout on Meta's WhatsApp Web servers, though that is not confirmed. If so, the keepalive mechanism in the provider isn't preventing the timeout.
  • Also observed: Occasional 408 Request Time-out drops separate from the stale-socket cycle (appears to be the web client connection, not the WhatsApp WebSocket itself).

Expected behaviour

WhatsApp WebSocket connection should remain stable indefinitely via keepalive pings, regardless of message traffic volume.
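
To make the expectation concrete, here is a minimal sketch of idle-based keepalive logic. It is not the provider's actual implementation: the class `IdleKeepalive`, the `send_ping` callback and the 30-second threshold are all hypothetical, and it assumes that sending a ping counts as activity (a real implementation would more likely wait for the reply).

```python
import time


class IdleKeepalive:
    """Decides when a ping is due, based on time since the last socket activity."""

    def __init__(self, ping_after_s, clock=time.monotonic):
        self.ping_after_s = ping_after_s
        self._clock = clock
        self._last_activity = clock()

    def record_activity(self):
        """Call whenever any frame is sent or received on the socket."""
        self._last_activity = self._clock()

    def ping_due(self):
        """True once the socket has been idle for at least ping_after_s seconds."""
        return self._clock() - self._last_activity >= self.ping_after_s

    def tick(self, send_ping):
        """Send a ping if one is due; return whether a ping was sent."""
        if self.ping_due():
            send_ping()
            self.record_activity()  # assumption: a sent ping resets the idle timer
            return True
        return False


# Usage with a fake clock so the behaviour can be checked without waiting.
now = [0.0]
sent = []
keepalive = IdleKeepalive(ping_after_s=30, clock=lambda: now[0])

now[0] = 29.0
keepalive.tick(lambda: sent.append(now[0]))  # idle for 29s: no ping yet
now[0] = 30.0
keepalive.tick(lambda: sent.append(now[0]))  # idle for 30s: ping sent
print(sent)  # [30.0]
```

The point is that the ping is driven by idle time, not by message traffic, so a quiet connection still produces regular frames.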

Workaround

None. The health-monitor auto-recovery works but doesn't fix the underlying keepalive issue; the connection is restored within ~5s each time.

Metadata

Labels: stale (marked as stale due to inactivity)
