WhatsApp: reconnect loop lacks exponential backoff after status 499 disconnects #60626

@cpicoto

Summary

When the WhatsApp Web connection drops with status 499, OpenClaw enters a reconnect loop that retries roughly every 60 seconds, indefinitely and with no exponential backoff. The result is a storm of reconnect attempts, log spam, and unnecessary resource consumption.

Observed Behavior

The reconnect cycle works like this:

  1. WhatsApp heartbeat detects no messages for 30+ minutes
  2. Forces reconnect → connection closed with status 499
  3. Schedules retry 1/12 in ~2 seconds → reconnects successfully
  4. About 60 seconds later, the heartbeat fires again → the time since the last message still exceeds the 30-minute threshold
  5. Forces another reconnect → back to step 2

The cycle repeats because the heartbeat uses the original lastInboundAt timestamp (which never gets updated since no actual messages arrive), so every new connection immediately triggers a new timeout detection.
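
The condition can be reproduced in isolation with a minimal sketch. Only `lastInboundAt`, the 30-minute threshold, and the ~60-second check interval come from the behavior described above; the function and constant names are hypothetical and not taken from the OpenClaw source.

```typescript
// Hypothetical constants mirroring the observed behavior, not OpenClaw's actual names.
const STALE_THRESHOLD_MS = 30 * 60 * 1000; // "no messages for 30+ minutes"
const HEARTBEAT_INTERVAL_MS = 60 * 1000;   // heartbeat check roughly every 60 s

// Heartbeat check as described: compares "now" against the last inbound message time.
function shouldForceReconnect(nowMs: number, lastInboundAtMs: number): boolean {
  return nowMs - lastInboundAtMs > STALE_THRESHOLD_MS;
}

// Simulate `ticks` heartbeat checks during which no message arrives.
// A reconnect never touches lastInboundAt, so the baseline stays stale.
function countForcedReconnects(ticks: number, lastInboundAtMs: number, startMs: number): number {
  let forced = 0;
  for (let i = 0; i < ticks; i++) {
    const nowMs = startMs + i * HEARTBEAT_INTERVAL_MS;
    if (shouldForceReconnect(nowMs, lastInboundAtMs)) {
      forced++; // the reconnect succeeds, but the baseline is unchanged
    }
  }
  return forced;
}

// Last message at t=0, first check at t=39 min: all 10 checks force a reconnect.
const forcedReconnects = countForcedReconnects(10, 0, 39 * 60 * 1000);
console.log(forcedReconnects); // 10
```

Once the threshold has been crossed, every subsequent check forces a reconnect until a real message arrives.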

This went on for hours (14:04 to 20:48 on April 2, 2026), generating hundreds of reconnect cycles.

Log Evidence

14:04:58 No messages received in 39m - restarting connection
14:06:01 No messages received in 40m - restarting connection
14:07:05 No messages received in 41m - restarting connection
...
14:36:45 No messages received in 71m - restarting connection
...
20:41:06 No messages received in 30m - restarting connection
20:42:10 No messages received in 31m - restarting connection
20:43:14 No messages received in 32m - restarting connection
...continued until gateway was killed by auto-update at 20:48
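
As a quick check of the "~60 seconds" figure, the gaps between consecutive log lines can be computed directly. This is only a throwaway sketch over the timestamps quoted above; `toSeconds` and `gaps` are helper names made up for the example.

```typescript
// Timestamps copied from the two contiguous runs in the log excerpt (HH:MM:SS, same day).
const afternoon = ["14:04:58", "14:06:01", "14:07:05"];
const evening = ["20:41:06", "20:42:10", "20:43:14"];

// Convert HH:MM:SS to seconds since midnight.
function toSeconds(hhmmss: string): number {
  const [h, m, s] = hhmmss.split(":").map(Number);
  return h * 3600 + m * 60 + s;
}

// Differences between consecutive timestamps, in seconds.
function gaps(stamps: string[]): number[] {
  const secs = stamps.map(toSeconds);
  return secs.slice(1).map((t, i) => t - secs[i]);
}

console.log(gaps(afternoon)); // [63, 64]
console.log(gaps(evening));   // [64, 64]
```

The interval is consistently 63-64 seconds, which is what a fixed ~60-second heartbeat plus a ~2-second retry delay would produce, with no growth between attempts.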

Each cycle also triggers the false creds.json corruption restore (see #60625).

Root Cause Analysis

Two issues compound:

  1. No backoff between heartbeat-driven reconnects: After a successful reconnect, the heartbeat neither resets its "time since last message" counter nor applies any backoff before forcing another reconnect.

  2. lastInboundAt is not reset on reconnect: The heartbeat keeps comparing against the original last-message timestamp. Since no new messages arrive (likely because it's nighttime and nobody is messaging), every 60-second heartbeat check immediately exceeds the 30-minute threshold and forces yet another reconnect.

Expected Behavior

  • After a reconnect, the heartbeat timer should reset (using the reconnect time as the new baseline)
  • If multiple consecutive reconnects fail to receive messages, apply exponential backoff (e.g., 30min → 1h → 2h → 4h cap)
  • After N consecutive failed reconnect cycles (e.g., 5-10), stop attempting and log an error suggesting manual intervention
  • Log output from reconnect attempts should be bounded; the 91MB error log generated by this loop should not be possible in normal operation
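
A rough sketch of the policy proposed above, to make the intended behavior concrete. Everything here is hypothetical: the `WatchdogState` shape, the function names, and the choice of N = 5 are assumptions for illustration and do not reflect OpenClaw's actual code.

```typescript
// Hypothetical policy sketch; names and structure are assumptions, not OpenClaw's API.
const BASE_THRESHOLD_MS = 30 * 60 * 1000;    // 30 min, the current threshold
const MAX_THRESHOLD_MS = 4 * 60 * 60 * 1000; // 4 h cap, from the example above
const MAX_SILENT_CYCLES = 5;                 // "N" from the proposal (e.g., 5-10)

interface WatchdogState {
  baselineMs: number;   // later of: last inbound message, last reconnect
  silentCycles: number; // consecutive reconnects that saw no inbound message
}

type Decision = "wait" | "reconnect" | "give-up";

// Threshold doubles per silent cycle: 30 min, 1 h, 2 h, 4 h, 4 h, ...
function thresholdFor(silentCycles: number): number {
  return Math.min(BASE_THRESHOLD_MS * 2 ** silentCycles, MAX_THRESHOLD_MS);
}

// Called on every heartbeat tick.
function decide(state: WatchdogState, nowMs: number): Decision {
  if (state.silentCycles >= MAX_SILENT_CYCLES) return "give-up"; // log an error, stop retrying
  return nowMs - state.baselineMs > thresholdFor(state.silentCycles) ? "reconnect" : "wait";
}

// A reconnect resets the baseline and counts as one more silent cycle.
function onReconnected(state: WatchdogState, nowMs: number): WatchdogState {
  return { baselineMs: nowMs, silentCycles: state.silentCycles + 1 };
}

// A real inbound message resets both the baseline and the backoff.
function onInboundMessage(nowMs: number): WatchdogState {
  return { baselineMs: nowMs, silentCycles: 0 };
}
```

With this shape, the scenario from the logs (last message at t=0, check at t=39 min) triggers one reconnect, after which the next forced reconnect cannot happen for another hour, and the loop stops entirely after five silent cycles.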

Impact

  • Generated 91MB of error log in one night
  • Constant WhatsApp reconnection churn
  • Combined with the update that followed, left the gateway down for ~21 hours

Environment

  • OpenClaw: 2026.4.2 (observed on 2026.3.31 before update)
  • OS: macOS 25.3.0 (ARM64)
  • Node: v25.6.1
  • WhatsApp account type: Personal (linked device)
