
[Bug]: WhatsApp gateway flapping — status 499 reconnect cascade with aggressive heartbeat timeout #60378

@GTSH-beyond-Gaming

Description

The WhatsApp gateway enters a recurring flapping loop where status 499 disconnects trigger a heartbeat-timeout → reconnect → 499 cascade that lasts 20–60 minutes before self-healing.

Environment

  • OS: Windows 11, x64
  • OpenClaw: v2026.4.3 (49936f6)
  • Node.js: v24.12.0
  • Connection: WhatsApp Multi-Device (Web)

Observed Pattern

Every ~60 seconds (sketched in code after this list):

  1. web-heartbeat detects a message timeout (minutesSinceLastMessage increments each cycle)
  2. The gateway forces a reconnect
  3. WhatsApp responds with status 499 (server-side disconnect)
  4. web-session treats creds.json as corrupted and restores it from .bak
  5. A new connectionId is assigned and the connection is briefly established
  6. Back to step 1
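
A minimal sketch of the cycle as it appears from the logs (TypeScript; every identifier is hypothetical, written from the log output above rather than from OpenClaw's actual source):

```ts
// loop-sketch.ts — hypothetical reconstruction of the flapping cycle.
import { randomUUID } from "node:crypto";

interface Session {
  connectionId: string;
  lastInboundAt: number; // epoch ms of the last inbound message
}

const MESSAGE_TIMEOUT_MS = 60_000; // observed: >60s of silence forces a reconnect

// Step 1: the heartbeat only looks at inbound-message age, so an
// idle-but-healthy connection still trips the timeout every cycle.
function heartbeatTick(session: Session): void {
  if (Date.now() - session.lastInboundAt >= MESSAGE_TIMEOUT_MS) {
    forceReconnect(session); // step 2
  }
}

function forceReconnect(session: Session): void {
  // Step 3: the rapid re-dial is answered with a status-499 server disconnect.
  // Step 4: the 499 close is then misread as credential corruption.
  restoreCredsFromBackup(); // creds.json <- creds.json.bak
  // Step 5: fresh connectionId, briefly connected, but lastInboundAt is
  // still stale, so the next heartbeat tick re-enters the loop (step 6).
  session.connectionId = randomUUID();
}

function restoreCredsFromBackup(): void {
  /* stub; the real restore path lives in web-session */
}
```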

Metrics from 2026-04-03 (32-minute flapping window: 15:58–16:30)

  • 62 status 499 disconnects
  • 42 creds.json restored from backup
  • 31 heartbeat timeout forced reconnects
  • 31 unique connectionIds generated
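
These numbers are internally consistent: 31 forced reconnects in a 32-minute window is one per ~60s heartbeat cycle; 62 status-499 closes over 31 cycles averages exactly two per cycle, consistent with each forced close being followed by one more 499 on the fast ~2s retry; and the 31 unique connectionIds confirm that every cycle minted a fresh session. The 42 creds restores (more than one per cycle) suggest the restore path sometimes fired on the retry as well.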

Context

  • Flapping started ~42 minutes after the last inbound message (idle-timeout trigger?)
  • creds.json is only 1.9 KB — no pre-key bloat
  • registered: False in creds.json after flapping (unclear if cause or effect)
  • Pattern occurs almost daily (observed 2026-03-30, 2x on 2026-04-02, 2026-04-03)

Log Excerpt

2026-04-03T16:10:16 | web-heartbeat: minutesSinceLastMessage: 42, forcing reconnect
2026-04-03T16:10:16 | WhatsApp Web connection closed (status 499). Retry 1/12 in 2.41s
2026-04-03T16:10:20 | restored corrupted WhatsApp creds.json from backup
2026-04-03T16:11:20 | web-heartbeat: minutesSinceLastMessage: 43, forcing reconnect  ← 60s later, loop continues

Root Cause Analysis

  1. Heartbeat timeout too aggressive: 60s silence → immediate forced reconnect. During idle periods (no inbound messages), this triggers unnecessarily.
  2. Reconnect backoff too short: ~2s initial retry is too fast when WhatsApp returns 499 repeatedly.
  3. Creds falsely flagged as corrupted: Every reconnect cycle restores creds.json from backup, even though the file is intact (1.9 KB, no bloat). This may destabilize the session further.
  4. Self-healing: after ~30 minutes the cascade stops, possibly because a server-side backoff window expires.

Suggested Improvements

  1. Increase the heartbeat message-timeout threshold to 3–5 minutes before forcing a reconnect (60s is too aggressive for idle connections)
  2. Exponential backoff on 499: instead of a flat 2s retry, back off exponentially (2s → 4s → 8s → …) specifically for status 499
  3. Don't treat creds as corrupted on 499: status 499 is a server disconnect, not a credential issue; skip the creds restore for this error code
  4. Separate idle-timeout from connection-failure: a growing minutesSinceLastMessage is normal during idle periods and should not trigger a reconnect by itself (all four suggestions are sketched in code after this list)
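
A minimal sketch of how suggestions 1–4 could fit together (TypeScript; the function and constant names are illustrative assumptions, not OpenClaw's actual API):

```ts
// reconnect-policy.ts — illustrative sketch only; all identifiers are hypothetical.

const IDLE_TIMEOUT_MS = 5 * 60_000; // suggestion 1: 3–5 min grace, not 60s
const BASE_RETRY_MS = 2_000;
const MAX_RETRY_MS = 5 * 60_000;

// Suggestion 2: back off exponentially on consecutive 499s (2s -> 4s -> 8s ...),
// capped so a long outage never produces unbounded delays.
function retryDelayMs(consecutive499s: number): number {
  return Math.min(BASE_RETRY_MS * 2 ** consecutive499s, MAX_RETRY_MS);
}

// Suggestion 3: a 499 is a server-side disconnect, not credential corruption;
// only fall back to creds.json.bak when the file actually fails to parse.
function shouldRestoreCreds(statusCode: number, credsParseOk: boolean): boolean {
  if (statusCode === 499) return false;
  return !credsParseOk;
}

// Suggestion 4: distinguish "idle" (no inbound traffic, socket healthy) from
// "dead" (socket unresponsive); only the latter justifies a forced reconnect.
function shouldForceReconnect(msSinceLastMessage: number, socketAlive: boolean): boolean {
  if (socketAlive) return false;                // idle is not a failure
  return msSinceLastMessage >= IDLE_TIMEOUT_MS; // dead and past the grace window
}
```

Under a policy like this the cascade would throttle itself quickly: by the fifth consecutive 499 the retry delay is already 64s, longer than an entire heartbeat cycle.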

Workaround

None required: the loop self-heals after 20–60 minutes, no messages are lost, and no user action is needed.
