WhatsApp WebSocket drops on WSL2 — missing TCP keepalive on underlying socket #58481

@yhyatt

Summary

The WhatsApp WebSocket connection drops every ~60 seconds on WSL2, causing a sustained disconnect/reconnect storm. Root cause: OpenClaw does not set TCP keepalive on the underlying socket, so the Windows Hyper-V virtual switch drops idle NAT entries before Baileys' application-level WS pings can keep them alive.

Environment

  • OpenClaw 2026.3.28
  • Node 22.22.0
  • WSL2 (kernel 6.6.87.2-microsoft-standard-WSL2, running under Windows 11)

Root Cause

Baileys sends application-level WebSocket pings roughly every 25-30 seconds. However, the Windows Hyper-V virtual network switch (used by WSL2) maintains NAT table entries at the TCP level — it does not inspect WS frames. When Windows enters a low-power state, changes network profile, or the Hyper-V switch refreshes its NAT table, idle TCP connections are torn down from underneath the WebSocket — even if WS-level pings are succeeding at the application layer.

The result is a → → reconnect cycle repeating roughly every 60 seconds until the Windows host stabilizes.

Stack trace observed:

This is a TCP-level close, not a WA server close ( confirmed).

Impact

  • Observed: 70 consecutive reconnects over 70 minutes (17:25–18:34)
  • Each reconnect also triggers the creds.json corruption race (see related issue)
  • Connection is functional but thrashing; group agents may miss inbound messages during reconnect windows

Suggested Fix

Set TCP keepalive on the WebSocket socket when creating the connection:

This makes the OS send TCP keepalive probes, which the peer answers with ACKs, at 15-second intervals, keeping the WSL2 NAT entry alive regardless of application-level traffic patterns. This is standard practice for long-lived WebSocket connections on Windows-hosted environments.

The fix is a single line in the socket initialization path (likely in or where the WS client is constructed).
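For illustration only, here is a minimal TypeScript sketch of what enabling keepalive on a Node socket could look like. It is not the original snippet from this report and not OpenClaw's or Baileys' actual code. The helper name `enableTcpKeepAlive` and the constant `KEEPALIVE_IDLE_MS` are hypothetical; the only real API used is Node's `net.Socket#setKeepAlive(enable, initialDelay)`, and the 15-second value is taken from the description above.

```typescript
import * as net from "node:net";

// Hypothetical constant: the 15-second figure quoted in this issue, in milliseconds.
const KEEPALIVE_IDLE_MS = 15_000;

/**
 * Hypothetical helper: turn on OS-level TCP keepalive for a socket.
 * Node truncates the delay to whole seconds, and a delay of 0 leaves the
 * OS default in place, so values under one second are rejected here.
 */
function enableTcpKeepAlive(
  socket: net.Socket,
  idleMs: number = KEEPALIVE_IDLE_MS,
): net.Socket {
  if (!Number.isFinite(idleMs) || idleMs < 1000) {
    throw new RangeError("idleMs must be at least 1000 ms");
  }
  // Real Node API: enable keepalive and set the idle delay before the first probe.
  socket.setKeepAlive(true, idleMs);
  return socket;
}
```

How the underlying socket is reached depends on the WebSocket client. With the `ws` package, one option is the client's `upgrade` event, whose response object carries the socket as `response.socket`. Whether Baileys exposes that hook is an assumption that would need checking against its source.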

Workaround

None currently. The storm self-resolves when the Windows host network stabilizes, but there is no way to prevent it from the OpenClaw side without this fix.

Related

  • See also: creds.json race condition issue (each reconnect also triggers a backup restore)
