Gateway WebSocket connections leak - CLOSE_WAIT/FIN_WAIT_2 zombie connections cause crashes #56215

@davidme6

GitHub Issue Report: Gateway Socket Connection Leak

Bug Title

Gateway WebSocket connections leak - CLOSE_WAIT/FIN_WAIT_2 zombie connections accumulate and cause crashes

Environment

  • OpenClaw version: 2026.3.24
  • Node.js version: v24.14.0
  • Platform: Windows 10 (x64)
  • Gateway port: 18789

Problem Description

The Gateway process accumulates zombie socket connections in the CLOSE_WAIT and FIN_WAIT_2 states over time. The Gateway does not close these connections properly, which leads to a resource leak and eventual crashes.

Evidence

Connection State Analysis (observed on 2026-03-28)

| State | Count | Description |
|---|---|---|
| ESTABLISHED | 2 | Normal active connections |
| LISTENING | 2 | Gateway listening on IPv4 and IPv6 |
| CLOSE_WAIT | 4 | Zombie connections - remote closed, Gateway didn't close |
| FIN_WAIT_2 | 4 | Zombie connections - Gateway initiated close, remote not responding |
| TIME_WAIT | 4 | Recently closed connections (normal) |

Zombie connection ratio: 8 of 14 non-listening connections (about 57%), or 8 of 10 (80%) if the TIME_WAIT entries are also left out
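To make the state counts reproducible, here is a minimal sketch of how they could be tallied from the text output of netstat -ano. It assumes the usual Windows column layout (Proto, Local Address, Foreign Address, State, PID). The function names tallyStates and zombieRatio are made up for this example and are not part of OpenClaw. Which states go into the denominator is a judgment call, so it is left as a parameter.

```typescript
// Sketch only: tally TCP states for one port from `netstat -ano` text.
// Assumes the Windows column layout: Proto, Local Address, Foreign Address, State, PID.
type StateCounts = { [state: string]: number };

function tallyStates(netstatOutput: string, port: number): StateCounts {
  const counts: StateCounts = {};
  const suffix = ":" + port;
  for (const line of netstatOutput.split(/\r?\n/)) {
    const cols = line.trim().split(/\s+/);
    // Skip headers, UDP rows (which have no state column), and blank lines.
    if (cols.length < 5 || cols[0] !== "TCP") continue;
    const touchesPort = cols[1].endsWith(suffix) || cols[2].endsWith(suffix);
    if (!touchesPort) continue;
    const state = cols[3];
    counts[state] = (counts[state] ?? 0) + 1;
  }
  return counts;
}

// Share of CLOSE_WAIT + FIN_WAIT_2 among all states that are not excluded.
function zombieRatio(counts: StateCounts, excluded: string[] = ["LISTENING"]): number {
  let zombies = 0;
  let total = 0;
  for (const state of Object.keys(counts)) {
    if (excluded.indexOf(state) !== -1) continue;
    total += counts[state];
    if (state === "CLOSE_WAIT" || state === "FIN_WAIT_2") zombies += counts[state];
  }
  return total === 0 ? 0 : zombies / total;
}

// Counts taken from the table in this report.
const reported: StateCounts = {
  ESTABLISHED: 2, LISTENING: 2, CLOSE_WAIT: 4, FIN_WAIT_2: 4, TIME_WAIT: 4,
};
console.log(zombieRatio(reported));
console.log(zombieRatio(reported, ["LISTENING", "TIME_WAIT"]));
```

Note that when both ends of a connection are on the same machine, netstat lists each end as its own row, so a single stuck connection can show up once as CLOSE_WAIT and once as FIN_WAIT_2.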

Crash Pattern

Gateway crashes at irregular intervals (1-6 hours), requiring manual or automated restart.

Crash times observed on 2026-03-28:

  • 00:28, 01:22-01:23, 07:53, 09:00, 10:09, 13:06
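For reference, the gaps between the crash times above can be computed with a few lines. This is only a sketch: the helper names toMinutes and gaps are made up for this example, and the 01:22-01:23 entry is treated as a single crash at 01:22.

```typescript
// Convert "HH:MM" to minutes since midnight.
function toMinutes(hhmm: string): number {
  const [h, m] = hhmm.split(":").map(Number);
  return h * 60 + m;
}

// Gaps in minutes between consecutive timestamps on the same day.
function gaps(times: string[]): number[] {
  const mins = times.map(toMinutes);
  return mins.slice(1).map((t, i) => t - mins[i]);
}

// Crash times from this report; 01:22-01:23 is taken as 01:22 (assumption).
const crashTimes = ["00:28", "01:22", "07:53", "09:00", "10:09", "13:06"];
const intervals = gaps(crashTimes);
console.log(intervals);
```

For the times listed above this yields gaps of 54, 391, 67, 69 and 177 minutes, so the shortest observed gap is just under one hour and the longest is about 6.5 hours.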

Memory Behavior

  • Gateway memory fluctuates between 600-1500 MB
  • After restart: ~600 MB
  • After 1-2 hours: ~800-1400 MB
  • This does not look like a memory leak per se, but the connection accumulation correlates with the crashes

Root Cause Analysis

This is a socket connection management bug in the Gateway:

  1. WebSocket clients (main OpenClaw process, PID 1424) request connection close
  2. Gateway WebSocket handler does not properly respond to close events
  3. Connections remain in CLOSE_WAIT/FIN_WAIT_2 states
  4. Zombie connections accumulate over time
  5. Eventually causes Gateway instability and crashes

Impact

  • Gateway crashes unpredictably (1-6 hour intervals)
  • Requires an external watchdog process to restart the Gateway
  • Users experience service interruption until restart completes

Reproduction Steps

  1. Start OpenClaw Gateway: openclaw gateway --port 18789
  2. Connect multiple clients (Discord, Feishu, etc.)
  3. Monitor connections: netstat -ano | findstr 18789
  4. Observe CLOSE_WAIT/FIN_WAIT_2 connections accumulating over time
  5. Gateway eventually crashes (no error output to stderr)

Expected Behavior

Gateway should properly close WebSocket connections when:

  • Client requests close
  • Connection timeout occurs
  • Keep-alive fails
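One way the close behavior above could look is a close call with a deadline: start the normal closing handshake, and if the connection has not closed within a grace period, destroy the socket. I do not know which WebSocket library the Gateway uses, so this sketch is written against a small hypothetical interface. The method names close, terminate and once mirror the widely used ws package, and closeWithDeadline and graceMs are made-up names.

```typescript
// Hypothetical minimal shape of a server-side WebSocket.
// Method names mirror the `ws` package; the Gateway's real type may differ.
interface ClosableSocket {
  close(code?: number, reason?: string): void;        // start the closing handshake
  terminate(): void;                                  // destroy the underlying TCP socket
  once(event: "close", listener: () => void): void;   // fires when the connection is fully closed
}

// Ask for a clean close, but force the socket shut if the handshake
// has not finished after graceMs milliseconds.
function closeWithDeadline(socket: ClosableSocket, graceMs: number = 5000): void {
  const timer = setTimeout(() => socket.terminate(), graceMs);
  // Register the listener before closing so a fast close is not missed.
  socket.once("close", () => clearTimeout(timer));
  socket.close(1000, "normal closure");
}
```

The point of the timer is that a socket is never left waiting indefinitely on a peer that stopped responding halfway through the close.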

Workaround (Implemented)

  1. Gateway Guardian watchdog process (auto-restart within 30 seconds)
  2. Daily scheduled restart at 04:00 to clear zombie connections
  3. Error logging via stderr redirect: gateway.cmd 2>> gateway-error.log

Suggested Fix

Review WebSocket connection lifecycle handling in:

  • gateway-runtime-DkLKThnW.js
  • server-C8VdPOMv.js
  • WebSocket close event handlers

Ensure proper socket cleanup on:

  • Client-initiated close
  • Server-initiated close
  • Connection timeout/keep-alive failure
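For the timeout/keep-alive case, a common pattern is a ping/pong heartbeat: ping every client on a fixed interval and terminate any client that did not answer the previous ping. The sketch below is not taken from the Gateway code. HeartbeatSocket and HeartbeatMonitor are hypothetical names, and the ping, terminate and pong names follow the ws package as an assumption.

```typescript
// Hypothetical minimal socket shape; names follow the `ws` package (assumption).
interface HeartbeatSocket {
  ping(): void;
  terminate(): void;
  on(event: "pong", listener: () => void): void;
}

class HeartbeatMonitor {
  // true = answered since the last sweep, false = ping still outstanding
  private alive = new Map<HeartbeatSocket, boolean>();

  track(socket: HeartbeatSocket): void {
    this.alive.set(socket, true);
    socket.on("pong", () => {
      if (this.alive.has(socket)) this.alive.set(socket, true);
    });
  }

  // Call this from the socket's close handler so closed sockets are forgotten.
  untrack(socket: HeartbeatSocket): void {
    this.alive.delete(socket);
  }

  // Run once per heartbeat interval. Returns how many sockets were terminated.
  sweep(): number {
    let terminated = 0;
    this.alive.forEach((wasAlive, socket) => {
      if (!wasAlive) {
        // No pong since the previous sweep: treat the peer as dead.
        socket.terminate();
        this.alive.delete(socket);
        terminated++;
      } else {
        this.alive.set(socket, false);
        socket.ping();
      }
    });
    return terminated;
  }
}
```

In a real server the sweep would be driven by a timer, for example setInterval(() => monitor.sweep(), 30000), and that timer would be cleared on shutdown. The 30 second interval is only an example value.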

Additional Information

  • No error output captured in stderr during crashes
  • System has ample memory (16 GB total, 7 GB free), so memory exhaustion is unlikely
  • Node.js heap limit: 4.28GB (not exceeded)

Reporter: OpenClaw user
Date: 2026-03-28
