[Bug] 2026.4.24 on WSL2: Ghost EADDRINUSE loop & systemd split-brain #72693

@birdforce14d

Environment

  • Host: WSL2 Ubuntu on Windows (T14sGen1)
  • OpenClaw Version: 2026.4.24 (npm latest)
  • Node: 22.22.0
  • Kernel: linux 6.6.87.2-microsoft-standard-WSL2 x64
  • Gateway port: 18789
  • Service modes tested: Manual foreground, system-level systemd (/etc/systemd/system), and native user-level systemd (~/.config/systemd/user).

Main Symptoms

After upgrading to OpenClaw 2026.4.24, the gateway became entirely unreachable on WSL2 and entered an infinite crash loop with a 30–50 second cycle.

Observed logs repeatedly showed:

[gateway] starting HTTP server...
[canvas] host mounted at http://127.0.0.1:18789/__openclaw__/canvas/
[health-monitor] started

However, high-frequency polling (lsof -nP -iTCP:18789 -sTCP:LISTEN) confirmed no process was actually listening on the port.

Between 30 and 50 seconds later, the gateway would terminate itself with:

[gmail-watcher] gmail watcher stopped
Gateway failed to start: another gateway instance is already listening on ws://127.0.0.1:18789
listen EADDRINUSE: address already in use 127.0.0.1:18789

The CLI (openclaw gateway probe) consistently failed with timeout, socket hang up, or read ECONNRESET.
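For anyone reproducing this without lsof, the listener check described above can be approximated with a plain TCP connect. This is a minimal Python sketch, not part of OpenClaw; is_listening and poll_listener are hypothetical helper names, and it assumes the gateway binds 127.0.0.1 as in the logs above.

```python
import socket
import time
from typing import List


def is_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port completes.

    A completed handshake means some socket is in the LISTEN state on that
    address; "connection refused" or a timeout means nothing is accepting.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # ConnectionRefusedError, TimeoutError, etc.
        return False


def poll_listener(host: str, port: int, attempts: int, interval: float) -> List[bool]:
    """Sample the port repeatedly, similar to running lsof in a tight loop."""
    samples = []
    for _ in range(attempts):
        samples.append(is_listening(host, port))
        time.sleep(interval)
    return samples
```

For example, poll_listener("127.0.0.1", 18789, attempts=200, interval=0.25) samples for roughly 50 seconds, which spans one full crash cycle. Based on the lsof polling in this report, every sample would be expected to come back False on the affected setup.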

Diagnostics & Findings

Finding 1: systemd "Split-Brain" created by doctor --fix

OpenClaw 2026.4.24 strictly expects to manage a native user-level service (~/.config/systemd/user/openclaw-gateway.service).

If a user still has a legacy, manually created system-level service (/etc/systemd/system/openclaw.service), running openclaw doctor --fix silently generates the user-level unit alongside it. This creates a split-brain scenario in which two systemd layers (system and user) fight over the same port.

Fix applied: Fully decommissioned the manual units under /etc/systemd and left the native user-level service as the only manager.
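A quick way to spot the split-brain state before running doctor --fix is to check whether both unit files exist. The sketch below assumes the unit names quoted in this report (openclaw.service for the legacy system unit, openclaw-gateway.service for the user unit); a hand-written legacy unit may use a different name on other machines. find_gateway_units and is_split_brain are hypothetical helpers, and the directory arguments exist only so the check can be pointed at a test tree.

```python
from pathlib import Path
from typing import List

# Unit names as quoted in this report; other installs may differ.
LEGACY_SYSTEM_UNIT = "openclaw.service"
USER_UNIT_REL = Path(".config/systemd/user/openclaw-gateway.service")


def find_gateway_units(system_dir: Path = Path("/etc/systemd/system"),
                       home: Path = Path.home()) -> List[Path]:
    """Return every OpenClaw gateway unit file that exists on disk."""
    candidates = [system_dir / LEGACY_SYSTEM_UNIT, home / USER_UNIT_REL]
    return [p for p in candidates if p.is_file()]


def is_split_brain(system_dir: Path = Path("/etc/systemd/system"),
                   home: Path = Path.home()) -> bool:
    """True when both a system-level and a user-level unit are installed."""
    return len(find_gateway_units(system_dir, home)) > 1
```

If is_split_brain() returns True, both systemd layers can try to start a gateway on 18789; the fix in this report was to remove the /etc/systemd unit and keep only the user-level one.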

Finding 2: Stricter Schema Enforcement in 2026.4.24

The new version strictly validates openclaw.json against its schema. Old config keys such as "adapters" or "paperclip", and manually added keys such as "system", trigger schema validation panics. In addition, setting "gateway": { "bind": "0.0.0.0" } is now flagged as a legacy alias (mapped to lan).
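The config cleanup applied later in the recovery steps can be scripted. This is a sketch that assumes openclaw.json is strict JSON and that the rejected keys sit at the top level of the file (the report does not say where they were nested). sanitize_config and sanitize_json_text are hypothetical helpers, and the 0.0.0.0 to lan rewrite simply mirrors the alias mapping described above.

```python
import json
from typing import Any, Dict, List, Tuple

# Keys the report says trip 2026.4.24 schema validation.
# Assumption: they appear at the top level of openclaw.json.
LEGACY_KEYS = ("adapters", "paperclip", "system")


def sanitize_config(cfg: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return a cleaned copy of cfg plus a list of human-readable changes."""
    cleaned = dict(cfg)  # shallow copy; the input dict is left untouched
    changes: List[str] = []
    for key in LEGACY_KEYS:
        if key in cleaned:
            del cleaned[key]
            changes.append(f"removed top-level key {key!r}")
    gateway = cleaned.get("gateway")
    if isinstance(gateway, dict) and gateway.get("bind") == "0.0.0.0":
        # Replace the legacy alias with the value it is reported to map to.
        cleaned["gateway"] = {**gateway, "bind": "lan"}
        changes.append("gateway.bind: '0.0.0.0' -> 'lan'")
    return cleaned, changes


def sanitize_json_text(text: str) -> str:
    """Sanitize a JSON document given as text and return it re-serialized."""
    cleaned, _ = sanitize_config(json.loads(text))
    return json.dumps(cleaned, indent=2)
```

Back up openclaw.json before writing the result over it: json.loads rejects comments, and json.dumps does not preserve the original formatting.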

Finding 3: Port Changes in openclaw.json Are Ignored

We attempted to bypass the 18789 loop by updating openclaw.json to use "port": 18888. However, the auto-generated systemd unit hardcodes the port:

ExecStart=/usr/bin/node .../dist/index.js gateway --port 18789

As a result, the service ignored the port set in openclaw.json and kept crashing on 18789.
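This mismatch can be detected by comparing the --port argument in the unit's ExecStart line with the port in openclaw.json. A minimal sketch: it assumes the config port lives under gateway.port (falling back to a top-level port key), which the report does not confirm, and execstart_port, config_port and port_mismatch are hypothetical helper names.

```python
import shlex
from typing import Any, Dict, Optional


def execstart_port(unit_text: str) -> Optional[int]:
    """Return the first --port value found on any ExecStart= line."""
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line.startswith("ExecStart="):
            continue
        args = shlex.split(line[len("ExecStart="):])
        for i, arg in enumerate(args):
            if arg == "--port" and i + 1 < len(args):
                return int(args[i + 1])
            if arg.startswith("--port="):
                return int(arg.split("=", 1)[1])
    return None


def config_port(cfg: Dict[str, Any]) -> Optional[int]:
    """Port from openclaw.json. Assumption: gateway.port, else top-level port."""
    gateway = cfg.get("gateway")
    if isinstance(gateway, dict) and "port" in gateway:
        return int(gateway["port"])
    if "port" in cfg:
        return int(cfg["port"])
    return None


def port_mismatch(unit_text: str, cfg: Dict[str, Any]) -> bool:
    """True when the unit hardcodes a port that differs from the config's."""
    unit_port, cfg_port = execstart_port(unit_text), config_port(cfg)
    return unit_port is not None and cfg_port is not None and unit_port != cfg_port
```

With the unit and config from this report (--port 18789 in ExecStart, 18888 in openclaw.json), port_mismatch returns True.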

Root Cause Analysis: The "Ghost" EADDRINUSE

The issue is a lifecycle/watchdog race condition specific to 2026.4.24 under WSL2 networking, not a standard port conflict.

  1. The Hang: The gateway process starts and mounts the canvas, but its asynchronous bind to the WSL2 loopback bridge hangs and never completes. lsof remains empty.
  2. The Watchdog Timeout: After ~30 seconds, internal health monitors (like health-monitor or gmail-watcher) time out and initiate a teardown.
  3. The Self-Collision: The gateway attempts to restart/rebind internally while its original socket request is still held by the kernel in an unresolved/pending state. It trips over its own ghost socket, throws EADDRINUSE against itself, and exits.
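The self-collision step does not need a second process. The Python sketch below only illustrates that step under the hypothesis above; it does not reproduce the WSL2 hang. A socket that is bound but has not reached the LISTEN state is invisible to lsof -sTCP:LISTEN, yet a second bind to the same port from the same process still fails with EADDRINUSE. bind_twice and describe are hypothetical helpers, and an ephemeral port is used instead of 18789.

```python
import errno
import socket


def bind_twice(listen_first: bool, host: str = "127.0.0.1") -> int:
    """Bind one socket, then bind a second socket to the same port.

    Returns 0 if the second bind succeeds, otherwise its errno.
    With listen_first=False the first socket is bound but never listening,
    so `lsof -sTCP:LISTEN` would not report it.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))  # let the kernel pick a free port
        if listen_first:
            first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind((host, port))  # same process, same port
            return 0
        except OSError as exc:
            return exc.errno
    finally:
        second.close()
        first.close()


def describe(code: int) -> str:
    """Map an errno value to its symbolic name, e.g. 'EADDRINUSE'."""
    if code == 0:
        return "OK"
    return errno.errorcode.get(code, str(code))
```

On Linux, both bind_twice(listen_first=False) and bind_twice(listen_first=True) return errno.EADDRINUSE, since neither socket sets SO_REUSEADDR.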

Confirmed Resolution (Downgrade)

Because the internal watchdog race cannot be bypassed via config in 2026.4.24 without triggering strict schema panics, the only stable fix is downgrading to 2026.4.22.

Recovery Steps Taken:

  1. Stopped and physically removed all OpenClaw systemd units (both system and user) and killed all detached node processes.
  2. Downgraded via: npm install -g openclaw@2026.4.22
  3. Sanitized openclaw.json (removed the adapters, paperclip, and system keys to avoid legacy-key schema panics).
  4. Ran openclaw doctor --fix on the downgraded version to rebuild the native user service cleanly.

Result: Immediate stability. The gateway bound successfully, the crash loop stopped, and openclaw status --all reported Gateway: reachable 42ms.

Questions for Maintainers

Could the core team confirm whether 2026.4.24 introduced undocumented regressions in any of the following?

  1. Gateway listener lifecycle or self-probe/watchdog logic (health-monitor timeouts).
  2. User-level systemd service generation (specifically hardcoding --port in ExecStart vs reading openclaw.json).
  3. gateway.bind strict schema changes and WSL2-specific loopback handling.
  4. gmail-watcher startup/shutdown blocking the main event loop.

The most confusing symptom is [canvas] host mounted alongside empty lsof output, followed 30–50 seconds later by a self-inflicted EADDRINUSE crash. For now, WSL2 users should stay pinned to 2026.4.22.
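Until this is answered, a setup script could flag the affected combination. This sketch assumes the caller supplies the installed OpenClaw version string (for example as parsed from npm ls -g openclaw) and, optionally, the kernel release; otherwise it uses platform.release(), which matches uname -r on Linux. The WSL2 check matches the -microsoft-standard-WSL2 suffix shown under Environment, and only 2026.4.24 is treated as affected because it is the only version this report covers. pin_advice is a hypothetical helper.

```python
import platform
from typing import Optional

# Versions from this report: 2026.4.24 crash-loops on WSL2, 2026.4.22 is stable.
AFFECTED_VERSIONS = {"2026.4.24"}
PINNED_VERSION = "2026.4.22"


def is_wsl2(kernel_release: str) -> bool:
    """WSL2 kernels report a release like '6.6.87.2-microsoft-standard-WSL2'."""
    return "microsoft-standard-wsl2" in kernel_release.lower()


def pin_advice(openclaw_version: str, kernel_release: Optional[str] = None) -> Optional[str]:
    """Return a warning if this version/kernel pair matches the report, else None."""
    release = kernel_release if kernel_release is not None else platform.release()
    version = openclaw_version.strip()
    if version.startswith("openclaw@"):  # accept npm-style "name@version"
        version = version[len("openclaw@"):]
    if is_wsl2(release) and version in AFFECTED_VERSIONS:
        return (f"OpenClaw {version} on WSL2 matches issue #72693; "
                f"consider: npm install -g openclaw@{PINNED_VERSION}")
    return None
```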
