[Bug]: Intermittent local gateway websocket handshake failures on loopback (ws://127.0.0.1:18789) #45222

@funsaized

Description

Bug type

Regression (worked before, now fails)

Summary

Intermittent WebSocket handshake failures between the CLI and the local gateway on loopback (ws://127.0.0.1:18789) cause frequent CLI failures (e.g. openclaw cron list) and break Antfarm cron setup (antfarm workflow ensure-crons).

This reproduces on the latest build available in this environment:

OpenClaw: 2026.3.12 (6472949)
Host: Ubuntu 24.04 VPS
Gateway bind: loopback (127.0.0.1:18789)

Steps to reproduce

1) Confirm the gateway appears healthy

openclaw gateway status

Observed: running, probe ok, listening on 127.0.0.1:18789.

2) Make repeated CLI calls to the gateway

for i in {1..50}; do
  openclaw cron list >/dev/null && echo "ok $i" || echo "fail $i"
  sleep 0.5
done

Observed in this run:

  • ok: 16
  • fail: 34
  • ~68% failure rate

Typical failure:

gateway connect failed: Error: gateway closed (1000):
Error: gateway closed (1000 normal closure): no close reason
Gateway target: ws://127.0.0.1:18789
Source: local loopback
Config: /home/sai/.openclaw/openclaw.json
Bind: loopback
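The step 2 loop prints one ok/fail line per iteration. To get the totals directly, a variant can count them itself. This is a minimal sketch assuming bash; repro_tally is a hypothetical helper, not part of OpenClaw, and the command under test is passed as arguments.

```shell
# Hypothetical helper (not part of OpenClaw): run a command N times, pausing
# DELAY seconds between runs, and report ok/fail totals and the failure rate.
# Requires bash for (( )) arithmetic. Usage: repro_tally N DELAY cmd [args...]
repro_tally() {
  local runs=$1 delay=$2
  shift 2
  if (( runs <= 0 )); then
    echo "repro_tally: run count must be positive" >&2
    return 2
  fi
  local ok=0 fail=0 i
  for ((i = 1; i <= runs; i++)); do
    # Only the exit status matters; discard the command's output.
    if "$@" >/dev/null 2>&1; then
      ok=$((ok + 1))
    else
      fail=$((fail + 1))
    fi
    sleep "$delay"
  done
  echo "ok: $ok"
  echo "fail: $fail"
  # Integer percentage, rounded down (34 of 50 -> 68%).
  echo "failure rate: $((fail * 100 / runs))%"
}
```

The step 2 run corresponds to repro_tally 50 0.5 openclaw cron list. Passing the command as separate arguments rather than as a single string avoids eval and keeps quoting intact.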

3) Antfarm failure path

antfarm workflow ensure-crons healthcare-ai-research

Observed error:

Failed to create cron job for agent "scoper": CLI fallback failed: Error: gateway connect failed: Error: gateway client stopped
Error: gateway timeout after 120ms
Gateway target: ws://127.0.0.1:18789
...

Gateway logs (journalctl)

Repeated patterns during failures:

  • handshake timeout ... remote=127.0.0.1
  • closed before connect ... host=127.0.0.1:18789 ... code=1000
  • occasional code=1008 reason=connect challenge timeout

Representative excerpts:

[ws] handshake timeout conn=... remote=127.0.0.1
[ws] closed before connect conn=... remote=127.0.0.1 ... host=127.0.0.1:18789 ... code=1000 reason=n/a
[ws] closed before connect conn=... remote=127.0.0.1 ... code=1008 reason=connect challenge timeout
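To see how often each of these patterns occurs in a failure window, the gateway log can be bucketed by pattern. This is a minimal sketch that assumes log lines arrive on stdin. count_ws_failures is a hypothetical helper, and its match strings are taken from the excerpts above; the report does not name the gateway's systemd unit, so the journalctl invocation in the usage note uses a placeholder.

```shell
# Hypothetical helper: count the [ws] failure patterns quoted in this report,
# reading gateway log lines from stdin.
count_ws_failures() {
  awk '
    /\[ws\] handshake timeout/ { handshake++ }
    /\[ws\] closed before connect/ && /code=1000/ { closed1000++ }
    /\[ws\] closed before connect/ && /code=1008/ && /connect challenge timeout/ { challenge1008++ }
    END {
      # "+ 0" prints 0 instead of an empty string when a pattern never matched.
      printf "handshake timeout: %d\n", handshake + 0
      printf "closed before connect (1000): %d\n", closed1000 + 0
      printf "closed before connect (1008, challenge timeout): %d\n", challenge1008 + 0
    }
  '
}
```

For example: journalctl -u <gateway-unit> --since "15 min ago" | count_ws_failures, where <gateway-unit> is a placeholder for however the gateway service is registered on the host (add --user for a user-level unit).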

Expected behavior

  • Local loopback WebSocket connections for CLI commands should be stable.
  • openclaw cron list should not fail intermittently under normal local load.
  • The connect challenge should not time out on localhost under normal local load.

Actual behavior

  • Frequent handshake/connect challenge failures on loopback.
  • CLI commands that need gateway connectivity fail intermittently.
  • Higher-level workflows (Antfarm cron setup) become unreliable and can leave partial state.

OpenClaw version

2026.3.12

Operating system

Ubuntu 24.04 VPS

Install method

npm global

Model

anthropic/opus-4.6

Provider / routing chain

openclaw

Config file / key location

No response

Additional provider/model setup details

No response

Logs, screenshots, and evidence

See step 2 of Steps to reproduce for the repro loop. In that run, 16 of 50 calls succeeded and 34 failed (68% failure rate).

Impact and severity

Unable to reliably manage gateway-backed resources (e.g. cron jobs) through the CLI. Antfarm workflows break.

Additional information

Notes

  • This is not an auth misconfiguration: the same config and environment succeed intermittently.
  • Failures occur while other WebSocket clients may still be active.
  • Workaround used locally: make the Antfarm workflow tolerant of crons that already exist. The root issue, however, appears to be gateway WebSocket handshake stability.

Metadata

Assignees

No one assigned

Labels

bug: Something isn't working
regression: Behavior that previously worked and now fails
