Summary
CLI commands that connect to the gateway (e.g. openclaw cron list) fail with "gateway closed (1000 normal closure): no close reason" on systems where Node.js ESM module compilation takes longer than the gateway's 3-second WebSocket handshake timeout.
Environment
- OpenClaw: 2026.3.13 (61d171a)
- Node.js: v22.22.1
- OS: Ubuntu Linux (systemd service)
- Gateway mode: local (loopback)
Steps to Reproduce
- Install openclaw, configure gateway in local mode
- Run openclaw cron list (or any CLI command that connects to the gateway)
- Observe the error:
gateway connect failed: Error: gateway closed (1000):
Error: gateway closed (1000 normal closure): no close reason
Gateway target: ws://127.0.0.1:18789
Note: openclaw gateway status (which uses RPC probe, not WebSocket handshake) works fine. Webchat also works fine.
Root Cause Analysis
Through debugging, the following timeline was identified:
- T+0ms: CLI creates WebSocket connection to gateway
- T+~1ms: Gateway accepts TCP connection, sends connect.challenge event
- T+~8ms: Node.js setImmediate fires (event loop appears free)
- T+3000ms: Gateway's handshake timer expires (3s default), closes WebSocket with code 1000
- T+~12000ms: CLI's event loop finally processes WebSocket open and message events
- CLI calls sendConnect() → request("connect") → finds WebSocket already in CLOSING state → error
The ~12-second gap between WebSocket creation and event processing is caused by Node.js ESM module compilation blocking the event loop. The CLI's large bundled dist files (with 41 dynamic import() calls) take significant time to compile on cold start.
The gateway's DEFAULT_HANDSHAKE_TIMEOUT_MS = 3000 (in src/gateway/server-constants.ts) is insufficient for CLI clients that experience this cold-start delay.
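The timeline reduces to a comparison between the gateway's timeout and the moment the client's event loop first gets to handle the handshake. The following is a minimal sketch of that arithmetic, not OpenClaw code: handshakeOutcome and its parameter names are hypothetical, and the numbers are the ones reported in this issue.

```typescript
type HandshakeOutcome = "connected" | "closed-by-gateway";

// Hypothetical model: the handshake succeeds only if the client can respond
// to connect.challenge before the gateway's handshake timer expires.
function handshakeOutcome(
  handshakeTimeoutMs: number,
  clientReadyAtMs: number,
): HandshakeOutcome {
  return clientReadyAtMs < handshakeTimeoutMs ? "connected" : "closed-by-gateway";
}

// CLI cold start (~12 s) against the current 3 s timeout.
console.log(handshakeOutcome(3000, 12000));
// Standalone WebSocket test (~120 ms) against the current 3 s timeout.
console.log(handshakeOutcome(3000, 120));
// CLI cold start (~12 s) against the proposed 15 s timeout.
console.log(handshakeOutcome(15000, 12000));
```

Under this model, only the first case fails, which matches the observed behaviour: the standalone test connects, the CLI does not, and a 15-second timeout would cover the measured cold start.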
Key evidence:
- Standalone WebSocket test connects in ~120ms (no module compilation overhead)
- Inside the CLI process, the same connection takes ~12 seconds
- setInterval(100ms) set right after client.start() doesn't fire its first tick until +12 seconds later
- Webchat is unaffected because browser JS is pre-compiled/bundled
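The delayed timer tick can be reproduced in isolation. The sketch below is an illustration, not the instrumentation used in this report: measureTimerDelay is a hypothetical helper, and a busy-wait loop stands in for synchronous module compilation. A timer due after 10ms cannot fire until the synchronous work releases the event loop.

```typescript
// Hypothetical helper (not part of OpenClaw): reports how long a timer
// actually took to fire when synchronous work occupies the event loop.
function measureTimerDelay(blockMs: number, timerMs: number = 10): Promise<number> {
  return new Promise<number>((resolve) => {
    const start = Date.now();
    // This callback is due after timerMs, but it can only run once the
    // current synchronous work has finished.
    setTimeout(() => resolve(Date.now() - start), timerMs);
    // Busy-wait as a stand-in for synchronous module compilation.
    while (Date.now() - start < blockMs) {
      // spin
    }
  });
}

measureTimerDelay(200).then((elapsed) => {
  console.log(`timer due at 10ms fired after ${elapsed}ms`);
});
```

The reported elapsed time is at least the blocking duration, never the 10ms the timer was scheduled for. The same effect would explain why WebSocket open and message events are only processed after the gateway has already closed the connection.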
Suggested Fix
- Increase the default handshake timeout from 3s to at least 15s (CLI cold start can take 12+ seconds)
- Make the timeout configurable via openclaw.json:
{
  "gateway": {
    "handshakeTimeoutMs": 15000
  }
}
- Allow env var override (without requiring VITEST):
const getHandshakeTimeoutMs = () => {
  // Env var is checked first so that it can override the config file
  const envValue = Number(process.env.OPENCLAW_HANDSHAKE_TIMEOUT_MS);
  if (Number.isFinite(envValue) && envValue > 0) return envValue;
  // Config file (openclaw.json)
  const configValue = config.gateway?.handshakeTimeoutMs;
  if (typeof configValue === 'number' && configValue > 0) return configValue;
  // Default
  return DEFAULT_HANDSHAKE_TIMEOUT_MS; // 15000
};
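To make the lookup order easy to unit-test, the resolver could take the environment and config as arguments instead of reading globals. This is only a sketch under stated assumptions: resolveHandshakeTimeoutMs, GatewayConfig, and isPositiveFinite are hypothetical names, the environment variable is assumed to take precedence over the config file (as "override" suggests), and 15000 is the proposed default rather than the current one.

```typescript
// Hypothetical shape of the relevant part of openclaw.json.
interface GatewayConfig {
  gateway?: { handshakeTimeoutMs?: number };
}

// Proposed default from this issue (the current value is 3000).
const DEFAULT_HANDSHAKE_TIMEOUT_MS = 15000;

function isPositiveFinite(value: unknown): value is number {
  return typeof value === "number" && Number.isFinite(value) && value > 0;
}

// Assumed precedence: env var, then config file, then default.
function resolveHandshakeTimeoutMs(
  env: Record<string, string | undefined>,
  config: GatewayConfig,
): number {
  // Number(undefined) and Number("abc") are NaN, Number("") is 0;
  // all three fall through to the next source.
  const envValue = Number(env.OPENCLAW_HANDSHAKE_TIMEOUT_MS);
  if (isPositiveFinite(envValue)) return envValue;

  const configValue = config.gateway?.handshakeTimeoutMs;
  if (isPositiveFinite(configValue)) return configValue;

  return DEFAULT_HANDSHAKE_TIMEOUT_MS;
}

const fileConfig: GatewayConfig = { gateway: { handshakeTimeoutMs: 5000 } };
console.log(resolveHandshakeTimeoutMs({}, {}));
console.log(resolveHandshakeTimeoutMs({}, fileConfig));
console.log(resolveHandshakeTimeoutMs({ OPENCLAW_HANDSHAKE_TIMEOUT_MS: "20000" }, fileConfig));
```

In the gateway this would be called with process.env and the loaded config. Invalid or non-positive values are ignored rather than rejected, so a typo in either source falls back to the next one instead of disabling the timeout.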
Current Workaround
Patch the compiled dist files (gateway-cli-*.js) to change DEFAULT_HANDSHAKE_TIMEOUT_MS from 3e3 to 15e3, then restart the gateway. The patch is lost on every openclaw update.