
[Bug]: Windows local gateway has partial RPC failures/timeouts on v2026.4.9 even when gateway process is running #64476

@Daraeya

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

On Windows, my local OpenClaw gateway appears to be running and listening on 127.0.0.1:18789, but gateway RPC behavior is only partially working.

Some operations succeed:

  • openclaw gateway status sometimes reports RPC probe: ok
  • sessions_list can succeed
  • ACP one-shot runs can sometimes complete

But other operations intermittently fail:

  • sessions_history
  • ACP spawn / some gateway-mediated session actions
  • openclaw status

These fail with timeouts or hang until killed, even though the gateway process is alive.

This looks similar to the recent gateway WS/RPC handshake timeout / partial-RPC issues listed under "Version / comparison context" in the evidence section below.

There was also a service-install/token embedding issue in my setup that looked similar to #57104, but even after cleaning that up, the core RPC timeout problem remained.

Steps to reproduce

.NOT_ENOUGH_INFO

Expected behavior

If the local gateway is alive and listening, gateway-mediated RPC operations should behave consistently:

• openclaw status
• sessions_history
• ACP spawn/session operations
• related gateway RPC calls

These should not intermittently time out while other gateway functions still work.

Actual behavior

The gateway appears partially alive:

• process exists
• port is listening
• some probe/status paths succeed
• some RPC/session paths time out or hang

This leaves the system in a confusing "looks alive but not reliably usable" state.
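
One way to make the "port is listening" observation reproducible is a plain TCP connect check. The following is a minimal Python sketch, not part of OpenClaw: the function name `port_is_listening` is hypothetical, and the default host and port are taken from the configuration reported in this issue. A successful connect only shows that something accepts connections on the port; it says nothing about whether RPC calls are answered.

```python
import socket


def port_is_listening(host: str = "127.0.0.1", port: int = 18789,
                      timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: it checks only the TCP layer, not gateway RPC health.
    """
    try:
        # create_connection raises OSError on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `port_is_listening()` with the defaults checks the loopback gateway address from this report.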

OpenClaw version

2026.4.9

Operating system

Windows 11

Install method

irm https://openclaw.ai/install.ps1 | iex

Model

openai-codex/gpt-5.4

Provider / routing chain

openclaw local CLI -> local gateway on ws://127.0.0.1:18789 (loopback, token auth) -> local OpenClaw runtime -> session/ACP RPC

Additional provider/model setup details

Gateway is configured in local mode on Windows with loopback bind and token auth.

Relevant gateway config:

  • gateway.mode = local
  • gateway.bind = loopback
  • gateway.port = 18789
  • gateway.auth.mode = token
  • gateway.auth.token = ${OPENCLAW_GATEWAY_TOKEN}

The gateway is started through the Windows Scheduled Task "OpenClaw Gateway", which executes ~/.openclaw/gateway.cmd and launches the local node-based gateway process.

Observed behavior:

  • The gateway process is alive and listening on 127.0.0.1:18789
  • openclaw gateway status may report RPC probe ok or failed depending on timing
  • sessions_list may succeed
  • sessions_history, ACP spawn, and openclaw status may intermittently time out or hang
  • ACP itself is not fully broken, because one-shot ACP tasks can still complete in some cases

There is no intentional remote gateway, reverse proxy, or external transport in this setup. This is a local loopback gateway with local CLI/session RPC usage.
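
To tell "the port accepts TCP connections" apart from "the gateway answers a WebSocket handshake", a raw upgrade request can be sent to the loopback address. The sketch below uses only the Python standard library and the standard WebSocket opening handshake headers. It rests on assumptions: the function name `ws_handshake_status` is hypothetical, the path `/` is a guess, and no token auth is attempted, so the gateway may well reply with an error status. Any status line at all suggests the HTTP layer is responsive; `None` means nothing came back within the timeout.

```python
import base64
import os
import socket
from typing import Optional


def ws_handshake_status(host: str = "127.0.0.1", port: int = 18789,
                        path: str = "/", timeout: float = 5.0) -> Optional[str]:
    """Send a WebSocket upgrade request and return the HTTP status line.

    Returns None if the connection fails or no response arrives in time.
    The path and the absence of auth headers are assumptions, not OpenClaw facts.
    """
    # Random 16-byte nonce, base64-encoded, as the handshake requires.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Key: {key}\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        "\r\n"
    )
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request.encode("ascii"))
            data = b""
            # Read until the first line of the response is complete.
            while b"\r\n" not in data:
                chunk = sock.recv(1024)
                if not chunk:
                    break
                data += chunk
    except OSError:
        # Covers refused connections and socket timeouts.
        return None
    line = data.split(b"\r\n", 1)[0].decode("latin-1")
    return line or None
```

Running this in a loop while `sessions_history` is failing could show whether the stall happens before or after the handshake.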

Logs, screenshots, and evidence

Observed command outputs and evidence:

1. `openclaw gateway status`
- Sometimes reports:
  - `RPC probe: ok`
  - `Listening: 127.0.0.1:18789`
- Other times reports:
  - `RPC probe: failed`
  - `RPC target: ws://127.0.0.1:18789`
  - `timeout`

2. `sessions_history` failures
Repeated failures with:
- `gateway timeout after 10000ms`
- `Gateway target: ws://127.0.0.1:18789`
- `Source: local loopback`
- `Config: C:\Users\serap\.openclaw\openclaw.json`
- `Bind: loopback`

3. `openclaw status`
- Intermittently hangs; in some checks the command ended with `SIGKILL`

4. `openclaw doctor`
Reported:
- `Health check failed: Error: gateway timeout after 10000ms`
- `Port 18789 is already in use.`
- `Runtime: unknown`

5. Scheduled Task / service evidence
Windows Scheduled Task `OpenClaw Gateway` is configured to run:
- `C:\Users\serap\.openclaw\gateway.cmd`

6. Gateway runtime process evidence
The active gateway runtime was observed as a local `node.exe` process listening on `127.0.0.1:18789`, with high memory usage (roughly 2.5 GB to 2.9 GB during debugging).

7. ACP evidence
ACP itself is not fully broken:
- Claude ACP returned `CLAUDE_ACP_OK`
- Gemini ACP returned `GEMINI_ACP_OK`
- Later checks after restart / reboot / cleanup also returned successful one-shot ACP responses

8. Service config cleanup evidence
Earlier, the generated `~/.openclaw/gateway.cmd` contained an embedded `OPENCLAW_GATEWAY_TOKEN`, matching the warning:
- `Gateway service embeds OPENCLAW_GATEWAY_TOKEN`
After manually removing embedded token values from `gateway.cmd`, that warning stopped appearing, but the RPC timeout problem remained.

9. Version / comparison context
The issue was investigated on:
- `OpenClaw 2026.4.9`
Recent GitHub issues with similar symptoms include:
- #46218
- #45560
- #50380
- #51469
- #46892
- #57104

Impact and severity

Affected users/systems/channels:
Observed on a Windows local OpenClaw setup using a local loopback gateway (ws://127.0.0.1:18789) with token auth. The affected workflows include local CLI-to-gateway RPC, session RPC, and ACP/session actions. Telegram-facing operation may still partially work, but gateway-mediated local control paths are affected.

Severity:
Blocks workflow. The gateway is not fully down, but key RPC operations intermittently fail or hang, which makes local administration and session control unreliable.

Frequency:
Intermittent but recurring. During direct observation, some calls succeeded (sessions_list, some ACP one-shot responses, some gateway status checks) while others repeatedly failed (sessions_history, ACP spawn attempts, openclaw status, some doctor/health checks).
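
The intermittent pattern could be quantified by running the same CLI command repeatedly under a hard timeout and tallying the outcomes. The following Python sketch is an illustration only: `sample_command` is a hypothetical helper, and treating exit code 0 as success is an assumption about the CLI.

```python
import subprocess
from collections import Counter


def sample_command(cmd: list[str], runs: int = 10,
                   timeout_s: float = 15.0) -> Counter:
    """Run cmd repeatedly and tally outcomes as 'ok', 'failed', or 'timeout'.

    Assumes exit code 0 means success; a run that exceeds timeout_s is
    killed by subprocess.run and counted as 'timeout'.
    """
    outcomes: Counter = Counter()
    for _ in range(runs):
        try:
            result = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
            outcomes["ok" if result.returncode == 0 else "failed"] += 1
        except subprocess.TimeoutExpired:
            outcomes["timeout"] += 1
    return outcomes
```

On Windows the `openclaw` launcher may be a `.cmd` shim, so resolving it first with `shutil.which("openclaw")` and passing the resulting path, for example `sample_command([path, "status"])`, is likely more reliable than passing the bare name.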

Consequence:
Practical consequences observed include failed or hanging local status checks, failed session-history retrieval, failed ACP spawn attempts, misleading partial-health signals (gateway status sometimes ok while other RPCs fail), repeated manual recovery attempts (gateway restart, Windows reboot, reinstall), and increased debugging/operations overhead.

Additional information

This appears to be a regression cluster rather than a simple local misconfiguration.

Grounded additional context:

  • The gateway process is clearly alive and listening on 127.0.0.1:18789
  • The failure mode is partial, not total: some gateway-mediated operations succeed while others time out
  • The issue persisted across multiple recovery attempts, including:
    • openclaw gateway restart
    • Windows reboot
    • OpenClaw reinstall
  • Earlier in the investigation, ~/.openclaw/gateway.cmd contained embedded token values and produced service-config warnings. After manually removing embedded token values, that specific warning stopped appearing, but the core RPC timeout behavior remained
  • This suggests the token/service-config issue may be related but is not sufficient to explain the full failure
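
The manual check for embedded token values in `gateway.cmd` could be scripted. The sketch below is a hedged illustration: `embedded_token_lines` is a hypothetical helper, and it assumes the token would be embedded with a batch `set` statement such as `set OPENCLAW_GATEWAY_TOKEN=...`. The actual layout of the generated file is not shown in this report. It returns line numbers only, so the token value itself is never printed.

```python
import re


def embedded_token_lines(script_text: str,
                         var: str = "OPENCLAW_GATEWAY_TOKEN") -> list[int]:
    """Return 1-based line numbers where var is set to a non-empty literal.

    Assumes batch-style assignments: set VAR=value or set "VAR=value".
    """
    pattern = re.compile(
        rf'^\s*@?set\s+"?{re.escape(var)}=(?P<value>[^"\r\n]*)',
        re.IGNORECASE,
    )
    hits = []
    for lineno, line in enumerate(script_text.splitlines(), start=1):
        match = pattern.match(line)
        # An empty value (set VAR=) clears the variable and is not a leak.
        if match and match.group("value").strip():
            hits.append(lineno)
    return hits
```

The text of the file can be passed in after reading it, for example with `pathlib.Path.home() / ".openclaw" / "gateway.cmd"` and `read_text()`.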

Regression notes:

  • Similar recent OpenClaw issues exist in the 2026.3.x line involving gateway RPC / WebSocket handshake timeout and partial CLI-to-gateway failures
  • In this environment, the issue was observed on 2026.4.9
  • Last known good version: .NOT_ENOUGH_INFO
  • First known bad version: .NOT_ENOUGH_INFO

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working), bug:behavior (Incorrect behavior without a crash)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests