[Bug]: Windows gateway self-restart enters infinite retry loop — stale process never killed #60878

@arifahmedjoy

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

On Windows, the in-process self-restart path (triggerOpenClawRestart → relaunchGatewayScheduledTask) fails to kill the old gateway process before launching the new one. The new gateway instance cannot bind port 18789, producing an infinite retry loop:

[gateway] already running under schtasks; waiting 5000ms before retrying startup

The root cause is that findGatewayPidsOnPortSync() in src/infra/restart-stale-pids.ts returns [] immediately on win32, so cleanStaleGatewayProcessesSync() never finds or terminates stale gateway processes.

Note: openclaw daemon restart is unaffected because it uses a separate code path (restartScheduledTask() → terminateScheduledTaskGatewayListeners()) that correctly uses the Windows-aware findVerifiedGatewayListenerPidsOnPortSync().

Steps to reproduce

  1. Install OpenClaw on Windows with the schtasks-based daemon supervisor.
  2. Start the gateway normally (openclaw daemon start).
  3. Trigger an in-process self-restart (e.g., config change that fires triggerOpenClawRestart, or SIGUSR1-equivalent restart).
  4. Observe the new gateway instance failing to start, retrying in a loop every 5 seconds.

Expected behavior

The self-restart path should:

  1. Detect the old gateway process listening on port 18789.
  2. Kill it using taskkill.exe (graceful /T, then forced /F).
  3. Wait for the port to be released.
  4. Launch the new gateway, which binds successfully.
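
For illustration, the four steps above could be sequenced roughly as follows. This is a minimal sketch, not the OpenClaw implementation: the RestartDeps interface, restartGateway, and the poll counts are hypothetical names and values chosen for the example, and the platform-specific work is injected so the ordering can be exercised without touching real processes.

```typescript
// Hypothetical dependencies; none of these names come from the OpenClaw codebase.
interface RestartDeps {
  findPidsOnPort(port: number): number[]; // PIDs currently listening on the port
  killPid(pid: number, force: boolean): void; // graceful when force is false
  sleepMs(ms: number): void; // blocking wait between polls
  launchGateway(): void; // starts the new gateway instance
}

// Returns true if the port was freed and the new gateway was launched.
function restartGateway(
  port: number,
  deps: RestartDeps,
  maxPolls = 10, // assumed polling budget
  pollMs = 200, // assumed delay between polls
): boolean {
  // Step 1: detect the old listener(s) and ask them to exit gracefully.
  for (const pid of deps.findPidsOnPort(port)) {
    deps.killPid(pid, false);
  }
  // Steps 2-3: poll until the port is free, escalating to a forced kill
  // once half of the polling budget has been spent.
  for (let i = 0; i < maxPolls; i++) {
    const remaining = deps.findPidsOnPort(port);
    if (remaining.length === 0) {
      // Step 4: only launch once nothing is listening any more.
      deps.launchGateway();
      return true;
    }
    if (i === Math.floor(maxPolls / 2)) {
      for (const pid of remaining) {
        deps.killPid(pid, true);
      }
    }
    deps.sleepMs(pollMs);
  }
  // The port never became free; the caller should report an error
  // rather than retry forever.
  return false;
}
```

The point of the sketch is the ordering: the new instance is launched only after the port is confirmed free, and the wait is bounded so a stuck process produces a failure instead of an endless loop.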

Actual behavior

findGatewayPidsOnPortSync() returns [] on Windows (early return, no port inspection), so cleanStaleGatewayProcessesSync() is a no-op. The old gateway keeps running, the new one cannot bind the port, and the schtasks supervisor enters an unbounded 5-second retry loop that never resolves.
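
To make the missing port inspection concrete, here is a minimal sketch of how listener PIDs could be extracted from `netstat -ano` output on Windows. The function name parseListeningPids is hypothetical and this is not the code in restart-stale-pids.ts. It only discovers PIDs; verifying that a PID really belongs to a gateway process is a separate step that is not shown.

```typescript
// Hypothetical helper: extracts the PIDs of TCP listeners on `port` from the
// text printed by `netstat -ano`. Assumes the usual five-column layout
// (Proto, Local Address, Foreign Address, State, PID) and the English
// "LISTENING" state label, which may differ on localized systems.
function parseListeningPids(netstatOutput: string, port: number): number[] {
  const pids: number[] = [];
  for (const line of netstatOutput.split(/\r?\n/)) {
    const cols = line.trim().split(/\s+/);
    if (cols.length < 5 || cols[0] !== "TCP" || cols[3] !== "LISTENING") {
      continue; // header, UDP row, or a non-listening connection
    }
    // The local address looks like 0.0.0.0:18789 or [::]:18789, so the port
    // is whatever follows the last colon.
    const local = cols[1];
    const localPort = Number(local.slice(local.lastIndexOf(":") + 1));
    const pid = Number(cols[4]);
    const validPid = pid > 0 && Math.floor(pid) === pid;
    // The same PID usually appears once for IPv4 and once for IPv6.
    if (localPort === port && validPid && pids.indexOf(pid) === -1) {
      pids.push(pid);
    }
  }
  return pids;
}
```

With a helper along these lines, the win32 branch would have something to hand to the cleanup step instead of an empty list.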

OpenClaw version

2026.4.3 (and earlier — the return [] for win32 has been present since the function was introduced)

Operating system

Windows 11

Install method

npm global

Model

N/A — affects all configurations

Provider / routing chain

N/A — affects all configurations

Additional provider/model setup details

No response

Logs, screenshots, and evidence

# Gateway log output during the infinite loop:
[gateway] already running under schtasks; waiting 5000ms before retrying startup
[gateway] already running under schtasks; waiting 5000ms before retrying startup
[gateway] already running under schtasks; waiting 5000ms before retrying startup
...

Impact and severity

  • Affected: All Windows users using the schtasks daemon supervisor with config-triggered or SIGUSR1 in-process restarts
  • Severity: High — gateway becomes permanently stuck, requires manual intervention (taskkill or Task Scheduler restart)
  • Frequency: 100% reproducible on any Windows self-restart trigger
  • Workaround: Use openclaw daemon restart (which uses a different code path that works correctly)

Additional information

Proposed fix: #60480

The fix:

  1. Extracts Windows port/process helpers into a shared src/infra/windows-port-pids.ts module with configurable timeoutMs
  2. Makes findGatewayPidsOnPortSync discover + verify Windows gateway PIDs via PowerShell/netstat
  3. Adds pollPortOnceWindows with a 400ms budget-compliant timeout for port-free polling
  4. Adds terminateStaleProcessesWindows using taskkill.exe (graceful /T then forced /F)
  5. Breaks the circular import between restart-stale-pids.ts and gateway-processes.ts
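
As a rough illustration of item 4, the graceful-then-forced escalation with taskkill.exe might look like the sketch below. It is not the code from the PR: taskkillArgs, terminateWindowsProcess, and the Runner type are hypothetical, and passing /T on both attempts is an assumption of this example. Only the taskkill switches themselves (/PID, /T, /F) are real.

```typescript
// Hypothetical runner: executes a command and returns its exit code. In Node
// this could wrap child_process.spawnSync; it is injected here so the
// escalation logic stays testable on any platform.
type Runner = (command: string, args: string[]) => number;

// Builds the taskkill argument list. /T includes child processes and /F
// forces termination.
function taskkillArgs(pid: number, force: boolean): string[] {
  const args = ["/PID", String(pid), "/T"];
  if (force) {
    args.push("/F");
  }
  return args;
}

// Tries a graceful taskkill first and falls back to a forced one if the
// graceful attempt reports failure. Exit code 0 only means taskkill accepted
// the request, so callers should still poll the port afterwards.
function terminateWindowsProcess(pid: number, run: Runner): boolean {
  if (run("taskkill.exe", taskkillArgs(pid, false)) === 0) {
    return true;
  }
  return run("taskkill.exe", taskkillArgs(pid, true)) === 0;
}
```

Because a zero exit code does not prove the process is gone, the port-free polling in item 3 is still needed after either attempt.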
