fix(gateway): fall back to PowerShell when wmic is unavailable on Win… #43624

Closed
litaohz wants to merge 2 commits into openclaw:main from litaohz:fix/windows-wmic-fallback-powershell

@litaohz litaohz commented Mar 12, 2026

Summary

  • openclaw gateway restart always times out (60s) on Windows machines where wmic has been removed (Windows 11+), even though the gateway
    restarts successfully
  • Fix: fall back to PowerShell Get-CimInstance Win32_Process when wmic fails, restoring correct process classification on modern Windows

Root Cause

Microsoft has deprecated wmic.exe, and recent Windows 11 releases no longer ship it by default. When it is missing, resolveWindowsCommandLine() in src/infra/ports-inspect.ts
fails silently and returns no command line for port listeners. The restart health check then only sees "node.exe" (the image name from tasklist),
which classifyPortListener() cannot identify as a gateway process because it requires "openclaw" in the command string. This results in:

ownsPort = false → healthy = false → 60s timeout loop → false failure report

The gateway is actually running and healthy the entire time.
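
To make the failure chain concrete, here is a simplified model of it. This is only a sketch: `classifyListener` and `isRestartHealthy` are hypothetical stand-ins, not the real `classifyPortListener()` or health check, and the only rule taken from the description above is that a listener counts as the gateway when "openclaw" appears in its command string.

```typescript
type ListenerKind = "gateway" | "unknown";

// Hypothetical stand-in for classifyPortListener(): the listener is treated as
// the gateway only if "openclaw" appears in the command string. When no command
// line could be resolved, only the image name (e.g. "node.exe") is available.
function classifyListener(imageName: string, commandLine?: string): ListenerKind {
  const command = (commandLine ?? imageName).toLowerCase();
  return command.includes("openclaw") ? "gateway" : "unknown";
}

// Simplified assumption: in this model, healthy follows directly from ownsPort.
function isRestartHealthy(imageName: string, commandLine?: string): boolean {
  const ownsPort = classifyListener(imageName, commandLine) === "gateway";
  return ownsPort;
}

// Without a resolved command line the listener looks like a plain node.exe.
const withoutCommandLine = isRestartHealthy("node.exe");
// With a command line (illustrative path) the gateway is recognised.
const withCommandLine = isRestartHealthy("node.exe", "node.exe C:\\tools\\openclaw\\gateway.js");
```

In this model `withoutCommandLine` is false and `withCommandLine` is true, which mirrors why a missing command line alone is enough to make a healthy gateway look unhealthy.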

Fix

When wmic returns a non-zero exit code or produces no output, fall back to powershell -NoProfile -Command "(Get-CimInstance Win32_Process -Filter 'ProcessId=<pid>').CommandLine" to retrieve the full command line. This is the Microsoft-recommended replacement for wmic process get CommandLine.
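
A minimal sketch of what such a PowerShell lookup could look like, for illustration only. The names `buildPowerShellArgs`, `commandLineViaPowerShell`, `Runner`, and the 5-second timeout are assumptions made for this example and are not taken from the patch; only the PowerShell command string itself comes from the description above. The runner is injectable so the logic can be exercised without a Windows machine.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical result shape and runner type, used only in this sketch.
type RunResult = { code: number | null; stdout: string };
type Runner = (cmd: string, args: string[]) => RunResult;

// Default runner: a thin wrapper over Node's spawnSync with an assumed timeout.
const defaultRunner: Runner = (cmd, args) => {
  const res = spawnSync(cmd, args, { encoding: "utf8", timeout: 5000, windowsHide: true });
  // stdout can be missing when the process could not be spawned at all.
  return { code: res.status, stdout: String(res.stdout ?? "") };
};

// Build the argument list for the Get-CimInstance query quoted above.
function buildPowerShellArgs(pid: number): string[] {
  if (!Number.isInteger(pid) || pid <= 0) {
    throw new RangeError(`invalid pid: ${pid}`);
  }
  return [
    "-NoProfile",
    "-Command",
    `(Get-CimInstance Win32_Process -Filter 'ProcessId=${pid}').CommandLine`,
  ];
}

// Returns the trimmed command line, or undefined if PowerShell fails or prints nothing.
function commandLineViaPowerShell(pid: number, run: Runner = defaultRunner): string | undefined {
  const res = run("powershell", buildPowerShellArgs(pid));
  if (res.code !== 0) {
    return undefined;
  }
  const value = res.stdout.trim();
  return value || undefined;
}
```

Validating `pid` as a positive integer before interpolating it keeps the filter string free of anything other than digits, which matters because the value ends up inside a PowerShell command.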

Related

Test plan

  • Verified on Windows machine without wmic: health check now correctly identifies the gateway process via PowerShell fallback
  • When wmic is available, behavior is unchanged (wmic path taken first)
  • When both wmic and powershell fail, returns undefined gracefully (same as before)

🤖 Generated with Claude Code

…dows

On modern Windows (11+), Microsoft has deprecated and removed wmic.exe.
This causes resolveWindowsCommandLine() to silently fail, returning no
command line for port listeners. The health check then only sees
"node.exe" (the image name), which classifyPortListener() cannot
identify as a gateway process — it requires "openclaw" in the command
string. This results in ownsPort=false → healthy=false, and the restart
health check loops for 60s before timing out, even though the gateway
restarted successfully.

Fix: when wmic fails (non-zero exit or no output), fall back to
PowerShell Get-CimInstance Win32_Process to retrieve the full command
line. This restores correct process classification on wmic-less systems.

Related: openclaw#32620 (same class of bug on Linux when lsof is missing)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@greptile-apps
Contributor

greptile-apps Bot commented Mar 12, 2026

Greptile Summary

This PR adds a PowerShell Get-CimInstance Win32_Process fallback to resolveWindowsCommandLine in src/infra/ports-inspect.ts, fixing the 60-second timeout on Windows 11+ machines where wmic.exe has been removed. The fix correctly addresses the root cause and uses the Microsoft-recommended migration path.

  • The new resolveWindowsCommandLineViaPowerShell function is well-structured and mirrors the safety/timeout conventions of the rest of the file.
  • The fallback is invoked in one unintended extra case: when wmic exits with code 0 but produces no non-empty commandline= line. On Windows 10 and earlier (where wmic is present), this causes an unnecessary powershell.exe spawn for processes whose command line isn't accessible via wmic. Adding return undefined before the resolveWindowsCommandLineViaPowerShell call — inside the if (res.code === 0) block — would restrict the fallback to only the wmic-unavailable case.
  • No new unit tests cover the Windows PowerShell fallback path, though the existing test suite skips all Windows-specific tests, so this is consistent with the project's current testing approach.

Confidence Score: 3/5

  • Safe to merge for the Windows 11+ fix, but has a minor regression on Windows 10 machines with wmic available.
  • The core fix is correct and solves the reported Windows 11+ timeout bug. However, the fallback condition is broader than intended: PowerShell is invoked even when wmic succeeds but returns no commandline, adding unnecessary latency on machines where wmic is available. This is a logic error that should be addressed before merging to avoid degrading performance on the currently-working platform.
  • src/infra/ports-inspect.ts — specifically the fallback condition in resolveWindowsCommandLine (lines 278–291)
Path: src/infra/ports-inspect.ts
Line: 278-291

Comment:
**PowerShell fallback triggered even when wmic succeeds**

The fallback to PowerShell is called in two cases, but it should only be called when `wmic` itself fails (non-zero exit code):

1. `wmic` returns non-zero (unavailable) → fallback to PowerShell (**intended**)
2. `wmic` returns exit code 0 but produces no non-empty `commandline=` line → also falls through to PowerShell (**unintended regression**)

Case 2 happens on machines where `wmic` **is** present (Windows 10 and earlier, the currently-working case). For any process that `wmic` can't report a commandline for (e.g. a system process with an inaccessible command line, or a process that has already exited), the code will now redundantly spawn `powershell.exe`. Since `powershell -NoProfile` can still take hundreds of milliseconds to start, this adds latency to every port inspection on the "working" Windows platform.

The fix is to only fall back to PowerShell when `wmic` returns a non-zero exit code:

```suggestion
  if (res.code === 0) {
    for (const rawLine of res.stdout.split(/\r?\n/)) {
      const line = rawLine.trim();
      if (!line.toLowerCase().startsWith("commandline=")) {
        continue;
      }
      const value = line.slice("commandline=".length).trim();
      if (value) {
        return value;
      }
    }
    return undefined;
  }
  // Fallback to PowerShell Get-CimInstance when wmic is unavailable.
  return resolveWindowsCommandLineViaPowerShell(pid);
```

Last reviewed commit: 9f4d01e

Comment thread src/infra/ports-inspect.ts
Address CI check failure (oxfmt) and Greptile review feedback:
- Only fall back to PowerShell when wmic exits with non-zero code
- When wmic succeeds (exit 0) but returns no commandline, return
  undefined directly instead of spawning an unnecessary PowerShell
  process (avoids latency regression on Windows 10 where wmic works)
- Fix function signature formatting to satisfy oxfmt

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@steipete
Contributor

Closing this as implemented after Codex review.

Current main already avoids the Windows restart false-timeout this PR targets by treating a busy-but-reachable gateway as healthy during restart, even when listener classification is ambiguous. The exact wmic -> PowerShell fallback in src/infra/ports-inspect.ts is not present, but the reported restart symptom is already fixed on main and shipped in v2026.4.22.

What I checked:

  • Restart health no longer depends solely on listener classification: inspectGatewayRestart() first probes the gateway when the port is busy and runtime status is not running, and also probes again when runtime is running but port ownership is ambiguous. That makes a healthy restarted gateway pass even if Windows listener attribution falls back to node.exe/unknown. (src/cli/daemon-cli/restart-health.ts:137, 28fc03c386e1)
  • Gateway restart command uses the new health path on Windows: openclaw gateway restart calls waitForGatewayHealthyRestart() with Windows-specific stale-listener handling, so the shipped restart flow uses the probe-backed logic. (src/cli/daemon-cli/lifecycle.ts:241, 28fc03c386e1)
  • Tests cover the ambiguous-ownership and busy-port probe cases: There are explicit tests that a local probe marks the restart healthy when ownership is ambiguous, and when runtime status lags but the probe succeeds. Those are the failure modes described in this PR. (src/cli/daemon-cli/restart-health.test.ts:208, 28fc03c386e1)
  • The exact PR patch is still absent from port inspection: resolveWindowsCommandLine() on current main still uses wmic only and returns undefined on non-zero exit, so this PR's code itself was not merged. The close decision is based on the restart symptom already being handled elsewhere on main. (src/infra/ports-inspect.ts:252, 28fc03c386e1)
  • Shipped release contains the implemented behavior: The released commit 00bd2cf7a376f1fba26291c6c4766f1f15cbdfa5 is tagged v2026.4.22, so the current behavior is already in a release. (00bd2cf7a376)
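
For readers who want the shape of the probe-backed decision described in the first bullet, a rough sketch follows. It is not the code in restart-health.ts: `RestartSnapshot`, `isHealthyAfterRestart`, and the `"foreign"` ownership value are hypothetical names introduced here, and the handling of a listener that clearly belongs to another process is an assumption.

```typescript
// Hypothetical snapshot of what a restart health check might observe.
type RestartSnapshot = {
  portBusy: boolean;
  runtimeRunning: boolean;
  ownership: "gateway" | "ambiguous" | "foreign";
};

// Decide health by combining listener classification with a reachability probe.
// The probe is passed in so the decision logic stays independent of networking.
function isHealthyAfterRestart(s: RestartSnapshot, probe: () => boolean): boolean {
  if (!s.portBusy) {
    return false; // nothing is listening yet
  }
  if (s.runtimeRunning && s.ownership === "gateway") {
    return true; // classification alone is conclusive
  }
  // Port busy while runtime status lags, or runtime running with ambiguous
  // ownership (e.g. only "node.exe" is visible): ask the gateway directly.
  if (!s.runtimeRunning || s.ownership === "ambiguous") {
    return probe();
  }
  return false; // assumption: a clearly foreign listener is never healthy
}
```

With this shape, a gateway whose listener resolves only to node.exe still passes as long as the probe reaches it, which is the behaviour the bullets above describe.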

So I’m closing this as already implemented rather than keeping a duplicate issue open.

Review notes: reviewed against 28fc03c386e1; fix evidence: release v2026.4.22, commit 00bd2cf7a376.

@steipete steipete closed this Apr 25, 2026