fix(windows): detect stale gateway PIDs via netstat#52485
fix(windows): detect stale gateway PIDs via netstat#52485andyk-ms wants to merge 3 commits intoopenclaw:mainfrom
Conversation
Greptile SummaryThis PR fixes the long-standing Windows no-op in Issues found:
Confidence Score: 4/5
|
| try { | ||
| const tl = spawnSync("tasklist", ["/FI", `PID eq ${pid}`, "/FO", "CSV", "/NH"], { | ||
| encoding: "utf8", | ||
| timeout: spawnTimeoutMs, | ||
| windowsHide: true, | ||
| }); | ||
| if (tl.stdout && tl.stdout.toLowerCase().includes("node")) { | ||
| confirmed.push(pid); | ||
| } | ||
| } catch { | ||
| // If tasklist fails, include the PID anyway — better to kill a stale | ||
| // process than leave a port occupied. | ||
| confirmed.push(pid); | ||
| } |
There was a problem hiding this comment.
Dead
catch block — fallback silently dropped
spawnSync never throws. When tasklist times out or exits non-zero, tl.error is populated and tl.stdout is empty/null, but the catch block is never reached. The stated intent — "If tasklist fails, include the PID anyway" — is silently broken: tasklist failures cause the PID to be dropped instead of conservatively included.
The fix is to check tl.error (or a non-zero/null status) explicitly:
| try { | |
| const tl = spawnSync("tasklist", ["/FI", `PID eq ${pid}`, "/FO", "CSV", "/NH"], { | |
| encoding: "utf8", | |
| timeout: spawnTimeoutMs, | |
| windowsHide: true, | |
| }); | |
| if (tl.stdout && tl.stdout.toLowerCase().includes("node")) { | |
| confirmed.push(pid); | |
| } | |
| } catch { | |
| // If tasklist fails, include the PID anyway — better to kill a stale | |
| // process than leave a port occupied. | |
| confirmed.push(pid); | |
| } | |
| const tl = spawnSync("tasklist", ["/FI", `PID eq ${pid}`, "/FO", "CSV", "/NH"], { | |
| encoding: "utf8", | |
| timeout: spawnTimeoutMs, | |
| windowsHide: true, | |
| }); | |
| if (tl.error || tl.status == null || tl.status !== 0) { | |
| // tasklist failed — include the PID anyway — better to kill a stale | |
| // process than leave a port occupied. | |
| confirmed.push(pid); | |
| } else if (tl.stdout && tl.stdout.toLowerCase().includes("node")) { | |
| confirmed.push(pid); | |
| } |
Replace the whole try/catch block with the above (no try or catch wrappers needed).
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/infra/restart-stale-pids.ts
Line: 128-141
Comment:
**Dead `catch` block — fallback silently dropped**
`spawnSync` never throws. When `tasklist` times out or exits non-zero, `tl.error` is populated and `tl.stdout` is empty/null, but the `catch` block is never reached. The stated intent — _"If tasklist fails, include the PID anyway"_ — is silently broken: `tasklist` failures cause the PID to be **dropped** instead of conservatively included.
The fix is to check `tl.error` (or a non-zero/null status) explicitly:
```suggestion
const tl = spawnSync("tasklist", ["/FI", `PID eq ${pid}`, "/FO", "CSV", "/NH"], {
encoding: "utf8",
timeout: spawnTimeoutMs,
windowsHide: true,
});
if (tl.error || tl.status == null || tl.status !== 0) {
// tasklist failed — include the PID anyway — better to kill a stale
// process than leave a port occupied.
confirmed.push(pid);
} else if (tl.stdout && tl.stdout.toLowerCase().includes("node")) {
confirmed.push(pid);
}
```
Replace the whole `try/catch` block with the above (no `try` or `catch` wrappers needed).
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e069075a4b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (tl.stdout && tl.stdout.toLowerCase().includes("node")) { | ||
| confirmed.push(pid); |
There was a problem hiding this comment.
Restrict Windows stale-PID match to OpenClaw process
findGatewayPidsOnPortWindows claims to return only OpenClaw gateway PIDs, but this check accepts any tasklist entry containing node, so any unrelated Node app listening on the configured gateway port can be terminated during cleanStaleGatewayProcessesSync restart cleanup. This is a behavioral regression versus the Unix path (which filters to OpenClaw command names) and can kill non-OpenClaw workloads if they bind the same port.
Useful? React with 👍 / 👎.
|
Addressed review feedback with two fixes: Fix 1 (P1 logic bug): Replaced dead try/catch around spawnSync(tasklist) in findGatewayPidsOnPortWindows with an explicit tl.error || tl.status !== 0 check. spawnSync never throws, errors are reported via the .error property, so the catch block was unreachable dead code. Now errors are properly detected and the PID is included as intended. Fix 2 (P2 orphaned JSDoc): Moved the stranded JSDoc comment from between the Windows function and findGatewayPidsOnPortSync to sit directly above findGatewayPidsOnPortSync where it belongs. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 315e2f7968
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (tl.error || tl.status !== 0) { | ||
| // If tasklist fails, include the PID anyway — better to kill a stale | ||
| // process than leave a port occupied. | ||
| confirmed.push(pid); | ||
| continue; |
There was a problem hiding this comment.
Skip PIDs when process identity cannot be verified
When tasklist fails, this branch still adds the PID to confirmed, so restart cleanup can terminate whatever process is listening on the gateway port even if it is not OpenClaw. This is a real risk on timeout/permission/transient failures of tasklist: the code explicitly bypasses identity verification and proceeds to kill by PID, which can disrupt unrelated services bound to that port.
Useful? React with 👍 / 👎.
|
Pushed test fix: the win32 test was still expecting the old no-op behavior (return [], don't call spawnSync). Updated to mock netstat output and verify the Windows code path is actually exercised. The TypeError on line 99 (res undefined) should be resolved — CI will confirm. |
findGatewayPidsOnPortSync() returned [] on Windows, so gateway stop could not kill stale node.exe holding the port (EADDRINUSE on restart). Add netstat-based PID detection and port polling for Windows. Fixes openclaw#52049
15474e0 to
e3e28a8
Compare
|
Closing — this was superseded by #60480 which landed the same Windows stale-PID cleanup (netstat + command-line verification via isGatewayArgv) with a cleaner shared-helper architecture. All three review concerns (P1: dead catch, P1: broad node match, P2: skip unverified) are addressed in that implementation. Thanks @arifahmedjoy! |
|
You are welcome, @andyk-ms! |
Problem
\indGatewayPidsOnPortSync()\ returns []\ on Windows (line 89: early return). This means \openclaw gateway stop\ cannot detect or kill stale
ode.exe\ processes holding the gateway port, causing EADDRINUSE on restart.
Fix
etstat -ano\ to find PIDs listening on the gateway port, verified via \ asklist\ to confirm they are node.exe processes
No changes to \ erminateStaleProcessesSync()\ — Node.js \process.kill(pid, 'SIGTERM')\ already works on Windows.
Testing
Tested on Windows 10 (build 26200) with 3 scenarios:
Fixes #52049