fix(windows): delegate npm global update to detached helper to avoid EBUSY#44614
fix(windows): delegate npm global update to detached helper to avoid EBUSY#44614Eiman24 wants to merge 1 commit intoopenclaw:mainfrom
Conversation
Greptile SummaryThis PR introduces a detached-process update flow on Windows to work around the Issues found:
Confidence Score: 3/5
Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/gateway/server-methods/update.ts
Line: 119
Comment:
**`Math.max(undefined, 2000)` evaluates to `NaN`, causing immediate exit**
`restartDelayMs` has type `number | undefined` (see `parseRestartRequestParams`). When the caller omits this parameter, `restartDelayMs` is `undefined`, and `Math.max(undefined, 2000)` returns `NaN` in JavaScript. `setTimeout(fn, NaN)` treats `NaN` as `0`, so `process.exit(0)` fires on the very next event-loop tick — before the WebSocket / HTTP response written by `respond()` above has had a chance to flush. The client never receives the "detached" response and the update appears to have silently failed.
```suggestion
}, Math.max(restartDelayMs ?? 0, 2000));
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/infra/update-detached-win32.ts
Line: 178-188
Comment:
**`readDetachedUpdateResult` is exported but never consumed**
The detached helper writes a JSON result file (e.g. `{"ok":false,"reason":"install-failed","exitCode":1}`) so the next Gateway boot can surface the outcome to the user. However, nothing in this PR reads that file on startup. The diagnostic information is silently discarded: a failed update leaves no visible signal in the UI, logs, or restart sentinel.
Consider wiring up a startup check (e.g. in the Gateway boot sequence) that calls `readDetachedUpdateResult` for any pending result path — perhaps stored in the restart sentinel written in `update.ts` at line 90 — and then emits a log warning or an observable update-status event before deleting the result file.
How can I resolve this? If you propose a fix, please make it concise.Last reviewed commit: 9f59d37 |
| setTimeout(() => { | ||
| context?.logGateway?.info("Exiting for detached Windows update..."); | ||
| process.exit(0); | ||
| }, Math.max(restartDelayMs, 2000)); |
There was a problem hiding this comment.
Math.max(undefined, 2000) evaluates to NaN, causing immediate exit
restartDelayMs has type number | undefined (see parseRestartRequestParams). When the caller omits this parameter, restartDelayMs is undefined, and Math.max(undefined, 2000) returns NaN in JavaScript. setTimeout(fn, NaN) treats NaN as 0, so process.exit(0) fires on the very next event-loop tick — before the WebSocket / HTTP response written by respond() above has had a chance to flush. The client never receives the "detached" response and the update appears to have silently failed.
| }, Math.max(restartDelayMs, 2000)); | |
| }, Math.max(restartDelayMs ?? 0, 2000)); |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/gateway/server-methods/update.ts
Line: 119
Comment:
**`Math.max(undefined, 2000)` evaluates to `NaN`, causing immediate exit**
`restartDelayMs` has type `number | undefined` (see `parseRestartRequestParams`). When the caller omits this parameter, `restartDelayMs` is `undefined`, and `Math.max(undefined, 2000)` returns `NaN` in JavaScript. `setTimeout(fn, NaN)` treats `NaN` as `0`, so `process.exit(0)` fires on the very next event-loop tick — before the WebSocket / HTTP response written by `respond()` above has had a chance to flush. The client never receives the "detached" response and the update appears to have silently failed.
```suggestion
}, Math.max(restartDelayMs ?? 0, 2000));
```
How can I resolve this? If you propose a fix, please make it concise.| export function readDetachedUpdateResult( | ||
| resultPath: string, | ||
| ): { ok: boolean; reason?: string; exitCode?: number } | null { | ||
| try { | ||
| const raw = fs.readFileSync(resultPath, "utf8").trim(); | ||
| const parsed = JSON.parse(raw); | ||
| return parsed; | ||
| } catch { | ||
| return null; | ||
| } | ||
| } |
There was a problem hiding this comment.
readDetachedUpdateResult is exported but never consumed
The detached helper writes a JSON result file (e.g. {"ok":false,"reason":"install-failed","exitCode":1}) so the next Gateway boot can surface the outcome to the user. However, nothing in this PR reads that file on startup. The diagnostic information is silently discarded: a failed update leaves no visible signal in the UI, logs, or restart sentinel.
Consider wiring up a startup check (e.g. in the Gateway boot sequence) that calls readDetachedUpdateResult for any pending result path — perhaps stored in the restart sentinel written in update.ts at line 90 — and then emits a log warning or an observable update-status event before deleting the result file.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/infra/update-detached-win32.ts
Line: 178-188
Comment:
**`readDetachedUpdateResult` is exported but never consumed**
The detached helper writes a JSON result file (e.g. `{"ok":false,"reason":"install-failed","exitCode":1}`) so the next Gateway boot can surface the outcome to the user. However, nothing in this PR reads that file on startup. The diagnostic information is silently discarded: a failed update leaves no visible signal in the UI, logs, or restart sentinel.
Consider wiring up a startup check (e.g. in the Gateway boot sequence) that calls `readDetachedUpdateResult` for any pending result path — perhaps stored in the restart sentinel written in `update.ts` at line 90 — and then emits a log warning or an observable update-status event before deleting the result file.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9f59d379f8
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const child = spawn("cmd.exe", ["/d", "/s", "/c", quoteCmdScriptArg(scriptPath)], { | ||
| detached: true, | ||
| stdio: "ignore", | ||
| windowsHide: true, | ||
| }); |
There was a problem hiding this comment.
Propagate prepared env to detached updater process
This detached path drops the install environment assembled in runGatewayUpdate: spawnDetachedUpdate accepts env but never passes it to spawn(...), so the helper runs with the parent process environment instead of globalInstallEnv. On Windows, that loses values from createGlobalInstallEnv (notably portable-Git PATH prepends and npm config vars), which can make detached npm i -g fail on machines that rely on those settings even though the previous in-process update path succeeded.
Useful? React with 👍 / 👎.
| setTimeout(() => { | ||
| context?.logGateway?.info("Exiting for detached Windows update..."); | ||
| process.exit(0); | ||
| }, Math.max(restartDelayMs, 2000)); |
There was a problem hiding this comment.
Clamp detached shutdown delay to helper wait window
The gateway exit delay is taken from restartDelayMs without an upper bound, but the detached helper only waits 60 seconds for this PID before aborting with pid-wait-timeout. If a caller sends restartDelayMs > 60000, the helper exits before the gateway stops, so the install/restart never runs and the later process.exit(0) can leave the gateway down.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ff87e1440e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| setTimeout(() => { | ||
| context?.logGateway?.info("Exiting for detached Windows update..."); | ||
| process.exit(0); | ||
| }, Math.max(restartDelayMs, 2000)); |
There was a problem hiding this comment.
Default detached exit delay to a finite value
restartDelayMs is optional, but this call computes Math.max(restartDelayMs, 2000) directly; when the caller omits restartDelayMs, that expression becomes NaN, and Node timers treat NaN like an immediate timeout. In the detached update path this can call process.exit(0) almost instantly, which can cut off the update.run response before it flushes to the client in the common default-parameter case.
Useful? React with 👍 / 👎.
| const installArgv = globalInstallArgs(globalManager, spec); | ||
| const detached = spawnDetachedUpdate({ installArgv, env: globalInstallEnv as NodeJS.ProcessEnv | undefined }); | ||
| if (!detached.ok) { |
There was a problem hiding this comment.
Preserve npm omit-optional retry in win32 detached updates
The Windows branch delegates a single globalInstallArgs(...) command to the helper and returns early, so it never reaches the existing npm fallback retry (globalInstallFallbackArgs) used elsewhere in this function. That regresses environments where the first global npm install fails but the --omit=optional retry succeeds, causing Windows updates to fail in cases that were previously handled.
Useful? React with 👍 / 👎.
| const installCommand = params.installArgv | ||
| .map((arg) => (arg.includes(" ") ? `"${arg}"` : arg)) | ||
| .join(" "); |
There was a problem hiding this comment.
Escape detached install argv for cmd metacharacters
This constructs installCommand by only wrapping args that contain spaces, but cmd.exe still interprets metacharacters like &, |, %, !, and ^ in unescaped arguments. In the detached path, package specs such as signed URLs or local paths containing these characters can be split/expanded and make the update command fail (or execute unintended fragments), whereas the previous in-process argv execution avoided shell parsing.
Useful? React with 👍 / 👎.
…EBUSY On Windows, 'npm i -g openclaw@latest' fails with EBUSY when the Gateway process is running because Node.js locks loaded .js files. The Dashboard 'Update now' button always fails on Windows for this reason. This fix introduces a detached update flow for Windows: 1. update-runner detects Windows + global npm install scenario 2. Spawns a detached .cmd helper script (update-detached-win32.ts) 3. Returns status 'ok' with detachedResultPath so the handler knows to exit the Gateway process (not just SIGUSR1 in-process reload) 4. The detached helper waits for the Gateway PID to exit, runs npm install, then relaunches the Gateway via Scheduled Task This mirrors the existing windows-task-restart.ts pattern but adds the npm install step between stop and restart. Files changed: - src/infra/update-detached-win32.ts (new): Detached update helper - src/infra/update-runner.ts: Windows branch in global update path - src/gateway/server-methods/update.ts: Handle detachedResultPath - src/infra/update-detached-win32.test.ts (new): Tests Fixes: Dashboard 'Update now' EBUSY failure on Windows
ff87e14 to
c83ca14
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c83ca14db7
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| // Return status "ok" so the caller schedules a graceful shutdown. | ||
| // The detached script handles the actual install + restart. | ||
| return { | ||
| status: "ok", |
There was a problem hiding this comment.
Report delegated update as pending until install actually finishes
The win32 global-update path returns status: "ok" as soon as the detached helper is spawned, before any install step runs. update.run then uses that status for the API response and restart sentinel, so if the helper later records install-failed or pid-wait-timeout, users still get a success-style restart message on next boot. This masks real update failures and regresses observability versus the previous synchronous path; the delegated flow should not be marked ok until detachedResultPath has been reconciled.
Useful? React with 👍 / 👎.
| setTimeout( | ||
| () => { | ||
| context?.logGateway?.info("Exiting for detached Windows update..."); | ||
| process.exit(0); |
There was a problem hiding this comment.
Require supervised restart capability before forced detached exit
This branch unconditionally hard-exits the gateway after delegation, but the helper restart mechanism is schtasks-only and no check is performed that the current Windows runtime is actually task-supervised. In unsupervised runs (which the existing restart pipeline explicitly supports via in-process fallback), the helper cannot relaunch the process, so update.run can leave the gateway down immediately after returning success. Gate detached exit behind detected scheduled-task supervision, or keep the prior restart fallback path.
Useful? React with 👍 / 👎.
|
Thanks for the context here. I swept through the related work, and this is now duplicate or superseded. Close as superseded: the Windows EBUSY update bug is real, but this PR is unsafe as-is and open PR #75649 explicitly carries this work forward with the staged-update, restart-supervision, env, quoting, result-reconciliation, docs, and changelog follow-up. So I’m closing this here and keeping the remaining discussion on the canonical linked item. Review detailsBest possible solution: Close this source PR and continue review on #75649 as the canonical replacement for #40540, preserving the original contributor credit while landing the safer staged-update design. Do we have a high-confidence way to reproduce the issue? Yes. The reproduction path is source-backed by #40540's Windows EBUSY logs plus current main's live Gateway/global npm package update path; I did not run a live Windows update in this read-only review. Is this the best way to solve the issue? No. The detached-helper direction is plausible, but this PR is not the best implementation because it bypasses current staged-update safeguards, drops env handling, builds unsafe cmd commands, and reports success before the detached update is reconciled; #75649 is the safer replacement path. Security review: Security review needs attention: The PR introduces a Windows command-execution risk in the detached update helper.
What I checked:
Likely related people:
Codex review notes: model gpt-5.5, reasoning high; reviewed against 775c27433fa3. |
Problem
On Windows, the Dashboard "Update now" button (
update.run) always fails withEBUSYbecause:.jsfiles lockednpm i -g openclaw@latestneeds to rename/delete those locked filesEPERM/EBUSY→ update never succeedsThis breaks the user expectation that clicking "Update now" should work. The current workaround requires manually stopping the Gateway, running npm, and restarting — which defeats the purpose of having an update button.
Solution
Introduce a detached update flow for Windows that mirrors the existing
windows-task-restart.tspattern:update-runner.tsdetects Windows + global npm install scenario.cmdhelper script (update-detached-win32.ts)status: "ok"withdetachedResultPathto signal delegationupdate.tshandler seesdetachedResultPath→ sends response → exits the Gateway process (process.exit(0))npm i -g openclaw@latestschtasks /Run)This is the same approach used by many desktop apps (VS Code, Chrome, etc.): exit the main process, let an updater replace files, then restart.
Changes
src/infra/update-detached-win32.tssrc/infra/update-runner.tsdetachedResultPathfield onUpdateRunResultsrc/gateway/server-methods/update.tsdetachedResultPath: respond, thenprocess.exit(0)to release file lockssrc/infra/update-detached-win32.test.tsNon-Windows platforms
No changes. Linux/macOS continue to use the existing in-process
npm i -g+ SIGUSR1 restart flow, which works fine because those platforms don't lock loaded JS files.Testing