Skip to content

fix(windows): prevent restart race from duplicate schtasks /Run#52487

Open
andyk-ms wants to merge 1 commit intoopenclaw:mainfrom
andyk-ms:fix/windows-restart-race
Open

fix(windows): prevent restart race from duplicate schtasks /Run#52487
andyk-ms wants to merge 1 commit intoopenclaw:mainfrom
andyk-ms:fix/windows-restart-race

Conversation

@andyk-ms
Copy link
Copy Markdown
Contributor

Problem

When openclaw gateway restart runs, two sources race to start the task:

  1. CLI: schtasks /End then schtasks /Run
  2. Self-restart script: dying gateway fires relaunchGatewayScheduledTask() which spawns a detached .cmd that retries schtasks /Run up to 12 times

Both succeed, creating duplicate gateway windows (the "triple window" bug).

Fix

The self-restart .cmd script now queries the task status before attempting schtasks /Run. If the task is already running (CLI beat it there), the script exits cleanly instead of spawning a duplicate.

Impact

  • Zero behavior change for config-watcher restarts (task won't be running yet)
  • Prevents duplicate window on CLI-initiated restarts
  • Self-cleaning: script still deletes itself on exit

Fixes #52044

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 22, 2026

Greptile Summary

This PR fixes the "triple window" bug on Windows by inserting a pre-flight schtasks /Query status check into the self-restart .cmd script, preventing it from launching a duplicate gateway process when the CLI has already restarted the scheduled task.

Key changes:

  • Before each schtasks /Run attempt, the script now queries the task's current status and exits (goto cleanup) if the task is already in the Running state.
  • The fix is zero-impact for config-watcher-initiated restarts (task won't be running when the script first checks) and self-cleaning (script still deletes itself on exit).

Notable concern:

  • The findstr /I "Running" match against schtasks /Query /FO LIST output is locale-sensitive — on non-English Windows, the status string is localised (e.g. German, French), so the guard silently does nothing and the duplicate-window race can still occur for those users. Consider using (Get-ScheduledTask ...).State via PowerShell for a locale-agnostic check.

Confidence Score: 4/5

  • Safe to merge for English Windows users; non-English users won't regress but also won't benefit from the fix.
  • The fix is logically correct — the errorlevel semantics with the pipe are sound, the task names don't introduce false positives, and the failure mode on non-English Windows is benign (silent no-op, not a crash or new bug). The one actionable gap is locale sensitivity of the findstr "Running" check, which limits the fix's effectiveness to English Windows installations. That's worth addressing before this is considered fully resolved, but it doesn't break anything new.
  • src/infra/windows-task-restart.ts — specifically the findstr /I "Running" status check on line 32.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/infra/windows-task-restart.ts
Line: 32-33

Comment:
**`findstr "Running"` is locale-sensitive**

`schtasks /Query /FO LIST` outputs localised status strings on non-English Windows installations (e.g. German "Wird ausgeführt", French "En cours d'exécution"). On those systems `findstr /I "Running"` will never match, so the early-exit guard silently does nothing and the script falls back to the original retry-12-times behaviour — leaving the duplicate-window bug present for non-English users.

A locale-independent alternative is to use PowerShell's `Get-ScheduledTask`, whose `.State` property is always an English enum value regardless of OS locale:

```
`powershell -Command "(Get-ScheduledTask -TaskName ${quotedTaskName} -ErrorAction SilentlyContinue).State" 2>nul | findstr /I "Running" >nul 2>&1`,
```

Or, if PowerShell is undesirable, check the exit code of `schtasks /Run` for the specific HRESULT `0x41301` ("task is already running") rather than querying status first — though that code isn't easily testable from a plain batch script.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(windows): prevent restart race from ..." | Re-trigger Greptile

Comment thread src/infra/windows-task-restart.ts Outdated
Comment on lines +32 to +33
`schtasks /Query /TN ${quotedTaskName} /FO LIST 2>nul | findstr /I "Running" >nul 2>&1`,
"if not errorlevel 1 goto cleanup",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 findstr "Running" is locale-sensitive

schtasks /Query /FO LIST outputs localised status strings on non-English Windows installations (e.g. German "Wird ausgeführt", French "En cours d'exécution"). On those systems findstr /I "Running" will never match, so the early-exit guard silently does nothing and the script falls back to the original retry-12-times behaviour — leaving the duplicate-window bug present for non-English users.

A locale-independent alternative is to use PowerShell's Get-ScheduledTask, whose .State property is always an English enum value regardless of OS locale:

`powershell -Command "(Get-ScheduledTask -TaskName ${quotedTaskName} -ErrorAction SilentlyContinue).State" 2>nul | findstr /I "Running" >nul 2>&1`,

Or, if PowerShell is undesirable, check the exit code of schtasks /Run for the specific HRESULT 0x41301 ("task is already running") rather than querying status first — though that code isn't easily testable from a plain batch script.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/infra/windows-task-restart.ts
Line: 32-33

Comment:
**`findstr "Running"` is locale-sensitive**

`schtasks /Query /FO LIST` outputs localised status strings on non-English Windows installations (e.g. German "Wird ausgeführt", French "En cours d'exécution"). On those systems `findstr /I "Running"` will never match, so the early-exit guard silently does nothing and the script falls back to the original retry-12-times behaviour — leaving the duplicate-window bug present for non-English users.

A locale-independent alternative is to use PowerShell's `Get-ScheduledTask`, whose `.State` property is always an English enum value regardless of OS locale:

```
`powershell -Command "(Get-ScheduledTask -TaskName ${quotedTaskName} -ErrorAction SilentlyContinue).State" 2>nul | findstr /I "Running" >nul 2>&1`,
```

Or, if PowerShell is undesirable, check the exit code of `schtasks /Run` for the specific HRESULT `0x41301` ("task is already running") rather than querying status first — though that code isn't easily testable from a plain batch script.

How can I resolve this? If you propose a fix, please make it concise.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed: the script uses PowerShell \Get-ScheduledTask\ whose .State\ property returns an English enum (\Running, \Ready, etc.) regardless of OS locale. The \ indstr /I \Running\ here matches that PowerShell output, not \schtasks /Query\ output — so it's locale-safe.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a830edaf61

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/infra/windows-task-restart.ts Outdated
@@ -27,6 +27,10 @@ function buildScheduledTaskRestartScript(taskName: string): string {
":retry",
`timeout /t ${TASK_RESTART_RETRY_DELAY_SEC} /nobreak >nul`,
"set /a attempts+=1",
// Check if the task is already running (e.g. CLI restart beat us to it).
// schtasks /Query output contains "Running" when the task is active.
`schtasks /Query /TN ${quotedTaskName} /FO LIST 2>nul | findstr /I "Running" >nul 2>&1`,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid locale-dependent running check in restart helper

This findstr /I "Running" guard depends on English schtasks output, but Windows localizes task status text, so on non-English installs the helper will miss the "already running" case and still execute /Run, allowing the duplicate-start race this patch is meant to prevent. Use a locale-invariant signal (for example Last Run Result code 0x41301) instead of matching an English token in the raw query output.

Useful? React with 👍 / 👎.

@andyk-ms
Copy link
Copy Markdown
Contributor Author

Addressed the locale-sensitivity feedback: replaced schtasks /Query /FO LIST | findstr /I Running with PowerShell's Get-ScheduledTask. The .State property returns an English enum (Running, Ready, etc.) regardless of OS locale, unlike schtasks which outputs localized status strings on non-English Windows.

@andyk-ms
Copy link
Copy Markdown
Contributor Author

andyk-ms commented Apr 4, 2026

CI: The only failure is security-fast which appears to be an infra flake (not related to this PR's changes). Could a maintainer re-run the failed job? All other 45/46 checks pass. Thanks!

@andyk-ms andyk-ms force-pushed the fix/windows-restart-race branch from ed7abae to 7895ec0 Compare April 5, 2026 22:30
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7895ec0bac

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/infra/windows-task-restart.ts Outdated
// Use PowerShell Get-ScheduledTask instead of schtasks /Query because
// schtasks outputs localized status strings on non-English Windows,
// while Get-ScheduledTask .State is always an English enum.
`powershell -Command "(Get-ScheduledTask -TaskName '${taskName}' -ErrorAction SilentlyContinue).State" 2>nul | findstr /I "Running" >nul 2>&1`,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Escape task name in PowerShell running-state probe

The new running check embeds taskName directly in -TaskName '${taskName}' inside a .cmd line, so values from OPENCLAW_WINDOWS_TASK_NAME (or profile-derived names) containing characters like ' or % are parsed incorrectly (' breaks the PowerShell string, % is expanded by cmd before PowerShell runs). In those cases the probe never detects an already-running task and the script still calls schtasks /Run, which can recreate the duplicate-start race this patch is trying to prevent.

Useful? React with 👍 / 👎.

@andyk-ms
Copy link
Copy Markdown
Contributor Author

Checked upstream main — no fixes landed for the failing \�xtension-signal\ / \�xtension-memory-core\ tests. These failures are pre-existing upstream flakes (Signal SSE stream timeout, memory module unexpected status) unrelated to the \Get-ScheduledTask\ change in this PR. The actual fix is solid. Could a maintainer re-run CI or confirm these are known flakes? Happy to rebase once upstream stabilises these tests.

@andyk-ms andyk-ms force-pushed the fix/windows-restart-race branch from 7895ec0 to b0ef1ef Compare April 12, 2026 18:21
@andyk-ms
Copy link
Copy Markdown
Contributor Author

Rebased onto current main and addressed the P2 review comment: task name single quotes are now escaped by doubling ('') for safe PowerShell string interpolation.

@andyk-ms andyk-ms force-pushed the fix/windows-restart-race branch from b0ef1ef to ec1f20d Compare April 16, 2026 02:04
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ec1f20d6c7

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/infra/windows-task-restart.ts Outdated
// Use PowerShell Get-ScheduledTask instead of schtasks /Query because
// schtasks outputs localized status strings on non-English Windows,
// while Get-ScheduledTask .State is always an English enum.
`powershell -Command "(Get-ScheduledTask -TaskName '${taskName.replace(/'/g, "''")}' -ErrorAction SilentlyContinue).State" 2>nul | findstr /I "Running" >nul 2>&1`,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Run scheduled-task probe without PowerShell profiles

The new running-state probe uses powershell -Command without -NoProfile/-NonInteractive, so user or system PowerShell profiles execute before Get-ScheduledTask. On machines with customized profiles, extra stdout (for example text containing Running) can make findstr report a false match, and interactive/profile startup code can block the probe entirely; both cases prevent the helper from reliably reaching schtasks /Run during self-restart.

Useful? React with 👍 / 👎.

@andyk-ms
Copy link
Copy Markdown
Contributor Author

Friendly ping — CI is all green. Would appreciate a review when you get a chance. Thanks!

@andyk-ms
Copy link
Copy Markdown
Contributor Author

Another friendly ping — CI remains green and the fix is solid. Would appreciate a maintainer review when you get a chance. Thanks!

@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented Apr 30, 2026

Codex review: needs real behavior proof before merge.

Summary
The PR adds a locale-stable PowerShell scheduled-task state probe before the Windows self-restart helper retries schtasks /Run, plus focused tests and a changelog entry.

Reproducibility: yes. Source inspection on current main shows the CLI restart path starts the scheduled task while the shutdown self-restart helper separately retries schtasks /Run, and the related report gives a concrete native Windows openclaw gateway restart path.

Real behavior proof
Needs real behavior proof before merge: Only tests/build/check output is described; no after-fix native Windows restart output, screenshot, recording, or redacted log demonstrates the real duplicate-window fix. After adding proof, update the PR body; ClawSweeper should re-review automatically. If it does not, ask a maintainer to comment @clawsweeper re-review.

Next step before merge
Human/contributor action is needed for after-fix native Windows behavior proof; there is no narrow code repair for ClawSweeper to make.

Security
Cleared: Security pass cleared the refreshed diff; the local PowerShell probe uses -NoProfile -NonInteractive and routes the command through the existing cmd quoting layer.

Review details

Best possible solution:

Land this patch or an equivalent current-main port after adding redacted native Windows restart proof that one gateway instance remains, while preserving the existing fallback behavior and regression tests.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection on current main shows the CLI restart path starts the scheduled task while the shutdown self-restart helper separately retries schtasks /Run, and the related report gives a concrete native Windows openclaw gateway restart path.

Is this the best way to solve the issue?

Yes for the code direction. The self-restart running-state guard is the narrowest maintainable fix for this duplicate /Run race, and the refreshed diff addresses the prior quoting/profile/changelog gaps; it is not merge-ready without real behavior proof.

What I checked:

  • Current main self-restart retries /Run unguarded: buildScheduledTaskRestartScript() enters :retry, waits, increments attempts, and calls schtasks /Run without first checking whether another restart path already started the scheduled task. (src/infra/windows-task-restart.ts:40, 9eaca28ef715)
  • Current main CLI restart is the competing start path: restartScheduledTask() ends the scheduled task and then starts it again through runScheduledTaskOrThrow(), matching the race described in Bug: gateway restart spawns duplicate processes on Windows (3 windows) #52044. (src/daemon/schtasks.ts:994, 9eaca28ef715)
  • Latest PR head addresses prior command-safety blockers: The provided PR diff builds a PowerShell Get-ScheduledTask state command, wraps it with quoteCmdScriptArg(), and invokes powershell.exe -NoProfile -NonInteractive -ExecutionPolicy Bypass before retrying schtasks /Run. (src/infra/windows-task-restart.ts:48, 9a31bc85e7f8)
  • Focused tests and changelog are now present in the PR diff: The refreshed patch adds assertions for the running-task probe order and PowerShell single-quote escaping, and adds a user-facing Windows/restart changelog entry. (src/infra/windows-task-restart.test.ts:120, 9a31bc85e7f8)
  • Real behavior proof is still missing: The latest maintainer refresh comment lists formatter, Vitest, build, and check results, but no redacted native Windows openclaw gateway restart screenshot, terminal output, recording, or log showing the duplicate-window fix after the patch. (9a31bc85e7f8)

Likely related people:

  • BradGroux: A maintainer force-pushed the prepared head, stated that it was rebased onto current origin/main, added focused coverage and changelog coverage, and reported local verification for the refreshed patch. (role: recent maintainer / likely follow-up owner; confidence: medium; commits: 9a31bc85e7f8; files: src/infra/windows-task-restart.ts, src/infra/windows-task-restart.test.ts, CHANGELOG.md)
  • steipete: Available local blame for the checked-out main attributes the scheduled-task restart helper, restartScheduledTask(), and cmd quoting helper to Peter Steinberger's grafted snapshot commit; older provenance is limited by the shallow checkout. (role: current-main adjacent owner; confidence: low; commits: bb75d408b9f6; files: src/infra/windows-task-restart.ts, src/daemon/schtasks.ts, src/daemon/cmd-argv.ts)

Remaining risk / open question:

  • No after-fix native Windows restart proof is present yet, so the duplicate-window suppression has not been demonstrated in a real scheduled-task setup.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 0071f67fe6ea.

When CLI runs 'openclaw gateway restart', it does schtasks /End then
schtasks /Run. Meanwhile the dying gateway fires a self-restart script
(relaunchGatewayScheduledTask) that also retries schtasks /Run up to
12 times. Both succeed, creating duplicate gateway windows.

Fix: the self-restart .cmd script now checks if the task is already
running before attempting schtasks /Run. If another restart (CLI) beat
it there, the script exits cleanly instead of spawning a duplicate.

Fixes openclaw#52044
@BradGroux BradGroux force-pushed the fix/windows-restart-race branch from ec1f20d to 9a31bc8 Compare May 9, 2026 09:08
@BradGroux
Copy link
Copy Markdown
Member

Maintainer refresh: rebased this onto current origin/main and kept the Windows scheduled-task restart fix aligned with the current restart-log/fallback helper.

Prepared head: 9a31bc85e7f8b12a8bc4cbd62b81e69f2c980eaf

Local verification passed:

  • pnpm exec oxfmt --check --threads=1 src/infra/windows-task-restart.ts src/infra/windows-task-restart.test.ts CHANGELOG.md
  • pnpm vitest run src/infra/windows-task-restart.test.ts
  • pnpm build
  • pnpm check

I also added focused coverage for the duplicate /Run guard and PowerShell task-name escaping, plus the changelog entry. Waiting on fresh GitHub CI before this can be merged.

@openclaw-barnacle openclaw-barnacle Bot added the triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. label May 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size: XS triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: gateway restart spawns duplicate processes on Windows (3 windows)

2 participants