fix(cli): stabilize Windows scheduled-task restart health after ready by openclaw-clownfish[bot] · Pull Request #73889 · openclaw/openclaw

openclaw-clownfish · 2026-04-29T00:26:37Z

Summary

keep [Bug]: Windows Scheduled Task gateway restart/health becomes inconsistent after ready #63491 as the canonical live Windows Scheduled Task restart/health issue
add narrow source-backed handling or regression coverage for the remaining post-ready HTTP loss path
preserve the existing schtasks Last Result and policy-close behavior from current main

Validation

pnpm check:changed

References

Canonical issue: [Bug]: Windows Scheduled Task gateway restart/health becomes inconsistent after ready #63491
Related diagnostics: bug(windows): gateway restart false-times out because schtasks parser misses 'Last Result' and health probe rejects 1008 policy closes #48771 and [Bug]: OpenClaw gateway restart #48766
Prior credited policy-close work: fix(cli): accept local policy closes during gateway restart health checks #48801 and fix(cli): tighten Windows restart policy-close health checks #72660
Related SIGUSR1 work: fix(gateway): SIGUSR1 restart fast path that doesn't break Windows schtasks #68853

ProjectClownfish replacement details:

Cluster: ghcrawl-156629-autonomous-smoke
Source PRs: fix(cli): accept local policy closes during gateway restart health checks #48801, fix(cli): tighten Windows restart policy-close health checks #72660
Credit: Credit @ufomaker for the live [Bug]: Windows Scheduled Task gateway restart/health becomes inconsistent after ready #63491 canonical report.; Credit @genoooool for the bug(windows): gateway restart false-times out because schtasks parser misses 'Last Result' and health probe rejects 1008 policy closes #48771 diagnosis of schtasks Last Result and policy-close restart-health behavior.; Credit @DanTheRealMan for the earlier [Bug]: OpenClaw gateway restart #48766 Windows restart timeout report.; If the implementation reuses the fix(cli): accept local policy closes during gateway restart health checks #48801 policy-close direction, preserve source PR credit for @MarsDoge and mention that fix(cli): tighten Windows restart policy-close health checks #72660 already carried that replacement path forward.
Validation: pnpm check:changed

greptile-apps · 2026-04-29T00:29:44Z

Greptile Summary

Adds a requireReachability flag to the Windows scheduled-task restart health path, ensuring the loopback HTTP/WebSocket probe must succeed before declaring the restart healthy — even after schtasks and port ownership show ready. The change is narrow, well-tested, and consistent with the existing includeUnknownListenersAsStale Windows-only guard.

There is a subtle interaction worth reviewing: when requireReachability=true and the first probe throws (e.g. 3s timeout) rather than returning {ok: false}, reachability stays null, and the pre-existing best-effort second probe block (lines 390-398) re-runs unconditionally because !expectedVersion is true. If that second probe succeeds, healthy ends up true, silently bypassing the requirement. In the primary bug scenario the probe returns {ok: false} and the memoised result prevents this — but the throw path is worth guarding explicitly.

Confidence Score: 4/5

Safe to merge; the fix correctly handles the main Windows post-ready HTTP loss scenario, with one narrow edge case where a thrown probe could be bypassed by the existing best-effort retry block.

Only P2 findings are present. The core logic change is correct for the targeted bug (probe returning {ok:false}), and tests confirm the expected behavior. The probe-throw bypass is unlikely in the specific failure mode being fixed but represents a logical gap worth addressing.

src/cli/daemon-cli/restart-health.ts lines 375-398 — the interaction between the new requireReachability first block and the pre-existing best-effort second block.

Comments Outside Diff (1)

src/cli/daemon-cli/restart-health.ts, line 375-398 (link)

requireReachability enforcement bypassed on probe throw

When requireReachability=true (Windows path) and the first probe throws (e.g. a 3s timeout), healthy is set to false and reachability remains null. The second best-effort block (lines 390–398) then fires — !healthy && running && portUsage.status === "busy" && !expectedVersion — and re-executes loadReachability() from scratch. If that second probe succeeds, healthy ends up true, silently circumventing the requireReachability constraint.

In the specific bug scenario (post-ready HTTP loss), the probe likely returns {ok: false} rather than throwing, so the memoised result from the first call would be returned unchanged. But for timeout failures the path diverges. Consider guarding the second block to skip when requireReachability is set:

if (!healthy && running && portUsage.status === "busy" && !expectedVersion && !params.requireReachability) {

Prompt To Fix With AI

This is a comment left during a code review.
Path: src/cli/daemon-cli/restart-health.ts
Line: 375-398

Comment:
**`requireReachability` enforcement bypassed on probe throw**

When `requireReachability=true` (Windows path) and the first probe throws (e.g. a 3s timeout), `healthy` is set to `false` and `reachability` remains `null`. The second best-effort block (lines 390–398) then fires — `!healthy && running && portUsage.status === "busy" && !expectedVersion` — and re-executes `loadReachability()` from scratch. If that second probe succeeds, `healthy` ends up `true`, silently circumventing the `requireReachability` constraint.

In the specific bug scenario (post-ready HTTP loss), the probe likely returns `{ok: false}` rather than throwing, so the memoised result from the first call would be returned unchanged. But for timeout failures the path diverges. Consider guarding the second block to skip when `requireReachability` is set:

```typescript
if (!healthy && running && portUsage.status === "busy" && !expectedVersion && !params.requireReachability) {
```

How can I resolve this? If you propose a fix, please make it concise.

Prompt To Fix All With AI

This is a comment left during a code review.
Path: src/cli/daemon-cli/restart-health.ts
Line: 375-398

Comment:
**`requireReachability` enforcement bypassed on probe throw**

When `requireReachability=true` (Windows path) and the first probe throws (e.g. a 3s timeout), `healthy` is set to `false` and `reachability` remains `null`. The second best-effort block (lines 390–398) then fires — `!healthy && running && portUsage.status === "busy" && !expectedVersion` — and re-executes `loadReachability()` from scratch. If that second probe succeeds, `healthy` ends up `true`, silently circumventing the `requireReachability` constraint.

In the specific bug scenario (post-ready HTTP loss), the probe likely returns `{ok: false}` rather than throwing, so the memoised result from the first call would be returned unchanged. But for timeout failures the path diverges. Consider guarding the second block to skip when `requireReachability` is set:

```typescript
if (!healthy && running && portUsage.status === "busy" && !expectedVersion && !params.requireReachability) {
```

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (1): Last reviewed commit: "fix(cli): stabilize Windows scheduled-ta..." | Re-trigger Greptile}

clawsweeper · 2026-04-29T00:30:55Z

Codex review: needs changes before merge.

Summary
The PR adds a Windows-only requireReachability restart-health option, forwards it through gateway restart, adds regression coverage, and adds a changelog entry.

Reproducibility: yes. by source inspection rather than a live Windows run. Current main can treat running && ownsPort as healthy in the normal gateway restart path without requiring a loopback HTTP/WS probe.

Next step before merge
A narrow automated branch repair can change only the changelog marker from a closing keyword to a non-closing reference before normal PR validation.

Security
Cleared: The diff tightens local restart-health probing and updates tests/changelog without adding dependencies, CI execution, secret handling, downloads, or new external trust boundaries.

Review findings

[P3] Reference the canonical issue without closing it — CHANGELOG.md:34

Review details

Best possible solution:

Land the focused Windows-only reachability gate after changing the changelog to reference #63491 without closing it, leaving that issue open for the broader Windows Scheduled Task instability investigation.

Do we have a high-confidence way to reproduce the issue?

Yes, by source inspection rather than a live Windows run. Current main can treat running && ownsPort as healthy in the normal gateway restart path without requiring a loopback HTTP/WS probe.

Is this the best way to solve the issue?

Yes, with one wording correction. A Windows-only reachability flag reuses the existing local probe and preserves prior schtasks and policy-close behavior; the changelog should use a non-closing reference to #63491.

Full review comments:

[P3] Reference the canonical issue without closing it — CHANGELOG.md:34
The PR body says [Bug]: Windows Scheduled Task gateway restart/health becomes inconsistent after ready #63491 remains the live canonical Windows Scheduled Task issue, but the changelog entry says Fixes #63491. Since this patch only covers the post-ready reachability path, use Refs #63491 or similar wording so the broader issue is not auto-closed on merge.
Confidence: 0.91

Overall correctness: patch is correct
Overall confidence: 0.86

Acceptance criteria:

pnpm exec oxfmt --check --threads=1 CHANGELOG.md
pnpm test src/cli/daemon-cli/restart-health.test.ts src/cli/daemon-cli/lifecycle.test.ts
pnpm check:changed

What I checked:

current_main_restart_call_lacks_required_reachability: The scheduled-task restart path calls waitForGatewayHealthyRestart with includeUnknownListenersAsStale only; there is no option requiring loopback reachability before reporting a healthy restart. (src/cli/daemon-cli/lifecycle.ts:285, e8f13c625e34)
current_main_can_return_healthy_from_ownership_only: inspectGatewayRestart initializes healthy from running && ownsPort, and only forces the reachability probe when expectedVersion is present; this leaves the post-ready HTTP/WS loss path source-reproducible. (src/cli/daemon-cli/restart-health.ts:378, e8f13c625e34)
prior_policy_close_behavior_preserved_on_main: Current main already has the local 1008 policy-close allowlist for connect/device/pairing/auth cases, so the PR is extending restart health rather than replacing that merged behavior. (src/cli/daemon-cli/restart-health.ts:81, e8f13c625e34)
schtasks_last_result_fix_already_on_main: Current main accepts Last Result as an alias for Last Run Result, matching the PR body's intent to preserve the existing scheduled-task parser fix. (src/daemon/schtasks.ts:190, e8f13c625e34)
pr_diff_adds_targeted_reachability_requirement: The provided PR diff adds requireReachability, checks reachability when that flag is set, forwards it through the waiter, and passes it on Windows gateway restart calls. (src/cli/daemon-cli/restart-health.ts:277, 6b1ccb6d6d23)
pr_test_covers_post_ready_probe_loss: The provided PR diff adds a regression test where Windows schtasks owns the port but the first loopback probe is down, and the waiter keeps polling until a later successful probe. (src/cli/daemon-cli/restart-health.test.ts:245, 6b1ccb6d6d23)

Likely related people:

vincentkoc: Provided related PR context shows merged PR fix(cli): tighten Windows restart policy-close health checks #72660 carried forward and tightened the Windows restart policy-close path that this PR preserves and extends; local current-main blame also routes the central restart-health files through Vincent Koc in this checkout. (role: recent maintainer for Windows restart-health policy-close behavior; confidence: high; commits: c9ece5df3786, c8fa0fd1c9d5; files: src/cli/daemon-cli/restart-health.ts, src/cli/daemon-cli/restart-health.test.ts, src/cli/daemon-cli/lifecycle.ts)
MarsDoge: Provided related PR context shows fix(cli): accept local policy closes during gateway restart health checks #48801 introduced the local policy-close restart-health approach later carried forward by fix(cli): tighten Windows restart policy-close health checks #72660 and referenced by this PR. (role: source contributor for prior policy-close restart-health work; confidence: medium; commits: e82bc552d094; files: src/cli/daemon-cli/restart-health.ts, src/cli/daemon-cli/restart-health.test.ts)
Peter Steinberger: Local history shows repeated adjacent restart-health work on timeout behavior, health verification, and shared restart-health test fixtures. (role: adjacent maintainer for restart-health timeout and test helper work; confidence: medium; commits: 905e355f651d, 80f430c2be2e, 5ad27fa25f73; files: src/cli/daemon-cli/restart-health.ts, src/cli/daemon-cli/restart-health.test.ts, src/cli/daemon-cli/lifecycle.ts)

Remaining risk / open question:

No live Windows Scheduled Task run was executed in this read-only review; merge should still wait for the targeted tests and changed gate after the changelog wording fix.

Codex review notes: model gpt-5.5, reasoning high; reviewed against e8f13c625e34.

fix(cli): stabilize Windows scheduled-task restart health after ready

6b1ccb6

openclaw-clownfish Bot added the clawsweeper Tracked by ClawSweeper automation label Apr 29, 2026

openclaw-barnacle Bot added cli CLI command changes size: S labels Apr 29, 2026

clawsweeper Bot mentioned this pull request Apr 29, 2026

[Bug]: Windows Scheduled Task gateway restart/health becomes inconsistent after ready #63491

Open

BradGroux mentioned this pull request Apr 29, 2026

WORKING: All Microsoft Issues and PRs (refresh) #74163

Draft

clawsweeper Bot mentioned this pull request May 1, 2026

fix(gateway): SIGUSR1 restart fast path that doesn't break Windows schtasks #68853

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(cli): stabilize Windows scheduled-task restart health after ready#73889

fix(cli): stabilize Windows scheduled-task restart health after ready#73889
openclaw-clownfish[bot] wants to merge 1 commit intomainfrom
clownfish/ghcrawl-156629-autonomous-smoke

openclaw-clownfish Bot commented Apr 29, 2026

Uh oh!

greptile-apps Bot commented Apr 29, 2026 •

edited

Loading

Comments Outside Diff (1)

Uh oh!

clawsweeper Bot commented Apr 29, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

0 participants

Uh oh!

Conversation

openclaw-clownfish Bot commented Apr 29, 2026

Summary

Validation

References

Uh oh!

greptile-apps Bot commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 4/5

Comments Outside Diff (1)

Uh oh!

clawsweeper Bot commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

0 participants

greptile-apps Bot commented Apr 29, 2026 •

edited

Loading

clawsweeper Bot commented Apr 29, 2026 •

edited

Loading