fix(windows): repair doctor update fallback migration#88311

Draft
vincentkoc wants to merge 3 commits into
mainfrom
fix/windows-doctor-startup-migration

Conversation

@vincentkoc vincentkoc commented May 30, 2026

Member

Summary

  • fixes Windows update-mode doctor repair so running managed gateways are activated/restarted, while stopped services stay staged
  • audits stale OPENCLAW_SERVICE_VERSION values and repairs version drift through the existing gateway service doctor path
  • removes legacy Windows Startup-folder fallback launchers (.cmd and hidden .vbs) only after Task Scheduler reports running evidence
  • preserves auth during update-mode task rewrites by migrating a legacy embedded OPENCLAW_GATEWAY_TOKEN into gateway.auth.token before reinstalling the task
  • keeps config writes safe across both modern update parents and legacy update handoffs, so later doctor writes do not drop recovered tokens and older parents are not given unreadable config
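
The first bullet's running-versus-stopped split can be sketched as a small decision function. This is a minimal illustration only: the type names, the `"install"`/`"stage"` labels, and the choice to stage an unknown runtime are assumptions made here, not the PR's actual code.

```typescript
// Hypothetical types and names for illustration; the PR's real code may differ.
type Platform = "win32" | "darwin" | "linux";
type GatewayRuntime = "running" | "stopped" | "unknown";
type UpdateRepairAction = "install" | "stage";

function chooseUpdateRepairAction(
  platform: Platform,
  runtime: GatewayRuntime,
): UpdateRepairAction {
  // Only a Windows gateway that is already running is activated/restarted.
  if (platform === "win32" && runtime === "running") {
    return "install";
  }
  // Assumption: stopped or unknown runtimes, and non-Windows platforms, stay staged.
  return "stage";
}

const runningAction = chooseUpdateRepairAction("win32", "running"); // "install"
const stoppedAction = chooseUpdateRepairAction("win32", "stopped"); // "stage"
```

Staging by default keeps the update path conservative: a service the user deliberately stopped is not started as a side effect of an update.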

Fixes #87156.

Verification

  • node scripts/run-vitest.mjs src/commands/doctor-gateway-services.test.ts src/flows/doctor-health-contributions.test.ts src/commands/doctor-update.test.ts src/daemon/schtasks.startup-fallback.test.ts src/daemon/schtasks.stop.test.ts -- --run - passed after rebase on 11f564ede1ba3acade0a697090e62e21794c8e59 (100 assertions across 5 files)
  • node_modules/.bin/oxfmt --check src/commands/doctor-gateway-services.ts src/commands/doctor-gateway-services.test.ts src/flows/doctor-health-contributions.ts src/flows/doctor-health-contributions.test.ts - passed
  • node scripts/run-oxlint.mjs src/commands/doctor-gateway-services.ts src/commands/doctor-gateway-services.test.ts src/flows/doctor-health-contributions.ts src/flows/doctor-health-contributions.test.ts - passed
  • git diff --check origin/main...HEAD && git diff --check - passed
  • .agents/skills/autoreview/scripts/autoreview --mode branch --base origin/main - clean after addressing accepted findings for token propagation, update-safe write options, and legacy update-parent write guards
  • Native Windows Crabbox proof run_52d49841f33d on AWS Windows (cbx_013fda705836) created both Startup-folder fallback extensions under the real Windows Startup folder and removed both: windows_startup_dual_extension_cleanup=ok, removed_extensions=.cmd,.vbs
  • Native Windows GitHub probe https://github.com/openclaw/openclaw/actions/runs/26690553757 passed on windows-2025; WSL was present (wsl_status_exit=0, default version 2) but no distro was installed, so this is WSL availability proof, not successful WSL distro execution
  • Crabbox changed gate run_a9b4283c04ab reached pnpm check:changed; conflict markers, changelog attributions, dependency guards, core typecheck, and core test typecheck passed, then the gate failed in an untouched current-main Discord boundary dts check: extensions/discord/src/monitor/gateway-plugin.ts:165 (1000 | 1001 compared with 1008). This branch does not touch extensions/discord/**.

Real behavior proof

Behavior addressed: Windows doctor/update migration now distinguishes running vs stopped managed gateways, detects stale service versions, preserves gateway auth during task rewrites, restarts stale installed/running gateways after update, and removes old Startup-folder fallback launchers only after Task Scheduler running evidence.

Real environment tested: native AWS Windows via Crabbox for Startup-folder dual-extension cleanup; GitHub-hosted Windows Server 2025 probe for native Windows and WSL availability; focused local Vitest shards for daemon startup fallback, daemon stop, doctor update, doctor gateway services, and doctor health config handoff.

Exact steps or command run after this patch: focused Vitest/format/lint/autoreview commands listed above; native Windows cleanup probe run_52d49841f33d; remote changed gate node scripts/crabbox-wrapper.mjs run --provider aws --idle-timeout 90m --ttl 240m --timing-json --shell -- "pnpm check:changed" (run_a9b4283c04ab).

Evidence after fix: focused tests passed after the latest rebase at 11f564ede1ba3acade0a697090e62e21794c8e59; autoreview is clean; native Windows cleanup proof removed both .cmd and .vbs Startup-folder fallbacks; remote changed gate passed the touched core typecheck and test typecheck lanes before hitting the unrelated Discord boundary dts failure.

Observed result after fix: the Windows doctor/service migration paths are validated for the touched unit behavior and the native Windows Startup-folder cleanup path. The broad changed gate is blocked by an untouched current-main Discord type issue, not by this patch.

What was not tested: successful WSL2 distro boot/execution inside a Windows runner; direct Blacksmith Testbox proof, because Blacksmith auth is not available in this environment.

@vincentkoc vincentkoc self-assigned this May 30, 2026
@openclaw-barnacle openclaw-barnacle Bot added gateway Gateway runtime commands Command implementations size: M maintainer Maintainer-authored PR labels May 30, 2026

clawsweeper Bot commented May 30, 2026

Contributor

Codex review: needs maintainer review before merge. Reviewed May 31, 2026, 8:31 AM ET / 12:31 UTC.

Summary
The PR changes Windows doctor/update service repair to audit stale gateway service versions, preserve recovered gateway auth, restart or activate already-running managed gateways, stage stopped ones, and remove legacy Startup-folder fallbacks only after Scheduled Task running evidence.

PR surface: Source +158, Tests +488. Total +646 across 11 files.

Reproducibility: yes. Source inspection and the linked user report give a high-confidence reproduction path: current main stages all update-mode service repairs and does not remove hidden .vbs fallbacks, while the linked issue reports stale Windows fallback and stale gateway process after update. I did not execute a native Windows reproduction during this read-only review.

Review metrics: 2 noteworthy metrics.

  • Config Repair Write Surface: 1 in-repair write added, 1 final-write handoff updated. The PR can now persist recovered gateway.auth.token during a service repair and must keep that config alive for later doctor writes.
  • Windows Fallback Cleanup Guards: 2 launcher extensions covered, 2 activation paths guarded. Both .cmd and hidden .vbs fallbacks are now removed after install/restart only when Scheduled Task running evidence is present.
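
The fallback cleanup guard described above might look roughly like the following. It is a sketch under stated assumptions: the evidence shape, the function names, and the launcher base name are hypothetical, and the result-code check uses the documented Task Scheduler constant SCHED_S_TASK_RUNNING (0x41301, decimal 267009) as a stand-in for whatever `readScheduledTaskRuntime` actually compares.

```typescript
// Hypothetical evidence shape; the real readScheduledTaskRuntime result may differ.
interface TaskRuntimeEvidence {
  status: string | undefined;
  lastRunResult: string | undefined;
}

// The two Startup-folder launcher extensions named in the PR.
const FALLBACK_EXTENSIONS = [".cmd", ".vbs"] as const;

function hasRunningEvidence(evidence: TaskRuntimeEvidence): boolean {
  const statusRunning = evidence.status?.trim().toLowerCase() === "running";
  const result = evidence.lastRunResult?.trim().toLowerCase();
  // SCHED_S_TASK_RUNNING is 0x41301 (267009 in decimal).
  const resultRunning = result === "0x41301" || result === "267009";
  return statusRunning && resultRunning;
}

// Returns the launcher paths that are safe to delete; empty when evidence is missing.
function planFallbackCleanup(
  evidence: TaskRuntimeEvidence,
  startupDir: string,
  baseName: string, // hypothetical launcher base name
): string[] {
  if (!hasRunningEvidence(evidence)) {
    return [];
  }
  return FALLBACK_EXTENSIONS.map((ext) => `${startupDir}\\${baseName}${ext}`);
}
```

Returning a plan instead of deleting files directly keeps the guard a pure function, which makes the "no evidence, no deletion" rule easy to unit test.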

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • none.

Risk before merge

  • [P2] Merging changes Windows update-time service activation and fallback cleanup; if the running-evidence gate is wrong, users could keep a stale gateway or lose their Startup-folder fallback too early.
  • [P1] The PR intentionally migrates an embedded gateway token into config during supported Windows update repairs, so maintainers should accept that upgrade/security boundary before landing.
  • [P1] The broad changed gate reported in the PR body is blocked by an unrelated current-main Discord dts issue, so final landing still needs normal required-check handling even though touched core lanes passed.
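
For maintainers weighing the token-migration boundary, the move from an embedded `OPENCLAW_GATEWAY_TOKEN` into `gateway.auth.token` can be pictured as below. This is an illustrative sketch, not the PR's implementation: the config shape, the function name, and the rule that an existing config token is never overwritten are assumptions.

```typescript
// Hypothetical config shape covering only the field discussed in the PR.
interface GatewayConfig {
  gateway?: { auth?: { token?: string } };
}

interface MigrationResult {
  cfg: GatewayConfig;
  migrated: boolean;
}

function migrateEmbeddedToken(
  cfg: GatewayConfig,
  taskEnv: Record<string, string | undefined>,
): MigrationResult {
  const embedded = taskEnv["OPENCLAW_GATEWAY_TOKEN"]?.trim();
  const existing = cfg.gateway?.auth?.token;
  // Assumption: an existing config token wins, and an empty embedded token is ignored.
  if (!embedded || existing) {
    return { cfg, migrated: false };
  }
  // Build a new config object so the caller decides when to persist it.
  return {
    cfg: {
      ...cfg,
      gateway: {
        ...cfg.gateway,
        auth: { ...cfg.gateway?.auth, token: embedded },
      },
    },
    migrated: true,
  };
}
```

The caller has to keep using the returned config for any later write; otherwise a subsequent doctor write from the older object would drop the recovered token.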

Maintainer options:

  1. Accept Windows Upgrade Semantics (recommended)
    Land after maintainers explicitly accept that update-mode repairs may activate/restart already-running Windows gateways and remove old fallbacks only after Scheduler running evidence.
  2. Ask For Broader Native Proof
    Request another native Windows update transcript if maintainers need direct end-to-end proof of stale service-version repair plus token preservation beyond the focused cleanup proof and unit coverage.
  3. Pause On Stopped Fallback Policy
    Pause this PR if maintainers want stopped Startup-folder fallback installs migrated immediately instead of staged for the next start.

Next step before merge

  • [P2] Keep this in the maintainer review lane because the PR is active, draft/protected, and merge-ready code review still depends on accepting Windows upgrade compatibility and availability risk rather than a narrow automated repair.

Security
Cleared: No concrete security or supply-chain regression found; the diff adds no dependencies or workflows and keeps token migration inside the existing gateway service repair path with legacy-parent and external-repair guards.

Review details

Best possible solution:

Land a maintainer-approved version that keeps the existing service repair ownership, restarts or activates only already-running Windows managed gateways, preserves recovered auth through update-safe config writes, and deletes old Startup-folder fallbacks only after Scheduled Task running evidence.

Do we have a high-confidence way to reproduce the issue?

Yes, source inspection and the linked user report give a high-confidence reproduction path: current main stages all update-mode service repairs and does not remove hidden .vbs fallbacks, while the linked issue reports stale Windows fallback and stale gateway process after update. I did not execute a native Windows reproduction during this read-only review.

Is this the best way to solve the issue?

Yes, the PR uses the existing doctor/service repair path instead of adding a new config surface, and it includes explicit guards for external repair policy, legacy update parents, stopped services, and Scheduled Task running evidence. The remaining judgment is whether maintainers accept the intended Windows upgrade semantics.

AGENTS.md: found and applied where relevant.

Codex review notes: reasoning high; reviewed against 729712d19467.

Label changes

Label justifications:

  • P2: This is a normal-priority Windows doctor/update repair with real gateway availability impact but a bounded affected platform and workflow.
  • merge-risk: 🚨 compatibility: The PR changes upgrade-time config writes, service repair behavior, token migration, and fallback cleanup for existing Windows installs.
  • merge-risk: 🚨 availability: A mistake in the restart/activation/fallback cleanup path could leave the gateway stale, stopped, or without its previous Startup-folder fallback.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (logs): The PR body includes structured after-fix proof with native Windows Crabbox cleanup evidence, a Windows GitHub probe, focused test/lint/autoreview commands, and observed .cmd/.vbs fallback removal; maintainers can still ask for the raw artifact link if desired.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes structured after-fix proof with native Windows Crabbox cleanup evidence, a Windows GitHub probe, focused test/lint/autoreview commands, and observed .cmd/.vbs fallback removal; maintainers can still ask for the raw artifact link if desired.
Evidence reviewed

PR surface:

Source +158, Tests +488. Total +646 across 11 files.

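| Area | Files | Added | Removed | Net |
| --- | --- | --- | --- | --- |
| Source | 6 | 189 | 31 | +158 |
| Tests | 5 | 500 | 12 | +488 |
| Docs | 0 | 0 | 0 | 0 |
| Config | 0 | 0 | 0 | 0 |
| Generated | 0 | 0 | 0 | 0 |
| Other | 0 | 0 | 0 | 0 |
| Total | 11 | 689 | 43 | +646 |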

What I checked:

  • Repository policy applied: Root AGENTS.md was read fully; its compatibility guidance applies because the PR changes update, config write, auth token, startup fallback, and gateway service behavior. (AGENTS.md:1)
  • Current main stages all update-mode service repairs: Current main uses service.stage for every update repair mode, which explains why running Windows fallback installs can remain stale instead of being activated/restarted. (src/commands/doctor-gateway-services.ts:632, 729712d19467)
  • Current main misses hidden fallback cleanup: Current main resolves only the primary and .cmd Startup-folder entries, and install/restart do not remove fallbacks after Scheduled Task evidence. (src/daemon/schtasks.ts:93, 729712d19467)
  • PR adds version drift audit and running-only activation: The branch adds stale OPENCLAW_SERVICE_VERSION audit and switches Windows update repair from unconditional staging to install only when the gateway runtime is already running. (src/commands/doctor-gateway-services.ts:464, 11f564ede1ba)
  • PR guards fallback deletion on Scheduler evidence: The branch includes .cmd and .vbs fallback paths and removes them only after readScheduledTaskRuntime reports a running Scheduled Task with a running result code. (src/daemon/schtasks.ts:93, 11f564ede1ba)
  • PR preserves repair config across doctor writes: The gateway services contribution now assigns the returned repaired config to ctx.cfg, and final config writes preserve update-time write options and legacy parent version overrides. (src/flows/doctor-health-contributions.ts:523, 11f564ede1ba)
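
The version drift audit in the checks above amounts to comparing the `OPENCLAW_SERVICE_VERSION` recorded in the service environment with the version being installed. A minimal sketch follows, assuming a plain string comparison and assuming that a missing value is not reported as stale; both choices and all names here are hypothetical.

```typescript
// Hypothetical audit result; field names are illustrative only.
interface VersionAudit {
  stale: boolean;
  recorded: string | undefined;
  expected: string;
}

function auditServiceVersion(
  serviceEnv: Record<string, string | undefined>,
  expected: string,
): VersionAudit {
  // Treat a blank value the same as a missing one.
  const recorded = serviceEnv["OPENCLAW_SERVICE_VERSION"]?.trim() || undefined;
  // Assumption: only a present, different value counts as drift.
  const stale = recorded !== undefined && recorded !== expected;
  return { stale, recorded, expected };
}

// Placeholder version strings, not real release numbers.
const drift = auditServiceVersion({ OPENCLAW_SERVICE_VERSION: "1.2.3" }, "1.3.0");
// drift.stale is true because the recorded value differs from the expected one.
```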

Likely related people:

  • steipete: Peter Steinberger appears throughout the Windows schtasks and doctor service history, including Startup-folder fallback creation/cleanup, gateway lifecycle hardening, and doctor repair-flow refactors. (role: recent area contributor; confidence: high; commits: 433e65711f78, 5189ba851c2d, 5c9e4cd30a1e; files: src/daemon/schtasks.ts, src/commands/doctor-gateway-services.ts, src/flows/doctor-health-contributions.ts)
  • vincentkoc: Vincent Koc authored this PR and also has prior merged history in doctor health contribution and related core flow surfaces, so routing review to him is supported beyond this branch alone. (role: recent area contributor; confidence: medium; commits: 74e7b8d47b18, 8143b9a23ead; files: src/flows/doctor-health-contributions.ts, src/commands/doctor-gateway-services.ts)
  • giulio-leone: The current update-mode skip/stage behavior is adjacent to giulio-leone's prior doctor change to skip service config repairs during updates. (role: introduced adjacent behavior; confidence: medium; commits: 67c7f98c328b; files: src/commands/doctor-gateway-services.ts)
  • tmimmanuel: tmimmanuel previously repaired Windows scheduled task restart/install behavior on the same schtasks surface this PR changes. (role: adjacent Windows task contributor; confidence: medium; commits: 0fef95b17d72; files: src/daemon/schtasks.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 availability 🚨 May cause crashes, hangs, restart loops, stalls, or process outages. labels May 30, 2026
@vincentkoc vincentkoc force-pushed the fix/windows-doctor-startup-migration branch from 41233b5 to d6193de Compare May 30, 2026 11:42
@vincentkoc vincentkoc force-pushed the fix/windows-doctor-startup-migration branch 3 times, most recently from e107b30 to b4f59bc Compare May 30, 2026 12:24
@clawsweeper clawsweeper Bot added rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 30, 2026
@vincentkoc vincentkoc force-pushed the fix/windows-doctor-startup-migration branch 3 times, most recently from e8066b7 to 18a08eb Compare May 30, 2026 17:37
@vincentkoc vincentkoc force-pushed the fix/windows-doctor-startup-migration branch from 18a08eb to 11f564e Compare May 31, 2026 12:22
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 31, 2026
@clawsweeper clawsweeper Bot added rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels May 31, 2026

Labels

  • commands: Command implementations
  • gateway: Gateway runtime
  • maintainer: Maintainer-authored PR
  • merge-risk: 🚨 availability: 🚨 May cause crashes, hangs, restart loops, stalls, or process outages.
  • merge-risk: 🚨 compatibility: 🚨 May break existing users, config, migrations, defaults, or upgrade paths.
  • P2: Normal backlog priority with limited blast radius.
  • proof: sufficient: ClawSweeper judged the real behavior proof convincing.
  • rating: 🐚 platinum hermit: Good normal PR readiness with ordinary maintainer review expected.
  • size: L
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR.

Development

Successfully merging this pull request may close these issues.

[Bug]: Windows doctor update leaves Startup-folder gateway fallback stale and does not install Scheduled Task

1 participant