Skip to content

fix(doctor): commit legacy migrations even when unrelated validation fails#76800

Merged
steipete merged 5 commits intoopenclaw:mainfrom
hclsys:fix/76798-legacy-migration-partial-commit
May 3, 2026
Merged

fix(doctor): commit legacy migrations even when unrelated validation fails#76800
steipete merged 5 commits intoopenclaw:mainfrom
hclsys:fix/76798-legacy-migration-partial-commit

Conversation

@hclsys
Copy link
Copy Markdown
Contributor

@hclsys hclsys commented May 3, 2026

Fixes #76798.

Problem

migrateLegacyConfig returned config: null when post-migration validation found any unrelated issue (e.g. a missing plugin, a broken provider). The caller (applyLegacyCompatibilityStep) then kept the unmigrated config as the candidate, so doctor --fix never wrote the migration to disk.

Concrete repro: user has both agents.defaults.llm.idleTimeoutSeconds (deprecated legacy key) and a stale brave plugin entry. doctor --fix knows how to migrate the llm key but drops it on the floor because the brave plugin fails validation — leaving users stuck in the LKG-restore loop documented in #76700.

Fix

When applyLegacyDoctorMigrations succeeds but validateConfigObjectWithPlugins fails on the result, return config: next (the migrated shape) with partiallyValid: true instead of config: null.

applyLegacyCompatibilityStep already handles this correctly — the !migrated branch was the only blocker. Now the migrated config is committed to state.candidate regardless of unrelated validation issues, so doctor --fix always applies every safe migration it knows about.

Tests

Regression test added to config-flow-steps.test.ts: asserts that when migrateLegacyConfig returns partiallyValid: true, state.candidate and state.cfg are updated to the migrated shape.

🤖 Generated with Claude Code

@openclaw-barnacle openclaw-barnacle Bot added commands Command implementations size: S labels May 3, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 3, 2026

Codex review: passed.

Summary
The PR preserves partially valid doctor legacy migrations, threads an internal skip-plugin-validation write path through doctor config persistence, adds regression coverage, and records the user-facing fix in the changelog.

Reproducibility: yes. Source inspection on current main shows migrateLegacyConfig returns null after unrelated plugin-aware validation failure, and the caller leaves the unmigrated candidate in place.

Next step before merge
No separate repair PR is needed; this automerge-opted PR already contains the focused fix and should proceed through exact-head CI/merge gates.

Security
Cleared: No concrete security or supply-chain regression found; the new skip path is an internal doctor/config write option and raw config validation still runs.

Review details

Best possible solution:

Doctor should persist safe legacy migration output and use an internal write path that skips only plugin-aware validation when unrelated plugin issues remain.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection on current main shows migrateLegacyConfig returns null after unrelated plugin-aware validation failure, and the caller leaves the unmigrated candidate in place.

Is this the best way to solve the issue?

Yes. The PR is the narrow maintainable fix: mark the migration as partially valid, keep the migrated candidate, and skip only plugin-aware write validation while preserving base schema validation.

What I checked:

  • Current main drops migrated config after unrelated validation failure: migrateLegacyConfig validates the migrated config with plugins and returns config: null when that validation fails, discarding the migrated shape. (src/commands/doctor/shared/legacy-config-migrate.ts:13, 7915110259c2)
  • Current main keeps the unmigrated candidate: applyLegacyCompatibilityStep only updates cfg and candidate when migrateLegacyConfig returns a truthy config; the null branch keeps prior state while marking pending changes. (src/commands/doctor/shared/config-flow-steps.ts:26, 7915110259c2)
  • Doctor repair contract is existing behavior: The doctor docs define --fix as the --repair alias and describe non-interactive safe migrations/non-service repairs, so this is a bug fix to an existing repair path. Public docs: docs/cli/doctor.md. (docs/cli/doctor.md:33, 7915110259c2)
  • PR keeps the partial migration result: The PR returns the migrated config with partiallyValid: true, propagates that marker out of applyLegacyCompatibilityStep, and returns skipPluginValidationOnWrite from the doctor config flow. (src/commands/doctor-config-flow.ts:90, e4490be3f266)
  • PR scopes validation skipping to plugin-aware write checks: The write path passes pluginValidation: "skip" only when the internal write option is set, and the underlying validator still runs raw/base config validation before honoring plugin-validation skip. (src/config/io.ts:2025, e4490be3f266)
  • Base schema validation remains active: validateConfigObjectWithPluginsBase calls validateConfigObjectRaw before the pluginValidation === "skip" return, so the PR does not bypass raw config schema validation. (src/config/validation.ts:754, 7915110259c2)

Likely related people:

  • steipete: GitHub path history shows recent work on doctor legacy migration internals, doctor config flow, config recovery, config writes, and include writes; local blame for the current extracted doctor/config files also points to Peter's recent main snapshot. (role: recent maintainer and likely follow-up owner; confidence: high; commits: 820761396da0, d92a634faeb0, 0ee52e940547; files: src/commands/doctor/shared/legacy-config-migrate.ts, src/commands/doctor/shared/config-flow-steps.ts, src/commands/doctor-config-flow.ts)
  • vincentkoc: Recent config recovery and clobber-protection work touches the same config persistence/recovery area used by this doctor write path. (role: adjacent recent maintainer; confidence: medium; commits: 3c5169254362; files: src/config/io.ts)

Remaining risk / open question:

  • Exact-head changed-gate completion is not included in the provided context; the automerge path should still wait for required CI on the current head/base.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 7915110259c2.

@hclsys
Copy link
Copy Markdown
Contributor Author

hclsys commented May 3, 2026

Fixed. Pushed fixup de764fd97a extending the persistence path:

Changes in this fixup:

  • src/config/io.ts: ConfigWriteOptions.skipPluginValidation?: booleanpluginValidation: "skip" when set
  • src/commands/doctor-config-flow.ts: return skipPluginValidationOnWrite: true when legacyMigrationPartiallyValid
  • src/commands/doctor/shared/config-flow-steps.ts: propagate partiallyValid through applyLegacyCompatibilityStep return
  • src/flows/doctor-health-contributions.ts: DoctorConfigResult picks up skipPluginValidationOnWrite; runWriteConfigHealth passes skipPluginValidation: true to replaceConfigFile when set
  • src/commands/doctor-config-flow.test.ts: E2E test verifying skipPluginValidationOnWrite: true is returned when migration is partially valid (#76800)

Tests: pnpm test src/commands/doctor/shared/config-flow-steps.test.ts src/commands/doctor-config-flow.test.ts — 35/35 pass.

@hclsys
Copy link
Copy Markdown
Contributor Author

hclsys commented May 3, 2026

Fixup pushed (SHA `8acad8f93d`):

P2 — exported writeConfigFile wrapper now forwards skipPluginValidation
src/config/io.ts: The exported writeConfigFile now passes pluginValidation: "skip" to createConfigIO when skipPluginValidation is set, so both the write-phase validateConfigObjectRawWithPlugins call and the post-write loadConfig re-read honor the flag.

P2 — include-write fast path now skips plugin validation when flag is set
src/config/mutate.ts: tryWriteSingleTopLevelIncludeMutation now wraps its validateConfigObjectWithPlugins call in if (!params.writeOptions?.skipPluginValidation), so single-key include writes no longer block safe legacy migrations with unrelated plugin schema errors.

Regression test added (src/config/io.write-config.test.ts): Verifies that skipPluginValidation: true allows a write that would fail plugin validation to succeed, while skipPluginValidation: false still rejects it.

Tests: 59/59 pass (config-flow-steps, doctor-config-flow, io.write-config, mutate). Formatter clean.

Re-review progress:

@hclsys
Copy link
Copy Markdown
Contributor Author

hclsys commented May 3, 2026

Fixup pushed (SHA `023a7ceae8`):

P2 — include fast path bails out when skipPluginValidation is set
src/config/mutate.ts: tryWriteSingleTopLevelIncludeMutation now returns false immediately when writeOptions.skipPluginValidation is set. This falls back to the root writer, which already has full end-to-end plugin-validation skipping (both the write-phase validateConfigObjectRawWithPlugins call and the post-write loadConfig re-read via createConfigIO({ pluginValidation: "skip" })). The include fast path's post-write readConfigFileSnapshotForWrite() readback no longer throws on unrelated plugin issues.

Tests: 65/65 pass. Formatter clean.

@steipete
Copy link
Copy Markdown
Contributor

steipete commented May 3, 2026

Thanks, this is the right direction for #76798: safe legacy doctor migrations should not be dropped just because an unrelated plugin/provider validation issue remains.

Current blocker is mechanical rather than conceptual: GitHub reports this branch as conflicting against current main (at least CHANGELOG.md), and the latest head only has light automation checks so far. Please rebase on current main and let the normal CI/changed gate run; after that this should be ready for maintainer landing unless the rebase exposes a real overlap.

@hclsys hclsys force-pushed the fix/76798-legacy-migration-partial-commit branch from 023a7ce to 3769ae9 Compare May 3, 2026 23:12
hclsys added a commit to hclsys/moltbot that referenced this pull request May 3, 2026
…nclaw#76800)

Clawsweeper P2 x2:
1. io.ts exported writeConfigFile wrapper now passes skipPluginValidation to
   createConfigIO so both write-phase validation and post-write loadConfig
   re-read honor the flag.
2. mutate.ts tryWriteSingleTopLevelIncludeMutation now skips plugin validation
   when writeOptions.skipPluginValidation is set, so include-write fast path
   no longer blocks safe legacy migrations with unrelated plugin errors.

Adds regression test: skipPluginValidation bypasses plugin schema rejection
on writeConfigFile and falls back to throwing when flag is not set.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
hclsys added a commit to hclsys/moltbot that referenced this pull request May 3, 2026
…n is set (openclaw#76800)

Clawsweeper P2: after the include write, readConfigFileSnapshotForWrite()
calls loadConfig() which validates with plugins; refreshedSnapshot.valid is
false when an unrelated plugin issue exists, causing the include path to
throw even though skipPluginValidation was requested.

Simplest fix: return false from tryWriteSingleTopLevelIncludeMutation when
skipPluginValidation is set, letting the root writer handle the write with
plugin validation disabled end-to-end (including post-write readback via
createConfigIO({ pluginValidation: "skip" })).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@hclsys
Copy link
Copy Markdown
Contributor Author

hclsys commented May 3, 2026

Rebased onto current main — merge conflict in CHANGELOG.md resolved. CI running.

@hclsys
Copy link
Copy Markdown
Contributor Author

hclsys commented May 3, 2026

Fixed lint failure: src/plugins/install.ts used .sort() (pre-existing violation in main, surfaced by the PR's oxlint run). Changed to .toSorted() per no-array-sort rule. Force-pushed.

hclsys and others added 5 commits May 4, 2026 00:24
…fails (openclaw#76798)

migrateLegacyConfig previously returned config: null when post-migration
validation found any issue (e.g. a missing plugin). The caller then kept
the unmigrated config as the candidate, so doctor --fix never wrote the
legacy migration to disk.

Now when validation fails after a successful migration, the migrated
config is returned with partiallyValid: true. applyLegacyCompatibilityStep
always commits the migrated config to state.candidate, ensuring
agents.defaults.llm and other known-legacy keys are cleaned up on
doctor --fix even when an unrelated provider or plugin issue blocks
the full validator.

Adds regression test asserting that candidate is updated to the migrated
shape when partiallyValid is set.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Thread skipPluginValidationOnWrite through loadAndMaybeMigrateDoctorConfig
return value into runWriteConfigHealth so replaceConfigFile bypasses plugin
validation when migration is only partially valid. Add E2E test verifying
the flag propagates end-to-end.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nclaw#76800)

Clawsweeper P2 x2:
1. io.ts exported writeConfigFile wrapper now passes skipPluginValidation to
   createConfigIO so both write-phase validation and post-write loadConfig
   re-read honor the flag.
2. mutate.ts tryWriteSingleTopLevelIncludeMutation now skips plugin validation
   when writeOptions.skipPluginValidation is set, so include-write fast path
   no longer blocks safe legacy migrations with unrelated plugin errors.

Adds regression test: skipPluginValidation bypasses plugin schema rejection
on writeConfigFile and falls back to throwing when flag is not set.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…n is set (openclaw#76800)

Clawsweeper P2: after the include write, readConfigFileSnapshotForWrite()
calls loadConfig() which validates with plugins; refreshedSnapshot.valid is
false when an unrelated plugin issue exists, causing the include path to
throw even though skipPluginValidation was requested.

Simplest fix: return false from tryWriteSingleTopLevelIncludeMutation when
skipPluginValidation is set, letting the root writer handle the write with
plugin validation disabled end-to-end (including post-write readback via
createConfigIO({ pluginValidation: "skip" })).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@steipete steipete force-pushed the fix/76798-legacy-migration-partial-commit branch from 4ae444b to e4490be Compare May 3, 2026 23:28
@steipete
Copy link
Copy Markdown
Contributor

steipete commented May 3, 2026

@clawsweeper automerge

@clawsweeper clawsweeper Bot added the clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge label May 3, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 3, 2026

🦞👀
ClawSweeper saw the passing review, but did not merge yet.

Source: clawsweeper[bot]
Feedback: structured ClawSweeper verdict: pass (sha=e4490be3f266f7c0263e950e2ad05ae52b6daba4)
Merge status: merge command failed: GraphQL: Base branch was modified. Review and try the merge again. (mergePullRequest)

I left the PR open for the remaining gate instead of bypassing it.

Automerge progress:

  • 2026-05-03 23:29:45 UTC review queued e4490be3f266 (queued)
  • 2026-05-03 23:34:10 UTC review passed e4490be3f266 (structured ClawSweeper verdict: pass (sha=e4490be3f266f7c0263e950e2ad05ae52b6da...)

@clawsweeper clawsweeper Bot added the clawsweeper:merge-ready ClawSweeper found the PR merge-ready but a human gate is still closed label May 3, 2026
@steipete steipete merged commit b1db87f into openclaw:main May 3, 2026
112 of 113 checks passed
vincentkoc added a commit that referenced this pull request May 3, 2026
#76800 (fixes #76798) added user-facing `doctor --fix` behavior to
commit safe legacy migrations even when unrelated validation issues
prevent full validation, but the existing entry credited no contributor
and used a bare PR/issue reference. Promote #76798 to a `Fixes` ref,
add the merging PR ref (#76800), and credit the human contributor
@hclsys per CLAUDE.md changelog-attribution rules.
arieldiego73 pushed a commit to arieldiego73/openclaw that referenced this pull request May 5, 2026
openclaw#76800 (fixes openclaw#76798) added user-facing `doctor --fix` behavior to
commit safe legacy migrations even when unrelated validation issues
prevent full validation, but the existing entry credited no contributor
and used a bare PR/issue reference. Promote openclaw#76798 to a `Fixes` ref,
add the merging PR ref (openclaw#76800), and credit the human contributor
@hclsys per CLAUDE.md changelog-attribution rules.
lxe pushed a commit to lxe/openclaw that referenced this pull request May 6, 2026
…fails (openclaw#76800)

* fix(doctor): commit legacy migrations even when unrelated validation fails (openclaw#76798)

migrateLegacyConfig previously returned config: null when post-migration
validation found any issue (e.g. a missing plugin). The caller then kept
the unmigrated config as the candidate, so doctor --fix never wrote the
legacy migration to disk.

Now when validation fails after a successful migration, the migrated
config is returned with partiallyValid: true. applyLegacyCompatibilityStep
always commits the migrated config to state.candidate, ensuring
agents.defaults.llm and other known-legacy keys are cleaned up on
doctor --fix even when an unrelated provider or plugin issue blocks
the full validator.

Adds regression test asserting that candidate is updated to the migrated
shape when partiallyValid is set.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fixup: extend skipPluginValidation to write path with E2E coverage

Thread skipPluginValidationOnWrite through loadAndMaybeMigrateDoctorConfig
return value into runWriteConfigHealth so replaceConfigFile bypasses plugin
validation when migration is only partially valid. Add E2E test verifying
the flag propagates end-to-end.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fixup(doctor): wire skipPluginValidation through full write path (openclaw#76800)

Clawsweeper P2 x2:
1. io.ts exported writeConfigFile wrapper now passes skipPluginValidation to
   createConfigIO so both write-phase validation and post-write loadConfig
   re-read honor the flag.
2. mutate.ts tryWriteSingleTopLevelIncludeMutation now skips plugin validation
   when writeOptions.skipPluginValidation is set, so include-write fast path
   no longer blocks safe legacy migrations with unrelated plugin errors.

Adds regression test: skipPluginValidation bypasses plugin schema rejection
on writeConfigFile and falls back to throwing when flag is not set.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fixup(doctor): bail out of include fast path when skipPluginValidation is set (openclaw#76800)

Clawsweeper P2: after the include write, readConfigFileSnapshotForWrite()
calls loadConfig() which validates with plugins; refreshedSnapshot.valid is
false when an unrelated plugin issue exists, causing the include path to
throw even though skipPluginValidation was requested.

Simplest fix: return false from tryWriteSingleTopLevelIncludeMutation when
skipPluginValidation is set, letting the root writer handle the write with
plugin validation disabled end-to-end (including post-write readback via
createConfigIO({ pluginValidation: "skip" })).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(doctor): cover partial legacy migration writes

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
lxe pushed a commit to lxe/openclaw that referenced this pull request May 6, 2026
openclaw#76800 (fixes openclaw#76798) added user-facing `doctor --fix` behavior to
commit safe legacy migrations even when unrelated validation issues
prevent full validation, but the existing entry credited no contributor
and used a bare PR/issue reference. Promote openclaw#76798 to a `Fixes` ref,
add the merging PR ref (openclaw#76800), and credit the human contributor
@hclsys per CLAUDE.md changelog-attribution rules.
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
…fails (openclaw#76800)

* fix(doctor): commit legacy migrations even when unrelated validation fails (openclaw#76798)

migrateLegacyConfig previously returned config: null when post-migration
validation found any issue (e.g. a missing plugin). The caller then kept
the unmigrated config as the candidate, so doctor --fix never wrote the
legacy migration to disk.

Now when validation fails after a successful migration, the migrated
config is returned with partiallyValid: true. applyLegacyCompatibilityStep
always commits the migrated config to state.candidate, ensuring
agents.defaults.llm and other known-legacy keys are cleaned up on
doctor --fix even when an unrelated provider or plugin issue blocks
the full validator.

Adds regression test asserting that candidate is updated to the migrated
shape when partiallyValid is set.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fixup: extend skipPluginValidation to write path with E2E coverage

Thread skipPluginValidationOnWrite through loadAndMaybeMigrateDoctorConfig
return value into runWriteConfigHealth so replaceConfigFile bypasses plugin
validation when migration is only partially valid. Add E2E test verifying
the flag propagates end-to-end.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fixup(doctor): wire skipPluginValidation through full write path (openclaw#76800)

Clawsweeper P2 x2:
1. io.ts exported writeConfigFile wrapper now passes skipPluginValidation to
   createConfigIO so both write-phase validation and post-write loadConfig
   re-read honor the flag.
2. mutate.ts tryWriteSingleTopLevelIncludeMutation now skips plugin validation
   when writeOptions.skipPluginValidation is set, so include-write fast path
   no longer blocks safe legacy migrations with unrelated plugin errors.

Adds regression test: skipPluginValidation bypasses plugin schema rejection
on writeConfigFile and falls back to throwing when flag is not set.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fixup(doctor): bail out of include fast path when skipPluginValidation is set (openclaw#76800)

Clawsweeper P2: after the include write, readConfigFileSnapshotForWrite()
calls loadConfig() which validates with plugins; refreshedSnapshot.valid is
false when an unrelated plugin issue exists, causing the include path to
throw even though skipPluginValidation was requested.

Simplest fix: return false from tryWriteSingleTopLevelIncludeMutation when
skipPluginValidation is set, letting the root writer handle the write with
plugin validation disabled end-to-end (including post-write readback via
createConfigIO({ pluginValidation: "skip" })).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(doctor): cover partial legacy migration writes

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Peter Steinberger <steipete@gmail.com>
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
openclaw#76800 (fixes openclaw#76798) added user-facing `doctor --fix` behavior to
commit safe legacy migrations even when unrelated validation issues
prevent full validation, but the existing entry credited no contributor
and used a bare PR/issue reference. Promote openclaw#76798 to a `Fixes` ref,
add the merging PR ref (openclaw#76800), and credit the human contributor
@hclsys per CLAUDE.md changelog-attribution rules.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clawsweeper:automerge Maintainer opted this PR into bounded ClawSweeper-reviewed automerge clawsweeper:merge-ready ClawSweeper found the PR merge-ready but a human gate is still closed commands Command implementations size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: doctor --fix discards legacy migrations if any unrelated validation issue remains, leaving config unfixed

2 participants