Bug type
Upgrade regression / catch-22: locks users out of openclaw doctor --fix, the only documented escape from upgrade-induced crash loops.
Severity
High. When this fires, the documented recovery path (openclaw doctor --fix) silently does nothing useful. Users are forced to hand-edit ~/.openclaw/openclaw.json (and the .bak LKG file) to escape. Combined with the LKG-restore loop documented in #76700 and the validation-before-migration order described in #68664, the user is effectively bricked.
Summary
migrateLegacyConfig in src/commands/doctor/shared/legacy-config-migrate.ts (bundled at dist/doctor-config-flow-7oxT6MZQ.js:924) applies legacy migrations to a config, then runs full plugin-aware validation on the migrated result. If the post-migration config still has any unrelated validation issue (e.g. a missing plugin or a broken provider), the function returns config: null and the migrated raw object is silently discarded, even though the legacy migration itself succeeded and is strictly safe to persist.
The caller (applyLegacyCompatibilityStep, in the same bundled file at line 945) then falls through with the unmigrated candidate, the doctor flow continues, and the write step ultimately throws Error: Config validation failed on the unrelated issue. The legacy migration is never written to disk.
This means doctor --fix is non-incremental: a single unrelated validation issue defeats every safe migration that doctor knows how to apply.
Concrete repro (the path I hit)
This is the scenario that triggered the investigation: two independent issues that doctor knows how to handle individually, but that together brick the user:
1. Long-running install on v2026.4.x. Config has agents.defaults.llm.idleTimeoutSeconds: 300 (valid at the time; the deprecation rule for it lives in dist/legacy-config-issues-Bce7-rlH.js:526, and a migration for it lives at line 605, with id agents.defaults.llm->models.providers.timeoutSeconds).
2. Upgrade to v2026.5.2 (commit 8b2a6e5). The new validator now rejects the legacy key as agents.defaults: Unrecognized key: "llm", with a sibling legacyIssue flagging that a migration is available.
3. Independently, plugins.entries.brave.enabled: true plus tools.web.search.provider: "brave" are present, and ~/.openclaw/plugins/installs.json claims @openclaw/brave-plugin@2026.5.2 is installed at ~/.openclaw/npm/node_modules/@openclaw/brave-plugin/, but the directory does not exist on disk. (I'm not sure how this happens: it could be a half-completed install, an aborted upgrade, or the same provider-registration failure mode #76700 documents. The cause is orthogonal; a stale install record, a half-uninstalled plugin, or anything else that produces a "web_search provider is not available: brave" validation error reproduces this just as well.)
4. agents.defaults.llm still considered valid → identical-hash restore loop; clean repro of #76700's Issue #2).
5. User runs openclaw doctor --fix. Expected: at least the legacy agents.defaults.llm key is removed, since OpenClaw ships an explicit migration for it. Actual: doctor ends with Error: Config validation failed: tools.web.search.provider: web_search provider is not available: brave. Re-reading ~/.openclaw/openclaw.json shows agents.defaults.llm is still there. The migration ran in memory but was discarded.
I confirmed this by reading the config back after doctor --fix; the llm block was unchanged. Only manual JSON edits (on both openclaw.json and the .bak LKG, since the LKG restore loop overwrites in-place edits) escaped the loop.
Root cause (code reference)
The relevant code is migrateLegacyConfig in dist/doctor-config-flow-7oxT6MZQ.js (apparent source path src/commands/doctor/shared/legacy-config-migrate.ts) and its caller in src/commands/doctor/shared/config-flow-steps.ts (bundled line ~945).
The "all-or-nothing" coupling between legacy migration and full plugin-aware validation is the bug. Legacy migrations are strictly safe (they remove a known-legacy key for which a migration is registered) and should be applied independently of unrelated plugin/provider problems.
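A minimal, hypothetical TypeScript model of this coupling, using the repro config from the steps above. Every name here (getPath, applyLegacyMigrations, validateWithPlugins, migrateAllOrNothing, migrateIncremental, legacyCompatibilityStep) is an illustrative stand-in, not OpenClaw's real code, and the migration stand-in simply deletes agents.defaults.llm instead of moving the value to models.providers.timeoutSeconds. It shows the described all-or-nothing result (one unavailable web_search provider discards the migration, so the caller falls back to the unmigrated candidate) next to the option 1 shape from Suggested fix, which keeps the migrated object and reports the remaining issue.

```typescript
// Hypothetical model of the doctor legacy-migration step. These names and
// shapes are illustrative stand-ins, not OpenClaw's real types or helpers.
type Config = Record<string, unknown>;
type Issue = { path: string; message: string };
type Validation = { ok: true; config: Config } | { ok: false; issues: Issue[] };
type MigrateResult = { config: Config | null; changes: string[]; unresolvedIssues?: Issue[] };

// Reads a dotted path such as "tools.web.search.provider" from a plain object.
function getPath(obj: unknown, path: string): unknown {
  let cur: unknown = obj;
  for (const key of path.split(".")) {
    if (cur === null || typeof cur !== "object") return undefined;
    cur = (cur as Record<string, unknown>)[key];
  }
  return cur;
}

// Stand-in migration: only deletes agents.defaults.llm (the real migration
// targets models.providers.timeoutSeconds). Works on a deep copy of the input.
function applyLegacyMigrations(raw: Config): { next: Config | null; changes: string[] } {
  if (getPath(raw, "agents.defaults.llm") === undefined) return { next: null, changes: [] };
  const next = JSON.parse(JSON.stringify(raw)) as Config;
  const defaults = getPath(next, "agents.defaults") as Record<string, unknown>;
  delete defaults["llm"];
  return { next, changes: ["Removed legacy agents.defaults.llm"] };
}

// Stand-in plugin-aware validator: flags the legacy key and any web_search
// provider that is not among the loadable providers.
function validateWithPlugins(cfg: Config, providers: ReadonlySet<string>): Validation {
  const issues: Issue[] = [];
  if (getPath(cfg, "agents.defaults.llm") !== undefined) {
    issues.push({ path: "agents.defaults", message: 'Unrecognized key: "llm"' });
  }
  const provider = getPath(cfg, "tools.web.search.provider");
  if (typeof provider === "string" && !providers.has(provider)) {
    issues.push({
      path: "tools.web.search.provider",
      message: `web_search provider is not available: ${provider}`,
    });
  }
  return issues.length === 0 ? { ok: true, config: cfg } : { ok: false, issues };
}

// Behaviour as described in this issue: any remaining issue discards `next`.
function migrateAllOrNothing(raw: Config, providers: ReadonlySet<string>): MigrateResult {
  const { next, changes } = applyLegacyMigrations(raw);
  if (!next) return { config: null, changes: [] };
  const validated = validateWithPlugins(next, providers);
  if (!validated.ok) return { config: null, changes };
  return { config: validated.config, changes };
}

// Option 1 shape: keep `next` and report what is still wrong.
function migrateIncremental(raw: Config, providers: ReadonlySet<string>): MigrateResult {
  const { next, changes } = applyLegacyMigrations(raw);
  if (!next) return { config: null, changes: [] };
  const validated = validateWithPlugins(next, providers);
  if (!validated.ok) return { config: next, changes, unresolvedIssues: validated.issues };
  return { config: validated.config, changes };
}

// Caller fall-through: a null config leaves the unmigrated candidate in place.
function legacyCompatibilityStep(candidate: Config, result: MigrateResult): Config {
  return result.config ?? candidate;
}

// Repro shape: legacy key plus a web_search provider that cannot be loaded.
const repro: Config = {
  agents: { defaults: { llm: { idleTimeoutSeconds: 300 } } },
  tools: { web: { search: { provider: "brave" } } },
};
const noBrave: ReadonlySet<string> = new Set<string>();

const before = legacyCompatibilityStep(repro, migrateAllOrNothing(repro, noBrave));
console.log(getPath(before, "agents.defaults.llm") !== undefined); // true: migration discarded

const after = legacyCompatibilityStep(repro, migrateIncremental(repro, noBrave));
console.log(getPath(after, "agents.defaults.llm") !== undefined); // false: migration kept
```

In the model, the two functions differ only in what they return on the validation-failure branch, which is the point of the minimal fix: whether the legacy migration survives no longer depends on unrelated plugin or provider issues.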
Why this is a separate issue from existing ones
agents.defaults.llm migration discards user's idleTimeoutSeconds value
doctor --fix path being itself broken
This issue is specifically: the manually-invoked doctor --fix flow drops legacy migrations on the floor when any other validation issue remains. Even if #68664 lands (migrations before validation in gateway startup), users who already hit the loop and try doctor --fix per OpenClaw's own error message (Run "openclaw doctor --fix") will still find it does nothing.
Suggested fix
Two options, in increasing order of scope.
1. Minimal: make legacy migration independently committable. In migrateLegacyConfig, when post-migration validation fails, still return the migrated raw object instead of null, with a flag noting that other issues remain. The caller commits the legacy-migrated state to state.candidate and surfaces the remaining issues separately. The final write either succeeds (the legacy migration alone unbricks the config) or fails on the unrelated issue (but the legacy keys are now gone, so the LKG loop in #76700 no longer reproduces them after the next promotion).
```diff
 function migrateLegacyConfig(raw) {
   const { next, changes } = applyLegacyDoctorMigrations(raw);
   if (!next) return { config: null, changes: [] };
   const validated = validateConfigObjectWithPlugins(next);
   if (!validated.ok) {
     changes.push("Migration applied; other unrelated issues remain — see below.");
-    return { config: null, changes };
+    return { config: next, changes, unresolvedIssues: validated.issues };
   }
   return { config: validated.config, changes };
 }
```
2. Broader: split doctor --fix into independently committed atomic steps. Each migration / fix is its own transaction. Apply, validate the change in isolation (does this specific change make the config strictly closer to valid?), commit. Surface remaining unfixable issues. This is the structural answer that also dovetails with #50561 (the gateway can run the legacy-migration subset on startup safely) and addresses #68664 (legacy migrations are applied earlier, in their own commit, before the strict validator gates startup).
In either form, the user-visible contract should be: doctor --fix always applies every safe migration it knows about, and only refuses on the issues it cannot fix. Today, one unfixable issue blocks every fixable one.
Reproduction artefacts (what to check on a repro instance)
~/.openclaw/openclaw.json: has both agents.defaults.llm.* and a config-level reference to a missing/unloadable web_search provider.
~/.openclaw/openclaw.json.bak: same legacy key (the LKG was promoted under the old schema).
~/.openclaw/logs/gateway.err.log: repeating "Config auto-restored from last-known-good" for (startup-invalid-config) with Rejected validation details: agents.defaults: Unrecognized key: "llm" (matches the trigger described in #76700, but with the legacy key as the trigger instead of brave).
After openclaw doctor --fix: agents.defaults.llm should be gone (per the migration in dist/legacy-config-issues-Bce7-rlH.js:605). It is not.
Environment
OpenClaw v2026.5.2 (8b2a6e5)
Install path: /opt/homebrew/lib/node_modules/openclaw
Node: /opt/homebrew/opt/node@22/bin/node
Gateway service: gui/$UID/ai.openclaw.gateway
Config meta.lastTouchedVersion: 2026.4.14 (covers the upgrade path)
Related
agents.defaults.llm migration loses the user's value when it does run; this issue is about it not running at all.