Skip to content

fix(config): avoid plugin-local last-good rollback#71367

Merged
steipete merged 2 commits intoopenclaw:mainfrom
jalehman:hawking-948ca406-fix-config-recovery-rollback
Apr 25, 2026
Merged

fix(config): avoid plugin-local last-good rollback#71367
steipete merged 2 commits intoopenclaw:mainfrom
jalehman:hawking-948ca406-fix-config-recovery-rollback

Conversation

@jalehman
Copy link
Copy Markdown
Contributor

Summary

Fixes #71289.

Prevents plugin-local validation failures from triggering whole-file last-known-good recovery. This preserves openclaw.json during plugin schema evolution / minHostVersion skew windows while keeping last-known-good recovery intact for true base config corruption.

What changed

  • Added a shared recovery policy that classifies invalid snapshots scoped only to plugins.entries.* as plugin-local.
  • Made startup, hot reload, and direct recovery helper paths consult that policy before restoring .last-good.
  • Kept parse failures and non-plugin/base config invalidity on the existing recovery path.
  • Added regression coverage for plugin schema evolution and plugin minHostVersion skew so stale .last-good snapshots do not clobber active config.

Verification

  • pnpm tsgo
  • pnpm check:test-types
  • pnpm test src/config/io.observe-recovery.test.ts src/gateway/config-reload.test.ts src/gateway/server-startup-config.recovery.test.ts
  • pnpm test src/config/config.plugin-validation.test.ts src/config/io.observe-recovery.test.ts src/gateway/config-reload.test.ts src/gateway/server-startup-config.recovery.test.ts

@aisle-research-bot
Copy link
Copy Markdown

aisle-research-bot Bot commented Apr 25, 2026

🔒 Aisle Security Analysis

We found 1 potential security issue(s) in this PR:

# Severity Title
1 🟡 Medium Last-known-good recovery skipped for plugin-entry-only invalid configs can cause persistent startup DoS
1. 🟡 Last-known-good recovery skipped for plugin-entry-only invalid configs can cause persistent startup DoS
Property Value
Severity Medium
CWE CWE-703
Location src/gateway/server-startup-config.ts:73-110

Description

The new recovery policy suppresses last-known-good (LKG) rollback when validation issues are only under plugins.entries.*.

Impact:

  • If the on-disk config becomes invalid solely due to plugin entry fields (e.g., a malformed plugin config written by an automated plugin/UI/management action), the gateway will not restore the last-known-good config.
  • On startup, the gateway then calls assertValidGatewayStartupConfigSnapshot(...) and throws, causing startup failure and potentially a persistent denial-of-service until the file is manually repaired.
  • Previously, an invalid config at startup could be auto-healed by whole-file rollback to the last promoted good snapshot.

Vulnerable logic (policy + enforcement):

  • shouldAttemptLastKnownGoodRecovery() returns false for plugin-local invalidity.
  • Startup path respects this flag and skips recovery, then later throws on invalid config.

Vulnerable code:

const canRecoverFromLastKnownGood = shouldAttemptLastKnownGoodRecovery(configSnapshot);
const recovered = canRecoverFromLastKnownGood
  ? await recoverConfigFromLastKnownGood({ snapshot: configSnapshot, reason: "startup-invalid-config" })
  : false;
...
assertValidGatewayStartupConfigSnapshot(configSnapshot, { includeDoctorHint: true });

Recommendation

Reconsider skipping LKG recovery entirely for plugin-scoped invalidity on startup.

Safer options:

  1. Attempt LKG recovery on startup even for plugin-local invalidity, but log that plugin config was the cause.
  2. Alternatively, implement a targeted repair strategy: strip/disable only the invalid plugins.entries.<name> sections while keeping the rest of the file.

Example (startup):

if (!configSnapshot.valid) {
  const recovered = await recoverConfigFromLastKnownGood({
    snapshot: configSnapshot,
    reason: "startup-invalid-config",
  });
  if (recovered) {
    configSnapshot = await readConfigFileSnapshot();
  }
}
assertValidGatewayStartupConfigSnapshot(configSnapshot);

If preserving other non-plugin edits is required, implement targeted plugin-entry cleanup before falling back to full LKG rollback.


Analyzed PR: #71367 at commit 5e9d33f

Last updated on: 2026-04-25T03:08:28Z

@openclaw-barnacle openclaw-barnacle Bot added gateway Gateway runtime size: M maintainer Maintainer-authored PR labels Apr 25, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 25, 2026

Greptile Summary

This PR introduces a shared recovery-policy.ts module that classifies invalid config snapshots as "plugin-local" when all validation issues are scoped to plugins.entries.*, and then gates the whole-file last-known-good rollback on that classification across startup, hot-reload, and the recoverConfigFromLastKnownGood helper. The core logic in recovery-policy.ts is correct and the regression tests cover the key plugin schema evolution and minHostVersion skew scenarios. Two small inconsistencies are flagged as inline P2 comments: the warning log in config-reload.ts fires for any false return from shouldAttemptLastKnownGoodRecovery (including the valid-snapshot branch) without the inner guard used in io.observe-recovery.ts, and the inline mock in server-startup-config.recovery.test.ts diverges from the real implementation in an edge case involving an empty plugin name segment.

Confidence Score: 4/5

Safe to merge; the recovery policy logic is correct and no critical paths are broken. Two minor P2 issues in a log message guard and a test mock remain.

Only P2 findings present: a mildly inconsistent warning log in config-reload.ts and a small mock/implementation divergence in a test. No behavioral bugs or data-loss risk introduced.

src/gateway/config-reload.ts (warning message guard) and src/gateway/server-startup-config.recovery.test.ts (mock fidelity)

Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/gateway/config-reload.ts
Line: 226-230

Comment:
**Misleading warning when snapshot is unexpectedly valid**

`shouldAttemptLastKnownGoodRecovery` returns `false` for two distinct cases: (1) the snapshot is valid (no recovery needed) and (2) the snapshot is plugin-locally invalid. Both cases reach the `opts.log.warn(...)` call here, so if somehow a valid snapshot is passed into `recoverAndReadSnapshot`, the log would say "invalidity is scoped to plugin entries" when there is no invalidity at all. The `io.observe-recovery.ts` version correctly guards with an inner `isPluginLocalInvalidConfigSnapshot` check before logging — the same pattern would make this warning more precise.

```suggestion
    if (!shouldAttemptLastKnownGoodRecovery(snapshot)) {
      if (isPluginLocalInvalidConfigSnapshot(snapshot)) {
        opts.log.warn(
          `config reload recovery skipped after ${reason}: invalidity is scoped to plugin entries`,
        );
      }
      return null;
    }
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/gateway/server-startup-config.recovery.test.ts
Line: 11-20

Comment:
**Test mock diverges from real implementation**

The inline mock accepts a path of exactly `"plugins.entries."` (trailing dot, empty plugin name) as a plugin-local issue — `issue.path.startsWith("plugins.entries.")` is `true` — while the real `isPluginEntryIssue` in `recovery-policy.ts` additionally requires something after the prefix (`path.slice(prefix.length).trim().length > 0`), so it would return `false` for that same path and allow recovery. This means the mock is a slight over-approximation: a snapshot with an issue path of `"plugins.entries."` would skip recovery in tests but actually trigger recovery at runtime. The gap is unlikely to surface in practice, but it means a regression in that edge case would not be caught by these tests.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix: avoid plugin-local config recovery ..." | Re-trigger Greptile

Comment on lines +226 to +230
if (!shouldAttemptLastKnownGoodRecovery(snapshot)) {
opts.log.warn(
`config reload recovery skipped after ${reason}: invalidity is scoped to plugin entries`,
);
return null;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Misleading warning when snapshot is unexpectedly valid

shouldAttemptLastKnownGoodRecovery returns false for two distinct cases: (1) the snapshot is valid (no recovery needed) and (2) the snapshot is plugin-locally invalid. Both cases reach the opts.log.warn(...) call here, so if somehow a valid snapshot is passed into recoverAndReadSnapshot, the log would say "invalidity is scoped to plugin entries" when there is no invalidity at all. The io.observe-recovery.ts version correctly guards with an inner isPluginLocalInvalidConfigSnapshot check before logging — the same pattern would make this warning more precise.

Suggested change
if (!shouldAttemptLastKnownGoodRecovery(snapshot)) {
opts.log.warn(
`config reload recovery skipped after ${reason}: invalidity is scoped to plugin entries`,
);
return null;
if (!shouldAttemptLastKnownGoodRecovery(snapshot)) {
if (isPluginLocalInvalidConfigSnapshot(snapshot)) {
opts.log.warn(
`config reload recovery skipped after ${reason}: invalidity is scoped to plugin entries`,
);
}
return null;
}
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/gateway/config-reload.ts
Line: 226-230

Comment:
**Misleading warning when snapshot is unexpectedly valid**

`shouldAttemptLastKnownGoodRecovery` returns `false` for two distinct cases: (1) the snapshot is valid (no recovery needed) and (2) the snapshot is plugin-locally invalid. Both cases reach the `opts.log.warn(...)` call here, so if somehow a valid snapshot is passed into `recoverAndReadSnapshot`, the log would say "invalidity is scoped to plugin entries" when there is no invalidity at all. The `io.observe-recovery.ts` version correctly guards with an inner `isPluginLocalInvalidConfigSnapshot` check before logging — the same pattern would make this warning more precise.

```suggestion
    if (!shouldAttemptLastKnownGoodRecovery(snapshot)) {
      if (isPluginLocalInvalidConfigSnapshot(snapshot)) {
        opts.log.warn(
          `config reload recovery skipped after ${reason}: invalidity is scoped to plugin entries`,
        );
      }
      return null;
    }
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +11 to +20
shouldAttemptLastKnownGoodRecovery: vi.fn((snapshot: ConfigFileSnapshot) => {
if (snapshot.valid) {
return false;
}
return !(
snapshot.legacyIssues.length === 0 &&
snapshot.issues.length > 0 &&
snapshot.issues.every((issue) => issue.path.startsWith("plugins.entries."))
);
}),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Test mock diverges from real implementation

The inline mock accepts a path of exactly "plugins.entries." (trailing dot, empty plugin name) as a plugin-local issue — issue.path.startsWith("plugins.entries.") is true — while the real isPluginEntryIssue in recovery-policy.ts additionally requires something after the prefix (path.slice(prefix.length).trim().length > 0), so it would return false for that same path and allow recovery. This means the mock is a slight over-approximation: a snapshot with an issue path of "plugins.entries." would skip recovery in tests but actually trigger recovery at runtime. The gap is unlikely to surface in practice, but it means a regression in that edge case would not be caught by these tests.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/gateway/server-startup-config.recovery.test.ts
Line: 11-20

Comment:
**Test mock diverges from real implementation**

The inline mock accepts a path of exactly `"plugins.entries."` (trailing dot, empty plugin name) as a plugin-local issue — `issue.path.startsWith("plugins.entries.")` is `true` — while the real `isPluginEntryIssue` in `recovery-policy.ts` additionally requires something after the prefix (`path.slice(prefix.length).trim().length > 0`), so it would return `false` for that same path and allow recovery. This means the mock is a slight over-approximation: a snapshot with an issue path of `"plugins.entries."` would skip recovery in tests but actually trigger recovery at runtime. The gap is unlikely to surface in practice, but it means a regression in that edge case would not be caught by these tests.

How can I resolve this? If you propose a fix, please make it concise.

@steipete steipete force-pushed the hawking-948ca406-fix-config-recovery-rollback branch from 8d693cd to d4cef0c Compare April 25, 2026 02:56
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a5684e0476

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

export function isPluginLocalInvalidConfigSnapshot(
snapshot: Pick<ConfigFileSnapshot, "valid" | "issues" | "legacyIssues">,
): boolean {
if (snapshot.valid || snapshot.legacyIssues.length > 0 || snapshot.issues.length === 0) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Remove legacy warning gate from plugin-local recovery check

The new classifier returns false whenever legacyIssues.length > 0, so a snapshot that is invalid only because of plugins.entries.* (the case this commit targets) still goes through whole-file last-known-good rollback if the config also contains any migrated legacy keys. Because legacy issues are warning-style migration metadata rather than the invalidity root cause, this can still clobber current openclaw.json during plugin schema/minHostVersion skew windows for users with legacy keys present.

Useful? React with 👍 / 👎.

@steipete steipete force-pushed the hawking-948ca406-fix-config-recovery-rollback branch from d4cef0c to 5e9d33f Compare April 25, 2026 03:05
@steipete steipete merged commit 9a0b26c into openclaw:main Apr 25, 2026
57 of 58 checks passed
@steipete
Copy link
Copy Markdown
Contributor

Landed via rebase onto main.

  • Local gate: pnpm test src/config/recovery-policy.test.ts src/config/io.observe-recovery.test.ts src/gateway/config-reload.test.ts src/gateway/server-startup-config.recovery.test.ts src/config/config.plugin-validation.test.ts
  • Local gate: pnpm check:changed
  • Local gate: pnpm docs:check-mdx
  • Source commits before landing: f4f2477cc3626c6c7304ac5fa8b2470826fa6e42, 5e9d33f851667956d9d81f839a6831115c05a6c5
  • Landed commits: f369939, 9a0b26c

Thanks @jalehman.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

gateway Gateway runtime maintainer Maintainer-authored PR size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Config recovery can silently restore stale last-good snapshots during reload/version-skew windows

2 participants