Skip to content

Config: fail closed invalid config loads#39071

Merged
vincentkoc merged 8 commits intomainfrom
vincentkoc-code/issue-28140-invalid-config-fail-closed
Mar 8, 2026
Merged

Config: fail closed invalid config loads#39071
vincentkoc merged 8 commits intomainfrom
vincentkoc-code/issue-28140-invalid-config-fail-closed

Conversation

@vincentkoc
Copy link
Copy Markdown
Member

Summary

  • Problem: one invalid config key caused loadConfig() to swallow INVALID_CONFIG and return {}.
  • Why it matters: read-only CLI and service-management paths could proceed under fallback defaults, dropping intended security-sensitive settings.
  • What changed: loadConfig() now fails closed on invalid config; explicit best-effort reads are used only where diagnostics still need to inspect a broken config.
  • What did NOT change (scope boundary): this PR does not add partial schema acceptance for arbitrary unknown keys; invalid config still remains invalid until fixed.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

User-visible / Behavior Changes

  • Invalid config no longer degrades to an empty runtime config.
  • Read-only diagnostics (status, health, gateway inspection commands) still work, but only through an explicit best-effort config path.
  • Valid settings like tools.sessions.visibility are preserved for diagnostics even when an unrelated unknown key makes the config invalid.

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: macOS 15.6.1
  • Runtime/container: local pnpm workspace
  • Model/provider: N/A
  • Integration/channel (if any): CLI / gateway diagnostics
  • Relevant config (redacted): invalid skills.nano-banana-pro plus valid tools.sessions.visibility: "all"

Steps

  1. Write an invalid openclaw.json with one unknown skill key and one valid setting.
  2. Call loadConfig() and compare it with the explicit best-effort config read.
  3. Run read-only CLI diagnostics and targeted tests.

Expected

  • loadConfig() fails on invalid config.
  • Diagnostic paths preserve valid settings instead of falling back to {} defaults.

Actual

  • Verified after the patch.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios: reproduced issue config, confirmed loadConfig() throws, confirmed readBestEffortConfig() preserves tools.sessions.visibility: "all", ran pnpm build, pnpm check, and targeted Vitest coverage for config/status/health/gateway/daemon flows.
  • Edge cases checked: invalid config with unknown skill key, gateway status JSON diagnostics, daemon install/restart drift tests, health command output paths.
  • What you did not verify: full end-to-end daemon install on every platform.

Compatibility / Migration

  • Backward compatible? (Mostly)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps:

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: revert the four commits in this PR.
  • Files/config to restore: config loader changes in src/config/io.ts and the explicit best-effort call sites.
  • Known bad symptoms reviewers should watch for: diagnostic commands unexpectedly failing on invalid config instead of surfacing best-effort output.

Risks and Mitigations

  • Risk: a read-only diagnostic path still relies on implicit loadConfig() and now fails on invalid config.
  • Mitigation: build + targeted tests cover the touched diagnostic and daemon CLI paths; local repro confirms best-effort routing preserves valid settings.

@vincentkoc vincentkoc self-assigned this Mar 7, 2026
@openclaw-barnacle openclaw-barnacle Bot added cli CLI command changes commands Command implementations size: S maintainer Maintainer-authored PR labels Mar 7, 2026
@vincentkoc vincentkoc marked this pull request as ready for review March 7, 2026 18:04
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 7, 2026

Greptile Summary

This PR fixes a long-standing silent failure in loadConfig() where an INVALID_CONFIG error was swallowed and converted to an empty runtime config ({}), potentially causing security-sensitive settings to be silently dropped. loadConfig() now re-throws on invalid config, and a new readBestEffortConfig() async helper is introduced for diagnostic paths that still need to surface partial config output when the file is broken.

Key observations:

  • Core fix is correct — making loadConfig() fail closed on INVALID_CONFIG is the right call, and the updated compat test directly validates the new behavior.
  • Diagnostic call-sites are correctly routedgateway-status, status.scan, status-all, health, register, and run-main are all read-only paths and appropriately switch to readBestEffortConfig().
  • TOCTOU in readBestEffortConfig() — when the snapshot is valid, the helper discards the already-parsed snapshot.config and calls loadConfig() a second time. If the config file is replaced between the two reads (e.g. a concurrent write), loadConfig() can throw INVALID_CONFIG, defeating the resilience the helper is meant to provide. Returning snapshot.config directly on the valid branch would eliminate the race.
  • Mutating paths use best-effort configrunDaemonInstall (install.ts), runDaemonRestart/resolveGatewayRestartPort (lifecycle.ts), and the token-drift check in runServiceRestart (lifecycle-core.ts) all switched to readBestEffortConfig(). These are write/mutating operations, not read-only diagnostics. If gateway port or token settings happen to be in the invalid section of the config, the daemon can be installed or restarted with wrong defaults silently — contrary to the PR's stated "fail closed" intent for non-diagnostic paths.
  • getAgentLocalStatuses default parameter — the loadConfig() default is retained solely for a fallback that no current call-site uses; making the parameter required and removing the now-dead import would be cleaner.

Confidence Score: 3/5

  • Safe to merge for diagnostic paths, but the use of best-effort config in daemon install/restart write paths warrants scrutiny before shipping.
  • The core fix (throwing on INVALID_CONFIG) is correct and well-tested. However, three mutating daemon lifecycle paths (runDaemonInstall, runDaemonRestart, runServiceRestart) now proceed with potentially partial config, which could silently misconfigure the daemon. Additionally, the TOCTOU double-read in readBestEffortConfig() can cause the helper to throw in the very scenario it is meant to handle gracefully.
  • src/config/io.ts (TOCTOU in readBestEffortConfig), src/cli/daemon-cli/install.ts, src/cli/daemon-cli/lifecycle.ts, and src/cli/daemon-cli/lifecycle-core.ts (mutating paths using best-effort config).

Comments Outside Diff (3)

  1. src/config/io.ts, line 1379-1382 (link)

    TOCTOU: double file read with a validity race window

    When the snapshot says the config is valid, readBestEffortConfig() discards the already-parsed snapshot.config and calls loadConfig() again, reading the file a second time. If the file is replaced between readConfigFileSnapshot() and loadConfig() (e.g. a config write in another process), loadConfig() could now throw INVALID_CONFIG — the very error that readBestEffortConfig() is designed to suppress — making the best-effort path itself non-resilient.

    Returning snapshot.config unconditionally would be consistent and eliminate the race:

    Or more clearly:

    export async function readBestEffortConfig(): Promise<OpenClawConfig> {
      const snapshot = await readConfigFileSnapshot();
      return snapshot.config;
    }
    

    If loadConfig() is intentionally called here to pick up in-process runtime overrides that snapshot.config doesn't carry, that dependency should be documented with a comment explaining why the double-read is necessary.

  2. src/cli/daemon-cli/install.ts, line 29-35 (link)

    Write path using best-effort config may silently drop security-sensitive settings

    runDaemonInstall is a mutating operation, not a read-only diagnostic. By switching to readBestEffortConfig(), the install can proceed when the config is invalid, using whatever partial settings the snapshot could extract. If the gateway port or token configuration lives in the invalid portion of the file, resolveGatewayPort(cfg) will fall back to defaults, potentially installing the daemon on a different port or with a missing/incorrect token without surfacing any error to the operator.

    The PR description states that best-effort reads are "used only where diagnostics still need to inspect a broken config," but runDaemonInstall, runDaemonRestart (lifecycle.ts), and runServiceRestart (lifecycle-core.ts) are all write/mutating paths. Failing with a clear error message on invalid config here aligns better with the "fail closed" intent and prevents silent mis-configuration.

    Consider reverting these three call-sites to loadConfig() (which now correctly throws on invalid config) so the operator is forced to fix the config before mutating daemon state.

  3. src/commands/status.agent-local.ts, line 36-38 (link)

    Default parameter calls loadConfig() which now throws on invalid config

    The previous implementation called loadConfig() unconditionally inside the function body; the new signature moves that call to a default parameter value. The runtime behavior for callers that omit cfg is identical — a throw on invalid config — but the new form is subtler to audit: the loadConfig import is retained solely to power a default that is bypassed by every internal call-site in this PR.

    If the intent is that callers should always supply cfg (as all the updated call-sites now do), making the parameter required and removing the default would make that expectation explicit and eliminate the now-unused loadConfig import in the same file.

Last reviewed commit: 745bd17

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 609ea676ca

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/commands/status-all.ts
@vincentkoc
Copy link
Copy Markdown
Member Author

@greptile review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 745bd17411

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/commands/status.scan.ts
@vincentkoc vincentkoc force-pushed the vincentkoc-code/issue-28140-invalid-config-fail-closed branch from 745bd17 to 8b40b3c Compare March 7, 2026 23:51
@vincentkoc vincentkoc force-pushed the vincentkoc-code/issue-28140-invalid-config-fail-closed branch from 8b40b3c to eca5074 Compare March 7, 2026 23:52
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eca5074ac7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/cli/daemon-cli/lifecycle-core.ts Outdated
@vincentkoc vincentkoc force-pushed the vincentkoc-code/issue-28140-invalid-config-fail-closed branch from eca5074 to 2e1bf67 Compare March 8, 2026 01:47
@vincentkoc vincentkoc merged commit 2c7fb54 into main Mar 8, 2026
23 of 27 checks passed
@vincentkoc vincentkoc deleted the vincentkoc-code/issue-28140-invalid-config-fail-closed branch March 8, 2026 01:48
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2e1bf67815

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

}

const cfg = loadConfig();
const cfg = await readBestEffortConfig();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Fail daemon install when config read is unavailable

Switching this path to readBestEffortConfig() makes openclaw gateway install proceed when the config file cannot be read (for example EACCES on a root-owned config), because the snapshot fallback returns an empty config object instead of throwing. In that state install logic falls back to defaults and can produce a service install that is out of sync with the operator’s intended config/auth settings, whereas the previous loadConfig() path failed closed; install should still hard-fail on unreadable config and reserve best-effort behavior for read-only diagnostics.

Useful? React with 👍 / 👎.

openperf pushed a commit to openperf/moltbot that referenced this pull request Mar 8, 2026
* Config: fail closed invalid config loads

* CLI: keep diagnostics on explicit best-effort config

* Tests: cover invalid config best-effort diagnostics

* Changelog: note invalid config fail-closed fix

* Status: pass best-effort config through status-all gateway RPCs

* CLI: pass config through gateway secret RPC

* CLI: skip plugin loading from invalid config

* Tests: align daemon token drift env precedence
mcaxtr pushed a commit to mcaxtr/openclaw that referenced this pull request Mar 8, 2026
* Config: fail closed invalid config loads

* CLI: keep diagnostics on explicit best-effort config

* Tests: cover invalid config best-effort diagnostics

* Changelog: note invalid config fail-closed fix

* Status: pass best-effort config through status-all gateway RPCs

* CLI: pass config through gateway secret RPC

* CLI: skip plugin loading from invalid config

* Tests: align daemon token drift env precedence
GordonSH-oss pushed a commit to GordonSH-oss/openclaw that referenced this pull request Mar 9, 2026
* Config: fail closed invalid config loads

* CLI: keep diagnostics on explicit best-effort config

* Tests: cover invalid config best-effort diagnostics

* Changelog: note invalid config fail-closed fix

* Status: pass best-effort config through status-all gateway RPCs

* CLI: pass config through gateway secret RPC

* CLI: skip plugin loading from invalid config

* Tests: align daemon token drift env precedence
jenawant pushed a commit to jenawant/openclaw that referenced this pull request Mar 10, 2026
* Config: fail closed invalid config loads

* CLI: keep diagnostics on explicit best-effort config

* Tests: cover invalid config best-effort diagnostics

* Changelog: note invalid config fail-closed fix

* Status: pass best-effort config through status-all gateway RPCs

* CLI: pass config through gateway secret RPC

* CLI: skip plugin loading from invalid config

* Tests: align daemon token drift env precedence
V-Gutierrez pushed a commit to V-Gutierrez/openclaw-vendor that referenced this pull request Mar 17, 2026
* Config: fail closed invalid config loads

* CLI: keep diagnostics on explicit best-effort config

* Tests: cover invalid config best-effort diagnostics

* Changelog: note invalid config fail-closed fix

* Status: pass best-effort config through status-all gateway RPCs

* CLI: pass config through gateway secret RPC

* CLI: skip plugin loading from invalid config

* Tests: align daemon token drift env precedence
alexey-pelykh pushed a commit to remoteclaw/remoteclaw that referenced this pull request Mar 22, 2026
* Config: fail closed invalid config loads

* CLI: keep diagnostics on explicit best-effort config

* Tests: cover invalid config best-effort diagnostics

* Changelog: note invalid config fail-closed fix

* Status: pass best-effort config through status-all gateway RPCs

* CLI: pass config through gateway secret RPC

* CLI: skip plugin loading from invalid config

* Tests: align daemon token drift env precedence

(cherry picked from commit 2c7fb54)
alexey-pelykh pushed a commit to remoteclaw/remoteclaw that referenced this pull request Mar 22, 2026
* Config: fail closed invalid config loads

* CLI: keep diagnostics on explicit best-effort config

* Tests: cover invalid config best-effort diagnostics

* Changelog: note invalid config fail-closed fix

* Status: pass best-effort config through status-all gateway RPCs

* CLI: pass config through gateway secret RPC

* CLI: skip plugin loading from invalid config

* Tests: align daemon token drift env precedence

(cherry picked from commit 2c7fb54)
lovewanwan pushed a commit to lovewanwan/openclaw that referenced this pull request Apr 28, 2026
* Config: fail closed invalid config loads

* CLI: keep diagnostics on explicit best-effort config

* Tests: cover invalid config best-effort diagnostics

* Changelog: note invalid config fail-closed fix

* Status: pass best-effort config through status-all gateway RPCs

* CLI: pass config through gateway secret RPC

* CLI: skip plugin loading from invalid config

* Tests: align daemon token drift env precedence
ogt-redknie pushed a commit to ogt-redknie/OPENX that referenced this pull request May 2, 2026
* Config: fail closed invalid config loads

* CLI: keep diagnostics on explicit best-effort config

* Tests: cover invalid config best-effort diagnostics

* Changelog: note invalid config fail-closed fix

* Status: pass best-effort config through status-all gateway RPCs

* CLI: pass config through gateway secret RPC

* CLI: skip plugin loading from invalid config

* Tests: align daemon token drift env precedence
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
* Config: fail closed invalid config loads

* CLI: keep diagnostics on explicit best-effort config

* Tests: cover invalid config best-effort diagnostics

* Changelog: note invalid config fail-closed fix

* Status: pass best-effort config through status-all gateway RPCs

* CLI: pass config through gateway secret RPC

* CLI: skip plugin loading from invalid config

* Tests: align daemon token drift env precedence
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cli CLI command changes commands Command implementations maintainer Maintainer-authored PR size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Security: Invalid config silently falls back to empty defaults, dropping all security controls

1 participant