fix(backup): preserve OpenClaw openclaw.json settings across rebuild#5101
Conversation
|
Warning Review limit reached
More reviews will be available in 1 minute and 54 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (6)
📝 WalkthroughWalkthroughDeclares openclaw.json as durable state, broadens credential detection (name and value), preserves resolve-style placeholders during sanitization, runs a post-restore permission repair when openclaw.json is restored, and adds tests for sanitization and end-to-end backup/restore. ChangesOpenClaw Durable Config Backup & Restore
sequenceDiagram
participant Sandbox as Sandbox State
participant Manifest as Manifest Declaration
participant Filter as Credential Filter
participant Backup as Backup Store
participant Restore as Restore Process
participant Repair as Permission Repair
Sandbox->>Manifest: declare openclaw.json in state_files
Manifest->>Backup: backup reads state_files
Sandbox->>Filter: sanitizeConfigFile(openclaw.json)
Filter->>Backup: store sanitized openclaw.json
Backup->>Restore: restore openclaw.json
Restore->>Repair: if restored -> repairMutableConfigPerms(targetSandbox)
Repair-->>Restore: success/warning
Restore->>Sandbox: write final openclaw.json
Estimated Code Review Effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested labels
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
75eaa81 to
8a68166
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/security/credential-filter.ts`:
- Around line 118-131: The current regexes CREDENTIAL_FIELD_PATTERN and
ENV_SECRET_FIELD_PATTERN incorrectly match publicKey/PUBLIC_KEY; update them to
exclude public keys by adding a negative condition so fields with the literal
"public" prefix (e.g., "publicKey", "PUBLIC_KEY") are not matched as secrets.
Concretely, modify CREDENTIAL_FIELD_PATTERN to reject identifiers that start
with "public" (or contain the segment "public" immediately before the secret
suffix) and update ENV_SECRET_FIELD_PATTERN to not match when the uppercase name
contains "PUBLIC" as the token before the secret suffix; adjust the patterns or
add explicit negative lookaheads/alternatives so verification fields like
publicKey/PUBLIC_KEY remain preserved. Ensure the updates reference the existing
constants CREDENTIAL_FIELD_PATTERN and ENV_SECRET_FIELD_PATTERN and include
tests that confirm publicKey and PUBLIC_KEY are not stripped while other secret
patterns still match.
- Around line 195-215: The function isSafeCredentialPlaceholder fails to treat
values like "Bearer unused" as safe because SAFE_CREDENTIAL_PLACEHOLDER_LITERALS
only matches the bare literal; update isSafeCredentialPlaceholder to also accept
values that begin with the "Bearer " scheme followed by a literal in
SAFE_CREDENTIAL_PLACEHOLDER_LITERALS (case-insensitive for the "Bearer" prefix
and trim whitespace) in addition to the existing regex patterns; reference
SAFE_CREDENTIAL_PLACEHOLDER_LITERALS, SAFE_CREDENTIAL_PLACEHOLDER_PATTERNS, and
isSafeCredentialPlaceholder to implement this check so proxy-injected
Authorization headers such as "Bearer unused" are preserved.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 5ef2b412-d678-4603-8e2d-40acf3061985
📒 Files selected for processing (6)
agents/openclaw/manifest.yamlsrc/lib/actions/sandbox/snapshot.tssrc/lib/agent/defs.test.tssrc/lib/security/credential-filter.test.tssrc/lib/security/credential-filter.tstest/snapshot.test.ts
🚧 Files skipped from review as they are similar to previous changes (5)
- src/lib/agent/defs.test.ts
- agents/openclaw/manifest.yaml
- test/snapshot.test.ts
- src/lib/actions/sandbox/snapshot.ts
- src/lib/security/credential-filter.test.ts
`nemoclaw backup-all` drove the rebuild snapshot machinery, but the OpenClaw manifest only declared state directories — not the core openclaw.json config file. As a result backup-all captured the data dirs (agents, skills, memory, ...) yet dropped model/provider config, MCP servers, custom agents, and channel account blocks, so those settings were lost after `nemoclaw <name> rebuild` (NVIDIA#5027). Declare openclaw.json as a durable state file (copy strategy) so it is backed up and restored. The existing backup sanitizer already runs on .json files; harden it so the restored config is both safe and usable: - Broaden credential-key detection to cover OpenClaw channel token fields (botToken/appToken), SCREAMING_SNAKE env secrets in MCP `env` blocks (GITHUB_TOKEN, BRAVE_API_KEY, bare TOKEN/SECRET/...), and HTTP auth header names (Authorization, X-API-Key, X-*-Token/-Auth). - Add a value-level backstop that scrubs raw secrets by shape (sk-/ghp_/xoxb-/Bearer ...) under any key, including array elements and MCP CLI args passed as `--api-key <secret>` / `--api-key=<secret>`. - Preserve non-secret OpenShell `resolve:env`/`unused` placeholders (and the `Bearer <placeholder>` header form) so the restored config keeps routing through proxy-injected credentials instead of a dead marker. Restore order is already correct in rebuild (restore precedes `openclaw doctor --fix` and the mutable-perms repair). For the standalone `snapshot restore` path, run the posture-aware repairMutableConfigPerms after restoring openclaw.json so the gateway keeps the setgid/group-writable contract (NVIDIA#4538). The onboard recreate-restore path re-establishes that contract via nemoclaw-start.sh on the subsequent gateway start. Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
8a68166 to
d8bb13c
Compare
|
Both CodeRabbit findings are addressed in the latest push (commit d8bb13c):
@coderabbitai review |
|
✅ Action performedReview finished.
|
|
✨ Thanks for the proposed fix declaring openclaw.json as a durable state file in the OpenClaw manifest. This proposes a way to preserve model/provider settings, MCP servers, custom agents, and channel blocks across nemoclaw rebuild operations. Related open issues: |
prekshivyas
left a comment
There was a problem hiding this comment.
Reviewed (code + security checklist). Declares OpenClaw's openclaw.json as a durable state_file so backup-all/rebuild preserve core settings (model/provider, MCP servers, custom agents, channel blocks) previously lost (#5027), hardens the credential sanitizer to scrub the newly-backed-up config's secrets while preserving non-secret resolve:env placeholders, and adds a posture-aware perms-repair after standalone snapshot restore.
✅ Approve. Solid defense-in-depth. Path-traversal is rejected (normalizeStateFilePath blocks absolute/../null-byte), so the declared file can't be weaponized to exfiltrate ../-relative paths; backups are written 0600; auth-profiles.json/auth.json excluded entirely; the PUBLIC_KEY exemption is checked first and correctly scoped. The new perms-repair is posture-aware — it refuses to weaken permissions under shields-up (locked) or unreadable state, so it can't downgrade a locked config. Placeholder preservation correctly distinguishes resolve:env markers from raw secrets (no credential bake-in). Security: all categories pass.
Non-blocking nits:
- An opaque secret under a benign key (no recognizable prefix/credential-shaped key) still survives — unchanged prior behavior; the shape backstop is intentionally additive. A one-line comment acknowledging the residual gap would help.
HEADER_CREDENTIAL_PATTERNwould also match a hypothetical*-key/*-tokensetting (fail-closed →[STRIPPED_BY_MIGRATION]); acceptable trade-off.
Tests thorough: manifest assertion, 7 credential-filter cases (placeholder preservation, per-channel/env/header/array scrubbing, public-key exemption, raw-secret-under-preserved-sibling), and a full backup→restore round-trip.
## Summary - Selectively merge restored OpenClaw openclaw.json instead of overwriting fresh rebuild-generated config - Preserve runtime-owned gateway and generated managed channel/provider state from the rebuilt sandbox - Keep durable user settings from sanitized backups, including MCP servers and custom agents - Extend snapshot coverage for the #5027 preservation path and the stale-runtime-config regression ## Validation - npm test -- --run test/snapshot.test.ts src/lib/security/credential-filter.test.ts src/lib/agent/defs.test.ts - npm run build:cli ## Notes - Addresses the likely PR #5101 nightly regression where restored sanitized/stale openclaw.json clobbered gateway/channel/provider runtime state after rebuild. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Selective merge for durable config during sandbox state restore: runtime-owned sections are preserved while durable-only settings are merged when enabled. * Safer remote state reads to detect missing or non-fatal state files before restore. * **Tests** * New unit tests for the config-merge logic and updated snapshot validating selective restore behavior. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary - Add the v0.0.63 release-note section using the published development note as source context. - Update source docs for sandbox recovery, OpenClaw config restore safety, managed vLLM selection, Slack Socket Mode conflict handling, and host diagnostics. - Refresh generated `nemoclaw-user-*` skills from the updated Fern MDX docs. - Update the release-doc refresh skill so post-release docs for version `n` look up the matching announcement discussion and use the `n+1` patch release label. - Fix CLI/docs parity by avoiding a `--from <Dockerfile>` flag mention inside the `upgrade-sandboxes` command section. ## Source summary - #5034 -> `docs/reference/troubleshooting.mdx`, `docs/about/release-notes.mdx`: Document safer stale-sandbox recovery through `rebuild --yes` before recreating from scratch. - #5091 -> `docs/reference/troubleshooting.mdx`, `docs/about/release-notes.mdx`: Document Docker-driver post-reboot recovery from OpenShell container labels. - #5101, #5174, #5177 -> `docs/manage-sandboxes/backup-restore.mdx`, `docs/about/release-notes.mdx`: Document OpenClaw `openclaw.json` preservation, merge behavior, and fail-safe restore handling. - #5102 -> `docs/reference/commands.mdx`, `docs/reference/commands-nemohermes.mdx`, `docs/manage-sandboxes/lifecycle.mdx`, `docs/about/release-notes.mdx`: Document `upgrade-sandboxes` image-fingerprint drift detection. - #4201 -> `docs/reference/troubleshooting.mdx`, `docs/about/release-notes.mdx`: Document the installer diagnostic for unexpected Docker daemon access outside the `docker` group. - #5038 -> `docs/inference/inference-options.mdx`, `docs/inference/use-local-inference.mdx`, `docs/about/release-notes.mdx`: Document the interactive managed-vLLM model picker and non-interactive override behavior. - #5040, #5041 -> `docs/reference/troubleshooting.mdx`, `docs/about/release-notes.mdx`: Document Ollama auth-proxy recovery and host DNS preflight diagnostics. - #4986, #5039 -> `docs/manage-sandboxes/messaging-channels.mdx`, `docs/about/release-notes.mdx`: Document Slack validation and duplicate Slack Socket Mode sandbox handling. - #4981, #5168 -> `docs/about/release-notes.mdx`: Capture Hermes gateway secret-guard and wrapped-argv startup hardening in the release surface. - Follow-up -> `.agents/skills/nemoclaw-contributor-update-docs/SKILL.md`: Record the post-release docs workflow, discussion-announcement lookup, and next-patch release label rule. - Follow-up -> `docs/reference/commands.mdx`, `docs/reference/commands-nemohermes.mdx`: Reword custom Dockerfile sandbox text so CLI parity does not treat `--from` as an `upgrade-sandboxes` flag. ## Verification - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx` - `npm run docs` - `npm run build:cli` - `bash test/e2e/e2e-cloud-experimental/check-docs.sh --only-cli` - Skip-term scan for `docs/.docs-skip` blocked terms across generated user skills <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Documentation** * Enhanced local inference setup with interactive model selection prompts and environment variable overrides * Improved sandbox upgrade detection using build fingerprints and version checks * Clarified configuration restore behavior preserving user settings during rebuild/restore * Added gateway authentication as fifth security layer * Expanded Slack messaging validation with live credential checking * Enhanced troubleshooting guidance for Docker access, DNS issues, and sandbox recovery * Updated release notes for v0.0.63 featuring sandbox recovery and inference improvements <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Summary
nemoclaw backup-all(which drives the rebuild snapshot machinery) did not capture OpenClaw's coreopenclaw.jsonconfig, so model/provider settings, MCP servers, custom agents, and channel account blocks were lost afternemoclaw <name> rebuild. This declaresopenclaw.jsonas a durable state file and hardens the backup credential sanitizer so the restored config is both safe and usable.Related Issue
Fixes #5027
Changes
agents/openclaw/manifest.yaml— declareopenclaw.jsonas astate_filesentry (copy strategy) so backup-all/rebuild preserve core OpenClaw settings the state dirs don't cover (model/provider, MCP servers, custom agents, channel blocks).src/lib/security/credential-filter.ts— harden sanitization now that the full config is backed up:botToken/appToken), SCREAMING_SNAKE env secrets in MCPenvblocks (GITHUB_TOKEN,BRAVE_API_KEY, bareTOKEN/SECRET/…), and HTTP auth header names (Authorization,X-API-Key,X-*-Token/-Auth).sk-/ghp_/xoxb-/Bearer …) under any key — including array elements and MCP CLI args passed as--api-key <secret>/--api-key=<secret>.resolve:env/unusedplaceholders (and theBearer <placeholder>header form) so the restored config keeps routing through proxy-injected credentials instead of a dead marker.src/lib/actions/sandbox/snapshot.ts— after the standalonesnapshot restorewritesopenclaw.json, run the posture-awarerepairMutableConfigPermsso the always-on gateway keeps the setgid/group-writable config contract ([All Platforms][Sandbox] NemoClaw mutable sandbox breaks group-writable openclaw.json contract after openclaw doctor --fix (gateway cannot persist config) #4538). Rebuild already does this in its post-restore sequence; the onboard recreate-restore path re-establishes it vianemoclaw-start.shon the next gateway start.defs.test.ts), credential-filter unit coverage (placeholders preserved, channel/env/header/array/flag secrets scrubbed, benign settings untouched), and an end-to-end OpenClawopenclaw.jsonbackup→restore round-trip insnapshot.test.ts(settings survive, secrets stripped, gateway block dropped).Type of Change
Verification
npm testpasses (targeted: snapshot, credential-filter, agent defs, mutable-config-perms, snapshot command, config-set, secret-redaction — all green on this host; fullcliproject includes slow CLI-spawn integration tests gated byNEMOCLAW_TEST_TIMEOUT)openclaw.jsonbefore; after the fix it is captured, sanitized, and restored with settings intact)Reviewed scope
Scoped to the reporter's workflow (backup-all → rebuild), which is fully correct: rebuild recreates the gateway (regenerating the
gatewayblock) and repairs config perms. Two adversarial-review edge cases on adjacent paths are intentionally left as follow-ups: preserving the livegatewayblock on a same-sandboxsnapshot restore(the block regenerates at startup), and perms on onboard recreate-restore (self-heals vianemoclaw-start.sh).Signed-off-by: Yimo Jiang yimoj@nvidia.com
Summary by CodeRabbit