fix(sandbox): graceful EACCES in mutateConfigFile (#2681)#2693
fix(sandbox): graceful EACCES in mutateConfigFile (#2681)#2693Sanjays2402 wants to merge 3 commits into
Conversation
Patch 4 wraps replaceConfigFile to swallow EACCES from the immutable sandbox config tree, but the OpenClaw 2026.4.24 control UI also uses a sibling mutateConfigFile() for in-place toggles (e.g. "Enable Dreaming", account toggles). That path was unwrapped, so any UI toggle that touched a config-backed control crashed with: GatewayRequestError: EACCES: permission denied, open '/sandbox/.openclaw/openclaw.json.<pid>.<uuid>.tmp' Add Patch 4b mirroring Patch 4: when OPENSHELL_SANDBOX=1 and the write fails with EACCES, log a warning and return success instead of propagating the error. The toggle is non-persistent in the sandbox by design (config is read-only for tamper-resistance, ref NVIDIA#514, NVIDIA#719); this restores UI usability. Verified the patch applies cleanly to the openclaw 2026.4.24 dist (mutate-JCkOvgvT.js) and the resulting JS still parses (node --check). Signed-off-by: Sanjays2402 <51058514+Sanjays2402@users.noreply.github.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds a Docker build-time patch that wraps OpenClaw’s mutateConfigFile write operations in a try/catch to suppress sandbox Changes
Sequence Diagram(s)sequenceDiagram
participant UI as "OpenClaw UI"
participant Gateway as "Gateway/Server"
participant OpenClaw as "OpenClaw mutateConfigFile"
participant FS as "Filesystem (/sandbox)"
Note over OpenClaw: Patch applied at Docker build-time
UI->>Gateway: Request to mutate config (Enable Dreaming)
Gateway->>OpenClaw: call mutateConfigFile(params)
OpenClaw->>FS: try to write temp config file
alt FS returns EACCES and OPENSHELL_SANDBOX==="1"
OpenClaw-->>OpenClaw: catch EACCES\nlog single warning ("mutation not persisted")
OpenClaw->>Gateway: resolve success (no thrown error)
Gateway->>UI: respond success (UI remains usable)
else FS error non-EACCES
OpenClaw-->>Gateway: rethrow error
Gateway->>UI: respond with error
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Review rate limit: 9/10 reviews remaining, refill in 6 minutes. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
Dockerfile (1)
198-215: Run the Dockerfile E2E matrix before merge.Given this is build-time patching of runtime OpenClaw behavior, please run:
cloud-e2e,sandbox-survival-e2e,hermes-e2e, andrebuild-openclaw-e2e.As per coding guidelines:
Dockerfilechanges are only fully testable with a real container build, with those specific E2E jobs recommended.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Dockerfile` around lines 198 - 215, This Dockerfile patch modifies runtime behavior in the mutateConfigFile function (the tryWriteSingleTopLevelIncludeMutation / writeConfigFile patch and the OPENSHELL_SANDBOX EACCES handling), so before merging run the full container-build E2E matrix to validate it: execute cloud-e2e, sandbox-survival-e2e, hermes-e2e, and rebuild-openclaw-e2e against a real container build to confirm the mutateConfigFile change (and the inserted EACCES catch/console.error) behaves correctly and does not regress toggles or sandbox immutability behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@Dockerfile`:
- Around line 198-215: This Dockerfile patch modifies runtime behavior in the
mutateConfigFile function (the tryWriteSingleTopLevelIncludeMutation /
writeConfigFile patch and the OPENSHELL_SANDBOX EACCES handling), so before
merging run the full container-build E2E matrix to validate it: execute
cloud-e2e, sandbox-survival-e2e, hermes-e2e, and rebuild-openclaw-e2e against a
real container build to confirm the mutateConfigFile change (and the inserted
EACCES catch/console.error) behaves correctly and does not regress toggles or
sandbox immutability behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 3b7ef4e4-1ea8-4500-8787-109e94b348c0
📒 Files selected for processing (1)
Dockerfile
|
✨ Thanks for submitting this PR that proposes a fix for the OpenClaw UI 'Enable Dreaming' toggle issue caused by an EACCES error in the sandbox environment. This change extends the existing sandbox EACCES patch to cover the mutateConfigFile function used by the OpenClaw control UI. Related open issues: |
…#2681) The PR is a Dockerfile sed-style patch that wraps OpenClaw's mutateConfigFile in a try/catch mirroring the existing Patch 4 wrap of replaceConfigFile. There's no unit-testable code path; this guard locks the structural invariants so future Dockerfile edits can't silently regress them. Six assertions: - BOTH PATCHES PRESENT: Patch 4 + Patch 4b headers are both wired. - PARALLEL EACCES GUARD: both gate on process.env.OPENSHELL_SANDBOX === "1" + .code === "EACCES" (regex matches exactly 2). - BUILD-TIME ASSERTIONS: both patches have grep -REq + exit 1 guards that fail the image build if the sed-style replacement didn't apply. - DISTINCT LOG MESSAGES: "plugin metadata not persisted" vs "mutation not persisted" so log-driven debugging can identify which path swallowed EACCES. - ORDERING: Patch 4 must precede Patch 4b (4b explicitly mirrors 4). - RETHROW ON UNKNOWN: each catch block re-throws _<name>Err when the error isn't EACCES-in-sandbox, so real bugs surface in non-sandbox contexts. Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
|
verified locally with 6 structural-invariant tests pushed on fe4aad3: • both patches present — Patch 4 (replaceConfigFile) and Patch 4b (mutateConfigFile) headers are wired. • parallel EACCES guard — both gate on • build-time assertions — both • distinct log messages — "plugin metadata not persisted" vs "mutation not persisted" so log-driven debugging can identify which path swallowed EACCES. • rethrow on unknown — each catch re-throws when the error isn't EACCES-in-sandbox. ready to merge. |
…ons land cleanly (#2681) (#2851) ## Summary Replaces the EACCES-swallow approach proposed in #2693 with proper Unix group permissions. Control-UI toggles in the OpenClaw dashboard (Enable Dreaming, account toggles, etc.) now **persist in default mode** instead of throwing `GatewayRequestError: EACCES` or becoming silent no-ops. ## Background OpenClaw 2026.4.24 (landed via #2484) introduced `mutateConfigFile` as the new control-UI write path. Patch 4 in the Dockerfile only wraps the legacy `replaceConfigFile` (plugin install path), so every config-toggle click in the sandbox dashboard now EACCES'd. #2693 proposed adding "Patch 4b" — a parallel try/catch that swallows the EACCES. That makes toggles non-functional in the sandbox: the user clicks "Enable Dreaming," gets no error, but nothing actually persists. UX improves over the crash; underlying limitation stays. This PR implements the alternative design Aaron sketched for #2681: rather than wrapping each new write path in EACCES handlers, fix the actual permissions so the writes succeed. ## Closes / Supersedes - Closes #2681 - Supersedes #2693 — thanks @Sanjays2402 for raising the issue and the initial swallow attempt that surfaced the deeper design question ## Implementation (the 6-item spec) | # | Item | File | |---|------|------| | 1 | Keep `gateway` as a separate UID from `sandbox`; add it to the `sandbox` group | `Dockerfile.base` | | 2 | Stale-base fallback so older `sandbox-base:latest` tags get the group membership at derived-image build time | `Dockerfile` | | 3 | `/sandbox/.openclaw` group-writable + setgid on dirs; `.config-hash` file mode 664 | `Dockerfile.base`, `Dockerfile` | | 4 | `normalize_mutable_config_perms()` at startup, gated on Shields state | `scripts/nemoclaw-start.sh` | | 5 | `shields down` restores 660/2770 (group-writable + setgid) for OpenClaw; Hermes left at historical 640/750 (no separate gateway UID, contract doesn't apply) | `src/lib/shields.ts` | | 6 | Tests assert the new invariant: writes succeed in default mode, no new EACCES swallow | `test/repro-2681-group-writable.test.ts` | ## Why setgid `chmod g+s` on directories means new files inherit `group=sandbox` regardless of which UID created them. So `gateway` writes a file → file is `group=sandbox` → the `sandbox` user (also in the group) can still read it. Without setgid, gateway's writes would land with `group=gateway` and the agent might lose read access on rotation. ## Patch 4 retention The existing `Patch 4` (replaceConfigFile EACCES swallow) is **intentionally retained** as a defensive fallback for: - Older base images during the rollout window - Host filesystems that don't honor setgid (rare, but possible on some Windows/WSL2 configurations) - Other write paths in OpenClaw that might surface in future versions No new EACCES swallow patch is added — the `Patch 4b` approach from #2693 is explicitly rejected per spec item #6. ## Verification - [x] `npm run build:cli` compiles the changed `shields.ts` - [x] 11/11 new tests pass in `test/repro-2681-group-writable.test.ts` — assert structural invariants of the group-writable contract - [x] 443/443 plugin tests pass - [x] Pre-existing CLI tests that fail on this branch ALSO fail on pristine main (`@oclif/core` module-not-found from in-flight migration; not caused by this PR) - [ ] **Brev E2E required** — touches Dockerfile + Dockerfile.base + shields lifecycle. Adaptive matrix: M×DANGER → full Brev sweep before merge ## Test plan - [x] Unit: 11 structural assertions in `repro-2681-group-writable.test.ts` - [ ] CI: `build-sandbox-images` (validates the group-membership + setgid Dockerfile changes) - [ ] CI: `test-e2e-sandbox` (validates shields lifecycle + onboard flow) - [ ] CI: `test-e2e-gateway-isolation` (validates the gateway-as-different-UID still runs cleanly) - [ ] Manual repro: onboard, click "Enable Dreaming" in dashboard, verify mutation persists across `nemoclaw status` ## Type of Change - [x] Code change (feature, bug fix, or refactor) ## AI Disclosure - [x] AI-assisted — tool: Claude Code
|
Thanks @Sanjays2402 for digging into this and for putting together the We ended up landing a different fix in #2851: instead of making the UI request look successful while leaving the mutation non-persistent, #2851 changes the sandbox permission contract so Given that #2851 supersedes this approach, I’m marking this PR as superseded and closing it. Thanks again for the help identifying and narrowing the failure path. |
## Summary Patch 4 in the sandbox `Dockerfile` is a regex-based monkey-patch over OpenClaw's compiled JS that wraps `replaceConfigFile` in an EACCES try/catch suppression. It is source-shape-coupled and has been rewritten three times in eight days chasing OpenClaw refactors: - `fefd69fa2` (#2377) — original literal-string match against [openclaw/openclaw@v2026.4.9](https://github.com/openclaw/openclaw/releases/tag/v2026.4.9). - `5dcb0a9b9` (#2484) — updated the literal string for the restructured write block in [openclaw/openclaw@v2026.4.24](https://github.com/openclaw/openclaw/releases/tag/v2026.4.24). - `e0290e153` (#2876) — hardened to a tolerant whitespace/property-order-aware regex against [openclaw/openclaw@v2026.4.24](https://github.com/openclaw/openclaw/releases/tag/v2026.4.24). #2686 and #3497 are the latest break: in current OpenClaw, the regex no longer finds the function shape and the image build aborts at Step 17/56. Patch 4 is also unnecessary by design. The EACCES it was suppressing does not happen in the supported flows: - **Mutable-default mode** (fresh sandbox, before `nemoclaw shields up`): `openclaw.json` is `chmod 660 sandbox:sandbox` and the gateway UID is in the sandbox group, courtesy of #2851 (closing #2681; superseding the EACCES-swallow attempt in #2693). `openclaw plugins install` writes through normally; no EACCES. - **Shields-up mode** (locked): the entire config tree — file, parent directory, and the `extensions`/`plugins` state dirs from [HIGH_RISK_STATE_DIRS](src/lib/shields/index.ts#L292-L306) — is locked to `root:root` by design. Shields-up exists *to refuse* runtime config and plugin mutations. Suppressing the EACCES masked that refusal and made `openclaw plugins install` appear to succeed silently while only the auto-discovery half landed. The expected lifecycle is **configure-in-mutable-mode → `shields up` → run**. Plugin install attempted while shielded should fail cleanly; that is exactly what happens without Patch 4. This PR therefore deletes Patch 4 entirely. ## Related Issue Resolves #2686 Resolves #3497 Related context: - #2681 — original "make `.openclaw` group-writable" issue, closed by #2851. - #2851 — PR that made mutable-mode plugin install work without an EACCES swallow. - #2693 — closed earlier EACCES-swallow attempt, superseded once #2851 landed. - #2544 — NemoClaw issue tracking the broader "plugin config requires multi-minute rebuild" problem. - openclaw/openclaw#72950 — upstream defect (no env-var or writable-overlay path for `plugins.entries.<id>.config`); the real fix has to land there. ## Changes - `Dockerfile`: drop the Patch 4 block (the `COPY scripts/rcf_patch.py` line and the inline Python invocation + grep guard). - `scripts/rcf_patch.py`: deleted. - `src/lib/sandbox/build-context.ts`: stop staging `scripts/rcf_patch.py` into the sandbox build context. - `test/rcf-patch.test.ts`: deleted. ## Type of Change - [x] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## Verification - [x] `npx prek run --all-files` passes - [x] `npm test` passes - [x] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [ ] Docs updated for user-facing behavior changes - [ ] `make docs` builds without warnings (doc changes only) - [ ] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) Signed-off-by: Tinson Lai <tinsonl@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Chores** * Removed OpenClaw patching logic from Docker build process and related artifact copies * Updated build context script staging behavior * **Tests** * Enhanced sandbox configuration test suite with environment variable passthrough support * Added version-based conditional patching validation and warning behavior tests <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Summary
Fixes #2681. Extends the existing sandbox EACCES patch to cover
mutateConfigFile, the function the OpenClaw control UI uses for in-place config toggles (e.g. Enable Dreaming, account toggles).Background
Dockerfilealready carries Patch 4, which wraps OpenClaw'sreplaceConfigFilein a try/catch that swallowsEACCESwhenOPENSHELL_SANDBOX=1(because/sandbox/.openclaw/openclaw.jsonis intentionally444 root:root+ Landlockread_only, ref #514, #719). That patch covers the plugin-install write path.In OpenClaw 2026.4.24, however, control-UI toggles take a sibling code path —
mutateConfigFile()insrc/config/mutate.ts— which is not wrapped. BothmutateConfigFileandreplaceConfigFileultimately calltryWriteSingleTopLevelIncludeMutation→writeConfigFile, both of which try to write a<basename>.<pid>.<uuid>.tmpfile next to the immutableopenclaw.jsonand thenrename()it into place. In the sandbox the temp create fails withEACCES, and the gateway raises:This is exactly what users hit when clicking Dreaming Off → On in the UI (#2681 screenshot).
Fix
Add Patch 4b mirroring Patch 4's structure but targeting
mutateConfigFile's call totryWriteSingleTopLevelIncludeMutation/writeConfigFile. WhenOPENSHELL_SANDBOX=1and the write fails withEACCES, log a single warning and return success instead of propagating:The toggle change is non-persistent in the sandbox by design — config is read-only for tamper-resistance (#514, #719), and the existing root:root 444 lock + Landlock policy stay in force. This patch only restores UI usability so a single click no longer crashes the gateway request; it does not weaken any security boundary.
The patch fails the build with a clear
ERROR: ...not appliedmessage if the upstream OpenClaw source structure changes, matching the rest of the Dockerfile's patch hygiene.Verification
python3 -c "..."block applies cleanly against the liveopenclaw@2026.4.24mutateConfigFilesource (mutate-JCkOvgvT.js) — theoldliteral matches exactly once, replacement succeeds.node --checkon the patched module: passes (no syntax error introduced).grep -REq 'mutation not persisted'confirms the patch was applied.Test plan
openclaw@2026.4.24distnode --check)Related
.openclaw/ writable.openclaw-datadesign)Signed-off-by: Sanjays2402 51058514+Sanjays2402@users.noreply.github.com
Summary by CodeRabbit
Bug Fixes
Tests
Chores