Skip to content

fix(sandbox): graceful EACCES in mutateConfigFile (#2681)#2693

Closed
Sanjays2402 wants to merge 3 commits into
NVIDIA:mainfrom
Sanjays2402:fix/sandbox-mutate-config-eacces
Closed

fix(sandbox): graceful EACCES in mutateConfigFile (#2681)#2693
Sanjays2402 wants to merge 3 commits into
NVIDIA:mainfrom
Sanjays2402:fix/sandbox-mutate-config-eacces

Conversation

@Sanjays2402

@Sanjays2402 Sanjays2402 commented Apr 29, 2026

Copy link
Copy Markdown

Summary

Fixes #2681. Extends the existing sandbox EACCES patch to cover mutateConfigFile, the function the OpenClaw control UI uses for in-place config toggles (e.g. Enable Dreaming, account toggles).

Background

Dockerfile already carries Patch 4, which wraps OpenClaw's replaceConfigFile in a try/catch that swallows EACCES when OPENSHELL_SANDBOX=1 (because /sandbox/.openclaw/openclaw.json is intentionally 444 root:root + Landlock read_only, ref #514, #719). That patch covers the plugin-install write path.

In OpenClaw 2026.4.24, however, control-UI toggles take a sibling code path — mutateConfigFile() in src/config/mutate.ts — which is not wrapped. Both mutateConfigFile and replaceConfigFile ultimately call tryWriteSingleTopLevelIncludeMutationwriteConfigFile, both of which try to write a <basename>.<pid>.<uuid>.tmp file next to the immutable openclaw.json and then rename() it into place. In the sandbox the temp create fails with EACCES, and the gateway raises:

GatewayRequestError: Error: EACCES: permission denied, open
'/sandbox/.openclaw/openclaw.json.299.2daa7154-e325-4ca7-b195-014b9a9013ca.tmp'

This is exactly what users hit when clicking Dreaming Off → On in the UI (#2681 screenshot).

Fix

Add Patch 4b mirroring Patch 4's structure but targeting mutateConfigFile's call to tryWriteSingleTopLevelIncludeMutation/writeConfigFile. When OPENSHELL_SANDBOX=1 and the write fails with EACCES, log a single warning and return success instead of propagating:

[nemoclaw] Config is read-only in sandbox — mutation not persisted
(UI toggles like Enable Dreaming are non-persistent here)

The toggle change is non-persistent in the sandbox by design — config is read-only for tamper-resistance (#514, #719), and the existing root:root 444 lock + Landlock policy stay in force. This patch only restores UI usability so a single click no longer crashes the gateway request; it does not weaken any security boundary.

The patch fails the build with a clear ERROR: ...not applied message if the upstream OpenClaw source structure changes, matching the rest of the Dockerfile's patch hygiene.

Verification

  • Verified the patch's python3 -c "..." block applies cleanly against the live openclaw@2026.4.24 mutateConfigFile source (mutate-JCkOvgvT.js) — the old literal matches exactly once, replacement succeeds.
  • node --check on the patched module: passes (no syntax error introduced).
  • Build-time guard grep -REq 'mutation not persisted' confirms the patch was applied.

Test plan

  • Patch script applies to openclaw@2026.4.24 dist
  • Patched JS parses (node --check)
  • Local Docker build with the new image (CI)
  • Manual smoke: click Enable Dreaming in UI → expect non-crash (warning in gateway log; toggle reverts on next reload, by design)

Related

Signed-off-by: Sanjays2402 51058514+Sanjays2402@users.noreply.github.com

Summary by CodeRabbit

  • Bug Fixes

    • Prevents gateway/UI crashes when saving configuration in sandbox/read-only environments; attempts now log a single warning about non-persistent config instead of crashing.
  • Tests

    • Added automated tests verifying the sandbox handling and that matching safeguards are present for all config-save paths.
  • Chores

    • Build-time check now fails the Docker build if the runtime safeguard patch did not apply.

Patch 4 wraps replaceConfigFile to swallow EACCES from the immutable
sandbox config tree, but the OpenClaw 2026.4.24 control UI also uses a
sibling mutateConfigFile() for in-place toggles (e.g. "Enable Dreaming",
account toggles).  That path was unwrapped, so any UI toggle that
touched a config-backed control crashed with:

  GatewayRequestError: EACCES: permission denied, open
  '/sandbox/.openclaw/openclaw.json.<pid>.<uuid>.tmp'

Add Patch 4b mirroring Patch 4: when OPENSHELL_SANDBOX=1 and the write
fails with EACCES, log a warning and return success instead of
propagating the error.  The toggle is non-persistent in the sandbox by
design (config is read-only for tamper-resistance, ref NVIDIA#514, NVIDIA#719);
this restores UI usability.

Verified the patch applies cleanly to the openclaw 2026.4.24 dist
(mutate-JCkOvgvT.js) and the resulting JS still parses (node --check).

Signed-off-by: Sanjays2402 <51058514+Sanjays2402@users.noreply.github.com>
@copy-pr-bot

copy-pr-bot Bot commented Apr 29, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Apr 29, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2f9eccea-f77d-4056-b77d-0842f46bae99

📥 Commits

Reviewing files that changed from the base of the PR and between fe4aad3 and 67a9852.

📒 Files selected for processing (1)
  • Dockerfile
🚧 Files skipped from review as they are similar to previous changes (1)
  • Dockerfile

📝 Walkthrough

Walkthrough

Adds a Docker build-time patch that wraps OpenClaw’s mutateConfigFile write operations in a try/catch to suppress sandbox EACCES errors (when OPENSHELL_SANDBOX==="1"), logs a single non-persistent mutation warning, rethrows non-EACCES errors, and adds a build-time assertion plus tests validating the patches.

Changes

Cohort / File(s) Summary
Docker Build-Time Patching
Dockerfile
Adds build-time search-and-replace of OpenClaw dist JS to wrap the mutateConfigFile config-write section in a try/catch; when OPENSHELL_SANDBOX==="1" and err.code==="EACCES" it logs a single warning about non-persistent mutation and suppresses the error, rethrows otherwise; adds a build-time grep/exit 1 assertion requiring the patch marker.
Tests
test/repro-2681-extra.test.ts
Adds a Vitest suite that reads the Dockerfile and asserts presence, ordering, and structural properties of both the existing Patch 4 (replaceConfigFile) and the new Patch 4b (mutateConfigFile): matching sandbox+EACCES gating, differing swallow/log messages, build-failing guard blocks with expected strings, and rethrow behavior for non-EACCES errors.

Sequence Diagram(s)

sequenceDiagram
    participant UI as "OpenClaw UI"
    participant Gateway as "Gateway/Server"
    participant OpenClaw as "OpenClaw mutateConfigFile"
    participant FS as "Filesystem (/sandbox)"
    Note over OpenClaw: Patch applied at Docker build-time
    UI->>Gateway: Request to mutate config (Enable Dreaming)
    Gateway->>OpenClaw: call mutateConfigFile(params)
    OpenClaw->>FS: try to write temp config file
    alt FS returns EACCES and OPENSHELL_SANDBOX==="1"
        OpenClaw-->>OpenClaw: catch EACCES\nlog single warning ("mutation not persisted")
        OpenClaw->>Gateway: resolve success (no thrown error)
        Gateway->>UI: respond success (UI remains usable)
    else FS error non-EACCES
        OpenClaw-->>Gateway: rethrow error
        Gateway->>UI: respond with error
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I nibbled code in Docker's light,
Wrapped a write to hush the fright,
EACCES knocks — I wink and grin,
One soft warning, no crash within,
Hoppy patch, the UI's bright! 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly describes the main change: adding graceful EACCES handling to mutateConfigFile in the sandbox, which directly addresses the core issue.
Linked Issues check ✅ Passed The PR implements all coding requirements from #2681: patches mutateConfigFile with try/catch for EACCES errors, logs non-persistent warning, returns success on sandbox write failure, and preserves security constraints.
Out of Scope Changes check ✅ Passed All changes are directly scoped to #2681: Dockerfile patch for mutateConfigFile EACCES handling and a test file verifying the patch structure and behavior.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Review rate limit: 9/10 reviews remaining, refill in 6 minutes.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
Dockerfile (1)

198-215: Run the Dockerfile E2E matrix before merge.

Given this is build-time patching of runtime OpenClaw behavior, please run: cloud-e2e, sandbox-survival-e2e, hermes-e2e, and rebuild-openclaw-e2e.

As per coding guidelines: Dockerfile changes are only fully testable with a real container build, with those specific E2E jobs recommended.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile` around lines 198 - 215, This Dockerfile patch modifies runtime
behavior in the mutateConfigFile function (the
tryWriteSingleTopLevelIncludeMutation / writeConfigFile patch and the
OPENSHELL_SANDBOX EACCES handling), so before merging run the full
container-build E2E matrix to validate it: execute cloud-e2e,
sandbox-survival-e2e, hermes-e2e, and rebuild-openclaw-e2e against a real
container build to confirm the mutateConfigFile change (and the inserted EACCES
catch/console.error) behaves correctly and does not regress toggles or sandbox
immutability behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@Dockerfile`:
- Around line 198-215: This Dockerfile patch modifies runtime behavior in the
mutateConfigFile function (the tryWriteSingleTopLevelIncludeMutation /
writeConfigFile patch and the OPENSHELL_SANDBOX EACCES handling), so before
merging run the full container-build E2E matrix to validate it: execute
cloud-e2e, sandbox-survival-e2e, hermes-e2e, and rebuild-openclaw-e2e against a
real container build to confirm the mutateConfigFile change (and the inserted
EACCES catch/console.error) behaves correctly and does not regress toggles or
sandbox immutability behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 3b7ef4e4-1ea8-4500-8787-109e94b348c0

📥 Commits

Reviewing files that changed from the base of the PR and between a794971 and 4b02a6b.

📒 Files selected for processing (1)
  • Dockerfile

@wscurran

wscurran commented May 1, 2026

Copy link
Copy Markdown
Contributor

✨ Thanks for submitting this PR that proposes a fix for the OpenClaw UI 'Enable Dreaming' toggle issue caused by an EACCES error in the sandbox environment. This change extends the existing sandbox EACCES patch to cover the mutateConfigFile function used by the OpenClaw control UI.


Related open issues:

@wscurran wscurran added the integration: openclaw OpenClaw integration behavior label May 1, 2026
…#2681)

The PR is a Dockerfile sed-style patch that wraps OpenClaw's
mutateConfigFile in a try/catch mirroring the existing Patch 4 wrap of
replaceConfigFile. There's no unit-testable code path; this guard
locks the structural invariants so future Dockerfile edits can't
silently regress them.

Six assertions:
- BOTH PATCHES PRESENT: Patch 4 + Patch 4b headers are both wired.
- PARALLEL EACCES GUARD: both gate on
  process.env.OPENSHELL_SANDBOX === "1" + .code === "EACCES" (regex
  matches exactly 2).
- BUILD-TIME ASSERTIONS: both patches have grep -REq + exit 1 guards
  that fail the image build if the sed-style replacement didn't apply.
- DISTINCT LOG MESSAGES: "plugin metadata not persisted" vs
  "mutation not persisted" so log-driven debugging can identify which
  path swallowed EACCES.
- ORDERING: Patch 4 must precede Patch 4b (4b explicitly mirrors 4).
- RETHROW ON UNKNOWN: each catch block re-throws _<name>Err when the
  error isn't EACCES-in-sandbox, so real bugs surface in non-sandbox
  contexts.

Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
@cjagwani

cjagwani commented May 1, 2026

Copy link
Copy Markdown
Contributor

verified locally with 6 structural-invariant tests pushed on fe4aad3:

• both patches present — Patch 4 (replaceConfigFile) and Patch 4b (mutateConfigFile) headers are wired.

• parallel EACCES guard — both gate on
OPENSHELL_SANDBOX === "1" + .code === "EACCES" (regex matches exactly 2 occurrences in the Dockerfile).

• build-time assertions — both grep -REq + exit 1 guards fail the image build if the sed-style replacement didn't apply.

• distinct log messages — "plugin metadata not persisted" vs "mutation not persisted" so log-driven debugging can identify which path swallowed EACCES.

• rethrow on unknown — each catch re-throws when the error isn't EACCES-in-sandbox.

ready to merge.

@cjagwani cjagwani self-assigned this May 1, 2026
ericksoa pushed a commit that referenced this pull request May 1, 2026
…ons land cleanly (#2681) (#2851)

## Summary

Replaces the EACCES-swallow approach proposed in #2693 with proper Unix
group permissions. Control-UI toggles in the OpenClaw dashboard (Enable
Dreaming, account toggles, etc.) now **persist in default mode** instead
of throwing `GatewayRequestError: EACCES` or becoming silent no-ops.

## Background

OpenClaw 2026.4.24 (landed via #2484) introduced `mutateConfigFile` as
the new control-UI write path. Patch 4 in the Dockerfile only wraps the
legacy `replaceConfigFile` (plugin install path), so every config-toggle
click in the sandbox dashboard now EACCES'd.

#2693 proposed adding "Patch 4b" — a parallel try/catch that swallows
the EACCES. That makes toggles non-functional in the sandbox: the user
clicks "Enable Dreaming," gets no error, but nothing actually persists.
UX improves over the crash; underlying limitation stays.

This PR implements the alternative design Aaron sketched for #2681:
rather than wrapping each new write path in EACCES handlers, fix the
actual permissions so the writes succeed.

## Closes / Supersedes

- Closes #2681
- Supersedes #2693 — thanks @Sanjays2402 for raising the issue and the
initial swallow attempt that surfaced the deeper design question

## Implementation (the 6-item spec)

| # | Item | File |
|---|------|------|
| 1 | Keep `gateway` as a separate UID from `sandbox`; add it to the
`sandbox` group | `Dockerfile.base` |
| 2 | Stale-base fallback so older `sandbox-base:latest` tags get the
group membership at derived-image build time | `Dockerfile` |
| 3 | `/sandbox/.openclaw` group-writable + setgid on dirs;
`.config-hash` file mode 664 | `Dockerfile.base`, `Dockerfile` |
| 4 | `normalize_mutable_config_perms()` at startup, gated on Shields
state | `scripts/nemoclaw-start.sh` |
| 5 | `shields down` restores 660/2770 (group-writable + setgid) for
OpenClaw; Hermes left at historical 640/750 (no separate gateway UID,
contract doesn't apply) | `src/lib/shields.ts` |
| 6 | Tests assert the new invariant: writes succeed in default mode, no
new EACCES swallow | `test/repro-2681-group-writable.test.ts` |

## Why setgid

`chmod g+s` on directories means new files inherit `group=sandbox`
regardless of which UID created them. So `gateway` writes a file → file
is `group=sandbox` → the `sandbox` user (also in the group) can still
read it. Without setgid, gateway's writes would land with
`group=gateway` and the agent might lose read access on rotation.

## Patch 4 retention

The existing `Patch 4` (replaceConfigFile EACCES swallow) is
**intentionally retained** as a defensive fallback for:

- Older base images during the rollout window
- Host filesystems that don't honor setgid (rare, but possible on some
Windows/WSL2 configurations)
- Other write paths in OpenClaw that might surface in future versions

No new EACCES swallow patch is added — the `Patch 4b` approach from
#2693 is explicitly rejected per spec item #6.

## Verification

- [x] `npm run build:cli` compiles the changed `shields.ts`
- [x] 11/11 new tests pass in `test/repro-2681-group-writable.test.ts` —
assert structural invariants of the group-writable contract
- [x] 443/443 plugin tests pass
- [x] Pre-existing CLI tests that fail on this branch ALSO fail on
pristine main (`@oclif/core` module-not-found from in-flight migration;
not caused by this PR)
- [ ] **Brev E2E required** — touches Dockerfile + Dockerfile.base +
shields lifecycle. Adaptive matrix: M×DANGER → full Brev sweep before
merge

## Test plan

- [x] Unit: 11 structural assertions in
`repro-2681-group-writable.test.ts`
- [ ] CI: `build-sandbox-images` (validates the group-membership +
setgid Dockerfile changes)
- [ ] CI: `test-e2e-sandbox` (validates shields lifecycle + onboard
flow)
- [ ] CI: `test-e2e-gateway-isolation` (validates the
gateway-as-different-UID still runs cleanly)
- [ ] Manual repro: onboard, click "Enable Dreaming" in dashboard,
verify mutation persists across `nemoclaw status`

## Type of Change

- [x] Code change (feature, bug fix, or refactor)

## AI Disclosure

- [x] AI-assisted — tool: Claude Code
@ericksoa

ericksoa commented May 1, 2026

Copy link
Copy Markdown
Contributor

Thanks @Sanjays2402 for digging into this and for putting together the mutateConfigFile EACCES patch. That helped confirm the OpenClaw dashboard toggle path was the one failing in #2681 and clarified the tradeoff around swallowing the error.

We ended up landing a different fix in #2851: instead of making the UI request look successful while leaving the mutation non-persistent, #2851 changes the sandbox permission contract so /sandbox/.openclaw is group-writable/setgid and the gateway user can actually persist the config mutation. That merged in 37d6d6a and closed #2681.

Given that #2851 supersedes this approach, I’m marking this PR as superseded and closing it. Thanks again for the help identifying and narrowing the failure path.

@ericksoa ericksoa closed this May 1, 2026
cv pushed a commit that referenced this pull request May 15, 2026
## Summary
Patch 4 in the sandbox `Dockerfile` is a regex-based monkey-patch over
OpenClaw's compiled JS that wraps `replaceConfigFile` in an EACCES
try/catch suppression. It is source-shape-coupled and has been rewritten
three times in eight days chasing OpenClaw refactors:

- `fefd69fa2` (#2377) — original literal-string match against
[openclaw/openclaw@v2026.4.9](https://github.com/openclaw/openclaw/releases/tag/v2026.4.9).
- `5dcb0a9b9` (#2484) — updated the literal string for the restructured
write block in
[openclaw/openclaw@v2026.4.24](https://github.com/openclaw/openclaw/releases/tag/v2026.4.24).
- `e0290e153` (#2876) — hardened to a tolerant
whitespace/property-order-aware regex against
[openclaw/openclaw@v2026.4.24](https://github.com/openclaw/openclaw/releases/tag/v2026.4.24).

#2686 and #3497 are the latest break: in current OpenClaw, the regex no
longer finds the function shape and the image build aborts at Step
17/56.

Patch 4 is also unnecessary by design. The EACCES it was suppressing
does not happen in the supported flows:

- **Mutable-default mode** (fresh sandbox, before `nemoclaw shields
up`): `openclaw.json` is `chmod 660 sandbox:sandbox` and the gateway UID
is in the sandbox group, courtesy of #2851 (closing #2681; superseding
the EACCES-swallow attempt in #2693). `openclaw plugins install` writes
through normally; no EACCES.
- **Shields-up mode** (locked): the entire config tree — file, parent
directory, and the `extensions`/`plugins` state dirs from
[HIGH_RISK_STATE_DIRS](src/lib/shields/index.ts#L292-L306) — is locked
to `root:root` by design. Shields-up exists *to refuse* runtime config
and plugin mutations. Suppressing the EACCES masked that refusal and
made `openclaw plugins install` appear to succeed silently while only
the auto-discovery half landed.

The expected lifecycle is **configure-in-mutable-mode → `shields up` →
run**. Plugin install attempted while shielded should fail cleanly; that
is exactly what happens without Patch 4.

This PR therefore deletes Patch 4 entirely.

## Related Issue
Resolves #2686
Resolves #3497

Related context:

- #2681 — original "make `.openclaw` group-writable" issue, closed by
#2851.
- #2851 — PR that made mutable-mode plugin install work without an
EACCES swallow.
- #2693 — closed earlier EACCES-swallow attempt, superseded once #2851
landed.
- #2544 — NemoClaw issue tracking the broader "plugin config requires
multi-minute rebuild" problem.
- openclaw/openclaw#72950 — upstream defect (no env-var or
writable-overlay path for `plugins.entries.<id>.config`); the real fix
has to land there.

## Changes
- `Dockerfile`: drop the Patch 4 block (the `COPY scripts/rcf_patch.py`
line and the inline Python invocation + grep guard).
- `scripts/rcf_patch.py`: deleted.
- `src/lib/sandbox/build-context.ts`: stop staging
`scripts/rcf_patch.py` into the sandbox build context.
- `test/rcf-patch.test.ts`: deleted.

## Type of Change

- [x] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification

- [x] `npx prek run --all-files` passes
- [x] `npm test` passes
- [x] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [ ] Docs updated for user-facing behavior changes
- [ ] `make docs` builds without warnings (doc changes only)
- [ ] Doc pages follow the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md)
(doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
* Removed OpenClaw patching logic from Docker build process and related
artifact copies
  * Updated build context script staging behavior

* **Tests**
* Enhanced sandbox configuration test suite with environment variable
passthrough support
* Added version-based conditional patching validation and warning
behavior tests

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@wscurran wscurran added area: integrations Third-party service integration behavior area: sandbox OpenShell sandbox lifecycle, runtime, config, or recovery bug-fix PR fixes a bug or regression feature PR adds or expands user-visible functionality and removed OpenShell labels Jun 3, 2026
@wscurran wscurran removed status: superseded bug Something fails against expected or documented behavior feature PR adds or expands user-visible functionality labels Jun 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: integrations Third-party service integration behavior area: sandbox OpenShell sandbox lifecycle, runtime, config, or recovery bug-fix PR fixes a bug or regression integration: openclaw OpenClaw integration behavior

Projects

None yet

Development

Successfully merging this pull request may close these issues.

OpenClaw UI "Enable Dreaming" doesn't work because of GatewayRequestError: EACCES: permission denied

4 participants