Skip to content

fix(dockerfile): drop Patch 4 entirely#3500

Merged
cv merged 5 commits into
mainfrom
fix/2686-rcf-patch-version-gate
May 15, 2026
Merged

fix(dockerfile): drop Patch 4 entirely#3500
cv merged 5 commits into
mainfrom
fix/2686-rcf-patch-version-gate

Conversation

@laitingsheng

@laitingsheng laitingsheng commented May 14, 2026

Copy link
Copy Markdown
Contributor

Summary

Patch 4 in the sandbox Dockerfile is a regex-based monkey-patch over OpenClaw's compiled JS that wraps replaceConfigFile in an EACCES try/catch suppression. It is source-shape-coupled and has been rewritten three times in eight days chasing OpenClaw refactors:

#2686 and #3497 are the latest break: in current OpenClaw, the regex no longer finds the function shape and the image build aborts at Step 17/56.

Patch 4 is also unnecessary by design. The EACCES it was suppressing does not happen in the supported flows:

The expected lifecycle is configure-in-mutable-mode → shields up → run. Plugin install attempted while shielded should fail cleanly; that is exactly what happens without Patch 4.

This PR therefore deletes Patch 4 entirely.

Related Issue

Resolves #2686
Resolves #3497

Related context:

Changes

  • Dockerfile: drop the Patch 4 block (the COPY scripts/rcf_patch.py line and the inline Python invocation + grep guard).
  • scripts/rcf_patch.py: deleted.
  • src/lib/sandbox/build-context.ts: stop staging scripts/rcf_patch.py into the sandbox build context.
  • test/rcf-patch.test.ts: deleted.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • make docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Tinson Lai tinsonl@nvidia.com

Summary by CodeRabbit

  • Chores

    • Removed OpenClaw patching logic from Docker build process and related artifact copies
    • Updated build context script staging behavior
  • Tests

    • Enhanced sandbox configuration test suite with environment variable passthrough support
    • Added version-based conditional patching validation and warning behavior tests

…range

Patch 4 monkey-patches OpenClaw's `replaceConfigFile` to suppress EACCES
inside the sandbox.  Until the upstream fix in openclaw/openclaw#72950
lands, the patch's only effect is downgrading a hard EACCES crash to a
warning during plugin metadata persistence (plugins still load via
auto-discovery from extensions/).

But the patch's regex tracks OpenClaw's internal source shape, and each
OpenClaw refactor has broken it -- v0.0.29 hit one such break (#2686);
v0.0.40 on macOS hit a stale-install variant (#3497).  Each break is a
hard image-build failure even though the patch is, by design, a fallback
for noisier behaviour in the sandbox.

Replace the hard-assert with a version-gated soft-fail:

* `rcf_patch.py` now reads `OPENCLAW_VERSION` from env.  When it exceeds
  the new `LAST_OPENCLAW_NEEDING_RCF_PATCH` sentinel, the script exits 0
  without touching the source on the assumption that the upstream fix
  has landed and the patch is no longer needed.  Maintainers bump the
  sentinel DOWN once that becomes true.

* When the regex fails to match -- the OpenClaw refactor case the
  hard-assert was guarding -- the script now emits a loud
  `[nemoclaw] WARN` line and exits 0.  Plugin metadata persistence in
  the sandbox surfaces raw EACCES at runtime instead of being suppressed,
  but the image still builds.  The Dockerfile's post-patch grep follows
  the same shape (`echo WARN`, not `exit 1`).

* `test/rcf-patch.test.ts` covers the four new branches: pattern miss
  (soft-warn), missing function (soft-warn), past-sentinel skip, and
  unparseable / empty `OPENCLAW_VERSION` (still attempts the patch).

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@coderabbitai

coderabbitai Bot commented May 14, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c237adbb-6be0-4a4a-8370-2736f7c1a974

📥 Commits

Reviewing files that changed from the base of the PR and between 87da251 and f1d94ab.

📒 Files selected for processing (4)
  • Dockerfile
  • scripts/rcf_patch.py
  • src/lib/sandbox/build-context.ts
  • test/rcf-patch.test.ts
💤 Files with no reviewable changes (4)
  • src/lib/sandbox/build-context.ts
  • scripts/rcf_patch.py
  • test/rcf-patch.test.ts
  • Dockerfile

📝 Walkthrough

Walkthrough

Removes the in-image OpenClaw “Patch 4” invocation and the Dockerfile copy of scripts/rcf_patch.py; the Python rcf_patch.py script remains in repo with expanded behavior and tests, and the sandbox build staging now copies generate-openclaw-config.py into the staged scripts/ directory instead of rcf_patch.py.

Changes

Remove in-image Patch 4 & adjust staging

Layer / File(s) Summary
Dockerfile: remove Patch 4 and rcf_patch copy
Dockerfile
Deleted the COPY scripts/rcf_patch.py /usr/local/lib/nemoclaw/rcf_patch.py step and removed the entire OpenClaw "Patch 4 — graceful EACCES in replaceConfigFile" block that invoked the Python patcher and verified OPENSHELL_SANDBOX/EACCES edits.
Build context staging change
src/lib/sandbox/build-context.ts
stageOptimizedSandboxBuildContext now stages generate-openclaw-config.py into the temporary scripts/ directory instead of staging rcf_patch.py (change to the staged scripts set for Docker build).

rcf_patch script behavior (repo-only) and tests

Layer / File(s) Summary
Patch implementation that edits replaceConfigFile
scripts/rcf_patch.py
Script locates async function replaceConfigFile(...) via brace-depth scanning, finds the write/await block with a DOTALL-tolerant regex, injects a try { ... } catch(_rcfErr) { if (OPENSHELL_SANDBOX==="1" && _rcfErr.code==="EACCES") log(...) else throw _rcfErr } wrapper, writes the patched file, and logs a success message.
Test coverage and test helper change
test/rcf-patch.test.ts
Expanded tests to assert both successful patch application and "soft-warn/no-op" outcomes when patterns or functions are missing; runPatch(source, env = {}) updated to accept and forward custom environment variables to the spawned python3 process; previous hard-fail test replaced by assertions that source remains unchanged and a specific warning is emitted to stderr.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

Docker, OpenShell

Suggested reviewers

  • ericksoa
  • jyaunches
  • prekshivyas

Poem

🐰 I hopped through Dockerfile and took away a bite,
The patch lives in scripts but won't run at build-time light.
Tests now check both fix and gentle skip,
Staged scripts changed — a tidy little nip.
A carrot for reviewers, concise and bright.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(dockerfile): drop Patch 4 entirely' accurately summarizes the main change: removing the Patch 4 monkey-patch from the Dockerfile. It is concise, specific, and clearly indicates the primary modification.
Linked Issues check ✅ Passed The PR addresses all coding requirements from linked issues #2686 and #3497: removes the failing Patch 4 regex-based monkey-patch from the Dockerfile, deletes the rcf_patch.py script, removes related tests, and stops staging rcf_patch.py in build-context.ts, fully resolving the build failures.
Out of Scope Changes check ✅ Passed All changes are directly in-scope: Dockerfile Patch 4 removal, deletion of rcf_patch.py and its tests, and removal of rcf_patch.py staging from build-context.ts. No unrelated modifications are present beyond the stated objectives.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/2686-rcf-patch-version-gate

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
scripts/rcf_patch.py (1)

1-3: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add the required SPDX header at the top of this Python source file.

This file is missing the repository-mandated SPDX header block.

Suggested patch
+# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
 """
 Patch replaceConfigFile in OpenClaw dist to wrap the
 tryWriteSingleTopLevelIncludeMutation/writeConfigFile block in a try/catch

As per coding guidelines "**/*.{js,ts,tsx,sh,py}: Include SPDX license header at the top of every source file..."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/rcf_patch.py` around lines 1 - 3, Add the repository-mandated SPDX
header block at the very top of this Python file (rcf_patch.py) above the
existing module docstring: insert the standard SPDX-License-Identifier line and
the required copyright/ownership header used by the repo so every .py file
includes the license header; ensure the header is formatted as a Python comment
block (leading #) and placed before any other code or docstrings.
🧹 Nitpick comments (1)
Dockerfile (1)

205-212: Run the Dockerfile E2E matrix for this fail-open behavior change.

Since this modifies patching behavior during image build (warn-and-continue path), I recommend running the four listed jobs on this branch before merge: cloud-e2e, sandbox-survival-e2e, hermes-e2e, and rebuild-openclaw-e2e.

As per coding guidelines "Dockerfile: This file affects the sandbox container image... E2E test recommendation: cloud-e2e, sandbox-survival-e2e, hermes-e2e, rebuild-openclaw-e2e."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Dockerfile` around lines 205 - 212, The Dockerfile change introduces a
fail-open path when patching replaceConfigFile (see rcf_file, OPENCLAW_VERSION,
and python3 /usr/local/lib/nemoclaw/rcf_patch.py) so before merging run the full
Dockerfile E2E matrix to validate behavior: trigger the cloud-e2e,
sandbox-survival-e2e, hermes-e2e, and rebuild-openclaw-e2e jobs against this
branch and confirm the warn-and-continue path (grep for
OPENSHELL_SANDBOX.*EACCES and the echo warnings about Patch 4) produces
acceptable logs and no regressions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@scripts/rcf_patch.py`:
- Around line 1-3: Add the repository-mandated SPDX header block at the very top
of this Python file (rcf_patch.py) above the existing module docstring: insert
the standard SPDX-License-Identifier line and the required copyright/ownership
header used by the repo so every .py file includes the license header; ensure
the header is formatted as a Python comment block (leading #) and placed before
any other code or docstrings.

---

Nitpick comments:
In `@Dockerfile`:
- Around line 205-212: The Dockerfile change introduces a fail-open path when
patching replaceConfigFile (see rcf_file, OPENCLAW_VERSION, and python3
/usr/local/lib/nemoclaw/rcf_patch.py) so before merging run the full Dockerfile
E2E matrix to validate behavior: trigger the cloud-e2e, sandbox-survival-e2e,
hermes-e2e, and rebuild-openclaw-e2e jobs against this branch and confirm the
warn-and-continue path (grep for OPENSHELL_SANDBOX.*EACCES and the echo warnings
about Patch 4) produces acceptable logs and no regressions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e8093075-ecbf-4caa-aae0-887cec8f3bc8

📥 Commits

Reviewing files that changed from the base of the PR and between 9bf3f8d and 87da251.

📒 Files selected for processing (3)
  • Dockerfile
  • scripts/rcf_patch.py
  • test/rcf-patch.test.ts

@laitingsheng laitingsheng marked this pull request as draft May 14, 2026 04:23
@copy-pr-bot

copy-pr-bot Bot commented May 14, 2026

Copy link
Copy Markdown

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@github-actions

github-actions Bot commented May 14, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: cloud-onboard-e2e, skill-agent-e2e, rebuild-openclaw-e2e
Optional E2E: messaging-providers-e2e, network-policy-e2e, sandbox-operations-e2e

Dispatch hint: cloud-onboard-e2e,skill-agent-e2e,rebuild-openclaw-e2e

Auto-dispatched E2E: cloud-onboard-e2e, skill-agent-e2e, rebuild-openclaw-e2e via nightly-e2e.yaml at b1100612149aaf33e2757474930cb40bc65e3f59nightly run

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • cloud-onboard-e2e (~45 min): Validates the public install/onboard path builds the changed sandbox image, starts a real OpenShell sandbox, checks Landlock/read-only enforcement, verifies credential safety, and probes inference.local after the Dockerfile patch removal.
  • skill-agent-e2e (~30 min): Exercises a real OpenClaw assistant flow with skill injection and agent execution. This is the closest existing E2E for catching plugin/extension loading or read-only config regressions from removing the replaceConfigFile EACCES patch.
  • rebuild-openclaw-e2e (~60 min): Covers the sandbox rebuild lifecycle using the current Dockerfile/build context after an older OpenClaw sandbox is created, verifying the changed image patching and optimized build-context staging do not break rebuild or state preservation.

Optional E2E

  • messaging-providers-e2e (~45 min): Useful adjacent coverage because messaging provider tests inspect openclaw.json channel configuration, token placeholders, L7 proxy rewriting, and sandbox startup paths that may interact with OpenClaw plugin/config mutation behavior.
  • network-policy-e2e (~45 min): Optional security-boundary confidence for the same Dockerfile OpenClaw patching area, especially SSRF/proxy behavior, although this PR appears to remove only the replaceConfigFile EACCES patch rather than changing network-policy rules directly.
  • sandbox-operations-e2e (~60 min): Optional lifecycle regression coverage for sandbox list/connect/status/logs/destroy and recovery after changing the sandbox image build inputs.

New E2E recommendations

  • openclaw-plugin-config-write-readonly (high): Existing tests exercise broad assistant/plugin-adjacent flows, but there does not appear to be a targeted E2E that deliberately triggers OpenClaw plugin install/metadata persistence with /sandbox/.openclaw/openclaw.json immutable and asserts the sandbox does not crash or fail on EACCES after removing rcf_patch.py.
    • Suggested test: Add a focused sandbox E2E that starts an OpenClaw sandbox, triggers plugin install or metadata persistence against read-only openclaw.json, verifies no fatal EACCES, and confirms plugins still auto-load from the extensions directory.

Dispatch hint

  • Workflow: nightly-e2e.yaml
  • jobs input: cloud-onboard-e2e,skill-agent-e2e,rebuild-openclaw-e2e

Patch 4 (Dockerfile step 17 + scripts/rcf_patch.py) used a regex over
OpenClaw's compiled JS to wrap `replaceConfigFile` in an EACCES try/catch
suppression for the read-only sandbox. The regex tracked OpenClaw's
internal source shape, which has already broken twice (#2686, #3497) and
will break again every time the upstream refactors the function.

Switch to a runtime trap via `NODE_OPTIONS=--require` instead:

* nemoclaw-blueprint/scripts/openclaw-rcf-shim.js hooks
  `Module.prototype.require`, detects when the OpenClaw `mutate.js`
  module loads, and wraps the exported `replaceConfigFile` with the
  same try/catch behaviour. The wrap is gated on the running OpenClaw
  version being at or below a sentinel
  (`last_openclaw_needing_rcf_shim` in blueprint.yaml, initial value
  `9999.99.99`).  Bump the sentinel DOWN when openclaw/openclaw#72950
  lands; the shim then self-disables for newer versions.
* scripts/nemoclaw-start.sh::install_openclaw_rcf_shim() reads the
  sentinel from blueprint.yaml and the current OpenClaw version from
  `openclaw --version`, exports both as env vars the shim consumes,
  and threads `--require` into `NODE_OPTIONS`.
* The Dockerfile's Step 16 emits a subtle INFO note when the sentinel
  still sits above min_openclaw_version — a maintenance nudge to keep
  the sentinel current as openclaw/openclaw#72950 progresses.
* Drop scripts/rcf_patch.py and test/rcf-patch.test.ts entirely.
* Add test/openclaw-rcf-shim.test.ts covering the six behaviour
  branches (no-op outside sandbox, EACCES swallow + synthesised
  ConfigReplaceResult, non-EACCES re-throw, happy-path pass-through,
  past-sentinel skip, non-matching filename skip).
* schemas/blueprint.schema.json gains the new sentinel key; the
  blueprint validation test covers it.
* src/lib/sandbox/build-context.ts no longer stages rcf_patch.py.

Resolves #2686
Resolves #3497
Refs openclaw/openclaw#72950 (upstream defect; this PR only stops the
crash, the real fix has to land there)

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@laitingsheng laitingsheng changed the title fix(dockerfile): soft-fail Patch 4 outside the known-broken OpenClaw range fix(dockerfile): replace Patch 4 monkey-patch with runtime require-shim May 14, 2026
Patch 4 is a regex-based monkey-patch over OpenClaw's compiled JS that
suppresses EACCES inside replaceConfigFile. Its source-shape coupling
has broken three times in eight days (#2377, #2484, #2876) chasing
upstream refactors; #2686 and #3497 report the latest casualty, where
the regex no longer finds the function in 2026.4.24 and the image build
fails.

Patch 4 is also unnecessary by design:

* In mutable-default mode, openclaw.json is chmod 660 sandbox:sandbox
  and the gateway UID is in the sandbox group (#2681), so plugin
  install writes through without ever hitting EACCES.
* In shields-up mode, the entire config tree (file + parent dir + the
  plugin/extensions state dirs in HIGH_RISK_STATE_DIRS) is locked to
  root:root by design — refusing runtime mutations is the whole point
  of shields-up. Suppressing the EACCES masked that refusal and made
  the install appear to succeed silently while only the auto-discovery
  half landed.

The expected flow is configure-in-mutable-mode → shields up → run.
Plugin install attempted while shielded should fail cleanly, which is
what happens without Patch 4.

Reverts the rcf-shim replacement attempt; the require-hook approach
does not catch OpenClaw's ESM named imports anyway (capture-at-import-
time semantics).

Resolves #2686
Resolves #3497

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@laitingsheng laitingsheng changed the title fix(dockerfile): replace Patch 4 monkey-patch with runtime require-shim fix(dockerfile): drop Patch 4 entirely May 14, 2026
@laitingsheng laitingsheng marked this pull request as ready for review May 14, 2026 12:23
@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 25859905690
Target ref: f1d94abcca7c851197362eeecd4d59daca8c71a4
Workflow ref: main
Requested jobs: rebuild-openclaw-e2e,cloud-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job Result
cloud-e2e ✅ success
rebuild-openclaw-e2e ✅ success

@github-actions

Copy link
Copy Markdown
Contributor

Brev E2E (full): FAILED on branch fix/2686-rcf-patch-version-gateSee logs

@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 25895823458
Target ref: b1100612149aaf33e2757474930cb40bc65e3f59
Workflow ref: main
Requested jobs: cloud-onboard-e2e,skill-agent-e2e,rebuild-openclaw-e2e
Summary: 3 passed, 0 failed, 0 skipped

Job Result
cloud-onboard-e2e ✅ success
rebuild-openclaw-e2e ✅ success
skill-agent-e2e ✅ success

@cv cv added v0.0.44 and removed v0.0.43 labels May 15, 2026
@cv cv merged commit 0ca95ce into main May 15, 2026
25 checks passed
cv pushed a commit that referenced this pull request May 27, 2026
## Summary
- clarify that `scripts/rcf_patch.py` is intentionally absent from
current blueprints
- replace the stale Patch-4 fail-closed QA expectation with the current
mutable-default / shields-up validation path
- add regression coverage so troubleshooting keeps pointing QA at the
supported checks

## Root Cause
Patch 4 was intentionally deleted by #3500 after NemoClaw moved away
from the build-time `replaceConfigFile` monkey-patch. The issue
reproduces because the old QA test plan still expects
`scripts/rcf_patch.py` to exist even though the supported behavior is
now the mutable-default config model plus shields-up lockdown.

Closes #3944

## Verification
- `test ! -f scripts/rcf_patch.py && test ! -f
nemoclaw-blueprint/scripts/rcf_patch.py`
- `./node_modules/.bin/vitest run test/rcf-patch-removal.test.ts
test/repro-2681-group-writable.test.ts
test/sandbox-build-context.test.ts`
- `npm run build:cli`
- `npm run docs:strict`
- `bash -n test/e2e/test-shields-config.sh`

Signed-off-by: Chengjie Wang <chengjiew@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Documentation**
* Enhanced troubleshooting guide clarifying current sandbox
configuration lifecycle, deprecated test behavior, and updated
validation instructions for shields/config checks.

* **Tests**
* Added automated test to verify removal of the retired patch script
from blueprints and repo locations to prevent regressions in setup and
validation.

<!-- review_stack_entry_start -->

[![Review Change
Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/4202?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack)

<!-- review_stack_entry_end -->
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Chengjie Wang <chengjiew@nvidia.com>
@wscurran wscurran added bug-fix PR fixes a bug or regression and removed fix labels Jun 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug-fix PR fixes a bug or regression

Projects

None yet

3 participants