fix(sandbox): restore mutable OpenClaw config perms after raw in-sandbox doctor --fix (#4538) by yimoj · Pull Request #5017 · NVIDIA/NemoClaw

yimoj · 2026-06-09T04:26:23Z

Summary

A raw connect-shell openclaw doctor --fix (run directly inside a mutable sandbox, outside any NemoClaw wrapper) enforces OpenClaw's single-user 700/600 layout, tightening /sandbox/.openclaw and openclaw.json back from the NemoClaw mutable contract (2770 setgid + group-writable dir, 660 config). That blocks the gateway UID — a member of the sandbox group — from persisting config. This fixes the missed path at the always-on in-sandbox openclaw() guard so the contract is restored after the raw command, even when doctor exits nonzero.

Related Issue

Fixes #4538

Changes

Always-on in-sandbox restore. The emitted openclaw() guard function (written into /tmp/nemoclaw-proxy-env.sh, sourced by every interactive/login sandbox shell) now calls a new _nemoclaw_restore_mutable_config_perms helper after every routed openclaw command, re-asserting 2770 dir / 660 config (+ .config-hash). PR fix(sandbox): preserve mutable OpenClaw config perms after doctor --fix (#4538) #4610 only repaired this via host-side wrapper commands (nemoclaw doctor --fix, rebuild, startup), none of which run after the raw in-sandbox path the reporter used.
Safe by construction. The helper skips when shields are up (config dir root-owned, so the lock is never weakened), is a no-op fast path when the contract already holds, and strips group-write from the read-only recovery baseline that the recursive chmod could otherwise loosen in rootless mode.
Survives nonzero exit. Errexit is dropped around the guarded call (mirroring the existing devices approve branch) so a nonzero doctor --fix exit — the exact reported failure (EACCES on a root-locked shell init file) — cannot abort the guard before the restore runs.
Tests. Regression tests for the helper (restore, shields-up no-op, errexit survival, baseline protection) and a real-image reporter-workflow E2E (test/repro-4538-raw-doctor-perms.test.ts) that sources the actual emitted guard, runs raw openclaw doctor --fix in a NemoClaw sandbox image, and asserts the contract is restored. The E2E is gated behind NEMOCLAW_RUN_DOCTOR_PERMS_DOCKER_E2E=1 so it runs where docker + the image are available.

Type of Change

Code change (feature, bug fix, or refactor)

Verification

npm test passes locally (the only failures were 14 pre-existing, load-induced 5s-timeout flakes in unrelated CLI --help oclif-dispatch tests, e.g. test/cli/snapshot-shields.test.ts; they do not touch this change). All CI checks are green: ShellCheck, static-checks, 5 cli-test-shards, plugin-tests, build-typecheck, CodeQL, macos/wsl-e2e, dco-check, codebase-growth-guardrails.
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed

Reporter-workflow E2E evidence

The exact reported workflow — connect to a mutable sandbox, source the always-on guard (as /etc/bash.bashrc does on connect), run the raw in-sandbox openclaw doctor --fix — was reproduced end-to-end against a real NemoClaw sandbox image (nemoclaw-production:latest), as the sandbox user.

Test: test/repro-4538-raw-doctor-perms.test.ts
Env / command: NEMOCLAW_RUN_DOCTOR_PERMS_DOCKER_E2E=1 npx vitest run test/repro-4538-raw-doctor-perms.test.ts

=== BEFORE (baseline mutable contract) ===
/sandbox/.openclaw           2770 sandbox:sandbox
/sandbox/.openclaw/openclaw.json  660 sandbox:sandbox
=== reporter cmd: raw `openclaw doctor --fix` ===
(without the fix this leaves: dir 700 sandbox:sandbox, openclaw.json 600 sandbox:sandbox)
=== AFTER (with the always-on openclaw() guard) ===
/sandbox/.openclaw           2770 sandbox:sandbox
/sandbox/.openclaw/openclaw.json  660 sandbox:sandbox
RESULT dir=2770 file=660
PASS: mutable contract (2770/660) restored after raw openclaw doctor --fix

The before-fix regression (dir→700, file→600) was also reproduced directly in the same image without the guard sourced, confirming the gap. Additional non-docker regression tests in the same file cover the restore helper, shields-up no-op, errexit survival on a nonzero doctor --fix, and recovery-baseline protection.

Signed-off-by: Yimo Jiang yimoj@nvidia.com

coderabbitai · 2026-06-09T04:26:35Z

📝 Walkthrough

Walkthrough

Adds _nemoclaw_restore_mutable_config_perms to the runtime guard and updates the guard's openclaw() wrapper to invoke the restore helper after every in-sandbox openclaw run, preserving exit codes and prior errexit state; includes unit tests and a gated Docker E2E validating permission restoration and baseline invariants.

Changes

Mutable config permission restoration after openclaw doctor --fix

Layer / File(s)	Summary
Permission restoration helper and guard wrapper `scripts/nemoclaw-start.sh`	Adds `_nemoclaw_restore_mutable_config_perms` (skips when config dir is root-owned; otherwise normalizes directory setgid/group rwx and file modes and strips group-write from `openclaw.json.nemoclaw-baseline`) and updates `openclaw()` to capture exit status, call the helper unconditionally, restore prior `errexit` state, and return the original exit code.
Unit helpers and tests `test/repro-4538-raw-doctor-perms.test.ts`	Adds utilities to extract the guard block and compute POSIX modes, creates a simulated tightened `700/600` tree, and unit-tests: restoring permissions after tightening, noop when shields up (root-owned), preserving nonzero `openclaw` exit codes while restoring permissions, preventing an errexit gap, and ensuring the baseline remains non-group-writable.
Gated Docker-based E2E validation `test/repro-4538-raw-doctor-perms.test.ts`	Optional Docker E2E (enabled via `NEMOCLAW_RUN_DOCTOR_PERMS_DOCKER_E2E=1`) that injects the guard into a sandbox image, runs `openclaw doctor --fix`, and asserts `/sandbox/.openclaw` and `/sandbox/.openclaw/openclaw.json` end at `2770` and `660`.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

NVIDIA/NemoClaw#4715: Related changes to scripts/nemoclaw-start.sh handling of .config-hash and mutable config startup behavior.

Suggested labels

fix, area: sandbox, integration: openclaw, bug-fix

Suggested reviewers

cv
cjagwani

🐰 A tiny rabbit hopped into the shell,
found doctor fixes that tightened the well.
It planted a guard and adjusted the perms,
so sandbox and gateway can both take turns.
Now configs are safe — hop, patch, and quell.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title clearly summarizes the main change: fixing mutable OpenClaw config permission restoration after raw in-sandbox doctor --fix, directly addressing issue `#4538`.
Linked Issues check	✅ Passed	The PR fully addresses the core coding requirements from issue `#4538`: it implements permission restoration after openclaw doctor --fix, preserves group-writable contract in mutable mode, and includes comprehensive unit/E2E tests.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to the stated objective: shell script modifications for permission restoration and corresponding test coverage; no unrelated alterations present.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…box doctor --fix (NVIDIA#4538) A raw connect-shell `openclaw doctor --fix` enforces OpenClaw's single-user 700/600 layout, tightening /sandbox/.openclaw and openclaw.json back from the NemoClaw mutable contract (2770 setgid + group-writable dir, 660 config). That blocks the gateway UID — a member of the sandbox group — from persisting config. PR NVIDIA#4610 only repaired this via host-side NemoClaw wrapper commands (`nemoclaw doctor --fix`, rebuild, startup); none run after the raw in-sandbox path the QA reporter used, so the gateway stayed broken until the next restart. Fix the missed path at the always-on in-sandbox guard: the emitted openclaw() shell function (sourced by every interactive/login sandbox shell) now re-asserts the contract after every routed openclaw command, regardless of exit code. - add _nemoclaw_restore_mutable_config_perms to the guard env; skips when shields are up (config dir root-owned), no-op via fast path when the contract already holds, and strips group-write from the read-only recovery baseline the recursive chmod could otherwise loosen - drop errexit around the guarded call (mirroring the devices-approve branch) so a nonzero `doctor --fix` exit cannot abort the guard before the restore runs — the nonzero-exit case is the exact reported failure - regression tests plus a gated real-image reporter-workflow E2E that drives raw `openclaw doctor --fix` and asserts 2770/660 is restored Signed-off-by: Yimo Jiang <yimoj@nvidia.com>

## Summary - Add v0.0.62 release notes from Discussion #5100 and link release highlights to the relevant docs pages. - Document the release's GPU sandbox recreation, sandbox-side local inference verification, and Hermes dashboard port guard in the command and inference references. - Refresh generated NemoClaw user skills for the release-prep docs set. ## Source Summary - #4956 -> `docs/reference/commands.mdx`: Document CDI-first Docker GPU recreation behavior for Linux Docker-driver sandboxes. - #5024 -> `docs/inference/use-local-inference.mdx`: Document sandbox-runtime verification of the `inference.local` local inference route. - #5018 -> `docs/reference/commands.mdx`: Document Jetson/Tegra device-node group propagation for sandbox CUDA initialization. - #5012, #4763, #4706, #5030, #5015 -> `docs/about/release-notes.mdx`: Summarize onboarding and recovery reliability fixes, including the reserved Hermes API port guard. - #5017 and #5043 -> `docs/about/release-notes.mdx`, `docs/reference/commands.mdx`: Summarize mutable OpenClaw config recovery and host-side `agents list` coverage. - #5010 and #5016 -> `docs/about/release-notes.mdx`: Summarize Hermes upstream metadata visibility and WhatsApp QR rendering reliability. - #5045 and prior source docs in the v0.0.62 range -> `.agents/skills/`: Refresh generated user-skill references from the current docs source. ## Skipped - #5019 -> skipped for new prose because it touched `openclaw-sandbox-permissive.yaml`, which matches `docs/.docs-skip`. Existing source docs remain the source for generated skill synchronization. ## Verification - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx` - `npm run docs` (passes; Fern reports 0 errors and 1 hidden warning) - Pre-commit hooks passed during commit, including docs-to-skills verification, markdown lint, gitleaks, and skills YAML tests.  ## Summary by CodeRabbit * **New Features** * Added `nemoclaw <name> agents list` command. * v0.0.62 release notes added summarizing onboarding and recovery improvements. * **Bug Fixes** * Improved GPU sandbox onboarding reliability (NVIDIA CDI path, Jetson/Tegra device handling). * Better local inference verification and recovery for Linux Docker-driver GPU sandboxes. * Quieter/earlier handling of onboarding drift and port collisions. * **Documentation** * Expanded GPU passthrough, inference verification, writable paths (`/dev/pts`), port 8642 restriction, and command examples.  --------- Co-authored-by: Prekshi Vyas <34834085+prekshivyas@users.noreply.github.com>

…VIDIA#4538, NVIDIA#4859) OpenClaw's `doctor` state-integrity check flags the NemoClaw mutable-config contract — /sandbox/.openclaw (2770 setgid + group-writable) and openclaw.json (660) — as "permissions are too open" and, with --fix, tightens them to single-user 700/600. That durably breaks the group-shared contract the gateway UID (a member of the sandbox group) needs to persist control-UI config writes, and prints scary "too open"/"Tightened permissions" output that misrepresents the intended contract. PR NVIDIA#5017's always-on openclaw() shell guard only restores 2770/660 *after* a raw in-sandbox `openclaw doctor --fix` exits — so the warning/tighten still runs, and any path that bypasses the connect-shell guard (stale image, raw `command openclaw`, the macOS reporter flow in NVIDIA#4859) leaves the tree durably locked at 700/600. Fix the root cause at the OpenClaw source via the established Dockerfile-patch mechanism: a new patch-openclaw-doctor-mutable-perms.mts narrows doctor's over-open test from group+other bits (`stat.mode & 63`) to OTHER (world) bits only (`stat.mode & 7`) when running inside an OpenShell sandbox. Group bits — the gateway's intended shared access — are then tolerated, so the 2770/660 contract is neither warned about nor tightened, while genuinely world-accessible modes (e.g. 2777/664) are still caught and repaired. Out-of-sandbox behavior is byte-identical to upstream. The gate keys on OPENSHELL_SANDBOX being set (not `=== "1"`): OpenShell injects the sandbox name there, verified on 0.0.44. Because the fix lives in the baked image, it applies regardless of host OS and whether the connect-shell guard is loaded, resolving both the All-Platforms NVIDIA#4538 and the macOS NVIDIA#4859 reports. The NVIDIA#5017 guard is kept as defense-in-depth for sandboxes built before this patch. E2E (real onboarded sandbox, patched image, OpenClaw 2026.5.27): - connect-shell `openclaw doctor --fix`: exit 0, 0 scary messages, 0 bashrc EACCES, perms stay 2770/660. - raw `command openclaw doctor --fix` (guard bypassed, NVIDIA#4859 path): exit 0, 0 scary messages, perms stay 2770/660. - unpatched baseline: 4 scary messages, perms regress to 700/600. The patch classifies the compiled doctor state-integrity module by content signature, requires exactly the two reviewed permission gates, fails closed on shape drift, and is idempotent and a no-op when absent. Signed-off-by: Yimo Jiang <yimoj@nvidia.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

yimoj force-pushed the fix/4538-raw-openclaw-doctor-perms branch from bbe4f63 to baf8fe7 Compare June 9, 2026 04:28

wscurran added the bug-fix PR fixes a bug or regression label Jun 9, 2026

yimoj added the v0.0.62 Release target label Jun 9, 2026

cv approved these changes Jun 9, 2026

View reviewed changes

cv merged commit bc4f726 into NVIDIA:main Jun 9, 2026
36 checks passed

miyoungc mentioned this pull request Jun 10, 2026

docs: refresh v0.0.62 release docs #5157

Merged

yimoj mentioned this pull request Jun 11, 2026

fix(sandbox): stop openclaw doctor tightening mutable config perms (#4538, #4859) #5208

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(sandbox): restore mutable OpenClaw config perms after raw in-sandbox doctor --fix (#4538)#5017

fix(sandbox): restore mutable OpenClaw config perms after raw in-sandbox doctor --fix (#4538)#5017
cv merged 1 commit into
NVIDIA:mainfrom
yimoj:fix/4538-raw-openclaw-doctor-perms

yimoj commented Jun 9, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Jun 9, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

yimoj commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Type of Change

Verification

Reporter-workflow E2E evidence

Uh oh!

coderabbitai Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

yimoj commented Jun 9, 2026 •

edited

Loading

coderabbitai Bot commented Jun 9, 2026 •

edited

Loading