fix(sandbox): drop CAP_FOWNER/SETUID/SETGID via setpriv (#3280) by Dongni-Yang · Pull Request #3329 · NVIDIA/NemoClaw

Dongni-Yang · 2026-05-11T05:53:33Z

Summary

Follow-up to #3328, which dropped 5 of the 8 caps named in issue #3280 but left CAP_FOWNER, CAP_SETUID, and CAP_SETGID in place. The entrypoint's gosu-based privilege separation prevented dropping them from the bounding set: gosu needs CAP_SETUID in its permitted set to make its setuid() syscall, but the bounding set can only be modified with CAP_SETPCAP, which only root holds. There was no point in the entrypoint where the drop could happen without breaking either gosu or the privilege transition.

This PR resolves the chicken-and-egg problem by replacing gosu with setpriv (util-linux, already present in the image). setpriv does the reuid and the bounding-set drop atomically inside a single process: setuid first (still holds CAP_SETUID), then strip the bounding set (still root long enough to hold CAP_SETPCAP), then exec. There is no exec between the setuid and the bounding-set drop.

After this PR, all 8 caps named in issue #3280 are absent from the sandbox-user process's CapBnd (verified live; see Test plan).

Stacking note

Stacked on PR #3328. The full PR diff shows 4 commits:

| Commit | From PR |
| --- | --- |
| `fix(sandbox): tighten bounding-set caps and surface residuals` | #3328 |
| `test(sandbox): inventory dangerous-cap set in bounding-set assertion` | #3328 |
| `fix(sandbox): replace gosu with setpriv to drop all bounding-set caps` | this PR (`05954bf49`) |
| `test(sandbox): require all 8 issue-3280 caps absent after step-down` | this PR (`a5ea1712c`) |

Reviewers should focus on the bottom two commits. Merge after #3328 lands; I'll rebase if #3328 changes during review.

Changes

`scripts/lib/sandbox-init.sh`

Add init_step_down_prefixes() and two file-scope arrays:

- `STEP_DOWN_PREFIX_SANDBOX`: defaults to `(gosu sandbox)`; upgraded by `init_step_down_prefixes()` to `(setpriv --reuid=sandbox --regid=sandbox --init-groups --bounding-set=-setuid,-setgid,-fowner,-chown,-kill --)` when setpriv + CAP_SETPCAP are available.
- `STEP_DOWN_PREFIX_GATEWAY`: same shape, gateway user.

If setpriv is missing or CAP_SETPCAP is unavailable, the arrays stay at the gosu fallback (matching the previous behavior) and a [SECURITY WARNING] is logged so the residual cap retention surfaces in the entrypoint log (matches report_residual_capabilities() from #3328).
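The upgrade-or-fall-back flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in `scripts/lib/sandbox-init.sh`: the helper `have_effective_setpcap` and the `_sketch` function name are hypothetical, and testing bit 8 of `CapEff` in `/proc/self/status` is only an assumed way of detecting CAP_SETPCAP.

```shell
#!/usr/bin/env bash
# Safe default: if the init function never runs, we still step down via gosu.
STEP_DOWN_PREFIX_SANDBOX=(gosu sandbox)

# Hypothetical helper: true when CAP_SETPCAP (bit 8) is in the effective set.
# Returns false when /proc/self/status is missing (e.g. on macOS).
have_effective_setpcap() {
  local capeff
  capeff=$(grep '^CapEff:' /proc/self/status 2>/dev/null | awk '{print $2}')
  [ -n "$capeff" ] && (( (16#$capeff >> 8) & 1 ))
}

# Hypothetical sketch: only ever upgrades the default, never empties it.
init_step_down_prefixes_sketch() {
  if command -v setpriv >/dev/null 2>&1 && have_effective_setpcap; then
    STEP_DOWN_PREFIX_SANDBOX=(setpriv --reuid=sandbox --regid=sandbox --init-groups
      --bounding-set=-setuid,-setgid,-fowner,-chown,-kill --)
  else
    echo "[SECURITY WARNING] setpriv or CAP_SETPCAP unavailable; keeping gosu step-down" >&2
  fi
}

init_step_down_prefixes_sketch
printf 'step-down tool: %s\n' "${STEP_DOWN_PREFIX_SANDBOX[0]}"
```

Whichever branch runs, the array always names a step-down tool, so callers can expand it unconditionally.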

Implementation notes:

- File-scope default is `(gosu …)`, not `()`. This hardens against a theoretical privesc regression: if `init_step_down_prefixes()` were ever skipped by a future refactor, an empty array would expand to nothing, and `exec "${STEP_DOWN_PREFIX_SANDBOX[@]}" "${NEMOCLAW_CMD[@]}"` would run the agent as root. The gosu default makes the failure mode safe.
- `--init-groups` (not `--clear-groups`): gateway is a member of the sandbox group via `usermod -aG sandbox gateway` in Dockerfile.base:99, which is required to write the chmod 660 /sandbox/.openclaw/openclaw.json (setgid'd config dir, per #2681). `--clear-groups` would strip that membership and break mutateConfigFile with EACCES. `--init-groups` matches gosu's setgroups + initgroups behaviour. (Addresses CodeRabbit comment.)
- Plain array assignment (no `declare -ga`): bash 3.2 on macOS rejects `declare -g`, which would break macOS CI when any test sources sandbox-init.sh. File-scope `ARR=()` is global by default in bash 3.2+; the function-internal reassignment without `local` targets the same global. (Addresses CodeRabbit comment.)
- setpriv uses unprefixed cap names (per `setpriv --list`), unlike capsh, which uses `cap_*`. The arrays follow the setpriv convention.
- Per-assignment `# shellcheck disable=SC2034`: the prefix arrays are consumed cross-file (by scripts/nemoclaw-start.sh and agents/hermes/start.sh), which shellcheck cannot follow from sandbox-init.sh alone.
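The empty-array hazard from the first note is easy to reproduce in plain bash. The sketch below is illustrative only: `show_argv` is a hypothetical stand-in for `exec`, and `AGENT_CMD`, `EMPTY_PREFIX`, `SAFE_PREFIX`, and `upgrade_prefix` are made-up names, so it just prints the argument vector that would be executed.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for exec: print each argv element in brackets.
show_argv() {
  local out="" arg
  for arg in "$@"; do out+="[$arg]"; done
  echo "$out"
}

AGENT_CMD=(agent --flag "value with spaces") # illustrative command only

EMPTY_PREFIX=()            # unsafe default: expands to zero words
SAFE_PREFIX=(gosu sandbox) # safe default: always names a step-down tool

# Plain reassignment inside a function (no `local`, no `declare -g`)
# updates the file-scope array, which also works on bash 3.2.
upgrade_prefix() {
  SAFE_PREFIX=(setpriv --reuid=sandbox --regid=sandbox --init-groups --)
}

show_argv "${EMPTY_PREFIX[@]}" "${AGENT_CMD[@]}" # no step-down tool in argv
show_argv "${SAFE_PREFIX[@]}" "${AGENT_CMD[@]}"  # gosu sandbox comes first
upgrade_prefix
show_argv "${SAFE_PREFIX[@]}" "${AGENT_CMD[@]}"  # setpriv ... -- comes first
```

With the empty prefix the command is the first word of the argument vector, which under `exec` in a root entrypoint means it runs as root. Quoted `[@]` expansion also keeps `"value with spaces"` as a single argument.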

`scripts/nemoclaw-start.sh` (4 sites) and `agents/hermes/start.sh` (3 sites)

Replace all gosu <user> invocations with "${STEP_DOWN_PREFIX_<USER>[@]}":

| File | Line | Role |
| --- | --- | --- |
| nemoclaw-start.sh | 795 | auto-pair (sandbox) |
| nemoclaw-start.sh | 1610 | write_auth_profile + harden_auth_profiles (sandbox) |
| nemoclaw-start.sh | 1614 | final exec to NEMOCLAW_CMD (sandbox) |
| nemoclaw-start.sh | 1720 | OpenClaw gateway (gateway) |
| hermes/start.sh | 294 | Discord facade (gateway) |
| hermes/start.sh | 586 | final exec to NEMOCLAW_CMD (sandbox) |
| hermes/start.sh | 607 | Hermes gateway (gateway) |

Non-root fallback path in nemoclaw-start.sh (lines 1488+) and the no-new-privileges history comments at 138-139 / 1490-1493 are unchanged — that path doesn't use a privilege-step-down tool at all.

`test/e2e-gateway-isolation.sh`

Flip CAP_FOWNER / CAP_SETUID / CAP_SETGID in test 14 from allowed to must-drop. Rewrite the test to exercise the full two-stage drop end-to-end: source sandbox-init.sh, run drop_capabilities() (stage 1: capsh), then exec STEP_DOWN_PREFIX_SANDBOX (stage 2: setpriv), then capture CapBnd.

`test/sandbox-init.test.ts`

Two new unit tests for init_step_down_prefixes():

- Falls back to gosu when setpriv/capsh are unavailable
- Uses setpriv with the issue-3280 bounding-set drop when available

Update the existing start_discord_facade snapshot test to expect the new STEP_DOWN_PREFIX_GATEWAY invocation instead of the legacy gosu gateway sh -c.

`test/nemoclaw-start.test.ts`

Initialise STEP_DOWN_PREFIX_SANDBOX=(gosu sandbox) and STEP_DOWN_PREFIX_GATEWAY=(gosu gateway) in the test scaffolding for both runLaunchBlock() and runPreGatewaySetup(). The extracted launch / setup blocks reference these arrays, and the test scaffolding doesn't source sandbox-init.sh, so without an explicit initialisation set -u fails on the unbound array and the stubbed gosu() never receives the call (this caused the user=gateway CI failure on the prior push).

Test plan

Forward case (full production image, post-build)

Built nemoclaw-3329-test directly from this branch's Dockerfile (63 steps, no overlay). Ran the full two-stage drop end-to-end with --cap-add CAP_SYS_ADMIN --cap-add CAP_SYS_PTRACE (worst-case permissive runtime):

Stage 1 (root, post-capsh):   CapBnd=00000000000001e9
Stage 2 (sandbox, post-setpriv): uid=998(sandbox) gid=998(sandbox) groups=sandbox
                                 CapBnd=0000000000000100  → cap_setpcap only
Issue #3280 caps absent: cap_sys_admin / cap_sys_ptrace / cap_net_raw /
                         cap_net_bind_service / cap_dac_override /
                         cap_fowner / cap_setuid / cap_setgid  ✅ (8/8)
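To read the CapBnd values above by hand, the hex mask can be decoded bit by bit. The following is a small sketch, not part of the PR: `cap_name_for_bit` and `decode_capbnd` are hypothetical helper names, and the table only covers the capability bits discussed here (numbering per `<linux/capability.h>`).

```shell
#!/usr/bin/env bash
# Map a capability bit number to its name; other bits are reported generically.
cap_name_for_bit() {
  case "$1" in
    0) echo cap_chown ;;
    1) echo cap_dac_override ;;
    3) echo cap_fowner ;;
    5) echo cap_kill ;;
    6) echo cap_setgid ;;
    7) echo cap_setuid ;;
    8) echo cap_setpcap ;;
    10) echo cap_net_bind_service ;;
    13) echo cap_net_raw ;;
    19) echo cap_sys_ptrace ;;
    21) echo cap_sys_admin ;;
    *) echo "bit_$1" ;;
  esac
}

# Decode a CapBnd hex string (as printed in /proc/<pid>/status) into names.
decode_capbnd() {
  local mask=$((16#$1)) bit out=""
  for ((bit = 0; bit < 41; bit++)); do
    if (( (mask >> bit) & 1 )); then
      out+="${out:+ }$(cap_name_for_bit "$bit")"
    fi
  done
  echo "$out"
}

decode_capbnd 00000000000001e9 # stage 1 mask from the output above
decode_capbnd 0000000000000100 # stage 2 mask from the output above
```

Decoded this way, the stage 1 mask lists the six load-bearing caps and the stage 2 mask lists `cap_setpcap` alone.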

Gateway path (full production image, post-build)

Same image, but invoking STEP_DOWN_PREFIX_GATEWAY instead:

uid=999(gateway) gid=999(gateway) groups=gateway sandbox   ← --init-groups OK
CapBnd=0000000000000100  → cap_setpcap only
/sandbox/.openclaw/openclaw.json (mode 660, sandbox:sandbox) writable by gateway ✅

This is the exact case CodeRabbit flagged: gateway must retain sandbox group membership to write the chmod 660 setgid'd config (per #2681). Confirmed.

Negative case (live container)

Rebuilt with -setuid removed from the setpriv --bounding-set arg. CapBnd=0x180 (bit 7 set = CAP_SETUID). Test correctly fails with "CAP_SETUID still present in sandbox-user CapBnd (issue #3280)" — matches the regression signature this PR is designed to catch.
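The by-name failure described here can be illustrated with a short sketch. It is not the code in `test/e2e-gateway-isolation.sh`: `MUST_DROP_CAPS` and `check_must_drop` are hypothetical names, and the bit numbers are taken from `<linux/capability.h>`.

```shell
#!/usr/bin/env bash
# name:bit pairs for the eight caps named in issue #3280.
MUST_DROP_CAPS="CAP_DAC_OVERRIDE:1 CAP_FOWNER:3 CAP_SETGID:6 CAP_SETUID:7 CAP_NET_BIND_SERVICE:10 CAP_NET_RAW:13 CAP_SYS_PTRACE:19 CAP_SYS_ADMIN:21"

# Return non-zero and name every must-drop cap still set in the given mask.
check_must_drop() {
  local mask=$((16#$1)) entry name bit failed=0
  for entry in $MUST_DROP_CAPS; do
    name=${entry%%:*}
    bit=${entry##*:}
    if (( (mask >> bit) & 1 )); then
      echo "$name still present in sandbox-user CapBnd (issue #3280)"
      failed=1
    fi
  done
  return "$failed"
}

check_must_drop 0000000000000100 && echo "0x100: all eight caps absent"
check_must_drop 0000000000000180 || echo "0x180: check fails as expected"
```

Because each cap is tested individually, an incomplete drop list is reported by the name of the missing cap instead of as an opaque mask mismatch.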

Full regression baseline

npm test on this branch vs upstream/main:

| | Test files failed | Tests failed | Tests passed |
| --- | --- | --- | --- |
| `upstream/main` (baseline) | 22 | 67 | 3418 |
| this branch | 22 | 67 | 3420 |
| Δ | 0 | 0 | +2 |

Net: 2 new passing tests (the new init_step_down_prefixes cases), zero new failures. All 67 baseline failures pre-date this PR (stale dist/, unrelated TypeScript files).

Targeted

- `npx vitest run test/{sandbox-init,nemoclaw-start,seccomp-guard,service-env}.test.ts` → 132/132 pass.
- `bash -n` clean on all 4 touched shell files.
- `shfmt -d -i 2 -ci -bn` clean.

Security review

| CWE | Status | Notes |
| --- | --- | --- |
| CWE-269 Improper Privilege Management | ✅ no issue | Saved-UID=0 inert: CAP_SETUID gone from bounding set, can't be regained. |
| CWE-273 Improper Check for Dropped Privileges | ⚠️ no regression | Trusts setpriv. `exec` semantics → fail-closed on setpriv failure. E2E test 14 verifies in CI. |
| CWE-274 Improper Handling of Insufficient Privileges | ⚠️ documented trade-off | SETPCAP-missing fallback is fail-open-for-availability + fail-loud-for-posture (`[SECURITY WARNING]` to log). |
| CWE-367 TOCTOU | ✅ no issue | Check and use happen in same root process; CAP_SETPCAP preserved between them. |
| CWE-426 Untrusted Search Path | ✅ no issue | PATH locked at entrypoint top; init runs as root pre-stepdown. |
| CWE-732 Incorrect Permission Assignment | ✅ no issue | `--init-groups` preserves gateway's sandbox-group membership (chmod 660 config write still works). |
| CWE-77/78 Command Injection | ✅ no issue | All setpriv argv literals; array expansion does not word-split. |
| CWE-200/209/532 Information Exposure | ✅ no issue | Warnings contain only public cap names; log is root:600 (sandbox user can't read). |
| CWE-693 Protection Mechanism Failure | ✅ no issue | setpriv 2.38.1, no known CVEs affecting bounding-set ops. |

Net assessment: no new CWEs introduced. Sandbox-user CapBnd: 6 entries → 1 entry. Attack surface for setuid-root-binary cap regain: reduced to empty.

Risks and notes for review

- setpriv vs gosu setuid semantics. Both use the setuid syscall. setpriv --reuid sets ruid+euid but not saved UID (gosu uses setresuid which sets all three). Saved-UID=0 is inert here because using it requires CAP_SETUID in permitted, which is empty after the bounding-set drop on exec.
- No-new-privs interaction. setpriv performs the setuid syscall as root, which is unrestricted regardless of no_new_privs. Different failure mode from gosu (documented at nemoclaw-start.sh:138-139 and :1490-1493). Worth verifying on Spark/arm64 in CI.
- Defense-in-depth, not user-facing behaviour change. The agent shell continues to run as the sandbox user with the same supplementary groups; the only observable difference is `cat /proc/self/status` showing an empty CapBnd (apart from CAP_SETPCAP itself, which is harmless in an unprivileged process).
- Fallback warning is a log line, not an exit. If a runtime lacks setpriv or CAP_SETPCAP, the sandbox still boots (under the legacy gosu path) but emits `[SECURITY WARNING]` so the residual surfaces in docker logs.

Review feedback addressed

- CodeRabbit: Bash 3.2 incompat (`declare -ga`) → replaced with plain array assignment.
- CodeRabbit: `--clear-groups` removes gateway from sandbox group → switched to `--init-groups`; verified live.
- Self-review: unset-array privesc regression risk → file-scope default initialised to `(gosu …)` instead of `()`; `init_step_down_prefixes()` only upgrades.
- CI: shellcheck SC2034 → per-assignment `# shellcheck disable=SC2034` with cross-file-consumption note.
- CI: test/nemoclaw-start.test.ts:1201 user=gateway → scaffolding initialises `STEP_DOWN_PREFIX_*` in fallback form so the stubbed gosu still receives the call.

Closes #3280.

Signed-off-by: Dongni Yang <dongniy@nvidia.com>

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

Security Improvements
- Enhanced sandbox isolation through improved removal of dangerous capabilities from restricted environments
- Updated privilege separation mechanism with better fallback handling and more flexible configuration
- Improved capability-dropping logic for comprehensive restriction of high-risk permissions
Tests
- Updated integration tests to verify capability restrictions work as expected

…#3280)

Append cap_sys_admin and cap_sys_ptrace to the capsh --drop list so they no longer remain in the bounding set after the entrypoint re-execs. The historical drop list already covered cap_net_raw / cap_dac_override / cap_net_bind_service, but T6002104 still observed them present. The root cause is the CAP_SETPCAP-missing fallback silently skipping the entire drop and inheriting the runtime defaults.

Replace the misleading "runtime already restricts capabilities" message on that fallback path with report_residual_capabilities(), which reads CapBnd from /proc/self/status and names which of the 5 must-drop caps remain. Uses bash 64-bit arithmetic so it does not depend on gawk strtonum.

Also enumerate the load-bearing kept caps (cap_chown/cap_fowner for post-drop chown, cap_setuid/cap_setgid for gosu, cap_kill for sandbox→gateway signaling) inline so a future contributor can audit why each one stays.

Signed-off-by: Dongni Yang <dongniy@nvidia.com>

…VIDIA#3280)

Rewrite e2e-gateway-isolation.sh test 14 to inventory every cap named in issue NVIDIA#3280 (CAP_SYS_ADMIN, CAP_SYS_PTRACE, CAP_NET_RAW, CAP_NET_BIND_SERVICE, CAP_DAC_OVERRIDE, CAP_FOWNER, CAP_SETUID, CAP_SETGID) against CapBnd from /proc/self/status. Each is classified as must-drop or allowed-load-bearing; any must-drop cap still present fails the test by name. The previous assertion only decoded bit 13 (CAP_NET_RAW) and would have passed unchanged for an incomplete drop list or a silently skipped drop step.

Run the test container with `--cap-add CAP_SYS_ADMIN --cap-add CAP_SYS_PTRACE` so the bounding set entering capsh matches the permissive OpenShell runtime that triggered T6002104. Without this, docker's default bounding set already excludes those caps and the test would have been a no-op for the regression we care about.

Validated locally against a derived nemoclaw-isolation-test image:

- drop list including cap_sys_admin,cap_sys_ptrace → PASS, CapBnd=0x1e9 (load-bearing caps only).
- drop list with cap_sys_admin omitted → FAIL with "CAP_SYS_ADMIN still present in CapBnd after capsh drop", CapBnd=0x2001e9 (bit 21 set), exactly the T6002104 signature.

Signed-off-by: Dongni Yang <dongniy@nvidia.com>

coderabbitai · 2026-05-11T05:53:46Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0ada17f2-810e-49ed-a0eb-61aedc2bdd8c

📥 Commits

Reviewing files that changed from the base of the PR and between a5ea171 and ca30710.

📒 Files selected for processing (4)

agents/hermes/start.sh
scripts/nemoclaw-start.sh
test/nemoclaw-start.test.ts
test/sandbox-init.test.ts

🚧 Files skipped from review as they are similar to previous changes (3)

test/sandbox-init.test.ts
scripts/nemoclaw-start.sh
test/nemoclaw-start.test.ts

📝 Walkthrough

Walkthrough

Implements setpriv-based privilege step-down prefixes, updates drop_capabilities to use capsh with residual-cap diagnostics, replaces gosu invocations across Hermes and Nemoclaw with STEP_DOWN_PREFIX_* prefixes, and adds tests asserting eight dangerous capabilities are removed from CapBnd.

Changes

Privilege Step-Down and Capability Bounding Set Remediation

| Layer / File(s) | Summary |
| --- | --- |
| Capability Dropping Refinement `scripts/lib/sandbox-init.sh` | `drop_capabilities()` now re-execs via `capsh --drop=...` with an explicit list; added `report_residual_capabilities()` to read `/proc/self/status` CapBnd and log remaining dangerous bits. |
| Privilege Step-Down Prefix Infrastructure `scripts/lib/sandbox-init.sh` | Added `init_step_down_prefixes()` and globals `STEP_DOWN_PREFIX_SANDBOX` / `STEP_DOWN_PREFIX_GATEWAY`; prefers `setpriv --reuid/--regid --bounding-set` when available, falls back to `gosu` otherwise; prefixes initialized at load time. |
| Hermes Entrypoint Integration `agents/hermes/start.sh` | Replaced `gosu` calls with `${STEP_DOWN_PREFIX_GATEWAY[@]}` / `${STEP_DOWN_PREFIX_SANDBOX[@]}` for Discord facade, root-path exec, and gateway launch while keeping existing wrappers and env sanitization. |
| Nemoclaw Entrypoint Integration `scripts/nemoclaw-start.sh` | Switched auto-pair, auth/profile execution, user command exec, and gateway startup to use `STEP_DOWN_PREFIX_SANDBOX` / `STEP_DOWN_PREFIX_GATEWAY`. |
| Test Validation `test/e2e-gateway-isolation.sh`, `test/sandbox-init.test.ts`, `test/nemoclaw-start.test.ts` | E2E Test 14 now sources `sandbox-init.sh`, runs the two-stage drop, captures `CapBnd`, and asserts absence of eight dangerous capability bits. Unit tests validate `init_step_down_prefixes` fallback and setpriv paths; test scaffolding initializes gosu fallbacks; Hermes test updated for gateway prefix usage. |

🎯 4 (Complex) | ⏱️ ~45 minutes

NVIDIA/NemoClaw#3328: Modifies sandbox-init.sh capability-dropping flow and related tests; directly related to bounding-set handling and diagnostics.

Suggested reviewers:

ericksoa

"I hop the caps away tonight,
setpriv trims the bounding light,
gosu's fallback tucked in bed,
safer sandbox — softly said,
nibble bugs until they're right." 🐇✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: replacing gosu with setpriv to drop three specific Linux capabilities (CAP_FOWNER, CAP_SETUID, CAP_SETGID) and referencing the issue number. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/lib/sandbox-init.sh`:
- Around line 302-303: Replace the Bash 4+ specific declarations by assigning
empty arrays directly: remove the two uses of "declare -ga" for
STEP_DOWN_PREFIX_SANDBOX and STEP_DOWN_PREFIX_GATEWAY and instead initialize
those symbols with plain array assignments compatible with Bash 3.2 (i.e., set
each variable to an empty array using the array assignment syntax), ensuring the
script remains sourceable on macOS CI; locate the lines referencing
STEP_DOWN_PREFIX_SANDBOX and STEP_DOWN_PREFIX_GATEWAY and change their
initialization accordingly.
- Around line 313-318: When stepping down to the gateway user in the
STEP_DOWN_PREFIX_GATEWAY array, stop clearing supplementary groups so the
gateway process keeps the sandbox group needed to write
/sandbox/.openclaw/openclaw.json; update the setpriv invocation in
STEP_DOWN_PREFIX_GATEWAY (the one that currently includes --clear-groups) to
either remove --clear-groups or replace it with an explicit group list that
includes sandbox (e.g., use --groups=sandbox) so gateway retains group write
access required by mutateConfigFile.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9643b7c2-2a57-497f-a75b-33df3e7e7b5f

📥 Commits

Reviewing files that changed from the base of the PR and between 118541a and 3f5e404.

📒 Files selected for processing (5)

agents/hermes/start.sh
scripts/lib/sandbox-init.sh
scripts/nemoclaw-start.sh
test/e2e-gateway-isolation.sh
test/sandbox-init.test.ts

…NVIDIA#3280)

Follow-up to NVIDIA#3328, which dropped 5/8 of the caps named in issue NVIDIA#3280 but left CAP_FOWNER, CAP_SETUID, and CAP_SETGID present in the sandbox-user process's bounding set. Those three were blocked by gosu: gosu needs CAP_SETUID in permitted to make its setuid() syscall, but the bounding set can only be modified with CAP_SETPCAP (root-only). So dropping CAP_SETUID before gosu would break the privilege transition, and dropping it after would be too late because we are no longer root.

setpriv from util-linux solves this by performing reuid + bounding-set drop atomically inside a single process: setuid first (still holds CAP_SETUID), then strip the bounding set (still root long enough to hold CAP_SETPCAP), then exec the target.

Add init_step_down_prefixes() to scripts/lib/sandbox-init.sh which populates two bash arrays at source time:

STEP_DOWN_PREFIX_SANDBOX: step down to sandbox user
STEP_DOWN_PREFIX_GATEWAY: step down to gateway user

Each array expands to a setpriv invocation that drops cap_setuid / cap_setgid / cap_fowner / cap_chown / cap_kill from the bounding set during the reuid. If setpriv or CAP_SETPCAP is unavailable, the arrays stay at the gosu fallback and a warning is logged so the residual cap retention surfaces in the entrypoint log (matches the design of report_residual_capabilities from NVIDIA#3328).

Notes:

* Arrays default to (gosu sandbox)/(gosu gateway) at file scope (NOT empty). This prevents a privesc regression if init_step_down_prefixes is ever skipped: an unset/empty array would expand to nothing and `exec "${ARR[@]}" "${NEMOCLAW_CMD[@]}"` would run the agent as root. init_step_down_prefixes() only upgrades to setpriv when available.
* setpriv uses unprefixed cap names (per `setpriv --list`), unlike capsh which uses cap_*. The arrays use the setpriv format.
* --init-groups (NOT --clear-groups): the gateway user is a member of the sandbox group via `usermod -aG sandbox gateway` in Dockerfile.base, which is required to write the chmod 660 /sandbox/.openclaw/openclaw.json (setgid'd config dir, see NVIDIA#2681). --clear-groups would strip that membership and break mutateConfigFile with EACCES. --init-groups matches gosu's setgroups+initgroups behaviour and restores exactly the groups defined in /etc/group for the target user.
* Plain array assignment (not `declare -ga`) at file scope: bash 3.2 on macOS rejects `declare -g`, and bash 3.2+ treats file-scope assignment as global by default. Inside init_step_down_prefixes() the reassignment is unscoped, so it targets the same globals in both bash 3.2 and 4+.
* Per-assignment shellcheck SC2034 disables: the prefix arrays are consumed cross-file (by scripts/nemoclaw-start.sh and agents/hermes/start.sh), which shellcheck cannot follow.

Replace the seven gosu call sites across both entrypoints:

scripts/nemoclaw-start.sh:
* line 795: auto-pair (sandbox)
* line 1610: write_auth_profile + harden_auth_profiles (sandbox)
* line 1614: final exec to NEMOCLAW_CMD (sandbox)
* line 1720: OpenClaw gateway (gateway)

agents/hermes/start.sh:
* line 294: Discord facade (gateway)
* line 586: final exec to NEMOCLAW_CMD (sandbox)
* line 607: Hermes gateway (gateway)

The non-root fallback path in nemoclaw-start.sh (lines 1488+) and the no-new-privileges history comments at lines 138-139 / 1490-1493 are unchanged; that path does not use a privilege-step-down tool at all.

Validated live: with --cap-add CAP_SYS_ADMIN --cap-add CAP_SYS_PTRACE (simulating permissive OpenShell runtime), source sandbox-init.sh and chain drop_capabilities + STEP_DOWN_PREFIX_SANDBOX → final sandbox-user CapBnd=0x100 (only CAP_SETPCAP remains; all 8 issue-NVIDIA#3280 caps absent).

Negative path: removing -setuid from the setpriv drop list correctly leaves CAP_SETUID present (bit 7), matching the regression signature the test in the follow-up commit catches.

Signed-off-by: Dongni Yang <dongniy@nvidia.com>

…VIDIA#3280)

Flip CAP_FOWNER / CAP_SETUID / CAP_SETGID in e2e-gateway-isolation.sh test 14 from "allowed" (as documented in NVIDIA#3328) to "must-drop". The preceding commit replaces gosu with setpriv so the three load-bearing caps now drop atomically with reuid; the sandbox-user process should have ALL eight caps named in issue NVIDIA#3280 absent from CapBnd.

Rewrite test 14 to exercise the full two-stage drop end-to-end: source sandbox-init.sh, run drop_capabilities() (stage 1: capsh strips the entrypoint-wide --drop list), then exec STEP_DOWN_PREFIX_SANDBOX (stage 2: setpriv strips the load-bearing caps during reuid), then capture CapBnd of the resulting sandbox-user process.

The test container is started with --cap-add CAP_SYS_ADMIN --cap-add CAP_SYS_PTRACE so the bounding set entering the entrypoint resembles the permissive OpenShell runtime that triggered T6002104. Otherwise docker's default bounding set already excludes those caps and the test would be a no-op for the bug condition.

Use grep ^CapBnd: + awk for extraction rather than a triple-quoted awk script: the awk script's $2 would otherwise be expanded by bash on the way through capsh re-exec, producing /^CapBnd:/{print } which prints the whole line and breaks downstream parsing.

Add two unit tests in test/sandbox-init.test.ts for the new init_step_down_prefixes() helper:

- falls back to gosu when setpriv/capsh are unavailable
- uses setpriv with the issue-3280 bounding-set drop when available

Update the existing snapshot-style test for Hermes start.sh's start_discord_facade body to assert on the new STEP_DOWN_PREFIX_GATEWAY invocation instead of the legacy gosu gateway sh -c.

Update nemoclaw-start.test.ts test scaffolding to initialise STEP_DOWN_PREFIX_SANDBOX and STEP_DOWN_PREFIX_GATEWAY in the fallback form (gosu sandbox / gosu gateway) inside both runLaunchBlock() and runPreGatewaySetup(). The extracted launch and setup blocks reference these arrays, and the test scaffolding doesn't source sandbox-init.sh, so without an explicit initialisation `set -u` fails on the unbound array and the stubbed gosu() never receives the call.

Validated locally with docker build + docker run --cap-add against a test image overlaid with the new sandbox-init.sh:

- Forward: CapBnd=0x100 (only CAP_SETPCAP), test PASS.
- Regression (omit -setuid from setpriv drop): CapBnd=0x180, test correctly fails with "CAP_SETUID still present" by name.

Full npm test on this branch: same 67 failures as upstream/main baseline (all pre-existing on main), +2 new passing tests for init_step_down_prefixes, so net zero regressions.

Signed-off-by: Dongni Yang <dongniy@nvidia.com>
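The `$2` expansion pitfall mentioned in this commit message can be reproduced without capsh. The sketch below is an illustration under assumptions: a nested `bash -c` stands in for the capsh re-exec, and `sample_status`, `extract_double_quoted`, and `extract_grep_awk` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Two-line stand-in for /proc/self/status.
sample_status() {
  printf 'Name:\tbash\nCapBnd:\t0000000000000100\n'
}

# Double quotes: the calling shell expands $2 (empty here) before awk runs,
# so awk receives /^CapBnd:/{print } and prints the whole line.
extract_double_quoted() {
  sample_status | bash -c "awk '/^CapBnd:/{print $2}'"
}

# grep + single-quoted awk: nothing is expanded early, awk sees $2 as a field.
extract_grep_awk() {
  sample_status | grep '^CapBnd:' | awk '{print $2}'
}

extract_double_quoted # whole "CapBnd: ..." line
extract_grep_awk      # just the hex mask
```

Keeping the awk program free of shell-significant characters that must survive several quoting layers is what makes the grep-then-awk form robust across the re-exec.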

Signed-off-by: Aaron Erickson <aerickson@nvidia.com> # Conflicts: # scripts/lib/sandbox-init.sh # test/e2e-gateway-isolation.sh

ericksoa

Reviewed the setpriv step-down follow-up and the post-#3328 stack repair. The repaired head is mergeable against current main, CodeRabbit is green with only resolved/outdated threads, PR checks are green, and local validation passed bash syntax, build:cli, diff check, and the focused sandbox-init/nemoclaw-start tests. The full stack nightly also passed on the pre-repair stack head a5ea171.

## Summary

Refreshes the release-prep docs for v0.0.39 based on changes merged since the Friday 4pm doc refresh. Updates the source docs, bumps the docs version metadata, and regenerates the NemoClaw user skills from the refreshed docs.

## Changes

- #3314 -> `docs/get-started/prerequisites.md`, `docs/get-started/quickstart.md`, `docs/reference/troubleshooting.md`: Documents installer Docker setup, Docker group activation, and retry guidance.
- #3317 -> `docs/get-started/quickstart.md`, `docs/reference/commands.md`: Documents the DGX Spark and DGX Station express install prompt and `NEMOCLAW_NO_EXPRESS`.
- #3328 and #3329 -> `docs/security/best-practices.md`, `docs/deployment/sandbox-hardening.md`: Updates sandbox capability hardening docs for the stricter bounding-set and `setpriv` step-down behavior.
- #3330, #3335, and #3346 -> `docs/inference/use-local-inference.md`: Documents Windows-host Ollama relaunch behavior, NIM key passthrough, early health-fail diagnostics, and mixed-GPU preflight detail.
- #2406, #2883, #3001, #3244, #3267, #3318, #3320, and #3354 -> `docs/about/release-notes.md`: Adds the v0.0.39 release-prep section while keeping the v0.0.38 release notes intact.
- Advances the release-prep docs metadata from v0.0.38 to v0.0.39.
- Regenerates `.agents/skills/nemoclaw-user-*` from the updated source docs.

## Type of Change

- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [x] Doc only (includes code sample changes)

## Verification

- [x] `npx prek run --all-files` passes
- [ ] `npm test` passes
- [ ] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [x] `make docs` builds without warnings (doc changes only)
- [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

---

Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

## Summary by CodeRabbit

## Release Notes v0.0.39

* **New Features**
  * Host alias management commands for easier configuration
  * Sandbox GPU control options during onboarding
  * Update command with check and confirmation modes
* **Documentation**
  * Enhanced Linux installer guidance with Docker and group membership handling
  * Expanded troubleshooting for permission and connectivity issues
  * Improved capability-dropping security documentation
  * Updated inference model switching commands
  * Brev environment-specific troubleshooting
* **Improvements**
  * DGX Spark/Station express install flow
  * Windows Ollama relay and health-check enhancements
  * NVIDIA NIM preflight GPU reporting

Dongni-Yang added 2 commits May 11, 2026 11:02

Dongni-Yang mentioned this pull request May 11, 2026

fix(sandbox): tighten bounding-set caps #3328

Merged

7 tasks

coderabbitai Bot reviewed May 11, 2026


Comment thread scripts/lib/sandbox-init.sh Outdated

Comment thread scripts/lib/sandbox-init.sh Outdated

Dongni-Yang force-pushed the fix/sandbox-setpriv-3280-followup branch from 3f5e404 to 06c62be Compare May 11, 2026 06:27

Dongni-Yang changed the title ~~fix(sandbox): replace gosu with setpriv to fully close #3280 bounding-set gap~~ fix(sandbox): drop CAP_FOWNER/SETUID/SETGID via setpriv (#3280) May 11, 2026

Dongni-Yang added 2 commits May 11, 2026 14:51

Dongni-Yang force-pushed the fix/sandbox-setpriv-3280-followup branch from 06c62be to a5ea171 Compare May 11, 2026 06:51

Dongni-Yang added v0.0.39 labels May 11, 2026

Merge remote-tracking branch 'origin/main' into pr-3329-repair

ca30710

Signed-off-by: Aaron Erickson <aerickson@nvidia.com> # Conflicts: # scripts/lib/sandbox-init.sh # test/e2e-gateway-isolation.sh

ericksoa approved these changes May 11, 2026


ericksoa merged commit 47238e8 into NVIDIA:main May 11, 2026
14 checks passed

miyoungc mentioned this pull request May 12, 2026

docs: refresh 0.0.39 release prep #3375

Merged

12 tasks

Dongni-Yang mentioned this pull request May 18, 2026

[Nemoclaw] [All Platforms] Sandbox allows dangerous capabilities in bounding set despite empty effective set #3280

Open

wscurran added the bug-fix label (PR fixes a bug or regression) Jun 8, 2026

ericksoa merged 5 commits into NVIDIA:main from Dongni-Yang:fix/sandbox-setpriv-3280-followup
