fix: fall back to non-root mode when gosu is blocked by no-new-privileges#846
Conversation
…eges OpenShell runs sandbox containers with --security-opt=no-new-privileges, which blocks gosu's setuid syscall. Detect non-root at startup and skip privilege separation, running gateway and agent as the current user. Full gateway isolation (separate gateway user) still works when the container runs as root (standalone Docker, Brev direct).
|
Caution Review failedPull request was closed or merged during review Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds an early non-root branch to Changes
Sequence Diagram(s)sequenceDiagram
participant Entrypoint as Entrypoint Script
participant Verifier as verify_config_integrity
participant Auth as write_auth_profile
participant Gateway as "$OPENCLAW" gateway
participant Watcher as start_auto_pair
participant Gosu as gosu sandbox
Entrypoint->>Entrypoint: check UID
alt UID != 0 (non-root)
Entrypoint->>Verifier: verify_config_integrity || true
Entrypoint->>Auth: write_auth_profile
Entrypoint->>Gateway: start gateway (background)
Entrypoint->>Watcher: run python3 watcher (direct)
Entrypoint->>Entrypoint: print URLs, wait for gateway PID, exit
else UID == 0 (root)
Entrypoint->>Verifier: verify_config_integrity (full)
Entrypoint->>Auth: write_auth_profile
Entrypoint->>Gosu: prefix watcher with gosu sandbox
Gosu->>Watcher: run python3 watcher (via gosu)
Entrypoint->>Gateway: start gateway (with gosu as needed)
Entrypoint->>Entrypoint: print URLs, wait for gateway PID, exit
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
scripts/nemoclaw-start.sh (1)
187-190: Gateway output not redirected to log file unlike root path.In the root path (line 237), gateway output is redirected to
/tmp/gateway.log. Here it goes to stdout, which may clutter the terminal or be lost. Consider redirecting for consistency and debuggability:♻️ Proposed consistency fix
# Start gateway in background, auto-pair, then wait - "$OPENCLAW" gateway run & + "$OPENCLAW" gateway run >>/tmp/gateway.log 2>&1 &Note: You may need to ensure
/tmp/gateway.logis writable by the current user (likely fine since/tmpis world-writable).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/nemoclaw-start.sh` around lines 187 - 190, The gateway process is launched with "$OPENCLAW" gateway run & and its PID stored in GATEWAY_PID but its stdout/stderr are not redirected (unlike the later invocation that writes to /tmp/gateway.log); update the launch of the gateway in the background to redirect stdout and stderr to /tmp/gateway.log (or the same log path used elsewhere) so logs are captured consistently and then set GATEWAY_PID=$! and echo as before; ensure the chosen log file is writable by the current user.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/nemoclaw-start.sh`:
- Line 180: The line currently silences verify_config_integrity failures with
"|| true", which hides tampered configs; change it to check the command's exit
status and emit a clear, visible warning when it fails (for example, use
verify_config_integrity and if it returns non-zero, print a loudly formatted
warning to stderr or syslog) while preserving the strict root-mode behavior
elsewhere; reference the verify_config_integrity call and mirror the strict
root-path enforcement branch by logging a prominent warning in non-root mode
instead of ignoring the failure.
- Line 191: The start_auto_pair function invokes "gosu sandbox python3 -" which
fails under no-new-privileges/non-root execution; modify start_auto_pair to
detect whether gosu is usable (e.g., test for root or check if /sbin/gosu exists
and can drop to sandbox) and when it isn’t, run the auto-pair subprocess without
gosu (invoke "python3 -" directly or use a non-root variant) so pairing still
works; update the function that currently calls gosu sandbox python3 - to
conditionally choose the gosu path or the direct python3 path and ensure any
environment/user setup done by gosu is replicated for the non-root fallback.
- Line 181: In the non-root fallback branch where write_auth_profile is invoked
(check the id -u != 0 path), explicitly set HOME before calling
write_auth_profile so os.path.expanduser resolves to /sandbox; add export
HOME=/sandbox immediately before the write_auth_profile call (and ensure any
subsequent commands in that branch run with that HOME set). This mirrors the
root-mode behavior (gosu sandbox) and prevents credentials from being written to
an unexpected user home.
---
Nitpick comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 187-190: The gateway process is launched with "$OPENCLAW" gateway
run & and its PID stored in GATEWAY_PID but its stdout/stderr are not redirected
(unlike the later invocation that writes to /tmp/gateway.log); update the launch
of the gateway in the background to redirect stdout and stderr to
/tmp/gateway.log (or the same log path used elsewhere) so logs are captured
consistently and then set GATEWAY_PID=$! and echo as before; ensure the chosen
log file is writable by the current user.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9ae433bd-4d30-45b3-8629-4eb8c2e9f8c5
📒 Files selected for processing (1)
scripts/nemoclaw-start.sh
There was a problem hiding this comment.
♻️ Duplicate comments (1)
scripts/nemoclaw-start.sh (1)
183-197:⚠️ Potential issue | 🟠 MajorPin
HOMEbefore starting the non-root flow.Line 60 writes the auth profile under
~/.openclaw/..., while Line 81 and Lines 39-47 still read/sandbox/.openclawexplicitly. In this branch,gateway runandstart_auto_pairinherit the same currentHOME, so a non-root launch withHOMEunset or pointing elsewhere can read and write a different state tree than the one you verified and printed URLs from. The root path stays aligned because every sandbox-side subprocess runs undergosu sandbox.🛠️ Proposed fix
if [ "$(id -u)" -ne 0 ]; then echo "[gateway] Running as non-root (uid=$(id -u)) — privilege separation disabled" + export HOME=/sandbox verify_config_integrity || true write_auth_profilePlease verify this path with a non-root launch where
HOMEis unset or not/sandbox.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/nemoclaw-start.sh` around lines 183 - 197, The non-root branch can read/write a different $HOME than the sandbox state you validated; pin HOME to the sandbox before any actions in that branch so write_auth_profile, gateway run, start_auto_pair and print_dashboard_urls all operate on the same directory. Update the non-root block (the branch that checks id -u and uses NEMOCLAW_CMD, OPENCLAW, write_auth_profile, verify_config_integrity, gateway run, start_auto_pair, print_dashboard_urls) to export HOME=/sandbox (or the canonical sandbox path used elsewhere) immediately after entering the branch and before calling verify_config_integrity/write_auth_profile or launching processes so subsequent commands inherit the correct HOME. Ensure this change is only applied in the non-root branch and does not alter the root/gosu sandbox behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 183-197: The non-root branch can read/write a different $HOME than
the sandbox state you validated; pin HOME to the sandbox before any actions in
that branch so write_auth_profile, gateway run, start_auto_pair and
print_dashboard_urls all operate on the same directory. Update the non-root
block (the branch that checks id -u and uses NEMOCLAW_CMD, OPENCLAW,
write_auth_profile, verify_config_integrity, gateway run, start_auto_pair,
print_dashboard_urls) to export HOME=/sandbox (or the canonical sandbox path
used elsewhere) immediately after entering the branch and before calling
verify_config_integrity/write_auth_profile or launching processes so subsequent
commands inherit the correct HOME. Ensure this change is only applied in the
non-root branch and does not alter the root/gosu sandbox behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bcf1d40b-bddd-489f-a869-b67b8f6c04d8
📒 Files selected for processing (1)
scripts/nemoclaw-start.sh
Convert run_prefix from string to bash array so it expands correctly when empty (no-op) and when set (gosu sandbox), without triggering SC2086 word-splitting warning.
There was a problem hiding this comment.
♻️ Duplicate comments (2)
scripts/nemoclaw-start.sh (2)
186-186:⚠️ Potential issue | 🟠 MajorMissing
HOMEexport beforewrite_auth_profile.The
write_auth_profilefunction usesos.path.expanduser('~/.openclaw/...'), which depends on theHOMEenvironment variable. In non-root mode with arbitrary UIDs (e.g., Kubernetes mapped users),HOMEmay be unset or incorrect, causing credentials to be written to the wrong location and authentication to fail.The root path handles this via
gosu sandboxwhich inherits HOME from/etc/passwd. The non-root path should explicitly set it:🔧 Proposed fix
echo "[gateway] Running as non-root (uid=$(id -u)) — privilege separation disabled" - verify_config_integrity || true + if ! verify_config_integrity; then + echo "[SECURITY WARNING] Config integrity check failed — proceeding anyway (non-root mode)" + fi + export HOME=/sandbox write_auth_profile🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/nemoclaw-start.sh` at line 186, The non-root startup path calls write_auth_profile but may run with HOME unset or incorrect (so os.path.expanduser('~/.openclaw/...') writes to the wrong place); before invoking write_auth_profile ensure HOME is explicitly exported to the target user's home directory (same approach as the root path uses via gosu sandbox) by determining the intended home (from /etc/passwd or an env value) and exporting HOME in the non-root branch prior to calling write_auth_profile so credentials are written to the correct location.
185-185:⚠️ Potential issue | 🟠 MajorConfig integrity failures are silently ignored.
The
|| truesuppresses verification failures without any warning. While operational flexibility may be intentional, this weakens security guarantees compared to the root path (line 205). At minimum, emit a visible warning when verification fails.🛡️ Proposed fix: warn on failure
- verify_config_integrity || true + if ! verify_config_integrity; then + echo "[SECURITY WARNING] Config integrity check failed — proceeding anyway (non-root mode)" + fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/nemoclaw-start.sh` at line 185, The script currently silences failures from verify_config_integrity by appending "|| true", which hides verification errors; change this to surface a warning instead of swallowing the failure: replace the "verify_config_integrity || true" usage so that when verify_config_integrity exits non‑zero you emit a visible warning to stderr (e.g., via echo >&2 or logger) but continue execution, or alternately remove the "|| true" if you want to fail fast; ensure the change targets the call to verify_config_integrity so verification failures are at least logged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@scripts/nemoclaw-start.sh`:
- Line 186: The non-root startup path calls write_auth_profile but may run with
HOME unset or incorrect (so os.path.expanduser('~/.openclaw/...') writes to the
wrong place); before invoking write_auth_profile ensure HOME is explicitly
exported to the target user's home directory (same approach as the root path
uses via gosu sandbox) by determining the intended home (from /etc/passwd or an
env value) and exporting HOME in the non-root branch prior to calling
write_auth_profile so credentials are written to the correct location.
- Line 185: The script currently silences failures from verify_config_integrity
by appending "|| true", which hides verification errors; change this to surface
a warning instead of swallowing the failure: replace the
"verify_config_integrity || true" usage so that when verify_config_integrity
exits non‑zero you emit a visible warning to stderr (e.g., via echo >&2 or
logger) but continue execution, or alternately remove the "|| true" if you want
to fail fast; ensure the change targets the call to verify_config_integrity so
verification failures are at least logged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8fa83b25-9aff-4295-b955-8f55a5a45e46
📒 Files selected for processing (1)
scripts/nemoclaw-start.sh
…oot mode - Export HOME=/sandbox before write_auth_profile so credentials land in the correct directory when running with arbitrary UIDs (e.g. Kubernetes) - Replace silent `|| true` with visible security warning when config integrity check fails in non-root mode
…eges (NVIDIA#846) * fix: fall back to non-root mode when gosu is blocked by no-new-privileges OpenShell runs sandbox containers with --security-opt=no-new-privileges, which blocks gosu's setuid syscall. Detect non-root at startup and skip privilege separation, running gateway and agent as the current user. Full gateway isolation (separate gateway user) still works when the container runs as root (standalone Docker, Brev direct). * fix: guard all gosu calls behind root check * chore: apply shfmt with -ci flag * fix: use array for run_prefix to satisfy shellcheck SC2086 Convert run_prefix from string to bash array so it expands correctly when empty (no-op) and when set (gosu sandbox), without triggering SC2086 word-splitting warning. * fix(security): set HOME and warn on config integrity failure in non-root mode - Export HOME=/sandbox before write_auth_profile so credentials land in the correct directory when running with arbitrary UIDs (e.g. Kubernetes) - Replace silent `|| true` with visible security warning when config integrity check fails in non-root mode
…eges (NVIDIA#846) * fix: fall back to non-root mode when gosu is blocked by no-new-privileges OpenShell runs sandbox containers with --security-opt=no-new-privileges, which blocks gosu's setuid syscall. Detect non-root at startup and skip privilege separation, running gateway and agent as the current user. Full gateway isolation (separate gateway user) still works when the container runs as root (standalone Docker, Brev direct). * fix: guard all gosu calls behind root check * chore: apply shfmt with -ci flag * fix: use array for run_prefix to satisfy shellcheck SC2086 Convert run_prefix from string to bash array so it expands correctly when empty (no-op) and when set (gosu sandbox), without triggering SC2086 word-splitting warning. * fix(security): set HOME and warn on config integrity failure in non-root mode - Export HOME=/sandbox before write_auth_profile so credentials land in the correct directory when running with arbitrary UIDs (e.g. Kubernetes) - Replace silent `|| true` with visible security warning when config integrity check fails in non-root mode
…eges (NVIDIA#846) * fix: fall back to non-root mode when gosu is blocked by no-new-privileges OpenShell runs sandbox containers with --security-opt=no-new-privileges, which blocks gosu's setuid syscall. Detect non-root at startup and skip privilege separation, running gateway and agent as the current user. Full gateway isolation (separate gateway user) still works when the container runs as root (standalone Docker, Brev direct). * fix: guard all gosu calls behind root check * chore: apply shfmt with -ci flag * fix: use array for run_prefix to satisfy shellcheck SC2086 Convert run_prefix from string to bash array so it expands correctly when empty (no-op) and when set (gosu sandbox), without triggering SC2086 word-splitting warning. * fix(security): set HOME and warn on config integrity failure in non-root mode - Export HOME=/sandbox before write_auth_profile so credentials land in the correct directory when running with arbitrary UIDs (e.g. Kubernetes) - Replace silent `|| true` with visible security warning when config integrity check fails in non-root mode
Summary
"failed switching to 'sandbox': operation not permitted"when OpenShell runs containers with--security-opt=no-new-privilegesRegression from #721.
Test plan
nemoclaw onboardsucceeds on Brev one-click deploymentnemoclaw onboardsucceeds on local Docker (root entrypoint path)Summary by CodeRabbit