fix: lock openclaw.json after startup to prevent agent config tampering#515
fix: lock openclaw.json after startup to prevent agent config tampering#515ericksoa wants to merge 7 commits into
Conversation
The sandboxed AI agent can use its write/edit/exec tools to modify openclaw.json, changing the auth token to a trivial value and CORS origins to ["*"]. This allows any website to connect to the user's bot via websocket. After startup, chown the config to root:root and chmod 444. The gateway (running as root) retains full access, while the agent (running as sandbox user) is blocked from writes. Closes #514
📝 WalkthroughWalkthroughThe changes implement a post-startup security lockdown mechanism for the NemoClaw container. A new entrypoint script orchestrates startup, monitors for gateway configuration generation, then transitions the config file to root ownership with read-only permissions before dropping to an interactive sandbox shell. The script also adds an early-exit condition in the config setup utility when an existing config file is non-writable. Changes
Sequence DiagramsequenceDiagram
participant Container as Container Start
participant Entrypoint as nemoclaw-entrypoint
participant Startup as nemoclaw-start
participant Gateway as Gateway Process
participant FS as File System
Container->>Entrypoint: Execute as root
Note over Entrypoint: Resolve sandbox UID/GID
Entrypoint->>Startup: Execute via setpriv (sandbox context)
Startup->>Startup: fix_openclaw_config
Startup->>FS: Create/validate config
Startup->>Gateway: Start gateway process
Note over Gateway: Gateway generates auth token
Entrypoint->>FS: Poll for gateway.auth.token (30s timeout)
FS-->>Entrypoint: Config with token detected
Entrypoint->>FS: chown root:root openclaw.json
Entrypoint->>FS: chmod 444 openclaw.json
Note over Entrypoint: Config now read-only, owned by root
Entrypoint->>Entrypoint: Drop to sandbox user
Entrypoint->>Entrypoint: exec /bin/bash (interactive shell)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
The gateway generates an auth token asynchronously after startup and overwrites openclaw.json, resetting permissions. Poll for the token (up to 30s) before applying the lock. Also lock the parent directory to root:root 555 so the agent cannot delete and recreate the file. Fixes the race condition found in Test 1.
|
Thanks for addressing the security concern by locking the openclaw.json file to prevent agent config tampering, this is an important fix for the security of our users' configurations. |
The entrypoint runs as USER sandbox, so direct chown root:root fails. Instead: install sudo, create a dedicated lock-gateway-config script, and add a sudoers entry that only permits sandbox to run that one command as root. No other escalation paths are opened.
The lock script now removes its own sudoers entry as its last step. After the config is locked there is no reason for the sandbox user to retain even this narrow sudo access.
Two regressions addressed: 1. Container restarts: fix_openclaw_config now detects a locked (root:root 444) config from a previous run and skips the write. lock_gateway_config also returns early if already locked. 2. Exec path 30s hang: lock_gateway_config accepts a wait_for_token parameter. The exec path passes "false" to skip the poll loop since no gateway is running to generate a token.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/lock-gateway-config.sh`:
- Around line 16-25: Reject symlinks and protect the parent directory before
changing ownership/permissions: add an explicit symlink check for CONFIG (e.g.,
test -L "$CONFIG" or [ -L "$CONFIG" ]) and exit with an error if it's a symlink,
then ensure the parent directory of CONFIG is made immutable/locked or moved to
a secure location (preventing replacement in /sandbox/.openclaw) before running
chown root:root "$CONFIG" and chmod 444 "$CONFIG"; update the script around the
CONFIG existence check and the chown/chmod lines and ensure the sudoers cleanup
(rm -f /etc/sudoers.d/lock-gateway-config) remains but occurs after the file and
parent directory are secured.
In `@scripts/nemoclaw-start.sh`:
- Around line 113-127: The script currently always calls sudo
/usr/local/bin/lock-gateway-config after the wait_for_token loop even if the
token wasn't found; modify the wait_for_token logic to track whether a token was
discovered (e.g., set a token_found flag when the python3 check succeeds) and
then only call sudo /usr/local/bin/lock-gateway-config if token_found is true;
if token_found is false after the loop, skip locking and emit a clear log
message (using echo or logger) mentioning config_path and that the token was not
found so the gateway can still write it.
- Around line 227-229: The script currently calls lock_gateway_config before
executing override commands (when NEMOCLAW_CMD is non-empty), which prevents
those commands from writing gateway config; remove the call to
lock_gateway_config from the branch that handles NEMOCLAW_CMD so the script
simply execs "${NEMOCLAW_CMD[@]}" and lets the override command (or interactive
shell) manage config/token creation; locate the block that checks if [
${`#NEMOCLAW_CMD`[@]} -gt 0 ] and delete the lock_gateway_config false line while
leaving the exec "${NEMOCLAW_CMD[@]}" intact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 0a067647-b684-49e4-962e-1564564c0e67
📒 Files selected for processing (3)
Dockerfilescripts/lock-gateway-config.shscripts/nemoclaw-start.sh
| if [ ! -f "$CONFIG" ]; then | ||
| echo "[lock-gateway-config] config not found: $CONFIG" >&2 | ||
| exit 1 | ||
| fi | ||
|
|
||
| chown root:root "$CONFIG" | ||
| chmod 444 "$CONFIG" | ||
|
|
||
| # Revoke our own sudoers entry — this privilege is single-use. | ||
| rm -f /etc/sudoers.d/lock-gateway-config |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cat -n scripts/lock-gateway-config.shRepository: NVIDIA/NemoClaw
Length of output: 1011
🏁 Script executed:
rg -i "symlink|openclaw|sandbox.*lock" --type sh -A 3 -B 3Repository: NVIDIA/NemoClaw
Length of output: 18130
Reject symlinks before locking the configuration file.
Lines 21-22 follow symlinks when applying chown/chmod, allowing a sandbox user to redirect root onto arbitrary files. Additionally, the writable parent directory /sandbox/.openclaw permits replacing the "locked" config after the script completes. The helper must reject symlinks (using [ -L "$CONFIG" ] or test -L) and should lock or relocate the parent directory before touching the file.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/lock-gateway-config.sh` around lines 16 - 25, Reject symlinks and
protect the parent directory before changing ownership/permissions: add an
explicit symlink check for CONFIG (e.g., test -L "$CONFIG" or [ -L "$CONFIG" ])
and exit with an error if it's a symlink, then ensure the parent directory of
CONFIG is made immutable/locked or moved to a secure location (preventing
replacement in /sandbox/.openclaw) before running chown root:root "$CONFIG" and
chmod 444 "$CONFIG"; update the script around the CONFIG existence check and the
chown/chmod lines and ensure the sudoers cleanup (rm -f
/etc/sudoers.d/lock-gateway-config) remains but occurs after the file and parent
directory are secured.
1. Reject symlinks in lock script — prevents agent from pointing openclaw.json at /etc/passwd and tricking root into chowning it. 2. Skip lock if token poll times out — if the gateway takes >30s to write the token, don't lock or the gateway can never write it. Logs a warning instead. 3. Remove lock from exec path — override commands may need to start their own gateway and write the config. The standard path (line 237) still locks after the gateway writes the token.
There was a problem hiding this comment.
🧹 Nitpick comments (1)
scripts/nemoclaw-start.sh (1)
113-113: Minor: Use_for unused loop variable.The loop counter
iis declared but never referenced in the loop body. Using_as the variable name is idiomatic for unused loop variables and silences the Shellcheck warning (SC2034).♻️ Suggested fix
- for i in $(seq 1 30); do + for _ in $(seq 1 30); do🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/nemoclaw-start.sh` at line 113, The loop declares an unused variable `i` in the for loop starting with "for i in $(seq 1 30); do" — change the loop variable to `_` (i.e., "for _ in $(seq 1 30); do") so the variable name communicates it's intentionally unused and silences Shellcheck SC2034; update any references (there should be none) and run Shellcheck to confirm the warning is resolved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@scripts/nemoclaw-start.sh`:
- Line 113: The loop declares an unused variable `i` in the for loop starting
with "for i in $(seq 1 30); do" — change the loop variable to `_` (i.e., "for _
in $(seq 1 30); do") so the variable name communicates it's intentionally unused
and silences Shellcheck SC2034; update any references (there should be none) and
run Shellcheck to confirm the warning is resolved.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f2297b49-fa87-435d-9756-48f9870e60cf
📒 Files selected for processing (2)
scripts/lock-gateway-config.shscripts/nemoclaw-start.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- scripts/lock-gateway-config.sh
OpenShell sets no_new_privs on sandbox containers, which blocks sudo entirely. Replace the sudo-based approach with a root entrypoint wrapper that: 1. Runs nemoclaw-start as sandbox via setpriv (direct syscalls, not setuid bits — works under no_new_privs) 2. Polls for the gateway auth token as root 3. Locks openclaw.json directly (chown/chmod as root, no sudo) 4. Drops to sandbox via setpriv for the interactive shell Removes: sudo dependency, lock-gateway-config.sh, sudoers entry. Adds: nemoclaw-entrypoint.sh (root entrypoint wrapper).
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/nemoclaw-entrypoint.sh`:
- Around line 23-26: The startup currently can "fail open": when as_sandbox
/usr/local/bin/nemoclaw-start "$@" exec's the NEMOCLAW_CMD path the Phase 2
config lock is skipped, and on the returning path the lock timeout only warns
(lines ~46-52), allowing startup without the lock. Fix by ensuring the config
lock is acquired before any exec that never returns (move the Phase 2 lock
acquisition into the Phase 1 flow that runs before calling as_sandbox
/usr/local/bin/nemoclaw-start or require nemoclaw-start to block until the lock
is obtained), and change the timeout handling in the returning path (the code
that prints the warning) to treat timeout as fatal (exit non-zero) so the
container cannot continue without the lock.
- Around line 32-49: The current lock only hardens the CONFIG inode (variable
CONFIG / openclaw.json) but leaves its parent (/sandbox/.openclaw) writable by
sandbox, enabling TOCTOU or unlink attacks; update
scripts/nemoclaw-entrypoint.sh to ensure the parent directory is root-owned and
not writable by sandbox before locking CONFIG: detect the parent dir of $CONFIG,
verify its owner is root and permission bits disallow sandbox write (or chown
the parent to root and set restrictive perms), or fail startup and instruct to
place the config outside the sandbox-writable tree; keep the existing
token_found logic but perform this parent-owner/permission check and remediation
prior to chown root:root and chmod 444 on $CONFIG.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9467874c-56d2-4c89-88ed-93a7cfaf3490
📒 Files selected for processing (3)
Dockerfilescripts/nemoclaw-entrypoint.shscripts/nemoclaw-start.sh
✅ Files skipped from review due to trivial changes (1)
- scripts/nemoclaw-start.sh
| # ── Phase 1: Setup as sandbox user ────────────────────────────────────── | ||
| # nemoclaw-start handles config, plugins, gateway, auto-pair, etc. | ||
| # In the NEMOCLAW_CMD path it exec's the command and never returns. | ||
| as_sandbox /usr/local/bin/nemoclaw-start "$@" |
There was a problem hiding this comment.
This still has fail-open startup paths.
Line 25 documents that the NEMOCLAW_CMD path never returns, so the Phase 2 lock is skipped there. And on the returning path, Lines 50-51 only warn and continue on timeout. That means the container can still reach a runnable state without ever enforcing the config lock.
Also applies to: 46-52
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/nemoclaw-entrypoint.sh` around lines 23 - 26, The startup currently
can "fail open": when as_sandbox /usr/local/bin/nemoclaw-start "$@" exec's the
NEMOCLAW_CMD path the Phase 2 config lock is skipped, and on the returning path
the lock timeout only warns (lines ~46-52), allowing startup without the lock.
Fix by ensuring the config lock is acquired before any exec that never returns
(move the Phase 2 lock acquisition into the Phase 1 flow that runs before
calling as_sandbox /usr/local/bin/nemoclaw-start or require nemoclaw-start to
block until the lock is obtained), and change the timeout handling in the
returning path (the code that prints the warning) to treat timeout as fatal
(exit non-zero) so the container cannot continue without the lock.
| if [ -f "$CONFIG" ] && [ ! -L "$CONFIG" ] && [ -w "$CONFIG" ]; then | ||
| token_found=false | ||
| for _ in $(seq 1 30); do | ||
| if python3 -c " | ||
| import json, sys | ||
| cfg = json.load(open('$CONFIG')) | ||
| sys.exit(0 if cfg.get('gateway',{}).get('auth',{}).get('token') else 1) | ||
| " 2>/dev/null; then | ||
| token_found=true | ||
| break | ||
| fi | ||
| sleep 1 | ||
| done | ||
|
|
||
| if [ "$token_found" = "true" ]; then | ||
| chown root:root "$CONFIG" | ||
| chmod 444 "$CONFIG" | ||
| echo "[security] gateway config locked: $CONFIG" |
There was a problem hiding this comment.
A sandbox-owned parent directory defeats this lock.
chown root:root + chmod 444 protects only the current inode. Because /sandbox/.openclaw is still owned by sandbox (Dockerfile Lines 22-24 and 55-56), the sandbox can unlink or swap openclaw.json after startup, and the -L / read / chown / chmod sequence is TOCTOU-prone for the same reason. The config needs to live outside the sandbox-writable tree, or the parent directory needs to become root-owned with only the required subdirectories handed back to sandbox.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/nemoclaw-entrypoint.sh` around lines 32 - 49, The current lock only
hardens the CONFIG inode (variable CONFIG / openclaw.json) but leaves its parent
(/sandbox/.openclaw) writable by sandbox, enabling TOCTOU or unlink attacks;
update scripts/nemoclaw-entrypoint.sh to ensure the parent directory is
root-owned and not writable by sandbox before locking CONFIG: detect the parent
dir of $CONFIG, verify its owner is root and permission bits disallow sandbox
write (or chown the parent to root and set restrictive perms), or fail startup
and instruct to place the config outside the sandbox-writable tree; keep the
existing token_found logic but perform this parent-owner/permission check and
remediation prior to chown root:root and chmod 444 on $CONFIG.
|
Closed — superseded by #555 which uses Landlock filesystem policy + DAC (root:root 444 + sticky bit) instead of sandbox-side privilege escalation. This PR documents the approaches we tried and why they didn't work (sudo blocked by no_new_privs, entrypoint replaced by OpenShell, etc.). |
…y with read path (NVIDIA#515)
Summary
Addresses the config tampering vulnerability reported in #514: an AI agent running inside the NemoClaw sandbox can modify
openclaw.jsonto weaken auth tokens and CORS settings, allowing cross-origin websocket connections from any website.Status: Blocked — needs OpenShell-side support
The lock must be applied with root ownership (
root:root 444) after gateway startup. All sandbox-side approaches have been ruled out:chownfrom startup scriptsandboxuser — cannot chown to rootsudowith narrow sudoers entryno_new_privson sandbox processes, blocking sudosetprivdropopenshell-sandboxThe config lock requires a privilege level that the sandbox process cannot reach by design. This is the correct security posture for OpenShell — the fix needs to come from outside the sandbox boundary.
Path forward
Coordinating with the OpenShell team (cc @drewnewberry) on the right mechanism. Options under consideration:
openclaw.jsonas a read-only volume at the orchestrator level, outside the sandbox filesystem policy.Defense-in-depth hardening (tool policy deny lists, Landlock enforcement, directory locking) tracked in #516.
Changes in this branch
The branch contains our iterative attempts at a sandbox-side fix. These serve as documentation of the constraints but are not mergeable as-is:
scripts/nemoclaw-entrypoint.sh— root entrypoint usingsetpriv(blocked by ENTRYPOINT override)scripts/nemoclaw-start.sh— config write skip for locked files, lock function (removed)Dockerfile—USER rootat end, entrypoint swap (blocked by ENTRYPOINT override)Test plan
See
test/test-plan-514-config-lock.md— 13 tests covering permissions, attack reproduction, symlink rejection,setprivunderno_new_privs, and sandbox re-creation. Tests pass in plain Docker but fail in OpenShell sandboxes due to the ENTRYPOINT override.