fix: lock openclaw.json after startup to prevent agent config tampering by ericksoa · Pull Request #515 · NVIDIA/NemoClaw

ericksoa · 2026-03-20T21:22:35Z

Summary

Addresses the config tampering vulnerability reported in #514: an AI agent running inside the NemoClaw sandbox can modify openclaw.json to weaken auth tokens and CORS settings, allowing cross-origin websocket connections from any website.

Status: Blocked — needs OpenShell-side support

The lock must be applied with root ownership (root:root 444) after gateway startup. All sandbox-side approaches have been ruled out:

Approach	Blocker
Direct `chown` from startup script	Script runs as `sandbox` user — cannot chown to root
`sudo` with narrow sudoers entry	OpenShell sets `no_new_privs` on sandbox processes, blocking sudo
Root entrypoint + `setpriv` drop	OpenShell replaces the container ENTRYPOINT with `openshell-sandbox`

The config lock requires a privilege level that the sandbox process cannot reach by design. This is the correct security posture for OpenShell — the fix needs to come from outside the sandbox boundary.

Path forward

Coordinating with the OpenShell team (cc @drewnewberry) on the right mechanism. Options under consideration:

Pre-sandbox init hook — OpenShell runs a user-defined script as root after container setup but before dropping to the sandbox user. Our lock script runs in this window.
Read-only bind mount — OpenShell mounts openclaw.json as a read-only volume at the orchestrator level, outside the sandbox filesystem policy.
Provider-managed config — Move security-sensitive settings (auth token, CORS origins) into OpenShell's provider model, outside the sandbox entirely. The gateway reads them from the provider rather than from a flat file in the writable zone.

Defense-in-depth hardening (tool policy deny lists, Landlock enforcement, directory locking) tracked in #516.

Changes in this branch

The branch contains our iterative attempts at a sandbox-side fix. These serve as documentation of the constraints but are not mergeable as-is:

scripts/nemoclaw-entrypoint.sh — root entrypoint using setpriv (blocked by ENTRYPOINT override)
scripts/nemoclaw-start.sh — config write skip for locked files, lock function (removed)
Dockerfile — USER root at end, entrypoint swap (blocked by ENTRYPOINT override)

Test plan

See test/test-plan-514-config-lock.md — 13 tests covering permissions, attack reproduction, symlink rejection, setpriv under no_new_privs, and sandbox re-creation. Tests pass in plain Docker but fail in OpenShell sandboxes due to the ENTRYPOINT override.

The sandboxed AI agent can use its write/edit/exec tools to modify openclaw.json, changing the auth token to a trivial value and CORS origins to ["*"]. This allows any website to connect to the user's bot via websocket. After startup, chown the config to root:root and chmod 444. The gateway (running as root) retains full access, while the agent (running as sandbox user) is blocked from writes. Closes #514

coderabbitai · 2026-03-20T21:22:41Z

📝 Walkthrough

Walkthrough

The changes implement a post-startup security lockdown mechanism for the NemoClaw container. A new entrypoint script orchestrates startup, monitors for gateway configuration generation, then transitions the config file to root ownership with read-only permissions before dropping to an interactive sandbox shell. The script also adds an early-exit condition in the config setup utility when an existing config file is non-writable.

Changes

Cohort / File(s)	Summary
Container Entrypoint Configuration `Dockerfile`	Modified entrypoint from direct bash invocation to a new orchestration script. Changed execution context from `sandbox` user to `root` at entry, where the entrypoint performs privilege-managed startup, post-gateway security hardening, and sandbox user context drop.
Config Setup Logic `scripts/nemoclaw-start.sh`	Added early termination in `fix_openclaw_config` when an existing config file lacks write permissions, preventing unconditional overwrite attempts.
Entrypoint Orchestration `scripts/nemoclaw-entrypoint.sh`	New executable script that runs startup as the sandbox user via `setpriv`, monitors gateway config generation with up to 30-second token polling, then locks the config file to read-only root ownership before dropping into an interactive shell as sandbox.

Sequence Diagram

sequenceDiagram
    participant Container as Container Start
    participant Entrypoint as nemoclaw-entrypoint
    participant Startup as nemoclaw-start
    participant Gateway as Gateway Process
    participant FS as File System
    
    Container->>Entrypoint: Execute as root
    Note over Entrypoint: Resolve sandbox UID/GID
    Entrypoint->>Startup: Execute via setpriv (sandbox context)
    Startup->>Startup: fix_openclaw_config
    Startup->>FS: Create/validate config
    Startup->>Gateway: Start gateway process
    Note over Gateway: Gateway generates auth token
    Entrypoint->>FS: Poll for gateway.auth.token (30s timeout)
    FS-->>Entrypoint: Config with token detected
    Entrypoint->>FS: chown root:root openclaw.json
    Entrypoint->>FS: chmod 444 openclaw.json
    Note over Entrypoint: Config now read-only, owned by root
    Entrypoint->>Entrypoint: Drop to sandbox user
    Entrypoint->>Entrypoint: exec /bin/bash (interactive shell)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A lock upon the gateway's key,
Secured from paws that shouldn't be,
Root takes guard while startup plays,
Then sandbox frolics through the maze—
Safe and snug, the config stays! 🔒

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title directly describes the main change: locking openclaw.json after startup to prevent tampering, which matches the primary objective in issue `#514`.
Linked Issues check	✅ Passed	The PR implements the minimum proposed fix from issue `#514` by locking openclaw.json to root:root with mode 444 after startup, preventing the sandboxed agent from modifying auth tokens and CORS controls.
Out of Scope Changes check	✅ Passed	All changes are directly related to implementing the security fix: entrypoint wrapper for privilege escalation, config file locking mechanism, and guard logic to handle existing permissions.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch worktree-triage-config-security-vuln

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

The gateway generates an auth token asynchronously after startup and overwrites openclaw.json, resetting permissions. Poll for the token (up to 30s) before applying the lock. Also lock the parent directory to root:root 555 so the agent cannot delete and recreate the file. Fixes the race condition found in Test 1.

wscurran · 2026-03-20T21:46:42Z

Thanks for addressing the security concern by locking the openclaw.json file to prevent agent config tampering, this is an important fix for the security of our users' configurations.

The entrypoint runs as USER sandbox, so direct chown root:root fails. Instead: install sudo, create a dedicated lock-gateway-config script, and add a sudoers entry that only permits sandbox to run that one command as root. No other escalation paths are opened.

The lock script now removes its own sudoers entry as its last step. After the config is locked there is no reason for the sandbox user to retain even this narrow sudo access.

Two regressions addressed: 1. Container restarts: fix_openclaw_config now detects a locked (root:root 444) config from a previous run and skips the write. lock_gateway_config also returns early if already locked. 2. Exec path 30s hang: lock_gateway_config accepts a wait_for_token parameter. The exec path passes "false" to skip the poll loop since no gateway is running to generate a token.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/lock-gateway-config.sh`:
- Around line 16-25: Reject symlinks and protect the parent directory before
changing ownership/permissions: add an explicit symlink check for CONFIG (e.g.,
test -L "$CONFIG" or [ -L "$CONFIG" ]) and exit with an error if it's a symlink,
then ensure the parent directory of CONFIG is made immutable/locked or moved to
a secure location (preventing replacement in /sandbox/.openclaw) before running
chown root:root "$CONFIG" and chmod 444 "$CONFIG"; update the script around the
CONFIG existence check and the chown/chmod lines and ensure the sudoers cleanup
(rm -f /etc/sudoers.d/lock-gateway-config) remains but occurs after the file and
parent directory are secured.

In `@scripts/nemoclaw-start.sh`:
- Around line 113-127: The script currently always calls sudo
/usr/local/bin/lock-gateway-config after the wait_for_token loop even if the
token wasn't found; modify the wait_for_token logic to track whether a token was
discovered (e.g., set a token_found flag when the python3 check succeeds) and
then only call sudo /usr/local/bin/lock-gateway-config if token_found is true;
if token_found is false after the loop, skip locking and emit a clear log
message (using echo or logger) mentioning config_path and that the token was not
found so the gateway can still write it.
- Around line 227-229: The script currently calls lock_gateway_config before
executing override commands (when NEMOCLAW_CMD is non-empty), which prevents
those commands from writing gateway config; remove the call to
lock_gateway_config from the branch that handles NEMOCLAW_CMD so the script
simply execs "${NEMOCLAW_CMD[@]}" and lets the override command (or interactive
shell) manage config/token creation; locate the block that checks if [
${`#NEMOCLAW_CMD`[@]} -gt 0 ] and delete the lock_gateway_config false line while
leaving the exec "${NEMOCLAW_CMD[@]}" intact.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0a067647-b684-49e4-962e-1564564c0e67

📥 Commits

Reviewing files that changed from the base of the PR and between ab5048a and b029385.

📒 Files selected for processing (3)

Dockerfile
scripts/lock-gateway-config.sh
scripts/nemoclaw-start.sh

coderabbitai · 2026-03-20T23:10:02Z

+if [ ! -f "$CONFIG" ]; then
+  echo "[lock-gateway-config] config not found: $CONFIG" >&2
+  exit 1
+fi
+
+chown root:root "$CONFIG"
+chmod 444 "$CONFIG"
+
+# Revoke our own sudoers entry — this privilege is single-use.
+rm -f /etc/sudoers.d/lock-gateway-config


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n scripts/lock-gateway-config.sh

Repository: NVIDIA/NemoClaw

Length of output: 1011

🏁 Script executed:

rg -i "symlink|openclaw|sandbox.*lock" --type sh -A 3 -B 3

Repository: NVIDIA/NemoClaw

Length of output: 18130

Reject symlinks before locking the configuration file.

Lines 21-22 follow symlinks when applying chown/chmod, allowing a sandbox user to redirect root onto arbitrary files. Additionally, the writable parent directory /sandbox/.openclaw permits replacing the "locked" config after the script completes. The helper must reject symlinks (using [ -L "$CONFIG" ] or test -L) and should lock or relocate the parent directory before touching the file.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@scripts/lock-gateway-config.sh` around lines 16 - 25, Reject symlinks and protect the parent directory before changing ownership/permissions: add an explicit symlink check for CONFIG (e.g., test -L "$CONFIG" or [ -L "$CONFIG" ]) and exit with an error if it's a symlink, then ensure the parent directory of CONFIG is made immutable/locked or moved to a secure location (preventing replacement in /sandbox/.openclaw) before running chown root:root "$CONFIG" and chmod 444 "$CONFIG"; update the script around the CONFIG existence check and the chown/chmod lines and ensure the sudoers cleanup (rm -f /etc/sudoers.d/lock-gateway-config) remains but occurs after the file and parent directory are secured.

1. Reject symlinks in lock script — prevents agent from pointing openclaw.json at /etc/passwd and tricking root into chowning it. 2. Skip lock if token poll times out — if the gateway takes >30s to write the token, don't lock or the gateway can never write it. Logs a warning instead. 3. Remove lock from exec path — override commands may need to start their own gateway and write the config. The standard path (line 237) still locks after the gateway writes the token.

coderabbitai

🧹 Nitpick comments (1)

scripts/nemoclaw-start.sh (1)
113-113: Minor: Use _ for unused loop variable.

The loop counter i is declared but never referenced in the loop body. Using _ as the variable name is idiomatic for unused loop variables and silences the Shellcheck warning (SC2034).
♻️ Suggested fix
-  for i in $(seq 1 30); do
+  for _ in $(seq 1 30); do
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/nemoclaw-start.sh` at line 113, The loop declares an unused variable
`i` in the for loop starting with "for i in $(seq 1 30); do" — change the loop
variable to `_` (i.e., "for _ in $(seq 1 30); do") so the variable name
communicates it's intentionally unused and silences Shellcheck SC2034; update
any references (there should be none) and run Shellcheck to confirm the warning
is resolved.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@scripts/nemoclaw-start.sh`:
- Line 113: The loop declares an unused variable `i` in the for loop starting
with "for i in $(seq 1 30); do" — change the loop variable to `_` (i.e., "for _
in $(seq 1 30); do") so the variable name communicates it's intentionally unused
and silences Shellcheck SC2034; update any references (there should be none) and
run Shellcheck to confirm the warning is resolved.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f2297b49-fa87-435d-9756-48f9870e60cf

📥 Commits

Reviewing files that changed from the base of the PR and between b029385 and ce48dc5.

📒 Files selected for processing (2)

scripts/lock-gateway-config.sh
scripts/nemoclaw-start.sh

🚧 Files skipped from review as they are similar to previous changes (1)

scripts/lock-gateway-config.sh

OpenShell sets no_new_privs on sandbox containers, which blocks sudo entirely. Replace the sudo-based approach with a root entrypoint wrapper that: 1. Runs nemoclaw-start as sandbox via setpriv (direct syscalls, not setuid bits — works under no_new_privs) 2. Polls for the gateway auth token as root 3. Locks openclaw.json directly (chown/chmod as root, no sudo) 4. Drops to sandbox via setpriv for the interactive shell Removes: sudo dependency, lock-gateway-config.sh, sudoers entry. Adds: nemoclaw-entrypoint.sh (root entrypoint wrapper).

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/nemoclaw-entrypoint.sh`:
- Around line 23-26: The startup currently can "fail open": when as_sandbox
/usr/local/bin/nemoclaw-start "$@" exec's the NEMOCLAW_CMD path the Phase 2
config lock is skipped, and on the returning path the lock timeout only warns
(lines ~46-52), allowing startup without the lock. Fix by ensuring the config
lock is acquired before any exec that never returns (move the Phase 2 lock
acquisition into the Phase 1 flow that runs before calling as_sandbox
/usr/local/bin/nemoclaw-start or require nemoclaw-start to block until the lock
is obtained), and change the timeout handling in the returning path (the code
that prints the warning) to treat timeout as fatal (exit non-zero) so the
container cannot continue without the lock.
- Around line 32-49: The current lock only hardens the CONFIG inode (variable
CONFIG / openclaw.json) but leaves its parent (/sandbox/.openclaw) writable by
sandbox, enabling TOCTOU or unlink attacks; update
scripts/nemoclaw-entrypoint.sh to ensure the parent directory is root-owned and
not writable by sandbox before locking CONFIG: detect the parent dir of $CONFIG,
verify its owner is root and permission bits disallow sandbox write (or chown
the parent to root and set restrictive perms), or fail startup and instruct to
place the config outside the sandbox-writable tree; keep the existing
token_found logic but perform this parent-owner/permission check and remediation
prior to chown root:root and chmod 444 on $CONFIG.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9467874c-56d2-4c89-88ed-93a7cfaf3490

📥 Commits

Reviewing files that changed from the base of the PR and between ce48dc5 and 231510f.

📒 Files selected for processing (3)

Dockerfile
scripts/nemoclaw-entrypoint.sh
scripts/nemoclaw-start.sh

✅ Files skipped from review due to trivial changes (1)

scripts/nemoclaw-start.sh

coderabbitai · 2026-03-20T23:33:37Z

+# ── Phase 1: Setup as sandbox user ──────────────────────────────────────
+# nemoclaw-start handles config, plugins, gateway, auto-pair, etc.
+# In the NEMOCLAW_CMD path it exec's the command and never returns.
+as_sandbox /usr/local/bin/nemoclaw-start "$@"


⚠️ Potential issue | 🔴 Critical

This still has fail-open startup paths.

Line 25 documents that the NEMOCLAW_CMD path never returns, so the Phase 2 lock is skipped there. And on the returning path, Lines 50-51 only warn and continue on timeout. That means the container can still reach a runnable state without ever enforcing the config lock.

Also applies to: 46-52

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@scripts/nemoclaw-entrypoint.sh` around lines 23 - 26, The startup currently can "fail open": when as_sandbox /usr/local/bin/nemoclaw-start "$@" exec's the NEMOCLAW_CMD path the Phase 2 config lock is skipped, and on the returning path the lock timeout only warns (lines ~46-52), allowing startup without the lock. Fix by ensuring the config lock is acquired before any exec that never returns (move the Phase 2 lock acquisition into the Phase 1 flow that runs before calling as_sandbox /usr/local/bin/nemoclaw-start or require nemoclaw-start to block until the lock is obtained), and change the timeout handling in the returning path (the code that prints the warning) to treat timeout as fatal (exit non-zero) so the container cannot continue without the lock.

coderabbitai · 2026-03-20T23:33:37Z

+if [ -f "$CONFIG" ] && [ ! -L "$CONFIG" ] && [ -w "$CONFIG" ]; then
+  token_found=false
+  for _ in $(seq 1 30); do
+    if python3 -c "
+import json, sys
+cfg = json.load(open('$CONFIG'))
+sys.exit(0 if cfg.get('gateway',{}).get('auth',{}).get('token') else 1)
+" 2>/dev/null; then
+      token_found=true
+      break
+    fi
+    sleep 1
+  done
+
+  if [ "$token_found" = "true" ]; then
+    chown root:root "$CONFIG"
+    chmod 444 "$CONFIG"
+    echo "[security] gateway config locked: $CONFIG"


⚠️ Potential issue | 🔴 Critical

A sandbox-owned parent directory defeats this lock.

chown root:root + chmod 444 protects only the current inode. Because /sandbox/.openclaw is still owned by sandbox (Dockerfile Lines 22-24 and 55-56), the sandbox can unlink or swap openclaw.json after startup, and the -L / read / chown / chmod sequence is TOCTOU-prone for the same reason. The config needs to live outside the sandbox-writable tree, or the parent directory needs to become root-owned with only the required subdirectories handed back to sandbox.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@scripts/nemoclaw-entrypoint.sh` around lines 32 - 49, The current lock only hardens the CONFIG inode (variable CONFIG / openclaw.json) but leaves its parent (/sandbox/.openclaw) writable by sandbox, enabling TOCTOU or unlink attacks; update scripts/nemoclaw-entrypoint.sh to ensure the parent directory is root-owned and not writable by sandbox before locking CONFIG: detect the parent dir of $CONFIG, verify its owner is root and permission bits disallow sandbox write (or chown the parent to root and set restrictive perms), or fail startup and instruct to place the config outside the sandbox-writable tree; keep the existing token_found logic but perform this parent-owner/permission check and remediation prior to chown root:root and chmod 444 on $CONFIG.

ericksoa · 2026-03-21T13:37:48Z

Closed — superseded by #555 which uses Landlock filesystem policy + DAC (root:root 444 + sticky bit) instead of sandbox-side privilege escalation. This PR documents the approaches we tried and why they didn't work (sudo blocked by no_new_privs, entrypoint replaced by OpenShell, etc.).

…y with read path (NVIDIA#515)

wscurran added bug Something fails against expected or documented behavior security Potential vulnerability, unsafe behavior, or access risk labels Mar 20, 2026

wscurran added the priority: high label Mar 20, 2026

ericksoa added 3 commits March 20, 2026 14:47

fix: revoke sudoers entry after config lock (single-use privilege)

215c79d

The lock script now removes its own sudoers entry as its last step. After the config is locked there is no reason for the sandbox user to retain even this narrow sudo access.

ericksoa self-assigned this Mar 20, 2026

ericksoa marked this pull request as ready for review March 20, 2026 23:01

coderabbitai Bot reviewed Mar 20, 2026

View reviewed changes

ericksoa marked this pull request as draft March 21, 2026 00:43

ericksoa mentioned this pull request Mar 21, 2026

fix: lock gateway config via Landlock filesystem policy #555

Merged

ericksoa closed this Mar 21, 2026

mafueee pushed a commit to mafueee/NemoClaw that referenced this pull request Mar 28, 2026

refactor(proto): rename UpdateSettings to UpdateConfig for consistenc…

bbcaed2

…y with read path (NVIDIA#515)

wscurran added bug-fix PR fixes a bug or regression and removed priority: high bug Something fails against expected or documented behavior labels Jun 3, 2026

Conversation

ericksoa commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Status: Blocked — needs OpenShell-side support

Path forward

Changes in this branch

Test plan

Uh oh!

coderabbitai Bot commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

wscurran commented Mar 20, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 20, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 20, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 20, 2026

Choose a reason for hiding this comment

Uh oh!

ericksoa commented Mar 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ericksoa commented Mar 20, 2026 •

edited

Loading

coderabbitai Bot commented Mar 20, 2026 •

edited

Loading