Skip to content

fix: lock openclaw.json after startup to prevent agent config tampering#515

Closed
ericksoa wants to merge 7 commits into
mainfrom
worktree-triage-config-security-vuln
Closed

fix: lock openclaw.json after startup to prevent agent config tampering#515
ericksoa wants to merge 7 commits into
mainfrom
worktree-triage-config-security-vuln

Conversation

@ericksoa

@ericksoa ericksoa commented Mar 20, 2026

Copy link
Copy Markdown
Contributor

Summary

Addresses the config tampering vulnerability reported in #514: an AI agent running inside the NemoClaw sandbox can modify openclaw.json to weaken auth tokens and CORS settings, allowing cross-origin websocket connections from any website.

Status: Blocked — needs OpenShell-side support

The lock must be applied with root ownership (root:root 444) after gateway startup. All sandbox-side approaches have been ruled out:

Approach Blocker
Direct chown from startup script Script runs as sandbox user — cannot chown to root
sudo with narrow sudoers entry OpenShell sets no_new_privs on sandbox processes, blocking sudo
Root entrypoint + setpriv drop OpenShell replaces the container ENTRYPOINT with openshell-sandbox

The config lock requires a privilege level that the sandbox process cannot reach by design. This is the correct security posture for OpenShell — the fix needs to come from outside the sandbox boundary.

Path forward

Coordinating with the OpenShell team (cc @drewnewberry) on the right mechanism. Options under consideration:

  1. Pre-sandbox init hook — OpenShell runs a user-defined script as root after container setup but before dropping to the sandbox user. Our lock script runs in this window.
  2. Read-only bind mount — OpenShell mounts openclaw.json as a read-only volume at the orchestrator level, outside the sandbox filesystem policy.
  3. Provider-managed config — Move security-sensitive settings (auth token, CORS origins) into OpenShell's provider model, outside the sandbox entirely. The gateway reads them from the provider rather than from a flat file in the writable zone.

Defense-in-depth hardening (tool policy deny lists, Landlock enforcement, directory locking) tracked in #516.

Changes in this branch

The branch contains our iterative attempts at a sandbox-side fix. These serve as documentation of the constraints but are not mergeable as-is:

  • scripts/nemoclaw-entrypoint.sh — root entrypoint using setpriv (blocked by ENTRYPOINT override)
  • scripts/nemoclaw-start.sh — config write skip for locked files, lock function (removed)
  • DockerfileUSER root at end, entrypoint swap (blocked by ENTRYPOINT override)

Test plan

See test/test-plan-514-config-lock.md — 13 tests covering permissions, attack reproduction, symlink rejection, setpriv under no_new_privs, and sandbox re-creation. Tests pass in plain Docker but fail in OpenShell sandboxes due to the ENTRYPOINT override.

The sandboxed AI agent can use its write/edit/exec tools to modify
openclaw.json, changing the auth token to a trivial value and CORS
origins to ["*"].  This allows any website to connect to the user's
bot via websocket.

After startup, chown the config to root:root and chmod 444.  The
gateway (running as root) retains full access, while the agent
(running as sandbox user) is blocked from writes.

Closes #514
@coderabbitai

coderabbitai Bot commented Mar 20, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

The changes implement a post-startup security lockdown mechanism for the NemoClaw container. A new entrypoint script orchestrates startup, monitors for gateway configuration generation, then transitions the config file to root ownership with read-only permissions before dropping to an interactive sandbox shell. The script also adds an early-exit condition in the config setup utility when an existing config file is non-writable.

Changes

Cohort / File(s) Summary
Container Entrypoint Configuration
Dockerfile
Modified entrypoint from direct bash invocation to a new orchestration script. Changed execution context from sandbox user to root at entry, where the entrypoint performs privilege-managed startup, post-gateway security hardening, and sandbox user context drop.
Config Setup Logic
scripts/nemoclaw-start.sh
Added early termination in fix_openclaw_config when an existing config file lacks write permissions, preventing unconditional overwrite attempts.
Entrypoint Orchestration
scripts/nemoclaw-entrypoint.sh
New executable script that runs startup as the sandbox user via setpriv, monitors gateway config generation with up to 30-second token polling, then locks the config file to read-only root ownership before dropping into an interactive shell as sandbox.

Sequence Diagram

sequenceDiagram
    participant Container as Container Start
    participant Entrypoint as nemoclaw-entrypoint
    participant Startup as nemoclaw-start
    participant Gateway as Gateway Process
    participant FS as File System
    
    Container->>Entrypoint: Execute as root
    Note over Entrypoint: Resolve sandbox UID/GID
    Entrypoint->>Startup: Execute via setpriv (sandbox context)
    Startup->>Startup: fix_openclaw_config
    Startup->>FS: Create/validate config
    Startup->>Gateway: Start gateway process
    Note over Gateway: Gateway generates auth token
    Entrypoint->>FS: Poll for gateway.auth.token (30s timeout)
    FS-->>Entrypoint: Config with token detected
    Entrypoint->>FS: chown root:root openclaw.json
    Entrypoint->>FS: chmod 444 openclaw.json
    Note over Entrypoint: Config now read-only, owned by root
    Entrypoint->>Entrypoint: Drop to sandbox user
    Entrypoint->>Entrypoint: exec /bin/bash (interactive shell)
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A lock upon the gateway's key,
Secured from paws that shouldn't be,
Root takes guard while startup plays,
Then sandbox frolics through the maze—
Safe and snug, the config stays! 🔒

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title directly describes the main change: locking openclaw.json after startup to prevent tampering, which matches the primary objective in issue #514.
Linked Issues check ✅ Passed The PR implements the minimum proposed fix from issue #514 by locking openclaw.json to root:root with mode 444 after startup, preventing the sandboxed agent from modifying auth tokens and CORS controls.
Out of Scope Changes check ✅ Passed All changes are directly related to implementing the security fix: entrypoint wrapper for privilege escalation, config file locking mechanism, and guard logic to handle existing permissions.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch worktree-triage-config-security-vuln

Comment @coderabbitai help to get the list of available commands and usage tips.

The gateway generates an auth token asynchronously after startup and
overwrites openclaw.json, resetting permissions.  Poll for the token
(up to 30s) before applying the lock.  Also lock the parent directory
to root:root 555 so the agent cannot delete and recreate the file.

Fixes the race condition found in Test 1.
@wscurran wscurran added bug Something fails against expected or documented behavior security Potential vulnerability, unsafe behavior, or access risk labels Mar 20, 2026
@wscurran

Copy link
Copy Markdown
Contributor

Thanks for addressing the security concern by locking the openclaw.json file to prevent agent config tampering, this is an important fix for the security of our users' configurations.

The entrypoint runs as USER sandbox, so direct chown root:root fails.
Instead: install sudo, create a dedicated lock-gateway-config script,
and add a sudoers entry that only permits sandbox to run that one
command as root.  No other escalation paths are opened.
The lock script now removes its own sudoers entry as its last step.
After the config is locked there is no reason for the sandbox user
to retain even this narrow sudo access.
Two regressions addressed:

1. Container restarts: fix_openclaw_config now detects a locked
   (root:root 444) config from a previous run and skips the write.
   lock_gateway_config also returns early if already locked.

2. Exec path 30s hang: lock_gateway_config accepts a wait_for_token
   parameter.  The exec path passes "false" to skip the poll loop
   since no gateway is running to generate a token.
@ericksoa ericksoa self-assigned this Mar 20, 2026
@ericksoa ericksoa marked this pull request as ready for review March 20, 2026 23:01

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/lock-gateway-config.sh`:
- Around line 16-25: Reject symlinks and protect the parent directory before
changing ownership/permissions: add an explicit symlink check for CONFIG (e.g.,
test -L "$CONFIG" or [ -L "$CONFIG" ]) and exit with an error if it's a symlink,
then ensure the parent directory of CONFIG is made immutable/locked or moved to
a secure location (preventing replacement in /sandbox/.openclaw) before running
chown root:root "$CONFIG" and chmod 444 "$CONFIG"; update the script around the
CONFIG existence check and the chown/chmod lines and ensure the sudoers cleanup
(rm -f /etc/sudoers.d/lock-gateway-config) remains but occurs after the file and
parent directory are secured.

In `@scripts/nemoclaw-start.sh`:
- Around line 113-127: The script currently always calls sudo
/usr/local/bin/lock-gateway-config after the wait_for_token loop even if the
token wasn't found; modify the wait_for_token logic to track whether a token was
discovered (e.g., set a token_found flag when the python3 check succeeds) and
then only call sudo /usr/local/bin/lock-gateway-config if token_found is true;
if token_found is false after the loop, skip locking and emit a clear log
message (using echo or logger) mentioning config_path and that the token was not
found so the gateway can still write it.
- Around line 227-229: The script currently calls lock_gateway_config before
executing override commands (when NEMOCLAW_CMD is non-empty), which prevents
those commands from writing gateway config; remove the call to
lock_gateway_config from the branch that handles NEMOCLAW_CMD so the script
simply execs "${NEMOCLAW_CMD[@]}" and lets the override command (or interactive
shell) manage config/token creation; locate the block that checks if [
${`#NEMOCLAW_CMD`[@]} -gt 0 ] and delete the lock_gateway_config false line while
leaving the exec "${NEMOCLAW_CMD[@]}" intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0a067647-b684-49e4-962e-1564564c0e67

📥 Commits

Reviewing files that changed from the base of the PR and between ab5048a and b029385.

📒 Files selected for processing (3)
  • Dockerfile
  • scripts/lock-gateway-config.sh
  • scripts/nemoclaw-start.sh

Comment thread scripts/lock-gateway-config.sh Outdated
Comment on lines +16 to +25
if [ ! -f "$CONFIG" ]; then
echo "[lock-gateway-config] config not found: $CONFIG" >&2
exit 1
fi

chown root:root "$CONFIG"
chmod 444 "$CONFIG"

# Revoke our own sudoers entry — this privilege is single-use.
rm -f /etc/sudoers.d/lock-gateway-config

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n scripts/lock-gateway-config.sh

Repository: NVIDIA/NemoClaw

Length of output: 1011


🏁 Script executed:

rg -i "symlink|openclaw|sandbox.*lock" --type sh -A 3 -B 3

Repository: NVIDIA/NemoClaw

Length of output: 18130


Reject symlinks before locking the configuration file.

Lines 21-22 follow symlinks when applying chown/chmod, allowing a sandbox user to redirect root onto arbitrary files. Additionally, the writable parent directory /sandbox/.openclaw permits replacing the "locked" config after the script completes. The helper must reject symlinks (using [ -L "$CONFIG" ] or test -L) and should lock or relocate the parent directory before touching the file.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/lock-gateway-config.sh` around lines 16 - 25, Reject symlinks and
protect the parent directory before changing ownership/permissions: add an
explicit symlink check for CONFIG (e.g., test -L "$CONFIG" or [ -L "$CONFIG" ])
and exit with an error if it's a symlink, then ensure the parent directory of
CONFIG is made immutable/locked or moved to a secure location (preventing
replacement in /sandbox/.openclaw) before running chown root:root "$CONFIG" and
chmod 444 "$CONFIG"; update the script around the CONFIG existence check and the
chown/chmod lines and ensure the sudoers cleanup (rm -f
/etc/sudoers.d/lock-gateway-config) remains but occurs after the file and parent
directory are secured.

Comment thread scripts/nemoclaw-start.sh Outdated
Comment thread scripts/nemoclaw-start.sh
1. Reject symlinks in lock script — prevents agent from pointing
   openclaw.json at /etc/passwd and tricking root into chowning it.

2. Skip lock if token poll times out — if the gateway takes >30s to
   write the token, don't lock or the gateway can never write it.
   Logs a warning instead.

3. Remove lock from exec path — override commands may need to start
   their own gateway and write the config. The standard path (line
   237) still locks after the gateway writes the token.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
scripts/nemoclaw-start.sh (1)

113-113: Minor: Use _ for unused loop variable.

The loop counter i is declared but never referenced in the loop body. Using _ as the variable name is idiomatic for unused loop variables and silences the Shellcheck warning (SC2034).

♻️ Suggested fix
-  for i in $(seq 1 30); do
+  for _ in $(seq 1 30); do
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/nemoclaw-start.sh` at line 113, The loop declares an unused variable
`i` in the for loop starting with "for i in $(seq 1 30); do" — change the loop
variable to `_` (i.e., "for _ in $(seq 1 30); do") so the variable name
communicates it's intentionally unused and silences Shellcheck SC2034; update
any references (there should be none) and run Shellcheck to confirm the warning
is resolved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@scripts/nemoclaw-start.sh`:
- Line 113: The loop declares an unused variable `i` in the for loop starting
with "for i in $(seq 1 30); do" — change the loop variable to `_` (i.e., "for _
in $(seq 1 30); do") so the variable name communicates it's intentionally unused
and silences Shellcheck SC2034; update any references (there should be none) and
run Shellcheck to confirm the warning is resolved.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f2297b49-fa87-435d-9756-48f9870e60cf

📥 Commits

Reviewing files that changed from the base of the PR and between b029385 and ce48dc5.

📒 Files selected for processing (2)
  • scripts/lock-gateway-config.sh
  • scripts/nemoclaw-start.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • scripts/lock-gateway-config.sh

OpenShell sets no_new_privs on sandbox containers, which blocks sudo
entirely.  Replace the sudo-based approach with a root entrypoint
wrapper that:

1. Runs nemoclaw-start as sandbox via setpriv (direct syscalls,
   not setuid bits — works under no_new_privs)
2. Polls for the gateway auth token as root
3. Locks openclaw.json directly (chown/chmod as root, no sudo)
4. Drops to sandbox via setpriv for the interactive shell

Removes: sudo dependency, lock-gateway-config.sh, sudoers entry.
Adds: nemoclaw-entrypoint.sh (root entrypoint wrapper).

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/nemoclaw-entrypoint.sh`:
- Around line 23-26: The startup currently can "fail open": when as_sandbox
/usr/local/bin/nemoclaw-start "$@" exec's the NEMOCLAW_CMD path the Phase 2
config lock is skipped, and on the returning path the lock timeout only warns
(lines ~46-52), allowing startup without the lock. Fix by ensuring the config
lock is acquired before any exec that never returns (move the Phase 2 lock
acquisition into the Phase 1 flow that runs before calling as_sandbox
/usr/local/bin/nemoclaw-start or require nemoclaw-start to block until the lock
is obtained), and change the timeout handling in the returning path (the code
that prints the warning) to treat timeout as fatal (exit non-zero) so the
container cannot continue without the lock.
- Around line 32-49: The current lock only hardens the CONFIG inode (variable
CONFIG / openclaw.json) but leaves its parent (/sandbox/.openclaw) writable by
sandbox, enabling TOCTOU or unlink attacks; update
scripts/nemoclaw-entrypoint.sh to ensure the parent directory is root-owned and
not writable by sandbox before locking CONFIG: detect the parent dir of $CONFIG,
verify its owner is root and permission bits disallow sandbox write (or chown
the parent to root and set restrictive perms), or fail startup and instruct to
place the config outside the sandbox-writable tree; keep the existing
token_found logic but perform this parent-owner/permission check and remediation
prior to chown root:root and chmod 444 on $CONFIG.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9467874c-56d2-4c89-88ed-93a7cfaf3490

📥 Commits

Reviewing files that changed from the base of the PR and between ce48dc5 and 231510f.

📒 Files selected for processing (3)
  • Dockerfile
  • scripts/nemoclaw-entrypoint.sh
  • scripts/nemoclaw-start.sh
✅ Files skipped from review due to trivial changes (1)
  • scripts/nemoclaw-start.sh

Comment on lines +23 to +26
# ── Phase 1: Setup as sandbox user ──────────────────────────────────────
# nemoclaw-start handles config, plugins, gateway, auto-pair, etc.
# In the NEMOCLAW_CMD path it exec's the command and never returns.
as_sandbox /usr/local/bin/nemoclaw-start "$@"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

This still has fail-open startup paths.

Line 25 documents that the NEMOCLAW_CMD path never returns, so the Phase 2 lock is skipped there. And on the returning path, Lines 50-51 only warn and continue on timeout. That means the container can still reach a runnable state without ever enforcing the config lock.

Also applies to: 46-52

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/nemoclaw-entrypoint.sh` around lines 23 - 26, The startup currently
can "fail open": when as_sandbox /usr/local/bin/nemoclaw-start "$@" exec's the
NEMOCLAW_CMD path the Phase 2 config lock is skipped, and on the returning path
the lock timeout only warns (lines ~46-52), allowing startup without the lock.
Fix by ensuring the config lock is acquired before any exec that never returns
(move the Phase 2 lock acquisition into the Phase 1 flow that runs before
calling as_sandbox /usr/local/bin/nemoclaw-start or require nemoclaw-start to
block until the lock is obtained), and change the timeout handling in the
returning path (the code that prints the warning) to treat timeout as fatal
(exit non-zero) so the container cannot continue without the lock.

Comment on lines +32 to +49
if [ -f "$CONFIG" ] && [ ! -L "$CONFIG" ] && [ -w "$CONFIG" ]; then
token_found=false
for _ in $(seq 1 30); do
if python3 -c "
import json, sys
cfg = json.load(open('$CONFIG'))
sys.exit(0 if cfg.get('gateway',{}).get('auth',{}).get('token') else 1)
" 2>/dev/null; then
token_found=true
break
fi
sleep 1
done

if [ "$token_found" = "true" ]; then
chown root:root "$CONFIG"
chmod 444 "$CONFIG"
echo "[security] gateway config locked: $CONFIG"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

A sandbox-owned parent directory defeats this lock.

chown root:root + chmod 444 protects only the current inode. Because /sandbox/.openclaw is still owned by sandbox (Dockerfile Lines 22-24 and 55-56), the sandbox can unlink or swap openclaw.json after startup, and the -L / read / chown / chmod sequence is TOCTOU-prone for the same reason. The config needs to live outside the sandbox-writable tree, or the parent directory needs to become root-owned with only the required subdirectories handed back to sandbox.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/nemoclaw-entrypoint.sh` around lines 32 - 49, The current lock only
hardens the CONFIG inode (variable CONFIG / openclaw.json) but leaves its parent
(/sandbox/.openclaw) writable by sandbox, enabling TOCTOU or unlink attacks;
update scripts/nemoclaw-entrypoint.sh to ensure the parent directory is
root-owned and not writable by sandbox before locking CONFIG: detect the parent
dir of $CONFIG, verify its owner is root and permission bits disallow sandbox
write (or chown the parent to root and set restrictive perms), or fail startup
and instruct to place the config outside the sandbox-writable tree; keep the
existing token_found logic but perform this parent-owner/permission check and
remediation prior to chown root:root and chmod 444 on $CONFIG.

@ericksoa

Copy link
Copy Markdown
Contributor Author

Closed — superseded by #555 which uses Landlock filesystem policy + DAC (root:root 444 + sticky bit) instead of sandbox-side privilege escalation. This PR documents the approaches we tried and why they didn't work (sudo blocked by no_new_privs, entrypoint replaced by OpenShell, etc.).

@ericksoa ericksoa closed this Mar 21, 2026
mafueee pushed a commit to mafueee/NemoClaw that referenced this pull request Mar 28, 2026
@wscurran wscurran added bug-fix PR fixes a bug or regression and removed priority: high bug Something fails against expected or documented behavior labels Jun 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug-fix PR fixes a bug or regression security Potential vulnerability, unsafe behavior, or access risk

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants