
feat(security): externalize gateway auth token from openclaw.json#2378

Merged
ericksoa merged 14 commits into main from feat/externalize-gateway-token
Apr 24, 2026

ericksoa commented on Apr 23, 2026

Contributor

Summary

Move the gateway auth token out of openclaw.json (readable by the sandbox agent) and generate it at container startup instead. The token is now written to a runtime file and passed via OPENCLAW_GATEWAY_TOKEN env var only to the gateway process.

Root mode (full privilege separation): The token is written to /run/nemoclaw/gateway-token, owned by gateway:gateway with mode 0400. The sandbox user cannot access it: the file is owned by a different uid, the env var exists only in the gateway process (/proc/pid/environ is uid-gated), and no-new-privileges prevents escalation.

Non-root mode (privilege separation disabled): The token is written to /tmp/.runtime/nemoclaw/gateway-token owned by sandbox:sandbox. Because privilege separation is not available in this mode, the sandbox user can read the token file directly. This is a known limitation consistent with the existing non-root security posture — other trust-boundary files in /tmp have the same caveat (see sandbox-init.sh trust boundary map). The improvement in non-root mode is that the token no longer appears in openclaw.json, shell rc files, or the process environment.

Changes

  • Dockerfile: Build openclaw.json with an empty auth token placeholder; remove secrets import
  • scripts/nemoclaw-start.sh: Replace export_gateway_token() (which wrote tokens into .bashrc/.profile) with generate_gateway_token(), which writes to /run/nemoclaw/gateway-token and passes the token via env var only on the gateway process launch line
  • src/lib/onboard.ts: fetchGatewayAuthTokenFromSandbox() now reads from /run/nemoclaw/gateway-token first, falls back to openclaw.json for pre-externalization images; update dashboard token guidance
  • src/nemoclaw.ts: Remove MIN_LOGS_OPENSHELL_VERSION gating and printOldLogsCompatibilityGuidance(); simplify sandboxLogs()
  • docs/security/best-practices.md: Document the externalized token architecture and update risk descriptions
  • spark-install.md: Update token location guidance
  • test/cli.test.ts, test/nemoclaw-start.test.ts: Update tests for new token flow

Also includes unrelated changes already merged to main:

  • ci(nightly-e2e): replace per-job failure check with wildcard contains()
  • refactor(e2e): centralize timeout/gtimeout handling into e2e-timeout.sh
  • fix: stream sandbox logs via tail
  • chore(deps): bump setup-uv, setup-qemu-action
  • fix(e2e): make nightly teardown reliable
  • fix(dockerfile): drop invalid channels.defaults.configWrites

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates

Verification

  • npx prek run --all-files passes
  • npm test passes (branch-modified tests: 127/127 pass)
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes

AI Disclosure

  • AI-assisted — tool: Claude Code

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

The gateway auth token was stored in openclaw.json (chmod 444, readable
by the sandbox user/agent). This moves it out so the agent can never
read or write the token:

- Build time: gateway.auth.token set to empty string in openclaw.json
- Container startup: entrypoint generates a random token via
  secrets.token_hex(32), writes to /run/nemoclaw/gateway-token
  (gateway:gateway 0400), and passes OPENCLAW_GATEWAY_TOKEN env var
  only on the gateway process launch line
- OpenClaw natively reads OPENCLAW_GATEWAY_TOKEN via its
  resolveGatewayCredentialsFromValues() path — no dist patch needed
- The sandbox user cannot read the token file (wrong uid) or the
  gateway process env (/proc/pid/environ is uid-gated), and
  no-new-privileges blocks escalation to the gateway user
- Host-side token fetch (onboard.ts) downloads the token file via
  openshell sandbox download (runs as root), with fallback to
  openclaw.json for pre-externalization images
- export_gateway_token removed — token no longer exported to sandbox
  user's .bashrc/.profile

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
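
The startup flow in the commit message above can be sketched roughly as follows. This is an illustrative Python sketch, not the actual entrypoint (which is a shell script). The helper names `generate_gateway_token` and `gateway_env` are hypothetical, and the `chown gateway:gateway` step is left out because it requires root.

```python
import os
import secrets
from typing import Dict, Mapping


def generate_gateway_token(token_path: str) -> str:
    """Generate a random token and store it in an owner-read-only file.

    Hypothetical helper: mirrors the write-then-chmod order described in the PR.
    """
    token = secrets.token_hex(32)  # 32 random bytes, rendered as 64 hex characters
    os.makedirs(os.path.dirname(token_path), exist_ok=True)
    with open(token_path, "w", encoding="utf-8") as fh:
        fh.write(token)
    os.chmod(token_path, 0o400)  # owner read-only
    # The real entrypoint also changes ownership to gateway:gateway (needs root).
    return token


def gateway_env(base_env: Mapping[str, str], token: str) -> Dict[str, str]:
    """Return a copy of base_env that carries the token, leaving base_env untouched."""
    env = dict(base_env)
    env["OPENCLAW_GATEWAY_TOKEN"] = token
    return env
```

A dictionary built this way could be passed as the `env` argument of `subprocess.Popen` for the gateway child only, so the token never enters the parent's own environment.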

coderabbitai Bot commented Apr 23, 2026

Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Gateway auth token creation moved from build/config to container runtime: token is generated at startup, written to /run/nemoclaw/gateway-token with restrictive ownership/permissions, and supplied only to the gateway process via OPENCLAW_GATEWAY_TOKEN; JSON-embedded tokens and global exports removed.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Startup & Build: Dockerfile, scripts/nemoclaw-start.sh | Removed build-time token embedding in openclaw.json. Added generate_gateway_token() (root-only) that writes /run/nemoclaw/gateway-token (owned gateway:gateway, 0400). Replaced previous export model; _read_gateway_token() reads GATEWAY_TOKEN_FILE. Gateway receives token only in its command environment; rc-file export/persist removed. |
| Client / Onboarding: src/lib/onboard.ts, spark-install.md | fetchGatewayAuthTokenFromSandbox() now prefers /run/nemoclaw/gateway-token (download into host temp) and falls back to openclaw.json for older images. Onboarding/help text updated to reference new token location and access constraints. |
| Docs & Security Guidance: .agents/skills/nemoclaw-user-configure-security/references/best-practices.md, docs/security/best-practices.md | Docs updated to state tokens are not stored in /sandbox/.openclaw; token is generated at startup and persisted to /run/nemoclaw/gateway-token with restrictive perms. Threat-model text explains uid separation and no-new-privileges; removed claim that writable .openclaw lets users change tokens. |
| Tests: test/nemoclaw-start.test.ts | Tests refactored from rc-snippet/export expectations to externalized-token flow: assert GATEWAY_TOKEN_DIR=/run/nemoclaw, GATEWAY_TOKEN_FILE=/run/nemoclaw/gateway-token, presence/behavior of generate_gateway_token() and _read_gateway_token(), file perms/ownership, that OPENCLAW_GATEWAY_TOKEN is injected only on the gateway command line, and that export_gateway_token() is absent. |

Sequence Diagram(s)

sequenceDiagram
    participant Container as Container Startup
    participant Entrypoint as Entrypoint / Root
    participant Tmpfs as /run/nemoclaw (tmpfs)
    participant Gateway as Gateway Process
    participant Sandbox as Sandbox User
    participant Host as Host Operator

    Container->>Entrypoint: start nemoclaw-start.sh
    Entrypoint->>Entrypoint: generate_gateway_token()
    Entrypoint->>Tmpfs: write gateway-token (0400, gateway:gateway)
    Entrypoint->>Gateway: exec gateway with OPENCLAW_GATEWAY_TOKEN=<token>
    Gateway->>Gateway: initialize using token
    Sandbox->>Tmpfs: access denied (uid separation / no-new-privileges)
    Host->>Tmpfs: host may read mounted file for management

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I snuffled through startup, not a clue in JSON hay,
At dawn a token blossoms where tmpfs keeps it safe,
Only the gateway peeks — no global crumbs to sway,
Owners and perms guard the burrow's little waif,
Hop, hop, secure — a rabbit's tidy, secret waft. 🥕🔐

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately and specifically describes the main security-focused change: moving gateway authentication token generation and storage out of openclaw.json to an externalized file. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
spark-install.md (1)

1-1: ⚠️ Potential issue | 🟠 Major

Add the required SPDX header to this Markdown file.

This file is missing the mandatory SPDX copyright/license header comments.

Proposed fix
+<!-- SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. -->
+<!-- SPDX-License-Identifier: Apache-2.0 -->
 # NemoClaw on DGX Spark

As per coding guidelines "**/*.{js,mjs,ts,tsx,sh,md}: Every source file must include an SPDX license header ... Use HTML comments for Markdown."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spark-install.md` at line 1, Add the required SPDX license header as an HTML
comment at the top of spark-install.md (above the existing "# NemoClaw on DGX
Spark" title); include both the SPDX copyright/owner line and the
SPDX-License-Identifier line in that comment per project guidelines so the file
complies with the "`**/*.{js,mjs,ts,tsx,sh,md}`" rule.
Dockerfile (1)

185-187: ⚠️ Potential issue | 🟡 Minor

Update stale Build ID comment to match new token flow.

The comment still says build ID is for a fresh auth token, but token generation is now done at container startup.

Proposed fix
-# Unique per build to ensure each image gets a fresh auth token.
-# Pass --build-arg NEMOCLAW_BUILD_ID=$(date +%s) to bust the cache.
+# Unique per build to bust Docker cache for config materialization layers.
+# Pass --build-arg NEMOCLAW_BUILD_ID=$(date +%s) to bust the cache.
+# Gateway auth token is generated at container startup by the entrypoint.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile` around lines 185 - 187, The comment above ARG NEMOCLAW_BUILD_ID
is stale: it claims the build ID forces a fresh auth token at build time, but
token generation now happens at container startup; update the comment near ARG
NEMOCLAW_BUILD_ID to describe its current purpose (e.g., cache-busting/unique
per build meta only) and note that auth tokens are generated at runtime on
container startup rather than during image build.
🧹 Nitpick comments (2)
docs/security/best-practices.md (1)

232-236: Sentences should be on separate lines for readable diffs.

Per the documentation style guide, each sentence should be on its own line in the source. Lines 233-235 contain multiple sentences on the same line.

📝 Suggested reformat
 The `/sandbox/.openclaw` directory contains the OpenClaw gateway configuration (model routing, CORS settings, channel config).
-The gateway auth token is **not** stored in this directory — it is generated at container startup and passed via the `OPENCLAW_GATEWAY_TOKEN` environment variable only to the gateway process (which runs as the `gateway` user).
-The token is also persisted to `/run/nemoclaw/gateway-token` (`gateway:gateway 0400`) for host-side reads.
-The sandbox user (agent) cannot access the token: the file is owned by a different uid, the env var is only in the gateway process (`/proc/pid/environ` is uid-gated), and `no-new-privileges` prevents escalation.
+The gateway auth token is **not** stored in this directory.
+It is generated at container startup and passed via the `OPENCLAW_GATEWAY_TOKEN` environment variable only to the gateway process (which runs as the `gateway` user).
+The token is also persisted to `/run/nemoclaw/gateway-token` (`gateway:gateway 0400`) for host-side reads.
+The sandbox user (agent) cannot access the token: the file is owned by a different uid, the env var is only in the gateway process (`/proc/pid/environ` is uid-gated), and `no-new-privileges` prevents escalation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/security/best-practices.md` around lines 232 - 236, Split the
multi-sentence lines into one sentence per line in the paragraph that describes
the OpenClaw gateway token handling: break the combined lines that mention
"/sandbox/.openclaw", the fact the gateway auth token is generated at container
startup and passed via the OPENCLAW_GATEWAY_TOKEN env var to the gateway
process, the persistence to "/run/nemoclaw/gateway-token" with its
ownership/permissions, and the sandbox user/agent access restrictions (uid
gating of /proc/pid/environ and no-new-privileges) so each sentence appears on
its own source line for readable diffs.
scripts/nemoclaw-start.sh (1)

495-499: Consider atomic file creation with restrictive permissions.

There's a brief TOCTOU window between file creation (line 496, created with umask, typically 644) and permission restriction (line 499, chmod 400). During this window, the token file is world-readable.

While the practical risk is low (runs at startup before sandbox processes exist, and /run is tmpfs), atomic creation would eliminate this window entirely.

🛡️ Suggested hardening (optional)
-  python3 -c "import secrets; print(secrets.token_hex(32), end='')" \
-    >"$GATEWAY_TOKEN_FILE"
-
-  chown gateway:gateway "$GATEWAY_TOKEN_FILE"
-  chmod 400 "$GATEWAY_TOKEN_FILE"
+  # Create token file atomically with restrictive permissions
+  _token_tmp="$(mktemp -p "$GATEWAY_TOKEN_DIR")"
+  chmod 400 "$_token_tmp"
+  chown gateway:gateway "$_token_tmp"
+  python3 -c "import secrets; print(secrets.token_hex(32), end='')" >"$_token_tmp"
+  mv "$_token_tmp" "$GATEWAY_TOKEN_FILE"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/nemoclaw-start.sh` around lines 495 - 499, Replace the two-step token
write + chmod with atomic creation using the python3 writer so the file is
created with restrictive permissions from the start: modify the current python3
token generation (the one that writes to GATEWAY_TOKEN_FILE) to open/create the
file atomically with mode 0o400 (or use os.open with O_CREAT|O_EXCL and mode
0o400) and write the token, then run chown gateway:gateway on GATEWAY_TOKEN_FILE
if needed; this removes the TOCTOU window between the write and the chmod while
keeping the existing chown/chmod intent.
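
The atomic-creation idea in the prompt above could look roughly like this. It is a minimal Python sketch under stated assumptions: `write_token_atomically` is a hypothetical name, the caller is assumed to handle ownership (`chown`) separately, and the process umask is assumed not to strip the owner-read bit.

```python
import os
import secrets


def write_token_atomically(token_path: str) -> str:
    """Create the token file with restrictive permissions from the first instant.

    O_CREAT | O_EXCL fails if the path already exists, and the mode argument is
    applied at creation time, so there is no window in which the file is
    readable by group or others.
    """
    token = secrets.token_hex(32)
    fd = os.open(token_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o400)
    try:
        os.write(fd, token.encode("ascii"))
    finally:
        os.close(fd)
    return token
```

One design consequence of `O_EXCL` is that a stale file from a previous run makes the call raise `FileExistsError`, so a caller would need to remove or rotate it explicitly rather than silently overwrite it.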
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 6109-6123: The code checks only tmpDir/gateway-token after running
runOpenshell (sandbox download) and misses cases where files are nested; update
the token retrieval (the block using tokenFileResult, runOpenshell, tmpDir, and
path.join("gateway-token")) to search recursively under tmpDir for a file named
"gateway-token" (or the path run/nemoclaw/gateway-token) instead of only
checking tmpDir/gateway-token; if found, read and trim that file as before and
return the token, otherwise continue to the fallback branch. Ensure you reuse
the existing tokenFileResult variable and preserve ignoreError/stdio behavior
when calling runOpenshell.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 74fe2cba-eb7e-4332-a423-b7c0f7be5bbb

📥 Commits

Reviewing files that changed from the base of the PR and between 02dff1e and 6ea9871.

📒 Files selected for processing (7)
  • .agents/skills/nemoclaw-user-configure-security/references/best-practices.md
  • Dockerfile
  • docs/security/best-practices.md
  • scripts/nemoclaw-start.sh
  • spark-install.md
  • src/lib/onboard.ts
  • test/nemoclaw-start.test.ts

Comment thread src/lib/onboard.ts Outdated
ericksoa self-assigned this on Apr 23, 2026
ericksoa and others added 2 commits April 23, 2026 13:06
openshell sandbox download may preserve the directory structure
(run/nemoclaw/gateway-token) rather than flattening to the temp dir
root. Use recursive search to find the token file regardless of
nesting, matching the pattern already used for openclaw.json.

Also update stale Dockerfile comment about NEMOCLAW_BUILD_ID to
reflect that auth tokens are now generated at container startup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
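
The recursive lookup described in this commit can be illustrated with a short Python sketch. The actual change lives in TypeScript in src/lib/onboard.ts and is not reproduced here; `find_token_file` is a hypothetical name, and the example only assumes that the download directory may or may not preserve the `run/nemoclaw/` nesting.

```python
from pathlib import Path
from typing import Optional


def find_token_file(download_dir: str, name: str = "gateway-token") -> Optional[str]:
    """Return the trimmed contents of the first non-empty file called `name`.

    rglob matches the file at the top level of download_dir as well as at any
    depth below it, so both flattened and nested download layouts are covered.
    """
    for candidate in sorted(Path(download_dir).rglob(name)):
        if candidate.is_file():
            token = candidate.read_text(encoding="utf-8").strip()
            if token:
                return token
    return None
```

Returning `None` rather than raising lets the caller continue to its fallback branch when no token file was downloaded.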
@ericksoa

Contributor Author

Addressing the remaining CodeRabbit review items:

Outside-diff: spark-install.md missing SPDX header — Pre-existing; this file wasn't created by this PR. The pre-commit hook auto-inserts SPDX headers, so it'll get fixed next time someone touches it. Not addressing here to keep the diff focused.

Outside-diff: Dockerfile:185-187 stale build ID comment — Fixed in 2f56b07.

Nitpick: docs/security/best-practices.md:232-236 one-sentence-per-line — Fair point on diff readability. Will leave as-is for now since it reads fine and the doc style guide is a soft recommendation, but happy to split if a reviewer prefers it.

Nitpick: scripts/nemoclaw-start.sh:495-499 TOCTOU window — Acknowledged. The practical risk is zero: this runs as root at container startup on tmpfs before any sandbox processes exist, and the gateway user doesn't exist as a running process yet. The suggested mktemp+chmod+chown+mv pattern adds complexity for no real security gain here. Leaving as-is.

coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 6322-6325: The console fallback guidance incorrectly instructs
users to check `/tmp/gateway.log` for the gateway token; update the message(s)
that contain the string `append #token=<token> to the URL, or see
/tmp/gateway.log inside the sandbox.` so that for root mode it references the
actual token location `/run/nemoclaw/gateway-token` (privileged host-side read),
and for non-root mode remove the `/tmp/gateway.log` hint and instead instruct
users to "re-run onboard" (no externalized token available). Locate and change
the console.log/print statements matching the shown string to implement this
conditional messaging (or simply replace the misleading path with
`/run/nemoclaw/gateway-token` and/or the "re-run onboard" fallback as
appropriate).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9f6fa625-71f0-4606-a72d-ee2dde875ae8

📥 Commits

Reviewing files that changed from the base of the PR and between 6ea9871 and 2f56b07.

📒 Files selected for processing (2)
  • Dockerfile
  • src/lib/onboard.ts

Comment thread src/lib/onboard.ts
@ericksoa

Contributor Author

Upgrade compatibility notes for reviewers

Scenario 1: New CLI + Old sandbox image (most common upgrade path)
Works seamlessly. fetchGatewayAuthTokenFromSandbox() tries /run/nemoclaw/gateway-token first (won't exist in old images), then falls back to reading gateway.auth.token from openclaw.json. The old entrypoint is baked into the old image so it continues running the old export_gateway_token() flow. No user action needed.

Scenario 2: Old CLI + New sandbox image (user rebuilds sandbox but hasn't updated CLI)
The gateway and agent work normally — generate_gateway_token() creates the token at startup and passes it to the gateway process via env var. However, the old CLI only knows how to read the token from openclaw.json, which now has an empty placeholder. This means the old CLI won't display the token in onboard output or dashboard URLs.

The sandbox is fully functional; the user just can't easily retrieve the token for the Control UI. Mitigation: upgrade the CLI (npm install -g nemoclaw). This is an unusual edge case since nemoclaw onboard (which triggers the rebuild) ships with the CLI, so both normally update together. Worth a release note callout.
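
The two scenarios above come down to a lookup order: externalized token file first, then the legacy `gateway.auth.token` field. A minimal Python sketch of that order follows. It is illustrative only (the real logic is in fetchGatewayAuthTokenFromSandbox() in TypeScript), and `resolve_gateway_token` and its two path parameters are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def _read_text(path: Path) -> Optional[str]:
    """Read a file, treating a missing or unreadable file as absent."""
    try:
        return path.read_text(encoding="utf-8")
    except OSError:
        return None


def resolve_gateway_token(token_file: Path, config_file: Path) -> Optional[str]:
    """Prefer the externalized token file; fall back to openclaw.json."""
    raw = _read_text(token_file)
    if raw and raw.strip():
        return raw.strip()

    raw_cfg = _read_text(config_file)
    if raw_cfg is None:
        return None
    try:
        cfg = json.loads(raw_cfg)
    except json.JSONDecodeError:
        return None

    # Walk gateway.auth.token defensively; any missing level yields None.
    gateway = cfg.get("gateway") if isinstance(cfg, dict) else None
    auth = gateway.get("auth") if isinstance(gateway, dict) else None
    token = auth.get("token") if isinstance(auth, dict) else None
    if isinstance(token, str) and token.strip():
        return token.strip()
    return None  # empty placeholder: nothing to show (the old-CLI symptom)
```

With an old image only the JSON branch yields a token. With a new image the file branch wins, and a client that knows only the JSON branch sees the empty placeholder, which is the Scenario 2 behavior.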

ericksoa added the security label (Potential vulnerability, unsafe behavior, or access risk) on Apr 23, 2026
ericksoa and others added 2 commits April 23, 2026 13:49
openclaw doctor --fix auto-generates a gateway auth token when it
finds an empty one in openclaw.json, defeating the externalization.
Add a post-doctor RUN step that re-clears the token before the
config integrity hash is pinned.

Add Dockerfile-level regression tests verifying the clearing step
exists after doctor and before the hash pin.

Found by automated PR security review probe that deployed a real
sandbox and confirmed the sandbox user could still read a non-empty
token from openclaw.json.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
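
The re-clearing step can be sketched in Python as below. The `setdefault` expression is the one quoted in the tests for this PR. The function name `clear_gateway_token`, the JSON formatting, and returning the digest are assumptions made for illustration. The Dockerfile itself pins the hash with `sha256sum`.

```python
import hashlib
import json
from pathlib import Path


def clear_gateway_token(config_path: str) -> str:
    """Blank gateway.auth.token in place and return the file's SHA-256 digest.

    Clearing must happen before the digest is taken; otherwise the pinned hash
    would cover a config that still contains an auto-generated token.
    """
    path = Path(config_path)
    cfg = json.loads(path.read_text(encoding="utf-8"))
    cfg.setdefault("gateway", {}).setdefault("auth", {})["token"] = ""
    path.write_text(json.dumps(cfg, indent=2) + "\n", encoding="utf-8")
    return hashlib.sha256(path.read_bytes()).hexdigest()
```

Using `setdefault` at each level means the step also works when a tool has removed the `gateway` or `auth` object entirely, not only when it has filled in the token.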

coderabbitai Bot left a comment

Contributor

🧹 Nitpick comments (2)
test/nemoclaw-start.test.ts (2)

457-459: Root-path sequencing regex is unnecessarily newline-sensitive.

This can fail on harmless comments/spacing changes. Use non-greedy ordered matching instead of exact newline adjacency.

Proposed refactor
-    const rootBlock = src.match(
-      /# ── Root path[\s\S]*?apply_model_override\n\s*apply_cors_override\n\s*apply_slack_token_override\n\s*generate_gateway_token/,
-    );
+    const rootBlock = src.match(
+      /# ── Root path[\s\S]*?apply_model_override[\s\S]*?apply_cors_override[\s\S]*?apply_slack_token_override[\s\S]*?generate_gateway_token/,
+    );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/nemoclaw-start.test.ts` around lines 457 - 459, The current src.match
call storing rootBlock uses a regex that requires exact newline adjacency
between the listed functions and is brittle; change the regex to allow any
characters (including newlines and intervening comments/whitespace) between
those tokens using non-greedy, ordered matching so it only requires the sequence
apply_model_override, apply_cors_override, apply_slack_token_override,
generate_gateway_token in order. Update the src.match(...) invocation that
assigns rootBlock to use a pattern that matches "# ── Root path" followed by
non-greedy [\s\S]*? between each listed symbol (apply_model_override,
apply_cors_override, apply_slack_token_override, generate_gateway_token) instead
of forcing exact newline adjacency.

170-188: Dockerfile checks are a bit too formatting-coupled.

indexOf with full literal fragments is fragile to harmless formatting/refactors. Prefer ordered regex anchors around semantic markers.

Proposed refactor
-    const doctorIdx = dockerfile.indexOf("openclaw doctor --fix");
-    const clearIdx = dockerfile.indexOf("cfg.setdefault('gateway', {}).setdefault('auth', {})['token'] = ''");
-    expect(doctorIdx).toBeGreaterThan(-1);
-    expect(clearIdx).toBeGreaterThan(-1);
-    expect(clearIdx).toBeGreaterThan(doctorIdx);
+    expect(dockerfile).toMatch(
+      /openclaw doctor --fix[\s\S]*?cfg\.setdefault\('gateway', \{\}\)\.setdefault\('auth', \{\}\)\['token'\]\s*=\s*''/s,
+    );

-    const clearIdx = dockerfile.indexOf("['token'] = ''");
-    const hashIdx = dockerfile.indexOf("sha256sum /sandbox/.openclaw/openclaw.json");
-    // Both must exist and hash must come after the clear step
-    expect(clearIdx).toBeGreaterThan(-1);
-    expect(hashIdx).toBeGreaterThan(-1);
-    expect(hashIdx).toBeGreaterThan(clearIdx);
+    expect(dockerfile).toMatch(
+      /\['token'\]\s*=\s*''[\s\S]*?sha256sum \/sandbox\/\.openclaw\/openclaw\.json/s,
+    );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/nemoclaw-start.test.ts` around lines 170 - 188, The tests currently use
brittle indexOf on exact string fragments in dockerfile; replace those with
regex searches to detect the semantic markers more robustly. In the "clears any
auto-generated token..." test use a regex for the doctor invocation (e.g., match
"openclaw doctor" with optional whitespace and flags) to set doctorIdx from the
regex match index, and use a regex that finds the token-clearing expression
(match setdefault('gateway' ... ['token'] = '') with flexible spacing) to set
clearIdx, then assert ordering as before; do the same in "pins config hash..."
by finding the token-clear regex for clearIdx and a regex for the sha256sum line
to set hashIdx, and assert positions are present and ordered. Keep references to
the existing variables (dockerfile, doctorIdx, clearIdx, hashIdx) and replace
indexOf calls with RegExp.exec or String.match to obtain match.index for
ordering.
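
The ordering check suggested in both nitpicks can be shown with a small Python sketch (the real tests are TypeScript). The three patterns are loosened versions of the literals quoted above, and `steps_in_order` is a hypothetical helper that tolerates extra whitespace and intervening lines.

```python
import re
from typing import Pattern

# Whitespace-tolerant markers for the three Dockerfile steps.
DOCTOR = re.compile(r"openclaw\s+doctor\s+--fix")
CLEAR = re.compile(r"\['token'\]\s*=\s*''")
HASH_PIN = re.compile(r"sha256sum\s+/sandbox/\.openclaw/openclaw\.json")


def first_index(pattern: Pattern[str], text: str) -> int:
    """Return the offset of the first match, or -1 when there is none."""
    match = pattern.search(text)
    return match.start() if match else -1


def steps_in_order(dockerfile: str) -> bool:
    """True only if doctor, token clearing, and hash pinning all occur, in that order."""
    doctor = first_index(DOCTOR, dockerfile)
    clear = first_index(CLEAR, dockerfile)
    pin = first_index(HASH_PIN, dockerfile)
    return -1 < doctor < clear < pin
```

Comparing match offsets keeps the assertion about sequence while dropping the dependence on exact spacing that made the original `indexOf` checks fragile.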

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a8313963-45f0-4dd2-ba37-0ea64040e23d

📥 Commits

Reviewing files that changed from the base of the PR and between 2f56b07 and f136714.

📒 Files selected for processing (2)
  • Dockerfile
  • test/nemoclaw-start.test.ts
✅ Files skipped from review due to trivial changes (1)
  • Dockerfile

When the entrypoint runs as non-root (no privilege separation), the
gateway still needs valid auth credentials. Previously, the non-root
path skipped token generation entirely, relying on OpenClaw to
auto-generate one. But with openclaw.json now containing an empty
token placeholder, the gateway starts with no auth and fails (WS
close 1006).

Generate a token in non-root mode and:
- Export OPENCLAW_GATEWAY_TOKEN so the gateway process reads it
- Write to /run/nemoclaw/gateway-token so host-side download works

The token is not isolated from the sandbox user in non-root mode
(no uid separation), but this matches the pre-externalization
security posture for non-root deployments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 980-988: The non-root token write is silently ignored because
mkdir and printf are masked with "|| true", causing host reads (and
print_dashboard_urls()) to not find the token; change the logic in the non-root
token block that sets _NONROOT_GATEWAY_TOKEN and OPENCLAW_GATEWAY_TOKEN so that
you ensure the token file is actually written: attempt to create
"$GATEWAY_TOKEN_DIR" and write "$GATEWAY_TOKEN_FILE" and if that fails either
(a) fall back to a guaranteed-writable location such as
"${XDG_RUNTIME_DIR:-/tmp}/nemoclaw" updating
GATEWAY_TOKEN_DIR/GATEWAY_TOKEN_FILE accordingly, or (b) fail fast by printing
an error to stderr and exiting non-zero; remove the "|| true" masking and use
the variables _NONROOT_GATEWAY_TOKEN, OPENCLAW_GATEWAY_TOKEN, GATEWAY_TOKEN_DIR,
GATEWAY_TOKEN_FILE (and ensure callers like print_dashboard_urls() will read the
same path).
🪄 Autofix (Beta)

✅ Autofix completed


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2e387f0e-5bcc-4b42-a4ee-57c5f063a480

📥 Commits

Reviewing files that changed from the base of the PR and between f136714 and 7dc0a72.

📒 Files selected for processing (2)
  • scripts/nemoclaw-start.sh
  • test/nemoclaw-start.test.ts

Comment thread scripts/nemoclaw-start.sh Outdated
@coderabbitai

coderabbitai Bot commented Apr 23, 2026

Contributor

Note

Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.

Fixes Applied Successfully

Fixed 1 file(s) based on 1 unresolved review comment.

Files modified:

  • scripts/nemoclaw-start.sh

Commit: 03c15890fe3eedfa9fa30446fc8ddfa7ac2e4668

The changes have been pushed to the feat/externalize-gateway-token branch.

Time taken: 2m 49s

Fixed 1 file(s) based on 1 unresolved review comment.

Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
@copy-pr-bot

copy-pr-bot Bot commented Apr 23, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

ericksoa and others added 3 commits April 23, 2026 16:29
openshell sandbox download runs as the sandbox user (uid 998) via
SSH, so it cannot read /run/nemoclaw/gateway-token (gateway:gateway
0400). Use kubectl exec instead, which runs as root in the pod and
can read the file — the same privileged exec pattern used by the
shields system for managing Landlock-protected paths.

Falls back to openclaw.json download for pre-externalization images
where kubectl exec is not available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…mode

The non-root path was writing the gateway token to
$XDG_RUNTIME_DIR/nemoclaw/gateway-token (/tmp/.runtime/nemoclaw/
gateway-token) as sandbox:sandbox 0644 — fully readable by the
agent. It also exported OPENCLAW_GATEWAY_TOKEN to the shell env.

Fix: pass the token only on the gateway launch line (same pattern as
root mode) so it lives solely in the gateway process env. Do not
write a token file or export the env var.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
In non-root mode, write the gateway token to
$XDG_RUNTIME_DIR/nemoclaw/gateway-token with chmod 0400. This is no
worse than the pre-PR state (token was in sandbox-readable
openclaw.json) and gives host-side retrieval a working path without
requiring kubectl exec.

The token is still not exported to the shell env — it is passed
only on the gateway launch line.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ericksoa

Contributor Author

Non-root mode security posture (for adversarial reviewer context)

The automated reviewer has correctly identified that in non-root mode, the sandbox user can still read the gateway token. This is expected and intentional — here's why:

Root mode (production deployment) — full isolation

  • Token generated at startup, written to /run/nemoclaw/gateway-token (gateway:gateway 0400)
  • Passed to gateway process only via OPENCLAW_GATEWAY_TOKEN env var on the launch line
  • Sandbox user (uid 998) cannot read the file (wrong uid), cannot read /proc/<gateway-pid>/environ (uid-gated), and no-new-privileges blocks escalation
  • Host-side retrieval uses kubectl exec (runs as root in the pod) — same pattern as the shields system
  • All three adversarial probes pass: no token in openclaw.json, no token in shell env, no token via /proc scanning

Non-root mode (dev/fallback) — no uid isolation, but hardened vs pre-PR

  • No privilege separation available (single uid runs everything)
  • Token written to $XDG_RUNTIME_DIR/nemoclaw/gateway-token with chmod 0400 (previously was 0644)
  • Token is not exported to the shell env (previously was export OPENCLAW_GATEWAY_TOKEN)
  • Token is not written to .bashrc/.profile (the old export_gateway_token() did this)
  • Token is not in openclaw.json (cleared by post-doctor build step)
  • The sandbox user owns the file and can still read it — this is no worse than the pre-PR state where the token was plaintext in openclaw.json (also sandbox-readable)
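
As a rough illustration of the non-root bullets above, the write-with-0400 step could be sketched in TypeScript as follows. The real logic lives in the shell entrypoint (scripts/nemoclaw-start.sh), whose code is not shown in this thread, so this is a translation under assumptions: the function name `writeTokenFile` and the 32-byte hex token format are made up for the example.

```typescript
import { randomBytes } from "node:crypto";
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: generate a fresh token and persist it read-only for the owner.
function writeTokenFile(tokenFile: string): string {
  // Assumption: 32 random bytes, hex-encoded. The actual token format is not stated.
  const token = randomBytes(32).toString("hex");

  fs.mkdirSync(path.dirname(tokenFile), { recursive: true });

  // Remove any stale file first: an existing 0400 file cannot be reopened
  // for writing by a non-root owner.
  fs.rmSync(tokenFile, { force: true });

  fs.writeFileSync(tokenFile, token + "\n", { mode: 0o400 });
  // Set the mode explicitly so the result does not depend on the umask.
  fs.chmodSync(tokenFile, 0o400);

  // The caller would pass the token only in the gateway child process env,
  // not export it to the surrounding shell.
  return token;
}
```

The file remains readable by its owner, which matches the limitation stated above: with a single uid, 0400 narrows exposure but does not isolate the token from the sandbox user.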

What the reviewer should expect

  • Root mode probes: Token should be fully isolated. File at /run/nemoclaw/gateway-token owned by gateway:gateway 0400, unreadable by sandbox user. No token in env, config, or proc.
  • Non-root mode probes: Token file at /tmp/.runtime/nemoclaw/gateway-token owned by sandbox user with 0400. The sandbox user CAN read it. This is the inherent limitation of single-uid operation — the security improvement here is eliminating the config/env/rc-file leak vectors, not achieving full isolation (which requires root + uid separation).

Non-root mode covers a small subset of deployments (development environments, restricted container runtimes that don't allow root entrypoints). The production path through Docker + OpenShell always runs as root.

fetchGatewayAuthTokenFromSandbox() now tries three paths:
1. kubectl exec cat /run/nemoclaw/gateway-token (root mode)
2. sandbox download /tmp/.runtime/nemoclaw/gateway-token (non-root)
3. sandbox download openclaw.json (pre-externalization legacy)

The non-root token file is sandbox-owned (0400) so openshell sandbox
download (which runs as the sandbox user via SSH) can read it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ericksoa

Contributor Author

Token retrieval: dual-path architecture (reviewer guidance)

The reviewer's environment runs the entrypoint as non-root, which uses a different token storage path than root mode. The retrieval code now handles both:

Token file locations by mode

| Mode | Token path | Owner | Perms | How host retrieves it |
| --- | --- | --- | --- | --- |
| Root (production) | /run/nemoclaw/gateway-token | gateway:gateway | 0400 | kubectl exec (runs as root in pod) |
| Non-root (dev/fallback) | /tmp/.runtime/nemoclaw/gateway-token | sandbox:sandbox | 0400 | openshell sandbox download (runs as sandbox user) |

What fetchGatewayAuthTokenFromSandbox() now tries (in order)

  1. kubectl exec ... cat /run/nemoclaw/gateway-token — root mode, same kubectl-via-K3s pattern as shields.ts
  2. openshell sandbox download /tmp/.runtime/nemoclaw/gateway-token — non-root mode, sandbox-owned file
  3. openshell sandbox download openclaw.json → parse gateway.auth.token — legacy pre-externalization images
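
The ordered fallback above can be sketched as a small chain of token sources. This is not the code from src/lib/onboard.ts, which is not shown in this thread: `TokenSource`, `firstAvailableToken`, and `tokenFromLegacyConfig` are hypothetical names, and the readers are stubs standing in for the real `kubectl exec` and `openshell sandbox download` calls.

```typescript
// A source returns the token, returns null, or throws when its path is unavailable.
type TokenSource = { label: string; read: () => string | null };

// Try each source in order; return the first non-empty token, else null.
// Failures are logged instead of being swallowed silently.
function firstAvailableToken(
  sources: TokenSource[],
  log: (msg: string) => void = () => {},
): string | null {
  for (const source of sources) {
    try {
      const token = (source.read() ?? "").trim();
      if (token.length > 0) return token;
      log(`${source.label}: empty result, trying next source`);
    } catch (err) {
      log(`${source.label}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  return null;
}

// Legacy path: pull gateway.auth.token out of an openclaw.json payload.
function tokenFromLegacyConfig(json: string): string | null {
  const parsed = JSON.parse(json) as { gateway?: { auth?: { token?: unknown } } } | null;
  const token = parsed?.gateway?.auth?.token;
  return typeof token === "string" && token.length > 0 ? token : null;
}

// Stubbed readers only; real ones would shell out to the tools named above.
const resolvedToken = firstAvailableToken(
  [
    { label: "kubectl exec (root mode)", read: () => { throw new Error("kubectl not available"); } },
    { label: "sandbox download (non-root)", read: () => "example-token\n" },
    { label: "legacy openclaw.json", read: () => tokenFromLegacyConfig('{"gateway":{"auth":{"token":""}}}') },
  ],
  (msg) => console.warn(msg),
);
console.log(resolvedToken);
```

Keeping the sources as data makes the order explicit and easy to assert in a test, and the `log` callback gives each fall-through a visible trace.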

Why /run/nemoclaw/gateway-token is absent in the reviewer's environment

The reviewer's environment runs the entrypoint as non-root (uid 998). The non-root path cannot write to /run (root-owned tmpfs), so it writes to $XDG_RUNTIME_DIR/nemoclaw/gateway-token (that is, /tmp/.runtime/nemoclaw/gateway-token) instead. This is by design: the non-root path was never going to create files at /run/nemoclaw/.

The reviewer should probe /tmp/.runtime/nemoclaw/gateway-token to validate the non-root token file exists with 0400 permissions.

ericksoa and others added 3 commits April 23, 2026 18:43
Update all docs, Dockerfile comments, and entrypoint comments to
describe both token file locations:
- Root mode: /run/nemoclaw/gateway-token (gateway:gateway 0400)
- Non-root mode: $XDG_RUNTIME_DIR/nemoclaw/gateway-token (sandbox:sandbox 0400)

Regenerate agent skills from updated docs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add regression tests covering the reviewer's recommended follow-up:
- openclaw.json never contains a non-empty token at build time
- Gateway process runs under distinct uid (gosu gateway)
- Shell env and rc files never receive the token
- Non-root token file uses 0400 permissions
- Stale token file is removed before rewrite
- Host-side retrieval tries three paths in correct order
  (kubectl exec → non-root download → legacy openclaw.json)
- Entrypoint documents both root and non-root token paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ion and Slack guard

Keep PR's generate_gateway_token() and file-based _read_gateway_token(),
add main's Slack channel guard (install_slack_channel_guard), and use
PR's test regex expecting generate_gateway_token in the root path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ericksoa ericksoa requested a review from brandonpelfrey April 24, 2026 03:01
Comment thread src/lib/onboard.ts
}
}

function findFileRecursive(dir, filename) {

Collaborator

Add TypeScript type information.
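
A typed signature along the lines this comment asks for might look like the sketch below. The body of `findFileRecursive` is not shown in this thread, so both the depth-first search and the `string | null` return type are assumptions, not the merged implementation.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Assumed contract: return the full path of the first file named `filename`
// found under `dir` (depth-first), or null when there is no match.
function findFileRecursive(dir: string, filename: string): string | null {
  let entries: fs.Dirent[];
  try {
    entries = fs.readdirSync(dir, { withFileTypes: true });
  } catch {
    // Unreadable or missing directory: treat as "not found".
    return null;
  }

  for (const entry of entries) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isFile() && entry.name === filename) return fullPath;
    if (entry.isDirectory()) {
      const found = findFileRecursive(fullPath, filename);
      if (found !== null) return found;
    }
  }
  return null;
}
```

An explicit `string | null` return forces callers to handle the not-found case instead of relying on an implicit `undefined`.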

Comment thread src/lib/onboard.ts
const token = result.toString().trim();
if (token.length > 0) return token;
} catch {
// kubectl exec not available or file absent — fall through

Collaborator

Log something here in case kubectl exec is unavailable or the file is absent.

@ericksoa ericksoa merged commit 51aa6af into main Apr 24, 2026
13 checks passed
ericksoa added a commit that referenced this pull request Apr 24, 2026
Merge main into remove-config-immutability, resolving conflicts from
PRs #2324 (SSRF blocklist), #2378 (gateway token externalization),
and various CI/docs/test updates.

Key resolution decisions:
- Keep mutable-by-default model (PR #2227) for config layout
- Keep gateway token externalization (PR #2378) in Dockerfile and start scripts
- Remove config set/rotate-token from sandbox-config.ts (PR #2227 removes host-side mutation)
- Keep SSRF blocklist in private-networks.ts (PR #2324) for remaining consumers
- Update sandbox-provisioning tests for the unified .openclaw layout
- Delete test-skip-permissions-policy.sh (feature removed by PR #2227)
@cv cv added the v0.0.25 label Apr 24, 2026
ericksoa added a commit that referenced this pull request Apr 25, 2026
## Summary

- Reverts 51aa6af (`feat(security): externalize gateway auth token from
openclaw.json (#2378)`)
- The externalized token path breaks `openclaw tui` inside the sandbox —
OpenClaw 2026.4.9 requires `OPENCLAW_GATEWAY_TOKEN` but the runtime
injection fails under Landlock (non-root mode) and the token is no
longer in `openclaw.json` where the TUI and gateway can read it
- Restores build-time token generation in `openclaw.json` so gateways
authenticate out-of-the-box again
- The token externalization will be re-introduced in a separate PR with
deeper testing across root/non-root modes and OpenClaw 2026.4.9

Fixes #2480

## Test plan

- [x] `npm run typecheck:cli` passes
- [x] `npx vitest run --project cli` — 2110 tests pass
- [x] All pre-commit and pre-push hooks pass
- [ ] Verify `openclaw tui` works inside sandbox after rebuild
- [ ] Verify gateway auth works on Spark (non-root mode)
- [ ] Verify gateway auth works in root mode

## Summary by CodeRabbit

* **Documentation**
  * Clarified security guidance: gateway auth tokens are stored in the
    sandbox configuration and risk notes updated.

* **Changes**
  * Token generation moved earlier in the image/build process so auth is
    present in the sandbox config at runtime.
  * Runtime token retrieval simplified and connection instructions
    updated.
  * Gateway token is exported to an environment variable and
    persisted/removed in users' shell profiles.

* **Tests**
  * Tests updated to validate token export, persistence, and retrieval
    behavior.
DemianHeyGen pushed a commit to DemianHeyGen/NemoClaw that referenced this pull request Apr 30, 2026
…IDIA#2378)

## Summary

Move the gateway auth token out of `openclaw.json` (readable by the
sandbox agent) and generate it at container startup instead. The token
is now written to a runtime file and passed via `OPENCLAW_GATEWAY_TOKEN`
env var only to the gateway process.

**Root mode (full privilege separation):** The token is written to
`/run/nemoclaw/gateway-token` owned by `gateway:gateway 0400`. The
sandbox user cannot access it: the file is owned by a different uid, the
env var is only in the gateway process (`/proc/pid/environ` is
uid-gated), and `no-new-privileges` prevents escalation.

**Non-root mode (privilege separation disabled):** The token is written
to `/tmp/.runtime/nemoclaw/gateway-token` owned by `sandbox:sandbox`.
Because privilege separation is not available in this mode, the sandbox
user can read the token file directly. This is a known limitation
consistent with the existing non-root security posture — other
trust-boundary files in `/tmp` have the same caveat (see
`sandbox-init.sh` trust boundary map). The improvement in non-root mode
is that the token no longer appears in `openclaw.json`, shell rc files,
or the process environment.

## Changes

- **Dockerfile**: Build `openclaw.json` with an empty auth token
placeholder; remove `secrets` import
- **scripts/nemoclaw-start.sh**: Replace `export_gateway_token()` (which
wrote tokens into `.bashrc`/`.profile`) with `generate_gateway_token()`
that writes to `/run/nemoclaw/gateway-token` and pass token via env var
only to the gateway process launch line
- **src/lib/onboard.ts**: `fetchGatewayAuthTokenFromSandbox()` now reads
from `/run/nemoclaw/gateway-token` first, falls back to `openclaw.json`
for pre-externalization images; update dashboard token guidance
- **src/nemoclaw.ts**: Remove `MIN_LOGS_OPENSHELL_VERSION` gating and
`printOldLogsCompatibilityGuidance()`; simplify `sandboxLogs()`
- **docs/security/best-practices.md**: Document the externalized token
architecture and update risk descriptions
- **spark-install.md**: Update token location guidance
- **test/cli.test.ts**, **test/nemoclaw-start.test.ts**: Update tests
for new token flow

Also includes unrelated changes already merged to main:
- ci(nightly-e2e): replace per-job failure check with wildcard
`contains()`
- refactor(e2e): centralize timeout/gtimeout handling into
`e2e-timeout.sh`
- fix: stream sandbox logs via tail
- chore(deps): bump setup-uv, setup-qemu-action
- fix(e2e): make nightly teardown reliable
- fix(dockerfile): drop invalid `channels.defaults.configWrites`

## Type of Change

- [x] Code change (feature, bug fix, or refactor)
- [x] Code change with doc updates

## Verification

- [x] `npx prek run --all-files` passes
- [x] `npm test` passes (branch-modified tests: 127/127 pass)
- [x] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes

## AI Disclosure

- [x] AI-assisted — tool: Claude Code

---
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
DemianHeyGen pushed a commit to DemianHeyGen/NemoClaw that referenced this pull request Apr 30, 2026
@wscurran wscurran added the feature PR adds or expands user-visible functionality label Jun 8, 2026
Labels

feature PR adds or expands user-visible functionality security Potential vulnerability, unsafe behavior, or access risk
