feat(security): externalize gateway auth token from openclaw.json #2378
The gateway auth token was stored in openclaw.json (chmod 444, readable by the sandbox user/agent). This moves it out so the agent can never read or write the token:
- Build time: gateway.auth.token set to empty string in openclaw.json
- Container startup: entrypoint generates a random token via secrets.token_hex(32), writes to /run/nemoclaw/gateway-token (gateway:gateway 0400), and passes OPENCLAW_GATEWAY_TOKEN env var only on the gateway process launch line
- OpenClaw natively reads OPENCLAW_GATEWAY_TOKEN via its resolveGatewayCredentialsFromValues() path; no dist patch needed
- The sandbox user cannot read the token file (wrong uid) or the gateway process env (/proc/pid/environ is uid-gated), and no-new-privileges blocks escalation to the gateway user
- Host-side token fetch (onboard.ts) downloads the token file via openshell sandbox download (runs as root), with fallback to openclaw.json for pre-externalization images
- export_gateway_token removed: token no longer exported to sandbox user's .bashrc/.profile

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
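The "launch line only" idea in the commit message above can be illustrated with a small sketch. It is written in Python purely for illustration; the real entrypoint is a shell script, and the helper name `launch_with_token` and the stand-in child command are hypothetical. The sketch assumes only that the gateway reads `OPENCLAW_GATEWAY_TOKEN` from its own environment, as the commit message states.

```python
from __future__ import annotations

import os
import secrets
import subprocess
import sys


def launch_with_token(argv: list[str]) -> tuple[str, subprocess.CompletedProcess]:
    """Generate a token and expose it only in the child's environment."""
    token = secrets.token_hex(32)  # 32 random bytes -> 64 hex characters
    # Build a private copy of the environment for the child. The parent's
    # os.environ is never modified, so processes started later by the
    # parent do not inherit the token.
    child_env = {**os.environ, "OPENCLAW_GATEWAY_TOKEN": token}
    result = subprocess.run(
        argv, env=child_env, capture_output=True, text=True, check=True
    )
    return token, result


# Hypothetical stand-in for the gateway: a child process that reports the
# length of the token it sees in its own environment.
CHILD = [
    sys.executable,
    "-c",
    "import os; print(len(os.environ['OPENCLAW_GATEWAY_TOKEN']))",
]
token, result = launch_with_token(CHILD)
```

The shell equivalent is a per-command assignment (`VAR=value command`), which scopes the variable to that one process in the same way.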
📝 Walkthrough
Gateway auth token creation moved from build/config to container runtime: token is generated at startup, written to …
Sequence Diagram(s)
sequenceDiagram
participant Container as Container Startup
participant Entrypoint as Entrypoint / Root
participant Tmpfs as /run/nemoclaw (tmpfs)
participant Gateway as Gateway Process
participant Sandbox as Sandbox User
participant Host as Host Operator
Container->>Entrypoint: start nemoclaw-start.sh
Entrypoint->>Entrypoint: generate_gateway_token()
Entrypoint->>Tmpfs: write gateway-token (0400, gateway:gateway)
Entrypoint->>Gateway: exec gateway with OPENCLAW_GATEWAY_TOKEN=<token>
Gateway->>Gateway: initialize using token
Sandbox->>Tmpfs: access denied (uid separation / no-new-privileges)
Host->>Tmpfs: host may read mounted file for management
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
spark-install.md (1)
1-1: ⚠️ Potential issue | 🟠 Major
Add the required SPDX header to this Markdown file.
This file is missing the mandatory SPDX copyright/license header comments.
Proposed fix
+<!-- SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. -->
+<!-- SPDX-License-Identifier: Apache-2.0 -->
 # NemoClaw on DGX Spark
As per coding guidelines "**/*.{js,mjs,ts,tsx,sh,md}: Every source file must include an SPDX license header ... Use HTML comments for Markdown."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@spark-install.md` at line 1, Add the required SPDX license header as an HTML comment at the top of spark-install.md (above the existing "# NemoClaw on DGX Spark" title); include both the SPDX copyright/owner line and the SPDX-License-Identifier line in that comment per project guidelines so the file complies with the "`**/*.{js,mjs,ts,tsx,sh,md}`" rule.
Dockerfile (1)
185-187: ⚠️ Potential issue | 🟡 Minor
Update stale Build ID comment to match new token flow.
The comment still says build ID is for a fresh auth token, but token generation is now done at container startup.
Proposed fix
-# Unique per build to ensure each image gets a fresh auth token.
-# Pass --build-arg NEMOCLAW_BUILD_ID=$(date +%s) to bust the cache.
+# Unique per build to bust Docker cache for config materialization layers.
+# Pass --build-arg NEMOCLAW_BUILD_ID=$(date +%s) to bust the cache.
+# Gateway auth token is generated at container startup by the entrypoint.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Dockerfile` around lines 185 - 187, The comment above ARG NEMOCLAW_BUILD_ID is stale: it claims the build ID forces a fresh auth token at build time, but token generation now happens at container startup; update the comment near ARG NEMOCLAW_BUILD_ID to describe its current purpose (e.g., cache-busting/unique per build meta only) and note that auth tokens are generated at runtime on container startup rather than during image build.
🧹 Nitpick comments (2)
docs/security/best-practices.md (1)
232-236: Sentences should be on separate lines for readable diffs.
Per the documentation style guide, each sentence should be on its own line in the source. Lines 233-235 contain multiple sentences on the same line.
📝 Suggested reformat
 The `/sandbox/.openclaw` directory contains the OpenClaw gateway configuration (model routing, CORS settings, channel config).
-The gateway auth token is **not** stored in this directory — it is generated at container startup and passed via the `OPENCLAW_GATEWAY_TOKEN` environment variable only to the gateway process (which runs as the `gateway` user).
-The token is also persisted to `/run/nemoclaw/gateway-token` (`gateway:gateway 0400`) for host-side reads.
-The sandbox user (agent) cannot access the token: the file is owned by a different uid, the env var is only in the gateway process (`/proc/pid/environ` is uid-gated), and `no-new-privileges` prevents escalation.
+The gateway auth token is **not** stored in this directory.
+It is generated at container startup and passed via the `OPENCLAW_GATEWAY_TOKEN` environment variable only to the gateway process (which runs as the `gateway` user).
+The token is also persisted to `/run/nemoclaw/gateway-token` (`gateway:gateway 0400`) for host-side reads.
+The sandbox user (agent) cannot access the token: the file is owned by a different uid, the env var is only in the gateway process (`/proc/pid/environ` is uid-gated), and `no-new-privileges` prevents escalation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/security/best-practices.md` around lines 232 - 236, Split the multi-sentence lines into one sentence per line in the paragraph that describes the OpenClaw gateway token handling: break the combined lines that mention "/sandbox/.openclaw", the fact the gateway auth token is generated at container startup and passed via the OPENCLAW_GATEWAY_TOKEN env var to the gateway process, the persistence to "/run/nemoclaw/gateway-token" with its ownership/permissions, and the sandbox user/agent access restrictions (uid gating of /proc/pid/environ and no-new-privileges) so each sentence appears on its own source line for readable diffs.
scripts/nemoclaw-start.sh (1)
495-499: Consider atomic file creation with restrictive permissions.
There's a brief TOCTOU window between file creation (line 496, created with umask, typically `644`) and permission restriction (line 499, `chmod 400`). During this window, the token file is world-readable.
While the practical risk is low (runs at startup before sandbox processes exist, and `/run` is tmpfs), atomic creation would eliminate this window entirely.
🛡️ Suggested hardening (optional)
-  python3 -c "import secrets; print(secrets.token_hex(32), end='')" \
-    >"$GATEWAY_TOKEN_FILE"
-
-  chown gateway:gateway "$GATEWAY_TOKEN_FILE"
-  chmod 400 "$GATEWAY_TOKEN_FILE"
+  # Create token file atomically with restrictive permissions
+  _token_tmp="$(mktemp -p "$GATEWAY_TOKEN_DIR")"
+  chmod 400 "$_token_tmp"
+  chown gateway:gateway "$_token_tmp"
+  python3 -c "import secrets; print(secrets.token_hex(32), end='')" >"$_token_tmp"
+  mv "$_token_tmp" "$GATEWAY_TOKEN_FILE"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/nemoclaw-start.sh` around lines 495 - 499, Replace the two-step token write + chmod with atomic creation using the python3 writer so the file is created with restrictive permissions from the start: modify the current python3 token generation (the one that writes to GATEWAY_TOKEN_FILE) to open/create the file atomically with mode 0o400 (or use os.open with O_CREAT|O_EXCL and mode 0o400) and write the token, then run chown gateway:gateway on GATEWAY_TOKEN_FILE if needed; this removes the TOCTOU window between the write and the chmod while keeping the existing chown/chmod intent.
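The `os.open` variant mentioned in the prompt above could look roughly like the following. This is a minimal sketch, not the code that landed in the PR: the function name `write_token_atomically` is hypothetical, and the `chown gateway:gateway` step is omitted because it needs root.

```python
import os
import secrets


def write_token_atomically(path: str) -> str:
    """Create the token file with mode 0400 from the first instant it exists."""
    token = secrets.token_hex(32)
    # O_EXCL makes creation fail if the path already exists, and the mode
    # argument is applied at creation time, so there is no window in which
    # the file exists with looser permissions. The descriptor stays writable
    # for this process even though the file mode is read-only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o400)
    try:
        os.write(fd, token.encode("ascii"))
    finally:
        os.close(fd)
    return token
```

Because of `O_EXCL`, a caller that wants to rotate the token has to remove the stale file first; a second call on the same path raises `FileExistsError`.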
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lib/onboard.ts`:
- Around line 6109-6123: The code checks only tmpDir/gateway-token after running
runOpenshell (sandbox download) and misses cases where files are nested; update
the token retrieval (the block using tokenFileResult, runOpenshell, tmpDir, and
path.join("gateway-token")) to search recursively under tmpDir for a file named
"gateway-token" (or the path run/nemoclaw/gateway-token) instead of only
checking tmpDir/gateway-token; if found, read and trim that file as before and
return the token, otherwise continue to the fallback branch. Ensure you reuse
the existing tokenFileResult variable and preserve ignoreError/stdio behavior
when calling runOpenshell.
📒 Files selected for processing (7)
- .agents/skills/nemoclaw-user-configure-security/references/best-practices.md
- Dockerfile
- docs/security/best-practices.md
- scripts/nemoclaw-start.sh
- spark-install.md
- src/lib/onboard.ts
- test/nemoclaw-start.test.ts
openshell sandbox download may preserve the directory structure (run/nemoclaw/gateway-token) rather than flattening to the temp dir root. Use recursive search to find the token file regardless of nesting, matching the pattern already used for openclaw.json.

Also update stale Dockerfile comment about NEMOCLAW_BUILD_ID to reflect that auth tokens are now generated at container startup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
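The recursive lookup described in this commit can be sketched as follows. The actual change is in TypeScript (`src/lib/onboard.ts`); this Python version only illustrates the idea, and the names `find_file_recursive` and `read_token` are hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_file_recursive(root: Path, filename: str) -> Optional[Path]:
    """Return the first regular file called `filename` anywhere under `root`."""
    # rglob also matches a file sitting directly in `root`, so both the
    # flattened and the nested download layouts are covered.
    for candidate in sorted(root.rglob(filename)):
        if candidate.is_file():
            return candidate
    return None


def read_token(download_dir: Path) -> Optional[str]:
    """Read and trim the downloaded token, or return None so callers can fall back."""
    found = find_file_recursive(download_dir, "gateway-token")
    if found is None:
        return None
    token = found.read_text().strip()
    return token or None
```

Returning `None` for both a missing file and an empty file keeps the caller's fallback logic to a single check.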
Addressing the remaining CodeRabbit review items: Outside-diff: Outside-diff: Nitpick: Nitpick: |
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lib/onboard.ts`:
- Around line 6322-6325: The console fallback guidance incorrectly instructs
users to check `/tmp/gateway.log` for the gateway token; update the message(s)
that contain the string `append `#token`=<token> to the URL, or see
/tmp/gateway.log inside the sandbox.` so that for root mode it references the
actual token location `/run/nemoclaw/gateway-token` (privileged host-side read),
and for non-root mode remove the `/tmp/gateway.log` hint and instead instruct
users to "re-run onboard" (no externalized token available). Locate and change
the console.log/print statements matching the shown string to implement this
conditional messaging (or simply replace the misleading path with
`/run/nemoclaw/gateway-token` and/or the "re-run onboard" fallback as
appropriate).
📒 Files selected for processing (2)
- Dockerfile
- src/lib/onboard.ts
Upgrade compatibility notes for reviewers
Scenario 1: New CLI + Old sandbox image (most common upgrade path)
Scenario 2: Old CLI + New sandbox image (user rebuilds sandbox but hasn't updated CLI)
The sandbox is fully functional; the user just can't easily retrieve the token for the Control UI. Mitigation: upgrade the CLI (
openclaw doctor --fix auto-generates a gateway auth token when it finds an empty one in openclaw.json, defeating the externalization. Add a post-doctor RUN step that re-clears the token before the config integrity hash is pinned.

Add Dockerfile-level regression tests verifying the clearing step exists after doctor and before the hash pin.

Found by automated PR security review probe that deployed a real sandbox and confirmed the sandbox user could still read a non-empty token from openclaw.json.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
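The re-clearing step can be sketched in Python. The `setdefault` expression below is the one quoted by the regression tests later in this thread; the wrapper function `clear_gateway_token`, its argument, and the JSON formatting are assumptions for illustration.

```python
import json
from pathlib import Path


def clear_gateway_token(config_path: Path) -> None:
    """Blank out gateway.auth.token while leaving every other key untouched."""
    cfg = json.loads(config_path.read_text())
    # setdefault creates the nested dicts if they are missing, so this works
    # whether or not a token was auto-generated beforehand.
    cfg.setdefault("gateway", {}).setdefault("auth", {})["token"] = ""
    config_path.write_text(json.dumps(cfg, indent=2) + "\n")
```

Running this before the integrity hash is computed matters: the hash then pins the cleared config rather than one that still carries a token.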
🧹 Nitpick comments (2)
test/nemoclaw-start.test.ts (2)
457-459: Root-path sequencing regex is unnecessarily newline-sensitive.
This can fail on harmless comments/spacing changes. Use non-greedy ordered matching instead of exact newline adjacency.
Proposed refactor
-    const rootBlock = src.match(
-      /# ── Root path[\s\S]*?apply_model_override\n\s*apply_cors_override\n\s*apply_slack_token_override\n\s*generate_gateway_token/,
-    );
+    const rootBlock = src.match(
+      /# ── Root path[\s\S]*?apply_model_override[\s\S]*?apply_cors_override[\s\S]*?apply_slack_token_override[\s\S]*?generate_gateway_token/,
+    );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/nemoclaw-start.test.ts` around lines 457 - 459, The current src.match call storing rootBlock uses a regex that requires exact newline adjacency between the listed functions and is brittle; change the regex to allow any characters (including newlines and intervening comments/whitespace) between those tokens using non-greedy, ordered matching so it only requires the sequence apply_model_override, apply_cors_override, apply_slack_token_override, generate_gateway_token in order. Update the src.match(...) invocation that assigns rootBlock to use a pattern that matches "# ── Root path" followed by non-greedy [\s\S]*? between each listed symbol (apply_model_override, apply_cors_override, apply_slack_token_override, generate_gateway_token) instead of forcing exact newline adjacency.
170-188: Dockerfile checks are a bit too formatting-coupled.
`indexOf` with full literal fragments is fragile to harmless formatting/refactors. Prefer ordered regex anchors around semantic markers.
Proposed refactor
-    const doctorIdx = dockerfile.indexOf("openclaw doctor --fix");
-    const clearIdx = dockerfile.indexOf("cfg.setdefault('gateway', {}).setdefault('auth', {})['token'] = ''");
-    expect(doctorIdx).toBeGreaterThan(-1);
-    expect(clearIdx).toBeGreaterThan(-1);
-    expect(clearIdx).toBeGreaterThan(doctorIdx);
+    expect(dockerfile).toMatch(
+      /openclaw doctor --fix[\s\S]*?cfg\.setdefault\('gateway', \{\}\)\.setdefault\('auth', \{\}\)\['token'\]\s*=\s*''/s,
+    );

-    const clearIdx = dockerfile.indexOf("['token'] = ''");
-    const hashIdx = dockerfile.indexOf("sha256sum /sandbox/.openclaw/openclaw.json");
-    // Both must exist and hash must come after the clear step
-    expect(clearIdx).toBeGreaterThan(-1);
-    expect(hashIdx).toBeGreaterThan(-1);
-    expect(hashIdx).toBeGreaterThan(clearIdx);
+    expect(dockerfile).toMatch(
+      /\['token'\]\s*=\s*''[\s\S]*?sha256sum \/sandbox\/\.openclaw\/openclaw\.json/s,
+    );
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/nemoclaw-start.test.ts` around lines 170 - 188, The tests currently use brittle indexOf on exact string fragments in dockerfile; replace those with regex searches to detect the semantic markers more robustly. In the "clears any auto-generated token..." test use a regex for the doctor invocation (e.g., match "openclaw doctor" with optional whitespace and flags) to set doctorIdx from the regex match index, and use a regex that finds the token-clearing expression (match setdefault('gateway' ... ['token'] = '') with flexible spacing) to set clearIdx, then assert ordering as before; do the same in "pins config hash..." by finding the token-clear regex for clearIdx and a regex for the sha256sum line to set hashIdx, and assert positions are present and ordered. Keep references to the existing variables (dockerfile, doctorIdx, clearIdx, hashIdx) and replace indexOf calls with RegExp.exec or String.match to obtain match.index for ordering.
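The ordered, non-greedy matching that both nitpicks recommend can be shown in isolation. The project's tests are TypeScript; this Python sketch only demonstrates the pattern, and the helper name `calls_in_order` is hypothetical.

```python
import re

# Each ".*?" (with DOTALL) plays the role of "[\s\S]*?" in the JavaScript
# regex: it tolerates any text, including newlines and comments, between
# markers, while still requiring the markers to appear in this order.
ORDERED = re.compile(
    r"# ── Root path.*?apply_model_override.*?apply_cors_override"
    r".*?apply_slack_token_override.*?generate_gateway_token",
    re.DOTALL,
)


def calls_in_order(script: str) -> bool:
    """True when the root-path block invokes the four steps in sequence."""
    return ORDERED.search(script) is not None
```

A check like this still fails when the steps are reordered, but it no longer breaks when someone inserts a comment or blank line between them.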
📒 Files selected for processing (2)
- Dockerfile
- test/nemoclaw-start.test.ts
✅ Files skipped from review due to trivial changes (1)
- Dockerfile
When the entrypoint runs as non-root (no privilege separation), the gateway still needs valid auth credentials. Previously, the non-root path skipped token generation entirely, relying on OpenClaw to auto-generate one. But with openclaw.json now containing an empty token placeholder, the gateway starts with no auth and fails (WS close 1006).

Generate a token in non-root mode and:
- Export OPENCLAW_GATEWAY_TOKEN so the gateway process reads it
- Write to /run/nemoclaw/gateway-token so host-side download works

The token is not isolated from the sandbox user in non-root mode (no uid separation), but this matches the pre-externalization security posture for non-root deployments.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 980-988: The non-root token write is silently ignored because
mkdir and printf are masked with "|| true", causing host reads (and
print_dashboard_urls()) to not find the token; change the logic in the non-root
token block that sets _NONROOT_GATEWAY_TOKEN and OPENCLAW_GATEWAY_TOKEN so that
you ensure the token file is actually written: attempt to create
"$GATEWAY_TOKEN_DIR" and write "$GATEWAY_TOKEN_FILE" and if that fails either
(a) fall back to a guaranteed-writable location such as
"${XDG_RUNTIME_DIR:-/tmp}/nemoclaw" updating
GATEWAY_TOKEN_DIR/GATEWAY_TOKEN_FILE accordingly, or (b) fail fast by printing
an error to stderr and exiting non-zero; remove the "|| true" masking and use
the variables _NONROOT_GATEWAY_TOKEN, OPENCLAW_GATEWAY_TOKEN, GATEWAY_TOKEN_DIR,
GATEWAY_TOKEN_FILE (and ensure callers like print_dashboard_urls() will read the
same path).
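The two remedies suggested above (fall back to a writable location, or fail fast) can be combined in one routine. This is a hedged sketch, not the autofix that was applied: the function name `write_token_with_fallback` and its parameters are hypothetical, and the real logic lives in the shell entrypoint.

```python
import os
import secrets
import sys
from pathlib import Path


def write_token_with_fallback(primary_dir: Path, fallback_dir: Path) -> Path:
    """Persist a fresh token, trying each directory in turn; never fail silently."""
    token = secrets.token_hex(32)
    for directory in (primary_dir, fallback_dir):
        try:
            directory.mkdir(parents=True, exist_ok=True)
            token_file = directory / "gateway-token"
            token_file.unlink(missing_ok=True)  # drop any stale token first
            fd = os.open(token_file, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o400)
            with os.fdopen(fd, "w") as handle:
                handle.write(token)
            return token_file
        except OSError as exc:
            # Report the failure instead of masking it, then try the next path.
            print(f"cannot write token under {directory}: {exc}", file=sys.stderr)
    # Neither location worked: exit non-zero rather than start without a token file.
    raise SystemExit("gateway token could not be persisted")
```

Returning the path that was actually used lets callers such as a dashboard-URL printer read from the same location the writer chose.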
- scripts/nemoclaw-start.sh
- test/nemoclaw-start.test.ts
Fixed 1 file(s) based on 1 unresolved review comment. Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
openshell sandbox download runs as the sandbox user (uid 998) via SSH, so it cannot read /run/nemoclaw/gateway-token (gateway:gateway 0400). Use kubectl exec instead, which runs as root in the pod and can read the file (the same privileged exec pattern used by the shields system for managing Landlock-protected paths).

Falls back to openclaw.json download for pre-externalization images where kubectl exec is not available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…mode

The non-root path was writing the gateway token to $XDG_RUNTIME_DIR/nemoclaw/gateway-token (/tmp/.runtime/nemoclaw/gateway-token) as sandbox:sandbox 0644, fully readable by the agent. It also exported OPENCLAW_GATEWAY_TOKEN to the shell env.

Fix: pass the token only on the gateway launch line (same pattern as root mode) so it lives solely in the gateway process env. Do not write a token file or export the env var.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
In non-root mode, write the gateway token to $XDG_RUNTIME_DIR/nemoclaw/gateway-token with chmod 0400. This is no worse than the pre-PR state (token was in sandbox-readable openclaw.json) and gives host-side retrieval a working path without requiring kubectl exec.

The token is still not exported to the shell env; it is passed only on the gateway launch line.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Non-root mode security posture (for adversarial reviewer context)
The automated reviewer has correctly identified that in non-root mode, the sandbox user can still read the gateway token. This is expected and intentional. Here's why:
Root mode (production deployment): full isolation
Non-root mode (dev/fallback): no uid isolation, but hardened vs pre-PR
What the reviewer should expect
Non-root mode is a small subset of deployments (development environments, restricted container runtimes that don't allow root entrypoints). The production path through Docker + OpenShell always runs root. |
fetchGatewayAuthTokenFromSandbox() now tries three paths:
1. kubectl exec cat /run/nemoclaw/gateway-token (root mode)
2. sandbox download /tmp/.runtime/nemoclaw/gateway-token (non-root)
3. sandbox download openclaw.json (pre-externalization legacy)

The non-root token file is sandbox-owned (0400) so openshell sandbox download (which runs as the sandbox user via SSH) can read it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
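The three-step fallback order can be expressed as a small generic helper. This sketch is in Python and deliberately leaves out the actual `kubectl` and `openshell` invocations; the function name `first_available_token`, the source labels, and the callables are hypothetical stand-ins for the three retrieval paths listed above.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional

# A source is any zero-argument callable that returns a token or None.
TokenSource = Callable[[], Optional[str]]


def first_available_token(
    sources: Iterable[tuple[str, TokenSource]],
) -> Optional[tuple[str, str]]:
    """Try each named source in order; return (name, token) for the first hit."""
    for name, source in sources:
        try:
            token = (source() or "").strip()
        except Exception:
            # A failing source (command missing, file unreadable) is not fatal:
            # fall through to the next one.
            continue
        if token:
            return name, token
    return None
```

A caller would pass the sources in priority order (root-mode file, non-root file, legacy config), and returning the source name alongside the token makes it easy to log which path succeeded.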
Token retrieval: dual-path architecture (reviewer guidance)
The reviewer's environment runs the entrypoint as non-root, which uses a different token storage path than root mode. The retrieval code now handles both:
Token file locations by mode
What
Update all docs, Dockerfile comments, and entrypoint comments to describe both token file locations:
- Root mode: /run/nemoclaw/gateway-token (gateway:gateway 0400)
- Non-root mode: $XDG_RUNTIME_DIR/nemoclaw/gateway-token (sandbox:sandbox 0400)

Regenerate agent skills from updated docs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add regression tests covering the reviewer's recommended follow-up:
- openclaw.json never contains a non-empty token at build time
- Gateway process runs under distinct uid (gosu gateway)
- Shell env and rc files never receive the token
- Non-root token file uses 0400 permissions
- Stale token file is removed before rewrite
- Host-side retrieval tries three paths in correct order (kubectl exec → non-root download → legacy openclaw.json)
- Entrypoint documents both root and non-root token paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ion and Slack guard Keep PR's generate_gateway_token() and file-based _read_gateway_token(), add main's Slack channel guard (install_slack_channel_guard), and use PR's test regex expecting generate_gateway_token in the root path. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
  }
}

function findFileRecursive(dir, filename) {
Add TypeScript type information.
    const token = result.toString().trim();
    if (token.length > 0) return token;
  } catch {
    // kubectl exec not available or file absent — fall through
Log something in case
Merge main into remove-config-immutability, resolving conflicts from PRs #2324 (SSRF blocklist), #2378 (gateway token externalization), and various CI/docs/test updates.

Key resolution decisions:
- Keep mutable-by-default model (PR #2227) for config layout
- Keep gateway token externalization (PR #2378) in Dockerfile and start scripts
- Remove config set/rotate-token from sandbox-config.ts (PR #2227 removes host-side mutation)
- Keep SSRF blocklist in private-networks.ts (PR #2324) for remaining consumers
- Update sandbox-provisioning tests for the unified .openclaw layout
- Delete test-skip-permissions-policy.sh (feature removed by PR #2227)
## Summary
- Reverts 51aa6af (`feat(security): externalize gateway auth token from openclaw.json (#2378)`)
- The externalized token path breaks `openclaw tui` inside the sandbox: OpenClaw 2026.4.9 requires `OPENCLAW_GATEWAY_TOKEN` but the runtime injection fails under Landlock (non-root mode) and the token is no longer in `openclaw.json` where the TUI and gateway can read it
- Restores build-time token generation in `openclaw.json` so gateways authenticate out-of-the-box again
- The token externalization will be re-introduced in a separate PR with deeper testing across root/non-root modes and OpenClaw 2026.4.9

Fixes #2480

## Test plan
- [x] `npm run typecheck:cli` passes
- [x] `npx vitest run --project cli`: 2110 tests pass
- [x] All pre-commit and pre-push hooks pass
- [ ] Verify `openclaw tui` works inside sandbox after rebuild
- [ ] Verify gateway auth works on Spark (non-root mode)
- [ ] Verify gateway auth works in root mode

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit
* **Documentation**
  * Clarified security guidance: gateway auth tokens are stored in the sandbox configuration and risk notes updated.
* **Changes**
  * Token generation moved earlier in the image/build process so auth is present in the sandbox config at runtime.
  * Runtime token retrieval simplified and connection instructions updated.
  * Gateway token is exported to an environment variable and persisted/removed in users' shell profiles.
* **Tests**
  * Tests updated to validate token export, persistence, and retrieval behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
…IDIA#2378)

## Summary

Move the gateway auth token out of `openclaw.json` (readable by the sandbox agent) and generate it at container startup instead. The token is now written to a runtime file and passed via the `OPENCLAW_GATEWAY_TOKEN` env var only to the gateway process.

**Root mode (full privilege separation):** The token is written to `/run/nemoclaw/gateway-token` owned by `gateway:gateway 0400`. The sandbox user cannot access it: the file is owned by a different uid, the env var is only in the gateway process (`/proc/pid/environ` is uid-gated), and `no-new-privileges` prevents escalation.

**Non-root mode (privilege separation disabled):** The token is written to `/tmp/.runtime/nemoclaw/gateway-token` owned by `sandbox:sandbox`. Because privilege separation is not available in this mode, the sandbox user can read the token file directly. This is a known limitation consistent with the existing non-root security posture; other trust-boundary files in `/tmp` have the same caveat (see `sandbox-init.sh` trust boundary map). The improvement in non-root mode is that the token no longer appears in `openclaw.json`, shell rc files, or the process environment.

## Changes

- **Dockerfile**: Build `openclaw.json` with an empty auth token placeholder; remove `secrets` import
- **scripts/nemoclaw-start.sh**: Replace `export_gateway_token()` (which wrote tokens into `.bashrc`/`.profile`) with `generate_gateway_token()` that writes to `/run/nemoclaw/gateway-token` and pass token via env var only to the gateway process launch line
- **src/lib/onboard.ts**: `fetchGatewayAuthTokenFromSandbox()` now reads from `/run/nemoclaw/gateway-token` first, falls back to `openclaw.json` for pre-externalization images; update dashboard token guidance
- **src/nemoclaw.ts**: Remove `MIN_LOGS_OPENSHELL_VERSION` gating and `printOldLogsCompatibilityGuidance()`; simplify `sandboxLogs()`
- **docs/security/best-practices.md**: Document the externalized token architecture and update risk descriptions
- **spark-install.md**: Update token location guidance
- **test/cli.test.ts**, **test/nemoclaw-start.test.ts**: Update tests for new token flow

Also includes unrelated changes already merged to main:

- ci(nightly-e2e): replace per-job failure check with wildcard `contains()`
- refactor(e2e): centralize timeout/gtimeout handling into `e2e-timeout.sh`
- fix: stream sandbox logs via tail
- chore(deps): bump setup-uv, setup-qemu-action
- fix(e2e): make nightly teardown reliable
- fix(dockerfile): drop invalid `channels.defaults.configWrites`

## Type of Change

- [x] Code change (feature, bug fix, or refactor)
- [x] Code change with doc updates

## Verification

- [x] `npx prek run --all-files` passes
- [x] `npm test` passes (branch-modified tests: 127/127 pass)
- [x] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes

## AI Disclosure

- [x] AI-assisted — tool: Claude Code

---

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
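For readers who want to see the startup flow in one place, here is a minimal sketch of it in Python. It is not the code in `scripts/nemoclaw-start.sh`: the helper names, the use of `subprocess`, and the omission of the `chown` to `gateway:gateway` (which needs root) are all assumptions made for illustration. Only `secrets.token_hex(32)`, the `0400` mode, and the `OPENCLAW_GATEWAY_TOKEN` variable name come from the description above.

```python
import os
import secrets
import subprocess
from pathlib import Path
from typing import Sequence


def generate_gateway_token(token_path: Path) -> str:
    """Create a random token and store it in an owner-read-only file.

    Hypothetical helper: the real entrypoint would also chown the file to the
    gateway user, which is left out here because it requires root.
    """
    token = secrets.token_hex(32)  # 32 random bytes, rendered as 64 hex chars
    token_path.parent.mkdir(parents=True, exist_ok=True)
    # Remove any earlier token file; a 0400 file cannot be reopened for writing.
    token_path.unlink(missing_ok=True)
    # Create with 0600 so the token is never briefly group/world-readable.
    fd = os.open(token_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(token)
    os.chmod(token_path, 0o400)  # tighten to read-only for the owner
    return token


def launch_with_token(argv: Sequence[str], token: str) -> subprocess.CompletedProcess:
    """Run a child process with the token in its environment only.

    The parent's own environment is copied, not modified, so the token does
    not leak into other processes started later by the parent.
    """
    child_env = dict(os.environ, OPENCLAW_GATEWAY_TOKEN=token)
    return subprocess.run(
        list(argv), env=child_env, capture_output=True, text=True, check=True
    )
```

Calling `generate_gateway_token(Path("/run/nemoclaw/gateway-token"))` and then `launch_with_token([...], token)` approximates the root-mode flow. Passing the variable through a per-child environment rather than `os.environ` matches the "only on the gateway process launch line" wording.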
## Summary

- Reverts 51aa6af (`feat(security): externalize gateway auth token from openclaw.json (NVIDIA#2378)`)
- The externalized token path breaks `openclaw tui` inside the sandbox: OpenClaw 2026.4.9 requires `OPENCLAW_GATEWAY_TOKEN`, but the runtime injection fails under Landlock (non-root mode) and the token is no longer in `openclaw.json` where the TUI and gateway can read it
- Restores build-time token generation in `openclaw.json` so gateways authenticate out-of-the-box again
- The token externalization will be re-introduced in a separate PR with deeper testing across root/non-root modes and OpenClaw 2026.4.9

Fixes NVIDIA#2480

## Test plan

- [x] `npm run typecheck:cli` passes
- [x] `npx vitest run --project cli` — 2110 tests pass
- [x] All pre-commit and pre-push hooks pass
- [ ] Verify `openclaw tui` works inside sandbox after rebuild
- [ ] Verify gateway auth works on Spark (non-root mode)
- [ ] Verify gateway auth works in root mode

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **Documentation**
  * Clarified security guidance: gateway auth tokens are stored in the sandbox configuration and risk notes updated.
* **Changes**
  * Token generation moved earlier in the image/build process so auth is present in the sandbox config at runtime.
  * Runtime token retrieval simplified and connection instructions updated.
  * Gateway token is exported to an environment variable and persisted/removed in users' shell profiles.
* **Tests**
  * Tests updated to validate token export, persistence, and retrieval behavior.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
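The failure this revert describes is a token lookup that comes back empty from every source. Below is a minimal sketch of such a lookup with fallbacks. It follows the token-file-then-`openclaw.json` order given for `fetchGatewayAuthTokenFromSandbox()`. The env var check in front, the function name, and its signature are assumptions. This is not OpenClaw or NemoClaw code.

```python
import json
from pathlib import Path
from typing import Mapping, Optional


def resolve_gateway_token(
    env: Mapping[str, str],
    token_file: Path,
    config_file: Path,
) -> Optional[str]:
    """Return the first non-empty token found, or None if no source has one.

    Hypothetical resolver for illustration only.
    """
    # 1. Environment variable (assumed to be checked first).
    token = env.get("OPENCLAW_GATEWAY_TOKEN", "").strip()
    if token:
        return token

    # 2. Runtime token file; a missing or unreadable file is skipped.
    try:
        token = token_file.read_text().strip()
        if token:
            return token
    except OSError:
        pass

    # 3. Legacy location: gateway.auth.token inside openclaw.json.
    try:
        config = json.loads(config_file.read_text())
        token = str(config.get("gateway", {}).get("auth", {}).get("token") or "")
    except (OSError, ValueError, AttributeError):
        token = ""
    return token.strip() or None
```

With an empty `gateway.auth.token` placeholder, no readable token file, and no env var, the function returns `None`. That is the state `openclaw tui` ended up in under non-root mode, and restoring the build-time token makes step 3 succeed again.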