fix(security): revert gateway auth token externalization by ericksoa · Pull Request #2482 · NVIDIA/NemoClaw

ericksoa · 2026-04-25T21:20:45Z

Summary

- Reverts 51aa6af (`feat(security): externalize gateway auth token from openclaw.json (#2378)`)
- The externalized token path breaks `openclaw tui` inside the sandbox: OpenClaw 2026.4.9 requires `OPENCLAW_GATEWAY_TOKEN`, but the runtime injection fails under Landlock (non-root mode), and the token is no longer in `openclaw.json`, where the TUI and gateway can read it.
- Restores build-time token generation in `openclaw.json` so gateways authenticate out of the box again.
- The token externalization will be re-introduced in a separate PR with deeper testing across root/non-root modes and OpenClaw 2026.4.9.
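For readers unfamiliar with the restored design, here is a minimal Python sketch of build-time token embedding. It assumes `openclaw.json` is plain JSON and that the token lives at `gateway.auth.token`, the key path named in the walkthrough. `embed_gateway_token` is a hypothetical name; the Dockerfile's actual step is an inline `python3` invocation that is not shown in this PR page.

```python
import json
import secrets
from pathlib import Path


def embed_gateway_token(config_path: Path) -> str:
    """Write a fresh random token into gateway.auth.token and return it.

    Hypothetical helper: it only illustrates baking a secret into
    openclaw.json at image build time, preserving other config keys.
    """
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    # 32 random bytes, hex-encoded: a 64-character string.
    token = secrets.token_hex(32)
    config.setdefault("gateway", {}).setdefault("auth", {})["token"] = token
    config_path.write_text(json.dumps(config, indent=2))
    return token
```

The trade-off is that anything able to read `openclaw.json` inside the sandbox can also read the token. The updated security best-practices docs record that risk.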

Test plan

- [x] `npm run typecheck:cli` passes
- [x] `npx vitest run --project cli`: 2110 tests pass
- [x] All pre-commit and pre-push hooks pass
- [ ] Verify `openclaw tui` works inside the sandbox after rebuild
- [ ] Verify gateway auth works on Spark (non-root mode)
- [ ] Verify gateway auth works in root mode

Summary by CodeRabbit

Documentation
- Clarified security guidance: gateway auth tokens are stored in the sandbox configuration and risk notes updated.
Changes
- Token generation moved earlier in the image/build process so auth is present in the sandbox config at runtime.
- Runtime token retrieval simplified and connection instructions updated.
- Gateway token is exported to an environment variable and persisted/removed in users' shell profiles.
Tests
- Tests updated to validate token export, persistence, and retrieval behavior.

Fixes #2480

coderabbitai · 2026-04-25T21:20:55Z

No actionable comments were generated in the recent review. 🎉

Walkthrough

Gateway token handling was changed: a per-build random token is embedded into /sandbox/.openclaw/openclaw.json at image build. Runtime reads gateway.auth.token from that file and exports OPENCLAW_GATEWAY_TOKEN (persisting to user rc files) instead of creating a separate external token file; host-side retrieval relies only on the config path.
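The runtime read step can be sketched in Python as follows. `_read_gateway_token()` in `scripts/nemoclaw-start.sh` is a shell function that, per the test summary, uses a Python `with open(...)` read. The function below is a hypothetical stand-alone equivalent, not that code.

```python
import json
from pathlib import Path


def read_gateway_token(config_path: Path) -> str:
    """Return gateway.auth.token from openclaw.json, or "" if unavailable."""
    try:
        with open(config_path, encoding="utf-8") as f:
            config = json.load(f)
    except (OSError, ValueError):
        # Missing/unreadable file or malformed JSON: treat as "no token".
        return ""
    node = config
    for key in ("gateway", "auth", "token"):
        if not isinstance(node, dict):
            return ""
        node = node.get(key, "")
    return node if isinstance(node, str) else ""
```

Because every failure path returns `""`, a caller can fall back to unsetting `OPENCLAW_GATEWAY_TOKEN` instead of crashing startup. This mirrors the empty-token behavior the tests check.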

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation & Build: `.agents/skills/nemoclaw-user-configure-security/references/best-practices.md`, `docs/security/best-practices.md`, `Dockerfile` | Docs updated to state tokens reside in `.openclaw/openclaw.json`. Dockerfile now generates and embeds a per-build random gateway token (`secrets.token_hex(32)`) into `openclaw.json`, removing runtime token-generation/cleanup steps and related comments. |
| Runtime / Startup Script: `scripts/nemoclaw-start.sh` | Replaced the external token file flow with `_read_gateway_token()`, which parses `gateway.auth.token` from `/sandbox/.openclaw/openclaw.json`. Added `export_gateway_token()` to export `OPENCLAW_GATEWAY_TOKEN` and persist/remove marked export blocks in `${_SANDBOX_HOME}/.bashrc` and `${_SANDBOX_HOME}/.profile`; startup flows updated to call it. |
| Host-side Onboard Logic: `src/lib/onboard.ts` | Removed kubectl-exec and temp-file search fallbacks; `fetchGatewayAuthTokenFromSandbox` now uses only the openclaw.json download path. Updated fallback help text to instruct manual `jq` extraction from `/sandbox/.openclaw/openclaw.json`. |
| Tests: `test/nemoclaw-start.test.ts` | Reworked tests to validate `export_gateway_token` behavior: rc-file marker persistence/removal, shared `_read_gateway_token()` usage, Python `with open(...)` read, shell-escaping, empty-token unset behavior, and updated startup sequencing expectations. |
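The "persist/remove marked export blocks" behavior is essentially an idempotent upsert of a delimited block in a text file. The Python sketch below illustrates the idea. The marker strings and function names are hypothetical, because the real markers in `nemoclaw-start.sh` are not shown here, and `shlex.quote` stands in for whatever shell escaping the script uses.

```python
import shlex

# Hypothetical marker lines; the script's real markers are not shown in this PR.
MARKER_BEGIN = "# >>> openclaw gateway token >>>"
MARKER_END = "# <<< openclaw gateway token <<<"


def strip_token_block(text: str) -> str:
    """Remove any previously written marked block from rc-file text."""
    out, inside = [], False
    for line in text.splitlines(keepends=True):
        stripped = line.rstrip("\n")
        if stripped == MARKER_BEGIN:
            inside = True
            continue
        if stripped == MARKER_END:
            inside = False
            continue
        if not inside:
            out.append(line)
    return "".join(out)


def update_rc_text(text: str, token: str) -> str:
    """Return rc text with exactly one export block, or none if token is empty."""
    cleaned = strip_token_block(text)
    if not token:
        # Empty token: leave no stale export behind.
        return cleaned
    if cleaned and not cleaned.endswith("\n"):
        cleaned += "\n"
    block = (
        f"{MARKER_BEGIN}\n"
        f"export OPENCLAW_GATEWAY_TOKEN={shlex.quote(token)}\n"
        f"{MARKER_END}\n"
    )
    return cleaned + block
```

Stripping before appending makes repeated container starts rewrite the block instead of stacking duplicates. A `secrets.token_hex` value never needs quoting, so the escaping is defensive.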

Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Build as Build Time
    participant Docker as Dockerfile
    participant Config as /sandbox/.openclaw/openclaw.json
    participant StartSh as scripts/nemoclaw-start.sh
    participant RcFiles as .bashrc/.profile
    participant UserShell as User Interactive Shell
    participant TUI as openclaw tui

    Build->>Docker: generate per-build random token (secrets.token_hex(32))
    Docker->>Config: embed token in openclaw.json (gateway.auth.token)

    Note over StartSh,Config: container starts
    StartSh->>Config: _read_gateway_token() parses gateway.auth.token
    Config-->>StartSh: token value
    StartSh->>StartSh: export OPENCLAW_GATEWAY_TOKEN
    StartSh->>RcFiles: write/remove marked export blocks via export_gateway_token()
    RcFiles->>UserShell: rc files sourced on new shell
    UserShell->>TUI: openclaw tui (reads $OPENCLAW_GATEWAY_TOKEN)
    TUI-->>TUI: gateway authentication proceeds
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

feat(security): externalize gateway auth token from openclaw.json #2378: Conflicting gateway-token handling changes—externalizes token to /run/nemoclaw/gateway-token versus this PR’s embedding in /sandbox/.openclaw/openclaw.json, touching Dockerfile, scripts, onboard logic, docs, and tests.

Suggested labels

security

Suggested reviewers

brandonpelfrey

Poem

🐰 A tiny token tucked in JSON bright,
Built at image time in the quiet night.
At boot I hop out, export with care,
I nest in rc files so shells find me there,
OpenClaw tui greets me—now we're square. 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33%, which is below the required 80.00% threshold. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(security): revert gateway auth token externalization' directly summarizes the main change: reverting a previous externalization of the gateway auth token. |
| Linked Issues check | ✅ Passed | The PR addresses issue #2480 by restoring build-time token generation in openclaw.json, ensuring the token is available for `openclaw tui` and the gateway to authenticate without manual intervention. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to reverting externalized gateway token handling: documentation updates, Dockerfile changes, token reading logic, and test updates align directly with the fix objective. |


coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Dockerfile`:
- Around line 230-232: The ARG NEMOCLAW_BUILD_ID is declared but never used, so
changing it does not invalidate the token-generation layer; update the
token-generation layer that creates the gateway token (the "token-generation"
RUN/step) to consume NEMOCLAW_BUILD_ID (e.g., reference it in that RUN via ENV
or a no-op echo/printf) so Docker sees the build-arg changes and busts the
cache; ensure you reference ARG NEMOCLAW_BUILD_ID before the token-generation
RUN and use the variable name NEMOCLAW_BUILD_ID in that step so token
regeneration runs on each build-arg change.

In `@scripts/nemoclaw-start.sh`:
- Around line 621-660: The startup currently aborts if writing
${_SANDBOX_HOME}/.bashrc or .profile fails when persisting
OPENCLAW_GATEWAY_TOKEN (snippet using marker_begin/marker_end), which breaks
non-root/sandboxed runs; change the logic to make rc-file writes best-effort by
routing token persistence through the existing /tmp sourced-file pattern (create
a /tmp/openclaw-env-<uid>.sh containing the snippet and ensure rc files source
that file if writable), and if you must directly update ${_SANDBOX_HOME}/.bashrc
or .profile only attempt writes when they are writable and swallow failures (do
not let errors from cat >"$rc_file" or printf >>"$rc_file" abort startup),
leaving the export OPENCLAW_GATEWAY_TOKEN="$token" in the current process
unconditional so gateway startup never depends on rc file writes.


…writes The reverted export_gateway_token code predates the Landlock fix in a54f9a3 and lacks || true guards on .bashrc/.profile writes. Under Landlock enforcement, DAC check ([ -w file ]) passes but the actual write is blocked, crashing the entrypoint under set -e — the exact same failure pattern that caused the 5-day non-root outage. Apply the same || true + continue pattern used in install_configure_guard.

NEMOCLAW_BUILD_ID was declared as an ARG but never referenced by any downstream instruction, so changing it via --build-arg had no effect on Docker layer caching. Reference it on the token-generation RUN line so Docker sees the value change and invalidates the cached layer, ensuring each build produces a fresh gateway auth token. Pre-existing issue surfaced by CodeRabbit review.

…d cache (#2483)

## Summary

- Fixes 4x build time regression on Spark (400s+ → ~100s) caused by `NEMOCLAW_BUILD_ID` cache-busting the config generation layer, which invalidated the expensive `openclaw doctor --fix` + `openclaw plugins install` layer on every build
- Splits token generation into two steps: config layer writes a placeholder (cacheable), then a late layer injects `secrets.token_hex(32)` (cache-busted but trivially fast)
- The doctor/plugins layer no longer rebuilds on every build

Depends on #2482

## Test plan

- [x] `npx vitest run --project cli`: 1947 tests pass (ssrf-parity skip is pre-existing, needs plugin build)
- [x] All pre-commit and pre-push hooks pass
- [ ] Verify build time improvement on Spark

## Summary by CodeRabbit

* **Chores**
  * Optimized Docker image build layers to improve caching efficiency while ensuring unique credentials are generated for each build.
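The two-step split works because the config-generation step no longer depends on `NEMOCLAW_BUILD_ID`. That step, and the expensive layers built on top of it, can therefore be served from cache, and only the final injection step is invalidated per build. A minimal Python sketch of the two steps follows. The placeholder string and function names are hypothetical, since the PR does not say what value the config layer writes.

```python
import json
import secrets
from pathlib import Path

# Hypothetical placeholder; the real value written by the config layer is not shown.
PLACEHOLDER = "__GATEWAY_TOKEN_PLACEHOLDER__"


def write_config_with_placeholder(path: Path, base: dict) -> None:
    """Early, cacheable step: output is identical on every build."""
    config = json.loads(json.dumps(base))  # deep copy via JSON round-trip
    config.setdefault("gateway", {}).setdefault("auth", {})["token"] = PLACEHOLDER
    path.write_text(json.dumps(config, indent=2, sort_keys=True))


def inject_real_token(path: Path) -> str:
    """Late, cache-busted step: swap the placeholder for a fresh token."""
    config = json.loads(path.read_text())
    auth = config["gateway"]["auth"]
    if auth.get("token") != PLACEHOLDER:
        raise ValueError("expected placeholder token in config")
    auth["token"] = secrets.token_hex(32)
    path.write_text(json.dumps(config, indent=2, sort_keys=True))
    return auth["token"]
```

Refusing to inject when the placeholder is missing turns a silently reused or already-populated token into a loud build failure. That is a design choice in this sketch, not something the PR states.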

Resolves conflicts in Dockerfile and test/nemoclaw-start.test.ts.

- Dockerfile config-generation block: kept the externalized scripts/generate-openclaw-config.py invocation (the PR's purpose) and dropped the inline python3 -c block from main.
- Dockerfile token step: dropped the PR's --clear-token step and took main's late-layer secrets.token_hex(32) injection (#2482 reverted gateway auth token externalization, so the token is again baked at build time).
- scripts/generate-openclaw-config.py: ported the inference_inputs parsing (#2441) and channel healthMonitor field from main; removed the now-obsolete --clear-token mode.
- test/nemoclaw-start.test.ts: took main's version, since the PR's token-externalization regression tests no longer match main's reverted design.
- test/generate-openclaw-config.test.ts: removed the --clear-token test cases.

## Summary

Refreshes user-facing docs for the last 24 hours of merged NemoClaw history and bumps the docs metadata to 0.0.29, the next version after v0.0.28. The updates are limited to behavior supported by merged PR descriptions and diffs.

## Changes

- `docs/reference/commands.md`: documented `nemoclaw <name> policy-add --from-file` and `--from-dir`, including custom preset review guidance, from #2077 / commit `7720b175`.
- `docs/deployment/deploy-to-remote-gpu.md`: clarified that non-loopback `CHAT_UI_URL` disables OpenClaw device pairing for remote browser-only deployments, from #2449 / commit `f5ee8a4d`.
- `docs/inference/inference-options.md`: documented provider-aware credential retry validation and the NVIDIA-only `nvapi-` prefix check, from #2389 / commit `6f7f0c6d`.
- `docs/inference/switch-inference-providers.md`: documented `NEMOCLAW_INFERENCE_INPUTS` for text/image-capable model metadata baked into `openclaw.json`, from #2441 / commit `f4391892`.
- `docs/reference/troubleshooting.md`: added the Git certificate verification entry for proxy CA propagation through `GIT_SSL_CAINFO`, `GIT_SSL_CAPATH`, `CURL_CA_BUNDLE`, and `REQUESTS_CA_BUNDLE`, from #2345 / commit `fa0dc1ab`.
- `docs/versions1.json` and `docs/project.json`: promoted docs version `0.0.29`; `docs/versions1.json` omits unpublished `0.0.26`, `0.0.27`, and `0.0.28` entries.
- `.agents/skills/nemoclaw-user-*`: regenerated derived user skill references from the updated docs.
- Reviewed with no extra doc changes: #2575 / `d392ec07`, #2565 / `a3231049`, #1965 / `db1ef3ca`, #1990 / `db665834`, #2495 / `7da86fa3`, #2496 / `3192f4f4`, #2490 / `8c209058`, #2487 / `1f615e2f`, #2483 / `5653d33a`, #2482 / `31c782c0`, #2464 / `23bb5703`, #2472 / `a54f9a34`, and #2437 / `6bc860d7`.
- Skipped per docs policy: #2420 / `7b76df6b` touched the experimental sandbox config path listed in `docs/.docs-skip`; #2466 / `cc15689c` touched a skipped term and CI-only sandbox image files.

## Type of Change

- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [x] Doc only (includes code sample changes)

## Verification

- [x] `npx prek run --all-files` passes
- [ ] `npm test` passes: failed locally in installer-integration tests and one onboard helper timeout; the doc-scoped hook test projects passed under `prek`.
- [ ] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [ ] `make docs` builds without warnings (doc changes only): build succeeded, but local Sphinx emitted the existing version-switcher file read message.
- [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

## AI Disclosure

- [x] AI-assisted (tool: Codex)

---

Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

## Summary by CodeRabbit

* **New Features**
  * Support for custom YAML presets in policy configuration via --from-file and --from-dir.
  * New build-time inference input option to declare accepted modalities (text or text,image).
* **Improvements**
  * Credential validation now offers interactive recovery: re-enter key, retry, choose another provider, or exit.
  * Clarified provider-specific API key prefix handling (nvapi- only applies to NVIDIA keys).
* **Documentation**
  * TLS certificate troubleshooting for inspected networks.
  * Clarified remote dashboard security/device-pairing behavior; command docs updated; docs version bumped.

---------

Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

…#2615)

## Summary

Add automated E2E test recommendations to PR reviews and selective job dispatch to the nightly E2E workflow. Closes #2564 (Phases 1–3).

## What changed

### 1. CodeRabbit `path_instructions` for E2E recommendations (`.coderabbit.yaml`)

15 new `path_instructions` entries map sensitive file paths to the nightly E2E jobs that exercise them. When a PR touches a mapped path, CodeRabbit posts a review comment recommending specific jobs and a copy-pasteable `gh workflow run` command.

| Path Pattern | Recommended Jobs |
|-------------|-----------------|
| `scripts/nemoclaw-start.sh`, `scripts/lib/sandbox-init.sh` | `sandbox-survival-e2e`, `sandbox-operations-e2e`, `cloud-e2e` |
| `Dockerfile`, `Dockerfile.base` | `cloud-e2e`, `sandbox-survival-e2e`, `hermes-e2e`, `rebuild-openclaw-e2e` |
| `nemoclaw-blueprint/scripts/http-proxy-fix.js` | `cloud-e2e`, `inference-routing-e2e` |
| `src/lib/onboard.ts` | `cloud-e2e`, `sandbox-operations-e2e`, `rebuild-openclaw-e2e` |
| `src/nemoclaw.ts` | `sandbox-survival-e2e`, `sandbox-operations-e2e`, `skip-permissions-e2e` |
| `src/lib/cluster-image-patch.ts`, `src/lib/preflight.ts` | `overlayfs-autofix-e2e` |
| `src/lib/deploy.ts` | `deployment-services-e2e` |
| `src/lib/sandbox-state.ts` | `snapshot-commands-e2e`, `rebuild-openclaw-e2e` |
| `src/lib/shields*.ts` | `shields-config-e2e` |
| `agents/hermes/**` | `hermes-e2e`, `rebuild-hermes-e2e` |
| `nemoclaw-blueprint/policies/**` | `network-policy-e2e`, `skip-permissions-e2e` |
| `.github/workflows/nightly-e2e.yaml` | Reminds to add CodeRabbit coverage for new jobs |

### 2. Selective job dispatch (`nightly-e2e.yaml`)

Added a `jobs` input to `workflow_dispatch` so maintainers can run a subset of nightly jobs on any branch:

```
gh workflow run nightly-e2e.yaml --ref <branch> -f jobs=sandbox-survival-e2e,sandbox-operations-e2e
```

- All 18 E2E jobs get a conditional guard: unselected jobs are skipped
- Empty `jobs` input (or scheduled runs) still runs everything
- `notify-on-failure` is unaffected: skipped jobs produce `result: 'skipped'`, not `'failure'`

### 3. Cross-validation test (`test/validate-e2e-coverage.test.ts`)

Keeps the mapping up to date as files and jobs evolve:

| Assertion | What it catches |
|-----------|----------------|
| Job names in CodeRabbit match `nightly-e2e.yaml` | Renamed/removed jobs |
| Path globs match at least one file on disk | Renamed/deleted source files |
| Every nightly job has selective dispatch guard | New jobs added without the `if:` pattern |
| Advisory: nightly jobs with no CodeRabbit coverage | New jobs added without `path_instructions` |

## Validation

- [x] All 4 cross-validation tests pass locally
- [x] Existing `validate-config-schemas` tests still pass
- [x] Selective dispatch validated: [run 25052625486](https://github.com/NVIDIA/NemoClaw/actions/runs/25052625486), triggered with `-f jobs=diagnostics-e2e`, 17/18 jobs correctly skipped
- [x] `notify-on-failure` does not false-alarm on selective run: [run 25052625486](https://github.com/NVIDIA/NemoClaw/actions/runs/25052625486) confirmed `notify-on-failure` was skipped (not triggered)
- [ ] CodeRabbit posts recommendations on a PR touching a mapped file (post-merge validation)

## Context

- Issue: #2564
- Weekend incident: #2471, #2472, #2482, #2490
- E2E strategy: `cloud-experimental-e2e` removal in #2472 left a coverage gap that would have been flagged by these recommendations

## Summary by CodeRabbit

* **Chores**
  * Expanded review automation to map sensitive paths to targeted nightly E2E jobs and inject instructions for running relevant subsets.
  * Added manual workflow dispatch allowing selective E2E job execution via a jobs input.
* **New Features**
  * Added a reporting step that, on manual runs, posts a PR comment summarizing passed/failed/skipped E2E jobs.
* **Tests**
  * Added a validation suite that cross-checks review-to-workflow mappings and dispatch guards, warning on uncovered jobs.

### 4. Substring match fix (`nightly-e2e.yaml`)

CodeRabbit review correctly identified that `contains(inputs.jobs, 'cloud-e2e')` performs substring matching: a job is selected whenever its name appears anywhere inside the input, so passing `jobs=rebuild-hermes-e2e` would also select `hermes-e2e`. All 18 job guards now use delimiter-wrapping:

```yaml
contains(format(',{0},', inputs.jobs), ',<job-name>,')
```

This ensures exact token matching within the comma-separated input. The cross-validation test was updated to enforce the new pattern.
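The difference between the two guard expressions can be modeled in a few lines of Python. The model assumes GitHub Actions' string semantics for `contains(search, item)`, which is true when `item` is a substring of `search`. The function names are hypothetical and do not appear in the workflow.

```python
def naive_selected(jobs_input: str, job: str) -> bool:
    # Models contains(inputs.jobs, '<job>'): plain substring test.
    return job in jobs_input


def exact_selected(jobs_input: str, job: str) -> bool:
    # Models contains(format(',{0},', inputs.jobs), ',<job>,'):
    # wrapping both sides in commas means only whole tokens can match.
    return f",{job}," in f",{jobs_input},"


def should_run(jobs_input: str, job: str) -> bool:
    # Empty input (e.g. scheduled runs) means "run everything".
    return jobs_input == "" or exact_selected(jobs_input, job)
```

One consequence of the delimiter trick is that it assumes no whitespace around the commas. Input such as `cloud-e2e, hermes-e2e` would not match `hermes-e2e` unless the input is normalized first.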

## Summary

`scripts/brev-launchable-ci-cpu.sh` is the community install path for Brev users: it bootstraps a VM with Docker, Node.js, OpenShell, and NemoClaw. **That script already exists in the repo but has zero CI coverage.** This PR adds a nightly E2E smoke test that validates the script works end-to-end.

This is the long-living safety net for the community install flow. If any regression breaks the launchable script (e.g., the Apr 20–25 Brev outage from #2472/#2482, or the container reachability fallback from #2425), this test catches it before community users are affected.

## Related Issue

Closes #2599

Related: #2425 (the `isProxyHealthy()` fallback in PR #2453; if that regresses, onboard will abort on Brev and this smoke test catches it)

## Changes

### New: `test/e2e/test-launchable-smoke.sh`

| Phase | What it validates |
|-------|-------------------|
| 0 | Pre-cleanup + pre-seed clone directory from checkout |
| 1 | Prerequisites (Docker, NVIDIA_API_KEY, network, env vars) |
| 2 | Run `brev-launchable-ci-cpu.sh`, the existing community bootstrap script |
| 3 | Verify artifacts (nemoclaw, openshell, Node.js, Docker, sentinel file, built outputs) |
| 4 | `nemoclaw onboard --non-interactive` with cloud provider |
| 5 | Sandbox health (list, status, inference config, gateway) |
| 6 | Live inference (direct API, routing via inference.local, openclaw agent 6×7=42) |
| 7 | Destroy + cleanup |

Key design decisions:

- **No BREV_API_TOKEN needed**: the launchable script is a generic Ubuntu bootstrap with zero Brev dependencies, so it runs on standard GitHub-hosted `ubuntu-latest` runners
- **Tests current code, not main**: pre-seeds the clone directory from the CI checkout so regressions are caught before reaching community users
- **Follows existing E2E conventions**: pass/fail/section helpers, e2e-timeout.sh self-wrap, sandbox-teardown.sh EXIT trap, parse_chat_content() for reasoning models

### Modified: `.github/workflows/nightly-e2e.yaml`

- Added `launchable-smoke-e2e` job: `ubuntu-latest`, 30min timeout, `NVIDIA_API_KEY` secret
- Uploads install/onboard/test logs as artifacts on failure
- Added to `notify-on-failure` needs list

## Validation

Triggered via fork dispatch (`jyaunches/NemoClaw` → `sparky-dispatch` → `launchable-smoke`):

- **Run:** https://github.com/jyaunches/NemoClaw/actions/runs/25075715342
- **Result:** ✅ 24 passed, 0 failed, 1 skipped (Node.js version; GH runner pre-installs Node 20)
- **Runtime:** ~12 minutes

## Type of Change

- [x] Code change (feature, bug fix, or refactor)

## Checklist

- [x] Follows project coding conventions
- [x] Tests pass locally or in CI
- [x] No secrets/credentials committed

## Summary by CodeRabbit

* **New Features**
  * Added an end-to-end smoke test and CI job that validates the community launchable CPU install path (install, onboarding, runtime readiness, and a simple inference check). CI now uploads install/onboard/test logs on failures.
* **Chores**
  * Renamed the branch-validation workflow and corresponding test-suite identifiers for clarity.
  * Updated E2E test documentation and project configuration names to match the new labeling.

## Summary - Reverts 51aa6af (`feat(security): externalize gateway auth token from openclaw.json (NVIDIA#2378)`) - The externalized token path breaks `openclaw tui` inside the sandbox — OpenClaw 2026.4.9 requires `OPENCLAW_GATEWAY_TOKEN` but the runtime injection fails under Landlock (non-root mode) and the token is no longer in `openclaw.json` where the TUI and gateway can read it - Restores build-time token generation in `openclaw.json` so gateways authenticate out-of-the-box again - The token externalization will be re-introduced in a separate PR with deeper testing across root/non-root modes and OpenClaw 2026.4.9 Fixes NVIDIA#2480 ## Test plan - [x] `npm run typecheck:cli` passes - [x] `npx vitest run --project cli` — 2110 tests pass - [x] All pre-commit and pre-push hooks pass - [ ] Verify `openclaw tui` works inside sandbox after rebuild - [ ] Verify gateway auth works on Spark (non-root mode) - [ ] Verify gateway auth works in root mode  ## Summary by CodeRabbit * **Documentation** * Clarified security guidance: gateway auth tokens are stored in the sandbox configuration and risk notes updated. * **Changes** * Token generation moved earlier in the image/build process so auth is present in the sandbox config at runtime. * Runtime token retrieval simplified and connection instructions updated. * Gateway token is exported to an environment variable and persisted/removed in users' shell profiles. * **Tests** * Tests updated to validate token export, persistence, and retrieval behavior.

…d cache (NVIDIA#2483) ## Summary - Fixes 4x build time regression on Spark (400s+ → ~100s) caused by `NEMOCLAW_BUILD_ID` cache-busting the config generation layer, which invalidated the expensive `openclaw doctor --fix` + `openclaw plugins install` layer on every build - Splits token generation into two steps: config layer writes a placeholder (cacheable), then a late layer injects `secrets.token_hex(32)` (cache-busted but trivially fast) - The doctor/plugins layer no longer rebuilds on every build Depends on NVIDIA#2482 ## Test plan - [x] `npx vitest run --project cli` — 1947 tests pass (ssrf-parity skip is pre-existing, needs plugin build) - [x] All pre-commit and pre-push hooks pass - [ ] Verify build time improvement on Spark  ## Summary by CodeRabbit * **Chores** * Optimized Docker image build layers to improve caching efficiency while ensuring unique credentials are generated for each build.

## Summary Refreshes user-facing docs for the last 24 hours of merged NemoClaw history and bumps the docs metadata to 0.0.29, the next version after v0.0.28. The updates are limited to behavior supported by merged PR descriptions and diffs. ## Changes - `docs/reference/commands.md`: documented `nemoclaw <name> policy-add --from-file` and `--from-dir`, including custom preset review guidance, from NVIDIA#2077 / commit `7720b175`. - `docs/deployment/deploy-to-remote-gpu.md`: clarified that non-loopback `CHAT_UI_URL` disables OpenClaw device pairing for remote browser-only deployments, from NVIDIA#2449 / commit `f5ee8a4d`. - `docs/inference/inference-options.md`: documented provider-aware credential retry validation and the NVIDIA-only `nvapi-` prefix check, from NVIDIA#2389 / commit `6f7f0c6d`. - `docs/inference/switch-inference-providers.md`: documented `NEMOCLAW_INFERENCE_INPUTS` for text/image-capable model metadata baked into `openclaw.json`, from NVIDIA#2441 / commit `f4391892`. - `docs/reference/troubleshooting.md`: added the Git certificate verification entry for proxy CA propagation through `GIT_SSL_CAINFO`, `GIT_SSL_CAPATH`, `CURL_CA_BUNDLE`, and `REQUESTS_CA_BUNDLE`, from NVIDIA#2345 / commit `fa0dc1ab`. - `docs/versions1.json` and `docs/project.json`: promoted docs version `0.0.29`; `docs/versions1.json` omits unpublished `0.0.26`, `0.0.27`, and `0.0.28` entries. - `.agents/skills/nemoclaw-user-*`: regenerated derived user skill references from the updated docs. - Reviewed with no extra doc changes: NVIDIA#2575 / `d392ec07`, NVIDIA#2565 / `a3231049`, NVIDIA#1965 / `db1ef3ca`, NVIDIA#1990 / `db665834`, NVIDIA#2495 / `7da86fa3`, NVIDIA#2496 / `3192f4f4`, NVIDIA#2490 / `8c209058`, NVIDIA#2487 / `1f615e2f`, NVIDIA#2483 / `5653d33a`, NVIDIA#2482 / `31c782c0`, NVIDIA#2464 / `23bb5703`, NVIDIA#2472 / `a54f9a34`, and NVIDIA#2437 / `6bc860d7`. 
- Skipped per docs policy: NVIDIA#2420 / `7b76df6b` touched the experimental sandbox config path listed in `docs/.docs-skip`; NVIDIA#2466 / `cc15689c` touched a skipped term and CI-only sandbox image files. ## Type of Change - [ ] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [x] Doc only (includes code sample changes) ## Verification  - [x] `npx prek run --all-files` passes - [ ] `npm test` passes — failed locally in installer-integration tests and one onboard helper timeout; the doc-scoped hook test projects passed under `prek`. - [ ] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [x] Docs updated for user-facing behavior changes - [ ] `make docs` builds without warnings (doc changes only) — build succeeded, but local Sphinx emitted the existing version-switcher file read message. - [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) ## AI Disclosure  - [x] AI-assisted — tool: Codex ---  Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>  ## Summary by CodeRabbit * **New Features** * Support for custom YAML presets in policy configuration via --from-file and --from-dir. * New build-time inference input option to declare accepted modalities (text or text,image). * **Improvements** * Credential validation now offers interactive recovery: re-enter key, retry, choose another provider, or exit. * Clarified provider-specific API key prefix handling (nvapi- only applies to NVIDIA keys). * **Documentation** * TLS certificate troubleshooting for inspected networks. * Clarified remote dashboard security/device-pairing behavior; command docs updated; docs version bumped.  --------- Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

…NVIDIA#2615) ## Summary Add automated E2E test recommendations to PR reviews and selective job dispatch to the nightly E2E workflow. Closes NVIDIA#2564 (Phases 1–3). ## What changed ### 1. CodeRabbit `path_instructions` for E2E recommendations (`.coderabbit.yaml`) 15 new `path_instructions` entries map sensitive file paths to the nightly E2E jobs that exercise them. When a PR touches a mapped path, CodeRabbit posts a review comment recommending specific jobs and a copy-pasteable `gh workflow run` command. | Path Pattern | Recommended Jobs | |-------------|-----------------| | `scripts/nemoclaw-start.sh`, `scripts/lib/sandbox-init.sh` | `sandbox-survival-e2e`, `sandbox-operations-e2e`, `cloud-e2e` | | `Dockerfile`, `Dockerfile.base` | `cloud-e2e`, `sandbox-survival-e2e`, `hermes-e2e`, `rebuild-openclaw-e2e` | | `nemoclaw-blueprint/scripts/http-proxy-fix.js` | `cloud-e2e`, `inference-routing-e2e` | | `src/lib/onboard.ts` | `cloud-e2e`, `sandbox-operations-e2e`, `rebuild-openclaw-e2e` | | `src/nemoclaw.ts` | `sandbox-survival-e2e`, `sandbox-operations-e2e`, `skip-permissions-e2e` | | `src/lib/cluster-image-patch.ts`, `src/lib/preflight.ts` | `overlayfs-autofix-e2e` | | `src/lib/deploy.ts` | `deployment-services-e2e` | | `src/lib/sandbox-state.ts` | `snapshot-commands-e2e`, `rebuild-openclaw-e2e` | | `src/lib/shields*.ts` | `shields-config-e2e` | | `agents/hermes/**` | `hermes-e2e`, `rebuild-hermes-e2e` | | `nemoclaw-blueprint/policies/**` | `network-policy-e2e`, `skip-permissions-e2e` | | `.github/workflows/nightly-e2e.yaml` | Reminds to add CodeRabbit coverage for new jobs | ### 2. 
### Selective job dispatch (`nightly-e2e.yaml`)

Added a `jobs` input to `workflow_dispatch` so maintainers can run a subset of nightly jobs on any branch:

```
gh workflow run nightly-e2e.yaml --ref <branch> -f jobs=sandbox-survival-e2e,sandbox-operations-e2e
```

- All 18 E2E jobs get a conditional guard, so unselected jobs are skipped.
- An empty `jobs` input (or a scheduled run) still runs everything.
- `notify-on-failure` is unaffected: skipped jobs produce `result: 'skipped'`, not `'failure'`.

### 3. Cross-validation test (`test/validate-e2e-coverage.test.ts`)

Keeps the mapping up to date as files and jobs evolve:

| Assertion | What it catches |
|-----------|----------------|
| Job names in CodeRabbit match `nightly-e2e.yaml` | Renamed/removed jobs |
| Path globs match at least one file on disk | Renamed/deleted source files |
| Every nightly job has selective dispatch guard | New jobs added without the `if:` pattern |
| Advisory: nightly jobs with no CodeRabbit coverage | New jobs added without `path_instructions` |

## Validation

- [x] All 4 cross-validation tests pass locally
- [x] Existing `validate-config-schemas` tests still pass
- [x] Selective dispatch validated: [run 25052625486](https://github.com/NVIDIA/NemoClaw/actions/runs/25052625486), triggered with `-f jobs=diagnostics-e2e`; 17/18 jobs correctly skipped
- [x] `notify-on-failure` does not false-alarm on a selective run: [run 25052625486](https://github.com/NVIDIA/NemoClaw/actions/runs/25052625486) confirmed `notify-on-failure` was skipped (not triggered)
- [ ] CodeRabbit posts recommendations on a PR touching a mapped file (post-merge validation)

## Context

- Issue: NVIDIA#2564
- Weekend incident: NVIDIA#2471, NVIDIA#2472, NVIDIA#2482, NVIDIA#2490
- E2E strategy: the `cloud-experimental-e2e` removal in NVIDIA#2472 left a coverage gap that these recommendations would have flagged

## Summary by CodeRabbit

* **Chores**
  * Expanded review automation to map sensitive paths to targeted nightly E2E jobs and inject instructions for running relevant subsets.
  * Added manual workflow dispatch allowing selective E2E job execution via a `jobs` input.
* **New Features**
  * Added a reporting step that, on manual runs, posts a PR comment summarizing passed/failed/skipped E2E jobs.
* **Tests**
  * Added a validation suite that cross-checks review-to-workflow mappings and dispatch guards, warning on uncovered jobs.

### 4. Substring match fix (`nightly-e2e.yaml`)

CodeRabbit review correctly identified that `contains(inputs.jobs, 'cloud-e2e')` performs substring matching: the guard fires whenever `cloud-e2e` appears anywhere in the input, including inside a longer job name that merely contains it. All 18 job guards now use delimiter wrapping:

```yaml
contains(format(',{0},', inputs.jobs), ',<job-name>,')
```

This ensures exact token matching within the comma-separated input. The cross-validation test was updated to enforce the new pattern.
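
To make the substring problem concrete, here is a hypothetical TypeScript model of the old and new guard styles. It treats `contains()` as a case-insensitive substring test and `format(',{0},', …)` as comma-wrapping, following the GitHub Actions expression docs. The job name `cloud-e2e-extended` is invented for illustration, and the empty-input branch only encodes the PR's statement that an empty `jobs` input runs everything; it is not the workflow's actual YAML.

```typescript
// Hypothetical model of the GitHub Actions expression functions used by the
// nightly-e2e.yaml job guards. A reasoning aid, not the runner's implementation.

// contains(search, item) on strings: substring test that ignores case.
function ghContains(search: string, item: string): boolean {
  return search.toLowerCase().includes(item.toLowerCase());
}

// format(',{0},', jobs): wrap the comma-separated input in leading/trailing commas.
function wrapWithCommas(jobs: string): string {
  return `,${jobs},`;
}

// Old guard: contains(inputs.jobs, '<job-name>').
function naiveGuard(jobsInput: string, jobName: string): boolean {
  return ghContains(jobsInput, jobName);
}

// New guard: contains(format(',{0},', inputs.jobs), ',<job-name>,').
// Assumption: an empty input runs every job, as the PR states; the real YAML
// for that branch is not shown, so this early return is only a stand-in.
function delimitedGuard(jobsInput: string, jobName: string): boolean {
  if (jobsInput === "") return true;
  return ghContains(wrapWithCommas(jobsInput), `,${jobName},`);
}

// "cloud-e2e-extended" is an invented job name used only for illustration.
const selected = "cloud-e2e-extended";
console.log(naiveGuard(selected, "cloud-e2e"));              // true: over-matches
console.log(delimitedGuard(selected, "cloud-e2e"));          // false: exact token only
console.log(delimitedGuard(selected, "cloud-e2e-extended")); // true
```

One consequence of the comma-wrapping form, visible in the model: it matches tokens exactly, so an input written with spaces after the commas (for example `a, cloud-e2e`) would not select `cloud-e2e`.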

## Summary

`scripts/brev-launchable-ci-cpu.sh` is the community install path for Brev users: it bootstraps a VM with Docker, Node.js, OpenShell, and NemoClaw. **That script already exists in the repo but has zero CI coverage.** This PR adds a nightly E2E smoke test that validates the script works end-to-end.

This is the long-living safety net for the community install flow. If any regression breaks the launchable script (e.g., the Apr 20–25 Brev outage from NVIDIA#2472/NVIDIA#2482, or the container reachability fallback from NVIDIA#2425), this test catches it before community users are affected.

## Related Issue

Closes NVIDIA#2599

Related: NVIDIA#2425 (the `isProxyHealthy()` fallback in PR NVIDIA#2453; if that regresses, onboard will abort on Brev and this smoke test catches it)

## Changes

### New: `test/e2e/test-launchable-smoke.sh`

| Phase | What it validates |
|-------|-------------------|
| 0 | Pre-cleanup + pre-seed clone directory from checkout |
| 1 | Prerequisites (Docker, NVIDIA_API_KEY, network, env vars) |
| 2 | Run `brev-launchable-ci-cpu.sh`, the existing community bootstrap script |
| 3 | Verify artifacts (nemoclaw, openshell, Node.js, Docker, sentinel file, built outputs) |
| 4 | `nemoclaw onboard --non-interactive` with cloud provider |
| 5 | Sandbox health (list, status, inference config, gateway) |
| 6 | Live inference (direct API, routing via inference.local, openclaw agent 6×7=42) |
| 7 | Destroy + cleanup |

Key design decisions:

- **No BREV_API_TOKEN needed**: the launchable script is a generic Ubuntu bootstrap with zero Brev dependencies, so it runs on standard GitHub-hosted `ubuntu-latest` runners.
- **Tests current code, not main**: it pre-seeds the clone directory from the CI checkout so regressions are caught before reaching community users.
- **Follows existing E2E conventions**: pass/fail/section helpers, e2e-timeout.sh self-wrap, sandbox-teardown.sh EXIT trap, parse_chat_content() for reasoning models.

### Modified: `.github/workflows/nightly-e2e.yaml`

- Added `launchable-smoke-e2e` job: `ubuntu-latest`, 30-minute timeout, `NVIDIA_API_KEY` secret
- Uploads install/onboard/test logs as artifacts on failure
- Added to `notify-on-failure` needs list

## Validation

Triggered via fork dispatch (`jyaunches/NemoClaw` → `sparky-dispatch` → `launchable-smoke`):

- **Run:** https://github.com/jyaunches/NemoClaw/actions/runs/25075715342
- **Result:** ✅ 24 passed, 0 failed, 1 skipped (Node.js version; the GH runner pre-installs Node 20)
- **Runtime:** ~12 minutes

## Type of Change

- [x] Code change (feature, bug fix, or refactor)

## Checklist

- [x] Follows project coding conventions
- [x] Tests pass locally or in CI
- [x] No secrets/credentials committed

## Summary by CodeRabbit

* **New Features**
  * Added an end-to-end smoke test and CI job that validates the community launchable CPU install path (install, onboarding, runtime readiness, and a simple inference check). CI now uploads install/onboard/test logs on failures.
* **Chores**
  * Renamed the branch-validation workflow and corresponding test-suite identifiers for clarity.
  * Updated E2E test documentation and project configuration names to match the new labeling.
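
Phase 6 checks that the agent answers 6×7 with 42, and the script relies on `parse_chat_content()` to cope with reasoning models. The sketch below is a hypothetical TypeScript analogue of that answer check, not the repo's bash helper. It assumes an OpenAI-compatible chat completion body (`choices[0].message.content`) and that a reasoning model may wrap its reasoning in `<think>...</think>` before the final answer; both the function names and that tag convention are illustrative assumptions.

```typescript
// Hypothetical analogue of the phase-6 answer check; NOT parse_chat_content().
// Assumes an OpenAI-compatible chat completion body, and that a reasoning
// model may put its reasoning inside <think>...</think> before the answer.

interface ChatCompletion {
  choices?: { message?: { content?: string | null } }[];
}

// Return the visible answer text with any <think> blocks removed.
function extractAnswer(body: string): string {
  const parsed = JSON.parse(body) as ChatCompletion;
  const raw = parsed.choices?.[0]?.message?.content ?? "";
  return raw.replace(/<think>[\s\S]*?<\/think>/g, "").trim();
}

// True if the answer contains 42 as a standalone number (not 142 or 420).
function answersFortyTwo(body: string): boolean {
  return /(^|[^0-9])42([^0-9]|$)/.test(extractAnswer(body));
}

const reply = JSON.stringify({
  choices: [{ message: { content: "<think>6 x 7 = 42</think>The answer is 42." } }],
});
console.log(extractAnswer(reply));   // The answer is 42.
console.log(answersFortyTwo(reply)); // true
```

Stripping the reasoning block first matters because a model that writes "42" only in its reasoning but answers something else should fail the check.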

## Summary

Adds the `test-non-root-sandbox-smoke` test from #2571: a PR-gate job that runs the production image under `--security-opt no-new-privileges` to catch #2472 and #2482 regressions, without OpenShell, NVIDIA_API_KEY, or live inference.

## Related Issue

Part of #2571

## Changes

- New `test/e2e-non-root-smoke.sh` (host-side bash, no `openshell`/`nemoclaw` CLI required):
  - **Test 1**: the entrypoint setup chain completes cleanly under `--security-opt no-new-privileges` (regression guard for #2472). It passes a `true` command via the entrypoint's `NEMOCLAW_CMD` exec path, so the gateway-launch branch is bypassed and we don't need the OpenShell-managed runtime.
  - **Test 2**: the kernel confirms `NoNewPrivs=1` inside the container (defends the test itself against silent typos in the docker flag).
- New job `test-non-root-sandbox-smoke` in `.github/workflows/pr-self-hosted.yaml`: `linux-amd64-cpu4`, `timeout-minutes: 5`, `needs: build-sandbox-images`, reuses the existing `isolation-image` artifact.
- Expected results:

  ```
  my-machine@ab1-cdf40-30:~/NemoClaw$ # Run script
  bash test/e2e-non-root-smoke.sh
  TEST: 1. Entrypoint setup chain completes under --security-opt no-new-privileges
  PASS: entrypoint exited 0 under no-new-privileges (#2472 setup chain healthy)
  TEST: 2. Kernel confirms NoNewPrivs=1 inside container (defends against silent flag typos)
  PASS: kernel confirms NoNewPrivs=1
  ========================================
  Results: 2 passed, 0 failed
  ========================================
  ```

- Upcoming plans:
  - **Test 3**: `openclaw tui` does not error with "Missing gateway auth token" inside a login shell under the same constraint (regression guard for #2482), once PR #2485 is merged.

## Type of Change

- [x] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification

- [ ] `npx prek run --all-files` passes
- [ ] `npm test` passes
- [ ] Tests added or updated for new or changed behavior
- [ ] No secrets, API keys, or credentials committed
- [ ] Docs updated for user-facing behavior changes
- [ ] `make docs` builds without warnings (doc changes only)
- [ ] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

---

Signed-off-by: Hung Le <hple@nvidia.com>
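
Test 2 relies on the kernel's own record of the no-new-privileges bit. On Linux 4.10 and later that bit is exposed as the `NoNewPrivs` field of `/proc/<pid>/status`. The sketch below is a hypothetical TypeScript reader for that field; the actual test is a bash script and may perform the check differently, and the function names here are illustrative only.

```typescript
import { readFileSync } from "node:fs";

// Extract the NoNewPrivs flag from the text of /proc/<pid>/status.
// Returns null when the field is missing (e.g., kernels older than 4.10).
function noNewPrivsFromStatus(statusText: string): boolean | null {
  for (const line of statusText.split("\n")) {
    const match = /^NoNewPrivs:\s*(\d+)\s*$/.exec(line);
    if (match) return match[1] === "1";
  }
  return null;
}

// Check the current process; returns null where /proc is unavailable (non-Linux).
function currentProcessHasNoNewPrivs(): boolean | null {
  try {
    return noNewPrivsFromStatus(readFileSync("/proc/self/status", "utf8"));
  } catch {
    return null;
  }
}

console.log(noNewPrivsFromStatus("Name:\tnode\nNoNewPrivs:\t1\nSeccomp:\t2\n")); // true
console.log(currentProcessHasNoNewPrivs());
```

Run inside a container started with `docker run --security-opt no-new-privileges ...`, `currentProcessHasNoNewPrivs()` should report `true`, since the bit is inherited by every process in the container. A typo in the flag would typically leave it at `false`, which is exactly the silent failure Test 2 is meant to catch.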

coderabbitai Bot reviewed Apr 25, 2026

Comment thread Dockerfile

Comment thread scripts/nemoclaw-start.sh Outdated

ericksoa added 2 commits April 25, 2026 14:26

ericksoa merged commit 31c782c into main Apr 25, 2026
39 checks passed

ericksoa mentioned this pull request Apr 25, 2026

perf(dockerfile): move token injection to late layer to preserve build cache #2483

Merged

3 tasks

ericksoa mentioned this pull request Apr 27, 2026

fix: auto-disable device auth for non-loopback URLs (#2341) #2449

Merged

coderabbitai Bot mentioned this pull request Apr 27, 2026

feat(security): runtime gateway token injection #2485

Draft

7 tasks

miyoungc mentioned this pull request Apr 28, 2026

docs: refresh daily docs for 0.0.29 #2576

Merged

13 tasks

This was referenced Apr 28, 2026

ci(e2e): add Brev Launchable install-flow smoke test #2599

Closed

feat(ci): coderabbit E2E recommendations + selective nightly dispatch #2615

Merged

jyaunches mentioned this pull request Apr 29, 2026

test(e2e): add Brev launchable install-flow smoke test #2677

Merged

4 tasks

jyaunches mentioned this pull request Apr 29, 2026

feat(ci): add non-root sandbox smoke test as PR gate #2711

Closed

hunglp6d mentioned this pull request May 7, 2026

test(e2e): add non-root sandbox smoke test #3166

Merged

12 tasks

prekshivyas mentioned this pull request May 12, 2026

[Linux][Security] proxy-env.sh still contains 99 references to deprecated nemoclaw-gateway-token.env #3117

Closed

wscurran added the bug-fix label (PR fixes a bug or regression) Jun 8, 2026

fix(security): revert gateway auth token externalization #2482

ericksoa merged 3 commits into `main` from `revert/gateway-token-externalization`

ericksoa commented Apr 25, 2026 • edited by coderabbitai Bot

coderabbitai Bot commented Apr 25, 2026 • edited

❌ Failed checks (1 warning)

coderabbitai Bot left a comment

2 participants
