fix(security): revert gateway auth token externalization #2482
Reverts 51aa6af. The externalized token path breaks `openclaw tui` inside the sandbox — OpenClaw 2026.4.9 requires OPENCLAW_GATEWAY_TOKEN but the runtime injection fails under Landlock (non-root) and the token is no longer in openclaw.json where the TUI can read it. Restores build-time token generation in openclaw.json so gateways authenticate out-of-the-box again. The externalization will be re-introduced in a separate PR with deeper testing. Fixes #2480
No actionable comments were generated in the recent review.
Walkthrough
Gateway token handling was changed: a per-build random token is embedded into …
Sequence Diagram(s)
sequenceDiagram
actor Build as Build Time
participant Docker as Dockerfile
participant Config as /sandbox/.openclaw/openclaw.json
participant StartSh as scripts/nemoclaw-start.sh
participant RcFiles as .bashrc/.profile
participant UserShell as User Interactive Shell
participant TUI as openclaw tui
Build->>Docker: generate per-build random token (secrets.token_hex(32))
Docker->>Config: embed token in openclaw.json (gateway.auth.token)
Note over StartSh,Config: container starts
StartSh->>Config: _read_gateway_token() parses gateway.auth.token
Config-->>StartSh: token value
StartSh->>StartSh: export OPENCLAW_GATEWAY_TOKEN
StartSh->>RcFiles: write/remove marked export blocks via export_gateway_token()
RcFiles->>UserShell: rc files sourced on new shell
UserShell->>TUI: openclaw tui (reads $OPENCLAW_GATEWAY_TOKEN)
TUI-->>TUI: gateway authentication proceeds
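The build-time embed and startup read shown in the diagram can be sketched roughly as follows. This is a minimal Python illustration, not the project's code: the real read happens in the shell function `_read_gateway_token()`, and the helper names `embed_gateway_token` and `read_gateway_token` below are hypothetical. Only `secrets.token_hex(32)` and the `gateway.auth.token` path come from the diagram.

```python
import json
import secrets


def embed_gateway_token(config: dict) -> dict:
    """Build-time step: store a fresh random token at gateway.auth.token."""
    auth = config.setdefault("gateway", {}).setdefault("auth", {})
    auth["token"] = secrets.token_hex(32)  # 32 random bytes as 64 hex characters
    return config


def read_gateway_token(config_text: str) -> str:
    """Startup step: return gateway.auth.token, or "" if it cannot be read."""
    try:
        node = json.loads(config_text)
    except json.JSONDecodeError:
        return ""
    for key in ("gateway", "auth", "token"):
        if not isinstance(node, dict):
            return ""
        node = node.get(key)
    return node if isinstance(node, str) else ""


if __name__ == "__main__":
    config_text = json.dumps(embed_gateway_token({}), indent=2)
    token = read_gateway_token(config_text)
    # The start script would export this value as OPENCLAW_GATEWAY_TOKEN.
    print(f"token length: {len(token)}")
```

Returning an empty string on a malformed or missing value keeps the caller's decision simple: export when non-empty, otherwise skip.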
Estimated code review effort: 4 (Complex), ~60 minutes
Pre-merge checks: 4 passed, 1 failed (1 warning)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Dockerfile`:
- Around line 230-232: The ARG NEMOCLAW_BUILD_ID is declared but never used, so
changing it does not invalidate the token-generation layer; update the
token-generation layer that creates the gateway token (the "token-generation"
RUN/step) to consume NEMOCLAW_BUILD_ID (e.g., reference it in that RUN via ENV
or a no-op echo/printf) so Docker sees the build-arg changes and busts the
cache; ensure you reference ARG NEMOCLAW_BUILD_ID before the token-generation
RUN and use the variable name NEMOCLAW_BUILD_ID in that step so token
regeneration runs on each build-arg change.
In `@scripts/nemoclaw-start.sh`:
- Around line 621-660: The startup currently aborts if writing
${_SANDBOX_HOME}/.bashrc or .profile fails when persisting
OPENCLAW_GATEWAY_TOKEN (snippet using marker_begin/marker_end), which breaks
non-root/sandboxed runs; change the logic to make rc-file writes best-effort by
routing token persistence through the existing /tmp sourced-file pattern (create
a /tmp/openclaw-env-<uid>.sh containing the snippet and ensure rc files source
that file if writable), and if you must directly update ${_SANDBOX_HOME}/.bashrc
or .profile only attempt writes when they are writable and swallow failures (do
not let errors from cat >"$rc_file" or printf >>"$rc_file" abort startup),
leaving the export OPENCLAW_GATEWAY_TOKEN="$token" in the current process
unconditional so gateway startup never depends on rc file writes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 9b2a9d79-dfe8-4da3-93e1-2c11cc9ba0b2
📒 Files selected for processing (6)
- .agents/skills/nemoclaw-user-configure-security/references/best-practices.md
- Dockerfile
- docs/security/best-practices.md
- scripts/nemoclaw-start.sh
- src/lib/onboard.ts
- test/nemoclaw-start.test.ts
…writes
The reverted `export_gateway_token` code predates the Landlock fix in a54f9a3 and lacks `|| true` guards on .bashrc/.profile writes. Under Landlock enforcement, the DAC check (`[ -w file ]`) passes but the actual write is blocked, which crashes the entrypoint under `set -e`. This is the same failure pattern that caused the 5-day non-root outage. Apply the same `|| true` + continue pattern used in `install_configure_guard`.
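The guarded-write idea can be illustrated with a short Python sketch. It is an analogue of the shell pattern, not the script itself: the marker strings and the function names `strip_marked_block` and `persist_token_best_effort` are hypothetical, and catching `OSError` stands in for the `|| true` guard. The point it shows is that a writability pre-check is not trusted; the write is attempted and any failure is reported rather than raised.

```python
import os
import tempfile

# Hypothetical marker strings; the real script defines its own begin/end markers.
MARKER_BEGIN = "# >>> nemoclaw gateway token >>>"
MARKER_END = "# <<< nemoclaw gateway token <<<"


def strip_marked_block(text: str) -> str:
    """Remove a previously written marker block from rc-file text."""
    kept, skipping = [], False
    for line in text.splitlines():
        if line == MARKER_BEGIN:
            skipping = True
        elif line == MARKER_END:
            skipping = False
        elif not skipping:
            kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")


def persist_token_best_effort(rc_path: str, token: str) -> bool:
    """Try to (re)write the export block; return False instead of raising."""
    block = f'{MARKER_BEGIN}\nexport OPENCLAW_GATEWAY_TOKEN="{token}"\n{MARKER_END}\n'
    try:
        existing = ""
        if os.path.exists(rc_path):
            with open(rc_path, encoding="utf-8") as fh:
                existing = fh.read()
        with open(rc_path, "w", encoding="utf-8") as fh:
            fh.write(strip_marked_block(existing) + block)
        return True
    except OSError:
        # A permission pre-check can pass while the write itself is denied,
        # so the failure is handled here and startup carries on.
        return False


def export_gateway_token(token: str, rc_paths: list[str]) -> list[str]:
    """Export in-process unconditionally, then persist wherever possible."""
    os.environ["OPENCLAW_GATEWAY_TOKEN"] = token
    return [p for p in rc_paths if not persist_token_best_effort(p, token)]


if __name__ == "__main__":
    home = tempfile.mkdtemp()
    writable = os.path.join(home, ".bashrc")
    blocked = os.path.join(home, "no-such-dir", ".profile")
    print("skipped:", export_gateway_token("deadbeef", [writable, blocked]))
```

The environment export happens before any file is touched, so gateway startup in this sketch never depends on rc-file persistence.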
NEMOCLAW_BUILD_ID was declared as an ARG but never referenced by any downstream instruction, so changing it via --build-arg had no effect on Docker layer caching. Reference it on the token-generation RUN line so Docker sees the value change and invalidates the cached layer, ensuring each build produces a fresh gateway auth token. Pre-existing issue surfaced by CodeRabbit review.
…d cache (#2483)

## Summary

- Fixes 4x build time regression on Spark (400s+ → ~100s) caused by `NEMOCLAW_BUILD_ID` cache-busting the config generation layer, which invalidated the expensive `openclaw doctor --fix` + `openclaw plugins install` layer on every build
- Splits token generation into two steps: config layer writes a placeholder (cacheable), then a late layer injects `secrets.token_hex(32)` (cache-busted but trivially fast)
- The doctor/plugins layer no longer rebuilds on every build

Depends on #2482

## Test plan

- [x] `npx vitest run --project cli`: 1947 tests pass (ssrf-parity skip is pre-existing, needs plugin build)
- [x] All pre-commit and pre-push hooks pass
- [ ] Verify build time improvement on Spark

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

* **Chores**
  * Optimized Docker image build layers to improve caching efficiency while ensuring unique credentials are generated for each build.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
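The two-step split described in this commit can be sketched in Python under some assumptions: the placeholder string and both function names below are hypothetical, and the real Dockerfile steps may differ. The sketch shows why the first step is cacheable (its output is byte-identical across builds) while the second is cheap to re-run.

```python
import json
import secrets

# Hypothetical sentinel; the actual placeholder used by the build is not shown in the PR text.
PLACEHOLDER = "__GATEWAY_TOKEN_PLACEHOLDER__"


def render_config_with_placeholder() -> str:
    """Early step: deterministic output, so a layer built from it can be cached."""
    config = {"gateway": {"auth": {"token": PLACEHOLDER}}}
    return json.dumps(config, indent=2, sort_keys=True)


def inject_token(config_text: str) -> str:
    """Late step: replace the placeholder with a fresh per-build token."""
    config = json.loads(config_text)
    auth = config["gateway"]["auth"]
    if auth["token"] != PLACEHOLDER:
        raise ValueError("placeholder not found; refusing to overwrite a real token")
    auth["token"] = secrets.token_hex(32)
    return json.dumps(config, indent=2, sort_keys=True)


if __name__ == "__main__":
    cached = render_config_with_placeholder()
    final = inject_token(cached)
    print(json.loads(final)["gateway"]["auth"]["token"] != PLACEHOLDER)
```

Refusing to overwrite a non-placeholder value is a design choice of this sketch; it makes an accidental double injection fail loudly instead of silently rotating the token.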
Resolves conflicts in Dockerfile and test/nemoclaw-start.test.ts.

- Dockerfile config-generation block: kept the externalized scripts/generate-openclaw-config.py invocation (the PR's purpose) and dropped the inline python3 -c block from main.
- Dockerfile token step: dropped the PR's --clear-token step and took main's late-layer secrets.token_hex(32) injection (#2482 reverted gateway auth token externalization, so the token is again baked at build time).
- scripts/generate-openclaw-config.py: ported the inference_inputs parsing (#2441) and channel healthMonitor field from main; removed the now-obsolete --clear-token mode.
- test/nemoclaw-start.test.ts: took main's version, since the PR's token-externalization regression tests no longer match main's reverted design.
- test/generate-openclaw-config.test.ts: removed the --clear-token test cases.
## Summary

Refreshes user-facing docs for the last 24 hours of merged NemoClaw history and bumps the docs metadata to 0.0.29, the next version after v0.0.28. The updates are limited to behavior supported by merged PR descriptions and diffs.

## Changes

- `docs/reference/commands.md`: documented `nemoclaw <name> policy-add --from-file` and `--from-dir`, including custom preset review guidance, from #2077 / commit `7720b175`.
- `docs/deployment/deploy-to-remote-gpu.md`: clarified that non-loopback `CHAT_UI_URL` disables OpenClaw device pairing for remote browser-only deployments, from #2449 / commit `f5ee8a4d`.
- `docs/inference/inference-options.md`: documented provider-aware credential retry validation and the NVIDIA-only `nvapi-` prefix check, from #2389 / commit `6f7f0c6d`.
- `docs/inference/switch-inference-providers.md`: documented `NEMOCLAW_INFERENCE_INPUTS` for text/image-capable model metadata baked into `openclaw.json`, from #2441 / commit `f4391892`.
- `docs/reference/troubleshooting.md`: added the Git certificate verification entry for proxy CA propagation through `GIT_SSL_CAINFO`, `GIT_SSL_CAPATH`, `CURL_CA_BUNDLE`, and `REQUESTS_CA_BUNDLE`, from #2345 / commit `fa0dc1ab`.
- `docs/versions1.json` and `docs/project.json`: promoted docs version `0.0.29`; `docs/versions1.json` omits unpublished `0.0.26`, `0.0.27`, and `0.0.28` entries.
- `.agents/skills/nemoclaw-user-*`: regenerated derived user skill references from the updated docs.
- Reviewed with no extra doc changes: #2575 / `d392ec07`, #2565 / `a3231049`, #1965 / `db1ef3ca`, #1990 / `db665834`, #2495 / `7da86fa3`, #2496 / `3192f4f4`, #2490 / `8c209058`, #2487 / `1f615e2f`, #2483 / `5653d33a`, #2482 / `31c782c0`, #2464 / `23bb5703`, #2472 / `a54f9a34`, and #2437 / `6bc860d7`.
- Skipped per docs policy: #2420 / `7b76df6b` touched the experimental sandbox config path listed in `docs/.docs-skip`; #2466 / `cc15689c` touched a skipped term and CI-only sandbox image files.

## Type of Change

- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [x] Doc only (includes code sample changes)

## Verification

<!-- Check each item you ran and confirmed. Leave unchecked items you skipped. -->
- [x] `npx prek run --all-files` passes
- [ ] `npm test` passes (failed locally in installer-integration tests and one onboard helper timeout; the doc-scoped hook test projects passed under `prek`)
- [ ] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [ ] `make docs` builds without warnings (doc changes only; build succeeded, but local Sphinx emitted the existing version-switcher file read message)
- [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

## AI Disclosure

<!-- If an AI agent authored or co-authored this PR, check the box and name the tool. Remove this section for fully human-authored PRs. -->
- [x] AI-assisted, tool: Codex

---
<!-- DCO sign-off required by CI. Run: git config user.name && git config user.email -->
Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

* **New Features**
  * Support for custom YAML presets in policy configuration via --from-file and --from-dir.
  * New build-time inference input option to declare accepted modalities (text or text,image).
* **Improvements**
  * Credential validation now offers interactive recovery: re-enter key, retry, choose another provider, or exit.
  * Clarified provider-specific API key prefix handling (nvapi- only applies to NVIDIA keys).
* **Documentation**
  * TLS certificate troubleshooting for inspected networks.
  * Clarified remote dashboard security/device-pairing behavior; command docs updated; docs version bumped.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>
…#2615)

## Summary

Add automated E2E test recommendations to PR reviews and selective job dispatch to the nightly E2E workflow. Closes #2564 (Phases 1–3).

## What changed

### 1. CodeRabbit `path_instructions` for E2E recommendations (`.coderabbit.yaml`)

15 new `path_instructions` entries map sensitive file paths to the nightly E2E jobs that exercise them. When a PR touches a mapped path, CodeRabbit posts a review comment recommending specific jobs and a copy-pasteable `gh workflow run` command.

| Path Pattern | Recommended Jobs |
|-------------|-----------------|
| `scripts/nemoclaw-start.sh`, `scripts/lib/sandbox-init.sh` | `sandbox-survival-e2e`, `sandbox-operations-e2e`, `cloud-e2e` |
| `Dockerfile`, `Dockerfile.base` | `cloud-e2e`, `sandbox-survival-e2e`, `hermes-e2e`, `rebuild-openclaw-e2e` |
| `nemoclaw-blueprint/scripts/http-proxy-fix.js` | `cloud-e2e`, `inference-routing-e2e` |
| `src/lib/onboard.ts` | `cloud-e2e`, `sandbox-operations-e2e`, `rebuild-openclaw-e2e` |
| `src/nemoclaw.ts` | `sandbox-survival-e2e`, `sandbox-operations-e2e`, `skip-permissions-e2e` |
| `src/lib/cluster-image-patch.ts`, `src/lib/preflight.ts` | `overlayfs-autofix-e2e` |
| `src/lib/deploy.ts` | `deployment-services-e2e` |
| `src/lib/sandbox-state.ts` | `snapshot-commands-e2e`, `rebuild-openclaw-e2e` |
| `src/lib/shields*.ts` | `shields-config-e2e` |
| `agents/hermes/**` | `hermes-e2e`, `rebuild-hermes-e2e` |
| `nemoclaw-blueprint/policies/**` | `network-policy-e2e`, `skip-permissions-e2e` |
| `.github/workflows/nightly-e2e.yaml` | Reminds to add CodeRabbit coverage for new jobs |

### 2. Selective job dispatch (`nightly-e2e.yaml`)

Added a `jobs` input to `workflow_dispatch` so maintainers can run a subset of nightly jobs on any branch:

```
gh workflow run nightly-e2e.yaml --ref <branch> -f jobs=sandbox-survival-e2e,sandbox-operations-e2e
```

- All 18 E2E jobs get a conditional guard: unselected jobs are skipped
- Empty `jobs` input (or scheduled runs) still runs everything
- `notify-on-failure` is unaffected: skipped jobs produce `result: 'skipped'`, not `'failure'`

### 3. Cross-validation test (`test/validate-e2e-coverage.test.ts`)

Keeps the mapping up to date as files and jobs evolve:

| Assertion | What it catches |
|-----------|----------------|
| Job names in CodeRabbit match `nightly-e2e.yaml` | Renamed/removed jobs |
| Path globs match at least one file on disk | Renamed/deleted source files |
| Every nightly job has selective dispatch guard | New jobs added without the `if:` pattern |
| Advisory: nightly jobs with no CodeRabbit coverage | New jobs added without `path_instructions` |

## Validation

- [x] All 4 cross-validation tests pass locally
- [x] Existing `validate-config-schemas` tests still pass
- [x] Selective dispatch validated: [run 25052625486](https://github.com/NVIDIA/NemoClaw/actions/runs/25052625486), triggered with `-f jobs=diagnostics-e2e`, 17/18 jobs correctly skipped
- [x] `notify-on-failure` does not false-alarm on selective run: [run 25052625486](https://github.com/NVIDIA/NemoClaw/actions/runs/25052625486) confirmed `notify-on-failure` was skipped (not triggered)
- [ ] CodeRabbit posts recommendations on a PR touching a mapped file (post-merge validation)

## Context

- Issue: #2564
- Weekend incident: #2471, #2472, #2482, #2490
- E2E strategy: `cloud-experimental-e2e` removal in #2472 left a coverage gap that would have been flagged by these recommendations

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

* **Chores**
  * Expanded review automation to map sensitive paths to targeted nightly E2E jobs and inject instructions for running relevant subsets.
  * Added manual workflow dispatch allowing selective E2E job execution via a jobs input.
* **New Features**
  * Added a reporting step that, on manual runs, posts a PR comment summarizing passed/failed/skipped E2E jobs.
* **Tests**
  * Added a validation suite that cross-checks review-to-workflow mappings and dispatch guards, warning on uncovered jobs.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

### 4. Substring match fix (`nightly-e2e.yaml`)

CodeRabbit review correctly identified that `contains(inputs.jobs, 'cloud-e2e')` performs substring matching, so a job whose name is a substring of another selected name is matched as well (for example, passing `jobs=rebuild-hermes-e2e` would also run `hermes-e2e`). All 18 job guards now use delimiter-wrapping:

```yaml
contains(format(',{0},', inputs.jobs), ',<job-name>,')
```

This ensures exact token matching within the comma-separated input. The cross-validation test was updated to enforce the new pattern.
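The difference between the two guard expressions can be checked with a small Python model. This is only an illustration of the string logic, not workflow code: the function names are hypothetical, and Python's `in` operator stands in for the `contains(search, item)` substring test on strings.

```python
def substring_guard(jobs_input: str, job_name: str) -> bool:
    """Models contains(inputs.jobs, '<job-name>'): true if the name is a substring."""
    return job_name in jobs_input


def exact_guard(jobs_input: str, job_name: str) -> bool:
    """Models contains(format(',{0},', inputs.jobs), ',<job-name>,')."""
    return f",{job_name}," in f",{jobs_input},"


def should_run(jobs_input: str, job_name: str) -> bool:
    """Empty input selects every job; otherwise require an exact list entry."""
    return jobs_input == "" or exact_guard(jobs_input, job_name)


if __name__ == "__main__":
    requested = "rebuild-hermes-e2e"
    # The substring form also selects hermes-e2e, which was not requested.
    print(substring_guard(requested, "hermes-e2e"))
    print(exact_guard(requested, "hermes-e2e"))
    print(should_run("cloud-e2e,hermes-e2e", "hermes-e2e"))
```

Wrapping both sides in commas turns the substring test into a whole-entry test. The model does not trim whitespace, so an input such as `a, b` would not match `b`.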
## Summary

`scripts/brev-launchable-ci-cpu.sh` is the community install path for Brev users: it bootstraps a VM with Docker, Node.js, OpenShell, and NemoClaw. **That script already exists in the repo but has zero CI coverage.** This PR adds a nightly E2E smoke test that validates the script works end-to-end.

This is the long-living safety net for the community install flow. If any regression breaks the launchable script (e.g., the Apr 20–25 Brev outage from #2472/#2482, or the container reachability fallback from #2425), this test catches it before community users are affected.

## Related Issue

Closes #2599

Related: #2425 (the `isProxyHealthy()` fallback in PR #2453; if that regresses, onboard will abort on Brev and this smoke test catches it)

## Changes

### New: `test/e2e/test-launchable-smoke.sh`

| Phase | What it validates |
|-------|-------------------|
| 0 | Pre-cleanup + pre-seed clone directory from checkout |
| 1 | Prerequisites (Docker, NVIDIA_API_KEY, network, env vars) |
| 2 | Run `brev-launchable-ci-cpu.sh`, the existing community bootstrap script |
| 3 | Verify artifacts (nemoclaw, openshell, Node.js, Docker, sentinel file, built outputs) |
| 4 | `nemoclaw onboard --non-interactive` with cloud provider |
| 5 | Sandbox health (list, status, inference config, gateway) |
| 6 | Live inference (direct API, routing via inference.local, openclaw agent 6×7=42) |
| 7 | Destroy + cleanup |

Key design decisions:

- **No BREV_API_TOKEN needed**: the launchable script is a generic Ubuntu bootstrap with zero Brev dependencies, so it runs on standard GitHub-hosted `ubuntu-latest` runners
- **Tests current code, not main**: pre-seeds the clone directory from the CI checkout so regressions are caught before reaching community users
- **Follows existing E2E conventions**: pass/fail/section helpers, e2e-timeout.sh self-wrap, sandbox-teardown.sh EXIT trap, parse_chat_content() for reasoning models

### Modified: `.github/workflows/nightly-e2e.yaml`

- Added `launchable-smoke-e2e` job: `ubuntu-latest`, 30min timeout, `NVIDIA_API_KEY` secret
- Uploads install/onboard/test logs as artifacts on failure
- Added to `notify-on-failure` needs list

## Validation

Triggered via fork dispatch (`jyaunches/NemoClaw` → `sparky-dispatch` → `launchable-smoke`):

- **Run:** https://github.com/jyaunches/NemoClaw/actions/runs/25075715342
- **Result:** ✅ 24 passed, 0 failed, 1 skipped (Node.js version; GH runner pre-installs Node 20)
- **Runtime:** ~12 minutes

## Type of Change

- [x] Code change (feature, bug fix, or refactor)

## Checklist

- [x] Follows project coding conventions
- [x] Tests pass locally or in CI
- [x] No secrets/credentials committed

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

* **New Features**
  * Added an end-to-end smoke test and CI job that validates the community launchable CPU install path (install, onboarding, runtime readiness, and a simple inference check). CI now uploads install/onboard/test logs on failures.
* **Chores**
  * Renamed the branch-validation workflow and corresponding test-suite identifiers for clarity.
  * Updated E2E test documentation and project configuration names to match the new labeling.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## Summary

- Reverts 51aa6af (`feat(security): externalize gateway auth token from openclaw.json (NVIDIA#2378)`)
- The externalized token path breaks `openclaw tui` inside the sandbox: OpenClaw 2026.4.9 requires `OPENCLAW_GATEWAY_TOKEN`, but the runtime injection fails under Landlock (non-root mode) and the token is no longer in `openclaw.json` where the TUI and gateway can read it
- Restores build-time token generation in `openclaw.json` so gateways authenticate out-of-the-box again
- The token externalization will be re-introduced in a separate PR with deeper testing across root/non-root modes and OpenClaw 2026.4.9

Fixes NVIDIA#2480

## Test plan

- [x] `npm run typecheck:cli` passes
- [x] `npx vitest run --project cli`: 2110 tests pass
- [x] All pre-commit and pre-push hooks pass
- [ ] Verify `openclaw tui` works inside sandbox after rebuild
- [ ] Verify gateway auth works on Spark (non-root mode)
- [ ] Verify gateway auth works in root mode

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit

* **Documentation**
  * Clarified security guidance: gateway auth tokens are stored in the sandbox configuration and risk notes updated.
* **Changes**
  * Token generation moved earlier in the image/build process so auth is present in the sandbox config at runtime.
  * Runtime token retrieval simplified and connection instructions updated.
  * Gateway token is exported to an environment variable and persisted/removed in users' shell profiles.
* **Tests**
  * Tests updated to validate token export, persistence, and retrieval behavior.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
<!-- markdownlint-disable MD041 -->

## Summary

Adds the `test-non-root-sandbox-smoke` test from #2571, a PR-gate job that runs the production image under `--security-opt no-new-privileges` to catch #2472 and #2482 regressions, without OpenShell, NVIDIA_API_KEY, or live inference.

## Related Issue

Part of #2571

## Changes

- New `test/e2e-non-root-smoke.sh` (host-side bash, no `openshell`/`nemoclaw` CLI required):
  - **Test 1**: entrypoint setup chain completes cleanly under `--security-opt no-new-privileges` (regression guard for #2472; passes a `true` command via the entrypoint's `NEMOCLAW_CMD` exec path so the gateway-launch branch is bypassed and we don't need the OpenShell-managed runtime).
  - **Test 2**: kernel confirms `NoNewPrivs=1` inside the container (defends the test itself against silent typos in the docker flag).
- New job `test-non-root-sandbox-smoke` in `.github/workflows/pr-self-hosted.yaml`: `linux-amd64-cpu4`, `timeout-minutes: 5`, `needs: build-sandbox-images`, reuses the existing `isolation-image` artifact.
- Expected results:

  ```
  my-machine@ab1-cdf40-30:~/NemoClaw$ # Run script
  bash test/e2e-non-root-smoke.sh
  TEST: 1. Entrypoint setup chain completes under --security-opt no-new-privileges
  PASS: entrypoint exited 0 under no-new-privileges (#2472 setup chain healthy)
  TEST: 2. Kernel confirms NoNewPrivs=1 inside container (defends against silent flag typos)
  PASS: kernel confirms NoNewPrivs=1
  ========================================
  Results: 2 passed, 0 failed
  ========================================
  ```

- Upcoming plans:
  - **Test 3**: `openclaw tui` does not error with "Missing gateway auth token" inside a login shell under the same constraint (regression guard for #2482), to be added after PR #2485 is merged

## Type of Change

- [x] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification

<!-- Check each item you ran and confirmed. Leave unchecked items you skipped. Doc-only changes do not require npm test unless you ran it. -->

- [ ] `npx prek run --all-files` passes
- [ ] `npm test` passes
- [ ] Tests added or updated for new or changed behavior
- [ ] No secrets, API keys, or credentials committed
- [ ] Docs updated for user-facing behavior changes
- [ ] `make docs` builds without warnings (doc changes only)
- [ ] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

---

<!-- DCO sign-off required by CI. Run: git config user.name && git config user.email -->

Signed-off-by: Hung Le <hple@nvidia.com>
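
The `NoNewPrivs=1` check described as Test 2 relies on the `NoNewPrivs` field that Linux exposes in `/proc/<pid>/status`. As a rough illustration only (this is not the code in `test/e2e-non-root-smoke.sh`, and the helper name is hypothetical), the field could be read like this:

```shell
# Hypothetical helper: print the NoNewPrivs value from a
# /proc/<pid>/status-style file (defaults to the current process).
# Returns non-zero when the field is absent.
no_new_privs_value() {
  status_file="${1:-/proc/self/status}"
  awk '$1 == "NoNewPrivs:" { print $2; found = 1 } END { exit !found }' "$status_file"
}
```

Inside a container started with `--security-opt no-new-privileges`, the helper would be expected to print `1`. A test that asserts on this value fails if the flag is misspelled or dropped, which is the failure mode Test 2 is described as guarding against.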
Summary
- Reverts 51aa6af (feat(security): externalize gateway auth token from openclaw.json (#2378))
- The externalized token path breaks `openclaw tui` inside the sandbox: OpenClaw 2026.4.9 requires `OPENCLAW_GATEWAY_TOKEN`, but the runtime injection fails under Landlock (non-root mode) and the token is no longer in `openclaw.json` where the TUI and gateway can read it
- Restores build-time token generation in `openclaw.json` so gateways authenticate out-of-the-box again

Fixes #2480
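
With the token back in `openclaw.json`, a start script can read `gateway.auth.token` and export it as `OPENCLAW_GATEWAY_TOKEN` for tools such as the TUI. The following is a minimal sketch of that flow, not the PR's actual implementation: the function names are hypothetical, the default path is the one shown in the walkthrough, and the `sed` extraction assumes the file contains a single `"token": "<value>"` pair rather than parsing JSON properly.

```shell
# Hypothetical sketch: print the first "token" value found in an
# openclaw.json-style file. Assumes one "token": "<value>" pair.
read_gateway_token_sketch() {
  config="${1:-/sandbox/.openclaw/openclaw.json}"
  [ -r "$config" ] || return 1
  token="$(sed -n 's/.*"token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$config" | head -n 1)"
  [ -n "$token" ] || return 1
  printf '%s\n' "$token"
}

# Hypothetical sketch: export the token so child processes can read
# it from the environment. Leaves the environment untouched on failure.
export_gateway_token_sketch() {
  value="$(read_gateway_token_sketch "$1")" || return 1
  export OPENCLAW_GATEWAY_TOKEN="$value"
}
```

A production script would more likely use a real JSON parser; the pattern match is used here only to keep the sketch free of dependencies.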
Test plan
- `npm run typecheck:cli` passes
- `npx vitest run --project cli`: 2110 tests pass
- `openclaw tui` works inside sandbox after rebuild

Summary by CodeRabbit
Documentation
Changes
Tests