
fix(security): revert gateway auth token externalization #2482

Merged
ericksoa merged 3 commits into main from revert/gateway-token-externalization on Apr 25, 2026

ericksoa commented on Apr 25, 2026


Summary

  • Reverts 51aa6af (feat(security): externalize gateway auth token from openclaw.json (#2378))
  • The externalized token path breaks openclaw tui inside the sandbox: OpenClaw 2026.4.9 requires OPENCLAW_GATEWAY_TOKEN, but the runtime injection fails under Landlock (non-root mode), and the token is no longer in openclaw.json, where the TUI and gateway can read it
  • Restores build-time token generation in openclaw.json so gateways authenticate out-of-the-box again
  • The token externalization will be re-introduced in a separate PR with deeper testing across root/non-root modes and OpenClaw 2026.4.9

Fixes #2480

Test plan

  • [x] npm run typecheck:cli passes
  • [x] npx vitest run --project cli: 2110 tests pass
  • [x] All pre-commit and pre-push hooks pass
  • [ ] Verify openclaw tui works inside sandbox after rebuild
  • [ ] Verify gateway auth works on Spark (non-root mode)
  • [ ] Verify gateway auth works in root mode

Summary by CodeRabbit

  • Documentation

    • Clarified security guidance: gateway auth tokens are stored in the sandbox configuration and risk notes updated.
  • Changes

    • Token generation moved earlier in the image/build process so auth is present in the sandbox config at runtime.
    • Runtime token retrieval simplified and connection instructions updated.
    • Gateway token is exported to an environment variable and persisted/removed in users' shell profiles.
  • Tests

    • Tests updated to validate token export, persistence, and retrieval behavior.

Reverts 51aa6af. The externalized token path breaks `openclaw tui`
inside the sandbox — OpenClaw 2026.4.9 requires OPENCLAW_GATEWAY_TOKEN
but the runtime injection fails under Landlock (non-root) and the token
is no longer in openclaw.json where the TUI can read it.

Restores build-time token generation in openclaw.json so gateways
authenticate out-of-the-box again. The externalization will be
re-introduced in a separate PR with deeper testing.

Fixes #2480

coderabbitai Bot commented Apr 25, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 618627f2-b577-4814-aa8c-c3c5e421c14e

📥 Commits

Reviewing files that changed from the base of the PR and between 4752d10 and 72ca3a0.

📒 Files selected for processing (1)
  • Dockerfile
🚧 Files skipped from review as they are similar to previous changes (1)
  • Dockerfile

📝 Walkthrough


Gateway token handling was changed: a per-build random token is embedded into /sandbox/.openclaw/openclaw.json at image build. Runtime reads gateway.auth.token from that file and exports OPENCLAW_GATEWAY_TOKEN (persisting to user rc files) instead of creating a separate external token file; host-side retrieval relies only on the config path.
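The runtime read-and-export step described in this walkthrough can be sketched in shell. This is a minimal illustration, not the contents of scripts/nemoclaw-start.sh: the function names `read_gateway_token_sketch` and `export_token_sketch` are hypothetical, and the sketch assumes `python3` is available in the sandbox (the walkthrough notes that `_read_gateway_token()` uses a Python `with open(...)` read).

```shell
#!/usr/bin/env bash
# Hypothetical sketch: read gateway.auth.token from an openclaw.json-style
# config and export it as OPENCLAW_GATEWAY_TOKEN for child processes.

# Print gateway.auth.token from the given config file, or an empty line if
# the file or key is missing or unparsable. Assumes python3 is installed.
read_gateway_token_sketch() {
  local config="${1:-/sandbox/.openclaw/openclaw.json}"
  python3 - "$config" <<'PY' 2>/dev/null || true
import json, sys
try:
    with open(sys.argv[1]) as f:
        cfg = json.load(f)
    token = cfg.get("gateway", {}).get("auth", {}).get("token", "")
    print(token if isinstance(token, str) else "")
except (OSError, ValueError, AttributeError):
    pass
PY
}

# Export the token in the current shell; unset the variable when no token
# is found, mirroring the empty-token unset behavior the tests describe.
export_token_sketch() {
  local token
  token="$(read_gateway_token_sketch "$1")"
  if [ -n "$token" ]; then
    export OPENCLAW_GATEWAY_TOKEN="$token"
  else
    unset OPENCLAW_GATEWAY_TOKEN
  fi
}
```

Because the export happens in the current process, anything started afterwards (such as the gateway or `openclaw tui`) inherits the variable without needing to read the config itself.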

Changes

| Cohort | File(s) | Summary |
|--------|---------|---------|
| Documentation & Build | `.agents/skills/nemoclaw-user-configure-security/references/best-practices.md`, `docs/security/best-practices.md`, `Dockerfile` | Docs updated to state tokens reside in `.openclaw/openclaw.json`. Dockerfile now generates and embeds a per-build random gateway token (`secrets.token_hex(32)`) into `openclaw.json`, removing runtime token-generation/cleanup steps and related comments. |
| Runtime / Startup Script | `scripts/nemoclaw-start.sh` | Replaced external token file flow with `_read_gateway_token()` that parses `gateway.auth.token` from `/sandbox/.openclaw/openclaw.json`. Added `export_gateway_token()` to export `OPENCLAW_GATEWAY_TOKEN` and persist/remove marked export blocks in `${_SANDBOX_HOME}/.bashrc` and `${_SANDBOX_HOME}/.profile`; startup flows updated to call this. |
| Host-side Onboard Logic | `src/lib/onboard.ts` | Removed kubectl-exec and temp-file search fallbacks; `fetchGatewayAuthTokenFromSandbox` now uses only the `openclaw.json` download path. Updated fallback help text to instruct manual `jq` extraction from `/sandbox/.openclaw/openclaw.json`. |
| Tests | `test/nemoclaw-start.test.ts` | Reworked tests to validate `export_gateway_token` behavior: rc-file marker persistence/removal, shared `_read_gateway_token()` usage, Python `with open(...)` read, shell-escaping, empty-token unset behavior, and updated startup sequencing expectations. |
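Python's `secrets.token_hex(32)` returns 32 random bytes encoded as 64 hex characters. A shell equivalent of the build-time step, useful for checking what the Dockerfile produces, might look like the sketch below. `gen_token_hex` and `write_config_sketch` are hypothetical names, and the JSON written here is a minimal stand-in for the real `openclaw.json`, which contains much more.

```shell
# Hypothetical sketch of the build-time step: generate 32 random bytes,
# hex-encode them (64 chars, like Python's secrets.token_hex(32)), and
# embed the result as gateway.auth.token in a minimal JSON config.

gen_token_hex() {
  # -An: no address column; -N32: read 32 bytes; -tx1: one hex byte per field.
  # tr strips the spaces and newlines od puts between fields and lines.
  od -An -N32 -tx1 /dev/urandom | tr -d ' \n'
}

write_config_sketch() {
  local out="$1" token
  token="$(gen_token_hex)"
  # Hex digits need no JSON escaping, so printf is safe here.
  printf '{"gateway":{"auth":{"token":"%s"}}}\n' "$token" > "$out"
}
```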

Sequence Diagram(s)

sequenceDiagram
    actor Build as Build Time
    participant Docker as Dockerfile
    participant Config as /sandbox/.openclaw/openclaw.json
    participant StartSh as scripts/nemoclaw-start.sh
    participant RcFiles as .bashrc/.profile
    participant UserShell as User Interactive Shell
    participant TUI as openclaw tui

    Build->>Docker: generate per-build random token (secrets.token_hex(32))
    Docker->>Config: embed token in openclaw.json (gateway.auth.token)

    Note over StartSh,Config: container starts
    StartSh->>Config: _read_gateway_token() parses gateway.auth.token
    Config-->>StartSh: token value
    StartSh->>StartSh: export OPENCLAW_GATEWAY_TOKEN
    StartSh->>RcFiles: write/remove marked export blocks via export_gateway_token()
    RcFiles->>UserShell: rc files sourced on new shell
    UserShell->>TUI: openclaw tui (reads $OPENCLAW_GATEWAY_TOKEN)
    TUI-->>TUI: gateway authentication proceeds

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

security

Suggested reviewers

  • brandonpelfrey

Poem

🐰 A tiny token tucked in JSON bright,
Built at image time in the quiet night.
At boot I hop out, export with care,
I nest in rc files so shells find me there,
OpenClaw tui greets me—now we're square. 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(security): revert gateway auth token externalization' directly summarizes the main change: reverting a previous externalization of the gateway auth token. |
| Linked Issues check | ✅ Passed | The PR addresses issue #2480 by restoring build-time token generation in openclaw.json, ensuring the token is available for openclaw tui and the gateway to authenticate without manual intervention. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to reverting externalized gateway token handling: documentation updates, Dockerfile changes, token reading logic, and test updates align directly with the fix objective. |


coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Dockerfile`:
- Around lines 230-232: The ARG NEMOCLAW_BUILD_ID is declared but never used, so
changing it does not invalidate the token-generation layer; update the
token-generation layer that creates the gateway token (the "token-generation"
RUN/step) to consume NEMOCLAW_BUILD_ID (e.g., reference it in that RUN via ENV
or a no-op echo/printf) so Docker sees the build-arg changes and busts the
cache; ensure you reference ARG NEMOCLAW_BUILD_ID before the token-generation
RUN and use the variable name NEMOCLAW_BUILD_ID in that step so token
regeneration runs on each build-arg change.

In `@scripts/nemoclaw-start.sh`:
- Around lines 621-660: The startup currently aborts if writing
${_SANDBOX_HOME}/.bashrc or .profile fails when persisting
OPENCLAW_GATEWAY_TOKEN (snippet using marker_begin/marker_end), which breaks
non-root/sandboxed runs; change the logic to make rc-file writes best-effort by
routing token persistence through the existing /tmp sourced-file pattern (create
a /tmp/openclaw-env-<uid>.sh containing the snippet and ensure rc files source
that file if writable), and if you must directly update ${_SANDBOX_HOME}/.bashrc
or .profile only attempt writes when they are writable and swallow failures (do
not let errors from cat >"$rc_file" or printf >>"$rc_file" abort startup),
leaving the export OPENCLAW_GATEWAY_TOKEN="$token" in the current process
unconditional so gateway startup never depends on rc file writes.
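One way to keep exactly one removable export block in an rc file, with best-effort writes as the comment recommends, is sketched below. The marker strings and function names are hypothetical (the review only mentions variables named `marker_begin` and `marker_end`), and the sketch does not implement the suggested `/tmp` sourced-file variant.

```shell
# Hypothetical sketch: keep exactly one marked OPENCLAW_GATEWAY_TOKEN export
# block in an rc file. Writes are best-effort: failures are swallowed.

MARKER_BEGIN='# >>> openclaw gateway token (sketch) >>>'
MARKER_END='# <<< openclaw gateway token (sketch) <<<'

# Print the rc file with any existing marked block removed.
strip_marked_block() {
  awk -v b="$MARKER_BEGIN" -v e="$MARKER_END" '
    $0 == b { skip = 1; next }
    $0 == e { skip = 0; next }
    !skip
  ' "$1"
}

# Replace the marked block in rc_file, or remove it when token is empty.
persist_token_block() {
  local rc_file="$1" token="$2" tmp
  [ -f "$rc_file" ] || return 0
  tmp="$(mktemp)" || return 0
  strip_marked_block "$rc_file" > "$tmp" || { rm -f "$tmp"; return 0; }
  if [ -n "$token" ]; then
    # %q shell-quotes the token so the rc file stays safe to source.
    printf '%s\nexport OPENCLAW_GATEWAY_TOKEN=%q\n%s\n' \
      "$MARKER_BEGIN" "$token" "$MARKER_END" >> "$tmp"
  fi
  # Overwrite in place (keeps inode and permissions); ignore write failures.
  cat "$tmp" > "$rc_file" 2>/dev/null || true
  rm -f "$tmp"
}
```

Stripping before appending makes repeated startups idempotent, and passing an empty token removes the block entirely.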

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9b2a9d79-dfe8-4da3-93e1-2c11cc9ba0b2

📥 Commits

Reviewing files that changed from the base of the PR and between cc15689 and 1e497c6.

📒 Files selected for processing (6)
  • .agents/skills/nemoclaw-user-configure-security/references/best-practices.md
  • Dockerfile
  • docs/security/best-practices.md
  • scripts/nemoclaw-start.sh
  • src/lib/onboard.ts
  • test/nemoclaw-start.test.ts

…writes

The reverted export_gateway_token code predates the Landlock fix in
a54f9a3 and lacks || true guards on .bashrc/.profile writes. Under
Landlock enforcement, DAC check ([ -w file ]) passes but the actual
write is blocked, crashing the entrypoint under set -e — the exact
same failure pattern that caused the 5-day non-root outage.

Apply the same || true + continue pattern used in install_configure_guard.

NEMOCLAW_BUILD_ID was declared as an ARG but never referenced by any
downstream instruction, so changing it via --build-arg had no effect on
Docker layer caching. Reference it on the token-generation RUN line so
Docker sees the value change and invalidates the cached layer, ensuring
each build produces a fresh gateway auth token.

Pre-existing issue surfaced by CodeRabbit review.
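The failure pattern described in the first commit message above, and the guard it applies, can be reproduced in a few lines of bash. This is a hypothetical sketch, not code from scripts/nemoclaw-start.sh: it simulates a denied write with a path whose parent directory does not exist, which fails even for root. Under Landlock the case is subtler, because a `[ -w file ]` check can pass while the write itself is still blocked, so the guard has to sit on the write rather than on the check.

```shell
# Hypothetical sketch of the `|| true` / `|| continue` guard pattern.

# Without a guard, `set -e` aborts at the failed append, so "reached" is
# never printed. A fresh bash process keeps `set -e` isolated.
demo_unguarded() {
  bash -c 'set -e; printf "x\n" >> /nonexistent-dir/.bashrc; echo reached' 2>/dev/null
}

# With `|| true`, the failed append is tolerated and execution continues.
demo_guarded() {
  bash -c 'set -e; printf "x\n" >> /nonexistent-dir/.bashrc || true; echo reached' 2>/dev/null
}

# Best-effort loop: try each rc file, skip any that cannot be written,
# and print how many writes succeeded.
append_export_best_effort() {
  local token="$1" rc_file written=0
  shift
  for rc_file in "$@"; do
    printf 'export OPENCLAW_GATEWAY_TOKEN=%q\n' "$token" 2>/dev/null >> "$rc_file" || continue
    written=$((written + 1))
  done
  echo "$written"
}
```

Swallowing the error is deliberate: the export in the current process is what the gateway needs, and the rc-file copy only helps later interactive shells.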
ericksoa merged commit 31c782c into main on Apr 25, 2026
39 checks passed
ericksoa added a commit that referenced this pull request Apr 25, 2026
…d cache (#2483)

## Summary

- Fixes 4x build time regression on Spark (400s+ → ~100s) caused by
`NEMOCLAW_BUILD_ID` cache-busting the config generation layer, which
invalidated the expensive `openclaw doctor --fix` + `openclaw plugins
install` layer on every build
- Splits token generation into two steps: config layer writes a
placeholder (cacheable), then a late layer injects
`secrets.token_hex(32)` (cache-busted but trivially fast)
- The doctor/plugins layer no longer rebuilds on every build
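The two-step split can be imitated with a pair of shell functions. The placeholder string, file layout, and function names are hypothetical; in the actual Dockerfile the first step belongs to the cacheable config-generation layer and the second to a late, cache-busted layer, which a plain script can only mimic by running them in order.

```shell
# Hypothetical sketch of the placeholder-then-inject token flow.
PLACEHOLDER='__NEMOCLAW_TOKEN_PLACEHOLDER__'

# Step 1 (cacheable layer): output is byte-identical on every build.
write_config_with_placeholder() {
  printf '{"gateway":{"auth":{"token":"%s"}}}\n' "$PLACEHOLDER" > "$1"
}

# Step 2 (late, cache-busted layer): swap in a fresh 64-hex-char token.
inject_fresh_token() {
  local config="$1" token
  token="$(od -An -N32 -tx1 /dev/urandom | tr -d ' \n')"
  # The token is hex-only, so it is safe inside a sed replacement (GNU sed -i).
  sed -i "s/${PLACEHOLDER}/${token}/" "$config"
}
```

Because step 1 is deterministic, anything built on top of it can stay cached; only the cheap injection step changes per build.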

Depends on #2482

## Test plan

- [x] `npx vitest run --project cli` — 1947 tests pass (ssrf-parity skip
is pre-existing, needs plugin build)
- [x] All pre-commit and pre-push hooks pass
- [ ] Verify build time improvement on Spark

## Summary by CodeRabbit

* **Chores**
  * Optimized Docker image build layers to improve caching efficiency while ensuring unique credentials are generated for each build.
ericksoa added a commit that referenced this pull request Apr 27, 2026
Resolves conflicts in Dockerfile and test/nemoclaw-start.test.ts.

- Dockerfile config-generation block: kept the externalized
  scripts/generate-openclaw-config.py invocation (the PR's purpose)
  and dropped the inline python3 -c block from main.
- Dockerfile token step: dropped the PR's --clear-token step and took
  main's late-layer secrets.token_hex(32) injection (#2482 reverted
  gateway auth token externalization, so the token is again baked at
  build time).
- scripts/generate-openclaw-config.py: ported the inference_inputs
  parsing (#2441) and channel healthMonitor field from main; removed
  the now-obsolete --clear-token mode.
- test/nemoclaw-start.test.ts: took main's version, since the PR's
  token-externalization regression tests no longer match main's
  reverted design.
- test/generate-openclaw-config.test.ts: removed the --clear-token
  test cases.
miyoungc mentioned this pull request on Apr 28, 2026
13 tasks
miyoungc added a commit that referenced this pull request Apr 28, 2026
## Summary
Refreshes user-facing docs for the last 24 hours of merged NemoClaw
history and bumps the docs metadata to 0.0.29, the next version after
v0.0.28. The updates are limited to behavior supported by merged PR
descriptions and diffs.

## Changes
- `docs/reference/commands.md`: documented `nemoclaw <name> policy-add
--from-file` and `--from-dir`, including custom preset review guidance,
from #2077 / commit `7720b175`.
- `docs/deployment/deploy-to-remote-gpu.md`: clarified that non-loopback
`CHAT_UI_URL` disables OpenClaw device pairing for remote browser-only
deployments, from #2449 / commit `f5ee8a4d`.
- `docs/inference/inference-options.md`: documented provider-aware
credential retry validation and the NVIDIA-only `nvapi-` prefix check,
from #2389 / commit `6f7f0c6d`.
- `docs/inference/switch-inference-providers.md`: documented
`NEMOCLAW_INFERENCE_INPUTS` for text/image-capable model metadata baked
into `openclaw.json`, from #2441 / commit `f4391892`.
- `docs/reference/troubleshooting.md`: added the Git certificate
verification entry for proxy CA propagation through `GIT_SSL_CAINFO`,
`GIT_SSL_CAPATH`, `CURL_CA_BUNDLE`, and `REQUESTS_CA_BUNDLE`, from #2345
/ commit `fa0dc1ab`.
- `docs/versions1.json` and `docs/project.json`: promoted docs version
`0.0.29`; `docs/versions1.json` omits unpublished `0.0.26`, `0.0.27`,
and `0.0.28` entries.
- `.agents/skills/nemoclaw-user-*`: regenerated derived user skill
references from the updated docs.
- Reviewed with no extra doc changes: #2575 / `d392ec07`, #2565 /
`a3231049`, #1965 / `db1ef3ca`, #1990 / `db665834`, #2495 / `7da86fa3`,
#2496 / `3192f4f4`, #2490 / `8c209058`, #2487 / `1f615e2f`, #2483 /
`5653d33a`, #2482 / `31c782c0`, #2464 / `23bb5703`, #2472 / `a54f9a34`,
and #2437 / `6bc860d7`.
- Skipped per docs policy: #2420 / `7b76df6b` touched the experimental
sandbox config path listed in `docs/.docs-skip`; #2466 / `cc15689c`
touched a skipped term and CI-only sandbox image files.

## Type of Change
- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [x] Doc only (includes code sample changes)

## Verification
- [x] `npx prek run --all-files` passes
- [ ] `npm test` passes — failed locally in installer-integration tests
and one onboard helper timeout; the doc-scoped hook test projects passed
under `prek`.
- [ ] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [ ] `make docs` builds without warnings (doc changes only) — build
succeeded, but local Sphinx emitted the existing version-switcher file
read message.
- [x] Doc pages follow the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md)
(doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

## AI Disclosure
- [x] AI-assisted — tool: Codex

---
Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>


## Summary by CodeRabbit

* **New Features**
  * Support for custom YAML presets in policy configuration via --from-file and --from-dir.
  * New build-time inference input option to declare accepted modalities (text or text,image).

* **Improvements**
  * Credential validation now offers interactive recovery: re-enter key, retry, choose another provider, or exit.
  * Clarified provider-specific API key prefix handling (nvapi- only applies to NVIDIA keys).

* **Documentation**
  * TLS certificate troubleshooting for inspected networks.
  * Clarified remote dashboard security/device-pairing behavior; command docs updated; docs version bumped.

---------

Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>
jyaunches added a commit that referenced this pull request Apr 28, 2026
…#2615)

## Summary

Add automated E2E test recommendations to PR reviews and selective job
dispatch to the nightly E2E workflow.

Closes #2564 (Phases 1–3).

## What changed

### 1. CodeRabbit `path_instructions` for E2E recommendations (`.coderabbit.yaml`)

15 new `path_instructions` entries map sensitive file paths to the
nightly E2E jobs that exercise them. When a PR touches a mapped path,
CodeRabbit posts a review comment recommending specific jobs and a
copy-pasteable `gh workflow run` command.

| Path Pattern | Recommended Jobs |
|-------------|-----------------|
| `scripts/nemoclaw-start.sh`, `scripts/lib/sandbox-init.sh` | `sandbox-survival-e2e`, `sandbox-operations-e2e`, `cloud-e2e` |
| `Dockerfile`, `Dockerfile.base` | `cloud-e2e`, `sandbox-survival-e2e`, `hermes-e2e`, `rebuild-openclaw-e2e` |
| `nemoclaw-blueprint/scripts/http-proxy-fix.js` | `cloud-e2e`, `inference-routing-e2e` |
| `src/lib/onboard.ts` | `cloud-e2e`, `sandbox-operations-e2e`, `rebuild-openclaw-e2e` |
| `src/nemoclaw.ts` | `sandbox-survival-e2e`, `sandbox-operations-e2e`, `skip-permissions-e2e` |
| `src/lib/cluster-image-patch.ts`, `src/lib/preflight.ts` | `overlayfs-autofix-e2e` |
| `src/lib/deploy.ts` | `deployment-services-e2e` |
| `src/lib/sandbox-state.ts` | `snapshot-commands-e2e`, `rebuild-openclaw-e2e` |
| `src/lib/shields*.ts` | `shields-config-e2e` |
| `agents/hermes/**` | `hermes-e2e`, `rebuild-hermes-e2e` |
| `nemoclaw-blueprint/policies/**` | `network-policy-e2e`, `skip-permissions-e2e` |
| `.github/workflows/nightly-e2e.yaml` | Reminds to add CodeRabbit coverage for new jobs |

### 2. Selective job dispatch (`nightly-e2e.yaml`)

Added a `jobs` input to `workflow_dispatch` so maintainers can run a
subset of nightly jobs on any branch:

```shell
gh workflow run nightly-e2e.yaml --ref <branch> -f jobs=sandbox-survival-e2e,sandbox-operations-e2e
```

- All 18 E2E jobs get a conditional guard: unselected jobs are skipped
- Empty `jobs` input (or scheduled runs) still runs everything
- `notify-on-failure` is unaffected: skipped jobs produce `result:
'skipped'`, not `'failure'`

### 3. Cross-validation test (`test/validate-e2e-coverage.test.ts`)

Keeps the mapping up to date as files and jobs evolve:

| Assertion | What it catches |
|-----------|----------------|
| Job names in CodeRabbit match `nightly-e2e.yaml` | Renamed/removed jobs |
| Path globs match at least one file on disk | Renamed/deleted source files |
| Every nightly job has selective dispatch guard | New jobs added without the `if:` pattern |
| Advisory: nightly jobs with no CodeRabbit coverage | New jobs added without `path_instructions` |

## Validation

- [x] All 4 cross-validation tests pass locally
- [x] Existing `validate-config-schemas` tests still pass
- [x] Selective dispatch validated: [run
25052625486](https://github.com/NVIDIA/NemoClaw/actions/runs/25052625486)
— triggered with `-f jobs=diagnostics-e2e`, 17/18 jobs correctly skipped
- [x] `notify-on-failure` does not false-alarm on selective run — [run
25052625486](https://github.com/NVIDIA/NemoClaw/actions/runs/25052625486)
confirmed: `notify-on-failure` was skipped (not triggered)
- [ ] CodeRabbit posts recommendations on a PR touching a mapped file
(post-merge validation)

## Context

- Issue: #2564
- Weekend incident: #2471, #2472, #2482, #2490
- E2E strategy: `cloud-experimental-e2e` removal in #2472 left a
coverage gap that would have been flagged by these recommendations


## Summary by CodeRabbit

* **Chores**
  * Expanded review automation to map sensitive paths to targeted nightly E2E jobs and inject instructions for running relevant subsets.
  * Added manual workflow dispatch allowing selective E2E job execution via a jobs input.

* **New Features**
  * Added a reporting step that, on manual runs, posts a PR comment summarizing passed/failed/skipped E2E jobs.

* **Tests**
  * Added a validation suite that cross-checks review-to-workflow mappings and dispatch guards, warning on uncovered jobs.

### 4. Substring match fix (`nightly-e2e.yaml`)

CodeRabbit review correctly identified that `contains(inputs.jobs, 'cloud-e2e')` performs substring matching: for example, passing `jobs=rebuild-hermes-e2e` would also select `hermes-e2e`, because the first job name contains the second. All 18 job guards now use delimiter-wrapping:

```yaml
contains(format(',{0},', inputs.jobs), ',<job-name>,')
```

This ensures exact token matching within the comma-separated input. The
cross-validation test was updated to enforce the new pattern.
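The same exact-token matching can be reproduced in shell with `case` patterns, which makes the difference from plain substring matching easy to test. This is an illustrative analogue of the GitHub Actions expression, not code from the workflow; the function names are hypothetical, and the empty-input branch mirrors the documented rule that an empty `jobs` input runs everything. With the naive form, a `jobs` input of `rebuild-hermes-e2e` also selects `hermes-e2e`; the wrapped form does not.

```shell
# Naive substring match, analogous to contains(inputs.jobs, '<job-name>').
selected_substring() {
  local jobs="$1" job="$2"
  case "$jobs" in *"$job"*) return 0 ;; esac
  return 1
}

# Delimiter-wrapped match, analogous to
# contains(format(',{0},', inputs.jobs), ',<job-name>,').
# An empty jobs input selects every job.
selected_exact() {
  local jobs="$1" job="$2"
  if [ -z "$jobs" ]; then
    return 0
  fi
  case ",$jobs," in *",$job,"*) return 0 ;; esac
  return 1
}
```

Neither form trims whitespace, so an input such as `jobs=cloud-e2e, hermes-e2e` would not select `hermes-e2e`.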
jyaunches added a commit that referenced this pull request Apr 29, 2026
## Summary

`scripts/brev-launchable-ci-cpu.sh` is the community install path for
Brev users — it bootstraps a VM with Docker, Node.js, OpenShell, and
NemoClaw. **That script already exists in the repo but has zero CI
coverage.** This PR adds a nightly E2E smoke test that validates the
script works end-to-end.

This is the long-living safety net for the community install flow. If
any regression breaks the launchable script (e.g., the Apr 20–25 Brev
outage from #2472/#2482, or the container reachability fallback from
#2425), this test catches it before community users are affected.

## Related Issue

Closes #2599
Related: #2425 (the `isProxyHealthy()` fallback in PR #2453 — if that
regresses, onboard will abort on Brev and this smoke test catches it)

## Changes

### New: `test/e2e/test-launchable-smoke.sh`

| Phase | What it validates |
|-------|-------------------|
| 0 | Pre-cleanup + pre-seed clone directory from checkout |
| 1 | Prerequisites (Docker, NVIDIA_API_KEY, network, env vars) |
| 2 | Run `brev-launchable-ci-cpu.sh` (the existing community bootstrap script) |
| 3 | Verify artifacts (nemoclaw, openshell, Node.js, Docker, sentinel file, built outputs) |
| 4 | `nemoclaw onboard --non-interactive` with cloud provider |
| 5 | Sandbox health (list, status, inference config, gateway) |
| 6 | Live inference (direct API, routing via inference.local, openclaw agent 6×7=42) |
| 7 | Destroy + cleanup |

Key design decisions:
- **No BREV_API_TOKEN needed** — the launchable script is a generic
Ubuntu bootstrap with zero Brev dependencies, so it runs on standard
GitHub-hosted `ubuntu-latest` runners
- **Tests current code, not main** — pre-seeds the clone directory from
the CI checkout so regressions are caught before reaching community
users
- **Follows existing E2E conventions** — pass/fail/section helpers,
e2e-timeout.sh self-wrap, sandbox-teardown.sh EXIT trap,
parse_chat_content() for reasoning models
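The pass/fail/section helper convention mentioned above can be approximated as follows. These definitions are hypothetical stand-ins written for illustration, not the helpers in NemoClaw's E2E scripts; they only show how a phase-structured test can tally results into a summary line like the "24 passed, 0 failed, 1 skipped" reported in the validation run.

```shell
# Hypothetical helpers for a phase-structured E2E script.
PASS_COUNT=0
FAIL_COUNT=0
SKIP_COUNT=0

section() { printf '\n=== %s ===\n' "$1"; }
pass()    { PASS_COUNT=$((PASS_COUNT + 1)); printf 'PASS: %s\n' "$1"; }
fail()    { FAIL_COUNT=$((FAIL_COUNT + 1)); printf 'FAIL: %s\n' "$1"; }
skip()    { SKIP_COUNT=$((SKIP_COUNT + 1)); printf 'SKIP: %s\n' "$1"; }

# Print the summary line; return non-zero if anything failed.
summary() {
  printf '%d passed, %d failed, %d skipped\n' \
    "$PASS_COUNT" "$FAIL_COUNT" "$SKIP_COUNT"
  [ "$FAIL_COUNT" -eq 0 ]
}
```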

### Modified: `.github/workflows/nightly-e2e.yaml`

- Added `launchable-smoke-e2e` job: `ubuntu-latest`, 30min timeout,
`NVIDIA_API_KEY` secret
- Uploads install/onboard/test logs as artifacts on failure
- Added to `notify-on-failure` needs list

## Validation

Triggered via fork dispatch (`jyaunches/NemoClaw` → `sparky-dispatch` →
`launchable-smoke`):
- **Run:**
https://github.com/jyaunches/NemoClaw/actions/runs/25075715342
- **Result:** ✅ 24 passed, 0 failed, 1 skipped (Node.js version — GH
runner pre-installs Node 20)
- **Runtime:** ~12 minutes

## Type of Change
- [x] Code change (feature, bug fix, or refactor)

## Checklist
- [x] Follows project coding conventions
- [x] Tests pass locally or in CI
- [x] No secrets/credentials committed


## Summary by CodeRabbit

* **New Features**
  * Added an end-to-end smoke test and CI job that validates the community launchable CPU install path (install, onboarding, runtime readiness, and a simple inference check). CI now uploads install/onboard/test logs on failures.

* **Chores**
  * Renamed the branch-validation workflow and corresponding test-suite identifiers for clarity.
  * Updated E2E test documentation and project configuration names to match the new labeling.
DemianHeyGen pushed a commit to DemianHeyGen/NemoClaw that referenced this pull request Apr 30, 2026
DemianHeyGen pushed a commit to DemianHeyGen/NemoClaw that referenced this pull request Apr 30, 2026
…d cache (NVIDIA#2483)

DemianHeyGen pushed a commit to DemianHeyGen/NemoClaw that referenced this pull request Apr 30, 2026
DemianHeyGen pushed a commit to DemianHeyGen/NemoClaw that referenced this pull request Apr 30, 2026
…NVIDIA#2615)

## Summary

Add automated E2E test recommendations to PR reviews and selective job
dispatch to the nightly E2E workflow.

Closes NVIDIA#2564 (Phases 1–3).

## What changed

### 1. CodeRabbit `path_instructions` for E2E recommendations
(`.coderabbit.yaml`)

15 new `path_instructions` entries map sensitive file paths to the
nightly E2E jobs that exercise them. When a PR touches a mapped path,
CodeRabbit posts a review comment recommending specific jobs and a
copy-pasteable `gh workflow run` command.

| Path Pattern | Recommended Jobs |
|-------------|-----------------|
| `scripts/nemoclaw-start.sh`, `scripts/lib/sandbox-init.sh` | `sandbox-survival-e2e`, `sandbox-operations-e2e`, `cloud-e2e` |
| `Dockerfile`, `Dockerfile.base` | `cloud-e2e`, `sandbox-survival-e2e`, `hermes-e2e`, `rebuild-openclaw-e2e` |
| `nemoclaw-blueprint/scripts/http-proxy-fix.js` | `cloud-e2e`, `inference-routing-e2e` |
| `src/lib/onboard.ts` | `cloud-e2e`, `sandbox-operations-e2e`, `rebuild-openclaw-e2e` |
| `src/nemoclaw.ts` | `sandbox-survival-e2e`, `sandbox-operations-e2e`, `skip-permissions-e2e` |
| `src/lib/cluster-image-patch.ts`, `src/lib/preflight.ts` | `overlayfs-autofix-e2e` |
| `src/lib/deploy.ts` | `deployment-services-e2e` |
| `src/lib/sandbox-state.ts` | `snapshot-commands-e2e`, `rebuild-openclaw-e2e` |
| `src/lib/shields*.ts` | `shields-config-e2e` |
| `agents/hermes/**` | `hermes-e2e`, `rebuild-hermes-e2e` |
| `nemoclaw-blueprint/policies/**` | `network-policy-e2e`, `skip-permissions-e2e` |
| `.github/workflows/nightly-e2e.yaml` | Reminds to add CodeRabbit coverage for new jobs |
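To make the mapping concrete, here is a minimal TypeScript sketch of how a change set could be turned into job recommendations using the globs in this table. It is a simplified model, not CodeRabbit's matcher: `globToRegExp`, `recommendJobs`, and the changed file names are hypothetical, and only the two wildcards the table uses (`**` and `*`) are supported.

```typescript
// Hypothetical sketch: a simplified model of the path-to-job table above.
// Not CodeRabbit's matcher; supports only `**` (any depth) and `*` (one segment).
type PathRule = { glob: string; jobs: string[] };

// Convert a simple glob into an anchored regular expression.
function globToRegExp(glob: string): RegExp {
  let pattern = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*" && glob.charAt(i + 1) === "*") {
      pattern += ".*"; // `**` may cross directory boundaries
      i++; // consume the second `*`
    } else if (ch === "*") {
      pattern += "[^/]*"; // `*` never crosses a `/`
    } else {
      pattern += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
    }
  }
  return new RegExp(`^${pattern}$`);
}

// Return the sorted, de-duplicated jobs recommended for a set of changed files.
function recommendJobs(rules: PathRule[], changedFiles: string[]): string[] {
  const jobs = new Set<string>();
  for (const rule of rules) {
    const re = globToRegExp(rule.glob);
    if (changedFiles.some((file) => re.test(file))) {
      rule.jobs.forEach((job) => jobs.add(job));
    }
  }
  return Array.from(jobs).sort();
}

// Three rows from the table, one glob per rule.
const rules: PathRule[] = [
  { glob: "src/lib/shields*.ts", jobs: ["shields-config-e2e"] },
  { glob: "agents/hermes/**", jobs: ["hermes-e2e", "rebuild-hermes-e2e"] },
  { glob: "src/lib/deploy.ts", jobs: ["deployment-services-e2e"] },
];

// Hypothetical changed files in a PR.
const changed = ["agents/hermes/example.yaml", "src/lib/shields-config.ts"];
const selected = recommendJobs(rules, changed);

// Build a dispatch command in the shape the review comment recommends.
console.log(`gh workflow run nightly-e2e.yaml --ref <branch> -f jobs=${selected.join(",")}`);
```

Sorting and de-duplicating keeps the suggested `-f jobs=` value stable when several rules recommend the same job.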

### 2. Selective job dispatch (`nightly-e2e.yaml`)

Added a `jobs` input to `workflow_dispatch` so maintainers can run a
subset of nightly jobs on any branch:

```
gh workflow run nightly-e2e.yaml --ref <branch> -f jobs=sandbox-survival-e2e,sandbox-operations-e2e
```

- All 18 E2E jobs get a conditional guard: unselected jobs are skipped
- Empty `jobs` input (or scheduled runs) still runs everything
- `notify-on-failure` is unaffected: skipped jobs produce
  `result: 'skipped'`, not `'failure'`
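The guard semantics in these bullets can be modeled in a few lines of TypeScript. This is a sketch only: the real guard is a GitHub Actions `if:` expression, and `shouldRunJob` and `shouldNotify` are hypothetical names. The `JobResult` values are the four job results GitHub Actions reports; the notifier here is simplified to react only to `failure`.

```typescript
// Hypothetical model of the selective-dispatch semantics described above.
type JobResult = "success" | "failure" | "cancelled" | "skipped";

function shouldRunJob(jobsInput: string | undefined, jobName: string): boolean {
  // Split the comma-separated input; this sketch does not trim whitespace.
  const requested = (jobsInput ?? "").split(",").filter((name) => name.length > 0);
  // Scheduled runs carry no input, and an empty dispatch input runs everything.
  if (requested.length === 0) return true;
  // Exact membership: requesting "rebuild-hermes-e2e" does not select "hermes-e2e".
  return requested.indexOf(jobName) !== -1;
}

// A notifier keyed on "failure" ignores jobs that the guard skipped.
function shouldNotify(results: JobResult[]): boolean {
  return results.indexOf("failure") !== -1;
}

console.log(shouldRunJob(undefined, "cloud-e2e")); // true (scheduled run)
console.log(shouldRunJob("diagnostics-e2e", "cloud-e2e")); // false
console.log(shouldNotify(["success", "skipped", "skipped"])); // false
```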

### 3. Cross-validation test (`test/validate-e2e-coverage.test.ts`)

Keeps the mapping up to date as files and jobs evolve:

| Assertion | What it catches |
|-----------|----------------|
| Job names in CodeRabbit match `nightly-e2e.yaml` | Renamed/removed jobs |
| Path globs match at least one file on disk | Renamed/deleted source files |
| Every nightly job has selective dispatch guard | New jobs added without the `if:` pattern |
| Advisory: nightly jobs with no CodeRabbit coverage | New jobs added without `path_instructions` |
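As a rough illustration of what these four assertions compare, the sketch below runs the same cross-checks over in-memory data. It is hypothetical: the real test reads `.coderabbit.yaml` and `nightly-e2e.yaml` from disk, whereas here the parsed mapping, the workflow job conditions, and the file list are passed in directly, and `checkCoverage` is an illustrative name. The guard check looks for the comma-delimited job name used by the workflow's `if:` expressions.

```typescript
// Hypothetical, in-memory sketch of the four cross-validation assertions.
interface CoverageInput {
  mappedJobs: Map<string, string[]>; // path glob -> recommended job names
  workflowJobs: Map<string, string | undefined>; // E2E job name -> its `if:` expression
  filesOnDisk: string[];
  globMatches: (glob: string, file: string) => boolean;
}

interface CoverageReport {
  unknownJobs: string[]; // mapped jobs missing from the workflow (renamed/removed)
  deadGlobs: string[]; // globs that match no file (renamed/deleted sources)
  unguardedJobs: string[]; // jobs without the selective-dispatch guard
  uncoveredJobs: string[]; // advisory: jobs that no path recommends
}

function checkCoverage(input: CoverageInput): CoverageReport {
  const report: CoverageReport = { unknownJobs: [], deadGlobs: [], unguardedJobs: [], uncoveredJobs: [] };
  const covered = new Set<string>();

  input.mappedJobs.forEach((jobs, glob) => {
    for (const job of jobs) {
      covered.add(job);
      if (!input.workflowJobs.has(job)) report.unknownJobs.push(job);
    }
    if (!input.filesOnDisk.some((file) => input.globMatches(glob, file))) {
      report.deadGlobs.push(glob);
    }
  });

  input.workflowJobs.forEach((condition, job) => {
    // Look for the job name in the comma-delimited form the guards use.
    if (condition === undefined || condition.indexOf(`',${job},'`) === -1) {
      report.unguardedJobs.push(job);
    }
    if (!covered.has(job)) report.uncoveredJobs.push(job);
  });

  return report;
}

const report = checkCoverage({
  mappedJobs: new Map<string, string[]>([["src/lib/deploy.ts", ["deployment-services-e2e"]]]),
  workflowJobs: new Map<string, string | undefined>([
    ["deployment-services-e2e", "contains(format(',{0},', inputs.jobs), ',deployment-services-e2e,')"],
    ["diagnostics-e2e", undefined],
  ]),
  filesOnDisk: ["src/lib/deploy.ts"],
  globMatches: (glob, file) => glob === file, // exact match is enough for this example
});
console.log(JSON.stringify(report));
```

In this example `diagnostics-e2e` is reported both as unguarded (a hard failure) and as uncovered (advisory only), mirroring the split between the first three assertions and the last.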

## Validation

- [x] All 4 cross-validation tests pass locally
- [x] Existing `validate-config-schemas` tests still pass
- [x] Selective dispatch validated: [run
25052625486](https://github.com/NVIDIA/NemoClaw/actions/runs/25052625486)
— triggered with `-f jobs=diagnostics-e2e`, 17/18 jobs correctly skipped
- [x] `notify-on-failure` does not false-alarm on selective run — [run
25052625486](https://github.com/NVIDIA/NemoClaw/actions/runs/25052625486)
confirmed: `notify-on-failure` was skipped (not triggered)
- [ ] CodeRabbit posts recommendations on a PR touching a mapped file
(post-merge validation)

## Context

- Issue: NVIDIA#2564
- Weekend incident: NVIDIA#2471, NVIDIA#2472, NVIDIA#2482, NVIDIA#2490
- E2E strategy: `cloud-experimental-e2e` removal in NVIDIA#2472 left a
coverage gap that would have been flagged by these recommendations


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
  * Expanded review automation to map sensitive paths to targeted nightly
    E2E jobs and inject instructions for running relevant subsets.
  * Added manual workflow dispatch allowing selective E2E job execution
    via a jobs input.

* **New Features**
  * Added a reporting step that, on manual runs, posts a PR comment
    summarizing passed/failed/skipped E2E jobs.

* **Tests**
  * Added a validation suite that cross-checks review-to-workflow mappings
    and dispatch guards, warning on uncovered jobs.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

### 4. Substring match fix (`nightly-e2e.yaml`)

CodeRabbit review correctly identified that
`contains(inputs.jobs, 'cloud-e2e')` performs substring matching, so a job
whose name is contained in another requested name is selected too: passing
`jobs=rebuild-hermes-e2e` would also run `hermes-e2e`. All 18 job guards
now use delimiter-wrapping:

```yaml
contains(format(',{0},', inputs.jobs), ',<job-name>,')
```

This ensures exact token matching within the comma-separated input. The
cross-validation test was updated to enforce the new pattern.
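The difference between the two guard shapes can be reproduced outside GitHub Actions. The TypeScript sketch below emulates `contains(search, item)` for string arguments, which GitHub documents as a case-insensitive substring test; the function names are hypothetical. `hermes-e2e` is a substring of `rebuild-hermes-e2e`, so only the old guard selects it by accident.

```typescript
// Sketch: emulate GitHub Actions `contains(search, item)` for strings
// (a case-insensitive substring test).
function actionsContains(search: string, item: string): boolean {
  return search.toLowerCase().indexOf(item.toLowerCase()) !== -1;
}

// Old guard shape: contains(inputs.jobs, '<job-name>')
function substringGuard(jobsInput: string, jobName: string): boolean {
  return actionsContains(jobsInput, jobName);
}

// New guard shape: contains(format(',{0},', inputs.jobs), ',<job-name>,')
function delimitedGuard(jobsInput: string, jobName: string): boolean {
  return actionsContains(`,${jobsInput},`, `,${jobName},`);
}

console.log(substringGuard("rebuild-hermes-e2e", "hermes-e2e")); // true (false positive)
console.log(delimitedGuard("rebuild-hermes-e2e", "hermes-e2e")); // false
console.log(delimitedGuard("cloud-e2e,hermes-e2e", "hermes-e2e")); // true
```

Wrapping both the input and the job name in commas means a match can only start and end at list boundaries, which is what makes the check an exact token match.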
DemianHeyGen pushed a commit to DemianHeyGen/NemoClaw that referenced this pull request Apr 30, 2026
## Summary

`scripts/brev-launchable-ci-cpu.sh` is the community install path for
Brev users — it bootstraps a VM with Docker, Node.js, OpenShell, and
NemoClaw. **That script already exists in the repo but has zero CI
coverage.** This PR adds a nightly E2E smoke test that validates the
script works end-to-end.

This is the long-living safety net for the community install flow. If
any regression breaks the launchable script (e.g., the Apr 20–25 Brev
outage from NVIDIA#2472/NVIDIA#2482, or the container reachability fallback from
NVIDIA#2425), this test catches it before community users are affected.

## Related Issue

Closes NVIDIA#2599
Related: NVIDIA#2425 (the `isProxyHealthy()` fallback in PR NVIDIA#2453 — if that
regresses, onboard will abort on Brev and this smoke test catches it)

## Changes

### New: `test/e2e/test-launchable-smoke.sh`

| Phase | What it validates |
|-------|-------------------|
| 0 | Pre-cleanup + pre-seed clone directory from checkout |
| 1 | Prerequisites (Docker, NVIDIA_API_KEY, network, env vars) |
| 2 | Run `brev-launchable-ci-cpu.sh`, the existing community bootstrap script |
| 3 | Verify artifacts (nemoclaw, openshell, Node.js, Docker, sentinel file, built outputs) |
| 4 | `nemoclaw onboard --non-interactive` with cloud provider |
| 5 | Sandbox health (list, status, inference config, gateway) |
| 6 | Live inference (direct API, routing via inference.local, openclaw agent 6×7=42) |
| 7 | Destroy + cleanup |

Key design decisions:
- **No BREV_API_TOKEN needed** — the launchable script is a generic
Ubuntu bootstrap with zero Brev dependencies, so it runs on standard
GitHub-hosted `ubuntu-latest` runners
- **Tests current code, not main** — pre-seeds the clone directory from
the CI checkout so regressions are caught before reaching community
users
- **Follows existing E2E conventions** — pass/fail/section helpers,
e2e-timeout.sh self-wrap, sandbox-teardown.sh EXIT trap,
parse_chat_content() for reasoning models
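The pass/fail bookkeeping and the teardown trap follow a common pattern. Below is a hypothetical TypeScript analog of those bash conventions, not the script itself: `runPhases` and the phase names are illustrative, a phase that throws is counted as a failure, and this sketch keeps going after a failure (the real script may behave differently). The `finally` block plays the role of the `sandbox-teardown.sh` EXIT trap.

```typescript
// Hypothetical sketch: TypeScript analog of the bash E2E conventions
// (pass/fail/skip counters, a results line, and teardown that always runs).
type Outcome = "pass" | "fail" | "skip";

interface Phase {
  name: string;
  run: () => Outcome;
}

interface Tally {
  passed: number;
  failed: number;
  skipped: number;
}

function runPhases(phases: Phase[], teardown: () => void): Tally {
  const tally: Tally = { passed: 0, failed: 0, skipped: 0 };
  try {
    for (const phase of phases) {
      let outcome: Outcome = "fail"; // stays "fail" if the phase throws
      try {
        outcome = phase.run();
      } catch {
        // an unexpected error is recorded as a failure
      }
      if (outcome === "pass") tally.passed++;
      else if (outcome === "fail") tally.failed++;
      else tally.skipped++;
      console.log(`${outcome.toUpperCase()}: ${phase.name}`);
    }
  } finally {
    teardown(); // like an EXIT trap: runs however the loop ends
  }
  console.log(`Results: ${tally.passed} passed, ${tally.failed} failed, ${tally.skipped} skipped`);
  return tally;
}

runPhases(
  [
    { name: "prerequisites", run: () => "pass" },
    { name: "node version", run: () => "skip" },
    { name: "live inference", run: () => (6 * 7 === 42 ? "pass" : "fail") },
  ],
  () => console.log("teardown"),
);
```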

### Modified: `.github/workflows/nightly-e2e.yaml`

- Added `launchable-smoke-e2e` job: `ubuntu-latest`, 30min timeout,
`NVIDIA_API_KEY` secret
- Uploads install/onboard/test logs as artifacts on failure
- Added to `notify-on-failure` needs list

## Validation

Triggered via fork dispatch (`jyaunches/NemoClaw` → `sparky-dispatch` →
`launchable-smoke`):
- **Run:**
https://github.com/jyaunches/NemoClaw/actions/runs/25075715342
- **Result:** ✅ 24 passed, 0 failed, 1 skipped (Node.js version — GH
runner pre-installs Node 20)
- **Runtime:** ~12 minutes

## Type of Change
- [x] Code change (feature, bug fix, or refactor)

## Checklist
- [x] Follows project coding conventions
- [x] Tests pass locally or in CI
- [x] No secrets/credentials committed


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  * Added an end-to-end smoke test and CI job that validates the community
    launchable CPU install path (install, onboarding, runtime readiness, and
    a simple inference check). CI now uploads install/onboard/test logs on
    failures.

* **Chores**
  * Renamed the branch-validation workflow and corresponding test-suite
    identifiers for clarity.
  * Updated E2E test documentation and project configuration names to
    match the new labeling.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
cv pushed a commit that referenced this pull request May 8, 2026
<!-- markdownlint-disable MD041 -->
## Summary
Adds the `test-non-root-sandbox-smoke` test from #2571: a PR-gate job
that runs the production image under `--security-opt no-new-privileges`
to catch #2472 and #2482 regressions, without OpenShell, NVIDIA_API_KEY,
or live inference.

## Related Issue
Part of #2571

## Changes
- New `test/e2e-non-root-smoke.sh` (host-side bash, no
  `openshell`/`nemoclaw` CLI required):
  - **Test 1**: the entrypoint setup chain completes cleanly under
    `--security-opt no-new-privileges` (regression guard for #2472). It
    passes a `true` command via the entrypoint's `NEMOCLAW_CMD` exec path so
    the gateway-launch branch is bypassed and we don't need the
    OpenShell-managed runtime.
  - **Test 2**: the kernel confirms `NoNewPrivs=1` inside the container
    (defends the test itself against silent typos in the docker flag).
- New job `test-non-root-sandbox-smoke` in
  `.github/workflows/pr-self-hosted.yaml`: `linux-amd64-cpu4`,
  `timeout-minutes: 5`, `needs: build-sandbox-images`, reuses the existing
  `isolation-image` artifact.
- Expected results:
```
my-machine@ab1-cdf40-30:~/NemoClaw$ # Run script
bash test/e2e-non-root-smoke.sh
TEST: 1. Entrypoint setup chain completes under --security-opt no-new-privileges
PASS: entrypoint exited 0 under no-new-privileges (#2472 setup chain healthy)
TEST: 2. Kernel confirms NoNewPrivs=1 inside container (defends against silent flag typos)
PASS: kernel confirms NoNewPrivs=1

========================================
  Results: 2 passed, 0 failed
========================================
```
- Upcoming plans:
  - **Test 3**: `openclaw tui` does not error with "Missing gateway auth
    token" inside a login shell under the same constraint (regression guard
    for #2482), to be added after PR #2485 is merged.
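Test 2 relies on the kernel's own report of the flag. On Linux 4.10 and later, `/proc/<pid>/status` includes a `NoNewPrivs:` line whose value is `1` when the no_new_privs bit is set. The TypeScript helper below is a hypothetical illustration of parsing that line (the test script itself is bash and may check it differently); the sample text is an illustrative excerpt, not captured output.

```typescript
// Hypothetical helper: read the no_new_privs flag from the text of
// /proc/<pid>/status, where Linux 4.10+ reports a line like "NoNewPrivs:\t1".
function parseNoNewPrivs(statusText: string): boolean | undefined {
  for (const line of statusText.split("\n")) {
    const match = /^NoNewPrivs:\s*(\d+)\s*$/.exec(line);
    if (match !== null) return match[1] === "1";
  }
  return undefined; // field absent, e.g. on kernels older than 4.10
}

// Excerpt shaped like /proc/self/status inside a container started with
// `--security-opt no-new-privileges` (values are illustrative).
const sample = "Name:\tsh\nNoNewPrivs:\t1\nSeccomp:\t2\n";
console.log(parseNoNewPrivs(sample)); // true
```

On a Linux host, the input could come from reading `/proc/self/status` (for example with Node's `fs.readFileSync`); returning `undefined` rather than `false` keeps "flag unset" distinct from "field missing".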

## Type of Change

- [x] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification
<!-- Check each item you ran and confirmed. Leave unchecked items you
skipped. Doc-only changes do not require npm test unless you ran it. -->
- [ ] `npx prek run --all-files` passes
- [ ] `npm test` passes
- [ ] Tests added or updated for new or changed behavior
- [ ] No secrets, API keys, or credentials committed
- [ ] Docs updated for user-facing behavior changes
- [ ] `make docs` builds without warnings (doc changes only)
- [ ] Doc pages follow the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md)
(doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

---
<!-- DCO sign-off required by CI. Run: git config user.name && git
config user.email -->
Signed-off-by: Hung Le <hple@nvidia.com>

Labels

bug-fix PR fixes a bug or regression

Development

Successfully merging this pull request may close these issues.

nemoclaw <sandbox> connect fails to inject OPENCLAW_GATEWAY_TOKEN for openclaw tui
