Summary
Four E2E test scripts exist in test/e2e/ but have never been wired into the nightly workflow or any automated CI job. They were created between March 19 and April 14, are actively maintained (last touched Apr 23 for the e2e-timeout refactor and sandbox teardown reliability work), and have never appeared in .github/workflows/nightly-e2e.yaml at any point in git history.
These scripts are fully self-contained — they install NemoClaw, run nemoclaw onboard, execute test cases, and clean up. They require NVIDIA_API_KEY (available in nightly via repository secrets) and NEMOCLAW_NON_INTERACTIVE=1.
Scripts to wire
| Script | Lines | Created | What It Tests | Bugs It Would Have Caught |
|---|---|---|---|---|
| test-double-onboard.sh | 478 | Mar 19 | Second onboard reuses gateway, multi-sandbox isolation, registry reconciliation against live OpenShell state, gateway rebuild lifecycle guidance | #2330 (sandbox registry destroyed when gateway drifts), #2220 (port conflict shows stack trace) |
| test-sandbox-rebuild.sh | 197 | Apr 14 | Rebuild lifecycle: version detection, staleness warning, --from Dockerfile forwarding | #2302 (rebuild does not forward --from path), #2366 (rebuild not atomic: sandbox destroyed before recreate), #2201 (rebuild builds wrong sandbox type) |
| test-onboard-resume.sh | 343 | Mar 27 | Interrupted onboard → resume → verify completion, cached preflight skip | #2430 (stale session auto-resume blocks provider change) |
| test-onboard-repair.sh | 350 | Mar 27 | Resume recreates a missing recorded sandbox, session invalidation | Related to #2430 |
Context
Analysis of the last 3 weeks of bug fix PRs (Apr 7–27) shows an E2E catch rate of ~17% — for every bug the nightly catches, roughly 5 more reach NV QA or community users first. The rebuild/lifecycle and onboard-resume domains account for 6 bugs found by humans that these existing scripts would have caught if they were running nightly.
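In particular:
- #2302, #2366, #2201: test-sandbox-rebuild.sh tests the exact --from Dockerfile and atomic-rebuild scenarios that broke for users. All 3 bugs were reported by NV QA (zNeill) or community (oparoz).
- test-onboard-resume.sh tests the interrupted-onboard-then-resume flow. Issue #2430 (stale session blocks provider change) was reported by NV QA and required PR #2437 ("fix(install): refuse to auto-resume a failed onboarding session") to fix.
- test-double-onboard.sh tests the gateway-drift scenario where openshell gateway stop && start switches the active gateway, causing nemoclaw connect to destroy the registry entry.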
Implementation
Each script needs a new job in .github/workflows/nightly-e2e.yaml following the existing pattern (see cloud-e2e as template):
```yaml
  double-onboard-e2e:
    if: github.repository == 'NVIDIA/NemoClaw'
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - name: Checkout
        uses: actions/checkout@v6
      - name: Run double-onboard E2E test
        env:
          NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
          NEMOCLAW_NON_INTERACTIVE: "1"
          NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE: "1"
        run: bash test/e2e/test-double-onboard.sh
      - name: Upload logs on failure
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: double-onboard-log
          path: /tmp/nemoclaw-e2e-*.log
          if-no-files-found: ignore
```
Repeat for sandbox-rebuild-e2e, onboard-resume-e2e, and onboard-repair-e2e.
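As a rough sketch of what "repeat" means here, the sandbox-rebuild job could look like the following. The job key and script path come from this issue. The artifact name sandbox-rebuild-log and the 30-minute timeout are assumptions carried over by analogy with the double-onboard job and may need adjusting for the real script.

```yaml
  sandbox-rebuild-e2e:
    if: github.repository == 'NVIDIA/NemoClaw'
    runs-on: ubuntu-latest
    timeout-minutes: 30   # assumed: same budget as double-onboard-e2e
    steps:
      - name: Checkout
        uses: actions/checkout@v6
      - name: Run sandbox-rebuild E2E test
        env:
          NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
          NEMOCLAW_NON_INTERACTIVE: "1"
          NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE: "1"
        run: bash test/e2e/test-sandbox-rebuild.sh
      - name: Upload logs on failure
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: sandbox-rebuild-log   # assumed name; must be unique per job
          path: /tmp/nemoclaw-e2e-*.log
          if-no-files-found: ignore
```

The onboard-resume-e2e and onboard-repair-e2e jobs would presumably differ only in the job key, step name, script path, and artifact name.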
Also add all four job names to the notify-on-failure job's needs: list so failures auto-create issues.
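A minimal sketch of that change, assuming notify-on-failure already lists cloud-e2e in needs: (the actual existing entries are not shown in this issue and should be kept as they are):

```yaml
  notify-on-failure:
    needs:
      # Existing entries stay untouched; cloud-e2e is only an assumed example.
      - cloud-e2e
      # New entries, one per job added by this issue.
      - double-onboard-e2e
      - sandbox-rebuild-e2e
      - onboard-resume-e2e
      - onboard-repair-e2e
    # Remaining keys (if:, runs-on:, steps:) are unchanged and omitted here.
```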
Acceptance criteria
- All four job names are in the notify-on-failure job's needs: list