ci(nightly-e2e): wire 4 self-contained E2E scripts into nightly workflow #2566

@jyaunches

Summary

Four E2E test scripts exist in test/e2e/ but have never been wired into the nightly workflow or any automated CI job. They were created between March 19 and April 14, are actively maintained (last touched Apr 23 for the e2e-timeout refactor and sandbox teardown reliability work), and have never appeared in .github/workflows/nightly-e2e.yaml at any point in git history.

These scripts are fully self-contained — they install NemoClaw, run nemoclaw onboard, execute test cases, and clean up. They require NVIDIA_API_KEY (available in nightly via repository secrets) and NEMOCLAW_NON_INTERACTIVE=1.

Scripts to wire

| Script | Lines | Created | What it tests | Bugs it would have caught |
| --- | --- | --- | --- | --- |
| test-double-onboard.sh | 478 | Mar 19 | Second onboard reuses gateway, multi-sandbox isolation, registry reconciliation against live OpenShell state, gateway rebuild lifecycle guidance | #2330 (sandbox registry destroyed when gateway drifts), #2220 (port conflict shows stack trace) |
| test-sandbox-rebuild.sh | 197 | Apr 14 | Rebuild lifecycle: version detection, staleness warning, --from Dockerfile forwarding | #2302 (rebuild does not forward --from path), #2366 (rebuild not atomic: sandbox destroyed before recreate), #2201 (rebuild builds wrong sandbox type) |
| test-onboard-resume.sh | 343 | Mar 27 | Interrupted onboard → resume → verify completion, cached preflight skip | #2430 (stale session auto-resume blocks provider change) |
| test-onboard-repair.sh | 350 | Mar 27 | Resume recreates a missing recorded sandbox, session invalidation | Related to #2430 |

Context

Analysis of the last 3 weeks of bug-fix PRs (Apr 7–27) shows an E2E catch rate of ~17% (about 1 in 6): for every bug the nightly catches, roughly 5 more reach NV QA or community users first. The rebuild/lifecycle and onboard-resume domains account for 6 bugs found by humans that these existing scripts would have caught if they were running nightly.

In particular:

Implementation

Each script needs a new job in .github/workflows/nightly-e2e.yaml following the existing pattern (see cloud-e2e as template):

  double-onboard-e2e:
    if: github.repository == 'NVIDIA/NemoClaw'
    runs-on: ubuntu-latest
    timeout-minutes: 30
    steps:
      - name: Checkout
        uses: actions/checkout@v6
      - name: Run double-onboard E2E test
        env:
          NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
          NEMOCLAW_NON_INTERACTIVE: "1"
          NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE: "1"
        run: bash test/e2e/test-double-onboard.sh
      - name: Upload logs on failure
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: double-onboard-log
          path: /tmp/nemoclaw-e2e-*.log
          if-no-files-found: ignore

Repeat for sandbox-rebuild-e2e, onboard-resume-e2e, and onboard-repair-e2e.
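
For illustration, the second job might look like the sketch below. Only the job ID, the step name, the script path, and the artifact name differ from double-onboard-e2e. The artifact name sandbox-rebuild-log and the 30-minute timeout are assumptions that mirror the first job; they are not taken from the existing workflow.

```yaml
  # Sketch only: same shape as double-onboard-e2e, with four values changed.
  sandbox-rebuild-e2e:
    if: github.repository == 'NVIDIA/NemoClaw'
    runs-on: ubuntu-latest
    timeout-minutes: 30  # assumption: same budget as double-onboard-e2e
    steps:
      - name: Checkout
        uses: actions/checkout@v6
      - name: Run sandbox-rebuild E2E test
        env:
          NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
          NEMOCLAW_NON_INTERACTIVE: "1"
          NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE: "1"
        run: bash test/e2e/test-sandbox-rebuild.sh
      - name: Upload logs on failure
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: sandbox-rebuild-log  # assumption: mirrors double-onboard-log
          path: /tmp/nemoclaw-e2e-*.log
          if-no-files-found: ignore
```

Each job should use its own artifact name: actions/upload-artifact@v4 rejects a second upload under a name that already exists in the same workflow run, so reusing double-onboard-log in all four jobs could fail when more than one job uploads logs.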

Also add all four job names to the notify-on-failure job's needs: list so failures auto-create issues.
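
A rough sketch of the resulting needs: list follows. The current contents of notify-on-failure are not shown in this issue, so cloud-e2e stands in for the existing entries, and the if: condition is an assumption about how the job is gated rather than a copy of the real one.

```yaml
  notify-on-failure:
    needs:
      - cloud-e2e            # placeholder for the existing entries; keep them as they are
      - double-onboard-e2e
      - sandbox-rebuild-e2e
      - onboard-resume-e2e
      - onboard-repair-e2e
    # Assumption: the job is already gated along these lines. Without
    # always() or failure(), a job is skipped when any job in needs fails.
    if: ${{ always() && contains(needs.*.result, 'failure') }}
    # runs-on and steps stay unchanged
```

A job that is missing from needs: cannot trigger the notification, and its result is not visible through the needs context, so all four IDs have to be listed explicitly.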

Acceptance criteria

  • All 4 scripts run as nightly jobs
  • All 4 are in the notify-on-failure needs: list
  • First nightly run after merge shows all 4 jobs (pass or fail — if they fail, the nightly catches the regression immediately)

Metadata

Labels

  • 04-25-regression (Issues raised from the Apr 25 weekend regression analysis)
  • area: ci (CI workflows, checks, release automation, or GitHub Actions)
  • area: e2e (End-to-end tests, nightly failures, or validation infrastructure)
