ci(nightly): migrate E2E jobs to NVIDIA self-hosted runners #3144

Merged
jyaunches merged 2 commits into main from ci/nightly-runner-migration on May 6, 2026

@jyaunches jyaunches commented May 6, 2026

Switch all 33 nightly E2E jobs from ubuntu-latest (GitHub-hosted, 2 vCPU) to linux-amd64-cpu4 (NVIDIA self-hosted, 4 vCPU). Meta jobs (notify-on-failure, report-to-pr, scorecard) stay on ubuntu-latest since they only make API calls.
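
As a rough illustration of the switch described above, the fragment below shows one E2E job and one meta job. It is a sketch, not the real `.github/workflows/nightly-e2e.yaml`: the step contents and the `test/e2e/run.sh` path are hypothetical placeholders, and only the job names and runner labels come from this PR.

```yaml
# Sketch only: steps are placeholders, not the actual workflow contents.
jobs:
  cloud-e2e:
    # Before this PR: runs-on: ubuntu-latest (GitHub-hosted, 2 vCPU)
    runs-on: linux-amd64-cpu4        # NVIDIA self-hosted, 4 vCPU
    steps:
      - uses: actions/checkout@v4
      - run: ./test/e2e/run.sh       # hypothetical script path

  scorecard:
    # Meta job: only makes API calls, so it stays on the GitHub-hosted runner.
    runs-on: ubuntu-latest
    steps:
      - run: echo "collect results"  # placeholder step
```

A job only starts once a runner carrying the `runs-on` label is online, so this change presumably relies on `linux-amd64-cpu4` runners being available to the repository during the nightly window.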

Motivation: Full sandbox onboard E2E tests spend most of their time on Docker image builds. The NVIDIA runners have more CPU and should reduce per-job runtime. The pr-self-hosted workflow already uses these runners successfully for image builds on every PR.

Validated: The device-auth-health-e2e job was tested on linux-amd64-cpu4 during PR #3128 development and completed in ~16 minutes (vs timing out at 15m on ubuntu-latest).
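
For context on the timeout mentioned above: GitHub Actions caps a job's runtime with the `timeout-minutes` key. The sketch below assumes the job sets that key explicitly. The value 30 is a hypothetical placeholder, since the description states only the 15-minute limit hit on ubuntu-latest and not the limit used on the new runner.

```yaml
# Sketch only: the timeout value and the step are assumptions for illustration.
jobs:
  device-auth-health-e2e:
    runs-on: linux-amd64-cpu4
    # The run on ubuntu-latest was cut off at 15 minutes. A run that finishes
    # in about 16 minutes needs a limit above that; 30 is a placeholder.
    timeout-minutes: 30
    steps:
      - run: echo "run device auth health checks"  # placeholder step
```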

Summary by CodeRabbit

  • Chores
    • Nightly end-to-end test workflow updated to use the standardized Linux CPU runner (linux-amd64-cpu4) for most non-GPU jobs; GPU tests continue using dedicated GPU runners.
    • Reference for the launchable-smoke job updated to the new CPU runner.
    • Failure notification and scorecard jobs retain the same E2E job dependencies.

coderabbitai Bot commented May 6, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6270d4c1-4c7a-4431-bbf0-7f42e16f5203

📥 Commits

Reviewing files that changed from the base of the PR and between 86b50a1 and aa123d3.

📒 Files selected for processing (1)
  • .github/workflows/nightly-e2e.yaml

📝 Walkthrough

The nightly-e2e GitHub Actions workflow updates 33 CPU-based E2E jobs to run on linux-amd64-cpu4 instead of ubuntu-latest. GPU jobs keep their existing GPU runners. Job logic and dependencies are unchanged.

Changes

Nightly E2E Runner Migration

| Layer / File(s) | Summary |
|---|---|
| Comments (`.github/workflows/nightly-e2e.yaml`) | Updated inline comments for cloud-e2e and launchable-smoke-e2e to reference linux-amd64-cpu4. |
| Runner Configuration (`.github/workflows/nightly-e2e.yaml`) | Changed the runs-on value from ubuntu-latest to linux-amd64-cpu4 for 33 CPU-based E2E jobs (e.g., cloud-e2e, cloud-onboard-e2e, cloud-inference-e2e, skill-agent-e2e, docs-validation-e2e, messaging-providers-e2e, messaging-compatible-endpoint-e2e, kimi-inference-compat-e2e, token-rotation-e2e, sandbox-survival-e2e, issue-2478-crash-loop-recovery-e2e, hermes-e2e, hermes-discord-e2e, sandbox-operations-e2e, inference-routing-e2e, network-policy-e2e, deployment-services-e2e, diagnostics-e2e, credential-migration-e2e, snapshot-commands-e2e, shields-config-e2e, rebuild-openclaw-e2e, upgrade-stale-sandbox-e2e, rebuild-hermes-e2e, rebuild-hermes-stale-base-e2e, double-onboard-e2e, onboard-repair-e2e, onboard-resume-e2e, runtime-overrides-e2e, credential-sanitization-e2e, telegram-injection-e2e, overlayfs-autofix-e2e, launchable-smoke-e2e). |
| Orchestration / Dependencies (`.github/workflows/nightly-e2e.yaml`) | Downstream jobs notify-on-failure and scorecard retain the same job dependencies; no dependency graph changes. |
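
The unchanged dependency graph noted above can be pictured with a sketch like the following. It assumes the downstream jobs use `needs` plus status-check conditions, which this conversation does not show. The two-item `needs` lists stand in for the full set of E2E jobs, and every step is a placeholder.

```yaml
# Sketch only: dependency lists are abbreviated and conditions are assumed.
jobs:
  cloud-e2e:
    runs-on: linux-amd64-cpu4
    steps:
      - run: echo "cloud e2e"          # placeholder step

  launchable-smoke-e2e:
    runs-on: linux-amd64-cpu4
    steps:
      - run: echo "launchable smoke"   # placeholder step

  notify-on-failure:
    runs-on: ubuntu-latest             # meta job, unchanged by this PR
    needs: [cloud-e2e, launchable-smoke-e2e]
    if: ${{ failure() }}               # assumed: run only if a needed job failed
    steps:
      - run: echo "send notification"  # placeholder step

  scorecard:
    runs-on: ubuntu-latest             # meta job, unchanged by this PR
    needs: [cloud-e2e, launchable-smoke-e2e]
    if: ${{ always() }}                # assumed: run regardless of upstream results
    steps:
      - run: echo "build scorecard"    # placeholder step
```

Because `needs` refers to job IDs rather than runner labels, changing `runs-on` on the upstream jobs leaves these edges intact.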

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 We swapped the burrow's shoes, so light and new,
Thirty-three hops now stride on cpu4's dew.
GPU friends keep racing, engines bright and keen,
While nightly tests hum steady and serene.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: migrating E2E jobs to NVIDIA self-hosted runners (linux-amd64-cpu4), which aligns perfectly with the changeset of 33 nightly E2E jobs being updated. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@jyaunches jyaunches added v0.0.24 and removed v0.0.24 labels May 6, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/nightly-e2e.yaml (1)

6-7: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update stale runner descriptions in top-of-file comments.

Lines 6 and 29 still describe cloud-e2e and launchable-smoke-e2e as running on ubuntu-latest, but both now run on linux-amd64-cpu4.

✏️ Suggested comment-only fix
```diff
-#   cloud-e2e                Cloud inference (NVIDIA Endpoint API) on ubuntu-latest.
+#   cloud-e2e                Cloud inference (NVIDIA Endpoint API) on linux-amd64-cpu4.
...
-#   launchable-smoke-e2e     Community install path (brev-launchable-ci-cpu.sh) on ubuntu-latest.
+#   launchable-smoke-e2e     Community install path (brev-launchable-ci-cpu.sh) on linux-amd64-cpu4.
```

Also applies to: 29-30

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/nightly-e2e.yaml around lines 6 - 7, Update the stale
top-of-file comments that describe runner platforms: change the descriptions for
"cloud-e2e" and "launchable-smoke-e2e" (the comment lines mentioning cloud-e2e
and launchable-smoke-e2e) to reflect they run on "linux-amd64-cpu4" instead of
"ubuntu-latest" so the header comments match current runner configurations.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/nightly-e2e.yaml:
- Line 87: Update the header comments that still say "ubuntu-latest" to match
the actual runner labels used: change the comment entries that reference
ubuntu-latest (the header comments around the workflow name and the reusable job
descriptions) so they reflect that "cloud-e2e" and "launchable-smoke-e2e" are
running on "linux-amd64-cpu4" instead of ubuntu-latest; search for comment text
containing "ubuntu-latest" and replace or reword them to mention
"linux-amd64-cpu4" and the specific job names "cloud-e2e" and
"launchable-smoke-e2e" so the comments accurately describe the runner
configuration.

---

Outside diff comments:
In @.github/workflows/nightly-e2e.yaml:
- Around line 6-7: Update the stale top-of-file comments that describe runner
platforms: change the descriptions for "cloud-e2e" and "launchable-smoke-e2e"
(the comment lines mentioning cloud-e2e and launchable-smoke-e2e) to reflect
they run on "linux-amd64-cpu4" instead of "ubuntu-latest" so the header comments
match current runner configurations.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f5b4acee-e7ae-448b-b4f8-e3151a34fed6

📥 Commits

Reviewing files that changed from the base of the PR and between 283ee50 and 86b50a1.

📒 Files selected for processing (1)
  • .github/workflows/nightly-e2e.yaml

Comment thread .github/workflows/nightly-e2e.yaml
CodeRabbit flagged that the header comments still referenced
ubuntu-latest for cloud-e2e and launchable-smoke-e2e.
@jyaunches jyaunches merged commit f340f66 into main May 6, 2026
14 checks passed
ericksoa added a commit that referenced this pull request May 7, 2026
Admin merge of the #3144 revert after PR checks passed and the nightly run on the branch showed a broadly green signal on the reverted runner config.
@wscurran wscurran added the "area: ci" (CI workflows, checks, release automation, or GitHub Actions) and "chore" (Build, CI, dependency, or tooling maintenance) labels and removed the "CI/CD" label Jun 3, 2026
@jyaunches jyaunches deleted the ci/nightly-runner-migration branch June 12, 2026 13:53

Labels

  • area: ci (CI workflows, checks, release automation, or GitHub Actions)
  • chore (Build, CI, dependency, or tooling maintenance)

3 participants