feat(ci): CodeRabbit E2E test recommendations + selective nightly dispatch for PRs #2564

@jyaunches

Problem

When a PR touches sensitive code paths (entrypoint scripts, Dockerfile, proxy rewrite, gateway auth), there is no signal telling the reviewer which nightly E2E jobs are relevant. Reviewers must either know the full test matrix by heart or dispatch the entire nightly suite (16+ jobs, ~8 hours of runner time) on faith.

The weekend of Apr 25–27 showed the cost: Bug 2 (gateway token externalization) would have been caught by cloud-experimental-e2e Phase 5e (TUI smoke), but that job was disabled — and no review comment flagged that the PR touched the token flow and needed TUI validation.

Proposal: Two-Part Solution

Part 1: CodeRabbit path_instructions for E2E recommendations

Add path_instructions entries to .coderabbit.yaml that map file-change patterns to recommended nightly E2E jobs. CodeRabbit will surface these as review comments on every PR that touches a mapped path.

File → E2E mapping:

| File pattern | Recommended E2E jobs | Rationale |
| --- | --- | --- |
| scripts/nemoclaw-start.sh, scripts/lib/sandbox-init.sh | cloud-experimental-e2e, sandbox-survival-e2e, sandbox-operations-e2e | Entrypoint changes affect every sandbox boot. Landlock/non-root execution is invisible to unit tests. |
| Dockerfile, Dockerfile.base | cloud-e2e, sandbox-survival-e2e, hermes-e2e, rebuild-openclaw-e2e | Layer ordering, permissions, baked config affect image behavior. |
| nemoclaw-blueprint/scripts/http-proxy-fix.js | cloud-e2e, inference-routing-e2e | Proxy rewrite affects all inference routing. FORWARD-mode path needs manual validation until forward-proxy-e2e exists. |
| src/lib/onboard.ts | cloud-e2e, sandbox-operations-e2e, rebuild-openclaw-e2e | Core onboarding logic. |
| src/nemoclaw.ts (status/recovery/connect functions) | sandbox-survival-e2e, sandbox-operations-e2e, skip-permissions-e2e | CLI dispatch and gateway recovery. These are the exact jobs that caught the #2398 hang. |
| src/lib/cluster-image-patch.ts, src/lib/preflight.ts | overlayfs-autofix-e2e | Docker 26+ compatibility. |
| src/lib/deploy.ts | deployment-services-e2e | Deployment lifecycle. |
| src/lib/sandbox-state.ts | snapshot-commands-e2e, rebuild-openclaw-e2e | Backup/restore/rebuild. |
| src/lib/shields*.ts | shields-config-e2e | Config mutability. |
| agents/hermes/** | hermes-e2e, rebuild-hermes-e2e | Hermes agent. |
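
As a rough illustration, two rows of this mapping might be expressed in .coderabbit.yaml as follows. This is a minimal sketch that assumes CodeRabbit's reviews.path_instructions schema (a list of path globs, each with free-text instructions). The instruction wording is illustrative rather than a tested prompt, and scripts/lib/sandbox-init.sh would need its own entry with the same job list.

```yaml
# .coderabbit.yaml (sketch): two rows of the file-to-E2E mapping.
# The instruction text below is illustrative wording, not a tested prompt.
reviews:
  path_instructions:
    - path: "scripts/nemoclaw-start.sh"
      instructions: |
        This file is a sandbox entrypoint script. Add an "E2E Test
        Recommendation" note that recommends running these nightly E2E jobs
        before merge: cloud-experimental-e2e, sandbox-survival-e2e,
        sandbox-operations-e2e.
    - path: "agents/hermes/**"
      instructions: |
        Changes here affect the Hermes agent. Recommend running these nightly
        E2E jobs before merge: hermes-e2e, rebuild-hermes-e2e.
```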

CodeRabbit would generate a comment like:

🧪 E2E Test Recommendation

This PR modifies scripts/nemoclaw-start.sh. Consider running these nightly E2E jobs before merge:

  • sandbox-survival-e2e — gateway restart recovery
  • sandbox-operations-e2e — process recovery after gateway kill
  • cloud-experimental-e2e — Landlock + security checks

To run selectively: gh workflow run nightly-e2e.yaml --ref <branch> -f jobs=sandbox-survival-e2e,sandbox-operations-e2e,cloud-experimental-e2e

Part 2: Selective job dispatch via workflow_dispatch input

Add a jobs input to nightly-e2e.yaml that lets maintainers run a subset of nightly jobs on any branch:

on:
  schedule:
    - cron: "0 0 * * *"
  workflow_dispatch:
    inputs:
      jobs:
        description: "Comma-separated job names to run (empty = all)"
        required: false
        type: string
      ref_override:
        description: "Override ref (e.g. PR branch). Empty = triggering ref."
        required: false
        type: string

Each job gets a conditional:

cloud-e2e:
  if: >-
    github.repository == 'NVIDIA/NemoClaw' &&
    (github.event_name != 'workflow_dispatch' ||
     inputs.jobs == '' ||
     contains(inputs.jobs, 'cloud-e2e'))
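
One caveat: when its first argument is a string, contains() does a substring match, so contains(inputs.jobs, 'hermes-e2e') is also true when only rebuild-hermes-e2e is requested. The sketch below is one possible way around that: it wraps both the list and the job name in commas so only a whole entry matches, which assumes the input has no spaces around its commas. It also shows one way ref_override might be consumed, through the ref input of actions/checkout. The runner label and the step are assumptions, not the existing job definition.

```yaml
jobs:
  hermes-e2e:
    # Wrap the list and the job name in commas so that only a whole entry
    # matches: ",rebuild-hermes-e2e," does not contain ",hermes-e2e,".
    if: >-
      github.repository == 'NVIDIA/NemoClaw' &&
      (github.event_name != 'workflow_dispatch' ||
       inputs.jobs == '' ||
       contains(format(',{0},', inputs.jobs), ',hermes-e2e,'))
    runs-on: ubuntu-latest  # assumption: the real job may use a different runner
    steps:
      - uses: actions/checkout@v4
        with:
          # Fall back to the triggering ref when ref_override is empty
          # (including scheduled runs, where no inputs are set).
          ref: ${{ inputs.ref_override || github.ref }}
```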

Maintainer workflow:

  1. CodeRabbit comments on a PR: "recommend running sandbox-survival-e2e, sandbox-operations-e2e"
  2. Maintainer runs: gh workflow run nightly-e2e.yaml --ref pull-request/2500 -f jobs=sandbox-survival-e2e,sandbox-operations-e2e
  3. Only those 2 jobs run (~10 min instead of ~8 hours)
  4. Results visible in the Actions tab, linked back to the PR branch

Future: Part 3 (stretch) — Automated dispatch from CodeRabbit comment

A GitHub Action triggered by issue_comment that parses a /run-e2e <jobs> command from maintainers:

/run-e2e sandbox-survival-e2e,sandbox-operations-e2e

The Action would then invoke gh workflow run with the requested jobs, triggering the selective dispatch automatically. This is lower priority because the CLI command is already fast.
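
A comment-triggered workflow along these lines could look roughly like the sketch below. The workflow file name, the author_association check used to identify maintainers, and the reuse of the pull-request/<number> ref convention from the maintainer workflow are all assumptions. It also assumes the default GITHUB_TOKEN with actions: write is allowed to dispatch nightly-e2e.yaml in this repository.

```yaml
# .github/workflows/run-e2e-command.yaml (hypothetical file name)
name: run-e2e-command
on:
  issue_comment:
    types: [created]

permissions:
  actions: write  # needed to dispatch nightly-e2e.yaml

jobs:
  dispatch:
    # React only to "/run-e2e ..." comments on pull requests, and only from
    # users with a maintainer-level association to the repository.
    if: >-
      github.event.issue.pull_request &&
      startsWith(github.event.comment.body, '/run-e2e ') &&
      contains(fromJSON('["OWNER", "MEMBER", "COLLABORATOR"]'), github.event.comment.author_association)
    runs-on: ubuntu-latest
    steps:
      - name: Dispatch selected nightly jobs
        env:
          GH_TOKEN: ${{ github.token }}
          GH_REPO: ${{ github.repository }}
          # Pass untrusted comment text through the environment instead of
          # interpolating it into the script.
          COMMENT_BODY: ${{ github.event.comment.body }}
          PR_NUMBER: ${{ github.event.issue.number }}
        run: |
          # Use the first line only, drop the command prefix, and keep just
          # the characters that can appear in a comma-separated job list.
          jobs="$(printf '%s\n' "$COMMENT_BODY" | head -n 1 | sed 's#^/run-e2e ##' | tr -cd 'a-z0-9,-')"
          gh workflow run nightly-e2e.yaml \
            --ref "pull-request/${PR_NUMBER}" \
            -f jobs="${jobs}"
```

Filtering the job list down to lowercase letters, digits, commas, and hyphens keeps arbitrary comment text out of the dispatch input. Job names that do not exist simply match no per-job conditional.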

Implementation Steps

  • Phase 1: Add path_instructions to .coderabbit.yaml with the file→E2E mapping
  • Phase 2: Add inputs.jobs to nightly-e2e.yaml workflow_dispatch with per-job conditionals
  • Phase 3: Update CodeRabbit instructions to include the gh workflow run command in recommendations
  • Phase 4 (stretch): /run-e2e comment-triggered Action

Labels

  • 04-25-regression (Issues raised from the Apr 25 weekend regression analysis)
  • area: ci (CI workflows, checks, release automation, or GitHub Actions)
  • area: e2e (End-to-end tests, nightly failures, or validation infrastructure)