feat(e2e): generate scenario fan-out matrix from typed registry by jyaunches · Pull Request #4359 · NVIDIA/NemoClaw

jyaunches · 2026-05-27T22:10:44Z

Why

The e2e-scenarios-all.yaml fan-out has 8 hand-wired jobs, but the typed scenario registry in baseline.ts already defines 22 scenarios. The other 14 are defined-but-not-dispatched, and every new scenario today requires a YAML edit that's easy to forget. The hand-wired job names also no longer reflect what each scenario actually tests (manifest, suites, expectedState).

This PR makes the fan-out generated from the existing typed registry so the workflow can never drift from baseline.ts again.

What changed

`--emit-matrix` flag on `run.ts`

Walks listScenarios() (the existing registry — no new registry created) and prints a single-line JSON array of GHA matrix include entries:

[
  { "id": "ubuntu-repo-cloud-openclaw",
    "runner": "ubuntu-latest",
    "label": "ubuntu-local · ubuntu-repo-cloud-openclaw · smoke+inference+credentials",
    "platform": "ubuntu-local",
    "suites": ["smoke", "inference", "credentials"] },
  ...
]

`runner-routing.ts` helper

Derives the runs-on label from ScenarioEnvironment.platform, with a runs-on:<label> override path via runnerRequirements. Throws on unknown platforms so missing mappings fail loudly during matrix generation rather than silently falling back to ubuntu-latest (which used to mask routing bugs in the legacy bash ROUTES map).

`e2e-scenarios-all.yaml` refactor

Two-job shape:

generate-matrix — checks out, installs deps, runs --emit-matrix, and writes the JSON to $GITHUB_OUTPUT. Also renders a Markdown table of all matrix entries in the step summary so you can see the full plan at a glance.
run-scenario — uses strategy.matrix.include: ${{ fromJson(needs.generate-matrix.outputs.matrix) }} and calls the existing e2e-scenarios.yaml reusable workflow per scenario. Tile names use ${{ matrix.label }} so the sidebar reflects what's actually being tested.

Adding a scenario from now on

Add the entry to canonicalScenarioInputs in baseline.ts.
That's it. Next workflow run automatically picks it up as a new tile.

What didn't change

e2e-scenarios.yaml (the single-scenario reusable workflow) is untouched. Its resolve-runner job continues to be the authoritative runner-selection path during execution; the runner field in the matrix is informational and used for the sidebar label.
The bash ROUTES map in e2e-scenarios.yaml is left in place for now to keep this PR focused. It becomes effectively dead code once this lands and can be removed in a follow-up.
No new registry. Everything reads from test/e2e-scenario/scenarios/registry.ts.

Verification

Unit tests in e2e-scenario-matrix.test.ts cover:

matrix entry per registered scenario
runner resolves for every scenario
platform-default routing (ubuntu/macos/wsl/gpu)
explicit runs-on:<label> override
loud failure on unknown platform
single-line JSON output suitable for $GITHUB_OUTPUT

Local run:

✓ test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts (6 tests)
✓ test/e2e-scenario/framework-tests/e2e-scenario-registry.test.ts (6 tests)
✓ test/e2e-scenario/framework-tests/e2e-scenarios-workflow.test.ts (2 tests)
✓ test/e2e-scenario/framework-tests/e2e-plan-compiler.test.ts
✓ test/e2e-scenario/framework-tests/e2e-scenario-resolver.test.ts
58 tests passed

Manual smoke:

$ npx tsx test/e2e-scenario/scenarios/run.ts --emit-matrix | jq 'length'
22
$ npx tsx test/e2e-scenario/scenarios/run.ts --emit-matrix | jq -r '.[].runner' | sort -u
linux-amd64-gpu-rtxpro6000-latest-1
macos-26
ubuntu-latest
windows-latest

Test depth

Unit tests are sufficient — this is a build/CI plumbing change, not a behavioral change to the test runner or scenarios themselves. The actual scenario execution path (e2e-scenarios.yaml) is unchanged. After merge, the next manual workflow_dispatch of E2E / Scenario Runner / All will validate the matrix end-to-end.

Follow-ups (separate PRs)

Delete the now-dead ROUTES bash map in e2e-scenarios.yaml and have its resolve-runner job consume runner-routing.ts too.
Consider sharding e2e-scenarios-all.yaml by platform once we cross ~80 scenarios (currently at 22, plenty of headroom against the 256-job GHA ceiling).

Summary by CodeRabbit

Tests
- Added end-to-end tests validating scenario matrix generation, runner selection, and error cases for invalid or missing platform/override values.
Chores
- CI now builds a dynamic test matrix at runtime and runs one job per scenario; fails early if the matrix is empty.
- Scenario runner can emit a deterministic matrix JSON for the workflow and produces human-readable job labels.

Replace the hand-wired job list in e2e-scenarios-all.yaml with a generated matrix sourced from the existing typed scenario registry. Adding a scenario in test/e2e-scenario/scenarios/scenarios/baseline.ts now automatically produces a tile in the fan-out workflow on the next run, with no workflow edits required. This closes the drift gap that left ~14 of 22 registered scenarios undispatched. Changes: - Add --emit-matrix flag to test/e2e-scenario/scenarios/run.ts that walks listScenarios() and prints a single-line JSON array of GHA matrix include entries: { id, runner, label, platform, suites }. - Extract runner-routing.ts that derives the runs-on label from ScenarioEnvironment.platform, with a runs-on:<label> override path via runnerRequirements. Throws on unknown platforms so missing mappings fail loudly during matrix generation. - Refactor .github/workflows/e2e-scenarios-all.yaml to a two-job shape: generate-matrix emits the JSON, run-scenario fans out via strategy.matrix.include and calls the existing e2e-scenarios.yaml reusable workflow. Tile names come from the typed label so the sidebar reflects what is actually being tested. - Guard the run.ts CLI bootstrap so importing buildScenarioMatrix from tests does not trigger the side-effecting main() path. - Add e2e-scenario-matrix.test.ts covering matrix shape, registry parity, runner overrides, unknown-platform errors, and CLI output contract.

copy-pr-bot · 2026-05-27T22:10:48Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-05-27T22:10:57Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6a305f81-e907-402a-a29c-b704cc5758bb

📥 Commits

Reviewing files that changed from the base of the PR and between b573c81 and e788fbf.

📒 Files selected for processing (2)

test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts
test/e2e-scenario/scenarios/runner-routing.ts

🚧 Files skipped from review as they are similar to previous changes (2)

test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts
test/e2e-scenario/scenarios/runner-routing.ts

📝 Walkthrough

Walkthrough

This PR replaces static GitHub Actions job enumeration with a runtime-generated matrix by adding typed runner resolution, a --emit-matrix CLI mode to build and emit scenario matrix rows, tests for matrix correctness and error cases, and workflow changes to invoke the generator and fan out per-scenario jobs dynamically.

Changes

Dynamic Scenario Matrix Generation

Layer / File(s)	Summary
Runner routing foundation `test/e2e-scenario/scenarios/runner-routing.ts`	Exports `ResolvedRunner` and `PLATFORM_DEFAULT_RUNNER`, implements `resolveRunnerForScenario()` that prefers `runs-on:<label>` overrides (rejecting empty labels) and falls back to platform defaults, throwing on unmapped platforms.
Scenario matrix construction `test/e2e-scenario/scenarios/run.ts`	Adds `--emit-matrix` option and `buildScenarioMatrix()` producing `ScenarioMatrixEntry[]` with `{id, runner, label, platform, suites}` using `resolveRunnerForScenario`.
Matrix emission and module guarding `test/e2e-scenario/scenarios/run.ts`	Implements `emitMatrix()` to print a single-line JSON array, and guards main execution so the module can be imported by tests; adds try/catch and exit-code handling.
GitHub Actions workflow integration `.github/workflows/e2e-scenarios-all.yaml`	Updates header; adds `generate-matrix` job that invokes the emitter, fails if the matrix is empty, writes a step summary, and uses the emitted matrix to fan out `run-scenario` via a reusable workflow.
E2E matrix tests `test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts`	Adds tests that spawn the emitter, assert one entry per registered scenario, validate `runner`/`label` fields, platform→runner mappings, override and error cases, and single-line JSON output format.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

CI/CD, refactor

Suggested reviewers

cv

Poem

🐰 I built a matrix, neat and spry,
From registry rows that reach the sky.
No hardcoded jobs to bog the night,
Runners chosen, labels right.
Tests hum softly — all systems go, hop high!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately and concisely describes the main change: introducing dynamic matrix generation for e2e scenarios from the typed registry, replacing the previous static YAML wiring.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/dynamic-e2e-scenario-matrix

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-05-27T22:12:21Z

E2E Advisor Recommendation

Required E2E: e2e-scenarios-all
Optional E2E: ubuntu-repo-cloud-openclaw, gpu-repo-local-ollama-openclaw

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

e2e-scenarios-all (high): This is the directly modified workflow. It should be dispatched to validate that generate-matrix can install dependencies, run npx tsx test/e2e-scenario/scenarios/run.ts --emit-matrix, publish a non-empty JSON matrix, and fan out each typed scenario through the reusable scenario runner.

Optional E2E

ubuntu-repo-cloud-openclaw (medium): Useful lower-cost smoke check of the reusable scenario runner path after the run.ts direct-invocation guard and matrix export changes; it validates the canonical Ubuntu cloud OpenClaw user flow without waiting for the full all-scenario fan-out.
gpu-repo-local-ollama-openclaw (high): Optional confidence for the new typed runner-routing mapping of GPU scenarios to the self-hosted GPU runner label, if GPU capacity is available.

New E2E recommendations

scenario-runner-routing-consistency (high): The all-scenarios workflow now derives its matrix from the typed registry, while the called e2e-scenarios.yaml workflow still has its own hardcoded route table. Add coverage that fails if any registered scenario emitted by --emit-matrix cannot be executed by the reusable scenario workflow's route resolver.
- Suggested test: Add a framework or workflow-validation test that compares typed registry scenario IDs against the e2e-scenarios.yaml route map, or migrate e2e-scenarios.yaml to consume the same runner-routing.ts resolver.

Dispatch hint

Workflow: e2e-scenarios-all.yaml
jobs input: ``

github-actions · 2026-05-27T22:12:22Z

E2E Scenario Advisor Recommendation

Required scenario E2E: e2e-scenarios-all
Optional scenario E2E: None

Dispatch required scenario E2E:

gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

e2e-scenarios-all: The PR changes the scenario fan-out workflow and shared scenario runner/routing code used to generate and execute the scenario matrix. Per policy, scenario workflow and runner/runtime changes require running the full scenario fan-out.
- Dispatch: gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Optional scenario E2E

None.

Relevant changed files

.github/workflows/e2e-scenarios-all.yaml
test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts
test/e2e-scenario/scenarios/run.ts
test/e2e-scenario/scenarios/runner-routing.ts

github-actions · 2026-05-27T22:14:24Z

PR Review Advisor

Findings: 1 needs attention, 5 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 4 still apply, 0 new items found

Review findings

🛠️ Needs attention

Generated fan-out includes scenarios the called workflow still rejects (.github/workflows/e2e-scenarios-all.yaml:80): The all-scenarios workflow now builds one matrix entry for every typed registry scenario and passes each `matrix.id` to `.github/workflows/e2e-scenarios.yaml`, but that called workflow remains the actual execution path and still resolves runners through a separate hardcoded bash `ROUTES` map. Current registry IDs such as `ubuntu-repo-cloud-openclaw-custom-policies`, `ubuntu-invalid-nvidia-key-negative`, and `ubuntu-gateway-port-conflict-negative` are emitted by the matrix but are not present in the child workflow route map, so those generated jobs will fail before execution with `No runner route for scenario`.
- Recommendation: Make the executable routing source match the generated matrix before expanding the fan-out. Either update/remove the child workflow `ROUTES` map so it consumes the same validated typed router, or filter this all-scenarios matrix to only IDs the child workflow can currently execute. Add a static workflow contract test that verifies every generated `listScenarios()` ID is accepted by the called workflow.
- Evidence: `run.ts` builds the matrix from `listScenarios()`, and `e2e-scenarios-all.yaml` uses `include: ${{ fromJson(needs.generate-matrix.outputs.matrix) }}` then calls the reusable workflow with `scenarios: ${{ matrix.id }}`. `.github/workflows/e2e-scenarios.yaml` still uses a hardcoded `ROUTES` map that lacks later registry scenarios including the custom-policies and negative cases.

🔎 Worth checking

Source-of-truth review needed: Workflow runner routing: The advisor marked localized patch analysis as missing.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `e2e-scenarios-all.yaml` emits every registry ID, while `.github/workflows/e2e-scenarios.yaml` still exits on missing `ROUTES[$id]`.
Source-of-truth review needed: Workflow secret contract: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `e2e-scenarios-all.yaml` passes only `NVIDIA_API_KEY`, while registry scenarios require OpenAI-compatible, Brave, Telegram, Discord, and Slack secrets.
Generated fan-out does not pass all secrets required by generated scenarios (.github/workflows/e2e-scenarios-all.yaml:85): The matrix now includes every registry scenario, but the reusable workflow call only passes `NVIDIA_API_KEY`. Several generated scenarios declare other required secrets in the typed registry, including OpenAI-compatible, Brave, Telegram, Discord, and Slack credentials. Even if current dry-run paths do not always consume those secrets, the all-workflow contract is broader than its secret plumbing.
- Recommendation: Either pass and declare the optional secrets required by every generated scenario in the reusable workflow contract, or restrict this workflow to scenarios whose required secrets are actually available. Add a contract test that compares generated matrix IDs and their `requiredSecrets` metadata against the secrets exposed to `.github/workflows/e2e-scenarios.yaml`.
- Evidence: `e2e-scenarios-all.yaml` passes only `NVIDIA_API_KEY`; `baseline.ts` declares `OPENAI_COMPATIBLE_API_KEY`, `BRAVE_API_KEY`, `TELEGRAM_BOT_TOKEN`, `DISCORD_BOT_TOKEN`, and `SLACK_BOT_TOKEN` on scenarios that `buildScenarioMatrix()` now emits.
Runner override accepts arbitrary `runs-on:` labels without an allowlist (test/e2e-scenario/scenarios/runner-routing.ts:37): `resolveRunnerForScenario()` gives precedence to any non-empty `runnerRequirements` entry starting with `runs-on:` and returns the suffix as workflow routing metadata. The changed workflow currently treats `matrix.runner` as informational and the child workflow still ignores it for actual `runs-on`, which limits immediate impact. However, the PR introduces this field as runner-selection data and tests arbitrary `custom-self-hosted` as accepted behavior; if a later consumer wires it directly into `runs-on`, scenario metadata could route secret-bearing jobs to unexpected runner labels.
- Recommendation: Validate `runs-on:` overrides against a narrow allowlist of approved GitHub-hosted and self-hosted labels before emitting them, or remove arbitrary overrides from the workflow-facing matrix until the trusted routing contract is finalized. Add a negative test for rejected runner labels.
- Evidence: `runner-routing.ts` returns `explicit.slice("runs-on:".length).trim()` after only checking non-empty. `e2e-scenario-matrix.test.ts` asserts that `runs-on:custom-self-hosted` resolves successfully.
Matrix tests do not cover the reusable workflow contract (test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts:24): The new tests validate local matrix generation, canonical platform routing, and JSON shape, but they do not verify that the generated matrix can actually be consumed by `.github/workflows/e2e-scenarios.yaml`. That leaves the route-map and secret-contract regressions undetected by the added test suite.
- Recommendation: Add static workflow contract tests that parse or otherwise validate the called workflow contract: every generated scenario ID must be routable by the child workflow, and every generated scenario's required secrets must be declared/passed or explicitly filtered out.
- Evidence: The added tests assert one matrix entry per registered scenario and successful `--emit-matrix` output, but no test inspects the child workflow `ROUTES` map or reusable workflow `secrets` declaration.

🌱 Nice ideas

None.

Since last review details

Current findings:

Source-of-truth review needed: Workflow runner routing: The advisor marked localized patch analysis as missing.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `e2e-scenarios-all.yaml` emits every registry ID, while `.github/workflows/e2e-scenarios.yaml` still exits on missing `ROUTES[$id]`.
Source-of-truth review needed: Workflow secret contract: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `e2e-scenarios-all.yaml` passes only `NVIDIA_API_KEY`, while registry scenarios require OpenAI-compatible, Brave, Telegram, Discord, and Slack secrets.
Generated fan-out includes scenarios the called workflow still rejects (.github/workflows/e2e-scenarios-all.yaml:80): The all-scenarios workflow now builds one matrix entry for every typed registry scenario and passes each `matrix.id` to `.github/workflows/e2e-scenarios.yaml`, but that called workflow remains the actual execution path and still resolves runners through a separate hardcoded bash `ROUTES` map. Current registry IDs such as `ubuntu-repo-cloud-openclaw-custom-policies`, `ubuntu-invalid-nvidia-key-negative`, and `ubuntu-gateway-port-conflict-negative` are emitted by the matrix but are not present in the child workflow route map, so those generated jobs will fail before execution with `No runner route for scenario`.
- Recommendation: Make the executable routing source match the generated matrix before expanding the fan-out. Either update/remove the child workflow `ROUTES` map so it consumes the same validated typed router, or filter this all-scenarios matrix to only IDs the child workflow can currently execute. Add a static workflow contract test that verifies every generated `listScenarios()` ID is accepted by the called workflow.
- Evidence: `run.ts` builds the matrix from `listScenarios()`, and `e2e-scenarios-all.yaml` uses `include: ${{ fromJson(needs.generate-matrix.outputs.matrix) }}` then calls the reusable workflow with `scenarios: ${{ matrix.id }}`. `.github/workflows/e2e-scenarios.yaml` still uses a hardcoded `ROUTES` map that lacks later registry scenarios including the custom-policies and negative cases.
Generated fan-out does not pass all secrets required by generated scenarios (.github/workflows/e2e-scenarios-all.yaml:85): The matrix now includes every registry scenario, but the reusable workflow call only passes `NVIDIA_API_KEY`. Several generated scenarios declare other required secrets in the typed registry, including OpenAI-compatible, Brave, Telegram, Discord, and Slack credentials. Even if current dry-run paths do not always consume those secrets, the all-workflow contract is broader than its secret plumbing.
- Recommendation: Either pass and declare the optional secrets required by every generated scenario in the reusable workflow contract, or restrict this workflow to scenarios whose required secrets are actually available. Add a contract test that compares generated matrix IDs and their `requiredSecrets` metadata against the secrets exposed to `.github/workflows/e2e-scenarios.yaml`.
- Evidence: `e2e-scenarios-all.yaml` passes only `NVIDIA_API_KEY`; `baseline.ts` declares `OPENAI_COMPATIBLE_API_KEY`, `BRAVE_API_KEY`, `TELEGRAM_BOT_TOKEN`, `DISCORD_BOT_TOKEN`, and `SLACK_BOT_TOKEN` on scenarios that `buildScenarioMatrix()` now emits.
Runner override accepts arbitrary `runs-on:` labels without an allowlist (test/e2e-scenario/scenarios/runner-routing.ts:37): `resolveRunnerForScenario()` gives precedence to any non-empty `runnerRequirements` entry starting with `runs-on:` and returns the suffix as workflow routing metadata. The changed workflow currently treats `matrix.runner` as informational and the child workflow still ignores it for actual `runs-on`, which limits immediate impact. However, the PR introduces this field as runner-selection data and tests arbitrary `custom-self-hosted` as accepted behavior; if a later consumer wires it directly into `runs-on`, scenario metadata could route secret-bearing jobs to unexpected runner labels.
- Recommendation: Validate `runs-on:` overrides against a narrow allowlist of approved GitHub-hosted and self-hosted labels before emitting them, or remove arbitrary overrides from the workflow-facing matrix until the trusted routing contract is finalized. Add a negative test for rejected runner labels.
- Evidence: `runner-routing.ts` returns `explicit.slice("runs-on:".length).trim()` after only checking non-empty. `e2e-scenario-matrix.test.ts` asserts that `runs-on:custom-self-hosted` resolves successfully.
Matrix tests do not cover the reusable workflow contract (test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts:24): The new tests validate local matrix generation, canonical platform routing, and JSON shape, but they do not verify that the generated matrix can actually be consumed by `.github/workflows/e2e-scenarios.yaml`. That leaves the route-map and secret-contract regressions undetected by the added test suite.
- Recommendation: Add static workflow contract tests that parse or otherwise validate the called workflow contract: every generated scenario ID must be routable by the child workflow, and every generated scenario's required secrets must be declared/passed or explicitly filtered out.
- Evidence: The added tests assert one matrix entry per registered scenario and successful `--emit-matrix` output, but no test inspects the child workflow `ROUTES` map or reusable workflow `secrets` declaration.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

test/e2e-scenario/scenarios/run.ts (1)

104-119: ⚡ Quick win

Apply the documented id sort before matrix emission.

The doc says matrix entries are sorted by id, but the implementation currently preserves registry order. Sorting here makes output deterministically diffable as intended.

Proposed fix

 export function buildScenarioMatrix(): ScenarioMatrixEntry[] {
-  return listScenarios().map((scenario): ScenarioMatrixEntry => {
+  return [...listScenarios()]
+    .sort((a, b) => a.id.localeCompare(b.id))
+    .map((scenario): ScenarioMatrixEntry => {
     const { runner } = resolveRunnerForScenario(scenario);
     return {
       id: scenario.id,
       runner,
       label: buildLabel(scenario),
       platform: scenario.environment?.platform ?? "unknown",
       suites: scenario.suiteIds ?? [],
     };
   });
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-scenario/scenarios/run.ts` around lines 104 - 119, The
buildScenarioMatrix function currently maps listScenarios() without sorting, so
update buildScenarioMatrix to sort the scenarios by their id before mapping
(e.g., call .sort(...) on the array returned by listScenarios()) so the returned
ScenarioMatrixEntry[] is deterministically ordered by scenario.id; locate
buildScenarioMatrix and adjust the pipeline that uses listScenarios() and
resolveRunnerForScenario(scenario) to operate on a sorted array (use a string
compare/localeCompare on scenario.id).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e-scenario/scenarios/runner-routing.ts`:
- Around line 39-42: The code currently accepts a `runs-on:` override that can
slice to an empty string; update the block that finds `explicit` from
`scenario.runnerRequirements` so it rejects empty labels — e.g., after computing
`const label = explicit.slice("runs-on:".length)` (or by ensuring `req.length >
"runs-on:".length` when matching) validate `label !== ""` and if empty throw a
clear Error like "Invalid runs-on: override: empty runner label" (or return a
failing result) instead of returning `{ runner: "", ... }`; refer to the
`explicit` variable and the `runnerRequirements` lookup when making this change.

---

Nitpick comments:
In `@test/e2e-scenario/scenarios/run.ts`:
- Around line 104-119: The buildScenarioMatrix function currently maps
listScenarios() without sorting, so update buildScenarioMatrix to sort the
scenarios by their id before mapping (e.g., call .sort(...) on the array
returned by listScenarios()) so the returned ScenarioMatrixEntry[] is
deterministically ordered by scenario.id; locate buildScenarioMatrix and adjust
the pipeline that uses listScenarios() and resolveRunnerForScenario(scenario) to
operate on a sorted array (use a string compare/localeCompare on scenario.id).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 48621b06-d779-4cd5-a9ce-513bb8c854ab

📥 Commits

Reviewing files that changed from the base of the PR and between b14fd76 and b573c81.

📒 Files selected for processing (4)

.github/workflows/e2e-scenarios-all.yaml
test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts
test/e2e-scenario/scenarios/run.ts
test/e2e-scenario/scenarios/runner-routing.ts

Resolutions: - run.ts: kept HEAD's removal of --dry-run/--validate-only modes; took main's richer --emit-matrix payload (runner/label/platform/suites) consumed by the dynamic matrix workflow (#4359). - 00-slack-provider-state.sh: kept HEAD's removal of the openclaw dry-run skip block; carried main's 'agent' local var. - docs/README.md: kept HEAD (live-only runner, deprecated bash entrypoints) and folded in main's 'Migration status is tracked outside the repository' line so e2e-migration-inventory-lock stays green. - e2e-coverage-report.test.ts, e2e-scenario-resolver.test.ts, runtime/resolver/coverage.ts: confirmed deletes (intentional in 9da75ac, 'retire legacy expected-states.yaml + runtime/resolver'). - e2e-expected-state.test.ts: applied the trim that 9da75ac's commit message described but did not include — drop the YAML-mirror parity tests that depended on the now-deleted expected-states.yaml, replace with id-coverage assertions against listExpectedStates(). - e2e-scenarios.yaml: retitled the summary step from 'typed dry-run summary' to 'typed scenario run' so the workflow text matches the PR's live-only stance.

…only, and the bash runner (#4380) ## Why Removing `--dry-run` and plan-mode execution from the E2E runner: they let a coding agent ship work behind false-green test runs. The TS runner now has **one execution mode: live.** No flag, env var, helper, or branch in any production path bypasses real assertion execution. Two seams needed closing: 1. `phase.ts:executeStep` had no `child_process.spawn` — every real shell/probe step fell through to a hardcoded `failed: unsupported live step`, and four placeholder refs (`fake-pass`, `fake-retry-once-pass`, `fake-always-transient`, `phase-1-skeleton`) were the only paths that reported `passed`. 2. ~30 shell scripts under `validation_suites/`, `onboarding_assertions/`, and `nemoclaw_scenarios/` began with `if e2e_env_is_dry_run; then exit 0; fi`, so when the workflow passed `--dry-run` the scripts exited 0 before running. ## What changed ### Orchestrator (TypeScript) - `phase.ts:executeStep` now spawns shell steps via `child_process.spawn`, with **detached process groups** so timeouts kill `bash + sleep` cleanly via negative-pid signal. Probe steps return `skipped: "probe not registered"` until the registry lands. Pending steps return `skipped: "pending: <ref>"`. Unknown kinds throw. Real evidence captured to `.e2e/logs/<step-id>.log`. Step-level `reliability.timeoutSeconds` and `retry.{attempts,on}` enforced here, not in clients. - `run.ts`: `--dry-run`, `--validate-only` deleted. Default invocation is live execution. `--list` and `--plan-only` (local debug) survive read-only. **`--emit-matrix` added** for the dynamic-matrix workflow (#4359), with the richer payload (`runner`, `label`, `platform`, `suites`) merged from main. - `types.ts`: `RunContext.dryRun` deleted. ### Workflow - `e2e-scenarios.yaml`: the resolve-runner `--plan-only` warmup, and both `--dry-run` invocations (Linux + WSL), are gone. Workflows execute live. Summary step retitled from "typed dry-run summary" to "typed scenario run". - `tools/e2e-scenarios/workflow-boundary.mts` validator now **rejects** `--dry-run`, `--plan-only`, `--validate-only` in the workflow. ### Bash entrypoints (PR collapses what was originally going to be PR 1 + PR 2) - `runtime/run-scenario.sh`: 483 lines of duplicated install/onboard/gateway-check/suite-execution → **5-line fail-fast stub** pointing at `run.ts`. - `runtime/run-suites.sh`: same treatment. `PhaseOrchestrator.runShellStep` walks typed `assertionGroups` directly; nothing in TS calls a YAML-walking bash runner. ### Shell scripts (the leaves stay, the dry-run skip blocks die) - `validation_suites/**`, `onboarding_assertions/**`, `nemoclaw_scenarios/**`: every `if e2e_env_is_dry_run; then ... exit 0; fi` and every `[[ "${E2E_DRY_RUN:-0}" == "1" ]]` short-circuit removed. The real assertion logic that was hiding underneath now runs unconditionally. - `runtime/lib/env.sh`: `e2e_env_is_dry_run` helper deleted. - `inference_routing.sh`: dead `_e2e_inference_plan` helper deleted. ### Legacy YAML resolver retired (rolled in from the planned follow-up PR) - Deleted `nemoclaw_scenarios/expected-states.yaml` and the entire `runtime/resolver/` tree (`load.ts`, `plan.ts`, `schema.ts`, `coverage.ts`, `index.ts`, `validator.ts`, `expected-failure.ts`, `js-yaml.d.ts`) plus `runtime/coverage-report.sh`. - Six framework tests that exercised only the resolver path were dropped: `e2e-scenario-schema`, `e2e-scenario-resolver`, `e2e-coverage-report`, `e2e-scenario-additional-families`, `e2e-metadata-final-hygiene`, `e2e-expected-failure` (typed equivalent now lives in `e2e-negative-matcher.test.ts`). - `e2e-expected-state.test.ts`: YAML-mirror parity tests dropped; replaced with id-coverage assertions against `listExpectedStates()`. The typed registry in `scenarios/expected-states.ts` is the single source of truth. ### Tests - **DELETED** (validated dead code paths): `e2e-suite-runner.test.ts`, `e2e-scenario-first-migration.test.ts`, `e2e-expected-state-validator.test.ts`. - **REWRITTEN** `e2e-phase-orchestrators.test.ts`: now exercises real shell spawning via temp scripts (pass / fail-with-stderr-tail / timeout / retry-on-classified-transient / missing-ref), real probe skipping with visible reason, and real pending skipping. Replaces the prior placeholder refs with assertions that observe actual subprocess behavior. - **TRIMMED** `e2e-lib-helpers.test.ts`, `e2e-scenario-additional-families.test.ts`, `e2e-scenario-resolver.test.ts`, `e2e-context-helper.test.ts`: dry-run-mode / `run-scenario.sh`-spawning tests deleted; tests of real bash semantics survive. ### Docs - `test/e2e-scenario/docs/README.md`: one runner, one mode (live), no dry-run, no validate-only. Migration tracking section reconciled with main: "Migration status is tracked outside the repository in GitHub issues and pull requests" (parent issue #3588). ### Merge with main - Resolved conflicts in `run.ts`, `00-slack-provider-state.sh`, `docs/README.md`, three modify/delete pairs under `framework-tests/` and `runtime/resolver/`, and `e2e-scenarios.yaml`. - Noted: a small E2E `--validate-only`/`--dry-run` plumbing fix that was layered onto the live runner during merge — `~/.local/bin` is now prepended to child PATH at the framework boundary (`redaction.ts:buildChildEnv`) so post-install nemoclaw shims resolve on runners (notably self-hosted GPU and WSL) where it isn't on the default PATH; and snapshot lifecycle assertions in `validation_suites/lib/sandbox_lifecycle.sh` now use the canonical `nemoclaw <sandbox> snapshot <subcommand>` argv shape, matching `test-snapshot-commands.sh`. ## Verification ```text $ npx tsx test/e2e-scenario/scenarios/run.ts --list hybrid scenario registry - brev-launchable-cloud-openclaw: ... - ubuntu-repo-cloud-openclaw: ... (22 scenarios listed) $ npx tsx test/e2e-scenario/scenarios/run.ts --emit-matrix [{"id":"brev-launchable-cloud-openclaw","runner":"...","label":"...","platform":"...","suites":[...]}, ...] $ E2E_CONTEXT_DIR=/tmp/x npx tsx test/e2e-scenario/scenarios/run.ts \ --scenarios ubuntu-repo-cloud-openclaw ... (compiled plan output) ... Phase results: environment: skipped (skipped=1) onboarding: failed (passed=1 failed=1) runtime: failed (failed=34 skipped=5) $ cat /tmp/x/.e2e/logs/runtime.smoke.cli-available.log smoke:cli-available e2e context: missing required key(s): E2E_SCENARIO $ vitest run test/e2e-scenario/ Test Files 30 passed (30) Tests 366 passed (366) $ npm run typecheck:cli (clean) ``` **Audited absent in production paths:** `--dry-run`, `dryRun`, `E2E_DRY_RUN`, `e2e_env_is_dry_run`, `fake-pass`, `fake-retry-once-pass`, `fake-always-transient`, `phase-1-skeleton`, `unsupported-live-step`, `--validate-only`, `RunContext.dryRun`. ## Spec gates addressed - **Phase 6** — orchestrators execute live shell/probe assertion steps. - **Phase 7** — single TS runtime entrypoint; bash runners deprecated; workflows use the typed runner with no `--dry-run`. - **Workflow side of Phase 9** — `--dry-run`, `--validate-only`, and `suite_filter` gone from active paths. ## Known follow-ups (not blocking this PR) - Two leftover `if [[ -n "${E2E_DRY_RUN:-}" ]]` short-circuits remain inside the hermes branch of `validation_suites/messaging/slack/00-slack-provider-state.sh`. They are dead code now that nothing in production sets `E2E_DRY_RUN`, but should be removed for consistency with the stated audit.  ## Summary by CodeRabbit * **Tests** * E2E now runs live scenarios by default; many legacy plan-only/dry-run tests removed or refocused. Phase execution uses real shell steps with timeouts, retries, richer evidence and failure reporting. * **New Features** * Added probes and lifecycle utilities for CLI/gateway/sandbox/negative/health checks. Typed runner can emit an execution matrix and is the canonical executor. Secret-redaction and child-env minimization added for safer spawn logging. * **Chores** * Docs updated for live-only workflow; legacy bash entrypoints deprecated; CI adds a new scenario and tighter artifact uploads.  --------- Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>

coderabbitai Bot reviewed May 27, 2026

View reviewed changes

Comment thread test/e2e-scenario/scenarios/runner-routing.ts

cv added the v0.0.55 label May 27, 2026

wscurran added E2E labels May 27, 2026

jyaunches mentioned this pull request May 28, 2026

test(e2e): execute real shell assertions; delete dry-run, --validate-only, and the bash runner #4380

Merged

cv added v0.0.56 Release target and removed v0.0.55 labels May 29, 2026

test(e2e): reject empty scenario runner labels

e788fbf

cv approved these changes May 29, 2026

View reviewed changes

cv merged commit 8f8f8ff into main May 29, 2026
21 checks passed

cv deleted the feat/dynamic-e2e-scenario-matrix branch May 29, 2026 23:42

wscurran added area: e2e End-to-end tests, nightly failures, or validation infrastructure feature PR adds or expands user-visible functionality and removed enhancement: testing labels Jun 3, 2026

jyaunches mentioned this pull request Jun 8, 2026

Adopt phase fixtures + registry-driven test discovery for Vitest E2E scenarios #4990

Closed

8 tasks

Conversation

jyaunches commented May 27, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

What changed

--emit-matrix flag on run.ts

runner-routing.ts helper

e2e-scenarios-all.yaml refactor

Adding a scenario from now on

What didn't change

Verification

Test depth

Follow-ups (separate PRs)

Summary by CodeRabbit

Uh oh!

copy-pr-bot Bot commented May 27, 2026

Uh oh!

coderabbitai Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

Uh oh!

github-actions Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Scenario Advisor Recommendation

E2E Scenario Advisor

Required scenario E2E

Optional scenario E2E

Relevant changed files

Uh oh!

github-actions Bot commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Advisor

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jyaunches commented May 27, 2026 •

edited by coderabbitai Bot

Loading

`--emit-matrix` flag on `run.ts`

`runner-routing.ts` helper

`e2e-scenarios-all.yaml` refactor

coderabbitai Bot commented May 27, 2026 •

edited

Loading

github-actions Bot commented May 27, 2026 •

edited

Loading

github-actions Bot commented May 27, 2026 •

edited

Loading

github-actions Bot commented May 27, 2026 •

edited

Loading