Skip to content

feat(e2e): generate scenario fan-out matrix from typed registry#4359

Merged
cv merged 2 commits into
mainfrom
feat/dynamic-e2e-scenario-matrix
May 29, 2026
Merged

feat(e2e): generate scenario fan-out matrix from typed registry#4359
cv merged 2 commits into
mainfrom
feat/dynamic-e2e-scenario-matrix

Conversation

@jyaunches

@jyaunches jyaunches commented May 27, 2026

Copy link
Copy Markdown
Contributor

Why

The e2e-scenarios-all.yaml fan-out has 8 hand-wired jobs, but the typed scenario registry in baseline.ts already defines 22 scenarios. The other 14 are defined-but-not-dispatched, and every new scenario today requires a YAML edit that's easy to forget. The hand-wired job names also no longer reflect what each scenario actually tests (manifest, suites, expectedState).

This PR makes the fan-out generated from the existing typed registry so the workflow can never drift from baseline.ts again.

What changed

--emit-matrix flag on run.ts

Walks listScenarios() (the existing registry — no new registry created) and prints a single-line JSON array of GHA matrix include entries:

[
  { "id": "ubuntu-repo-cloud-openclaw",
    "runner": "ubuntu-latest",
    "label": "ubuntu-local · ubuntu-repo-cloud-openclaw · smoke+inference+credentials",
    "platform": "ubuntu-local",
    "suites": ["smoke", "inference", "credentials"] },
  ...
]

runner-routing.ts helper

Derives the runs-on label from ScenarioEnvironment.platform, with a runs-on:<label> override path via runnerRequirements. Throws on unknown platforms so missing mappings fail loudly during matrix generation rather than silently falling back to ubuntu-latest (which used to mask routing bugs in the legacy bash ROUTES map).

e2e-scenarios-all.yaml refactor

Two-job shape:

  1. generate-matrix — checks out, installs deps, runs --emit-matrix, and writes the JSON to $GITHUB_OUTPUT. Also renders a Markdown table of all matrix entries in the step summary so you can see the full plan at a glance.
  2. run-scenario — uses strategy.matrix.include: ${{ fromJson(needs.generate-matrix.outputs.matrix) }} and calls the existing e2e-scenarios.yaml reusable workflow per scenario. Tile names use ${{ matrix.label }} so the sidebar reflects what's actually being tested.

Adding a scenario from now on

  1. Add the entry to canonicalScenarioInputs in baseline.ts.
  2. That's it. Next workflow run automatically picks it up as a new tile.

What didn't change

  • e2e-scenarios.yaml (the single-scenario reusable workflow) is untouched. Its resolve-runner job continues to be the authoritative runner-selection path during execution; the runner field in the matrix is informational and used for the sidebar label.
  • The bash ROUTES map in e2e-scenarios.yaml is left in place for now to keep this PR focused. It becomes effectively dead code once this lands and can be removed in a follow-up.
  • No new registry. Everything reads from test/e2e-scenario/scenarios/registry.ts.

Verification

Unit tests in e2e-scenario-matrix.test.ts cover:

  • matrix entry per registered scenario
  • runner resolves for every scenario
  • platform-default routing (ubuntu/macos/wsl/gpu)
  • explicit runs-on:<label> override
  • loud failure on unknown platform
  • single-line JSON output suitable for $GITHUB_OUTPUT

Local run:

✓ test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts (6 tests)
✓ test/e2e-scenario/framework-tests/e2e-scenario-registry.test.ts (6 tests)
✓ test/e2e-scenario/framework-tests/e2e-scenarios-workflow.test.ts (2 tests)
✓ test/e2e-scenario/framework-tests/e2e-plan-compiler.test.ts
✓ test/e2e-scenario/framework-tests/e2e-scenario-resolver.test.ts
58 tests passed

Manual smoke:

$ npx tsx test/e2e-scenario/scenarios/run.ts --emit-matrix | jq 'length'
22
$ npx tsx test/e2e-scenario/scenarios/run.ts --emit-matrix | jq -r '.[].runner' | sort -u
linux-amd64-gpu-rtxpro6000-latest-1
macos-26
ubuntu-latest
windows-latest

Test depth

Unit tests are sufficient — this is a build/CI plumbing change, not a behavioral change to the test runner or scenarios themselves. The actual scenario execution path (e2e-scenarios.yaml) is unchanged. After merge, the next manual workflow_dispatch of E2E / Scenario Runner / All will validate the matrix end-to-end.

Follow-ups (separate PRs)

  • Delete the now-dead ROUTES bash map in e2e-scenarios.yaml and have its resolve-runner job consume runner-routing.ts too.
  • Consider sharding e2e-scenarios-all.yaml by platform once we cross ~80 scenarios (currently at 22, plenty of headroom against the 256-job GHA ceiling).

Summary by CodeRabbit

  • Tests

    • Added end-to-end tests validating scenario matrix generation, runner selection, and error cases for invalid or missing platform/override values.
  • Chores

    • CI now builds a dynamic test matrix at runtime and runs one job per scenario; fails early if the matrix is empty.
    • Scenario runner can emit a deterministic matrix JSON for the workflow and produces human-readable job labels.

Review Change Stack

Replace the hand-wired job list in e2e-scenarios-all.yaml with a
generated matrix sourced from the existing typed scenario registry.
Adding a scenario in test/e2e-scenario/scenarios/scenarios/baseline.ts
now automatically produces a tile in the fan-out workflow on the next
run, with no workflow edits required. This closes the drift gap that
left ~14 of 22 registered scenarios undispatched.

Changes:
- Add --emit-matrix flag to test/e2e-scenario/scenarios/run.ts that
  walks listScenarios() and prints a single-line JSON array of GHA
  matrix include entries: { id, runner, label, platform, suites }.
- Extract runner-routing.ts that derives the runs-on label from
  ScenarioEnvironment.platform, with a runs-on:<label> override path
  via runnerRequirements. Throws on unknown platforms so missing
  mappings fail loudly during matrix generation.
- Refactor .github/workflows/e2e-scenarios-all.yaml to a two-job
  shape: generate-matrix emits the JSON, run-scenario fans out via
  strategy.matrix.include and calls the existing e2e-scenarios.yaml
  reusable workflow. Tile names come from the typed label so the
  sidebar reflects what is actually being tested.
- Guard the run.ts CLI bootstrap so importing buildScenarioMatrix
  from tests does not trigger the side-effecting main() path.
- Add e2e-scenario-matrix.test.ts covering matrix shape, registry
  parity, runner overrides, unknown-platform errors, and CLI output
  contract.
@copy-pr-bot

copy-pr-bot Bot commented May 27, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6a305f81-e907-402a-a29c-b704cc5758bb

📥 Commits

Reviewing files that changed from the base of the PR and between b573c81 and e788fbf.

📒 Files selected for processing (2)
  • test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts
  • test/e2e-scenario/scenarios/runner-routing.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts
  • test/e2e-scenario/scenarios/runner-routing.ts

📝 Walkthrough

Walkthrough

This PR replaces static GitHub Actions job enumeration with a runtime-generated matrix by adding typed runner resolution, a --emit-matrix CLI mode to build and emit scenario matrix rows, tests for matrix correctness and error cases, and workflow changes to invoke the generator and fan out per-scenario jobs dynamically.

Changes

Dynamic Scenario Matrix Generation

Layer / File(s) Summary
Runner routing foundation
test/e2e-scenario/scenarios/runner-routing.ts
Exports ResolvedRunner and PLATFORM_DEFAULT_RUNNER, implements resolveRunnerForScenario() that prefers runs-on:<label> overrides (rejecting empty labels) and falls back to platform defaults, throwing on unmapped platforms.
Scenario matrix construction
test/e2e-scenario/scenarios/run.ts
Adds --emit-matrix option and buildScenarioMatrix() producing ScenarioMatrixEntry[] with {id, runner, label, platform, suites} using resolveRunnerForScenario.
Matrix emission and module guarding
test/e2e-scenario/scenarios/run.ts
Implements emitMatrix() to print a single-line JSON array, and guards main execution so the module can be imported by tests; adds try/catch and exit-code handling.
GitHub Actions workflow integration
.github/workflows/e2e-scenarios-all.yaml
Updates header; adds generate-matrix job that invokes the emitter, fails if the matrix is empty, writes a step summary, and uses the emitted matrix to fan out run-scenario via a reusable workflow.
E2E matrix tests
test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts
Adds tests that spawn the emitter, assert one entry per registered scenario, validate runner/label fields, platform→runner mappings, override and error cases, and single-line JSON output format.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

CI/CD, refactor

Suggested reviewers

  • cv

Poem

🐰 I built a matrix, neat and spry,
From registry rows that reach the sky.
No hardcoded jobs to bog the night,
Runners chosen, labels right.
Tests hum softly — all systems go, hop high!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and concisely describes the main change: introducing dynamic matrix generation for e2e scenarios from the typed registry, replacing the previous static YAML wiring.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/dynamic-e2e-scenario-matrix

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: e2e-scenarios-all
Optional E2E: ubuntu-repo-cloud-openclaw, gpu-repo-local-ollama-openclaw

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • e2e-scenarios-all (high): This is the directly modified workflow. It should be dispatched to validate that generate-matrix can install dependencies, run npx tsx test/e2e-scenario/scenarios/run.ts --emit-matrix, publish a non-empty JSON matrix, and fan out each typed scenario through the reusable scenario runner.

Optional E2E

  • ubuntu-repo-cloud-openclaw (medium): Useful lower-cost smoke check of the reusable scenario runner path after the run.ts direct-invocation guard and matrix export changes; it validates the canonical Ubuntu cloud OpenClaw user flow without waiting for the full all-scenario fan-out.
  • gpu-repo-local-ollama-openclaw (high): Optional confidence for the new typed runner-routing mapping of GPU scenarios to the self-hosted GPU runner label, if GPU capacity is available.

New E2E recommendations

  • scenario-runner-routing-consistency (high): The all-scenarios workflow now derives its matrix from the typed registry, while the called e2e-scenarios.yaml workflow still has its own hardcoded route table. Add coverage that fails if any registered scenario emitted by --emit-matrix cannot be executed by the reusable scenario workflow's route resolver.
    • Suggested test: Add a framework or workflow-validation test that compares typed registry scenario IDs against the e2e-scenarios.yaml route map, or migrate e2e-scenarios.yaml to consume the same runner-routing.ts resolver.

Dispatch hint

  • Workflow: e2e-scenarios-all.yaml
  • jobs input: ``

@github-actions

github-actions Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

E2E Scenario Advisor Recommendation

Required scenario E2E: e2e-scenarios-all
Optional scenario E2E: None

Dispatch required scenario E2E:

  • gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • e2e-scenarios-all: The PR changes the scenario fan-out workflow and shared scenario runner/routing code used to generate and execute the scenario matrix. Per policy, scenario workflow and runner/runtime changes require running the full scenario fan-out.
    • Dispatch: gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Optional scenario E2E

  • None.

Relevant changed files

  • .github/workflows/e2e-scenarios-all.yaml
  • test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts
  • test/e2e-scenario/scenarios/run.ts
  • test/e2e-scenario/scenarios/runner-routing.ts

@github-actions

github-actions Bot commented May 27, 2026

Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 1 needs attention, 5 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 4 still apply, 0 new items found

Review findings

🛠️ Needs attention

  • Generated fan-out includes scenarios the called workflow still rejects (.github/workflows/e2e-scenarios-all.yaml:80): The all-scenarios workflow now builds one matrix entry for every typed registry scenario and passes each `matrix.id` to `.github/workflows/e2e-scenarios.yaml`, but that called workflow remains the actual execution path and still resolves runners through a separate hardcoded bash `ROUTES` map. Current registry IDs such as `ubuntu-repo-cloud-openclaw-custom-policies`, `ubuntu-invalid-nvidia-key-negative`, and `ubuntu-gateway-port-conflict-negative` are emitted by the matrix but are not present in the child workflow route map, so those generated jobs will fail before execution with `No runner route for scenario`.
    • Recommendation: Make the executable routing source match the generated matrix before expanding the fan-out. Either update/remove the child workflow `ROUTES` map so it consumes the same validated typed router, or filter this all-scenarios matrix to only IDs the child workflow can currently execute. Add a static workflow contract test that verifies every generated `listScenarios()` ID is accepted by the called workflow.
    • Evidence: `run.ts` builds the matrix from `listScenarios()`, and `e2e-scenarios-all.yaml` uses `include: ${{ fromJson(needs.generate-matrix.outputs.matrix) }}` then calls the reusable workflow with `scenarios: ${{ matrix.id }}`. `.github/workflows/e2e-scenarios.yaml` still uses a hardcoded `ROUTES` map that lacks later registry scenarios including the custom-policies and negative cases.

🔎 Worth checking

  • Source-of-truth review needed: Workflow runner routing: The advisor marked localized patch analysis as missing.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `e2e-scenarios-all.yaml` emits every registry ID, while `.github/workflows/e2e-scenarios.yaml` still exits on missing `ROUTES[$id]`.
  • Source-of-truth review needed: Workflow secret contract: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `e2e-scenarios-all.yaml` passes only `NVIDIA_API_KEY`, while registry scenarios require OpenAI-compatible, Brave, Telegram, Discord, and Slack secrets.
  • Generated fan-out does not pass all secrets required by generated scenarios (.github/workflows/e2e-scenarios-all.yaml:85): The matrix now includes every registry scenario, but the reusable workflow call only passes `NVIDIA_API_KEY`. Several generated scenarios declare other required secrets in the typed registry, including OpenAI-compatible, Brave, Telegram, Discord, and Slack credentials. Even if current dry-run paths do not always consume those secrets, the all-workflow contract is broader than its secret plumbing.
    • Recommendation: Either pass and declare the optional secrets required by every generated scenario in the reusable workflow contract, or restrict this workflow to scenarios whose required secrets are actually available. Add a contract test that compares generated matrix IDs and their `requiredSecrets` metadata against the secrets exposed to `.github/workflows/e2e-scenarios.yaml`.
    • Evidence: `e2e-scenarios-all.yaml` passes only `NVIDIA_API_KEY`; `baseline.ts` declares `OPENAI_COMPATIBLE_API_KEY`, `BRAVE_API_KEY`, `TELEGRAM_BOT_TOKEN`, `DISCORD_BOT_TOKEN`, and `SLACK_BOT_TOKEN` on scenarios that `buildScenarioMatrix()` now emits.
  • Runner override accepts arbitrary `runs-on:` labels without an allowlist (test/e2e-scenario/scenarios/runner-routing.ts:37): `resolveRunnerForScenario()` gives precedence to any non-empty `runnerRequirements` entry starting with `runs-on:` and returns the suffix as workflow routing metadata. The changed workflow currently treats `matrix.runner` as informational and the child workflow still ignores it for actual `runs-on`, which limits immediate impact. However, the PR introduces this field as runner-selection data and tests arbitrary `custom-self-hosted` as accepted behavior; if a later consumer wires it directly into `runs-on`, scenario metadata could route secret-bearing jobs to unexpected runner labels.
    • Recommendation: Validate `runs-on:` overrides against a narrow allowlist of approved GitHub-hosted and self-hosted labels before emitting them, or remove arbitrary overrides from the workflow-facing matrix until the trusted routing contract is finalized. Add a negative test for rejected runner labels.
    • Evidence: `runner-routing.ts` returns `explicit.slice("runs-on:".length).trim()` after only checking non-empty. `e2e-scenario-matrix.test.ts` asserts that `runs-on:custom-self-hosted` resolves successfully.
  • Matrix tests do not cover the reusable workflow contract (test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts:24): The new tests validate local matrix generation, canonical platform routing, and JSON shape, but they do not verify that the generated matrix can actually be consumed by `.github/workflows/e2e-scenarios.yaml`. That leaves the route-map and secret-contract regressions undetected by the added test suite.
    • Recommendation: Add static workflow contract tests that parse or otherwise validate the called workflow contract: every generated scenario ID must be routable by the child workflow, and every generated scenario's required secrets must be declared/passed or explicitly filtered out.
    • Evidence: The added tests assert one matrix entry per registered scenario and successful `--emit-matrix` output, but no test inspects the child workflow `ROUTES` map or reusable workflow `secrets` declaration.

🌱 Nice ideas

  • None.
Since last review details

Current findings:

  • Source-of-truth review needed: Workflow runner routing: The advisor marked localized patch analysis as missing.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `e2e-scenarios-all.yaml` emits every registry ID, while `.github/workflows/e2e-scenarios.yaml` still exits on missing `ROUTES[$id]`.
  • Source-of-truth review needed: Workflow secret contract: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `e2e-scenarios-all.yaml` passes only `NVIDIA_API_KEY`, while registry scenarios require OpenAI-compatible, Brave, Telegram, Discord, and Slack secrets.
  • Generated fan-out includes scenarios the called workflow still rejects (.github/workflows/e2e-scenarios-all.yaml:80): The all-scenarios workflow now builds one matrix entry for every typed registry scenario and passes each `matrix.id` to `.github/workflows/e2e-scenarios.yaml`, but that called workflow remains the actual execution path and still resolves runners through a separate hardcoded bash `ROUTES` map. Current registry IDs such as `ubuntu-repo-cloud-openclaw-custom-policies`, `ubuntu-invalid-nvidia-key-negative`, and `ubuntu-gateway-port-conflict-negative` are emitted by the matrix but are not present in the child workflow route map, so those generated jobs will fail before execution with `No runner route for scenario`.
    • Recommendation: Make the executable routing source match the generated matrix before expanding the fan-out. Either update/remove the child workflow `ROUTES` map so it consumes the same validated typed router, or filter this all-scenarios matrix to only IDs the child workflow can currently execute. Add a static workflow contract test that verifies every generated `listScenarios()` ID is accepted by the called workflow.
    • Evidence: `run.ts` builds the matrix from `listScenarios()`, and `e2e-scenarios-all.yaml` uses `include: ${{ fromJson(needs.generate-matrix.outputs.matrix) }}` then calls the reusable workflow with `scenarios: ${{ matrix.id }}`. `.github/workflows/e2e-scenarios.yaml` still uses a hardcoded `ROUTES` map that lacks later registry scenarios including the custom-policies and negative cases.
  • Generated fan-out does not pass all secrets required by generated scenarios (.github/workflows/e2e-scenarios-all.yaml:85): The matrix now includes every registry scenario, but the reusable workflow call only passes `NVIDIA_API_KEY`. Several generated scenarios declare other required secrets in the typed registry, including OpenAI-compatible, Brave, Telegram, Discord, and Slack credentials. Even if current dry-run paths do not always consume those secrets, the all-workflow contract is broader than its secret plumbing.
    • Recommendation: Either pass and declare the optional secrets required by every generated scenario in the reusable workflow contract, or restrict this workflow to scenarios whose required secrets are actually available. Add a contract test that compares generated matrix IDs and their `requiredSecrets` metadata against the secrets exposed to `.github/workflows/e2e-scenarios.yaml`.
    • Evidence: `e2e-scenarios-all.yaml` passes only `NVIDIA_API_KEY`; `baseline.ts` declares `OPENAI_COMPATIBLE_API_KEY`, `BRAVE_API_KEY`, `TELEGRAM_BOT_TOKEN`, `DISCORD_BOT_TOKEN`, and `SLACK_BOT_TOKEN` on scenarios that `buildScenarioMatrix()` now emits.
  • Runner override accepts arbitrary `runs-on:` labels without an allowlist (test/e2e-scenario/scenarios/runner-routing.ts:37): `resolveRunnerForScenario()` gives precedence to any non-empty `runnerRequirements` entry starting with `runs-on:` and returns the suffix as workflow routing metadata. The changed workflow currently treats `matrix.runner` as informational and the child workflow still ignores it for actual `runs-on`, which limits immediate impact. However, the PR introduces this field as runner-selection data and tests arbitrary `custom-self-hosted` as accepted behavior; if a later consumer wires it directly into `runs-on`, scenario metadata could route secret-bearing jobs to unexpected runner labels.
    • Recommendation: Validate `runs-on:` overrides against a narrow allowlist of approved GitHub-hosted and self-hosted labels before emitting them, or remove arbitrary overrides from the workflow-facing matrix until the trusted routing contract is finalized. Add a negative test for rejected runner labels.
    • Evidence: `runner-routing.ts` returns `explicit.slice("runs-on:".length).trim()` after only checking non-empty. `e2e-scenario-matrix.test.ts` asserts that `runs-on:custom-self-hosted` resolves successfully.
  • Matrix tests do not cover the reusable workflow contract (test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts:24): The new tests validate local matrix generation, canonical platform routing, and JSON shape, but they do not verify that the generated matrix can actually be consumed by `.github/workflows/e2e-scenarios.yaml`. That leaves the route-map and secret-contract regressions undetected by the added test suite.
    • Recommendation: Add static workflow contract tests that parse or otherwise validate the called workflow contract: every generated scenario ID must be routable by the child workflow, and every generated scenario's required secrets must be declared/passed or explicitly filtered out.
    • Evidence: The added tests assert one matrix entry per registered scenario and successful `--emit-matrix` output, but no test inspects the child workflow `ROUTES` map or reusable workflow `secrets` declaration.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
test/e2e-scenario/scenarios/run.ts (1)

104-119: ⚡ Quick win

Apply the documented id sort before matrix emission.

The doc says matrix entries are sorted by id, but the implementation currently preserves registry order. Sorting here makes output deterministically diffable as intended.

Proposed fix
 export function buildScenarioMatrix(): ScenarioMatrixEntry[] {
-  return listScenarios().map((scenario): ScenarioMatrixEntry => {
+  return [...listScenarios()]
+    .sort((a, b) => a.id.localeCompare(b.id))
+    .map((scenario): ScenarioMatrixEntry => {
     const { runner } = resolveRunnerForScenario(scenario);
     return {
       id: scenario.id,
       runner,
       label: buildLabel(scenario),
       platform: scenario.environment?.platform ?? "unknown",
       suites: scenario.suiteIds ?? [],
     };
   });
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-scenario/scenarios/run.ts` around lines 104 - 119, The
buildScenarioMatrix function currently maps listScenarios() without sorting, so
update buildScenarioMatrix to sort the scenarios by their id before mapping
(e.g., call .sort(...) on the array returned by listScenarios()) so the returned
ScenarioMatrixEntry[] is deterministically ordered by scenario.id; locate
buildScenarioMatrix and adjust the pipeline that uses listScenarios() and
resolveRunnerForScenario(scenario) to operate on a sorted array (use a string
compare/localeCompare on scenario.id).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e-scenario/scenarios/runner-routing.ts`:
- Around line 39-42: The code currently accepts a `runs-on:` override that can
slice to an empty string; update the block that finds `explicit` from
`scenario.runnerRequirements` so it rejects empty labels — e.g., after computing
`const label = explicit.slice("runs-on:".length)` (or by ensuring `req.length >
"runs-on:".length` when matching) validate `label !== ""` and if empty throw a
clear Error like "Invalid runs-on: override: empty runner label" (or return a
failing result) instead of returning `{ runner: "", ... }`; refer to the
`explicit` variable and the `runnerRequirements` lookup when making this change.

---

Nitpick comments:
In `@test/e2e-scenario/scenarios/run.ts`:
- Around line 104-119: The buildScenarioMatrix function currently maps
listScenarios() without sorting, so update buildScenarioMatrix to sort the
scenarios by their id before mapping (e.g., call .sort(...) on the array
returned by listScenarios()) so the returned ScenarioMatrixEntry[] is
deterministically ordered by scenario.id; locate buildScenarioMatrix and adjust
the pipeline that uses listScenarios() and resolveRunnerForScenario(scenario) to
operate on a sorted array (use a string compare/localeCompare on scenario.id).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 48621b06-d779-4cd5-a9ce-513bb8c854ab

📥 Commits

Reviewing files that changed from the base of the PR and between b14fd76 and b573c81.

📒 Files selected for processing (4)
  • .github/workflows/e2e-scenarios-all.yaml
  • test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts
  • test/e2e-scenario/scenarios/run.ts
  • test/e2e-scenario/scenarios/runner-routing.ts

Comment thread test/e2e-scenario/scenarios/runner-routing.ts
@cv cv merged commit 8f8f8ff into main May 29, 2026
21 checks passed
@cv cv deleted the feat/dynamic-e2e-scenario-matrix branch May 29, 2026 23:42
@wscurran wscurran added area: e2e End-to-end tests, nightly failures, or validation infrastructure feature PR adds or expands user-visible functionality and removed enhancement: testing labels Jun 3, 2026
jyaunches added a commit that referenced this pull request Jun 8, 2026
Resolutions:
- run.ts: kept HEAD's removal of --dry-run/--validate-only modes; took
  main's richer --emit-matrix payload (runner/label/platform/suites)
  consumed by the dynamic matrix workflow (#4359).
- 00-slack-provider-state.sh: kept HEAD's removal of the openclaw
  dry-run skip block; carried main's 'agent' local var.
- docs/README.md: kept HEAD (live-only runner, deprecated bash
  entrypoints) and folded in main's 'Migration status is tracked
  outside the repository' line so e2e-migration-inventory-lock stays
  green.
- e2e-coverage-report.test.ts, e2e-scenario-resolver.test.ts,
  runtime/resolver/coverage.ts: confirmed deletes (intentional in
  9da75ac, 'retire legacy expected-states.yaml + runtime/resolver').
- e2e-expected-state.test.ts: applied the trim that 9da75ac's
  commit message described but did not include — drop the YAML-mirror
  parity tests that depended on the now-deleted expected-states.yaml,
  replace with id-coverage assertions against listExpectedStates().
- e2e-scenarios.yaml: retitled the summary step from 'typed dry-run
  summary' to 'typed scenario run' so the workflow text matches the
  PR's live-only stance.
jyaunches added a commit that referenced this pull request Jun 8, 2026
…only, and the bash runner (#4380)

## Why

Removing `--dry-run` and plan-mode execution from the E2E runner: they
let a coding agent ship work behind false-green test runs. The TS runner
now has **one execution mode: live.** No flag, env var, helper, or
branch in any production path bypasses real assertion execution.

Two seams needed closing:

1. `phase.ts:executeStep` had no `child_process.spawn` — every real
shell/probe step fell through to a hardcoded `failed: unsupported live
step`, and four placeholder refs (`fake-pass`, `fake-retry-once-pass`,
`fake-always-transient`, `phase-1-skeleton`) were the only paths that
reported `passed`.
2. ~30 shell scripts under `validation_suites/`,
`onboarding_assertions/`, and `nemoclaw_scenarios/` began with `if
e2e_env_is_dry_run; then exit 0; fi`, so when the workflow passed
`--dry-run` the scripts exited 0 before running.

## What changed

### Orchestrator (TypeScript)
- `phase.ts:executeStep` now spawns shell steps via
`child_process.spawn`, with **detached process groups** so timeouts kill
`bash + sleep` cleanly via negative-pid signal. Probe steps return
`skipped: "probe not registered"` until the registry lands. Pending
steps return `skipped: "pending: <ref>"`. Unknown kinds throw. Real
evidence captured to `.e2e/logs/<step-id>.log`. Step-level
`reliability.timeoutSeconds` and `retry.{attempts,on}` enforced here,
not in clients.
- `run.ts`: `--dry-run`, `--validate-only` deleted. Default invocation
is live execution. `--list` and `--plan-only` (local debug) survive
read-only. **`--emit-matrix` added** for the dynamic-matrix workflow
(#4359), with the richer payload (`runner`, `label`, `platform`,
`suites`) merged from main.
- `types.ts`: `RunContext.dryRun` deleted.

### Workflow
- `e2e-scenarios.yaml`: the resolve-runner `--plan-only` warmup, and
both `--dry-run` invocations (Linux + WSL), are gone. Workflows execute
live. Summary step retitled from "typed dry-run summary" to "typed
scenario run".
- `tools/e2e-scenarios/workflow-boundary.mts` validator now **rejects**
`--dry-run`, `--plan-only`, `--validate-only` in the workflow.

### Bash entrypoints (PR collapses what was originally going to be PR 1
+ PR 2)
- `runtime/run-scenario.sh`: 483 lines of duplicated
install/onboard/gateway-check/suite-execution → **5-line fail-fast
stub** pointing at `run.ts`.
- `runtime/run-suites.sh`: same treatment.
`PhaseOrchestrator.runShellStep` walks typed `assertionGroups` directly;
nothing in TS calls a YAML-walking bash runner.

### Shell scripts (the leaves stay, the dry-run skip blocks die)
- `validation_suites/**`, `onboarding_assertions/**`,
`nemoclaw_scenarios/**`: every `if e2e_env_is_dry_run; then ... exit 0;
fi` and every `[[ "${E2E_DRY_RUN:-0}" == "1" ]]` short-circuit removed.
The real assertion logic that was hiding underneath now runs
unconditionally.
- `runtime/lib/env.sh`: `e2e_env_is_dry_run` helper deleted.
- `inference_routing.sh`: dead `_e2e_inference_plan` helper deleted.

### Legacy YAML resolver retired (rolled in from the planned follow-up
PR)
- Deleted `nemoclaw_scenarios/expected-states.yaml` and the entire
`runtime/resolver/` tree (`load.ts`, `plan.ts`, `schema.ts`,
`coverage.ts`, `index.ts`, `validator.ts`, `expected-failure.ts`,
`js-yaml.d.ts`) plus `runtime/coverage-report.sh`.
- Six framework tests that exercised only the resolver path were
dropped: `e2e-scenario-schema`, `e2e-scenario-resolver`,
`e2e-coverage-report`, `e2e-scenario-additional-families`,
`e2e-metadata-final-hygiene`, `e2e-expected-failure` (typed equivalent
now lives in `e2e-negative-matcher.test.ts`).
- `e2e-expected-state.test.ts`: YAML-mirror parity tests dropped;
replaced with id-coverage assertions against `listExpectedStates()`. The
typed registry in `scenarios/expected-states.ts` is the single source of
truth.

### Tests
- **DELETED** (validated dead code paths): `e2e-suite-runner.test.ts`,
`e2e-scenario-first-migration.test.ts`,
`e2e-expected-state-validator.test.ts`.
- **REWRITTEN** `e2e-phase-orchestrators.test.ts`: now exercises real
shell spawning via temp scripts (pass / fail-with-stderr-tail / timeout
/ retry-on-classified-transient / missing-ref), real probe skipping with
visible reason, and real pending skipping. Replaces the prior
placeholder refs with assertions that observe actual subprocess
behavior.
- **TRIMMED** `e2e-lib-helpers.test.ts`,
`e2e-scenario-additional-families.test.ts`,
`e2e-scenario-resolver.test.ts`, `e2e-context-helper.test.ts`:
dry-run-mode / `run-scenario.sh`-spawning tests deleted; tests of real
bash semantics survive.

### Docs
- `test/e2e-scenario/docs/README.md`: one runner, one mode (live), no
dry-run, no validate-only. Migration tracking section reconciled with
main: "Migration status is tracked outside the repository in GitHub
issues and pull requests" (parent issue #3588).

### Merge with main
- Resolved conflicts in `run.ts`, `00-slack-provider-state.sh`,
`docs/README.md`, three modify/delete pairs under `framework-tests/` and
`runtime/resolver/`, and `e2e-scenarios.yaml`.
- Noted: a small E2E `--validate-only`/`--dry-run` plumbing fix that was
layered onto the live runner during merge — `~/.local/bin` is now
prepended to child PATH at the framework boundary
(`redaction.ts:buildChildEnv`) so post-install nemoclaw shims resolve on
runners (notably self-hosted GPU and WSL) where it isn't on the default
PATH; and snapshot lifecycle assertions in
`validation_suites/lib/sandbox_lifecycle.sh` now use the canonical
`nemoclaw <sandbox> snapshot <subcommand>` argv shape, matching
`test-snapshot-commands.sh`.

## Verification

```text
$ npx tsx test/e2e-scenario/scenarios/run.ts --list
hybrid scenario registry
- brev-launchable-cloud-openclaw: ...
- ubuntu-repo-cloud-openclaw: ...
(22 scenarios listed)

$ npx tsx test/e2e-scenario/scenarios/run.ts --emit-matrix
[{"id":"brev-launchable-cloud-openclaw","runner":"...","label":"...","platform":"...","suites":[...]}, ...]

$ E2E_CONTEXT_DIR=/tmp/x npx tsx test/e2e-scenario/scenarios/run.ts \
    --scenarios ubuntu-repo-cloud-openclaw
... (compiled plan output) ...
Phase results:
  environment: skipped (skipped=1)
  onboarding: failed (passed=1 failed=1)
  runtime: failed (failed=34 skipped=5)

$ cat /tmp/x/.e2e/logs/runtime.smoke.cli-available.log
smoke:cli-available
e2e context: missing required key(s): E2E_SCENARIO

$ vitest run test/e2e-scenario/
Test Files  30 passed (30)
     Tests  366 passed (366)

$ npm run typecheck:cli
(clean)
```

**Audited absent in production paths:** `--dry-run`, `dryRun`,
`E2E_DRY_RUN`, `e2e_env_is_dry_run`, `fake-pass`,
`fake-retry-once-pass`, `fake-always-transient`, `phase-1-skeleton`,
`unsupported-live-step`, `--validate-only`, `RunContext.dryRun`.

## Spec gates addressed

- **Phase 6** — orchestrators execute live shell/probe assertion steps.
- **Phase 7** — single TS runtime entrypoint; bash runners deprecated;
workflows use the typed runner with no `--dry-run`.
- **Workflow side of Phase 9** — `--dry-run`, `--validate-only`, and
`suite_filter` gone from active paths.

## Known follow-ups (not blocking this PR)

- Two leftover `if [[ -n "${E2E_DRY_RUN:-}" ]]` short-circuits remain
inside the hermes branch of
`validation_suites/messaging/slack/00-slack-provider-state.sh`. They are
dead code now that nothing in production sets `E2E_DRY_RUN`, but should
be removed for consistency with the stated audit.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Tests**
* E2E now runs live scenarios by default; many legacy plan-only/dry-run
tests removed or refocused. Phase execution uses real shell steps with
timeouts, retries, richer evidence and failure reporting.

* **New Features**
* Added probes and lifecycle utilities for
CLI/gateway/sandbox/negative/health checks. Typed runner can emit an
execution matrix and is the canonical executor. Secret-redaction and
child-env minimization added for safer spawn logging.

* **Chores**
* Docs updated for live-only workflow; legacy bash entrypoints
deprecated; CI adds a new scenario and tighter artifact uploads.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: e2e End-to-end tests, nightly failures, or validation infrastructure feature PR adds or expands user-visible functionality v0.0.56 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants