feat(e2e): generate scenario fan-out matrix from typed registry#4359
Conversation
Replace the hand-wired job list in e2e-scenarios-all.yaml with a
generated matrix sourced from the existing typed scenario registry.
Adding a scenario in test/e2e-scenario/scenarios/scenarios/baseline.ts
now automatically produces a tile in the fan-out workflow on the next
run, with no workflow edits required. This closes the drift gap that
left ~14 of 22 registered scenarios undispatched.
Changes:
- Add --emit-matrix flag to test/e2e-scenario/scenarios/run.ts that
walks listScenarios() and prints a single-line JSON array of GHA
matrix include entries: { id, runner, label, platform, suites }.
- Extract runner-routing.ts that derives the runs-on label from
ScenarioEnvironment.platform, with a runs-on:<label> override path
via runnerRequirements. Throws on unknown platforms so missing
mappings fail loudly during matrix generation.
- Refactor .github/workflows/e2e-scenarios-all.yaml to a two-job
shape: generate-matrix emits the JSON, run-scenario fans out via
strategy.matrix.include and calls the existing e2e-scenarios.yaml
reusable workflow. Tile names come from the typed label so the
sidebar reflects what is actually being tested.
- Guard the run.ts CLI bootstrap so importing buildScenarioMatrix
from tests does not trigger the side-effecting main() path.
- Add e2e-scenario-matrix.test.ts covering matrix shape, registry
parity, runner overrides, unknown-platform errors, and CLI output
contract.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughThis PR replaces static GitHub Actions job enumeration with a runtime-generated matrix by adding typed runner resolution, a ChangesDynamic Scenario Matrix Generation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Comment |
E2E Advisor RecommendationRequired E2E: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
E2E Scenario Advisor RecommendationRequired scenario E2E: Dispatch required scenario E2E:
Full scenario advisor summaryE2E Scenario AdvisorBase: Required scenario E2E
Optional scenario E2E
Relevant changed files
|
PR Review AdvisorFindings: 1 needs attention, 5 worth checking, 0 nice ideas Review findings🛠️ Needs attention
🔎 Worth checking
🌱 Nice ideas
Since last review detailsCurrent findings:
This is an automated advisory review. A human maintainer must make the final merge decision. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
test/e2e-scenario/scenarios/run.ts (1)
104-119: ⚡ Quick winApply the documented id sort before matrix emission.
The doc says matrix entries are sorted by id, but the implementation currently preserves registry order. Sorting here makes output deterministically diffable as intended.
Proposed fix
export function buildScenarioMatrix(): ScenarioMatrixEntry[] { - return listScenarios().map((scenario): ScenarioMatrixEntry => { + return [...listScenarios()] + .sort((a, b) => a.id.localeCompare(b.id)) + .map((scenario): ScenarioMatrixEntry => { const { runner } = resolveRunnerForScenario(scenario); return { id: scenario.id, runner, label: buildLabel(scenario), platform: scenario.environment?.platform ?? "unknown", suites: scenario.suiteIds ?? [], }; }); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e-scenario/scenarios/run.ts` around lines 104 - 119, The buildScenarioMatrix function currently maps listScenarios() without sorting, so update buildScenarioMatrix to sort the scenarios by their id before mapping (e.g., call .sort(...) on the array returned by listScenarios()) so the returned ScenarioMatrixEntry[] is deterministically ordered by scenario.id; locate buildScenarioMatrix and adjust the pipeline that uses listScenarios() and resolveRunnerForScenario(scenario) to operate on a sorted array (use a string compare/localeCompare on scenario.id).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e-scenario/scenarios/runner-routing.ts`:
- Around line 39-42: The code currently accepts a `runs-on:` override that can
slice to an empty string; update the block that finds `explicit` from
`scenario.runnerRequirements` so it rejects empty labels — e.g., after computing
`const label = explicit.slice("runs-on:".length)` (or by ensuring `req.length >
"runs-on:".length` when matching) validate `label !== ""` and if empty throw a
clear Error like "Invalid runs-on: override: empty runner label" (or return a
failing result) instead of returning `{ runner: "", ... }`; refer to the
`explicit` variable and the `runnerRequirements` lookup when making this change.
---
Nitpick comments:
In `@test/e2e-scenario/scenarios/run.ts`:
- Around line 104-119: The buildScenarioMatrix function currently maps
listScenarios() without sorting, so update buildScenarioMatrix to sort the
scenarios by their id before mapping (e.g., call .sort(...) on the array
returned by listScenarios()) so the returned ScenarioMatrixEntry[] is
deterministically ordered by scenario.id; locate buildScenarioMatrix and adjust
the pipeline that uses listScenarios() and resolveRunnerForScenario(scenario) to
operate on a sorted array (use a string compare/localeCompare on scenario.id).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 48621b06-d779-4cd5-a9ce-513bb8c854ab
📒 Files selected for processing (4)
.github/workflows/e2e-scenarios-all.yamltest/e2e-scenario/framework-tests/e2e-scenario-matrix.test.tstest/e2e-scenario/scenarios/run.tstest/e2e-scenario/scenarios/runner-routing.ts
Resolutions: - run.ts: kept HEAD's removal of --dry-run/--validate-only modes; took main's richer --emit-matrix payload (runner/label/platform/suites) consumed by the dynamic matrix workflow (#4359). - 00-slack-provider-state.sh: kept HEAD's removal of the openclaw dry-run skip block; carried main's 'agent' local var. - docs/README.md: kept HEAD (live-only runner, deprecated bash entrypoints) and folded in main's 'Migration status is tracked outside the repository' line so e2e-migration-inventory-lock stays green. - e2e-coverage-report.test.ts, e2e-scenario-resolver.test.ts, runtime/resolver/coverage.ts: confirmed deletes (intentional in 9da75ac, 'retire legacy expected-states.yaml + runtime/resolver'). - e2e-expected-state.test.ts: applied the trim that 9da75ac's commit message described but did not include — drop the YAML-mirror parity tests that depended on the now-deleted expected-states.yaml, replace with id-coverage assertions against listExpectedStates(). - e2e-scenarios.yaml: retitled the summary step from 'typed dry-run summary' to 'typed scenario run' so the workflow text matches the PR's live-only stance.
…only, and the bash runner (#4380) ## Why Removing `--dry-run` and plan-mode execution from the E2E runner: they let a coding agent ship work behind false-green test runs. The TS runner now has **one execution mode: live.** No flag, env var, helper, or branch in any production path bypasses real assertion execution. Two seams needed closing: 1. `phase.ts:executeStep` had no `child_process.spawn` — every real shell/probe step fell through to a hardcoded `failed: unsupported live step`, and four placeholder refs (`fake-pass`, `fake-retry-once-pass`, `fake-always-transient`, `phase-1-skeleton`) were the only paths that reported `passed`. 2. ~30 shell scripts under `validation_suites/`, `onboarding_assertions/`, and `nemoclaw_scenarios/` began with `if e2e_env_is_dry_run; then exit 0; fi`, so when the workflow passed `--dry-run` the scripts exited 0 before running. ## What changed ### Orchestrator (TypeScript) - `phase.ts:executeStep` now spawns shell steps via `child_process.spawn`, with **detached process groups** so timeouts kill `bash + sleep` cleanly via negative-pid signal. Probe steps return `skipped: "probe not registered"` until the registry lands. Pending steps return `skipped: "pending: <ref>"`. Unknown kinds throw. Real evidence captured to `.e2e/logs/<step-id>.log`. Step-level `reliability.timeoutSeconds` and `retry.{attempts,on}` enforced here, not in clients. - `run.ts`: `--dry-run`, `--validate-only` deleted. Default invocation is live execution. `--list` and `--plan-only` (local debug) survive read-only. **`--emit-matrix` added** for the dynamic-matrix workflow (#4359), with the richer payload (`runner`, `label`, `platform`, `suites`) merged from main. - `types.ts`: `RunContext.dryRun` deleted. ### Workflow - `e2e-scenarios.yaml`: the resolve-runner `--plan-only` warmup, and both `--dry-run` invocations (Linux + WSL), are gone. Workflows execute live. Summary step retitled from "typed dry-run summary" to "typed scenario run". - `tools/e2e-scenarios/workflow-boundary.mts` validator now **rejects** `--dry-run`, `--plan-only`, `--validate-only` in the workflow. ### Bash entrypoints (PR collapses what was originally going to be PR 1 + PR 2) - `runtime/run-scenario.sh`: 483 lines of duplicated install/onboard/gateway-check/suite-execution → **5-line fail-fast stub** pointing at `run.ts`. - `runtime/run-suites.sh`: same treatment. `PhaseOrchestrator.runShellStep` walks typed `assertionGroups` directly; nothing in TS calls a YAML-walking bash runner. ### Shell scripts (the leaves stay, the dry-run skip blocks die) - `validation_suites/**`, `onboarding_assertions/**`, `nemoclaw_scenarios/**`: every `if e2e_env_is_dry_run; then ... exit 0; fi` and every `[[ "${E2E_DRY_RUN:-0}" == "1" ]]` short-circuit removed. The real assertion logic that was hiding underneath now runs unconditionally. - `runtime/lib/env.sh`: `e2e_env_is_dry_run` helper deleted. - `inference_routing.sh`: dead `_e2e_inference_plan` helper deleted. ### Legacy YAML resolver retired (rolled in from the planned follow-up PR) - Deleted `nemoclaw_scenarios/expected-states.yaml` and the entire `runtime/resolver/` tree (`load.ts`, `plan.ts`, `schema.ts`, `coverage.ts`, `index.ts`, `validator.ts`, `expected-failure.ts`, `js-yaml.d.ts`) plus `runtime/coverage-report.sh`. - Six framework tests that exercised only the resolver path were dropped: `e2e-scenario-schema`, `e2e-scenario-resolver`, `e2e-coverage-report`, `e2e-scenario-additional-families`, `e2e-metadata-final-hygiene`, `e2e-expected-failure` (typed equivalent now lives in `e2e-negative-matcher.test.ts`). - `e2e-expected-state.test.ts`: YAML-mirror parity tests dropped; replaced with id-coverage assertions against `listExpectedStates()`. The typed registry in `scenarios/expected-states.ts` is the single source of truth. ### Tests - **DELETED** (validated dead code paths): `e2e-suite-runner.test.ts`, `e2e-scenario-first-migration.test.ts`, `e2e-expected-state-validator.test.ts`. - **REWRITTEN** `e2e-phase-orchestrators.test.ts`: now exercises real shell spawning via temp scripts (pass / fail-with-stderr-tail / timeout / retry-on-classified-transient / missing-ref), real probe skipping with visible reason, and real pending skipping. Replaces the prior placeholder refs with assertions that observe actual subprocess behavior. - **TRIMMED** `e2e-lib-helpers.test.ts`, `e2e-scenario-additional-families.test.ts`, `e2e-scenario-resolver.test.ts`, `e2e-context-helper.test.ts`: dry-run-mode / `run-scenario.sh`-spawning tests deleted; tests of real bash semantics survive. ### Docs - `test/e2e-scenario/docs/README.md`: one runner, one mode (live), no dry-run, no validate-only. Migration tracking section reconciled with main: "Migration status is tracked outside the repository in GitHub issues and pull requests" (parent issue #3588). ### Merge with main - Resolved conflicts in `run.ts`, `00-slack-provider-state.sh`, `docs/README.md`, three modify/delete pairs under `framework-tests/` and `runtime/resolver/`, and `e2e-scenarios.yaml`. - Noted: a small E2E `--validate-only`/`--dry-run` plumbing fix that was layered onto the live runner during merge — `~/.local/bin` is now prepended to child PATH at the framework boundary (`redaction.ts:buildChildEnv`) so post-install nemoclaw shims resolve on runners (notably self-hosted GPU and WSL) where it isn't on the default PATH; and snapshot lifecycle assertions in `validation_suites/lib/sandbox_lifecycle.sh` now use the canonical `nemoclaw <sandbox> snapshot <subcommand>` argv shape, matching `test-snapshot-commands.sh`. ## Verification ```text $ npx tsx test/e2e-scenario/scenarios/run.ts --list hybrid scenario registry - brev-launchable-cloud-openclaw: ... - ubuntu-repo-cloud-openclaw: ... (22 scenarios listed) $ npx tsx test/e2e-scenario/scenarios/run.ts --emit-matrix [{"id":"brev-launchable-cloud-openclaw","runner":"...","label":"...","platform":"...","suites":[...]}, ...] $ E2E_CONTEXT_DIR=/tmp/x npx tsx test/e2e-scenario/scenarios/run.ts \ --scenarios ubuntu-repo-cloud-openclaw ... (compiled plan output) ... Phase results: environment: skipped (skipped=1) onboarding: failed (passed=1 failed=1) runtime: failed (failed=34 skipped=5) $ cat /tmp/x/.e2e/logs/runtime.smoke.cli-available.log smoke:cli-available e2e context: missing required key(s): E2E_SCENARIO $ vitest run test/e2e-scenario/ Test Files 30 passed (30) Tests 366 passed (366) $ npm run typecheck:cli (clean) ``` **Audited absent in production paths:** `--dry-run`, `dryRun`, `E2E_DRY_RUN`, `e2e_env_is_dry_run`, `fake-pass`, `fake-retry-once-pass`, `fake-always-transient`, `phase-1-skeleton`, `unsupported-live-step`, `--validate-only`, `RunContext.dryRun`. ## Spec gates addressed - **Phase 6** — orchestrators execute live shell/probe assertion steps. - **Phase 7** — single TS runtime entrypoint; bash runners deprecated; workflows use the typed runner with no `--dry-run`. - **Workflow side of Phase 9** — `--dry-run`, `--validate-only`, and `suite_filter` gone from active paths. ## Known follow-ups (not blocking this PR) - Two leftover `if [[ -n "${E2E_DRY_RUN:-}" ]]` short-circuits remain inside the hermes branch of `validation_suites/messaging/slack/00-slack-provider-state.sh`. They are dead code now that nothing in production sets `E2E_DRY_RUN`, but should be removed for consistency with the stated audit. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Tests** * E2E now runs live scenarios by default; many legacy plan-only/dry-run tests removed or refocused. Phase execution uses real shell steps with timeouts, retries, richer evidence and failure reporting. * **New Features** * Added probes and lifecycle utilities for CLI/gateway/sandbox/negative/health checks. Typed runner can emit an execution matrix and is the canonical executor. Secret-redaction and child-env minimization added for safer spawn logging. * **Chores** * Docs updated for live-only workflow; legacy bash entrypoints deprecated; CI adds a new scenario and tighter artifact uploads. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>
Why
The
e2e-scenarios-all.yamlfan-out has 8 hand-wired jobs, but the typed scenario registry inbaseline.tsalready defines 22 scenarios. The other 14 are defined-but-not-dispatched, and every new scenario today requires a YAML edit that's easy to forget. The hand-wired job names also no longer reflect what each scenario actually tests (manifest, suites, expectedState).This PR makes the fan-out generated from the existing typed registry so the workflow can never drift from
baseline.tsagain.What changed
--emit-matrixflag onrun.tsWalks
listScenarios()(the existing registry — no new registry created) and prints a single-line JSON array of GHA matrixincludeentries:[ { "id": "ubuntu-repo-cloud-openclaw", "runner": "ubuntu-latest", "label": "ubuntu-local · ubuntu-repo-cloud-openclaw · smoke+inference+credentials", "platform": "ubuntu-local", "suites": ["smoke", "inference", "credentials"] }, ... ]runner-routing.tshelperDerives the
runs-onlabel fromScenarioEnvironment.platform, with aruns-on:<label>override path viarunnerRequirements. Throws on unknown platforms so missing mappings fail loudly during matrix generation rather than silently falling back toubuntu-latest(which used to mask routing bugs in the legacy bashROUTESmap).e2e-scenarios-all.yamlrefactorTwo-job shape:
generate-matrix— checks out, installs deps, runs--emit-matrix, and writes the JSON to$GITHUB_OUTPUT. Also renders a Markdown table of all matrix entries in the step summary so you can see the full plan at a glance.run-scenario— usesstrategy.matrix.include: ${{ fromJson(needs.generate-matrix.outputs.matrix) }}and calls the existinge2e-scenarios.yamlreusable workflow per scenario. Tile names use${{ matrix.label }}so the sidebar reflects what's actually being tested.Adding a scenario from now on
canonicalScenarioInputsinbaseline.ts.What didn't change
e2e-scenarios.yaml(the single-scenario reusable workflow) is untouched. Itsresolve-runnerjob continues to be the authoritative runner-selection path during execution; therunnerfield in the matrix is informational and used for the sidebar label.ROUTESmap ine2e-scenarios.yamlis left in place for now to keep this PR focused. It becomes effectively dead code once this lands and can be removed in a follow-up.test/e2e-scenario/scenarios/registry.ts.Verification
Unit tests in
e2e-scenario-matrix.test.tscover:runs-on:<label>override$GITHUB_OUTPUTLocal run:
Manual smoke:
Test depth
Unit tests are sufficient — this is a build/CI plumbing change, not a behavioral change to the test runner or scenarios themselves. The actual scenario execution path (
e2e-scenarios.yaml) is unchanged. After merge, the next manualworkflow_dispatchofE2E / Scenario Runner / Allwill validate the matrix end-to-end.Follow-ups (separate PRs)
ROUTESbash map ine2e-scenarios.yamland have itsresolve-runnerjob consumerunner-routing.tstoo.e2e-scenarios-all.yamlby platform once we cross ~80 scenarios (currently at 22, plenty of headroom against the 256-job GHA ceiling).Summary by CodeRabbit
Tests
Chores