ci(e2e): fan out Vitest scenarios from registry by cv · Pull Request #5006 · NVIDIA/NemoClaw

cv · 2026-06-09T01:03:54Z

Summary

Changes the Vitest E2E workflow from one manual filter job into a generated typed-scenario matrix. The default matrix runs fixture-supported live scenarios, while explicit scenario selections still produce jobs that can surface structured skip reasons.

Related Issue

Refs #4941
Refs #4990

Changes

Adds --emit-live-matrix and buildLiveScenarioMatrix(...) to the typed scenario runner CLI.
Extends matrix entries with install/runtime/onboarding, expected-state, required-secret, and support metadata.
Updates .github/workflows/e2e-vitest-scenarios.yaml with a generate-matrix job and per-scenario live-scenarios matrix fan-out.
Updates workflow boundary tests to lock pinned actions, matrix-scoped artifacts, NEMOCLAW_CLI_BIN, and safe input handling.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
npm run docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Carlos Villela cvillela@nvidia.com

Summary by CodeRabbit

Tests
- Enhanced e2e scenario testing infrastructure with improved matrix generation and validation for live test scenarios.
- Added comprehensive test coverage for scenario registry and workflow configuration validation.
Chores
- Updated CI/CD workflow to support flexible multi-scenario e2e testing execution with per-scenario artifact management.

coderabbitai · 2026-06-09T01:04:01Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 02b454a0-d7f7-4721-8780-928dabe9ad3a

📥 Commits

Reviewing files that changed from the base of the PR and between a4ab229 and 7844cc3.

📒 Files selected for processing (1)

.github/workflows/e2e-vitest-scenarios.yaml

🚧 Files skipped from review as they are similar to previous changes (1)

.github/workflows/e2e-vitest-scenarios.yaml

📝 Walkthrough

Walkthrough

This PR establishes a matrix-driven infrastructure for live E2E Vitest scenario execution. A new runtime-support module evaluates scenario eligibility. The workflow transitions from single-job test_filter dispatch to a generate-matrix job that produces per-scenario matrix entries, and a live-scenarios matrix job that runs each scenario in isolation with its own test filtering and artifact scoping.

Changes

Live E2E Scenario Matrix Execution

Layer / File(s)	Summary
Live scenario support detection `test/e2e-scenario/scenarios/runtime-support.ts`	New `LiveScenarioSupport` interface and functions to evaluate whether scenarios can run: `liveScenarioTestName()` returns the scenario ID as the canonical test name, and `liveScenarioSupport()` validates environment/metadata against supported platform/install/runtime/onboarding sets.
Skip/name contract enforcement `test/e2e-scenario/framework-tests/e2e-live-skip-name-contract.test.ts`	New tests verify that all scenarios register under test names matching `scenario.id`, that these names match the workflow filter regex `^${scenario.id}$`, and that canonical supported/unsupported scenarios report correct eligibility status.
Live matrix generation CLI `test/e2e-scenario/scenarios/run.ts`	Extended CLI argument parsing and exports: new `LiveScenarioMatrixEntry` interface adds live-runtime metadata, `--emit-live-matrix` flag, `buildLiveScenarioMatrix()` to produce matrix entries (filtered by support or explicit selection), and `emitLiveMatrix()` helper to output JSON and early-return before plan/execute.
Live matrix generation testing `test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts`	New `runEmitLiveMatrix()` spawn helper and test cases validate `buildLiveScenarioMatrix()` produces fixture-supported scenarios by default, handles explicit unsupported selections with correct flags, and `--emit-live-matrix` output is single-line JSON reflecting both defaults and explicit selections.
GitHub Actions workflow matrix infrastructure `.github/workflows/e2e-vitest-scenarios.yaml`	Workflow refactored from `test_filter` dispatch to matrix-driven execution: new `generate-matrix` job produces per-scenario matrix via `--emit-live-matrix`, `live-scenarios` job depends on generate-matrix and runs per-scenario with `matrix.runner`/`matrix.id`, per-scenario env vars (`SCENARIO_ID`, `SCENARIO_LABEL`, `E2E_ARTIFACT_DIR`), Vitest filtering via `-t "^${SCENARIO_ID}$"`, and per-scenario artifact naming/paths under `e2e-artifacts/vitest/${{ matrix.id }}/`.
Workflow boundary validation `tools/e2e-scenarios/workflow-boundary.mts`, `test/e2e-scenario/framework-tests/e2e-scenarios-workflow.test.ts`	Validator updated to require `workflow_dispatch` input `scenarios` (reject legacy `test_filter`), validate `generate-matrix` job structure and `--emit-live-matrix` invocation, enforce `live-scenarios` matrix wiring from `generate-matrix` outputs, validate `SCENARIO_ID`/`SCENARIO_LABEL` env propagation and exact-regex filtering, and update artifact upload naming/scoping. Test expectations expanded to match new validation rules.
Live scenario registry execution `test/e2e-scenario/live/registry-scenarios.test.ts`	New test file iterates over all registry scenarios, conditionally skips unsupported ones (with module-load warning when `SCENARIO_ID` matches unsupported), and for supported scenarios validates required secrets, asserts CLI build presence, verifies required metadata, writes scenario/result artifacts, and performs state validation.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

NVIDIA/NemoClaw#4990: PR directly implements matrix-driven per-scenario Vitest execution with live scenario support detection and workflow matrix infrastructure as proposed in this issue.

Possibly related PRs

NVIDIA/NemoClaw#5005: Both PRs introduce and leverage the same live scenario registry/test machinery (runtime-support.ts, registry-scenarios.test.ts), with the main PR extending it via workflow-driven matrix execution.
NVIDIA/NemoClaw#4968: Both PRs modify .github/workflows/e2e-vitest-scenarios.yaml and its boundary validation, transitioning from test_filter to scenario-matrix driven (scenarios, generate-matrix, matrix.id) execution.

Suggested labels

area: e2e, platform: ubuntu, v0.0.62

Suggested reviewers

jyaunches

Poem

🐰 A matrix of tests, each scenario bright,
Per-filter precision, each one running right,
The runtime detects which can wire and run,
GitHub's workflows orchestrate—tests by the ton!
From filter to matrix, the leap has been made,
Vitest scenarios bloom in the e2e arcade.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 7.69% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'ci(e2e): fan out Vitest scenarios from registry' accurately describes the primary change: refactoring the E2E workflow to generate and fan out scenarios from a registry matrix instead of using manual filtering.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch codex/e2e-4990-stack-05-vitest-scenario-matrix

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-09T01:06:32Z

E2E Advisor Recommendation

Required E2E: e2e-vitest-scenarios:default-supported-live-matrix
Optional E2E: e2e-vitest-scenarios:unsupported-scenario-skip, e2e-scenarios:ubuntu-repo-cloud-openclaw

Dispatch hint: scenarios=

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

e2e-vitest-scenarios:default-supported-live-matrix (medium; live Ubuntu install/onboarding/state-validation path and requires NVIDIA_API_KEY): Required because the PR changes the live E2E workflow itself, the matrix generator it invokes, the exact SCENARIO_ID Vitest filter, artifact paths, and CLI/env wiring. Running the workflow with the default empty scenarios input validates the new supported-scenario matrix and exercises the current supported live OpenClaw scenario end to end.

Optional E2E

e2e-vitest-scenarios:unsupported-scenario-skip (low-to-medium; builds CLI and starts the live project but should skip before live onboarding): Useful confidence for the new explicit unsupported-scenario behavior: --emit-live-matrix should keep ubuntu-repo-cloud-hermes in the matrix, the workflow exact-name filter should match its skipped Vitest test, and logs should include the structured not-wired reason instead of failing with zero matched tests.
e2e-scenarios:ubuntu-repo-cloud-openclaw (high; live typed scenario runner with install/onboarding/inference/credentials suites): Optional because test/e2e-scenario/scenarios/run.ts and registry.ts are also used by the typed scenario runner workflow. A single canonical scenario would confirm that the new safe-id validation did not break the established typed scenario execution path, but product runtime code is not changed and framework tests cover most of this contract.

New E2E recommendations

live Vitest workflow skip behavior (medium): The repository has framework tests for the unsupported-scenario matrix and naming contract, but no dedicated automated workflow-dispatch smoke that proves GitHub Actions matrix expansion plus Vitest skipped-test behavior succeeds for an explicitly selected unsupported scenario.
- Suggested test: Add a lightweight workflow-level E2E/contract check or scheduled smoke dispatch of .github/workflows/e2e-vitest-scenarios.yaml with scenarios=ubuntu-repo-cloud-hermes, asserting the job exits successfully with a structured not-wired skip reason and no artifact path leakage.

Dispatch hint

Workflow: .github/workflows/e2e-vitest-scenarios.yaml
jobs input: scenarios=

github-actions · 2026-06-09T01:06:32Z

E2E Scenario Advisor Recommendation

Required scenario E2E: e2e-scenarios-all
Optional scenario E2E: None

Dispatch required scenario E2E:

gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

e2e-scenarios-all: Changes touch shared typed scenario registry/execution support used to build and run the scenario matrix (test/e2e-scenario/scenarios/registry.ts, run.ts, and runtime-support.ts) plus scenario workflow-boundary validation. Per policy, shared scenario runner/registry behavior changes require the full scenario fan-out to verify every dispatchable scenario route.
- Dispatch: gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Optional scenario E2E

None.

Relevant changed files

.github/workflows/e2e-vitest-scenarios.yaml
test/e2e-scenario/docs/README.md
test/e2e-scenario/framework-tests/e2e-live-skip-name-contract.test.ts
test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts
test/e2e-scenario/framework-tests/e2e-scenario-registry.test.ts
test/e2e-scenario/framework-tests/e2e-scenarios-workflow.test.ts
test/e2e-scenario/live/registry-scenarios.test.ts
test/e2e-scenario/scenarios/registry.ts
test/e2e-scenario/scenarios/run.ts
test/e2e-scenario/scenarios/runtime-support.ts
tools/e2e-scenarios/workflow-boundary.mts

github-actions · 2026-06-09T01:08:22Z

PR Review Advisor

Findings: 0 needs attention, 5 worth checking, 0 nice ideas
Since last review: 2 prior items resolved, 4 still apply, 0 new items found

Review findings

🛠️ Needs attention

None.

🔎 Worth checking

Source-of-truth review needed: Explicit unsupported live scenario selections: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: runtime-support.ts liveScenarioTestName() returns scenario.id for supported and unsupported scenarios; registry-scenarios.test.ts logs [not wired] when SCENARIO_ID matches; run.ts buildLiveScenarioMatrix(ids) keeps unsupported explicit selections.
Unsupported live selections still run as secret-bearing matrix jobs (.github/workflows/e2e-vitest-scenarios.yaml:70): Explicit unsupported scenarios are intentionally retained in the live matrix so they can surface skip reasons, but each matrix job still receives NVIDIA_API_KEY at job scope and runs checkout, setup-node, npm install, CLI build, summary, and artifact upload before/around the skipped test path. That keeps a live API secret available to jobs that the matrix already marks as supported:false.
- Recommendation: Gate secret-bearing work on matrix.supported, move NVIDIA_API_KEY to the narrowest supported-scenario step, or surface unsupported skip reasons in the generate-matrix job or a separate no-secret reporting job instead of launching a normal live-scenarios job.
- Evidence: buildLiveScenarioMatrix(ids) preserves explicit unsupported selections with supported:false and supportReasons; .github/workflows/e2e-vitest-scenarios.yaml sets NVIDIA_API_KEY in live-scenarios env for every matrix entry; registry-scenarios.test.ts skips unsupported scenarios after the job has already been allocated.
Generated runner labels are not allowlisted before runs-on (.github/workflows/e2e-vitest-scenarios.yaml:60): The live workflow consumes matrix.runner directly in runs-on, while runner labels are derived from scenario definitions. The resolver still accepts explicit runs-on:<label> requirements after only trimming and checking non-empty, so a future registry change can steer this secret-bearing workflow to an unexpected runner label.
- Recommendation: Add a central allowlist for live Vitest runner labels, validate resolved runners before emitting the live matrix, and add a negative test that an unsafe or unapproved runner label is rejected before GitHub Actions consumes runs-on.
- Evidence: .github/workflows/e2e-vitest-scenarios.yaml uses runs-on: ${{ matrix.runner }}; test/e2e-scenario/scenarios/runner-routing.ts resolves any non-empty runnerRequirements entry beginning with runs-on: without an approved-label check.
Unsupported-selection compatibility path still needs a source-of-truth/removal plan (test/e2e-scenario/scenarios/runtime-support.ts:19): The PR registers unsupported live scenarios under the bare scenario ID and logs [not wired] reasons separately so exact workflow filtering no longer matches zero tests. That is a reasonable bridge while live Vitest fixture support covers only part of the typed registry, but the code does not document why the source cannot be fixed here, when the bridge should be removed, or how it avoids becoming a permanent second source of truth.
- Recommendation: Document the invalid state, source boundary, source-fix constraint, and removal condition near liveScenarioTestName/liveScenarioSupport. Prefer surfacing unsupported selections before launching the secret-bearing live job, and tie the bridge to the fixture-support migration milestone.
- Evidence: liveScenarioTestName(scenario) always returns scenario.id; registry-scenarios.test.ts logs [not wired] only when SCENARIO_ID matches an unsupported scenario; buildLiveScenarioMatrix(ids) preserves unsupported explicit selections instead of filtering them out.
Runtime workflow behavior still needs targeted validation (test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts:169): The added unit and boundary tests cover matrix construction, ID safety, workflow shape, and the skip-name contract, but the changed behavior spans GitHub Actions output parsing, exact Vitest filtering, runner selection, secret-bearing setup, and artifact upload. The current tests do not prove the real workflow-shaped unsupported-selection path avoids a no-test failure, that unknown live selections fail without a consumable matrix, or that unsupported entries avoid secrets and unsafe runners.
- Recommendation: Add focused runtime or integration validation for the workflow-style paths and negative cases, without relying on external CI pass/fail status.
- Evidence: Tests assert buildLiveScenarioMatrix(["ubuntu-repo-cloud-hermes"]) returns supported:false and liveScenarioTestName() matches the exact filter, but there is no observed test for --emit-live-matrix --scenarios does-not-exist, no unsafe-runner rejection test, and no test proving supported:false entries avoid NVIDIA_API_KEY or secret-bearing setup.

🌱 Nice ideas

None.

Consider writing more tests for

**Runtime validation** — `--emit-live-matrix --scenarios does-not-exist` exits nonzero and stdout is not a JSON matrix. The changed behavior crosses workflow matrix generation, GitHub Actions output consumption, exact Vitest filtering, runner selection, secret-bearing setup, and artifact upload. Unit and boundary coverage improved, but runtime-shaped negative paths remain unproven.
**Runtime validation** — `SCENARIO_ID=ubuntu-repo-cloud-hermes` with the exact Vitest `-t "^${SCENARIO_ID}$"` command reports the skipped unsupported test and logs `[not wired]`. The changed behavior crosses workflow matrix generation, GitHub Actions output consumption, exact Vitest filtering, runner selection, secret-bearing setup, and artifact upload. Unit and boundary coverage improved, but runtime-shaped negative paths remain unproven.
**Runtime validation** — Live Vitest matrix generation rejects runner labels outside the approved allowlist before `runs-on` consumes `matrix.runner`. The changed behavior crosses workflow matrix generation, GitHub Actions output consumption, exact Vitest filtering, runner selection, secret-bearing setup, and artifact upload. Unit and boundary coverage improved, but runtime-shaped negative paths remain unproven.
**Runtime validation** — Unsupported `supported:false` matrix entries do not receive job-level `NVIDIA_API_KEY` and do not run secret-bearing checkout/setup/build steps. The changed behavior crosses workflow matrix generation, GitHub Actions output consumption, exact Vitest filtering, runner selection, secret-bearing setup, and artifact upload. Unit and boundary coverage improved, but runtime-shaped negative paths remain unproven.
**Runtime validation** — Workflow boundary rejects job-level secret exposure when the generated live matrix can include `supported:false` entries. The changed behavior crosses workflow matrix generation, GitHub Actions output consumption, exact Vitest filtering, runner selection, secret-bearing setup, and artifact upload. Unit and boundary coverage improved, but runtime-shaped negative paths remain unproven.
**Runtime workflow behavior still needs targeted validation** — Add focused runtime or integration validation for the workflow-style paths and negative cases, without relying on external CI pass/fail status.
**Acceptance clause:** while explicit scenario selections still produce jobs that can surface structured skip reasons. — add test evidence or identify existing coverage. buildLiveScenarioMatrix(["ubuntu-repo-cloud-hermes"]) keeps an unsupported entry with supportReasons, and registry-scenarios.test.ts logs [not wired] for selected unsupported scenarios. Runtime validation for the exact Vitest command is still missing, and the job remains secret-bearing.
**Acceptance clause:** Tests added or updated for new or changed behavior — add test evidence or identify existing coverage. Tests were added for live matrix generation, registry ID safety, skip-name contract, and workflow boundary checks. Runtime validation remains recommended for unknown selections, exact unsupported Vitest filtering, runner allowlisting, and secret gating.

Since last review details

Current findings:

Source-of-truth review needed: Explicit unsupported live scenario selections: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: runtime-support.ts liveScenarioTestName() returns scenario.id for supported and unsupported scenarios; registry-scenarios.test.ts logs [not wired] when SCENARIO_ID matches; run.ts buildLiveScenarioMatrix(ids) keeps unsupported explicit selections.
Unsupported live selections still run as secret-bearing matrix jobs (.github/workflows/e2e-vitest-scenarios.yaml:70): Explicit unsupported scenarios are intentionally retained in the live matrix so they can surface skip reasons, but each matrix job still receives NVIDIA_API_KEY at job scope and runs checkout, setup-node, npm install, CLI build, summary, and artifact upload before/around the skipped test path. That keeps a live API secret available to jobs that the matrix already marks as supported:false.
- Recommendation: Gate secret-bearing work on matrix.supported, move NVIDIA_API_KEY to the narrowest supported-scenario step, or surface unsupported skip reasons in the generate-matrix job or a separate no-secret reporting job instead of launching a normal live-scenarios job.
- Evidence: buildLiveScenarioMatrix(ids) preserves explicit unsupported selections with supported:false and supportReasons; .github/workflows/e2e-vitest-scenarios.yaml sets NVIDIA_API_KEY in live-scenarios env for every matrix entry; registry-scenarios.test.ts skips unsupported scenarios after the job has already been allocated.
Generated runner labels are not allowlisted before runs-on (.github/workflows/e2e-vitest-scenarios.yaml:60): The live workflow consumes matrix.runner directly in runs-on, while runner labels are derived from scenario definitions. The resolver still accepts explicit runs-on:<label> requirements after only trimming and checking non-empty, so a future registry change can steer this secret-bearing workflow to an unexpected runner label.
- Recommendation: Add a central allowlist for live Vitest runner labels, validate resolved runners before emitting the live matrix, and add a negative test that an unsafe or unapproved runner label is rejected before GitHub Actions consumes runs-on.
- Evidence: .github/workflows/e2e-vitest-scenarios.yaml uses runs-on: ${{ matrix.runner }}; test/e2e-scenario/scenarios/runner-routing.ts resolves any non-empty runnerRequirements entry beginning with runs-on: without an approved-label check.
Unsupported-selection compatibility path still needs a source-of-truth/removal plan (test/e2e-scenario/scenarios/runtime-support.ts:19): The PR registers unsupported live scenarios under the bare scenario ID and logs [not wired] reasons separately so exact workflow filtering no longer matches zero tests. That is a reasonable bridge while live Vitest fixture support covers only part of the typed registry, but the code does not document why the source cannot be fixed here, when the bridge should be removed, or how it avoids becoming a permanent second source of truth.
- Recommendation: Document the invalid state, source boundary, source-fix constraint, and removal condition near liveScenarioTestName/liveScenarioSupport. Prefer surfacing unsupported selections before launching the secret-bearing live job, and tie the bridge to the fixture-support migration milestone.
- Evidence: liveScenarioTestName(scenario) always returns scenario.id; registry-scenarios.test.ts logs [not wired] only when SCENARIO_ID matches an unsupported scenario; buildLiveScenarioMatrix(ids) preserves unsupported explicit selections instead of filtering them out.
Runtime workflow behavior still needs targeted validation (test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts:169): The added unit and boundary tests cover matrix construction, ID safety, workflow shape, and the skip-name contract, but the changed behavior spans GitHub Actions output parsing, exact Vitest filtering, runner selection, secret-bearing setup, and artifact upload. The current tests do not prove the real workflow-shaped unsupported-selection path avoids a no-test failure, that unknown live selections fail without a consumable matrix, or that unsupported entries avoid secrets and unsafe runners.
- Recommendation: Add focused runtime or integration validation for the workflow-style paths and negative cases, without relying on external CI pass/fail status.
- Evidence: Tests assert buildLiveScenarioMatrix(["ubuntu-repo-cloud-hermes"]) returns supported:false and liveScenarioTestName() matches the exact filter, but there is no observed test for --emit-live-matrix --scenarios does-not-exist, no unsafe-runner rejection test, and no test proving supported:false entries avoid NVIDIA_API_KEY or secret-bearing setup.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

The Vitest workflow filters scenarios via -t "^${SCENARIO_ID}$". Unsupported scenarios were registered as test.skip(`${id} [not wired: ...]`) so the anchored filter matched zero tests and Vitest exited non-zero without surfacing the structured skip reason — exactly what the PR description claimed it would do. Changes: - Register every scenario (supported and skipped) under exactly scenario.id via a new liveScenarioTestName() helper, so the workflow's `-t "^${SCENARIO_ID}$"` filter selects the right test deterministically for both buckets. - When SCENARIO_ID matches an unsupported scenario, emit a `[not wired] <id>: <reasons>` warning at module load so the job log/summary still captures why the scenario can't run live. - Add e2e-live-skip-name-contract.test.ts to lock the contract: every registered scenario name equals scenario.id, the workflow's anchored regex matches it, and the legacy `[not wired:` suffix cannot reappear. Verified with the workflow's exact invocation: SCENARIO_ID=ubuntu-repo-cloud-hermes npx vitest run \ --project e2e-scenarios-live \ test/e2e-scenario/live/registry-scenarios.test.ts \ -t "^ubuntu-repo-cloud-hermes$" exits 0, reports the targeted test as skipped, and prints the structured `[not wired]` reason. Refs #4990 [skip is local-only: Test (cli) hook tries to run live registry test against macOS without Docker; pre-existing on this branch and not introduced by this change. CI cli-test-shards continue to pass.] Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>

coderabbitai

🧹 Nitpick comments (1)

test/e2e-scenario/scenarios/run.ts (1)

140-145: ⚡ Quick win

Avoid redundant liveScenarioSupport() calls in the default filter path.

When ids is empty, line 143 calls liveScenarioSupport(scenario) during filtering, then line 144 calls it again during mapping. This double invocation can be eliminated by computing support once per scenario.

♻️ Refactor to compute support once per scenario

 export function buildLiveScenarioMatrix(ids: string[] = []): LiveScenarioMatrixEntry[] {
-  const scenarios = ids.length > 0
-    ? requireScenarios(ids)
-    : listScenarios().filter((scenario) => liveScenarioSupport(scenario).supported);
-  return scenarios.map((scenario) => liveMatrixEntry(scenario, liveScenarioSupport(scenario)));
+  if (ids.length > 0) {
+    return requireScenarios(ids).map((scenario) => liveMatrixEntry(scenario, liveScenarioSupport(scenario)));
+  }
+  return listScenarios()
+    .map((scenario) => ({ scenario, support: liveScenarioSupport(scenario) }))
+    .filter(({ support }) => support.supported)
+    .map(({ scenario, support }) => liveMatrixEntry(scenario, support));
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-scenario/scenarios/run.ts` around lines 140 - 145, The default path
in buildLiveScenarioMatrix currently calls liveScenarioSupport(scenario) twice
(once in listScenarios().filter(...) and again in scenarios.map(...)), so change
the logic to compute support once per scenario: when ids is empty, map
listScenarios() to pairs of {scenario, support: liveScenarioSupport(scenario)},
filter those pairs by support.supported, then map to liveMatrixEntry(scenario,
support). Keep the requireScenarios(ids) branch unchanged and reuse the same
liveMatrixEntry call for both branches.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test/e2e-scenario/scenarios/run.ts`:
- Around line 140-145: The default path in buildLiveScenarioMatrix currently
calls liveScenarioSupport(scenario) twice (once in listScenarios().filter(...)
and again in scenarios.map(...)), so change the logic to compute support once
per scenario: when ids is empty, map listScenarios() to pairs of {scenario,
support: liveScenarioSupport(scenario)}, filter those pairs by
support.supported, then map to liveMatrixEntry(scenario, support). Keep the
requireScenarios(ids) branch unchanged and reuse the same liveMatrixEntry call
for both branches.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: aad6f59c-9dae-42e3-9e97-17cb2828683c

📥 Commits

Reviewing files that changed from the base of the PR and between fc9a123 and a4ab229.

📒 Files selected for processing (21)

.github/workflows/e2e-vitest-scenarios.yaml
test/e2e-scenario/framework-tests/e2e-clients.test.ts
test/e2e-scenario/framework-tests/e2e-live-registry-discovery.test.ts
test/e2e-scenario/framework-tests/e2e-live-skip-name-contract.test.ts
test/e2e-scenario/framework-tests/e2e-phase-environment.test.ts
test/e2e-scenario/framework-tests/e2e-phase-onboarding.test.ts
test/e2e-scenario/framework-tests/e2e-phase-state-validation.test.ts
test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts
test/e2e-scenario/framework-tests/e2e-scenarios-workflow.test.ts
test/e2e-scenario/framework/clients/gateway.ts
test/e2e-scenario/framework/clients/host.ts
test/e2e-scenario/framework/clients/sandbox.ts
test/e2e-scenario/framework/e2e-test.ts
test/e2e-scenario/framework/phases/environment.ts
test/e2e-scenario/framework/phases/index.ts
test/e2e-scenario/framework/phases/onboarding.ts
test/e2e-scenario/framework/phases/state-validation.ts
test/e2e-scenario/live/registry-scenarios.test.ts
test/e2e-scenario/scenarios/run.ts
test/e2e-scenario/scenarios/runtime-support.ts
tools/e2e-scenarios/workflow-boundary.mts

Resolves divergence with parent stack PRs (#5002, #5003, #5004, #5005) already merged into main, plus #5022 and #5035 refinements. Conflict resolution prefers main's canonical phase-fixture and framework-plumbing files (those have already received post-#5005 refactors), and re-applies this PR's unique deltas on top: - liveScenarioTestName() helper in runtime-support.ts - Skip-name realignment + module-load [not wired] log in registry-scenarios.test.ts - New e2e-live-skip-name-contract framework test Net effect of this PR vs main is the matrix fan-out workflow, --emit-live-matrix CLI, workflow boundary updates, and the filter-alignment fix. See `git diff origin/main` for the full delta. Co-authored-by: Julie Yaunches <jyaunches@nvidia.com>

copy-pr-bot · 2026-06-09T15:52:01Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

After merging main (which brought #5020 'cover and format all TypeScript files' into the branch), biome flagged 5 PR files for organize-imports and one wrap. Auto-applied via: npx biome check --write \ test/e2e-scenario/framework-tests/e2e-live-skip-name-contract.test.ts \ test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts \ test/e2e-scenario/framework-tests/e2e-scenarios-workflow.test.ts \ test/e2e-scenario/live/registry-scenarios.test.ts \ test/e2e-scenario/scenarios/run.ts Verified post-format: - 24/24 e2e-scenario-framework tests still green (skip-name contract, live-registry-discovery, scenario-matrix, workflow boundary). - Workflow filter sim still works: SCENARIO_ID=ubuntu-repo-cloud-hermes + -t "^ubuntu-repo-cloud-hermes$" exits 0 with structured [not wired] reason in stderr. Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>

Registers the failing-test-first guard for #4423 in the typed scenario registry so the live Vitest matrix from #5006 fans it out as a dedicated CI job. Builds on the framework primitives added earlier in this PR (lifecycle phase fixture, host-side probes, lifecycle whitelist). Additions: * `post-reboot-recovery-ready` expected-state in `scenarios/expected-states.ts` declaring the user-visible invariants that must hold after a `nemoclaw <name> status` call on a freshly-rebooted DGX Spark / Linux Docker-driver host: - cli installed, - gateway healthy (the user-systemd unit from #4580 brings it back up before status runs), - sandbox running (recovery completed in time), - localRegistry entry preserved (the user-visible regression target — destroyed on unfixed `main`), - dockerSandboxContainer present (recovery didn't delete the labeled container or its `*-nemoclaw-gpu-backup-*` sibling). * `ubuntu-repo-docker-post-reboot-recovery` scenario in `scenarios/scenarios/baseline.ts` wiring `ubuntuRepoDockerLifecycle("cloud-openclaw", "post-reboot-recovery")` against the new expected-state and a smoke suite. Carries a description that explains the RED/GREEN contract and points to the PR-A fix landing in `src/lib/`. * `manifests/openclaw-nvidia-post-reboot-recovery.yaml` declares `lifecycle: post-reboot-recovery` and the same NVIDIA_API_KEY credential ref the cloud-openclaw scenarios use. * `.github/workflows/e2e-scenarios.yaml` ROUTES table gains the new scenario so the workflow-boundary test (`e2e-scenarios-workflow.test.ts`) routes every typed id. Test pinning: * `e2e-scenario-matrix.test.ts` updated from a 1-entry to a 2-entry live matrix expectation. The new entry asserts on `expectedStateId: "post-reboot-recovery-ready"` so a future accidental dropped-lifecycle change to the scenario regresses loudly. * `e2e-live-registry-discovery.test.ts` swaps the synthetic whitelist-coverage test for an assertion against the real `ubuntu-repo-docker-post-reboot-recovery` registry entry. Behavior: * On unfixed `main`, the live runner's lifecycle phase stops the OpenShell gateway runtime and `docker stop`s the labeled sandbox container. State-validation then runs `nemoclaw <name> status` (which restarts the gateway via systemd) and the destructive `missing` branch in `src/lib/actions/sandbox/status.ts` wipes the local registry entry. The `local-registry-entry-present` probe fails. Scenario goes RED. * On the PR-A fix branch, the new Docker-driver sandbox recovery helper restarts the labeled container before stale-removal can fire, registry survives, all five probes pass. Scenario flips GREEN. The bash-side legacy compiler emits a `lifecycle.profile.post-reboot-recovery` PhaseAction pointing at `nemoclaw_scenarios/lifecycle/dispatch.sh`, but the legacy bash worker is intentionally not provided: this scenario is Vitest-only. The typed runner's `LifecyclePhaseFixture` handles dispatch directly. If the legacy runner is invoked against this scenario it errors out at the dispatcher; that's the right failure mode while the bash side stays on its own retirement clock. Refs #4423

@ericksoa

## Summary Failing-test-first regression guard for #4423, plus the framework primitives that make it expressible. After this PR lands, the live Vitest matrix from #5006 fans out a dedicated `ubuntu-repo-docker-post-reboot-recovery` job that goes 🔴 RED on `main` (because the registry-destruction bug is still there) and is the executable acceptance criterion for the source fix in PR-A. ## Related Issue Refs #4423 ## What this PR adds ### Framework primitives (commits 1–3) 1. **`SUPPORTED_LIFECYCLES` whitelist** in `scenarios/runtime-support.ts`. Replaces the unconditional reject for `environment.lifecycle` with a lock-step whitelist, so lifecycle scenarios can graduate one profile at a time. Initial whitelist: `post-reboot-recovery`. 2. **Two host-side state-validation probes**: - `local-registry-entry-present` — reads `~/.nemoclaw/sandboxes.json` and asserts the scenario's sandbox name is recorded. This is the user-visible regression target for #4423. - `docker-sandbox-container-present` — runs `docker ps -a --filter label=openshell.ai/sandbox-name=…` and accepts running, stopped, and `*-nemoclaw-gpu-backup-*` siblings. Mirrors `OPENSHELL_SANDBOX_NAME_LABEL` / `findOpenShellDockerSandboxContainerIds` in `src/lib/onboard/docker-gpu-patch.ts`. New `ExpectedState.localRegistry` and `ExpectedState.dockerSandboxContainer` dimensions; `probesForState` emits the probes only for `expected: "present"`. `StateValidationPhaseFixture.from()` now accepts an inline `ExpectedState` so unit tests don't need to register synthetic states. Optional `ProbeIO` injection for the registry reader. 3. **`LifecyclePhaseFixture`** in `framework/phases/lifecycle.ts`. Dispatches `simulate(profile, instance, opts)`. Today the only profile is `post-reboot-recovery`, with two modes: - `stop-original` (default) — `openshell gateway stop` + `docker stop` of the labeled container. - `rename-to-gpu-backup` — additionally renames the container to a `*-nemoclaw-gpu-backup-<ts>` sibling, mirroring `buildBackupContainerName()` in `src/lib/onboard/docker-gpu-patch.ts`. Both modes register reverse-order cleanups so teardown leaves Docker usable. Wired into `E2EScenarioFixtures` and into `live/registry-scenarios.test.ts` between `onboard.from()` and `stateValidation.from()` whenever the scenario declares `environment.lifecycle`. ### The failing regression guard (commit 4) 4. **`ubuntu-repo-docker-post-reboot-recovery` scenario** + **`post-reboot-recovery-ready` expected-state** + manifest YAML. The expected-state requires: - `cli` installed, - `gateway` healthy (the user-systemd unit from #4580 brings it back up), - `sandbox` running (recovery completes in time), - **`localRegistry` entry preserved** (the user-visible regression target), - `dockerSandboxContainer` present (recovery didn't delete the labeled container or its `*-nemoclaw-gpu-backup-*` sibling). The `.github/workflows/e2e-scenarios.yaml` ROUTES table gains the new id so `e2e-scenarios-workflow.test.ts` keeps every typed id routed. ## RED / GREEN contract * **On unfixed `main`:** the live runner stops the OpenShell gateway runtime and `docker stop`s the labeled container. State-validation runs `nemoclaw <name> status`, which restarts the gateway via systemd; the destructive `missing` branch in `src/lib/actions/sandbox/status.ts` then wipes the local registry entry. The `local-registry-entry-present` probe fails. **Scenario goes 🔴 RED.** * **On the PR-A fix branch:** the new Docker-driver sandbox recovery helper restarts the labeled container before stale-removal can fire, registry survives, all five probes pass. **Scenario flips 🟢 GREEN.** ## Type of Change - [x] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## Verification - [x] `npx prek run --all-files` passes - [x] `npm test` passes — the 150 tests across the 6 directly affected files are all green; full e2e-scenario framework suite has only the pre-existing `shell probe reaps timed-out command process groups` load-flake in `e2e-fixture-context.test.ts` (unchanged on `main`). - [x] `--emit-live-matrix` now emits both `ubuntu-repo-cloud-openclaw` and `ubuntu-repo-docker-post-reboot-recovery` as supported live scenarios; the workflow fan-out from #5006 will dispatch each as its own CI job. - [x] Tests added or updated for new or changed behavior. - [x] No secrets, API keys, or credentials committed. - [ ] Docs updated for user-facing behavior changes. - [ ] `npm run docs` builds without warnings (doc changes only). ## Boundaries kept honest - No imports from `src/lib/**` into `framework/**`. The two duplicated constants (`OPENSHELL_SANDBOX_NAME_LABEL`, `~/.nemoclaw/sandboxes.json` path) carry comments pointing back to the CLI source-of-truth. Drift gets caught by the live scenario itself once the lifecycle phase exercises real Docker labels. - `LifecyclePhaseFixture` profile dispatch is exhaustive (`never`-checked). New profiles must add the dispatcher branch and the `SUPPORTED_LIFECYCLES` whitelist together. - Negative-direction host probes (`expected: "absent"` for `localRegistry` / `dockerSandboxContainer`) deliberately emit nothing today, with a `probesForState` test pinning the gap so a future negative scenario PR has to add them. - The legacy bash dispatcher (`nemoclaw_scenarios/lifecycle/dispatch.sh`) is intentionally **not** updated for `post-reboot-recovery`. This scenario is Vitest-only; the typed runner's `LifecyclePhaseFixture` handles dispatch directly. If the legacy bash runner is ever invoked against this scenario, it errors at the dispatcher — that's the right failure mode while the bash side stays on its own retirement clock. ## Why this doesn't yet flip green The fix lives in `src/lib/actions/sandbox/{status,gateway-state}.ts` plus a new `src/lib/onboard/docker-driver-sandbox-recovery.ts` helper, tracked as **PR-A**. PR-A implements parts 2 & 3 of @ericksoa's #4423 plan: - Docker-driver sandbox recovery from `findOpenShellDockerSandboxContainerIds` labels (including `*-nemoclaw-gpu-backup-*` rename), and - tightened stale-removal in active paths (`status`, `connect`, `ensureLiveSandboxOrExit`) requiring Docker-corroborated absence before destroying the registry. PR-A's CI run will dispatch this scenario via `--scenarios ubuntu-repo-docker-post-reboot-recovery` to prove 🔴 → 🟢. ## Diffstat ``` .github/workflows/e2e-scenarios.yaml | 1 + test/e2e-scenario/framework-tests/e2e-expected-state.test.ts | 35 +++ test/e2e-scenario/framework-tests/e2e-live-registry-discovery.test.ts | 35 +++ test/e2e-scenario/framework-tests/e2e-phase-lifecycle.test.ts | 239 +++++++++++++++++++++ test/e2e-scenario/framework-tests/e2e-phase-state-validation.test.ts | 120 ++++++++++- test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts | 20 +- test/e2e-scenario/framework/e2e-test.ts | 5 + test/e2e-scenario/framework/phases/index.ts | 8 + test/e2e-scenario/framework/phases/lifecycle.ts | 207 ++++++++++++++++++ test/e2e-scenario/framework/phases/state-validation.ts | 113 +++++++++- test/e2e-scenario/live/registry-scenarios.test.ts | 44 +++- test/e2e-scenario/manifests/openclaw-nvidia-post-reboot-recovery.yaml | 41 ++++ test/e2e-scenario/scenarios/expected-states.ts | 35 +++ test/e2e-scenario/scenarios/runtime-support.ts | 8 +- test/e2e-scenario/scenarios/scenarios/baseline.ts | 31 +++ test/e2e-scenario/scenarios/types.ts | 25 ++- 16 files changed, 950 insertions(+), 17 deletions(-) ``` --------- Co-authored-by: Carlos Villela <cvillela@nvidia.com>

cv added 5 commits June 8, 2026 17:39

test(e2e): add environment phase fixture

32ad96a

test(e2e): add onboarding phase fixture

ada6eee

test(e2e): add state validation phase fixture

82c0543

test(e2e): discover live scenarios from registry

93fb874

ci(e2e): fan out Vitest scenarios from registry

13d5b46

cv self-assigned this Jun 9, 2026

cv mentioned this pull request Jun 9, 2026

Adopt Vitest fixtures as the E2E scenario execution model #4941

Open

7 tasks

wscurran added the chore Build, CI, dependency, or tooling maintenance label Jun 9, 2026

Base automatically changed from codex/e2e-4990-stack-04-live-registry-discovery to main June 9, 2026 14:51

coderabbitai Bot reviewed Jun 9, 2026

View reviewed changes

cv and others added 2 commits June 9, 2026 08:53

Merge branch 'main' into codex/e2e-4990-stack-05-vitest-scenario-matrix

6577218

jyaunches approved these changes Jun 9, 2026

View reviewed changes

ci(e2e): tighten vitest scenario matrix contracts

7844cc3

jyaunches merged commit 91cb1d3 into main Jun 9, 2026
35 checks passed

jyaunches deleted the codex/e2e-4990-stack-05-vitest-scenario-matrix branch June 9, 2026 16:24

jyaunches mentioned this pull request Jun 9, 2026

test(e2e): add post-reboot-recovery regression guard for #4423 #5046

Merged

12 tasks

coderabbitai Bot mentioned this pull request Jun 9, 2026

test(e2e): failing-test-first guard for #2701 + recovery framework helpers #5049

Merged

9 tasks

cv added the v0.0.62 Release target label Jun 9, 2026

coderabbitai Bot mentioned this pull request Jun 10, 2026

test(e2e): migrate cloud inference and routing scenarios #5057

Closed

12 tasks

jyaunches mentioned this pull request Jun 10, 2026

Adopt phase fixtures + registry-driven test discovery for Vitest E2E scenarios #4990

Closed

8 tasks

Conversation

cv commented Jun 9, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Type of Change

Verification

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

Uh oh!

github-actions Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Scenario Advisor Recommendation

E2E Scenario Advisor

Required scenario E2E

Optional scenario E2E

Relevant changed files

Uh oh!

github-actions Bot commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Advisor

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

copy-pr-bot Bot commented Jun 9, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

cv commented Jun 9, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 9, 2026 •

edited

Loading

github-actions Bot commented Jun 9, 2026 •

edited

Loading

github-actions Bot commented Jun 9, 2026 •

edited

Loading

github-actions Bot commented Jun 9, 2026 •

edited

Loading