Skip to content

ci(e2e): fan out Vitest scenarios from registry#5006

Merged
jyaunches merged 10 commits into
mainfrom
codex/e2e-4990-stack-05-vitest-scenario-matrix
Jun 9, 2026
Merged

ci(e2e): fan out Vitest scenarios from registry#5006
jyaunches merged 10 commits into
mainfrom
codex/e2e-4990-stack-05-vitest-scenario-matrix

Conversation

@cv

@cv cv commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

Summary

Changes the Vitest E2E workflow from one manual filter job into a generated typed-scenario matrix. The default matrix runs fixture-supported live scenarios, while explicit scenario selections still produce jobs that can surface structured skip reasons.

Related Issue

Refs #4941
Refs #4990

Changes

  • Adds --emit-live-matrix and buildLiveScenarioMatrix(...) to the typed scenario runner CLI.
  • Extends matrix entries with install/runtime/onboarding, expected-state, required-secret, and support metadata.
  • Updates .github/workflows/e2e-vitest-scenarios.yaml with a generate-matrix job and per-scenario live-scenarios matrix fan-out.
  • Updates workflow boundary tests to lock pinned actions, matrix-scoped artifacts, NEMOCLAW_CLI_BIN, and safe input handling.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Carlos Villela cvillela@nvidia.com

Summary by CodeRabbit

  • Tests

    • Enhanced e2e scenario testing infrastructure with improved matrix generation and validation for live test scenarios.
    • Added comprehensive test coverage for scenario registry and workflow configuration validation.
  • Chores

    • Updated CI/CD workflow to support flexible multi-scenario e2e testing execution with per-scenario artifact management.

@cv cv self-assigned this Jun 9, 2026
@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 02b454a0-d7f7-4721-8780-928dabe9ad3a

📥 Commits

Reviewing files that changed from the base of the PR and between a4ab229 and 7844cc3.

📒 Files selected for processing (1)
  • .github/workflows/e2e-vitest-scenarios.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/e2e-vitest-scenarios.yaml

📝 Walkthrough

Walkthrough

This PR establishes a matrix-driven infrastructure for live E2E Vitest scenario execution. A new runtime-support module evaluates scenario eligibility. The workflow transitions from single-job test_filter dispatch to a generate-matrix job that produces per-scenario matrix entries, and a live-scenarios matrix job that runs each scenario in isolation with its own test filtering and artifact scoping.

Changes

Live E2E Scenario Matrix Execution

Layer / File(s) Summary
Live scenario support detection
test/e2e-scenario/scenarios/runtime-support.ts
New LiveScenarioSupport interface and functions to evaluate whether scenarios can run: liveScenarioTestName() returns the scenario ID as the canonical test name, and liveScenarioSupport() validates environment/metadata against supported platform/install/runtime/onboarding sets.
Skip/name contract enforcement
test/e2e-scenario/framework-tests/e2e-live-skip-name-contract.test.ts
New tests verify that all scenarios register under test names matching scenario.id, that these names match the workflow filter regex ^${scenario.id}$, and that canonical supported/unsupported scenarios report correct eligibility status.
Live matrix generation CLI
test/e2e-scenario/scenarios/run.ts
Extended CLI argument parsing and exports: new LiveScenarioMatrixEntry interface adds live-runtime metadata, --emit-live-matrix flag, buildLiveScenarioMatrix() to produce matrix entries (filtered by support or explicit selection), and emitLiveMatrix() helper to output JSON and early-return before plan/execute.
Live matrix generation testing
test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts
New runEmitLiveMatrix() spawn helper and test cases validate buildLiveScenarioMatrix() produces fixture-supported scenarios by default, handles explicit unsupported selections with correct flags, and --emit-live-matrix output is single-line JSON reflecting both defaults and explicit selections.
GitHub Actions workflow matrix infrastructure
.github/workflows/e2e-vitest-scenarios.yaml
Workflow refactored from test_filter dispatch to matrix-driven execution: new generate-matrix job produces per-scenario matrix via --emit-live-matrix, live-scenarios job depends on generate-matrix and runs per-scenario with matrix.runner/matrix.id, per-scenario env vars (SCENARIO_ID, SCENARIO_LABEL, E2E_ARTIFACT_DIR), Vitest filtering via -t "^${SCENARIO_ID}$", and per-scenario artifact naming/paths under e2e-artifacts/vitest/${{ matrix.id }}/.
Workflow boundary validation
tools/e2e-scenarios/workflow-boundary.mts, test/e2e-scenario/framework-tests/e2e-scenarios-workflow.test.ts
Validator updated to require workflow_dispatch input scenarios (reject legacy test_filter), validate generate-matrix job structure and --emit-live-matrix invocation, enforce live-scenarios matrix wiring from generate-matrix outputs, validate SCENARIO_ID/SCENARIO_LABEL env propagation and exact-regex filtering, and update artifact upload naming/scoping. Test expectations expanded to match new validation rules.
Live scenario registry execution
test/e2e-scenario/live/registry-scenarios.test.ts
New test file iterates over all registry scenarios, conditionally skips unsupported ones (with module-load warning when SCENARIO_ID matches unsupported), and for supported scenarios validates required secrets, asserts CLI build presence, verifies required metadata, writes scenario/result artifacts, and performs state validation.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

  • NVIDIA/NemoClaw#4990: PR directly implements matrix-driven per-scenario Vitest execution with live scenario support detection and workflow matrix infrastructure as proposed in this issue.

Possibly related PRs

  • NVIDIA/NemoClaw#5005: Both PRs introduce and leverage the same live scenario registry/test machinery (runtime-support.ts, registry-scenarios.test.ts), with the main PR extending it via workflow-driven matrix execution.
  • NVIDIA/NemoClaw#4968: Both PRs modify .github/workflows/e2e-vitest-scenarios.yaml and its boundary validation, transitioning from test_filter to scenario-matrix driven (scenarios, generate-matrix, matrix.id) execution.

Suggested labels

area: e2e, platform: ubuntu, v0.0.62

Suggested reviewers

  • jyaunches

Poem

🐰 A matrix of tests, each scenario bright,
Per-filter precision, each one running right,
The runtime detects which can wire and run,
GitHub's workflows orchestrate—tests by the ton!
From filter to matrix, the leap has been made,
Vitest scenarios bloom in the e2e arcade.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 7.69% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'ci(e2e): fan out Vitest scenarios from registry' accurately describes the primary change: refactoring the E2E workflow to generate and fan out scenarios from a registry matrix instead of using manual filtering.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/e2e-4990-stack-05-vitest-scenario-matrix

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: e2e-vitest-scenarios:default-supported-live-matrix
Optional E2E: e2e-vitest-scenarios:unsupported-scenario-skip, e2e-scenarios:ubuntu-repo-cloud-openclaw

Dispatch hint: scenarios=

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • e2e-vitest-scenarios:default-supported-live-matrix (medium; live Ubuntu install/onboarding/state-validation path and requires NVIDIA_API_KEY): Required because the PR changes the live E2E workflow itself, the matrix generator it invokes, the exact SCENARIO_ID Vitest filter, artifact paths, and CLI/env wiring. Running the workflow with the default empty scenarios input validates the new supported-scenario matrix and exercises the current supported live OpenClaw scenario end to end.

Optional E2E

  • e2e-vitest-scenarios:unsupported-scenario-skip (low-to-medium; builds CLI and starts the live project but should skip before live onboarding): Useful confidence for the new explicit unsupported-scenario behavior: --emit-live-matrix should keep ubuntu-repo-cloud-hermes in the matrix, the workflow exact-name filter should match its skipped Vitest test, and logs should include the structured not-wired reason instead of failing with zero matched tests.
  • e2e-scenarios:ubuntu-repo-cloud-openclaw (high; live typed scenario runner with install/onboarding/inference/credentials suites): Optional because test/e2e-scenario/scenarios/run.ts and registry.ts are also used by the typed scenario runner workflow. A single canonical scenario would confirm that the new safe-id validation did not break the established typed scenario execution path, but product runtime code is not changed and framework tests cover most of this contract.

New E2E recommendations

  • live Vitest workflow skip behavior (medium): The repository has framework tests for the unsupported-scenario matrix and naming contract, but no dedicated automated workflow-dispatch smoke that proves GitHub Actions matrix expansion plus Vitest skipped-test behavior succeeds for an explicitly selected unsupported scenario.
    • Suggested test: Add a lightweight workflow-level E2E/contract check or scheduled smoke dispatch of .github/workflows/e2e-vitest-scenarios.yaml with scenarios=ubuntu-repo-cloud-hermes, asserting the job exits successfully with a structured not-wired skip reason and no artifact path leakage.

Dispatch hint

  • Workflow: .github/workflows/e2e-vitest-scenarios.yaml
  • jobs input: scenarios=

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

E2E Scenario Advisor Recommendation

Required scenario E2E: e2e-scenarios-all
Optional scenario E2E: None

Dispatch required scenario E2E:

  • gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • e2e-scenarios-all: Changes touch shared typed scenario registry/execution support used to build and run the scenario matrix (test/e2e-scenario/scenarios/registry.ts, run.ts, and runtime-support.ts) plus scenario workflow-boundary validation. Per policy, shared scenario runner/registry behavior changes require the full scenario fan-out to verify every dispatchable scenario route.
    • Dispatch: gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Optional scenario E2E

  • None.

Relevant changed files

  • .github/workflows/e2e-vitest-scenarios.yaml
  • test/e2e-scenario/docs/README.md
  • test/e2e-scenario/framework-tests/e2e-live-skip-name-contract.test.ts
  • test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts
  • test/e2e-scenario/framework-tests/e2e-scenario-registry.test.ts
  • test/e2e-scenario/framework-tests/e2e-scenarios-workflow.test.ts
  • test/e2e-scenario/live/registry-scenarios.test.ts
  • test/e2e-scenario/scenarios/registry.ts
  • test/e2e-scenario/scenarios/run.ts
  • test/e2e-scenario/scenarios/runtime-support.ts
  • tools/e2e-scenarios/workflow-boundary.mts

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 0 needs attention, 5 worth checking, 0 nice ideas
Since last review: 2 prior items resolved, 4 still apply, 0 new items found

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • Source-of-truth review needed: Explicit unsupported live scenario selections: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: runtime-support.ts liveScenarioTestName() returns scenario.id for supported and unsupported scenarios; registry-scenarios.test.ts logs [not wired] when SCENARIO_ID matches; run.ts buildLiveScenarioMatrix(ids) keeps unsupported explicit selections.
  • Unsupported live selections still run as secret-bearing matrix jobs (.github/workflows/e2e-vitest-scenarios.yaml:70): Explicit unsupported scenarios are intentionally retained in the live matrix so they can surface skip reasons, but each matrix job still receives NVIDIA_API_KEY at job scope and runs checkout, setup-node, npm install, CLI build, summary, and artifact upload before/around the skipped test path. That keeps a live API secret available to jobs that the matrix already marks as supported:false.
    • Recommendation: Gate secret-bearing work on matrix.supported, move NVIDIA_API_KEY to the narrowest supported-scenario step, or surface unsupported skip reasons in the generate-matrix job or a separate no-secret reporting job instead of launching a normal live-scenarios job.
    • Evidence: buildLiveScenarioMatrix(ids) preserves explicit unsupported selections with supported:false and supportReasons; .github/workflows/e2e-vitest-scenarios.yaml sets NVIDIA_API_KEY in live-scenarios env for every matrix entry; registry-scenarios.test.ts skips unsupported scenarios after the job has already been allocated.
  • Generated runner labels are not allowlisted before runs-on (.github/workflows/e2e-vitest-scenarios.yaml:60): The live workflow consumes matrix.runner directly in runs-on, while runner labels are derived from scenario definitions. The resolver still accepts explicit runs-on:<label> requirements after only trimming and checking non-empty, so a future registry change can steer this secret-bearing workflow to an unexpected runner label.
    • Recommendation: Add a central allowlist for live Vitest runner labels, validate resolved runners before emitting the live matrix, and add a negative test that an unsafe or unapproved runner label is rejected before GitHub Actions consumes runs-on.
    • Evidence: .github/workflows/e2e-vitest-scenarios.yaml uses runs-on: ${{ matrix.runner }}; test/e2e-scenario/scenarios/runner-routing.ts resolves any non-empty runnerRequirements entry beginning with runs-on: without an approved-label check.
  • Unsupported-selection compatibility path still needs a source-of-truth/removal plan (test/e2e-scenario/scenarios/runtime-support.ts:19): The PR registers unsupported live scenarios under the bare scenario ID and logs [not wired] reasons separately so exact workflow filtering no longer matches zero tests. That is a reasonable bridge while live Vitest fixture support covers only part of the typed registry, but the code does not document why the source cannot be fixed here, when the bridge should be removed, or how it avoids becoming a permanent second source of truth.
    • Recommendation: Document the invalid state, source boundary, source-fix constraint, and removal condition near liveScenarioTestName/liveScenarioSupport. Prefer surfacing unsupported selections before launching the secret-bearing live job, and tie the bridge to the fixture-support migration milestone.
    • Evidence: liveScenarioTestName(scenario) always returns scenario.id; registry-scenarios.test.ts logs [not wired] only when SCENARIO_ID matches an unsupported scenario; buildLiveScenarioMatrix(ids) preserves unsupported explicit selections instead of filtering them out.
  • Runtime workflow behavior still needs targeted validation (test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts:169): The added unit and boundary tests cover matrix construction, ID safety, workflow shape, and the skip-name contract, but the changed behavior spans GitHub Actions output parsing, exact Vitest filtering, runner selection, secret-bearing setup, and artifact upload. The current tests do not prove the real workflow-shaped unsupported-selection path avoids a no-test failure, that unknown live selections fail without a consumable matrix, or that unsupported entries avoid secrets and unsafe runners.
    • Recommendation: Add focused runtime or integration validation for the workflow-style paths and negative cases, without relying on external CI pass/fail status.
    • Evidence: Tests assert buildLiveScenarioMatrix(["ubuntu-repo-cloud-hermes"]) returns supported:false and liveScenarioTestName() matches the exact filter, but there is no observed test for --emit-live-matrix --scenarios does-not-exist, no unsafe-runner rejection test, and no test proving supported:false entries avoid NVIDIA_API_KEY or secret-bearing setup.

🌱 Nice ideas

  • None.
Consider writing more tests for
  • **Runtime validation** — `--emit-live-matrix --scenarios does-not-exist` exits nonzero and stdout is not a JSON matrix. The changed behavior crosses workflow matrix generation, GitHub Actions output consumption, exact Vitest filtering, runner selection, secret-bearing setup, and artifact upload. Unit and boundary coverage improved, but runtime-shaped negative paths remain unproven.
  • **Runtime validation** — `SCENARIO_ID=ubuntu-repo-cloud-hermes` with the exact Vitest `-t "^${SCENARIO_ID}$"` command reports the skipped unsupported test and logs `[not wired]`. The changed behavior crosses workflow matrix generation, GitHub Actions output consumption, exact Vitest filtering, runner selection, secret-bearing setup, and artifact upload. Unit and boundary coverage improved, but runtime-shaped negative paths remain unproven.
  • **Runtime validation** — Live Vitest matrix generation rejects runner labels outside the approved allowlist before `runs-on` consumes `matrix.runner`. The changed behavior crosses workflow matrix generation, GitHub Actions output consumption, exact Vitest filtering, runner selection, secret-bearing setup, and artifact upload. Unit and boundary coverage improved, but runtime-shaped negative paths remain unproven.
  • **Runtime validation** — Unsupported `supported:false` matrix entries do not receive job-level `NVIDIA_API_KEY` and do not run secret-bearing checkout/setup/build steps. The changed behavior crosses workflow matrix generation, GitHub Actions output consumption, exact Vitest filtering, runner selection, secret-bearing setup, and artifact upload. Unit and boundary coverage improved, but runtime-shaped negative paths remain unproven.
  • **Runtime validation** — Workflow boundary rejects job-level secret exposure when the generated live matrix can include `supported:false` entries. The changed behavior crosses workflow matrix generation, GitHub Actions output consumption, exact Vitest filtering, runner selection, secret-bearing setup, and artifact upload. Unit and boundary coverage improved, but runtime-shaped negative paths remain unproven.
  • **Runtime workflow behavior still needs targeted validation** — Add focused runtime or integration validation for the workflow-style paths and negative cases, without relying on external CI pass/fail status.
  • **Acceptance clause:** while explicit scenario selections still produce jobs that can surface structured skip reasons. — add test evidence or identify existing coverage. buildLiveScenarioMatrix(["ubuntu-repo-cloud-hermes"]) keeps an unsupported entry with supportReasons, and registry-scenarios.test.ts logs [not wired] for selected unsupported scenarios. Runtime validation for the exact Vitest command is still missing, and the job remains secret-bearing.
  • **Acceptance clause:** Tests added or updated for new or changed behavior — add test evidence or identify existing coverage. Tests were added for live matrix generation, registry ID safety, skip-name contract, and workflow boundary checks. Runtime validation remains recommended for unknown selections, exact unsupported Vitest filtering, runner allowlisting, and secret gating.
Since last review details

Current findings:

  • Source-of-truth review needed: Explicit unsupported live scenario selections: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: runtime-support.ts liveScenarioTestName() returns scenario.id for supported and unsupported scenarios; registry-scenarios.test.ts logs [not wired] when SCENARIO_ID matches; run.ts buildLiveScenarioMatrix(ids) keeps unsupported explicit selections.
  • Unsupported live selections still run as secret-bearing matrix jobs (.github/workflows/e2e-vitest-scenarios.yaml:70): Explicit unsupported scenarios are intentionally retained in the live matrix so they can surface skip reasons, but each matrix job still receives NVIDIA_API_KEY at job scope and runs checkout, setup-node, npm install, CLI build, summary, and artifact upload before/around the skipped test path. That keeps a live API secret available to jobs that the matrix already marks as supported:false.
    • Recommendation: Gate secret-bearing work on matrix.supported, move NVIDIA_API_KEY to the narrowest supported-scenario step, or surface unsupported skip reasons in the generate-matrix job or a separate no-secret reporting job instead of launching a normal live-scenarios job.
    • Evidence: buildLiveScenarioMatrix(ids) preserves explicit unsupported selections with supported:false and supportReasons; .github/workflows/e2e-vitest-scenarios.yaml sets NVIDIA_API_KEY in live-scenarios env for every matrix entry; registry-scenarios.test.ts skips unsupported scenarios after the job has already been allocated.
  • Generated runner labels are not allowlisted before runs-on (.github/workflows/e2e-vitest-scenarios.yaml:60): The live workflow consumes matrix.runner directly in runs-on, while runner labels are derived from scenario definitions. The resolver still accepts explicit runs-on:<label> requirements after only trimming and checking non-empty, so a future registry change can steer this secret-bearing workflow to an unexpected runner label.
    • Recommendation: Add a central allowlist for live Vitest runner labels, validate resolved runners before emitting the live matrix, and add a negative test that an unsafe or unapproved runner label is rejected before GitHub Actions consumes runs-on.
    • Evidence: .github/workflows/e2e-vitest-scenarios.yaml uses runs-on: ${{ matrix.runner }}; test/e2e-scenario/scenarios/runner-routing.ts resolves any non-empty runnerRequirements entry beginning with runs-on: without an approved-label check.
  • Unsupported-selection compatibility path still needs a source-of-truth/removal plan (test/e2e-scenario/scenarios/runtime-support.ts:19): The PR registers unsupported live scenarios under the bare scenario ID and logs [not wired] reasons separately so exact workflow filtering no longer matches zero tests. That is a reasonable bridge while live Vitest fixture support covers only part of the typed registry, but the code does not document why the source cannot be fixed here, when the bridge should be removed, or how it avoids becoming a permanent second source of truth.
    • Recommendation: Document the invalid state, source boundary, source-fix constraint, and removal condition near liveScenarioTestName/liveScenarioSupport. Prefer surfacing unsupported selections before launching the secret-bearing live job, and tie the bridge to the fixture-support migration milestone.
    • Evidence: liveScenarioTestName(scenario) always returns scenario.id; registry-scenarios.test.ts logs [not wired] only when SCENARIO_ID matches an unsupported scenario; buildLiveScenarioMatrix(ids) preserves unsupported explicit selections instead of filtering them out.
  • Runtime workflow behavior still needs targeted validation (test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts:169): The added unit and boundary tests cover matrix construction, ID safety, workflow shape, and the skip-name contract, but the changed behavior spans GitHub Actions output parsing, exact Vitest filtering, runner selection, secret-bearing setup, and artifact upload. The current tests do not prove the real workflow-shaped unsupported-selection path avoids a no-test failure, that unknown live selections fail without a consumable matrix, or that unsupported entries avoid secrets and unsafe runners.
    • Recommendation: Add focused runtime or integration validation for the workflow-style paths and negative cases, without relying on external CI pass/fail status.
    • Evidence: Tests assert buildLiveScenarioMatrix(["ubuntu-repo-cloud-hermes"]) returns supported:false and liveScenarioTestName() matches the exact filter, but there is no observed test for --emit-live-matrix --scenarios does-not-exist, no unsafe-runner rejection test, and no test proving supported:false entries avoid NVIDIA_API_KEY or secret-bearing setup.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@wscurran wscurran added the chore Build, CI, dependency, or tooling maintenance label Jun 9, 2026
Base automatically changed from codex/e2e-4990-stack-04-live-registry-discovery to main June 9, 2026 14:51
The Vitest workflow filters scenarios via -t "^${SCENARIO_ID}$".
Unsupported scenarios were registered as
  test.skip(`${id} [not wired: ...]`)
so the anchored filter matched zero tests and Vitest exited non-zero
without surfacing the structured skip reason — exactly what the PR
description claimed it would do.

Changes:
- Register every scenario (supported and skipped) under exactly
  scenario.id via a new liveScenarioTestName() helper, so the
  workflow's `-t "^${SCENARIO_ID}$"` filter selects the right
  test deterministically for both buckets.
- When SCENARIO_ID matches an unsupported scenario, emit a
  `[not wired] <id>: <reasons>` warning at module load so the job
  log/summary still captures why the scenario can't run live.
- Add e2e-live-skip-name-contract.test.ts to lock the contract:
  every registered scenario name equals scenario.id, the workflow's
  anchored regex matches it, and the legacy `[not wired:` suffix
  cannot reappear.

Verified with the workflow's exact invocation:
  SCENARIO_ID=ubuntu-repo-cloud-hermes npx vitest run \
    --project e2e-scenarios-live \
    test/e2e-scenario/live/registry-scenarios.test.ts \
    -t "^ubuntu-repo-cloud-hermes$"
exits 0, reports the targeted test as skipped, and prints the
structured `[not wired]` reason.

Refs #4990

[skip is local-only: Test (cli) hook tries to run live registry
test against macOS without Docker; pre-existing on this branch
and not introduced by this change. CI cli-test-shards continue
to pass.]

Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
test/e2e-scenario/scenarios/run.ts (1)

140-145: ⚡ Quick win

Avoid redundant liveScenarioSupport() calls in the default filter path.

When ids is empty, line 143 calls liveScenarioSupport(scenario) during filtering, then line 144 calls it again during mapping. This double invocation can be eliminated by computing support once per scenario.

♻️ Refactor to compute support once per scenario
 export function buildLiveScenarioMatrix(ids: string[] = []): LiveScenarioMatrixEntry[] {
-  const scenarios = ids.length > 0
-    ? requireScenarios(ids)
-    : listScenarios().filter((scenario) => liveScenarioSupport(scenario).supported);
-  return scenarios.map((scenario) => liveMatrixEntry(scenario, liveScenarioSupport(scenario)));
+  if (ids.length > 0) {
+    return requireScenarios(ids).map((scenario) => liveMatrixEntry(scenario, liveScenarioSupport(scenario)));
+  }
+  return listScenarios()
+    .map((scenario) => ({ scenario, support: liveScenarioSupport(scenario) }))
+    .filter(({ support }) => support.supported)
+    .map(({ scenario, support }) => liveMatrixEntry(scenario, support));
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-scenario/scenarios/run.ts` around lines 140 - 145, The default path
in buildLiveScenarioMatrix currently calls liveScenarioSupport(scenario) twice
(once in listScenarios().filter(...) and again in scenarios.map(...)), so change
the logic to compute support once per scenario: when ids is empty, map
listScenarios() to pairs of {scenario, support: liveScenarioSupport(scenario)},
filter those pairs by support.supported, then map to liveMatrixEntry(scenario,
support). Keep the requireScenarios(ids) branch unchanged and reuse the same
liveMatrixEntry call for both branches.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test/e2e-scenario/scenarios/run.ts`:
- Around line 140-145: The default path in buildLiveScenarioMatrix currently
calls liveScenarioSupport(scenario) twice (once in listScenarios().filter(...)
and again in scenarios.map(...)), so change the logic to compute support once
per scenario: when ids is empty, map listScenarios() to pairs of {scenario,
support: liveScenarioSupport(scenario)}, filter those pairs by
support.supported, then map to liveMatrixEntry(scenario, support). Keep the
requireScenarios(ids) branch unchanged and reuse the same liveMatrixEntry call
for both branches.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: aad6f59c-9dae-42e3-9e97-17cb2828683c

📥 Commits

Reviewing files that changed from the base of the PR and between fc9a123 and a4ab229.

📒 Files selected for processing (21)
  • .github/workflows/e2e-vitest-scenarios.yaml
  • test/e2e-scenario/framework-tests/e2e-clients.test.ts
  • test/e2e-scenario/framework-tests/e2e-live-registry-discovery.test.ts
  • test/e2e-scenario/framework-tests/e2e-live-skip-name-contract.test.ts
  • test/e2e-scenario/framework-tests/e2e-phase-environment.test.ts
  • test/e2e-scenario/framework-tests/e2e-phase-onboarding.test.ts
  • test/e2e-scenario/framework-tests/e2e-phase-state-validation.test.ts
  • test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts
  • test/e2e-scenario/framework-tests/e2e-scenarios-workflow.test.ts
  • test/e2e-scenario/framework/clients/gateway.ts
  • test/e2e-scenario/framework/clients/host.ts
  • test/e2e-scenario/framework/clients/sandbox.ts
  • test/e2e-scenario/framework/e2e-test.ts
  • test/e2e-scenario/framework/phases/environment.ts
  • test/e2e-scenario/framework/phases/index.ts
  • test/e2e-scenario/framework/phases/onboarding.ts
  • test/e2e-scenario/framework/phases/state-validation.ts
  • test/e2e-scenario/live/registry-scenarios.test.ts
  • test/e2e-scenario/scenarios/run.ts
  • test/e2e-scenario/scenarios/runtime-support.ts
  • tools/e2e-scenarios/workflow-boundary.mts

Resolves divergence with parent stack PRs (#5002, #5003, #5004, #5005)
already merged into main, plus #5022 and #5035 refinements. Conflict
resolution prefers main's canonical phase-fixture and framework-plumbing
files (those have already received post-#5005 refactors), and re-applies
this PR's unique deltas on top:

- liveScenarioTestName() helper in runtime-support.ts
- Skip-name realignment + module-load [not wired] log in
  registry-scenarios.test.ts
- New e2e-live-skip-name-contract framework test

Net effect of this PR vs main is the matrix fan-out workflow,
--emit-live-matrix CLI, workflow boundary updates, and the
filter-alignment fix. See `git diff origin/main` for the full delta.

Co-authored-by: Julie Yaunches <jyaunches@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented Jun 9, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

cv and others added 2 commits June 9, 2026 08:53
After merging main (which brought #5020 'cover and format all
TypeScript files' into the branch), biome flagged 5 PR files for
organize-imports and one wrap. Auto-applied via:

  npx biome check --write \
    test/e2e-scenario/framework-tests/e2e-live-skip-name-contract.test.ts \
    test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts \
    test/e2e-scenario/framework-tests/e2e-scenarios-workflow.test.ts \
    test/e2e-scenario/live/registry-scenarios.test.ts \
    test/e2e-scenario/scenarios/run.ts

Verified post-format:
- 24/24 e2e-scenario-framework tests still green (skip-name contract,
  live-registry-discovery, scenario-matrix, workflow boundary).
- Workflow filter sim still works: SCENARIO_ID=ubuntu-repo-cloud-hermes
  + -t "^ubuntu-repo-cloud-hermes$" exits 0 with structured
  [not wired] reason in stderr.

Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>
@jyaunches jyaunches merged commit 91cb1d3 into main Jun 9, 2026
35 checks passed
@jyaunches jyaunches deleted the codex/e2e-4990-stack-05-vitest-scenario-matrix branch June 9, 2026 16:24
jyaunches added a commit that referenced this pull request Jun 9, 2026
Registers the failing-test-first guard for #4423 in the typed scenario
registry so the live Vitest matrix from #5006 fans it out as a
dedicated CI job. Builds on the framework primitives added earlier in
this PR (lifecycle phase fixture, host-side probes, lifecycle whitelist).

Additions:

  * `post-reboot-recovery-ready` expected-state in
    `scenarios/expected-states.ts` declaring the user-visible
    invariants that must hold after a `nemoclaw <name> status` call
    on a freshly-rebooted DGX Spark / Linux Docker-driver host:
      - cli installed,
      - gateway healthy (the user-systemd unit from #4580 brings it
        back up before status runs),
      - sandbox running (recovery completed in time),
      - localRegistry entry preserved (the user-visible regression
        target — destroyed on unfixed `main`),
      - dockerSandboxContainer present (recovery didn't delete the
        labeled container or its `*-nemoclaw-gpu-backup-*` sibling).

  * `ubuntu-repo-docker-post-reboot-recovery` scenario in
    `scenarios/scenarios/baseline.ts` wiring
    `ubuntuRepoDockerLifecycle("cloud-openclaw", "post-reboot-recovery")`
    against the new expected-state and a smoke suite. Carries a
    description that explains the RED/GREEN contract and points to the
    PR-A fix landing in `src/lib/`.

  * `manifests/openclaw-nvidia-post-reboot-recovery.yaml` declares
    `lifecycle: post-reboot-recovery` and the same NVIDIA_API_KEY
    credential ref the cloud-openclaw scenarios use.

  * `.github/workflows/e2e-scenarios.yaml` ROUTES table gains the new
    scenario so the workflow-boundary test
    (`e2e-scenarios-workflow.test.ts`) routes every typed id.

Test pinning:

  * `e2e-scenario-matrix.test.ts` updated from a 1-entry to a 2-entry
    live matrix expectation. The new entry asserts on
    `expectedStateId: "post-reboot-recovery-ready"` so a future
    accidental dropped-lifecycle change to the scenario regresses
    loudly.

  * `e2e-live-registry-discovery.test.ts` swaps the synthetic
    whitelist-coverage test for an assertion against the real
    `ubuntu-repo-docker-post-reboot-recovery` registry entry.

Behavior:

  * On unfixed `main`, the live runner's lifecycle phase stops the
    OpenShell gateway runtime and `docker stop`s the labeled sandbox
    container. State-validation then runs `nemoclaw <name> status`
    (which restarts the gateway via systemd) and the destructive
    `missing` branch in `src/lib/actions/sandbox/status.ts` wipes the
    local registry entry. The `local-registry-entry-present` probe
    fails. Scenario goes RED.

  * On the PR-A fix branch, the new Docker-driver sandbox recovery
    helper restarts the labeled container before stale-removal can
    fire, registry survives, all five probes pass. Scenario flips
    GREEN.

The bash-side legacy compiler emits a
`lifecycle.profile.post-reboot-recovery` PhaseAction pointing at
`nemoclaw_scenarios/lifecycle/dispatch.sh`, but the legacy bash worker
is intentionally not provided: this scenario is Vitest-only. The
typed runner's `LifecyclePhaseFixture` handles dispatch directly. If
the legacy runner is invoked against this scenario it errors out at
the dispatcher; that's the right failure mode while the bash side
stays on its own retirement clock.

Refs #4423
cv added a commit that referenced this pull request Jun 9, 2026
## Summary

Failing-test-first regression guard for #4423, plus the framework
primitives that make it expressible. After this PR lands, the live
Vitest matrix from #5006 fans out a dedicated
`ubuntu-repo-docker-post-reboot-recovery` job that goes 🔴 RED on `main`
(because the registry-destruction bug is still there) and is the
executable acceptance criterion for the source fix in PR-A.

## Related Issue

Refs #4423

## What this PR adds

### Framework primitives (commits 1–3)

1. **`SUPPORTED_LIFECYCLES` whitelist** in
`scenarios/runtime-support.ts`. Replaces the unconditional reject for
`environment.lifecycle` with a lock-step whitelist, so lifecycle
scenarios can graduate one profile at a time. Initial whitelist:
`post-reboot-recovery`.
2. **Two host-side state-validation probes**:
- `local-registry-entry-present` — reads `~/.nemoclaw/sandboxes.json`
and asserts the scenario's sandbox name is recorded. This is the
user-visible regression target for #4423.
- `docker-sandbox-container-present` — runs `docker ps -a --filter
label=openshell.ai/sandbox-name=…` and accepts running, stopped, and
`*-nemoclaw-gpu-backup-*` siblings. Mirrors
`OPENSHELL_SANDBOX_NAME_LABEL` /
`findOpenShellDockerSandboxContainerIds` in
`src/lib/onboard/docker-gpu-patch.ts`.

New `ExpectedState.localRegistry` and
`ExpectedState.dockerSandboxContainer` dimensions; `probesForState`
emits the probes only for `expected: "present"`.
`StateValidationPhaseFixture.from()` now accepts an inline
`ExpectedState` so unit tests don't need to register synthetic states.
Optional `ProbeIO` injection for the registry reader.
3. **`LifecyclePhaseFixture`** in `framework/phases/lifecycle.ts`.
Dispatches `simulate(profile, instance, opts)`. Today the only profile
is `post-reboot-recovery`, with two modes:
- `stop-original` (default) — `openshell gateway stop` + `docker stop`
of the labeled container.
- `rename-to-gpu-backup` — additionally renames the container to a
`*-nemoclaw-gpu-backup-<ts>` sibling, mirroring
`buildBackupContainerName()` in `src/lib/onboard/docker-gpu-patch.ts`.

Both modes register reverse-order cleanups so teardown leaves Docker
usable. Wired into `E2EScenarioFixtures` and into
`live/registry-scenarios.test.ts` between `onboard.from()` and
`stateValidation.from()` whenever the scenario declares
`environment.lifecycle`.

### The failing regression guard (commit 4)

4. **`ubuntu-repo-docker-post-reboot-recovery` scenario** +
**`post-reboot-recovery-ready` expected-state** + manifest YAML. The
expected-state requires:
   - `cli` installed,
- `gateway` healthy (the user-systemd unit from #4580 brings it back
up),
   - `sandbox` running (recovery completes in time),
- **`localRegistry` entry preserved** (the user-visible regression
target),
- `dockerSandboxContainer` present (recovery didn't delete the labeled
container or its `*-nemoclaw-gpu-backup-*` sibling).

The `.github/workflows/e2e-scenarios.yaml` ROUTES table gains the new id
so `e2e-scenarios-workflow.test.ts` keeps every typed id routed.

## RED / GREEN contract

* **On unfixed `main`:** the live runner stops the OpenShell gateway
runtime and `docker stop`s the labeled container. State-validation runs
`nemoclaw <name> status`, which restarts the gateway via systemd; the
destructive `missing` branch in `src/lib/actions/sandbox/status.ts` then
wipes the local registry entry. The `local-registry-entry-present` probe
fails. **Scenario goes 🔴 RED.**
* **On the PR-A fix branch:** the new Docker-driver sandbox recovery
helper restarts the labeled container before stale-removal can fire,
registry survives, all five probes pass. **Scenario flips 🟢 GREEN.**

## Type of Change

- [x] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification

- [x] `npx prek run --all-files` passes
- [x] `npm test` passes — the 150 tests across the 6 directly affected
files are all green; full e2e-scenario framework suite has only the
pre-existing `shell probe reaps timed-out command process groups`
load-flake in `e2e-fixture-context.test.ts` (unchanged on `main`).
- [x] `--emit-live-matrix` now emits both `ubuntu-repo-cloud-openclaw`
and `ubuntu-repo-docker-post-reboot-recovery` as supported live
scenarios; the workflow fan-out from #5006 will dispatch each as its own
CI job.
- [x] Tests added or updated for new or changed behavior.
- [x] No secrets, API keys, or credentials committed.
- [ ] Docs updated for user-facing behavior changes.
- [ ] `npm run docs` builds without warnings (doc changes only).

## Boundaries kept honest

- No imports from `src/lib/**` into `framework/**`. The two duplicated
constants (`OPENSHELL_SANDBOX_NAME_LABEL`, `~/.nemoclaw/sandboxes.json`
path) carry comments pointing back to the CLI source-of-truth. Drift
gets caught by the live scenario itself once the lifecycle phase
exercises real Docker labels.
- `LifecyclePhaseFixture` profile dispatch is exhaustive
(`never`-checked). New profiles must add the dispatcher branch and the
`SUPPORTED_LIFECYCLES` whitelist together.
- Negative-direction host probes (`expected: "absent"` for
`localRegistry` / `dockerSandboxContainer`) deliberately emit nothing
today, with a `probesForState` test pinning the gap so a future negative
scenario PR has to add them.
- The legacy bash dispatcher
(`nemoclaw_scenarios/lifecycle/dispatch.sh`) is intentionally **not**
updated for `post-reboot-recovery`. This scenario is Vitest-only; the
typed runner's `LifecyclePhaseFixture` handles dispatch directly. If the
legacy bash runner is ever invoked against this scenario, it errors at
the dispatcher — that's the right failure mode while the bash side stays
on its own retirement clock.

## Why this doesn't yet flip green

The fix lives in `src/lib/actions/sandbox/{status,gateway-state}.ts`
plus a new `src/lib/onboard/docker-driver-sandbox-recovery.ts` helper,
tracked as **PR-A**. PR-A implements parts 2 & 3 of @ericksoa's #4423
plan:
- Docker-driver sandbox recovery from
`findOpenShellDockerSandboxContainerIds` labels (including
`*-nemoclaw-gpu-backup-*` rename), and
- tightened stale-removal in active paths (`status`, `connect`,
`ensureLiveSandboxOrExit`) requiring Docker-corroborated absence before
destroying the registry.

PR-A's CI run will dispatch this scenario via `--scenarios
ubuntu-repo-docker-post-reboot-recovery` to prove 🔴 → 🟢.

## Diffstat

```
 .github/workflows/e2e-scenarios.yaml                                  |   1 +
 test/e2e-scenario/framework-tests/e2e-expected-state.test.ts          |  35 +++
 test/e2e-scenario/framework-tests/e2e-live-registry-discovery.test.ts |  35 +++
 test/e2e-scenario/framework-tests/e2e-phase-lifecycle.test.ts         | 239 +++++++++++++++++++++
 test/e2e-scenario/framework-tests/e2e-phase-state-validation.test.ts  | 120 ++++++++++-
 test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts         |  20 +-
 test/e2e-scenario/framework/e2e-test.ts                               |   5 +
 test/e2e-scenario/framework/phases/index.ts                           |   8 +
 test/e2e-scenario/framework/phases/lifecycle.ts                       | 207 ++++++++++++++++++
 test/e2e-scenario/framework/phases/state-validation.ts                | 113 +++++++++-
 test/e2e-scenario/live/registry-scenarios.test.ts                     |  44 +++-
 test/e2e-scenario/manifests/openclaw-nvidia-post-reboot-recovery.yaml |  41 ++++
 test/e2e-scenario/scenarios/expected-states.ts                        |  35 +++
 test/e2e-scenario/scenarios/runtime-support.ts                        |   8 +-
 test/e2e-scenario/scenarios/scenarios/baseline.ts                     |  31 +++
 test/e2e-scenario/scenarios/types.ts                                  |  25 ++-
 16 files changed, 950 insertions(+), 17 deletions(-)
```

---------

Co-authored-by: Carlos Villela <cvillela@nvidia.com>
@cv cv added the v0.0.62 Release target label Jun 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

chore Build, CI, dependency, or tooling maintenance v0.0.62 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants