ci(e2e): fan out Vitest scenarios from registry#5006
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis PR establishes a matrix-driven infrastructure for live E2E Vitest scenario execution. A new ChangesLive E2E Scenario Matrix Execution
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related issues
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
E2E Advisor RecommendationRequired E2E: Dispatch hint: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
E2E Scenario Advisor RecommendationRequired scenario E2E: Dispatch required scenario E2E:
Full scenario advisor summaryE2E Scenario AdvisorBase: Required scenario E2E
Optional scenario E2E
Relevant changed files
|
PR Review AdvisorFindings: 0 needs attention, 5 worth checking, 0 nice ideas Review findings🛠️ Needs attention
🔎 Worth checking
🌱 Nice ideas
Consider writing more tests for
Since last review detailsCurrent findings:
This is an automated advisory review. A human maintainer must make the final merge decision. |
The Vitest workflow filters scenarios via -t "^${SCENARIO_ID}$".
Unsupported scenarios were registered as
test.skip(`${id} [not wired: ...]`)
so the anchored filter matched zero tests and Vitest exited non-zero
without surfacing the structured skip reason — exactly what the PR
description claimed it would do.
Changes:
- Register every scenario (supported and skipped) under exactly
scenario.id via a new liveScenarioTestName() helper, so the
workflow's `-t "^${SCENARIO_ID}$"` filter selects the right
test deterministically for both buckets.
- When SCENARIO_ID matches an unsupported scenario, emit a
`[not wired] <id>: <reasons>` warning at module load so the job
log/summary still captures why the scenario can't run live.
- Add e2e-live-skip-name-contract.test.ts to lock the contract:
every registered scenario name equals scenario.id, the workflow's
anchored regex matches it, and the legacy `[not wired:` suffix
cannot reappear.
Verified with the workflow's exact invocation:
SCENARIO_ID=ubuntu-repo-cloud-hermes npx vitest run \
--project e2e-scenarios-live \
test/e2e-scenario/live/registry-scenarios.test.ts \
-t "^ubuntu-repo-cloud-hermes$"
exits 0, reports the targeted test as skipped, and prints the
structured `[not wired]` reason.
Refs #4990
[skip is local-only: Test (cli) hook tries to run live registry
test against macOS without Docker; pre-existing on this branch
and not introduced by this change. CI cli-test-shards continue
to pass.]
Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/e2e-scenario/scenarios/run.ts (1)
140-145: ⚡ Quick winAvoid redundant
liveScenarioSupport()calls in the default filter path.When
idsis empty, line 143 callsliveScenarioSupport(scenario)during filtering, then line 144 calls it again during mapping. This double invocation can be eliminated by computing support once per scenario.♻️ Refactor to compute support once per scenario
export function buildLiveScenarioMatrix(ids: string[] = []): LiveScenarioMatrixEntry[] { - const scenarios = ids.length > 0 - ? requireScenarios(ids) - : listScenarios().filter((scenario) => liveScenarioSupport(scenario).supported); - return scenarios.map((scenario) => liveMatrixEntry(scenario, liveScenarioSupport(scenario))); + if (ids.length > 0) { + return requireScenarios(ids).map((scenario) => liveMatrixEntry(scenario, liveScenarioSupport(scenario))); + } + return listScenarios() + .map((scenario) => ({ scenario, support: liveScenarioSupport(scenario) })) + .filter(({ support }) => support.supported) + .map(({ scenario, support }) => liveMatrixEntry(scenario, support)); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e-scenario/scenarios/run.ts` around lines 140 - 145, The default path in buildLiveScenarioMatrix currently calls liveScenarioSupport(scenario) twice (once in listScenarios().filter(...) and again in scenarios.map(...)), so change the logic to compute support once per scenario: when ids is empty, map listScenarios() to pairs of {scenario, support: liveScenarioSupport(scenario)}, filter those pairs by support.supported, then map to liveMatrixEntry(scenario, support). Keep the requireScenarios(ids) branch unchanged and reuse the same liveMatrixEntry call for both branches.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@test/e2e-scenario/scenarios/run.ts`:
- Around line 140-145: The default path in buildLiveScenarioMatrix currently
calls liveScenarioSupport(scenario) twice (once in listScenarios().filter(...)
and again in scenarios.map(...)), so change the logic to compute support once
per scenario: when ids is empty, map listScenarios() to pairs of {scenario,
support: liveScenarioSupport(scenario)}, filter those pairs by
support.supported, then map to liveMatrixEntry(scenario, support). Keep the
requireScenarios(ids) branch unchanged and reuse the same liveMatrixEntry call
for both branches.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: aad6f59c-9dae-42e3-9e97-17cb2828683c
📒 Files selected for processing (21)
.github/workflows/e2e-vitest-scenarios.yamltest/e2e-scenario/framework-tests/e2e-clients.test.tstest/e2e-scenario/framework-tests/e2e-live-registry-discovery.test.tstest/e2e-scenario/framework-tests/e2e-live-skip-name-contract.test.tstest/e2e-scenario/framework-tests/e2e-phase-environment.test.tstest/e2e-scenario/framework-tests/e2e-phase-onboarding.test.tstest/e2e-scenario/framework-tests/e2e-phase-state-validation.test.tstest/e2e-scenario/framework-tests/e2e-scenario-matrix.test.tstest/e2e-scenario/framework-tests/e2e-scenarios-workflow.test.tstest/e2e-scenario/framework/clients/gateway.tstest/e2e-scenario/framework/clients/host.tstest/e2e-scenario/framework/clients/sandbox.tstest/e2e-scenario/framework/e2e-test.tstest/e2e-scenario/framework/phases/environment.tstest/e2e-scenario/framework/phases/index.tstest/e2e-scenario/framework/phases/onboarding.tstest/e2e-scenario/framework/phases/state-validation.tstest/e2e-scenario/live/registry-scenarios.test.tstest/e2e-scenario/scenarios/run.tstest/e2e-scenario/scenarios/runtime-support.tstools/e2e-scenarios/workflow-boundary.mts
Resolves divergence with parent stack PRs (#5002, #5003, #5004, #5005) already merged into main, plus #5022 and #5035 refinements. Conflict resolution prefers main's canonical phase-fixture and framework-plumbing files (those have already received post-#5005 refactors), and re-applies this PR's unique deltas on top: - liveScenarioTestName() helper in runtime-support.ts - Skip-name realignment + module-load [not wired] log in registry-scenarios.test.ts - New e2e-live-skip-name-contract framework test Net effect of this PR vs main is the matrix fan-out workflow, --emit-live-matrix CLI, workflow boundary updates, and the filter-alignment fix. See `git diff origin/main` for the full delta. Co-authored-by: Julie Yaunches <jyaunches@nvidia.com>
After merging main (which brought #5020 'cover and format all TypeScript files' into the branch), biome flagged 5 PR files for organize-imports and one wrap. Auto-applied via: npx biome check --write \ test/e2e-scenario/framework-tests/e2e-live-skip-name-contract.test.ts \ test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts \ test/e2e-scenario/framework-tests/e2e-scenarios-workflow.test.ts \ test/e2e-scenario/live/registry-scenarios.test.ts \ test/e2e-scenario/scenarios/run.ts Verified post-format: - 24/24 e2e-scenario-framework tests still green (skip-name contract, live-registry-discovery, scenario-matrix, workflow boundary). - Workflow filter sim still works: SCENARIO_ID=ubuntu-repo-cloud-hermes + -t "^ubuntu-repo-cloud-hermes$" exits 0 with structured [not wired] reason in stderr. Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>
Registers the failing-test-first guard for #4423 in the typed scenario registry so the live Vitest matrix from #5006 fans it out as a dedicated CI job. Builds on the framework primitives added earlier in this PR (lifecycle phase fixture, host-side probes, lifecycle whitelist). Additions: * `post-reboot-recovery-ready` expected-state in `scenarios/expected-states.ts` declaring the user-visible invariants that must hold after a `nemoclaw <name> status` call on a freshly-rebooted DGX Spark / Linux Docker-driver host: - cli installed, - gateway healthy (the user-systemd unit from #4580 brings it back up before status runs), - sandbox running (recovery completed in time), - localRegistry entry preserved (the user-visible regression target — destroyed on unfixed `main`), - dockerSandboxContainer present (recovery didn't delete the labeled container or its `*-nemoclaw-gpu-backup-*` sibling). * `ubuntu-repo-docker-post-reboot-recovery` scenario in `scenarios/scenarios/baseline.ts` wiring `ubuntuRepoDockerLifecycle("cloud-openclaw", "post-reboot-recovery")` against the new expected-state and a smoke suite. Carries a description that explains the RED/GREEN contract and points to the PR-A fix landing in `src/lib/`. * `manifests/openclaw-nvidia-post-reboot-recovery.yaml` declares `lifecycle: post-reboot-recovery` and the same NVIDIA_API_KEY credential ref the cloud-openclaw scenarios use. * `.github/workflows/e2e-scenarios.yaml` ROUTES table gains the new scenario so the workflow-boundary test (`e2e-scenarios-workflow.test.ts`) routes every typed id. Test pinning: * `e2e-scenario-matrix.test.ts` updated from a 1-entry to a 2-entry live matrix expectation. The new entry asserts on `expectedStateId: "post-reboot-recovery-ready"` so a future accidental dropped-lifecycle change to the scenario regresses loudly. * `e2e-live-registry-discovery.test.ts` swaps the synthetic whitelist-coverage test for an assertion against the real `ubuntu-repo-docker-post-reboot-recovery` registry entry. Behavior: * On unfixed `main`, the live runner's lifecycle phase stops the OpenShell gateway runtime and `docker stop`s the labeled sandbox container. State-validation then runs `nemoclaw <name> status` (which restarts the gateway via systemd) and the destructive `missing` branch in `src/lib/actions/sandbox/status.ts` wipes the local registry entry. The `local-registry-entry-present` probe fails. Scenario goes RED. * On the PR-A fix branch, the new Docker-driver sandbox recovery helper restarts the labeled container before stale-removal can fire, registry survives, all five probes pass. Scenario flips GREEN. The bash-side legacy compiler emits a `lifecycle.profile.post-reboot-recovery` PhaseAction pointing at `nemoclaw_scenarios/lifecycle/dispatch.sh`, but the legacy bash worker is intentionally not provided: this scenario is Vitest-only. The typed runner's `LifecyclePhaseFixture` handles dispatch directly. If the legacy runner is invoked against this scenario it errors out at the dispatcher; that's the right failure mode while the bash side stays on its own retirement clock. Refs #4423
## Summary Failing-test-first regression guard for #4423, plus the framework primitives that make it expressible. After this PR lands, the live Vitest matrix from #5006 fans out a dedicated `ubuntu-repo-docker-post-reboot-recovery` job that goes 🔴 RED on `main` (because the registry-destruction bug is still there) and is the executable acceptance criterion for the source fix in PR-A. ## Related Issue Refs #4423 ## What this PR adds ### Framework primitives (commits 1–3) 1. **`SUPPORTED_LIFECYCLES` whitelist** in `scenarios/runtime-support.ts`. Replaces the unconditional reject for `environment.lifecycle` with a lock-step whitelist, so lifecycle scenarios can graduate one profile at a time. Initial whitelist: `post-reboot-recovery`. 2. **Two host-side state-validation probes**: - `local-registry-entry-present` — reads `~/.nemoclaw/sandboxes.json` and asserts the scenario's sandbox name is recorded. This is the user-visible regression target for #4423. - `docker-sandbox-container-present` — runs `docker ps -a --filter label=openshell.ai/sandbox-name=…` and accepts running, stopped, and `*-nemoclaw-gpu-backup-*` siblings. Mirrors `OPENSHELL_SANDBOX_NAME_LABEL` / `findOpenShellDockerSandboxContainerIds` in `src/lib/onboard/docker-gpu-patch.ts`. New `ExpectedState.localRegistry` and `ExpectedState.dockerSandboxContainer` dimensions; `probesForState` emits the probes only for `expected: "present"`. `StateValidationPhaseFixture.from()` now accepts an inline `ExpectedState` so unit tests don't need to register synthetic states. Optional `ProbeIO` injection for the registry reader. 3. **`LifecyclePhaseFixture`** in `framework/phases/lifecycle.ts`. Dispatches `simulate(profile, instance, opts)`. Today the only profile is `post-reboot-recovery`, with two modes: - `stop-original` (default) — `openshell gateway stop` + `docker stop` of the labeled container. - `rename-to-gpu-backup` — additionally renames the container to a `*-nemoclaw-gpu-backup-<ts>` sibling, mirroring `buildBackupContainerName()` in `src/lib/onboard/docker-gpu-patch.ts`. Both modes register reverse-order cleanups so teardown leaves Docker usable. Wired into `E2EScenarioFixtures` and into `live/registry-scenarios.test.ts` between `onboard.from()` and `stateValidation.from()` whenever the scenario declares `environment.lifecycle`. ### The failing regression guard (commit 4) 4. **`ubuntu-repo-docker-post-reboot-recovery` scenario** + **`post-reboot-recovery-ready` expected-state** + manifest YAML. The expected-state requires: - `cli` installed, - `gateway` healthy (the user-systemd unit from #4580 brings it back up), - `sandbox` running (recovery completes in time), - **`localRegistry` entry preserved** (the user-visible regression target), - `dockerSandboxContainer` present (recovery didn't delete the labeled container or its `*-nemoclaw-gpu-backup-*` sibling). The `.github/workflows/e2e-scenarios.yaml` ROUTES table gains the new id so `e2e-scenarios-workflow.test.ts` keeps every typed id routed. ## RED / GREEN contract * **On unfixed `main`:** the live runner stops the OpenShell gateway runtime and `docker stop`s the labeled container. State-validation runs `nemoclaw <name> status`, which restarts the gateway via systemd; the destructive `missing` branch in `src/lib/actions/sandbox/status.ts` then wipes the local registry entry. The `local-registry-entry-present` probe fails. **Scenario goes 🔴 RED.** * **On the PR-A fix branch:** the new Docker-driver sandbox recovery helper restarts the labeled container before stale-removal can fire, registry survives, all five probes pass. **Scenario flips 🟢 GREEN.** ## Type of Change - [x] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## Verification - [x] `npx prek run --all-files` passes - [x] `npm test` passes — the 150 tests across the 6 directly affected files are all green; full e2e-scenario framework suite has only the pre-existing `shell probe reaps timed-out command process groups` load-flake in `e2e-fixture-context.test.ts` (unchanged on `main`). - [x] `--emit-live-matrix` now emits both `ubuntu-repo-cloud-openclaw` and `ubuntu-repo-docker-post-reboot-recovery` as supported live scenarios; the workflow fan-out from #5006 will dispatch each as its own CI job. - [x] Tests added or updated for new or changed behavior. - [x] No secrets, API keys, or credentials committed. - [ ] Docs updated for user-facing behavior changes. - [ ] `npm run docs` builds without warnings (doc changes only). ## Boundaries kept honest - No imports from `src/lib/**` into `framework/**`. The two duplicated constants (`OPENSHELL_SANDBOX_NAME_LABEL`, `~/.nemoclaw/sandboxes.json` path) carry comments pointing back to the CLI source-of-truth. Drift gets caught by the live scenario itself once the lifecycle phase exercises real Docker labels. - `LifecyclePhaseFixture` profile dispatch is exhaustive (`never`-checked). New profiles must add the dispatcher branch and the `SUPPORTED_LIFECYCLES` whitelist together. - Negative-direction host probes (`expected: "absent"` for `localRegistry` / `dockerSandboxContainer`) deliberately emit nothing today, with a `probesForState` test pinning the gap so a future negative scenario PR has to add them. - The legacy bash dispatcher (`nemoclaw_scenarios/lifecycle/dispatch.sh`) is intentionally **not** updated for `post-reboot-recovery`. This scenario is Vitest-only; the typed runner's `LifecyclePhaseFixture` handles dispatch directly. If the legacy bash runner is ever invoked against this scenario, it errors at the dispatcher — that's the right failure mode while the bash side stays on its own retirement clock. ## Why this doesn't yet flip green The fix lives in `src/lib/actions/sandbox/{status,gateway-state}.ts` plus a new `src/lib/onboard/docker-driver-sandbox-recovery.ts` helper, tracked as **PR-A**. PR-A implements parts 2 & 3 of @ericksoa's #4423 plan: - Docker-driver sandbox recovery from `findOpenShellDockerSandboxContainerIds` labels (including `*-nemoclaw-gpu-backup-*` rename), and - tightened stale-removal in active paths (`status`, `connect`, `ensureLiveSandboxOrExit`) requiring Docker-corroborated absence before destroying the registry. PR-A's CI run will dispatch this scenario via `--scenarios ubuntu-repo-docker-post-reboot-recovery` to prove 🔴 → 🟢. ## Diffstat ``` .github/workflows/e2e-scenarios.yaml | 1 + test/e2e-scenario/framework-tests/e2e-expected-state.test.ts | 35 +++ test/e2e-scenario/framework-tests/e2e-live-registry-discovery.test.ts | 35 +++ test/e2e-scenario/framework-tests/e2e-phase-lifecycle.test.ts | 239 +++++++++++++++++++++ test/e2e-scenario/framework-tests/e2e-phase-state-validation.test.ts | 120 ++++++++++- test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts | 20 +- test/e2e-scenario/framework/e2e-test.ts | 5 + test/e2e-scenario/framework/phases/index.ts | 8 + test/e2e-scenario/framework/phases/lifecycle.ts | 207 ++++++++++++++++++ test/e2e-scenario/framework/phases/state-validation.ts | 113 +++++++++- test/e2e-scenario/live/registry-scenarios.test.ts | 44 +++- test/e2e-scenario/manifests/openclaw-nvidia-post-reboot-recovery.yaml | 41 ++++ test/e2e-scenario/scenarios/expected-states.ts | 35 +++ test/e2e-scenario/scenarios/runtime-support.ts | 8 +- test/e2e-scenario/scenarios/scenarios/baseline.ts | 31 +++ test/e2e-scenario/scenarios/types.ts | 25 ++- 16 files changed, 950 insertions(+), 17 deletions(-) ``` --------- Co-authored-by: Carlos Villela <cvillela@nvidia.com>
Summary
Changes the Vitest E2E workflow from one manual filter job into a generated typed-scenario matrix. The default matrix runs fixture-supported live scenarios, while explicit scenario selections still produce jobs that can surface structured skip reasons.
Related Issue
Refs #4941
Refs #4990
Changes
--emit-live-matrixandbuildLiveScenarioMatrix(...)to the typed scenario runner CLI..github/workflows/e2e-vitest-scenarios.yamlwith agenerate-matrixjob and per-scenariolive-scenariosmatrix fan-out.NEMOCLAW_CLI_BIN, and safe input handling.Type of Change
Verification
npx prek run --all-filespassesnpm testpassesnpm run docsbuilds without warnings (doc changes only)Signed-off-by: Carlos Villela cvillela@nvidia.com
Summary by CodeRabbit
Tests
Chores