Architecture at a glance
The seam: GHA carries everything that's a precondition for nemoclaw being callable; Vitest carries everything from nemoclaw onboard onward.
The matrix axes survive the migration to Vitest. The simplicity #4941 argued for is preserved (Vitest is the runner, fixtures are the API surface, no custom runner). What this proposal adds is one more layer of fixture composition that keeps the existing matrix dispatch coherent — and uses GHA matrix natively for the part it's good at.
Problem Statement
#4941 decided that Vitest is the E2E scenario execution runner and that NemoClaw provides typed fixtures, clients, assertions, and migration inventory. That decision is settled and the right call: Vitest owns lifecycle, fixture composition, reporters, timeouts, and CI integration; NemoClaw owns the domain.
What #4941 did not nail down is where the scenario matrix lives under the new model. The current foundation stack (#4965 → #4969) lands the right runner and the right primitives, but the first live scenario is a hand-authored single test file (live/ubuntu-repo-cli-smoke.test.ts) that:
hardcodes the platform via process.execPath,
assumes the install state implicitly (repo is cloned, dist is built),
has no notion of runtime axis (docker-running / docker-missing / gpu-cdi / macos-optional),
onboards nothing,
and exists outside scenarios/registry.ts.
If subsequent live scenarios follow the same template, every scenario becomes a hand-written test file. We lose the combinatorial matrix the typed-shell-runner explicitly preserves today via scenarios/scenarios/baseline.ts × scenarios/matrix.ts helpers.
The matrix is not aesthetic. It is the reason "does cloud-openclaw onboarding work on WSL?" is a one-line constructor change today (wslRepoDocker("cloud-openclaw") instead of ubuntuRepoDocker("cloud-openclaw")) rather than a fork-and-edit of an entire test file. It is also what makes --emit-matrix and the dynamic GHA fan-out (#4359) coherent — one row per registry entry.
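To make the one-line constructor change concrete, here is a minimal sketch of how such helpers could compose. The type shapes, the repoDockerOn factory, and the id prefixes are assumptions for illustration; the real definitions live in scenarios/types.ts and scenarios/matrix.ts and may differ.

```typescript
// Hypothetical, simplified shapes; not the actual scenarios/types.ts definitions.
type Platform = "ubuntu-local" | "wsl-local" | "macos-local" | "gpu-runner" | "brev-launchable";

interface ScenarioEnvironment {
  readonly platform: Platform;
  readonly install: "repo-current" | "launchable";
  readonly runtime: "docker-running" | "docker-missing" | "macos-docker-optional" | "gpu-docker-cdi";
  readonly onboarding: string;
}

interface ScenarioDefinition {
  readonly id: string;
  readonly environment: ScenarioEnvironment;
}

// Assumed factory: pins platform, install, and runtime, and leaves onboarding open.
function repoDockerOn(platform: Platform, idPrefix: string) {
  return (onboarding: string): ScenarioDefinition => ({
    id: `${idPrefix}-${onboarding}`,
    environment: { platform, install: "repo-current", runtime: "docker-running", onboarding },
  });
}

const ubuntuRepoDocker = repoDockerOn("ubuntu-local", "ubuntu-repo");
const wslRepoDocker = repoDockerOn("wsl-local", "wsl-repo");

// Moving the same onboarding profile to WSL only changes the constructor name.
const onUbuntu = ubuntuRepoDocker("cloud-openclaw");
const onWsl = wslRepoDocker("cloud-openclaw");
```

Both definitions share every axis except platform, which is what keeps the registry at one row per combination instead of one forked test file per combination.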
This issue proposes that the Vitest scenario layer keep the same matrix vocabulary, with three layers carrying it: GHA matrix for platform fan-out and per-job preconditions, workflow setup steps for install/runtime state, and Vitest phase fixtures for onboarding-and-after.
The bash side honors the same axes via nemoclaw_scenarios/{install,onboard,lifecycle,probes}/dispatch.sh routers — one bash worker per id per axis, dispatched by the typed runner.
This decomposition is real architecture, not metadata. It's what makes the matrix dispatch in e2e-scenarios-all.yaml work.
Proposed Design
Where each axis lives
| Axis | Carrier | Why this layer |
| --- | --- | --- |
| platform | GHA matrix.runs-on (resolved by scenarios/runner-routing.ts) | Native to GHA. ubuntu-local → ubuntu-latest, gpu-runner → self-hosted, macos-local → macos-26, wsl-local → windows-latest (with a WSL bootstrap composite action), brev-launchable → ubuntu-latest (with a Brev provisioning composite action / fixture). Adding a platform value is a registry edit + a routing-table edit. |
| install | Workflow setup step (matrix-gated) | if: matrix.install == 'repo-current' runs npm ci && npm run build:cli. if: matrix.install == 'launchable' runs the installer. By the time Vitest starts, nemoclaw is on PATH. The phase fixture only asserts readiness; it doesn't install. |
| runtime | Workflow setup step (matrix-gated) for state mutations the runner image doesn't already provide | docker-running is the ubuntu-latest default (no-op). docker-missing requires a shim setup step (the existing nemoclaw_scenarios/onboard/cloud-openclaw-no-docker.sh does this; it promotes to a composite action). gpu-docker-cdi is already on the GPU runner image (no-op). macos-docker-optional is the macos-26 default (no-op). |
| onboarding | Vitest phase fixture (framework/phases/onboard.ts) | Calling nemoclaw onboard --provider nvidia --agent openclaw --channel slack is application logic. There is no GHA primitive for "run this command and parse the output." |
| lifecycle | Vitest phase fixture (framework/phases/lifecycle.ts) | |
| runtime suites | Vitest phase fixture (framework/phases/runtime.ts:runSuite) | |
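The platform, install, and runtime rows can be read as a pure function from a scenario's environment to a runner label plus a list of setup steps. The sketch below is illustrative only: the runner labels come from the platform row, while the step labels (wsl-bootstrap, brev-provision, run-installer, docker-missing-shim) and the setupStepsFor helper are hypothetical names, not existing actions or code in scenarios/runner-routing.ts.

```typescript
type Platform = "ubuntu-local" | "gpu-runner" | "macos-local" | "wsl-local" | "brev-launchable";
type Install = "repo-current" | "launchable";
type Runtime = "docker-running" | "docker-missing" | "gpu-docker-cdi" | "macos-docker-optional";

interface AxisTriple {
  readonly platform: Platform;
  readonly install: Install;
  readonly runtime: Runtime;
}

// Routing table as described in the platform row; "self-hosted" stands in for the real GPU runner label.
const RUNS_ON: Record<Platform, string> = {
  "ubuntu-local": "ubuntu-latest",
  "gpu-runner": "self-hosted",
  "macos-local": "macos-26",
  "wsl-local": "windows-latest",
  "brev-launchable": "ubuntu-latest",
};

// Lists the matrix-gated steps a job would run before Vitest starts (labels are illustrative).
function setupStepsFor(env: AxisTriple): string[] {
  const steps: string[] = [];
  if (env.platform === "wsl-local") steps.push("wsl-bootstrap");
  if (env.platform === "brev-launchable") steps.push("brev-provision");
  steps.push(env.install === "repo-current" ? "npm ci && npm run build:cli" : "run-installer");
  // Only docker-missing mutates runner state; the other runtime values are no-ops.
  if (env.runtime === "docker-missing") steps.push("docker-missing-shim");
  return steps;
}

const noDockerJob: AxisTriple = {
  platform: "ubuntu-local",
  install: "repo-current",
  runtime: "docker-missing",
};
const noDockerSteps = setupStepsFor(noDockerJob);
```

Keeping this mapping declarative is what lets a new platform value stay a registry edit plus a routing-table edit.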
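What this proposal adds
A framework/phases/ directory holding phase fixtures: environment.ts (assertion-only), onboard.ts, state-validation.ts, lifecycle.ts, runtime.ts.
Each phase fixture exposes a from(scenarioPart, ...prereqs) method that resolves a registry id to the right call (registry-driven path), and named methods for explicit one-off scenarios (e.g. onboard.cloudOpenclawSlack({...})).
A single registry-driven scenario file (live/scenarios.test.ts) that iterates listScenarios() and produces one Vitest test per registry entry. Hand-authored live/<name>.test.ts files remain valid for one-off cases that don't fit the matrix.
A GHA matrix workflow (e2e-vitest-scenarios.yaml, evolving from ci(e2e): add Vitest scenario workflow #4968) that consumes --emit-matrix, fans out one job per scenario id, sets up install/runtime preconditions per matrix axis, and invokes Vitest with a scenario-id filter.
The runtime-support filter from the #4380 follow-up (fix(e2e): gate scenario fan-out by onboarding+secret support contract #4978) extends to gate the Vitest matrix the same way: scenarios whose phase fixtures aren't wired yet get filtered with structured reasons, not silent fail.
What naturally retires
scenarios/orchestrators/{phase,runner,context,negative-matcher}.ts (~750 LOC of typed-shell phase orchestration): once every scenario runs through Vitest phase fixtures, the parallel orchestrator becomes dead code.
scenarios/clients/* stubs (80 LOC): already replaced by framework/clients/* in test(e2e): add fixture-friendly clients #4966; should be deleted in that PR per the precedent.
nemoclaw_scenarios/{install,onboard,lifecycle,probes}/*.sh workers (~1,500 LOC): install + runtime prep promote to composite GHA actions; onboarding/lifecycle workers move into phase fixtures one id at a time, and files retire per inventory.
validation_suites/**/*.sh (~3,000 LOC of bash assertions): logic migrates into runtime-suite fixtures one suite at a time, and files retire per inventory.
scenarios/run.ts (the typed-shell entry point): --emit-matrix remains the matrix builder; the live-execution path retires once Vitest is the only runner.
What stays as typed test data (explicit in #4941):
scenarios/types.ts (vocabulary)
scenarios/builder.ts (construction)
scenarios/registry.ts + scenarios/scenarios/baseline.ts (the matrix data)
scenarios/matrix.ts (composer helpers)
scenarios/runner-routing.ts (platform → GHA runner)
scenarios/runtime-support.ts (wired-fan-out filter)
scenarios/run.ts:--emit-matrix (matrix payload builder)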
Concrete fixture sketch

```ts
// test/e2e-scenario/framework/phases/environment.ts
//
// Assertion-only. The actual install + runtime prep happen as workflow
// setup steps before Vitest starts. This fixture verifies the host is
// in the state the scenario declared.
import type { ScenarioEnvironment } from "../../scenarios/types.ts";
import type { HostCliClient } from "../clients/index.ts";

export interface EnvironmentReady {
  readonly platform: ScenarioEnvironment["platform"];
  readonly install: ScenarioEnvironment["install"];
  readonly runtime: ScenarioEnvironment["runtime"];
  readonly cliPath: string;
}

export interface EnvironmentFixture {
  /** Asserts CLI is on PATH and runtime state matches scenario.environment. */
  assertReady(env: ScenarioEnvironment): Promise<EnvironmentReady>;
}
```
```ts
// test/e2e-scenario/framework/phases/onboard.ts
import type { ScenarioEnvironment } from "../../scenarios/types.ts";
import type { EnvironmentReady } from "./environment.ts";

// Note: ErrorClass and NegativeOutcome are referenced below but not defined in
// this sketch; they are assumed to come from the framework's negative-path types.

export interface OpenClawInstance {
  readonly sandboxName: string;
  readonly gatewayUrl: string;
  readonly agent: "openclaw" | "hermes";
  readonly provider: "nvidia" | "ollama-local" | "openai-compatible";
  readonly channels: ReadonlyArray<"slack" | "discord" | "telegram" | "brave">;
}

export interface OnboardFixture {
  /**
   * Registry-driven entry point. Routes by the scenario's onboarding id
   * (with the docker-missing rewrite the existing compiler.ts already
   * does) to the right named method below.
   */
  from(env: ScenarioEnvironment, hostState: EnvironmentReady): Promise<OpenClawInstance>;

  // Named methods: same as bash dispatcher cases.
  cloudOpenclaw(opts?: { model?: string }): Promise<OpenClawInstance>;
  cloudOpenclawNoDocker(opts: { expectError: ErrorClass }): Promise<NegativeOutcome>;
  cloudOpenclawCustomPolicies(opts: { presets: string[] }): Promise<OpenClawInstance>;
  cloudOpenclawSlack(opts: { allowedChannels?: string[] }): Promise<OpenClawInstance>;
  cloudOpenclawDiscord(opts: { allowedChannels?: string[] }): Promise<OpenClawInstance>;
  cloudOpenclawTelegram(opts: { /* ... */ }): Promise<OpenClawInstance>;
  cloudHermes(opts?: { /* ... */ }): Promise<OpenClawInstance>;
  cloudHermesSlack(opts: { /* ... */ }): Promise<OpenClawInstance>;
  cloudHermesDiscord(opts: { /* ... */ }): Promise<OpenClawInstance>;
  localOllamaOpenclaw(opts?: { /* ... */ }): Promise<OpenClawInstance>;
  // ...one per dispatcher case
}
```
```ts
// test/e2e-scenario/live/scenarios.test.ts
// Registry-driven matrix: one Vitest test per scenario in baseline.ts.
import { test, expect } from "../framework/e2e-test.ts";
import { listScenarios } from "../scenarios/registry.ts";
import { isScenarioFullyWired } from "../scenarios/runtime-support.ts";

for (const scenario of listScenarios()) {
  const wired = isScenarioFullyWired(scenario);
  if (!wired.ok) {
    test.skip(`${scenario.id} (not yet wired: ${wired.reasons.join("; ")})`, () => {});
    continue;
  }

  test(scenario.id, async ({
    environment, onboard, stateValidation, lifecycle, runSuite,
  }) => {
    // GHA setup steps already ran install + runtime prep. Just verify.
    const env = await environment.assertReady(scenario.environment);
    const instance = await onboard.from(scenario.environment, env);
    await stateValidation.from(scenario.expectedStateId, instance);
    if (scenario.environment.lifecycle) {
      await lifecycle.from(scenario.environment.lifecycle, instance);
    }
    for (const suiteId of scenario.suiteIds) {
      await runSuite(suiteId, instance);
    }
  });
}
```
Concrete workflow sketch
The workflow mirrors the existing e2e-scenarios-all.yaml shape one-to-one; it just dispatches Vitest instead of scenarios/run.ts. Same --emit-matrix payload, same runner-routing, same secret allowlist semantics, same fail-fast: false.
Migration Plan
1. Land cv's foundation stack
#4965 → #4969 land as scoped. They give us the runner, primitives, CLI wrappers, first scenario, workflow shape, and migration inventory. Nothing in this proposal blocks them.
2. Add phase fixtures (this proposal)
Authored as one PR per phase fixture so each is small and reviewable. Suggested order:
framework/phases/environment.ts — assertion-only (assertReady(env)). Verifies CLI is on PATH and docker state matches. Setup is in workflow steps.
framework/phases/onboard.ts — starts with cloudOpenclaw and cloudOpenclawNoDocker only. New onboarding profiles slot in one method at a time.
framework/phases/state-validation.ts — implements the existing cli-installed / gateway-healthy / sandbox-running / gateway-absent / sandbox-absent probes from scenarios/expected-states.ts as fixture methods.
framework/phases/lifecycle.ts — starts with rebuildCurrentVersion and snapshotCreateRestore (the two failing today in the typed-shell-runner). Implementing these here naturally fixes the Mode-B failures the typed-shell-runner exposes.
framework/phases/runtime.ts — runSuite(suiteId, instance) dispatcher. One suite at a time, mirroring scenarios/probes/* and validation_suites/<category>/*.sh content.
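As a rough illustration of the registry-driven from(...) path in onboard.ts, the sketch below resolves an onboarding id to a named fixture method, starting with only the two methods this step proposes. The OnboardTarget shape, the lookup table, and the assumption that the docker-missing rewrite appends a -no-docker suffix are all hypothetical; the actual rewrite is whatever compiler.ts does today.

```typescript
// Hypothetical minimal view of the axes the router needs.
interface OnboardTarget {
  readonly onboarding: string;
  readonly runtime: string;
}

type OnboardMethod = "cloudOpenclaw" | "cloudOpenclawNoDocker";

// Assumed id-to-method table; only the first two planned methods are wired.
const METHOD_BY_ID = new Map<string, OnboardMethod>([
  ["cloud-openclaw", "cloudOpenclaw"],
  ["cloud-openclaw-no-docker", "cloudOpenclawNoDocker"],
]);

function resolveOnboardMethod(target: OnboardTarget): OnboardMethod {
  // Assumption: a docker-missing runtime rewrites the id to its negative-path variant.
  const id = target.runtime === "docker-missing"
    ? `${target.onboarding}-no-docker`
    : target.onboarding;
  const method = METHOD_BY_ID.get(id);
  if (method === undefined) {
    // Unwired ids fail loudly here; the runtime-support gate should have skipped them earlier.
    throw new Error(`onboarding id not wired yet: ${id}`);
  }
  return method;
}

const positive = resolveOnboardMethod({ onboarding: "cloud-openclaw", runtime: "docker-running" });
const negative = resolveOnboardMethod({ onboarding: "cloud-openclaw", runtime: "docker-missing" });
```

Adding a new onboarding profile then means one new table entry and one new named method, which matches the "one method at a time" rollout.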
3. Promote install + runtime prep to composite GHA actions
Once the workflow shape stabilizes, extract the install + runtime-prep steps into reusable composite actions under .github/actions/ so:
e2e-scenarios-all.yaml (typed-shell-runner) and e2e-vitest-scenarios.yaml share the same setup steps.
A new platform value (e.g. a future ARM64 runner) only needs the action updated once.
4. Add the registry-driven scenario file + matrix workflow
live/scenarios.test.ts as sketched above. e2e-vitest-scenarios.yaml evolves to consume --emit-matrix (sketch above). As phase fixtures land, more registry entries flip from test.skip(...) to running.
5. Family-by-family scenario migration
Same as #4941's family-by-family plan. Each family migration:
Implements the missing phase fixture method (e.g. onboard.cloudOpenclawSlack).
Adds the scenario id to SUPPORTED_ONBOARDING_IDS in scenarios/runtime-support.ts.
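Verifies parity (Vitest scenario passes the same assertions as the bash suite).
Records the corresponding bash retirement entry in migration/legacy-inventory.json (test(e2e): add migration inventory deletion gates #4969).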
Deletes the bash worker + assertion files in a follow-up PR.
The runtime-support filter ensures unwired scenarios stay registered (visible in the registry, documented as roadmap) but never produce silent-fail jobs.
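A minimal sketch of how such a gate could produce structured skip reasons follows. The ScenarioSummary shape, the SUPPORTED_SUITE_IDS set, the reason strings, and the ubuntu-repo-cloud-hermes id are assumptions for illustration; only the isScenarioFullyWired name, its ok/reasons result, and SUPPORTED_ONBOARDING_IDS come from this proposal.

```typescript
// Hypothetical, flattened view of a registry entry.
interface ScenarioSummary {
  readonly id: string;
  readonly onboarding: string;
  readonly suiteIds: readonly string[];
}

interface WiredCheck {
  readonly ok: boolean;
  readonly reasons: string[];
}

const SUPPORTED_ONBOARDING_IDS: ReadonlySet<string> = new Set(["cloud-openclaw"]);
// Assumed companion set for runtime suites; not named in the proposal.
const SUPPORTED_SUITE_IDS: ReadonlySet<string> = new Set(["smoke", "inference"]);

function isScenarioFullyWired(scenario: ScenarioSummary): WiredCheck {
  const reasons: string[] = [];
  if (!SUPPORTED_ONBOARDING_IDS.has(scenario.onboarding)) {
    reasons.push(`onboarding '${scenario.onboarding}' has no phase fixture method`);
  }
  for (const suiteId of scenario.suiteIds) {
    if (!SUPPORTED_SUITE_IDS.has(suiteId)) {
      reasons.push(`suite '${suiteId}' has no runtime fixture`);
    }
  }
  return { ok: reasons.length === 0, reasons };
}

// Only wired scenarios become fan-out jobs; the rest stay registered but are skipped.
function fanOutIds(scenarios: readonly ScenarioSummary[]): string[] {
  return scenarios.filter((s) => isScenarioFullyWired(s).ok).map((s) => s.id);
}

const openclaw: ScenarioSummary = {
  id: "ubuntu-repo-cloud-openclaw",
  onboarding: "cloud-openclaw",
  suiteIds: ["smoke", "inference"],
};
const hermes: ScenarioSummary = {
  id: "ubuntu-repo-cloud-hermes",
  onboarding: "cloud-hermes",
  suiteIds: ["smoke"],
};
const jobs = fanOutIds([openclaw, hermes]);
```

Because the same predicate would feed both --emit-matrix and the test.skip branch in live/scenarios.test.ts, a scenario can't be fanned out by one path and silently dropped by the other.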
6. Inventory extends to typed-shell-runner retirement
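#4969 currently tracks legacy test/e2e/test-*.sh. Extend to also track:
scenarios/orchestrators/{phase,runner,context,negative-matcher}.ts
nemoclaw_scenarios/{install,onboard,lifecycle,probes,helpers}/*.sh
validation_suites/**/*.sh
runtime/lib/*.sh
Each entry gets a bridgeSurface (which Vitest phase fixture or composite action replaces it) and a deletionReady flag. When all phase fixtures cover an area, that bash retires.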
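The retirement rule can be expressed as a small check over inventory entries. This is a sketch under assumptions: the path field name, the helper names, and the sample deletionReady values are illustrative, and the real migration/legacy-inventory.json schema from #4969 may differ.

```typescript
interface LegacyInventoryEntry {
  readonly path: string; // assumed field name for the legacy file
  readonly bridgeSurface: string; // Vitest phase fixture or composite action replacing it
  readonly deletionReady: boolean;
}

// Illustrative entries; the flag values are made up for the example.
const inventory: LegacyInventoryEntry[] = [
  {
    path: "nemoclaw_scenarios/onboard/cloud-openclaw-no-docker.sh",
    bridgeSurface: "composite action (docker-missing shim)",
    deletionReady: false,
  },
  {
    path: "nemoclaw_scenarios/onboard/dispatch.sh",
    bridgeSurface: "framework/phases/onboard.ts",
    deletionReady: true,
  },
];

// Entries under an area prefix that still block deletion.
function blockersFor(entries: readonly LegacyInventoryEntry[], areaPrefix: string): string[] {
  return entries
    .filter((e) => e.path.startsWith(areaPrefix) && !e.deletionReady)
    .map((e) => e.path);
}

// An area retires only when it has tracked entries and none of them block.
function isAreaRetirable(entries: readonly LegacyInventoryEntry[], areaPrefix: string): boolean {
  const tracked = entries.some((e) => e.path.startsWith(areaPrefix));
  return tracked && blockersFor(entries, areaPrefix).length === 0;
}

const onboardBlockers = blockersFor(inventory, "nemoclaw_scenarios/onboard/");
```

Requiring at least one tracked entry avoids treating an untracked directory as safe to delete.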
Alternatives Considered
Per-scenario hand-written test files
This is what live/ubuntu-repo-cli-smoke.test.ts does today. Simple, but loses every matrix axis. Adding wsl-repo-cloud-openclaw-slack becomes "fork the test file, edit the platform call, edit the onboarding call, edit the channel" — exactly the duplication the typed-shell-runner avoids via wslRepoDocker(...). Acceptable for true one-off probes; not acceptable as the default pattern.
Single giant live/all-scenarios.test.ts with it.each(...)
Folds all scenarios into one Vitest file, parameterized by registry. Less flexible than for-of test() because Vitest's it.each doesn't compose nicely with test.extend fixtures. The for-of pattern in the sketch above is idiomatic Vitest and gives each scenario its own test name + artifacts directory.
Keep typed-shell-runner phase orchestrator, just call it from Vitest
Wraps scenarios/orchestrators/runner.ts:ScenarioRunner.run() inside a Vitest test. Preserves the matrix but keeps the duplicated phase orchestration alive forever. Loses #4941's "Vitest owns lifecycle" win.
Do install + runtime prep inside Vitest fixtures (no GHA matrix)
environment.from(env) actually installs (npm ci + build) and mutates runtime state (sets up docker shim) before continuing. Possible but loses GHA's free parallelism on runner selection — one runs-on: ubuntu-latest job iterating internally vs N parallel jobs of the right type. Also re-implements work the runner image already does (e.g., ubuntu-latest already has node + docker; we shouldn't pretend it doesn't). The hybrid (GHA carries preconditions, fixtures carry application logic) is closer to "use each tool for what it's good at."
Ignore the matrix; let it lapse
What we're trending toward today if no one objects. The typed registry stays as data, but nothing reads it for Vitest test discovery. Every new scenario is a hand-authored file. After 20 scenarios we have 20 files with 90% duplicate setup. Fixable later, but expensive.
Proposed Decisions
Agree that the matrix axes (platform / install / runtime / onboarding / lifecycle / runtime-suites) survive the migration to Vitest, split between GHA workflow steps (platform / install / runtime) and Vitest phase fixtures (onboarding / lifecycle / suites).
Agree that live/ test discovery is registry-driven by default — one Vitest test per listScenarios() entry — with hand-authored files allowed for true one-off probes.
Agree that framework/phases/ is the right home for the application-logic phase fixtures, with environment.ts being assertion-only.
Agree that e2e-vitest-scenarios.yaml (ci(e2e): add Vitest scenario workflow #4968) evolves to consume --emit-matrix for fan-out, mirroring the existing e2e-scenarios-all.yaml pattern, with install + runtime prep as matrix-gated workflow steps (eventually composite actions).
Agree that scenarios/runtime-support.ts:isScenarioFullyWired (the existing typed-shell-runner gate) is the same gate for the Vitest matrix — unwired scenarios skip with a structured reason, not silent fail.
Agree to extend migration/legacy-inventory.json (test(e2e): add migration inventory deletion gates #4969) to track retirement of scenarios/orchestrators/, nemoclaw_scenarios/, and validation_suites/ per family migration.
Acceptance Criteria
framework/phases/environment.ts (assertion-only) and framework/phases/onboard.ts exist and expose at least one method each plus from(scenarioPart, ...prereqs).
live/scenarios.test.ts runs the registry-driven matrix, with test.skip for unwired scenarios.
e2e-vitest-scenarios.yaml consumes --emit-matrix, fans out one job per scenario id, and runs install + runtime prep as matrix-gated workflow steps.
One canonical scenario (suggest ubuntu-repo-cloud-openclaw) runs end-to-end through phase fixtures and passes its smoke + inference suites.
The runtime-support filter governs both the typed-shell --emit-matrix (existing) AND the Vitest registry-driven runner (new).
Migration inventory entry exists for at least one phase fixture's bash counterpart with deletionReady: false (until parity proven).
Adding a new scenario in scenarios/scenarios/baseline.ts automatically produces a Vitest test in CI without touching live/ files.
cc @cv @jyaunches
Background: what the typed-shell-runner already gets right
The current scenarios/ tree decomposes every scenario into 6 axes:

| Axis | Values | Type field |
| --- | --- | --- |
| platform | ubuntu-local, wsl-local, macos-local, gpu-runner, brev-launchable | ScenarioEnvironment.platform |
| install | repo-current, launchable | ScenarioEnvironment.install |
| runtime | docker-running, docker-missing, macos-docker-optional, gpu-docker-cdi | ScenarioEnvironment.runtime |
| onboarding | cloud-openclaw, cloud-hermes, cloud-nvidia-openclaw-slack, local-ollama-openclaw | ScenarioEnvironment.onboarding |
| lifecycle | rebuild-current-version, snapshot, upgrade | ScenarioEnvironment.lifecycle |
| runtime suites | [smoke, inference, credentials, security, lifecycle, ...] | ScenarioDefinition.suiteIds |

These axes compose via the scenarios/matrix.ts helpers, and the phase orchestrator runs them in a fixed order.
What stays the same as #4941
framework/ owns the domain layer.
test/e2e/test-*.sh.
Category
Testing