Adopt phase fixtures + registry-driven test discovery for Vitest E2E scenarios #4990

@jyaunches

Companion design proposal to #4941. Builds on, does not replace, the Vitest fixture decision.

Architecture at a glance

┌───────────────────────────────────────────────────────────────────────────────┐
│  TYPED REGISTRY  —  scenarios/scenarios/baseline.ts (matrix data)              │
│                                                                                 │
│   { id: "ubuntu-repo-cloud-openclaw-slack",                                    │
│     environment: ubuntuRepoDocker("cloud-nvidia-openclaw-slack"),              │
│     suiteIds: ["smoke", "inference", "messaging-slack", "credentials"],        │
│     ... }                                                                       │
│                                                                                 │
│   matrix axes per scenario:                                                     │
│      ┌──────────┬──────────┬──────────┬──────────┬──────────┬──────────┐      │
│      │ platform │ install  │ runtime  │onboarding│lifecycle │  suites  │      │
│      ├──────────┼──────────┼──────────┼──────────┼──────────┼──────────┤      │
│      │ ubuntu   │ repo-    │ docker-  │ cloud-   │ rebuild- │ smoke    │      │
│      │ -local   │ current  │ running  │ openclaw │ current- │ inference│      │
│      │          │          │          │ -slack   │ version  │ messaging│      │
│      └──────────┴──────────┴──────────┴──────────┴──────────┴──────────┘      │
│           │          │          │          │          │          │             │
│           ▼          ▼          ▼          ▼          ▼          ▼             │
│        ╔════════════════════════╗   ╔══════════════════════════════╗          │
│        ║  GHA / WORKFLOW STEPS  ║   ║  VITEST + PHASE FIXTURES     ║          │
│        ║  precondition layer    ║   ║  application-logic layer     ║          │
│        ╚════════════════════════╝   ╚══════════════════════════════╝          │
└───────────────────────────────────────────────────────────────────────────────┘

   scenarios/run.ts --emit-matrix
   produces one matrix entry per
   wired scenario, with platform/
   install/runtime carried as
   matrix.* fields the workflow
   reads at job-level.
                    │
                    ▼
┌─────────────────────────────────────────────────────────┐
│  GHA MATRIX FAN-OUT  (precondition layer)                 │
│                                                           │
│   ✓ runs-on label per scenario      (axis: platform)     │
│   ✓ N parallel jobs                  (free parallelism)  │
│   ✓ secret allowlist per scenario    (requiredSecrets)   │
│   ✓ fail-fast: false                 (negative scenarios) │
│   ✓ matrix.exclude                   (cheap drops)       │
└─────────────────────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────────────┐
│  PER-JOB SETUP STEPS  (precondition layer, ctd.)          │
│                                                           │
│   ✓ checkout + setup-node + npm ci                       │
│   ✓ install step  (matrix.install:                       │
│                       repo-current → npm run build:cli   │
│                       launchable   → installer)          │
│   ✓ runtime prep  (matrix.runtime:                       │
│                       docker-running → noop              │
│                       docker-missing → install shim      │
│                       gpu-docker-cdi → already on image) │
│   ✓ wsl-bootstrap  (composite action, windows-latest)    │
│   ✓ brev-provision (composite action, ubuntu-latest)     │
│                                                           │
│   By the time Vitest starts, the host satisfies the      │
│   scenario's environment precondition.                    │
└─────────────────────────────────────────────────────────┘
                    │
                    ▼ npx vitest run -t "^${matrix.id}$"
┌─────────────────────────────────────────────────────────┐
│  VITEST + PHASE FIXTURES  (application-logic layer)       │
│                                                           │
│   environment.assertReady(scenario.environment)          │
│     verifies the precondition steps left us in the       │
│     state the scenario declared (CLI on PATH, docker     │
│     state matches, etc.). Asserts, doesn't install.      │
│                                                           │
│   onboard.from(scenario.environment, env) →               │
│     cloudOpenclaw | cloudHermes | cloudOpenclawSlack |   │
│     cloudOpenclawDiscord | cloudOpenclawTelegram |       │
│     localOllamaOpenclaw | …  (one method per dispatcher) │
│                                                           │
│   stateValidation.from(scenario.expectedStateId, instance)│
│     gatewayHealthy + sandboxRunning   (positive)         │
│     gatewayAbsent + sandboxAbsent     (negative)         │
│                                                           │
│   lifecycle.from(scenario.environment.lifecycle, instance)│
│     rebuildCurrentVersion | snapshotCreateRestore | …    │
│                                                           │
│   runSuite(suiteId, instance) →                           │
│     smoke | inference | credentials | security-* |       │
│     sandbox-lifecycle | snapshot | docs-validation | …   │
└─────────────────────────────────────────────────────────┘
                    │
                    ▼ uses
┌─────────────────────────────────────────────────────────┐
│  CLI WRAPPERS  —  framework/clients/  (LANDING in #4966)  │
│   host  ·  gateway  ·  sandbox  ·  provider  ·  state    │
└─────────────────────────────────────────────────────────┘
                    │
                    ▼ uses
┌─────────────────────────────────────────────────────────┐
│  PRIMITIVES  —  framework/  (LANDED in #4965)             │
│   artifacts  ·  secrets  ·  cleanup  ·  shellProbe       │
│   redaction is canonical (parity-tested with             │
│   src/lib/security/secret-patterns.ts)                    │
└─────────────────────────────────────────────────────────┘
                    ▲
                    │ runs everything above
┌─────────────────────────────────────────────────────────┐
│  VITEST  —  the runner  (per #4941)                       │
└─────────────────────────────────────────────────────────┘

The seam: GHA carries everything that's a precondition for nemoclaw being callable; Vitest carries everything from nemoclaw onboard onward.

The matrix axes survive the migration to Vitest. The simplicity #4941 argued for is preserved (Vitest is the runner, fixtures are the API surface, no custom runner). What this proposal adds is one more layer of fixture composition that keeps the existing matrix dispatch coherent — and uses GHA matrix natively for the part it's good at.


Problem Statement

#4941 decided that Vitest is the E2E scenario execution runner and that NemoClaw provides typed fixtures, clients, assertions, and migration inventory. That decision is settled and the right call: Vitest owns lifecycle, fixture composition, reporters, timeouts, and CI integration; NemoClaw owns the domain.

What #4941 did not nail down is where the scenario matrix lives under the new model. The current foundation stack (#4965 through #4969) lands the right runner and the right primitives, but the first live scenario is a hand-authored single test file (live/ubuntu-repo-cli-smoke.test.ts) that:

  • hardcodes the platform via process.execPath,
  • assumes the install state implicitly (repo is cloned, dist is built),
  • has no notion of runtime axis (docker-running / docker-missing / gpu-cdi / macos-optional),
  • onboards nothing,
  • and exists outside scenarios/registry.ts.

If subsequent live scenarios follow the same template, every scenario becomes a hand-written test file. We lose the combinatorial matrix the typed-shell-runner explicitly preserves today via scenarios/scenarios/baseline.ts × scenarios/matrix.ts helpers.

The matrix is not aesthetic. It is the reason "does cloud-openclaw onboarding work on WSL?" is a one-line constructor change today (wslRepoDocker("cloud-openclaw") instead of ubuntuRepoDocker("cloud-openclaw")) rather than a fork-and-edit of an entire test file. It is also what makes --emit-matrix and the dynamic GHA fan-out (#4359) coherent — one row per registry entry.

This issue proposes that the Vitest scenario layer keep the same matrix vocabulary, with three layers carrying it: GHA matrix for platform fan-out and per-job preconditions, workflow setup steps for install/runtime state, and Vitest phase fixtures for onboarding-and-after.

cc @cv @jyaunches

Background — what the typed-shell-runner already gets right

The current scenarios/ tree decomposes every scenario into 6 axes:

| Axis | Type | Examples | Lives in |
| --- | --- | --- | --- |
| platform | enum | ubuntu-local, wsl-local, macos-local, gpu-runner, brev-launchable | ScenarioEnvironment.platform |
| install | enum | repo-current, launchable | ScenarioEnvironment.install |
| runtime | enum | docker-running, docker-missing, macos-docker-optional, gpu-docker-cdi | ScenarioEnvironment.runtime |
| onboarding | string id | cloud-openclaw, cloud-hermes, cloud-nvidia-openclaw-slack, local-ollama-openclaw | ScenarioEnvironment.onboarding |
| lifecycle | string id (optional) | rebuild-current-version, snapshot, upgrade | ScenarioEnvironment.lifecycle |
| runtime-suites | string array | [smoke, inference, credentials, security, lifecycle, ...] | ScenarioDefinition.suiteIds |

These axes compose via scenarios/matrix.ts:

ubuntuRepoDocker("cloud-nvidia-openclaw-slack")    // axes 1+2+3+4
wslRepoDocker("cloud-openclaw")                    // same axis 4, different 1
ubuntuRepoNoDocker("cloud-openclaw")               // axis 3 = docker-missing
                                                   //   compiler rewrites axis 4
                                                   //   → cloud-openclaw-no-docker
ubuntuRepoDockerLifecycle("cloud-openclaw",        // + axis 5
                          "rebuild-current-version")

And the phase orchestrator runs them in fixed order:

environment → onboarding → state-validation → lifecycle → runtime
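
As a rough illustration of that fixed order, the sketch below plans which phases a scenario runs. The phase names come from the line above; `planPhases` and `PhasePlanInput` are illustrative names, not existing code, and the only conditional it models is that lifecycle is an optional axis.

```typescript
// Phase names follow the order stated above; everything else here is illustrative.
const PHASE_ORDER = [
  "environment",
  "onboarding",
  "state-validation",
  "lifecycle",
  "runtime",
] as const;

type PhaseName = (typeof PHASE_ORDER)[number];

interface PhasePlanInput {
  // Optional axis 5; absent when the scenario declares no lifecycle step.
  readonly lifecycle?: string;
}

// Returns the phases to run, always in PHASE_ORDER, skipping lifecycle when undeclared.
function planPhases(scenario: PhasePlanInput): PhaseName[] {
  return PHASE_ORDER.filter(
    (phase) => phase !== "lifecycle" || scenario.lifecycle !== undefined,
  );
}
```

Because the order is data rather than control flow, a scenario can never run runtime suites before onboarding, whichever runner executes the plan.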

The bash side honors the same axes via nemoclaw_scenarios/{install,onboard,lifecycle,probes}/dispatch.sh routers — one bash worker per id per axis, dispatched by the typed runner.

This decomposition is real architecture, not metadata. It's what makes the matrix dispatch in e2e-scenarios-all.yaml work.
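
To make the composition concrete, here is a minimal sketch of what the composer helpers named earlier in this section could look like. The type shapes and `resolveOnboardingId` are assumptions for illustration; the real definitions live in scenarios/types.ts, scenarios/matrix.ts, and the compiler, and may differ.

```typescript
// Hypothetical shapes; the real vocabulary lives in scenarios/types.ts.
type ScenarioPlatform = "ubuntu-local" | "wsl-local" | "macos-local" | "gpu-runner" | "brev-launchable";
type ScenarioInstall = "repo-current" | "launchable";
type ScenarioRuntime = "docker-running" | "docker-missing" | "macos-docker-optional" | "gpu-docker-cdi";

interface ScenarioEnvironmentSketch {
  readonly platform: ScenarioPlatform;
  readonly install: ScenarioInstall;
  readonly runtime: ScenarioRuntime;
  readonly onboarding: string;
  readonly lifecycle?: string;
}

// Each helper pins axes 1-3 and takes the onboarding id (axis 4) as its argument.
function ubuntuRepoDocker(onboarding: string): ScenarioEnvironmentSketch {
  return { platform: "ubuntu-local", install: "repo-current", runtime: "docker-running", onboarding };
}

function wslRepoDocker(onboarding: string): ScenarioEnvironmentSketch {
  return { platform: "wsl-local", install: "repo-current", runtime: "docker-running", onboarding };
}

function ubuntuRepoNoDocker(onboarding: string): ScenarioEnvironmentSketch {
  return { platform: "ubuntu-local", install: "repo-current", runtime: "docker-missing", onboarding };
}

// Adds axis 5 on top of an existing composition.
function ubuntuRepoDockerLifecycle(onboarding: string, lifecycle: string): ScenarioEnvironmentSketch {
  return { ...ubuntuRepoDocker(onboarding), lifecycle };
}

// Assumed form of the compiler rewrite: docker-missing selects the "-no-docker" worker.
function resolveOnboardingId(env: ScenarioEnvironmentSketch): string {
  return env.runtime === "docker-missing" ? `${env.onboarding}-no-docker` : env.onboarding;
}
```

Swapping `ubuntuRepoDocker` for `wslRepoDocker` changes only the platform axis, which is the one-line constructor change described in the problem statement.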

Proposed Design

Where each axis lives

| Axis | Carrier | Why this layer |
| --- | --- | --- |
| platform | GHA matrix.runs-on (resolved by scenarios/runner-routing.ts) | Native to GHA. ubuntu-local → ubuntu-latest, gpu-runner → self-hosted, macos-local → macos-26, wsl-local → windows-latest (with a WSL bootstrap composite action), brev-launchable → ubuntu-latest (with a Brev provisioning composite action / fixture). Adding a platform value is a registry edit + a routing-table edit. |
| install | Workflow setup step (matrix-gated) | if: matrix.install == 'repo-current' runs npm ci && npm run build:cli. if: matrix.install == 'launchable' runs the installer. By the time Vitest starts, nemoclaw is on PATH. The phase fixture only asserts readiness; it doesn't install. |
| runtime | Workflow setup step (matrix-gated) for state mutations the runner image doesn't already provide | docker-running is the ubuntu-latest default: noop. docker-missing requires a shim setup step (the existing nemoclaw_scenarios/onboard/cloud-openclaw-no-docker.sh does this; it promotes to a composite action). gpu-docker-cdi is already on the GPU runner image: noop. macos-docker-optional is macos-26's default: noop. |
| onboarding | Vitest phase fixture (framework/phases/onboard.ts) | Calling nemoclaw onboard --provider nvidia --agent openclaw --channel slack is application logic. There is no GHA primitive for "run this command and parse the output." |
| lifecycle | Vitest phase fixture (framework/phases/lifecycle.ts) | State mutations on the running system (rebuild, snapshot, upgrade). Sequential, stateful, single-process. |
| runtime-suites | Vitest phase fixture (framework/phases/runtime.ts:runSuite) | Assertion bodies. Run sequentially within one Vitest test so they share onboarding state. |
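
The platform row can be pictured as a small routing table. The sketch below is an assumption about shape only: `RunnerRoute`, `RUNNER_ROUTES`, and `resolveRunner` are hypothetical names, and the actual scenarios/runner-routing.ts may be organized differently. The runner labels and composite action paths are the ones used elsewhere in this proposal.

```typescript
type ScenarioPlatform = "ubuntu-local" | "wsl-local" | "macos-local" | "gpu-runner" | "brev-launchable";

interface RunnerRoute {
  // Value for the job's runs-on.
  readonly runsOn: string;
  // Composite action that bootstraps the platform, when one is needed.
  readonly bootstrapAction?: string;
}

// Record<ScenarioPlatform, ...> makes the table exhaustive: adding a platform
// value without a route is a type error, not a runtime surprise.
const RUNNER_ROUTES: Record<ScenarioPlatform, RunnerRoute> = {
  "ubuntu-local": { runsOn: "ubuntu-latest" },
  "gpu-runner": { runsOn: "self-hosted" },
  "macos-local": { runsOn: "macos-26" },
  "wsl-local": { runsOn: "windows-latest", bootstrapAction: "./.github/actions/wsl-setup" },
  "brev-launchable": { runsOn: "ubuntu-latest", bootstrapAction: "./.github/actions/brev-provision" },
};

function resolveRunner(platform: ScenarioPlatform): RunnerRoute {
  return RUNNER_ROUTES[platform];
}
```

Keying the table by the platform union is what keeps "a registry edit + a routing-table edit" honest: the compiler flags the second edit if it is forgotten.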

What stays the same as #4941

What this proposal adds

  • A framework/phases/ directory holding phase fixtures: environment.ts (assertion-only), onboard.ts, state-validation.ts, lifecycle.ts, runtime.ts.
  • Each phase fixture exposes both:
    • a from(scenarioPart, ...prereqs) method that resolves a registry id to the right call (registry-driven path),
    • and named methods for explicit one-off scenarios (e.g. onboard.cloudOpenclawSlack({...})).
  • A single registry-driven scenario file (live/scenarios.test.ts) that iterates listScenarios() and produces one Vitest test per registry entry. Hand-authored live/<name>.test.ts files remain valid for one-off cases that don't fit the matrix.
  • A GHA matrix workflow (e2e-vitest-scenarios.yaml, evolving from #4968, ci(e2e): add Vitest scenario workflow) that consumes --emit-matrix, fans out one job per scenario id, sets up install/runtime preconditions per matrix axis, and invokes Vitest with a scenario-id filter.
  • The runtime-support filter from #4978 (fix(e2e): gate scenario fan-out by onboarding+secret support contract), the follow-up to #4380 (test(e2e): execute real shell assertions; delete dry-run, --validate-only, and the bash runner), extends to gate the Vitest matrix the same way: scenarios whose phase fixtures aren't wired yet get filtered with structured reasons instead of failing silently.
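
The dual surface of a phase fixture (a registry-driven `from` plus named methods) might be wired roughly as follows. This is a synchronous sketch under stated assumptions: `InstanceSketch`, `onboardSketch`, `ONBOARD_ROUTES`, and `onboardFrom` are hypothetical, the method bodies are stand-ins for real CLI calls, and the real fixture would be async.

```typescript
interface InstanceSketch {
  readonly onboardingId: string;
  readonly channels: readonly string[];
}

// Named methods for explicit one-off scenarios. Bodies are stand-ins only.
const onboardSketch = {
  cloudOpenclaw: (): InstanceSketch => ({ onboardingId: "cloud-openclaw", channels: [] }),
  cloudOpenclawSlack: (): InstanceSketch => ({
    onboardingId: "cloud-nvidia-openclaw-slack",
    channels: ["slack"],
  }),
};

type OnboardMethod = keyof typeof onboardSketch;

// Explicit id-to-method routes: registry ids do not always map mechanically
// to method names, and an unlisted id should fail loudly.
const ONBOARD_ROUTES = new Map<string, OnboardMethod>([
  ["cloud-openclaw", "cloudOpenclaw"],
  ["cloud-nvidia-openclaw-slack", "cloudOpenclawSlack"],
]);

// Registry-driven entry point: resolves an onboarding id to the named method.
function onboardFrom(onboardingId: string): InstanceSketch {
  const method = ONBOARD_ROUTES.get(onboardingId);
  if (method === undefined) {
    throw new Error(`no onboard fixture method wired for "${onboardingId}"`);
  }
  return onboardSketch[method]();
}
```

A hand-authored test can call `onboardSketch.cloudOpenclawSlack()` directly, while the registry-driven file only ever calls `onboardFrom(id)`; both paths end in the same method.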

What naturally retires

  • scenarios/orchestrators/{phase,runner,context,negative-matcher}.ts (~750 LOC of typed-shell phase orchestration) — once every scenario runs through Vitest phase fixtures, the parallel orchestrator becomes dead code.
  • scenarios/clients/* stubs (80 LOC) — already replaced by framework/clients/* in test(e2e): add fixture-friendly clients #4966; should be deleted in that PR per the precedent.
  • nemoclaw_scenarios/{install,onboard,lifecycle,probes}/*.sh workers (~1,500 LOC) — install + runtime prep promote to composite GHA actions; onboarding/lifecycle workers move into phase fixtures one id at a time, files retire per inventory.
  • validation_suites/**/*.sh (~3,000 LOC of bash assertions) — logic migrates into runtime-suite fixtures one suite at a time, files retire per inventory.
  • scenarios/run.ts (the typed-shell entry point) — --emit-matrix keeps being the matrix builder; the live-execution path retires once Vitest is the only runner.

What stays as typed test data (per #4941 explicit):

  • scenarios/types.ts (vocabulary)
  • scenarios/builder.ts (construction)
  • scenarios/registry.ts + scenarios/scenarios/baseline.ts (the matrix data)
  • scenarios/matrix.ts (composer helpers)
  • scenarios/runner-routing.ts (platform → GHA runner)
  • scenarios/runtime-support.ts (wired-fan-out filter)
  • scenarios/run.ts:--emit-matrix (matrix payload builder)

Concrete fixture sketch

// test/e2e-scenario/framework/phases/environment.ts
//
// Assertion-only. The actual install + runtime prep happen as workflow
// setup steps before Vitest starts. This fixture verifies the host is
// in the state the scenario declared.

import type { ScenarioEnvironment } from "../../scenarios/types.ts";
import type { HostCliClient } from "../clients/index.ts";

export interface EnvironmentReady {
  readonly platform: ScenarioEnvironment["platform"];
  readonly install: ScenarioEnvironment["install"];
  readonly runtime: ScenarioEnvironment["runtime"];
  readonly cliPath: string;
}

export interface EnvironmentFixture {
  /** Asserts CLI is on PATH and runtime state matches scenario.environment. */
  assertReady(env: ScenarioEnvironment): Promise<EnvironmentReady>;
}

// test/e2e-scenario/framework/phases/onboard.ts

import type { ScenarioEnvironment } from "../../scenarios/types.ts";
import type { EnvironmentReady } from "./environment.ts";

export interface OpenClawInstance {
  readonly sandboxName: string;
  readonly gatewayUrl: string;
  readonly agent: "openclaw" | "hermes";
  readonly provider: "nvidia" | "ollama-local" | "openai-compatible";
  readonly channels: ReadonlyArray<"slack" | "discord" | "telegram" | "brave">;
}

export interface OnboardFixture {
  /**
   * Registry-driven entry point. Routes by the scenario's onboarding id
   * (with the docker-missing rewrite the existing compiler.ts already
   * does) to the right named method below.
   */
  from(env: ScenarioEnvironment, hostState: EnvironmentReady): Promise<OpenClawInstance>;

  // Named methods — same as bash dispatcher cases.
  cloudOpenclaw(opts?: { model?: string }): Promise<OpenClawInstance>;
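  // ErrorClass and NegativeOutcome are not defined in this sketch; they are assumed to come from the scenario types.
  cloudOpenclawNoDocker(opts: { expectError: ErrorClass }): Promise<NegativeOutcome>;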
  cloudOpenclawCustomPolicies(opts: { presets: string[] }): Promise<OpenClawInstance>;
  cloudOpenclawSlack(opts: { allowedChannels?: string[] }): Promise<OpenClawInstance>;
  cloudOpenclawDiscord(opts: { allowedChannels?: string[] }): Promise<OpenClawInstance>;
  cloudOpenclawTelegram(opts: { /* ... */ }): Promise<OpenClawInstance>;
  cloudHermes(opts?: { /* ... */ }): Promise<OpenClawInstance>;
  cloudHermesSlack(opts: { /* ... */ }): Promise<OpenClawInstance>;
  cloudHermesDiscord(opts: { /* ... */ }): Promise<OpenClawInstance>;
  localOllamaOpenclaw(opts?: { /* ... */ }): Promise<OpenClawInstance>;
  // ...one per dispatcher case
}

// test/e2e-scenario/live/scenarios.test.ts
// Registry-driven matrix — one Vitest test per scenario in baseline.ts.

import { test, expect } from "../framework/e2e-test.ts";
import { listScenarios } from "../scenarios/registry.ts";
import { isScenarioFullyWired } from "../scenarios/runtime-support.ts";

for (const scenario of listScenarios()) {
  const wired = isScenarioFullyWired(scenario);
  if (!wired.ok) {
    test.skip(`${scenario.id} (not yet wired: ${wired.reasons.join("; ")})`, () => {});
    continue;
  }

  test(scenario.id, async ({
    environment, onboard, stateValidation, lifecycle, runSuite,
  }) => {
    // GHA setup steps already ran install + runtime prep. Just verify.
    const env = await environment.assertReady(scenario.environment);

    const instance = await onboard.from(scenario.environment, env);
    await stateValidation.from(scenario.expectedStateId, instance);

    if (scenario.environment.lifecycle) {
      await lifecycle.from(scenario.environment.lifecycle, instance);
    }

    for (const suiteId of scenario.suiteIds) {
      await runSuite(suiteId, instance);
    }
  });
}
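
The `environment.assertReady` call in the test above is assertion-only. A minimal synchronous sketch of that idea follows; `HostProbeSketch`, `ReadyReport`, and `assertReadySketch` are hypothetical names, the real fixture returns a Promise, and treating docker-missing as "docker does not respond" is an assumption about how the shim behaves.

```typescript
// Hypothetical probe surface; a real fixture would back it with the host CLI client.
interface HostProbeSketch {
  which(command: string): string | null; // resolved path, or null when not on PATH
  dockerResponds(): boolean;             // true when a working docker answers
}

type RuntimeAxis = "docker-running" | "docker-missing" | "macos-docker-optional" | "gpu-docker-cdi";

interface ReadyReport {
  readonly cliPath: string;
  readonly runtime: RuntimeAxis;
}

// Verifies the declared state and never mutates the host.
function assertReadySketch(runtime: RuntimeAxis, probe: HostProbeSketch): ReadyReport {
  const cliPath = probe.which("nemoclaw");
  if (cliPath === null) {
    throw new Error("nemoclaw is not on PATH; the install setup step did not leave it there");
  }
  const docker = probe.dockerResponds();
  if ((runtime === "docker-running" || runtime === "gpu-docker-cdi") && !docker) {
    throw new Error(`runtime ${runtime} declared but docker does not respond`);
  }
  if (runtime === "docker-missing" && docker) {
    throw new Error("runtime docker-missing declared but docker responds");
  }
  // macos-docker-optional accepts either docker state.
  return { cliPath, runtime };
}
```

A failure here points at the workflow setup steps rather than at application logic, which keeps the two layers separately debuggable.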

Concrete workflow sketch

# .github/workflows/e2e-vitest-scenarios.yaml (evolves from #4968)

name: E2E / Vitest Scenarios

on:
  workflow_dispatch:
    inputs:
      scenarios:
        description: "Comma-separated scenario ids, or empty for full registry"
        required: false
        default: ""

permissions:
  contents: read

concurrency:
  group: e2e-vitest-scenarios-${{ github.ref }}-${{ inputs.scenarios || 'all' }}
  cancel-in-progress: false

jobs:
  generate-matrix:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.emit.outputs.matrix }}
    steps:
      - uses: actions/checkout@...
      - uses: actions/setup-node@...
      - run: npm ci --ignore-scripts
      - id: emit
        run: |
          matrix="$(npx tsx test/e2e-scenario/scenarios/run.ts --emit-matrix)"
          echo "matrix=$matrix" >> "$GITHUB_OUTPUT"

  run-scenario:
    needs: generate-matrix
    strategy:
      fail-fast: false
      matrix:
        include: ${{ fromJSON(needs.generate-matrix.outputs.matrix) }}
    runs-on: ${{ matrix.runner }}
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@...
      - uses: actions/setup-node@...

      # Install axis — matrix-gated setup step.
      - name: Install (repo-current)
        if: matrix.install == 'repo-current'
        run: npm ci && npm run build:cli && npm link
      - name: Install (launchable)
        if: matrix.install == 'launchable'
        run: ./scripts/install-launchable.sh

      # Runtime axis — matrix-gated for state mutations.
      - name: Runtime prep (docker-missing)
        if: matrix.runtime == 'docker-missing'
        run: sudo install -m 0755 ./scripts/test-fixtures/docker-shim /usr/local/bin/docker

      # Platform-specific bootstraps.
      - name: WSL bootstrap
        if: matrix.platform == 'wsl-local'
        uses: ./.github/actions/wsl-setup
      - name: Brev provision
        if: matrix.platform == 'brev-launchable'
        uses: ./.github/actions/brev-provision

      - name: Run scenario via Vitest
        env:
          NEMOCLAW_RUN_E2E_SCENARIOS: "1"
          E2E_ARTIFACT_DIR: ${{ github.workspace }}/.e2e/vitest
          # Secret allowlist scoped to this scenario only:
          NVIDIA_API_KEY: ${{ secrets.NVIDIA_API_KEY }}
        run: |
          npx vitest run --project e2e-scenarios-live -t "^${{ matrix.id }}$"

      - name: Upload artifacts
        if: always()
        uses: actions/upload-artifact@...
        with:
          name: e2e-scenario-${{ matrix.id }}
          path: .e2e/vitest/

This mirrors the existing e2e-scenarios-all.yaml shape one-to-one, except that it dispatches Vitest instead of scenarios/run.ts. Same --emit-matrix payload, same runner-routing, same secret allowlist semantics, same fail-fast: false.
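
For orientation, the payload consumed by `matrix.include` could be built along these lines. This is a sketch, not the actual scenarios/run.ts: `RegistryRowSketch`, `MatrixEntrySketch`, and `emitMatrix` are hypothetical, and the `wired` flag stands in for the runtime-support filter's verdict.

```typescript
interface RegistryRowSketch {
  readonly id: string;
  readonly platform: string;
  readonly install: string;
  readonly runtime: string;
  readonly wired: boolean; // outcome of the runtime-support filter
}

interface MatrixEntrySketch {
  readonly id: string;
  readonly runner: string;
  readonly platform: string;
  readonly install: string;
  readonly runtime: string;
}

// Builds one matrix entry per wired scenario and serializes it for the workflow.
function emitMatrix(
  rows: readonly RegistryRowSketch[],
  runnerFor: (platform: string) => string,
): string {
  const include: MatrixEntrySketch[] = rows
    .filter((row) => row.wired)
    .map((row) => ({
      id: row.id,
      runner: runnerFor(row.platform),
      platform: row.platform,
      install: row.install,
      runtime: row.runtime,
    }));
  // No indentation argument: the output stays on a single line.
  return JSON.stringify(include);
}
```

Single-line output matters because the workflow sketch writes it with a plain `echo "matrix=$matrix" >> "$GITHUB_OUTPUT"`; a multi-line value would need the delimiter form of that file instead.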

Migration Plan

1. Land cv's foundation stack

#4965 through #4969 land as scoped. They give us the runner, primitives, CLI wrappers, first scenario, workflow shape, and migration inventory. Nothing in this proposal blocks them.

2. Add phase fixtures (this proposal)

Authored as one PR per phase fixture so each is small and reviewable. Suggested order:

  1. framework/phases/environment.ts — assertion-only (assertReady(env)). Verifies CLI is on PATH and docker state matches. Setup is in workflow steps.
  2. framework/phases/onboard.ts — starts with cloudOpenclaw and cloudOpenclawNoDocker only. New onboarding profiles slot in one method at a time.
  3. framework/phases/state-validation.ts — implements the existing cli-installed / gateway-healthy / sandbox-running / gateway-absent / sandbox-absent probes from scenarios/expected-states.ts as fixture methods.
  4. framework/phases/lifecycle.ts — starts with rebuildCurrentVersion and snapshotCreateRestore (the two failing today in the typed-shell-runner). Implementing these here naturally fixes the Mode-B failures the typed-shell-runner exposes.
  5. framework/phases/runtime.ts: runSuite(suiteId, instance) dispatcher. One suite at a time, mirroring scenarios/probes/* and validation_suites/<category>/*.sh content.
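
As one illustration of step 3, the state-validation fixture could resolve an expected-state id to a list of probes and report which ones fail. The probe ids below are the ones listed above; the expected-state ids ("ready", "rejected"), `ObservedStateSketch`, and `failedProbes` are hypothetical, and real probes would query the gateway and sandbox clients instead of reading booleans.

```typescript
type ProbeId =
  | "cli-installed"
  | "gateway-healthy"
  | "sandbox-running"
  | "gateway-absent"
  | "sandbox-absent";

// Stand-in for what the clients would observe on the host.
interface ObservedStateSketch {
  readonly cliInstalled: boolean;
  readonly gatewayUp: boolean;
  readonly sandboxUp: boolean;
}

const PROBES: Record<ProbeId, (observed: ObservedStateSketch) => boolean> = {
  "cli-installed": (observed) => observed.cliInstalled,
  "gateway-healthy": (observed) => observed.gatewayUp,
  "sandbox-running": (observed) => observed.sandboxUp,
  "gateway-absent": (observed) => !observed.gatewayUp,
  "sandbox-absent": (observed) => !observed.sandboxUp,
};

// Hypothetical expected-state ids: one positive outcome, one negative outcome.
const EXPECTED_STATES = new Map<string, readonly ProbeId[]>([
  ["ready", ["cli-installed", "gateway-healthy", "sandbox-running"]],
  ["rejected", ["cli-installed", "gateway-absent", "sandbox-absent"]],
]);

// Returns the probes that did not hold; an empty array means the state matches.
function failedProbes(expectedStateId: string, observed: ObservedStateSketch): ProbeId[] {
  const probes = EXPECTED_STATES.get(expectedStateId);
  if (probes === undefined) {
    throw new Error(`unknown expected state "${expectedStateId}"`);
  }
  return probes.filter((probeId) => !PROBES[probeId](observed));
}
```

Returning the failing probe ids, rather than a single boolean, gives a negative scenario a precise message when onboarding unexpectedly leaves a gateway behind.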

3. Promote install + runtime prep to composite GHA actions

Once the workflow shape stabilizes, extract the install + runtime-prep steps into reusable composite actions under .github/actions/ so:

  • e2e-scenarios-all.yaml (typed-shell-runner) and e2e-vitest-scenarios.yaml share the same setup steps.
  • A new platform value (e.g. a future ARM64 runner) only needs the action updated once.

4. Add the registry-driven scenario file + matrix workflow

live/scenarios.test.ts as sketched above. e2e-vitest-scenarios.yaml evolves to consume --emit-matrix (sketch above). As phase fixtures land, more registry entries flip from test.skip(...) to running.
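
The anchored `-t "^<id>$"` filter is worth a small illustration, since scenario ids are often prefixes of one another. The sketch below models name matching with plain regular expressions; it assumes each test's full name is exactly the scenario id (no enclosing describe block), and `escapeRegExp`, `exactNamePattern`, and `matchingTests` are illustrative helpers, not Vitest APIs.

```typescript
// Escapes regex metacharacters so an id is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Anchored pattern: matches a test whose whole name is the scenario id.
function exactNamePattern(scenarioId: string): RegExp {
  return new RegExp(`^${escapeRegExp(scenarioId)}$`);
}

function matchingTests(testNames: readonly string[], pattern: RegExp): string[] {
  return testNames.filter((testName) => pattern.test(testName));
}

const registeredTests = [
  "ubuntu-repo-cloud-openclaw",
  "ubuntu-repo-cloud-openclaw-slack",
  "wsl-repo-cloud-openclaw (not yet wired: onboarding not supported)",
];

// Unanchored: also selects the -slack scenario, so one job would run two scenarios.
const looseMatches = matchingTests(registeredTests, /ubuntu-repo-cloud-openclaw/);

// Anchored: selects exactly the scenario this matrix job owns.
const exactMatches = matchingTests(registeredTests, exactNamePattern("ubuntu-repo-cloud-openclaw"));
```

The skipped entries carry a "(not yet wired: ...)" suffix in their names, so an anchored id never selects them either.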

5. Family-by-family scenario migration

Same as #4941's family-by-family plan. Each family migration:

  1. Implements the missing phase fixture method (e.g. onboard.cloudOpenclawSlack).
  2. Adds the onboarding id to SUPPORTED_ONBOARDING_IDS in scenarios/runtime-support.ts.
  3. Updates migration/legacy-inventory.json (test(e2e): add migration inventory deletion gates #4969) with the corresponding bash retirement entry.
  4. Verifies parity (Vitest scenario passes the same assertions as the bash suite).
  5. Deletes the bash worker + assertion files in a follow-up PR.

The runtime-support filter ensures unwired scenarios stay registered (visible in the registry, documented as roadmap) but never produce silent-fail jobs.
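
A gate with structured reasons could look roughly like this. The `{ ok, reasons }` result shape matches how the test sketch uses it; everything else is assumed for illustration, including `WiredInputSketch`, the contents of both sets, `SUPPORTED_SUITE_IDS`, and the secret check, and the real scenarios/runtime-support.ts may differ.

```typescript
interface WiredInputSketch {
  readonly onboarding: string;
  readonly suiteIds: readonly string[];
  readonly requiredSecrets: readonly string[];
}

type WiredResult = { ok: true } | { ok: false; reasons: string[] };

// Illustrative contents; each family migration would extend these sets.
const SUPPORTED_ONBOARDING_IDS = new Set<string>(["cloud-openclaw"]);
const SUPPORTED_SUITE_IDS = new Set<string>(["smoke", "inference"]);

// Collects every reason a scenario cannot run yet instead of stopping at the first.
function isScenarioFullyWiredSketch(
  scenario: WiredInputSketch,
  availableSecrets: ReadonlySet<string>,
): WiredResult {
  const reasons: string[] = [];
  if (!SUPPORTED_ONBOARDING_IDS.has(scenario.onboarding)) {
    reasons.push(`onboarding "${scenario.onboarding}" has no phase fixture method`);
  }
  for (const suiteId of scenario.suiteIds) {
    if (!SUPPORTED_SUITE_IDS.has(suiteId)) {
      reasons.push(`suite "${suiteId}" is not migrated`);
    }
  }
  for (const secret of scenario.requiredSecrets) {
    if (!availableSecrets.has(secret)) {
      reasons.push(`secret ${secret} is not available`);
    }
  }
  return reasons.length === 0 ? { ok: true } : { ok: false, reasons };
}
```

Reporting all reasons at once turns the skip message into a to-do list for the family migration that will wire the scenario.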

6. Inventory extends to typed-shell-runner retirement

#4969 currently tracks legacy test/e2e/test-*.sh. Extend to also track:

  • scenarios/orchestrators/{phase,runner,context,negative-matcher}.ts
  • nemoclaw_scenarios/{install,onboard,lifecycle,probes,helpers}/*.sh
  • validation_suites/**/*.sh
  • runtime/lib/*.sh

Each entry gets a bridgeSurface (which Vitest phase fixture or composite action replaces it) and a deletionReady flag. When all phase fixtures cover an area, that bash retires.
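
One way to read "when all phase fixtures cover an area, that bash retires" is as a check over inventory entries. In the sketch below, `bridgeSurface` and `deletionReady` are the fields named above; the `path` field, the choice of the top-level directory as the "area", and `retirableAreas` are assumptions, and the real migration/legacy-inventory.json schema may differ.

```typescript
interface LegacyInventoryEntrySketch {
  readonly path: string;          // legacy file being tracked (assumed field)
  readonly bridgeSurface: string; // fixture or composite action that replaces it
  readonly deletionReady: boolean;
}

// Treats the first path segment as the area.
function areaOf(path: string): string {
  const slash = path.indexOf("/");
  return slash === -1 ? path : path.slice(0, slash);
}

// An area is retirable only when every tracked entry under it is deletionReady.
function retirableAreas(entries: readonly LegacyInventoryEntrySketch[]): string[] {
  const readiness = new Map<string, boolean>();
  for (const entry of entries) {
    const area = areaOf(entry.path);
    readiness.set(area, (readiness.get(area) ?? true) && entry.deletionReady);
  }
  return Array.from(readiness.entries())
    .filter(([, ready]) => ready)
    .map(([area]) => area)
    .sort();
}
```

A check like this could run in CI as a deletion gate, so a directory is only removed once its last entry flips to deletionReady.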

Alternatives Considered

Per-scenario hand-written test files

This is what live/ubuntu-repo-cli-smoke.test.ts does today. Simple, but loses every matrix axis. Adding wsl-repo-cloud-openclaw-slack becomes "fork the test file, edit the platform call, edit the onboarding call, edit the channel" — exactly the duplication the typed-shell-runner avoids via wslRepoDocker(...). Acceptable for true one-off probes; not acceptable as the default pattern.

Single giant live/all-scenarios.test.ts with it.each(...)

Folds all scenarios into one Vitest file, parameterized by registry. Less flexible than for-of test() because Vitest's it.each doesn't compose nicely with test.extend fixtures. The for-of pattern in the sketch above is idiomatic Vitest and gives each scenario its own test name + artifacts directory.

Keep typed-shell-runner phase orchestrator, just call it from Vitest

Wraps scenarios/orchestrators/runner.ts:ScenarioRunner.run() inside a Vitest test. Preserves the matrix but keeps the duplicated phase orchestration alive forever. Loses #4941's "Vitest owns lifecycle" win.

Do install + runtime prep inside Vitest fixtures (no GHA matrix)

environment.from(env) actually installs (npm ci + build) and mutates runtime state (sets up docker shim) before continuing. Possible but loses GHA's free parallelism on runner selection — one runs-on: ubuntu-latest job iterating internally vs N parallel jobs of the right type. Also re-implements work the runner image already does (e.g., ubuntu-latest already has node + docker; we shouldn't pretend it doesn't). The hybrid (GHA carries preconditions, fixtures carry application logic) is closer to "use each tool for what it's good at."

Ignore the matrix; let it lapse

What we're trending toward today if no one objects. The typed registry stays as data, but nothing reads it for Vitest test discovery. Every new scenario is a hand-authored file. After 20 scenarios we have 20 files with 90% duplicate setup. Fixable later, but expensive.

Proposed Decisions

  • Agree that the matrix axes (platform / install / runtime / onboarding / lifecycle / runtime-suites) survive the migration to Vitest, split between GHA workflow steps (platform / install / runtime) and Vitest phase fixtures (onboarding / lifecycle / suites).
  • Agree that live/ test discovery is registry-driven by default — one Vitest test per listScenarios() entry — with hand-authored files allowed for true one-off probes.
  • Agree that framework/phases/ is the right home for the application-logic phase fixtures, with environment.ts being assertion-only.
  • Agree that e2e-vitest-scenarios.yaml (ci(e2e): add Vitest scenario workflow #4968) evolves to consume --emit-matrix for fan-out, mirroring the existing e2e-scenarios-all.yaml pattern, with install + runtime prep as matrix-gated workflow steps (eventually composite actions).
  • Agree that scenarios/runtime-support.ts:isScenarioFullyWired (the existing typed-shell-runner gate) is the same gate for the Vitest matrix — unwired scenarios skip with a structured reason, not silent fail.
  • Agree to extend migration/legacy-inventory.json (test(e2e): add migration inventory deletion gates #4969) to track retirement of scenarios/orchestrators/, nemoclaw_scenarios/, and validation_suites/ per family migration.

Acceptance Criteria

  • framework/phases/environment.ts (assertion-only) and framework/phases/onboard.ts exist and expose at least one method each plus from(scenarioPart, ...prereqs).
  • live/scenarios.test.ts runs the registry-driven matrix, with test.skip for unwired scenarios.
  • e2e-vitest-scenarios.yaml consumes --emit-matrix, fans out one job per scenario id, and runs install + runtime prep as matrix-gated workflow steps.
  • One canonical scenario (suggest ubuntu-repo-cloud-openclaw) runs end-to-end through phase fixtures and passes its smoke + inference suites.
  • The runtime-support filter governs both the typed-shell --emit-matrix (existing) AND the Vitest registry-driven runner (new).
  • Migration inventory entry exists for at least one phase fixture's bash counterpart with deletionReady: false (until parity proven).
  • Adding a new scenario in scenarios/scenarios/baseline.ts automatically produces a Vitest test in CI without touching live/ files.

Category

Testing
