Epic: Migrate legacy bash E2E into the Vitest E2E system #5098

@jyaunches

Goal

Migrate every direct legacy bash E2E entry point under test/e2e/test-*.sh into the typed Vitest E2E system, then delete or retire the legacy shell entry points as cleanup.

The end state is:

  • E2E tests are TypeScript/Vitest tests.
  • Real system boundaries are still tested when they are the contract: Vitest may invoke bash, bash -lc, installer scripts, host commands, sudo-gated commands, process signals, /proc probes, Docker/OpenShell commands, and sandbox exec.
  • Shared setup, clients, shell probes, process supervision, artifacts, cleanup, and redaction live in Vitest fixtures/support code.
  • Legacy bash scripts are not kept as a second durable E2E suite.

Core Principle

Do not classify tests as "keep bash" just because they need shell behavior.

If a contract needs shell, process, host, platform, installer, or sandbox behavior, preserve that boundary from a typed Vitest test:

  • installer fidelity: run bash install.sh ... from Vitest;
  • shell profile behavior: run bash -lc or explicitly source profile files from Vitest;
  • /proc and kernel/process probes: run the probe in the same host or sandbox context from Vitest;
  • signal handling: use Node process control or host shell commands from Vitest;
  • privileged/platform checks: use guarded host exec from Vitest and skip with evidence when unavailable;
  • sandbox context: use typed sandbox clients / openshell sandbox exec wrappers from Vitest;
  • full user journeys: keep them as full-flow Vitest tests instead of splitting away their integration value.
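A minimal sketch of how a typed test could keep the shell boundary while returning a structured result. The names `runBash`, `ShellResult`, and `ShellOptions` are hypothetical and are not taken from the repository's fixtures; the only real API used is Node's `spawnSync`.

```typescript
import { spawnSync } from "node:child_process";

// Typed result of one shell invocation. Field names are illustrative only.
interface ShellResult {
  command: string;
  exitCode: number | null; // null when the process was killed by a signal
  signal: string | null;
  stdout: string;
  stderr: string;
  timedOut: boolean;
}

interface ShellOptions {
  login?: boolean; // true runs `bash -lc`, so login profile files are sourced
  timeoutMs?: number;
  env?: Record<string, string | undefined>;
}

// Hypothetical helper: run a command through real bash and capture the outcome.
function runBash(command: string, options: ShellOptions = {}): ShellResult {
  const { login = false, timeoutMs = 30_000, env = process.env } = options;
  const result = spawnSync("bash", [login ? "-lc" : "-c", command], {
    encoding: "utf8",
    timeout: timeoutMs,
    env,
  });
  // spawnSync reports an expired timeout as an error with code ETIMEDOUT.
  const errorCode = (result.error as { code?: string } | undefined)?.code;
  return {
    command,
    exitCode: result.status,
    signal: result.signal,
    stdout: result.stdout ?? "",
    stderr: result.stderr ?? "",
    timedOut: errorCode === "ETIMEDOUT",
  };
}
```

Inside a Vitest test, the returned object can be asserted directly, for example `expect(result.exitCode).toBe(0)`, while the command itself still runs in a real bash process.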

Definition of Done for a Conversion

A legacy script is considered converted when all of the following are true:

  1. Equivalent typed Vitest coverage exists under the E2E Vitest tree or an appropriate focused Vitest test.
  2. The new test preserves the same user-visible contract, including shell/system boundaries where those boundaries matter.
  3. The test has deterministic artifacts, cleanup, timeout behavior, and secret redaction.
  4. The PR explains the contract mapping from the legacy shell script to the new Vitest coverage.
  5. The replacement test is wired into .github/workflows/e2e-vitest-scenarios.yaml in the conversion PR so it can be dispatched on the same runner class as the legacy lane.
  6. Any legacy shell workflow references that must be removed are removed, replaced, or explicitly deferred to Phase 11 in the same PR.
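As an illustration of the secret-redaction requirement in item 3, the sketch below masks known secret values before output is written to an artifact. `redactSecrets` and its placeholder are assumptions made for this example, not the project's actual redaction code.

```typescript
// Hypothetical helper: replace every occurrence of each known secret value
// with a fixed placeholder before text is logged or written to an artifact.
function redactSecrets(
  text: string,
  secrets: readonly string[],
  placeholder = "[REDACTED]",
): string {
  // Longest first, so a secret that contains a shorter secret is masked whole.
  const ordered = [...secrets]
    .filter((secret) => secret.length > 0)
    .sort((a, b) => b.length - a.length);
  let redacted = text;
  for (const secret of ordered) {
    redacted = redacted.split(secret).join(placeholder);
  }
  return redacted;
}
```

Matching on literal values rather than patterns keeps the result deterministic: the same input and secret list always produce the same artifact text.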

Shell Deletion Deferral and Governance

By default, a conversion PR should defer deleting the legacy shell script unless the PR is explicitly scoped to retire the script and its workflow lane. A script that has not yet been deleted is not, by itself, a reason to leave the conversion unchecked.

Deletion/quarantine of converted shell scripts is intentionally deferred to Phase 11 cleanup so parallel conversion PRs can land with less workflow and allowlist churn. During conversion phases:

  • do not add new legacy test/e2e/test-*.sh entry points;
  • do not expand or casually rewrite existing legacy shell scripts;
  • do not add new nightly/regression workflow wiring that runs legacy shell scripts unless maintainers intentionally approve it;
  • every conversion PR must add the replacement Vitest execution path to .github/workflows/e2e-vitest-scenarios.yaml before it is considered converted;
  • free-standing live/process Vitest replacements must get a discrete dispatchable job in e2e-vitest-scenarios.yaml unless they are registered as a typed live scenario in the existing matrix;
  • do not rely on e2e-scenarios-all / empty scenarios dispatch as proof for a newly added free-standing test unless the PR also wires that job into e2e-vitest-scenarios.yaml;
  • move legacy shell workflow lanes to Vitest only when a conversion PR intentionally replaces that execution path; otherwise leave stable shell wiring for Phase 11 cleanup.

Repository governance exists to enforce this freeze:

  • test/e2e-script-workflow.test.ts freezes the top-level legacy shell script allowlist;
  • the same test freezes scheduled nightly legacy shell wiring and verifies referenced files still exist;
  • the test is included in the cli Vitest project and runs through PR cli-test-shards / main cli-test-shards.

If a PR intentionally changes the legacy shell set or nightly shell wiring, it must update the allowlist/contract test with an explicit migration rationale.
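The allowlist freeze can be pictured as a set comparison between the scripts on disk and a frozen list. The sketch below is an assumption about the general shape of such a check, not the contents of test/e2e-script-workflow.test.ts; `legacyScriptDrift` and `AllowlistDrift` are hypothetical names.

```typescript
// Result of comparing the scripts on disk with a frozen allowlist.
interface AllowlistDrift {
  unexpected: string[]; // legacy entry points on disk that are not allowlisted
  missing: string[]; // allowlisted entry points that no longer exist on disk
}

// Hypothetical check: only names shaped like test-*.sh count as legacy entry points.
function legacyScriptDrift(
  onDisk: readonly string[],
  allowlist: readonly string[],
): AllowlistDrift {
  const legacy = onDisk.filter((name) => /^test-.*\.sh$/.test(name));
  const allowed = new Set(allowlist);
  const present = new Set(legacy);
  return {
    unexpected: legacy.filter((name) => !allowed.has(name)).sort(),
    missing: allowlist.filter((name) => !present.has(name)).sort(),
  };
}
```

In a contract test, both arrays would be expected to be empty. A PR that intentionally adds or retires a script would then have to edit the allowlist in the same change, which is where the migration rationale belongs.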

Parallelization Model

The remaining work should run as parallel tracks rather than as a single linear ladder of phases.

  • Phase 1 anchor PRs can start in parallel across tracks.
  • Within a track, dependent migrations should wait until that track's anchor PR has established the helper/test shape, but they do not need to wait for anchors in other tracks.
  • Prefer one owner per anchor to avoid competing helper designs.
  • Avoid broad shared-fixture or workflow refactors inside individual conversions.
  • Use local helpers first. Promote helpers to shared fixtures only when 3+ migrations clearly need the same boundary.
  • Minimize workflow YAML contention, but do not skip required Vitest wiring: each migration still needs its own smallest safe e2e-vitest-scenarios.yaml dispatch path.
  • Use unique sandbox names, ports, artifact directories, and workflow job names in parallel PRs.
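One way to get unique sandbox names, ports, artifact directories, and job names is to derive them all from the scenario name and a run identifier. The sketch below is illustrative: `resourcesFor`, the naming scheme, and the port range are assumptions, and a hash-derived port only makes collisions unlikely rather than impossible.

```typescript
import { createHash } from "node:crypto";

// Per-scenario resources for a parallel PR. All naming here is illustrative.
interface ParallelResources {
  sandboxName: string;
  port: number;
  artifactDir: string;
  jobName: string;
}

// Hypothetical helper: the same inputs always yield the same resources.
function resourcesFor(
  scenario: string,
  runId: string,
  portBase = 20000,
  portSpan = 10000,
): ParallelResources {
  // Normalize the scenario name into a lowercase, dash-separated slug.
  const slug = scenario
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  const digest = createHash("sha256").update(`${slug}:${runId}`).digest();
  return {
    sandboxName: `e2e-${slug}-${digest.toString("hex").slice(0, 8)}`,
    // Map the first 4 digest bytes into [portBase, portBase + portSpan).
    port: portBase + (digest.readUInt32BE(0) % portSpan),
    artifactDir: `artifacts/${slug}/${runId}`,
    jobName: `vitest-${slug}`,
  };
}
```

Because the values are derived rather than random, a failed run can be reproduced with the same sandbox name and artifact path.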

Phase Tracker

Legend: checked means equivalent Vitest conversion coverage has landed. Line counts are sizing hints from the legacy script inventory.

Phase 0 — Converted / already covered

Phase 1 — Parallel anchor PRs

Start these in parallel. Each anchor should establish the smallest useful local/helper shape for its track, then dependent scripts in that track can fan out.

Phase 2 — Independent quick wins / no anchor dependency

These should not need a large shared helper or anchor PR. Convert opportunistically in parallel.

Phase 3 — Onboarding and baseline dependents

Start after the onboarding anchor shape exists. These may share onboard/run/resume/repair helpers but can be separate PRs.

  • test-full-e2e.sh (510) → full Vitest user journey: install → onboard → inference → CLI ops → cleanup
  • test-cloud-onboard-e2e.sh (338) → public/cloud onboarding Vitest flow
  • test-cloud-inference-e2e.sh (291) → cloud inference Vitest flow — PR test(e2e): migrate cloud inference scenario #5361 merged
  • test-common-egress-agent-e2e.sh (452) → common egress agent onboarding/runtime flow — PR test(e2e): migrate common-egress agent scenario #5360 merged
  • test-gpu-double-onboard.sh (579) → GPU repeated onboarding variant
  • test-onboard-repair.sh (400) → repair flow
  • test-onboard-resume.sh (350) → interrupted/resumed onboarding
  • test-onboard-negative-paths.sh (521) → invalid input and failure handling — PR test(e2e): bridge onboard negative paths to Vitest #5152 merged
  • test-issue-4462-scope-upgrade-approval.sh (1058) → scope upgrade approval and no-leak process checks

Phase 4 — Sandbox and gateway dependents

Start after the sandbox/gateway anchor shape exists. These can run in parallel with inference, messaging, Hermes, security, rebuild, and platform tracks.

  • test-sandbox-survival.sh (795) → lifecycle survival/restart behavior — PR test(e2e): migrate test-sandbox-survival.sh to vitest #5332 merged
  • test-snapshot-commands.sh (288) → snapshot command behavior
  • test-diagnostics.sh (513) → diagnostics collection and expected output
  • test-issue-2478-crash-loop-recovery.sh (636) → gateway/sandbox crash-loop recovery
  • test-concurrent-gateway-ports.sh (370) → concurrent gateway port allocation
  • test-gateway-drift-preflight.sh (423) → gateway drift preflight detection — PR test(e2e): migrate test-gateway-drift-preflight.sh to vitest #5350 merged
  • test-gateway-health-honest.sh (234) → honest gateway health reporting

Phase 5 — Inference and provider dependents

Start after the inference/provider anchor shape exists. Mock-provider and live-provider slices can be split across owners.

  • test-gpu-e2e.sh (693) → GPU/Ollama inference flow
  • test-ollama-auth-proxy-e2e.sh (568) → Ollama auth proxy flow
  • test-kimi-inference-compat.sh (800) → Kimi compatibility
  • test-openclaw-inference-switch.sh (519) → OpenClaw inference provider switching — PR test(e2e): migrate OpenClaw inference switch scenario #5357 merged
  • test-hermes-inference-switch.sh (615) → Hermes inference provider switching
  • test-bedrock-runtime-compatible-anthropic.sh (1020) → Bedrock/Anthropic-compatible runtime — PR test(e2e): migrate Bedrock Runtime compatible Anthropic scenario #5356 merged
  • test-brave-search-e2e.sh (438) → Brave search integration
  • test-agent-turn-latency-e2e.sh (629) → agent turn latency/runtime smoke
  • test-cron-preflight-inference-local-e2e.sh (366) → cron preflight inference.local / env-proxy mode

Phase 6 — Messaging, channels, and pairing dependents

Start after the messaging anchor shape exists. Avoid making test-messaging-providers.sh the first PR; split provider/channel slices where possible.

Phase 7 — Hermes dependents

Start after the Hermes anchor shape exists. Keep Hermes-specific helpers separate from OpenClaw helpers unless repeated boundaries prove otherwise.

  • test-hermes-sandbox-secret-boundary.sh (416) → Hermes sandbox secret boundary
  • test-hermes-slack-e2e.sh (663) → Hermes Slack integration
  • test-hermes-discord-e2e.sh (656) → Hermes Discord integration

Phase 8 — Security and policy dependents

Start after the security/policy anchor shape exists, except the independent quick wins already listed in Phase 2.

Phase 9 — Rebuild, state, and runtime dependents

Start after the rebuild/state anchor shape exists. Runtime-only scripts that stay local can proceed independently if they do not need rebuild helpers.

Phase 10 — Platform and resource-constrained dependents

These are valid parallel work, but they need explicit runner/resource planning and should not block non-platform tracks.

  • test-spark-install.sh (157) → Spark install path
  • test-jetson-nvmap-gpu.sh (323) → Jetson nvmap/GPU platform path
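For platform-bound tests, the Core Principle asks for guarded host exec that skips with evidence when the platform is unavailable. A minimal sketch of such a guard follows; `probeCommand` and `ProbeResult` are hypothetical names, and a real platform check would likely need more than command presence.

```typescript
import { spawnSync } from "node:child_process";

// Outcome of a platform probe, with a human-readable reason for the record.
interface ProbeResult {
  available: boolean;
  evidence: string;
}

// Hypothetical guard: check whether a host command resolves in bash.
function probeCommand(name: string): ProbeResult {
  // The name is passed as a positional argument so it is never parsed as shell code.
  const result = spawnSync("bash", ["-c", 'command -v "$1"', "probe", name], {
    encoding: "utf8",
    timeout: 5_000,
  });
  if (result.status === 0) {
    return { available: true, evidence: `${name} resolved to ${result.stdout.trim()}` };
  }
  return {
    available: false,
    evidence: `command -v ${name} exited with status ${String(result.status)}`,
  };
}
```

A Vitest test could feed the result into `test.skipIf(!probe.available)` and write `probe.evidence` to the run's artifacts, so a skipped platform lane still leaves a record of why it did not run.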

Phase 11 — Review, refactor, and shell retirement cleanup

Run this after enough anchor/dependent migrations have landed to see real duplication patterns.

  • Review all migrated tests for repeated local helpers used by 3+ migrations.
  • Promote repeated helper patterns into shared Vitest fixture/support locations with focused tests.
  • Remove redundant local helper copies after shared helpers exist.
  • Delete or quarantine converted legacy shell scripts that remain present only for transition.
  • Remove stale shell workflow wiring and update existing workflow/allowlist contract tests.
  • Confirm no durable legacy bash suite remains beyond intentionally documented platform/manual exceptions.

Tracking Guidance

Use this issue for migration order, parallelization guidance, and track ownership. Use individual PRs for exact assertion mapping, validation evidence, e2e-vitest-scenarios.yaml wiring, same-runner dispatch evidence, and script deletion/retirement notes. When a PR lands equivalent Vitest coverage with a dispatchable Vitest workflow path, link it here and mark the item complete even if deletion is deferred to Phase 11 cleanup.

Labels: area: e2e (End-to-end tests, nightly failures, or validation infrastructure), sprint 6