Goal
Migrate every direct legacy bash E2E entry point under test/e2e/test-*.sh into the typed Vitest E2E system, then delete or retire the legacy shell entry points as cleanup.
The end state is:
- E2E tests are TypeScript/Vitest tests.
- Real system boundaries are still tested when they are the contract: Vitest may invoke bash, bash -lc, installer scripts, host commands, sudo-gated commands, process signals, /proc probes, Docker/OpenShell commands, and sandbox exec.
- Shared setup, clients, shell probes, process supervision, artifacts, cleanup, and redaction live in Vitest fixtures/support code.
- Legacy bash scripts are not kept as a second durable E2E suite.
Core Principle
Do not classify tests as "keep bash" just because they need shell behavior.
If a contract needs shell, process, host, platform, installer, or sandbox behavior, preserve that boundary from a typed Vitest test:
- installer fidelity: run bash install.sh ... from Vitest;
- shell profile behavior: run bash -lc or explicitly source profile files from Vitest;
- /proc and kernel/process probes: run the probe in the same host or sandbox context from Vitest;
- signal handling: use Node process control or host shell commands from Vitest;
- privileged/platform checks: use guarded host exec from Vitest and skip with evidence when unavailable;
- sandbox context: use typed sandbox clients / openshell sandbox exec wrappers from Vitest;
- full user journeys: keep them as full-flow Vitest tests instead of splitting away their integration value.
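The boundary-preserving pattern in the list above can be sketched as a small typed wrapper around bash. This is a minimal sketch, not the repository's actual fixture: runBash, ShellResult, and the 30-second default timeout are hypothetical names and values chosen for illustration. Only Node's child_process.spawnSync is a real API here.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical result shape; the real Vitest support code may differ.
interface ShellResult {
  status: number | null; // exit code, or null when the process was killed by a signal
  signal: string | null;
  stdout: string;
  stderr: string;
}

// Hypothetical helper: run a command string through bash from a typed test.
// login: true uses `bash -lc` so profile files are sourced as in a login shell.
function runBash(
  command: string,
  opts: { login?: boolean; timeoutMs?: number } = {},
): ShellResult {
  const flag = opts.login ? "-lc" : "-c";
  const result = spawnSync("bash", [flag, command], {
    encoding: "utf8",
    timeout: opts.timeoutMs ?? 30_000, // assumed default, not taken from the issue
  });
  // spawnSync reports spawn failures and timeouts through `error`.
  if (result.error) {
    throw result.error;
  }
  return {
    status: result.status,
    signal: result.signal,
    stdout: result.stdout,
    stderr: result.stderr,
  };
}
```

A Vitest test would call runBash inside an it(...) block and assert on status, stdout, and stderr, so the shell boundary stays real while the assertions stay typed.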
Definition of Done for a Conversion
A legacy script is considered converted when all of the following are true:
- Equivalent typed Vitest coverage exists under the E2E Vitest tree or an appropriate focused Vitest test.
- The new test preserves the same user-visible contract, including shell/system boundaries where those boundaries matter.
- The test has deterministic artifacts, cleanup, timeout behavior, and secret redaction.
- The PR explains the contract mapping from the legacy shell script to the new Vitest coverage.
- The replacement test is wired into .github/workflows/e2e-vitest-scenarios.yaml in the conversion PR so it can be dispatched on the same runner class as the legacy lane.
- Any legacy shell workflow references that must be removed are removed, replaced, or explicitly deferred to Phase 11 in the same PR.
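The secret-redaction requirement in the checklist above could look roughly like the following. It is a sketch under stated assumptions: redactSecrets and the [REDACTED] mask are hypothetical and are not the project's actual redaction fixture, which this issue does not show.

```typescript
// Hypothetical helper: mask known secret values before text reaches logs or artifacts.
function redactSecrets(
  text: string,
  secrets: readonly string[],
  mask = "[REDACTED]",
): string {
  // Mask longer secrets first so a secret that contains a shorter one is replaced whole.
  const ordered = [...secrets]
    .filter((secret) => secret.length > 0)
    .sort((a, b) => b.length - a.length);
  let output = text;
  for (const secret of ordered) {
    // split/join replaces every occurrence without regex escaping concerns.
    output = output.split(secret).join(mask);
  }
  return output;
}
```

Applying a helper like this at the point where command output is written to an artifact, rather than in each test, keeps redaction uniform across conversions.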
Shell Deletion Deferral and Governance
Default conversion PRs should defer legacy shell script deletion unless the PR is explicitly scoped to retire the script and its workflow lane. A script not being deleted is not by itself a reason to leave the conversion unchecked.
Deletion/quarantine of converted shell scripts is intentionally deferred to Phase 11 cleanup so parallel conversion PRs can land with less workflow and allowlist churn. During conversion phases:
- do not add new legacy test/e2e/test-*.sh entry points;
- do not expand or casually rewrite existing legacy shell scripts;
- do not add new nightly/regression workflow wiring that runs legacy shell scripts unless maintainers intentionally approve it;
- every conversion PR must add the replacement Vitest execution path to .github/workflows/e2e-vitest-scenarios.yaml before it is considered converted;
- free-standing live/process Vitest replacements must get a discrete dispatchable job in e2e-vitest-scenarios.yaml unless they are registered as a typed live scenario in the existing matrix;
- do not rely on e2e-scenarios-all / empty scenarios dispatch as proof for a newly added free-standing test unless the PR also wires that job into e2e-vitest-scenarios.yaml;
- move legacy shell workflow lanes to Vitest only when a conversion PR intentionally replaces that execution path; otherwise leave stable shell wiring for Phase 11 cleanup.
Repository governance exists to enforce this freeze:
- test/e2e-script-workflow.test.ts freezes the top-level legacy shell script allowlist;
- the same test freezes scheduled nightly legacy shell wiring and verifies referenced files still exist;
- the test is included in the cli Vitest project and runs through PR cli-test-shards / main cli-test-shards.
If a PR intentionally changes the legacy shell set or nightly shell wiring, it must update the allowlist/contract test with an explicit migration rationale.
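An allowlist freeze of this kind can be illustrated with a small drift check. The contents of test/e2e-script-workflow.test.ts are not shown in this issue, so the function names and the result shape below are assumptions for illustration only.

```typescript
import { readdirSync } from "node:fs";

// Hypothetical result shape describing how the directory differs from the frozen list.
interface AllowlistDrift {
  added: string[]; // scripts on disk that are not in the frozen allowlist
  removed: string[]; // allowlisted scripts that no longer exist on disk
}

// Compare the scripts found on disk with the frozen allowlist.
function compareToAllowlist(
  actual: readonly string[],
  frozen: readonly string[],
): AllowlistDrift {
  const frozenSet = new Set(frozen);
  const actualSet = new Set(actual);
  return {
    added: actual.filter((name) => !frozenSet.has(name)).sort(),
    removed: frozen.filter((name) => !actualSet.has(name)).sort(),
  };
}

// List top-level legacy entry points matching test-*.sh in a directory.
function listLegacyEntryPoints(dir: string): string[] {
  return readdirSync(dir)
    .filter((name) => /^test-.*\.sh$/.test(name))
    .sort();
}
```

A governance test built this way would fail whenever added or removed is non-empty, which forces the PR to edit the frozen list and state its migration rationale.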
Parallelization Model
The remaining work should be run as parallel tracks, not a single linear phase ladder.
- Phase 1 anchor PRs can start in parallel across tracks.
- Within a track, dependent migrations should wait until that track's anchor PR has established the helper/test shape, but they do not need to wait for anchors in other tracks.
- Prefer one owner per anchor to avoid competing helper designs.
- Avoid broad shared-fixture or workflow refactors inside individual conversions.
- Use local helpers first. Promote helpers to shared fixtures only when 3+ migrations clearly need the same boundary.
- Minimize workflow YAML contention, but do not skip required Vitest wiring: each migration still needs its own smallest safe e2e-vitest-scenarios.yaml dispatch path.
- Use unique sandbox names, ports, artifact directories, and workflow job names in parallel PRs.
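One way to keep parallel PRs from colliding on names and directories is to derive them from the test identifier. The sketch below is illustrative only: resourcesFor, the e2e- prefix, the artifacts/ path, and the port range are hypothetical and not taken from the repository.

```typescript
import { createHash } from "node:crypto";

// Hypothetical bundle of per-test resources.
interface TestResources {
  sandboxName: string;
  port: number;
  artifactDir: string;
}

// Derive stable, test-specific resource names from a test identifier.
function resourcesFor(
  testId: string,
  basePort = 20000, // assumed range start
  portRange = 10000, // assumed range size
): TestResources {
  const digest = createHash("sha256").update(testId).digest();
  const suffix = digest.toString("hex").slice(0, 8);
  // Hash-derived ports can still collide; a real fixture should also probe availability.
  const port = basePort + (digest.readUInt16BE(0) % portRange);
  const slug = testId
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return {
    sandboxName: `e2e-${slug}-${suffix}`,
    port,
    artifactDir: `artifacts/${slug}-${suffix}`,
  };
}
```

Because the values are a pure function of the test id, reruns of the same test reuse the same names, which makes leftover sandboxes and artifact directories easy to find and clean up.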
Phase Tracker
Legend: checked means equivalent Vitest conversion coverage has landed. Line counts are sizing hints from the legacy script inventory.
Phase 0: Converted / already covered
- test-onboard-inference-smoke.sh (163) → compact onboard + inference smoke; PR test(e2e): retire onboard inference smoke script #5155 merged
- test-docker-unreachable-gateway-start.sh (162) → gateway start failure/preflight Vitest regression; PR test(e2e): retire docker-unreachable gateway script #5119 merged (coverage seeded by PR test(onboard): add helper-level Vitest coverage for docker-unreachable gateway-start abort (#4355) #5109)
- test-dashboard-remote-bind.sh (72) → dashboard bind/remote accessibility guard; PR test(e2e): migrate dashboard remote bind #5186 merged
- test-whatsapp-qr-compact-e2e.sh (189) → WhatsApp QR compact behavior; PR test(e2e): migrate WhatsApp QR compact guard #5187 merged
- test-strict-tool-call-probe.sh (377) → strict tool-call probe; PR test(e2e): retire test-strict-tool-call-probe.sh #5153 merged
- test-vm-driver-privileged-exec-routing.sh (142) → privileged exec routing guard; PR test(e2e): migrate VM driver privexec coverage #5189 merged
- test-docs-validation.sh (163) → docs validation E2E; PR test(e2e): add Vitest docs validation coverage #5185 merged
Phase 1: Parallel anchor PRs
Start these in parallel. Each anchor should establish the smallest useful local/helper shape for its track, then dependent scripts in that track can fan out.
- test-double-onboard.sh (972) → onboarding lifecycle anchor: repeated onboarding, idempotency, repair prompts; PR test(e2e): migrate double onboard to Vitest [ANCHOR-1] #5218 merged
- test-sandbox-operations.sh (884) → sandbox/gateway anchor: create/list/status/connect/destroy operations; PR test(e2e): migrate test-sandbox-operations.sh to vitest [ANCHOR-2] #5224 merged
- test-inference-routing.sh (715) → inference/provider anchor: routing and error behavior; PR test(e2e): add inference routing Vitest coverage [ANCHOR-3] #5231 merged
- test-token-rotation.sh (603) → messaging/channel anchor: token lifecycle and channel secret refresh behavior; PR test(e2e): migrate test-token-rotation.sh to vitest [ANCHOR-4] #5236 merged
- test-hermes-e2e.sh (762) → Hermes anchor: install/onboard/runtime full flow; PR test(e2e): add Hermes live Vitest migration [ANCHOR-5] #5256 merged
- test-network-policy.sh (1133) → security/policy anchor: network policy enforcement and allow/deny probes; PR test(e2e): migrate network policy to Vitest [ANCHOR-6] #5226 merged
- test-rebuild-openclaw.sh (541) → rebuild/state anchor: OpenClaw rebuild flow and state preservation; PR test(e2e): migrate test-rebuild-openclaw.sh to vitest [ANCHOR-7] #5223 merged
- test-launchable-smoke.sh (593) → platform/manual anchor: launchable image smoke path; PR test(e2e): migrate test-launchable-smoke.sh to vitest [ANCHOR-8] #5219 merged
Phase 2: Independent quick wins / no anchor dependency
These should not need a large shared helper or anchor PR. Convert opportunistically in parallel.
- test-openshell-version-pin.sh (288) → installer-script Vitest test for OpenShell pinning; PR test(e2e): migrate test-openshell-version-pin.sh to free-standing Vitest live test #5107 merged
- test-model-router-provider-routed-inference.sh (196) → model-router provider-routed inference; PR test(e2e): migrate test-model-router-provider-routed-inference.sh to vitest [IND-2] #5221 merged
- test-issue-4434-tui-unreachable-inference.sh (197) → unreachable inference handling; PR test(e2e): migrate test-issue-4434-tui-unreachable-inference.sh to vitest #5233 merged
- test-hermes-root-entrypoint-smoke.sh (202) → Hermes root entrypoint smoke; PR test(e2e): migrate test-hermes-root-entrypoint-smoke.sh to vitest [IND-4] #5220 merged
- test-credential-migration.sh (302) → credential migration behavior; PR test(e2e): migrate credential migration to Vitest [IND-5] #5228 merged
- test-openclaw-plugin-runtime-exdev.sh (209) → plugin runtime EXDEV behavior; PR test(e2e): migrate openclaw plugin EXDEV guard [IND-6] #5232 merged
- test-runtime-overrides.sh (337) → runtime override behavior; PR test(e2e): migrate test-runtime-overrides.sh to vitest #5229 merged
- test-openclaw-tui-chat-correlation.sh (63) → TUI chat correlation smoke; PR test(e2e): add OpenClaw TUI chat-correlation coverage slice #5150 merged
- test-skill-agent-e2e.sh (268) → skill-agent E2E; PR test(e2e): migrate test-skill-agent-e2e.sh to vitest #5222 merged
Phase 3: Onboarding and baseline dependents
Start after the onboarding anchor shape exists. These may share onboard/run/resume/repair helpers but can be separate PRs.
- test-full-e2e.sh (510) → full Vitest user journey: install → onboard → inference → CLI ops → cleanup
- test-cloud-onboard-e2e.sh (338) → public/cloud onboarding Vitest flow
- test-cloud-inference-e2e.sh (291) → cloud inference Vitest flow; PR test(e2e): migrate cloud inference scenario #5361 merged
- test-common-egress-agent-e2e.sh (452) → common egress agent onboarding/runtime flow; PR test(e2e): migrate common-egress agent scenario #5360 merged
- test-gpu-double-onboard.sh (579) → GPU repeated onboarding variant
- test-onboard-repair.sh (400) → repair flow
- test-onboard-resume.sh (350) → interrupted/resumed onboarding
- test-onboard-negative-paths.sh (521) → invalid input and failure handling; PR test(e2e): bridge onboard negative paths to Vitest #5152 merged
- test-issue-4462-scope-upgrade-approval.sh (1058) → scope upgrade approval and no-leak process checks
Phase 4: Sandbox and gateway dependents
Start after the sandbox/gateway anchor shape exists. These can run in parallel with inference, messaging, Hermes, security, rebuild, and platform tracks.
- test-sandbox-survival.sh (795) → lifecycle survival/restart behavior; PR test(e2e): migrate test-sandbox-survival.sh to vitest #5332 merged
- test-snapshot-commands.sh (288) → snapshot command behavior
- test-diagnostics.sh (513) → diagnostics collection and expected output
- test-issue-2478-crash-loop-recovery.sh (636) → gateway/sandbox crash-loop recovery
- test-concurrent-gateway-ports.sh (370) → concurrent gateway port allocation
- test-gateway-drift-preflight.sh (423) → gateway drift preflight detection; PR test(e2e): migrate test-gateway-drift-preflight.sh to vitest #5350 merged
- test-gateway-health-honest.sh (234) → honest gateway health reporting
Phase 5: Inference and provider dependents
Start after the inference/provider anchor shape exists. Mock-provider and live-provider slices can be split across owners.
- test-gpu-e2e.sh (693) → GPU/Ollama inference flow
- test-ollama-auth-proxy-e2e.sh (568) → Ollama auth proxy flow
- test-kimi-inference-compat.sh (800) → Kimi compatibility
- test-openclaw-inference-switch.sh (519) → OpenClaw inference provider switching; PR test(e2e): migrate OpenClaw inference switch scenario #5357 merged
- test-hermes-inference-switch.sh (615) → Hermes inference provider switching
- test-bedrock-runtime-compatible-anthropic.sh (1020) → Bedrock/Anthropic-compatible runtime; PR test(e2e): migrate Bedrock Runtime compatible Anthropic scenario #5356 merged
- test-brave-search-e2e.sh (438) → Brave search integration
- test-agent-turn-latency-e2e.sh (629) → agent turn latency/runtime smoke
- test-cron-preflight-inference-local-e2e.sh (366) → cron preflight inference.local / env-proxy mode
Phase 6: Messaging, channels, and pairing dependents
Start after the messaging anchor shape exists. Avoid making test-messaging-providers.sh the first PR; split provider/channel slices where possible.
- test-messaging-providers.sh (3204) → provider matrix for Telegram/Discord/Slack/WeChat/WhatsApp behavior; PR test(e2e): add messaging providers vitest coverage #5364 merged
- test-telegram-injection.sh (476) → Telegram injection guard
- test-messaging-compatible-endpoint.sh (679) → compatible endpoint messaging path; PR test(e2e): migrate messaging compatible endpoint #5362 merged
- test-channels-add-remove.sh (619) → channel add/remove lifecycle; PR test(e2e): migrate channels add remove scenario #5355 merged
- test-channels-stop-start.sh (813) → channel stop/start lifecycle
- test-openclaw-discord-pairing.sh (637) → OpenClaw Discord pairing
- test-openclaw-slack-pairing.sh (860) → OpenClaw Slack pairing
Phase 7: Hermes dependents
Start after the Hermes anchor shape exists. Keep Hermes-specific helpers separate from OpenClaw helpers unless repeated boundaries prove otherwise.
- test-hermes-sandbox-secret-boundary.sh (416) → Hermes sandbox secret boundary
- test-hermes-slack-e2e.sh (663) → Hermes Slack integration
- test-hermes-discord-e2e.sh (656) → Hermes Discord integration
Phase 8: Security and policy dependents
Start after the security/policy anchor shape exists, except the independent quick wins already listed in Phase 2.
- test-shields-config.sh (671) → shields configuration policy; PR test(e2e): migrate test-shields-config.sh to vitest #5337 merged
- test-credential-sanitization.sh (816) → credential sanitization and leak checks; PR test(e2e): migrate test-credential-sanitization.sh to vitest #5336 merged
Phase 9: Rebuild, state, and runtime dependents
Start after the rebuild/state anchor shape exists. Runtime-only scripts that stay local can proceed independently if they do not need rebuild helpers.
- test-rebuild-hermes.sh (406) → Hermes rebuild flow
- test-upgrade-stale-sandbox.sh (241) → stale sandbox upgrade detection/recovery
- test-sandbox-rebuild.sh (197) → sandbox rebuild smoke or fold into rebuild coverage; PR test(e2e): migrate test-sandbox-rebuild.sh to vitest #5333 merged
- test-openshell-gateway-upgrade.sh (793) → OpenShell gateway upgrade behavior
- test-state-backup-restore.sh (379) → state backup/restore lifecycle; PR test(e2e): migrate state backup restore to vitest #5353 merged
- test-overlayfs-autofix.sh (549) → overlayfs autofix behavior
- test-device-auth-health.sh (375) → device auth health flow
- test-tunnel-lifecycle.sh (516) → tunnel lifecycle
- test-openclaw-skill-cli-e2e.sh (340) → OpenClaw skill CLI E2E; PR test(e2e): migrate OpenClaw skill CLI scenario #5354 merged
- test-sessions-agents-cli.sh (501) → sessions/agents CLI E2E; PR test(e2e): migrate sessions agents cli scenario #5363 merged
Phase 10: Platform and resource-constrained dependents
These are valid parallel work, but they need explicit runner/resource planning and should not block non-platform tracks.
- test-spark-install.sh (157) → Spark install path
- test-jetson-nvmap-gpu.sh (323) → Jetson nvmap/GPU platform path
Phase 11: Review, refactor, and shell retirement cleanup
Run this after enough anchor/dependent migrations have landed to see real duplication patterns.
Tracking Guidance
Use this issue for migration order, parallelization guidance, and track ownership. Use individual PRs for exact assertion mapping, validation evidence, e2e-vitest-scenarios.yaml wiring, same-runner dispatch evidence, and script deletion/retirement notes. When a PR lands equivalent Vitest coverage with a dispatchable Vitest workflow path, link it here and mark the item complete even if deletion is deferred to Phase 11 cleanup.