Epic: Migrate legacy bash E2E into the Vitest E2E system #5098

## Goal

Migrate every direct legacy bash E2E entry point under `test/e2e/test-*.sh` into the typed Vitest E2E system, then delete or retire the legacy shell entry points as cleanup.

The end state is:

- E2E tests are TypeScript/Vitest tests.
- Real system boundaries are still tested when they are the contract: Vitest may invoke `bash`, `bash -lc`, installer scripts, host commands, `sudo`-gated commands, process signals, `/proc` probes, Docker/OpenShell commands, and sandbox exec.
- Shared setup, clients, shell probes, process supervision, artifacts, cleanup, and redaction live in Vitest fixtures/support code.
- Legacy bash scripts are not kept as a second durable E2E suite.

## Core Principle

Do not classify tests as "keep bash" just because they need shell behavior.

If a contract needs shell, process, host, platform, installer, or sandbox behavior, preserve that boundary from a typed Vitest test:

- installer fidelity: run `bash install.sh ...` from Vitest;
- shell profile behavior: run `bash -lc` or explicitly source profile files from Vitest;
- `/proc` and kernel/process probes: run the probe in the same host or sandbox context from Vitest;
- signal handling: use Node process control or host shell commands from Vitest;
- privileged/platform checks: use guarded host exec from Vitest and skip with evidence when unavailable;
- sandbox context: use typed sandbox clients / `openshell sandbox exec` wrappers from Vitest;
- full user journeys: keep them as full-flow Vitest tests instead of splitting away their integration value.
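A minimal sketch of how a Vitest test could keep a real shell boundary instead of reimplementing it in TypeScript. It assumes Node's `child_process.spawnSync`. The helper names `runLoginShell` and `hostCapability` are hypothetical and are not existing repository APIs. `hostCapability` returns evidence that can be logged when a privileged or platform test is skipped.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical helper shape; not an existing repo API.
export interface ShellResult {
  status: number | null;
  signal: NodeJS.Signals | null;
  stdout: string;
  stderr: string;
}

// Runs a script through a login shell (`bash -lc`) so profile files are sourced,
// preserving the shell boundary the legacy script exercised.
export function runLoginShell(
  script: string,
  opts: { env?: NodeJS.ProcessEnv; timeoutMs?: number } = {},
): ShellResult {
  const res = spawnSync("bash", ["-lc", script], {
    env: { ...process.env, ...opts.env },
    encoding: "utf8",
    timeout: opts.timeoutMs ?? 60_000, // on timeout, spawnSync kills the child (SIGTERM by default)
  });
  return {
    status: res.status,
    signal: res.signal,
    stdout: res.stdout ?? "",
    stderr: res.stderr ?? "",
  };
}

// Hypothetical guard: reports whether a host command exists, plus evidence for skip logs.
export function hostCapability(command: string): { available: boolean; evidence: string } {
  if (!/^[\w.-]+$/.test(command)) {
    throw new Error(`refusing to probe unsafe command name: ${command}`);
  }
  const r = runLoginShell(`command -v ${command}`, { timeoutMs: 10_000 });
  const found = r.status === 0 && r.stdout.trim().length > 0;
  return {
    available: found,
    evidence: found
      ? `command -v ${command} -> ${r.stdout.trim()}`
      : `command -v ${command} exited ${r.status} ${r.stderr.trim()}`.trim(),
  };
}

// Example use inside a Vitest file (illustrative only):
//   const docker = hostCapability("docker");
//   it.skipIf(!docker.available)("gateway start", () => { /* ... */ });
```

Keeping the probe in `bash -lc` rather than in TypeScript means the converted test still fails if profile sourcing or PATH setup regresses, which is the contract the legacy script covered.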

## Definition of Done for a Conversion

A legacy script is considered converted when all of the following are true:

1. Equivalent typed Vitest coverage exists under the E2E Vitest tree or an appropriate focused Vitest test.
2. The new test preserves the same user-visible contract, including shell/system boundaries where those boundaries matter.
3. The test has deterministic artifacts, cleanup, timeout behavior, and secret redaction.
4. The PR explains the contract mapping from the legacy shell script to the new Vitest coverage.
5. The replacement test is wired into `.github/workflows/e2e-vitest-scenarios.yaml` in the conversion PR so it can be dispatched on the same runner class as the legacy lane.
6. Any legacy shell workflow references that must be removed are removed, replaced, or explicitly deferred to Phase 11 in the same PR.
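Item 3 (deterministic artifacts, timeout behavior, and secret redaction) could be backed by small helpers along these lines. This is a sketch under assumptions: the names `redactSecrets`, `writeArtifact`, and `withTimeout` are hypothetical, and the sensitive-key pattern (`KEY`, `TOKEN`, `SECRET`, `PASSWORD`) is an illustrative heuristic, not the project's actual redaction rule.

```typescript
import { mkdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical: masks explicitly known secret values, then env-style assignments
// whose key looks sensitive (e.g. NVIDIA_API_KEY=...).
export function redactSecrets(text: string, knownSecrets: string[] = []): string {
  let out = text;
  for (const secret of knownSecrets) {
    // Skip very short values to avoid masking unrelated substrings.
    if (secret.length >= 4) out = out.split(secret).join("[REDACTED]");
  }
  return out.replace(
    /\b([A-Z0-9_]*(?:KEY|TOKEN|SECRET|PASSWORD)[A-Z0-9_]*)=(\S+)/g,
    "$1=[REDACTED]",
  );
}

// Hypothetical: writes a redacted artifact to a fixed, per-test directory.
export function writeArtifact(
  dir: string,
  name: string,
  content: string,
  secrets: string[] = [],
): string {
  mkdirSync(dir, { recursive: true });
  const path = join(dir, name);
  writeFileSync(path, redactSecrets(content, secrets), "utf8");
  return path;
}

// Hypothetical: fails with a labeled error instead of hanging until the runner timeout.
export async function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: NodeJS.Timeout | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    clearTimeout(timer);
  }
}
```

Redacting at write time, rather than when an artifact is uploaded, means a secret never reaches disk even if a later cleanup step is skipped.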

## Shell Deletion Deferral and Governance

Default conversion PRs should **defer legacy shell script deletion** unless the PR is explicitly scoped to retire the script and its workflow lane. Leaving a script undeleted is not, by itself, a reason to leave its conversion unchecked.

Deletion/quarantine of converted shell scripts is intentionally deferred to Phase 11 cleanup so parallel conversion PRs can land with less workflow and allowlist churn. During conversion phases:

- do not add new legacy `test/e2e/test-*.sh` entry points;
- do not expand or casually rewrite existing legacy shell scripts;
- do not add new nightly/regression workflow wiring that runs legacy shell scripts unless maintainers intentionally approve it;
- every conversion PR must add the replacement Vitest execution path to `.github/workflows/e2e-vitest-scenarios.yaml` before it is considered converted;
- free-standing live/process Vitest replacements must get a discrete dispatchable job in `e2e-vitest-scenarios.yaml` unless they are registered as a typed live scenario in the existing matrix;
- do not rely on `e2e-scenarios-all` / empty `scenarios` dispatch as proof for a newly added free-standing test unless the PR also wires that job into `e2e-vitest-scenarios.yaml`;
- move legacy shell workflow lanes to Vitest only when a conversion PR intentionally replaces that execution path; otherwise leave stable shell wiring for Phase 11 cleanup.
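One way a PR could check that its replacement is actually wired into `e2e-vitest-scenarios.yaml` is to list the job IDs the workflow declares. The sketch below is dependency-free and naive by design. It assumes jobs are two-space-indented keys under a top-level `jobs:` map, and any job ID passed in (for example `vitest-snapshot-commands`) is hypothetical. It does not model the existing typed-scenario matrix path.

```typescript
import { readFileSync } from "node:fs";

// Naive scan: collects two-space-indented keys under a top-level `jobs:` map.
export function listWorkflowJobIds(yamlText: string): string[] {
  const ids: string[] = [];
  let inJobs = false;
  for (const line of yamlText.split(/\r?\n/)) {
    if (/^jobs:\s*$/.test(line)) {
      inJobs = true;
      continue;
    }
    // Any other non-comment top-level key closes the `jobs:` map.
    if (inJobs && /^[^\s#]/.test(line)) inJobs = false;
    const match = inJobs ? /^ {2}([A-Za-z0-9_-]+):\s*(?:#.*)?$/.exec(line) : null;
    if (match) ids.push(match[1]);
  }
  return ids;
}

// Returns the expected job IDs that the workflow file does not declare.
export function missingVitestJobs(workflowPath: string, expectedJobIds: string[]): string[] {
  const declared = new Set(listWorkflowJobIds(readFileSync(workflowPath, "utf8")));
  return expectedJobIds.filter((id) => !declared.has(id));
}
```

A real check would more likely parse the YAML with a proper parser. The line scan only keeps the example self-contained.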

Repository governance exists to enforce this freeze:

- `test/e2e-script-workflow.test.ts` freezes the top-level legacy shell script allowlist;
- the same test freezes scheduled nightly legacy shell wiring and verifies referenced files still exist;
- the test is included in the `cli` Vitest project and runs in the `cli-test-shards` job on both PRs and `main`.

If a PR intentionally changes the legacy shell set or nightly shell wiring, it must update the allowlist/contract test with an explicit migration rationale.
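The allowlist freeze can be sketched as a diff between the scripts present on disk and a frozen list. This is only an illustration of the shape; the real allowlist lives in `test/e2e-script-workflow.test.ts`, and the function names below are hypothetical.

```typescript
import { readdirSync } from "node:fs";

// Lists top-level legacy entry points matching test-*.sh in a directory such as test/e2e.
export function legacyShellEntryPoints(dir: string): string[] {
  return readdirSync(dir)
    .filter((name) => /^test-.*\.sh$/.test(name))
    .sort();
}

// Compares what is on disk with the frozen allowlist.
export function diffAgainstAllowlist(
  actual: string[],
  allowlist: readonly string[],
): { added: string[]; missing: string[] } {
  const allowed = new Set(allowlist);
  const present = new Set(actual);
  return {
    // New legacy scripts: not allowed during conversion phases.
    added: actual.filter((name) => !allowed.has(name)),
    // Deleted scripts: the allowlist must be updated deliberately, with rationale.
    missing: allowlist.filter((name) => !present.has(name)),
  };
}
```

Failing on both directions forces any change to the legacy set, whether an addition or a Phase 11 deletion, to show up as an explicit allowlist edit in the PR.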

## Parallelization Model

The remaining work should be run as parallel tracks, not a single linear phase ladder.

- Phase 1 anchor PRs can start in parallel across tracks.
- Within a track, dependent migrations should wait until that track's anchor PR has established the helper/test shape, but they do **not** need to wait for anchors in other tracks.
- Prefer one owner per anchor to avoid competing helper designs.
- Avoid broad shared-fixture or workflow refactors inside individual conversions.
- Use local helpers first. Promote helpers to shared fixtures only when 3+ migrations clearly need the same boundary.
- Minimize workflow YAML contention, but do not skip required Vitest wiring: each migration still needs its own smallest safe `e2e-vitest-scenarios.yaml` dispatch path.
- Use unique sandbox names, ports, artifact directories, and workflow job names in parallel PRs.
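A sketch of how parallel PRs might avoid colliding on sandbox names and ports. It uses the standard `GITHUB_RUN_ID` Actions variable when present. The 63-character, lowercase, hyphen-only name format is an assumption modeled on DNS labels, not a documented OpenShell constraint. `uniqueName` and `freePort` are hypothetical helpers.

```typescript
import { randomBytes } from "node:crypto";
import { createServer } from "node:net";

// Hypothetical: prefix + CI run id + random suffix, normalized to a DNS-label-like form.
export function uniqueName(prefix: string): string {
  const runId = process.env.GITHUB_RUN_ID ?? "local";
  const suffix = randomBytes(3).toString("hex");
  return `${prefix}-${runId}-${suffix}`
    .toLowerCase()
    .replace(/[^a-z0-9-]/g, "-")
    .slice(0, 63);
}

// Asks the OS for an unused TCP port by binding to port 0, then releases it.
// Another process could claim the port before the test binds it, so callers
// should still retry on EADDRINUSE.
export function freePort(): Promise<number> {
  return new Promise((resolve, reject) => {
    const server = createServer();
    server.once("error", reject);
    server.listen(0, "127.0.0.1", () => {
      const addr = server.address();
      const port = typeof addr === "object" && addr ? addr.port : 0;
      server.close(() => resolve(port));
    });
  });
}
```

Including the run ID makes leftover sandboxes traceable to a specific workflow run when cleanup fails.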

## Phase Tracker

Legend: checked means equivalent Vitest conversion coverage has landed. Line counts are sizing hints from the legacy script inventory.

### Phase 0 — Converted / already covered

- [x] `test-onboard-inference-smoke.sh` (163) → compact onboard + inference smoke — PR #5155 merged
- [x] `test-docker-unreachable-gateway-start.sh` (162) → gateway start failure/preflight Vitest regression — PR #5119 merged (coverage seeded by PR #5109)
- [x] `test-dashboard-remote-bind.sh` (72) → dashboard bind/remote accessibility guard — PR #5186 merged
- [x] `test-whatsapp-qr-compact-e2e.sh` (189) → WhatsApp QR compact behavior — PR #5187 merged
- [x] `test-strict-tool-call-probe.sh` (377) → strict tool-call probe — PR #5153 merged
- [x] `test-vm-driver-privileged-exec-routing.sh` (142) → privileged exec routing guard — PR #5189 merged
- [x] `test-docs-validation.sh` (163) → docs validation E2E — PR #5185 merged

### Phase 1 — Parallel anchor PRs

Start these in parallel. Each anchor should establish the smallest useful local/helper shape for its track, then dependent scripts in that track can fan out.

- [x] `test-double-onboard.sh` (972) → onboarding lifecycle anchor: repeated onboarding, idempotency, repair prompts — PR #5218 merged
- [x] `test-sandbox-operations.sh` (884) → sandbox/gateway anchor: create/list/status/connect/destroy operations — PR #5224 merged
- [x] `test-inference-routing.sh` (715) → inference/provider anchor: routing and error behavior — PR #5231 merged
- [x] `test-token-rotation.sh` (603) → messaging/channel anchor: token lifecycle and channel secret refresh behavior — PR #5236 merged
- [x] `test-hermes-e2e.sh` (762) → Hermes anchor: install/onboard/runtime full flow — PR #5256 merged
- [x] `test-network-policy.sh` (1133) → security/policy anchor: network policy enforcement and allow/deny probes — PR #5226 merged
- [x] `test-rebuild-openclaw.sh` (541) → rebuild/state anchor: OpenClaw rebuild flow and state preservation — PR #5223 merged
- [x] `test-launchable-smoke.sh` (593) → platform/manual anchor: launchable image smoke path — PR #5219 merged

### Phase 2 — Independent quick wins / no anchor dependency

These should not need a large shared helper or anchor PR. Convert opportunistically in parallel.

- [x] `test-openshell-version-pin.sh` (288) → installer-script Vitest test for OpenShell pinning — PR #5107 merged
- [x] `test-model-router-provider-routed-inference.sh` (196) → model-router provider-routed inference — PR #5221 merged
- [x] `test-issue-4434-tui-unreachable-inference.sh` (197) → unreachable inference handling — PR #5233 merged
- [x] `test-hermes-root-entrypoint-smoke.sh` (202) → Hermes root entrypoint smoke — PR #5220 merged
- [x] `test-credential-migration.sh` (302) → credential migration behavior — PR #5228 merged
- [x] `test-openclaw-plugin-runtime-exdev.sh` (209) → plugin runtime EXDEV behavior — PR #5232 merged
- [x] `test-runtime-overrides.sh` (337) → runtime override behavior — PR #5229 merged
- [x] `test-openclaw-tui-chat-correlation.sh` (63) → TUI chat correlation smoke — PR #5150 merged
- [x] `test-skill-agent-e2e.sh` (268) → skill-agent E2E — PR #5222 merged

### Phase 3 — Onboarding and baseline dependents

Start after the onboarding anchor shape exists. These may share onboard/run/resume/repair helpers but can be separate PRs.

- [ ] `test-full-e2e.sh` (510) → full Vitest user journey: install → onboard → inference → CLI ops → cleanup
- [ ] `test-cloud-onboard-e2e.sh` (338) → public/cloud onboarding Vitest flow
- [x] `test-cloud-inference-e2e.sh` (291) → cloud inference Vitest flow — PR #5361 merged
- [x] `test-common-egress-agent-e2e.sh` (452) → common egress agent onboarding/runtime flow — PR #5360 merged
- [ ] `test-gpu-double-onboard.sh` (579) → GPU repeated onboarding variant
- [ ] `test-onboard-repair.sh` (400) → repair flow
- [ ] `test-onboard-resume.sh` (350) → interrupted/resumed onboarding
- [x] `test-onboard-negative-paths.sh` (521) → invalid input and failure handling — PR #5152 merged
- [ ] `test-issue-4462-scope-upgrade-approval.sh` (1058) → scope upgrade approval and no-leak process checks

### Phase 4 — Sandbox and gateway dependents

Start after the sandbox/gateway anchor shape exists. These can run in parallel with inference, messaging, Hermes, security, rebuild, and platform tracks.

- [x] `test-sandbox-survival.sh` (795) → lifecycle survival/restart behavior — PR #5332 merged
- [ ] `test-snapshot-commands.sh` (288) → snapshot command behavior
- [ ] `test-diagnostics.sh` (513) → diagnostics collection and expected output
- [ ] `test-issue-2478-crash-loop-recovery.sh` (636) → gateway/sandbox crash-loop recovery
- [ ] `test-concurrent-gateway-ports.sh` (370) → concurrent gateway port allocation
- [x] `test-gateway-drift-preflight.sh` (423) → gateway drift preflight detection — PR #5350 merged
- [ ] `test-gateway-health-honest.sh` (234) → honest gateway health reporting

### Phase 5 — Inference and provider dependents

Start after the inference/provider anchor shape exists. Mock-provider and live-provider slices can be split across owners.

- [ ] `test-gpu-e2e.sh` (693) → GPU/Ollama inference flow
- [ ] `test-ollama-auth-proxy-e2e.sh` (568) → Ollama auth proxy flow
- [ ] `test-kimi-inference-compat.sh` (800) → Kimi compatibility
- [x] `test-openclaw-inference-switch.sh` (519) → OpenClaw inference provider switching — PR #5357 merged
- [ ] `test-hermes-inference-switch.sh` (615) → Hermes inference provider switching
- [x] `test-bedrock-runtime-compatible-anthropic.sh` (1020) → Bedrock/Anthropic-compatible runtime — PR #5356 merged
- [ ] `test-brave-search-e2e.sh` (438) → Brave search integration
- [ ] `test-agent-turn-latency-e2e.sh` (629) → agent turn latency/runtime smoke
- [ ] `test-cron-preflight-inference-local-e2e.sh` (366) → cron preflight `inference.local` / env-proxy mode

### Phase 6 — Messaging, channels, and pairing dependents

Start after the messaging anchor shape exists. Avoid making `test-messaging-providers.sh` the first PR; split provider/channel slices where possible.

- [x] `test-messaging-providers.sh` (3204) → provider matrix for Telegram/Discord/Slack/WeChat/WhatsApp behavior — PR #5364 merged
- [ ] `test-telegram-injection.sh` (476) → Telegram injection guard
- [x] `test-messaging-compatible-endpoint.sh` (679) → compatible endpoint messaging path — PR #5362 merged
- [x] `test-channels-add-remove.sh` (619) → channel add/remove lifecycle — PR #5355 merged
- [ ] `test-channels-stop-start.sh` (813) → channel stop/start lifecycle
- [ ] `test-openclaw-discord-pairing.sh` (637) → OpenClaw Discord pairing
- [ ] `test-openclaw-slack-pairing.sh` (860) → OpenClaw Slack pairing

### Phase 7 — Hermes dependents

Start after the Hermes anchor shape exists. Keep Hermes-specific helpers separate from OpenClaw helpers unless repeated boundaries prove otherwise.

- [ ] `test-hermes-sandbox-secret-boundary.sh` (416) → Hermes sandbox secret boundary
- [ ] `test-hermes-slack-e2e.sh` (663) → Hermes Slack integration
- [ ] `test-hermes-discord-e2e.sh` (656) → Hermes Discord integration

### Phase 8 — Security and policy dependents

Start after the security/policy anchor shape exists, except the independent quick wins already listed in Phase 2.

- [x] `test-shields-config.sh` (671) → shields configuration policy — PR #5337 merged
- [x] `test-credential-sanitization.sh` (816) → credential sanitization and leak checks — PR #5336 merged

### Phase 9 — Rebuild, state, and runtime dependents

Start after the rebuild/state anchor shape exists. Runtime-only scripts that stay local can proceed independently if they do not need rebuild helpers.

- [ ] `test-rebuild-hermes.sh` (406) → Hermes rebuild flow
- [ ] `test-upgrade-stale-sandbox.sh` (241) → stale sandbox upgrade detection/recovery
- [x] `test-sandbox-rebuild.sh` (197) → sandbox rebuild smoke or fold into rebuild coverage — PR #5333 merged
- [ ] `test-openshell-gateway-upgrade.sh` (793) → OpenShell gateway upgrade behavior
- [x] `test-state-backup-restore.sh` (379) → state backup/restore lifecycle — PR #5353 merged
- [ ] `test-overlayfs-autofix.sh` (549) → overlayfs autofix behavior
- [ ] `test-device-auth-health.sh` (375) → device auth health flow
- [ ] `test-tunnel-lifecycle.sh` (516) → tunnel lifecycle
- [x] `test-openclaw-skill-cli-e2e.sh` (340) → OpenClaw skill CLI E2E — PR #5354 merged
- [x] `test-sessions-agents-cli.sh` (501) → sessions/agents CLI E2E — PR #5363 merged

### Phase 10 — Platform and resource-constrained dependents

These are valid parallel work, but they need explicit runner/resource planning and should not block non-platform tracks.

- [ ] `test-spark-install.sh` (157) → Spark install path
- [ ] `test-jetson-nvmap-gpu.sh` (323) → Jetson nvmap/GPU platform path

### Phase 11 — Review, refactor, and shell retirement cleanup

Run this after enough anchor/dependent migrations have landed to see real duplication patterns.

- [ ] Review all migrated tests for repeated local helpers used by 3+ migrations.
- [ ] Promote repeated helper patterns into shared Vitest fixture/support locations with focused tests.
- [ ] Remove redundant local helper copies after shared helpers exist.
- [ ] Delete or quarantine converted legacy shell scripts that remain present only for transition.
- [ ] Remove stale shell workflow wiring and update existing workflow/allowlist contract tests.
- [ ] Confirm no durable legacy bash suite remains beyond intentionally documented platform/manual exceptions.
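For the first Phase 11 item, a rough scan can list helper names declared in 3 or more migrated test files, matching the promotion threshold in the parallelization model. This is a regex heuristic over source text, not a TypeScript AST analysis, and `findPromotionCandidates` is a hypothetical name. Reading the files from disk is left to the caller.

```typescript
// Maps helper name -> sorted list of files declaring it, keeping only names
// declared in at least `threshold` files.
export function findPromotionCandidates(
  sources: Record<string, string>,
  threshold = 3,
): Map<string, string[]> {
  // Matches `function name(`, `export async function name(`, and `const name = (` / `= async (`.
  const declRe =
    /\b(?:export\s+)?(?:async\s+)?function\s+([A-Za-z_$][\w$]*)\s*\(|\b(?:const|let)\s+([A-Za-z_$][\w$]*)\s*=\s*(?:async\s*)?\(/g;
  const filesByName = new Map<string, Set<string>>();
  for (const [file, text] of Object.entries(sources)) {
    for (const match of text.matchAll(declRe)) {
      const name = match[1] ?? match[2];
      if (!name) continue;
      const files = filesByName.get(name) ?? new Set<string>();
      files.add(file);
      filesByName.set(name, files);
    }
  }
  const candidates = new Map<string, string[]>();
  for (const [name, files] of filesByName) {
    if (files.size >= threshold) candidates.set(name, [...files].sort());
  }
  return candidates;
}
```

Same-named helpers are only candidates. A reviewer still has to confirm they exercise the same boundary before promoting them to a shared fixture.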

## Tracking Guidance

Use this issue for migration order, parallelization guidance, and track ownership. Use individual PRs for exact assertion mapping, validation evidence, `e2e-vitest-scenarios.yaml` wiring, same-runner dispatch evidence, and script deletion/retirement notes. When a PR lands equivalent Vitest coverage with a dispatchable Vitest workflow path, link it here and mark the item complete even if deletion is deferred to Phase 11 cleanup.