test(e2e): migrate platform and remote coverage to scenario suites

Parent epic: #3588

## Goal

Migrate the `platform-remote` E2E coverage area into the layered scenario framework without porting legacy scripts line-for-line. Add the missing primitive layer first, then move assertions into scenario plans/suites with stable IDs.

This issue is also the input for `/vd_spec`: create an implementation spec **and a validation spec** for platform/remote scenario coverage. The spec should make it unambiguous which checks are expected to pass, which are intentionally deferred/skipped, and what evidence the implementation PR must show.

## Scope definition: `platform-remote`

This domain covers E2E behaviors tied to platform-specific or remote execution paths, including:

- GPU/local Ollama host flows
- Brev CPU/GPU remote branch validation and launchable install flows
- DGX Spark / DGX Station style local-model hosts
- macOS and Windows/WSL workflow/platform execution paths
- OpenShell Docker-driver gateway/network topology where it differs by platform
- platform-specific install/onboard preflight and recovery behavior

Out of scope unless explicitly pulled in by the spec: messaging-provider behavior, general policy presets, generic negative-path onboarding, and non-platform-specific CLI parsing.

## Legacy / current E2E coverage to absorb

Migrate the highest-value assertions from these existing E2E assets into scenario-suite coverage. Do not port scripts line-for-line; extract the stable behaviors and classify the rest as `covered`, `deferred`, or `retired`.

### Existing E2E assets and expected migrated assertions

| Existing asset | Coverage to consider migrating |
| --- | --- |
| `test/e2e/test-gpu-e2e.sh` | GPU host preflight; Ollama install/start; sandbox GPU enabled; GPU proof logs; auth proxy token persistence and permissions; proxy reject/accept auth behavior; proxy recovery from persisted token; direct Ollama inference; sandbox `inference.local` to Ollama inference; destroy/uninstall cleanup. |
| `test/e2e/test-gpu-double-onboard.sh` | Re-onboard with Ollama keeps the persisted proxy token valid and sandbox inference still works. This is the core regression from #2606 / PR #2617. |
| `test/e2e/test-launchable-smoke.sh` | Brev launchable bootstrap artifacts; sentinel file; `nemoclaw`/`openshell` availability; non-interactive cloud onboard; sandbox health; `inference.local` routing; `openclaw agent` mediated inference. |
| `test/e2e/test-spark-install.sh` | Linux/DGX Spark install path succeeds; CLI and OpenShell are on PATH; non-interactive install works. |
| `test/e2e/brev-e2e.test.ts` | Clean Brev CPU/GPU VM branch validation; source install; remote setup; selected suite dispatch; Brev GPU runtime/network preparation. |
| `.github/workflows/macos-e2e.yaml` | macOS platform workflow/runner requirements and dispatch metadata. |
| `.github/workflows/wsl-e2e.yaml` | Windows/WSL platform workflow/runner requirements and dispatch metadata. |

## New merged/platform-remote work to evaluate for scenario coverage

These are not necessarily already covered by existing E2E. The spec should decide whether each becomes a scenario assertion, onboarding assertion, workflow metadata item, or deferred item.

| Issue / PR | Existing E2E added? | Coverage decision needed for #3816 |
| --- | --- | --- |
| #3975 / PR #4180 | No | DGX Spark/aarch64 OpenShell-managed runtime health: delivery-chain health should be accepted when direct in-container `127.0.0.1` probe fails but gateway process/forward is healthy. |
| #4178 / PR #4186 | No | DGX Spark old Ollama upgrade loop: old host Ollama should trigger explicit upgrade path, not silent reuse/validation loop. |
| #4113 / PR #4132 | No | DGX Spark/Ollama model selection: local model selection should use available memory, not total memory. |
| #4114 / PR #4135 | No | Headless/non-interactive Linux Ollama install should support user-local fallback when sudo/system install is unavailable. |
| #3989 / PR #4060 | No | WSL/source install should bootstrap OpenShell before onboard instead of failing with circular advice. |
| #3974 / PR #4101 | No | Fresh Windows ARM with WSL enabled but no distro should install/register Ubuntu 24.04 or emit actionable failure. |
| #3986 / PR #4106 | No | WSL idle OpenShell gateway recovery: maintenance/list/backup path should recover named gateway and retry. |
| #3988 / PR #4062 | No | Windows ARM fake GPU detection: WDDM placeholder/non-NVIDIA GPU names should not pass NVIDIA GPU preflight. |
| #4177 / PR #4183 | No | Sandbox build context permissions: staged `/opt/nemoclaw` files should be readable by sandbox user so OpenClaw plugin install can succeed. |
| PR #4008 | No | Jetson GPU backend: Jetson/Tegra sandbox GPU mode uses NVIDIA runtime path rather than NVML/CDI assumptions. |
| #3473 / #3710 / PR #3965 | No | Jetson forced GPU passthrough should fail early with guidance; default/auto should stay CPU path. |
| PR #3963 | No | Spark GPU recreate should preserve the `nemoclaw-start` sandbox command and avoid stale Hermes runtime lock failure. |
| #3959 / PR #3960 | Partially | Brev GPU bridge gateway reachability: existing E2E infra was hardened in `brev-e2e.test.ts` and `test-gpu-e2e.sh`; decide whether a scenario assertion is still needed. #3959 remains open, so this may be deferred until live validation. |
| PR #4214 | Partially | Public installer E2Es must install the target ref, not silently install `main`; existing workflow/script validation was added, but scenario workflow metadata may need to preserve this. |
| PR #4046 | Workflow only | macOS/WSL/nightly workflow Node/action runtime updates; likely metadata only, not a behavior assertion. |
| PR #4038 | Yes | Existing E2E OpenClaw JSON parsing was hardened. Migrate only the stable platform/remote behavior if needed; helper parsing itself is not the domain behavior. |
| PR #4039 | Yes | Launchable smoke agent probe was hardened with `--thinking off` and better failure evidence; fold into launchable-smoke migrated assertions if retained. |

## Architecture contract

- Add or extend the domain primitive library: `test/e2e/validation_suites/lib/platform_remote.sh`.
- Helpers must consume `$E2E_CONTEXT_DIR/context.env`; suites must not reinstall, onboard, or rediscover setup state.
- Add/extend suite family entries in `test/e2e/validation_suites/suites.yaml`.
- Add onboarding profiles/test plans/onboarding assertions only when the behavior belongs before expected-state validation.
- Emit stable assertion IDs using `<layer>.<domain>.<behavior>`.
- Preserve compatibility with existing `run-scenario.sh <id> --plan-only` behavior.
- If parity-map metadata still exists in the target branch, update `test/e2e/docs/parity-map.yaml` with `layer`, `gap_domain`, `owner`, and runner/secret requirements where applicable. If the parity workflow/map has been removed, capture the same metadata in the current scenario coverage/reporting mechanism instead.

## Spec requirements for `/vd_spec`

The generated spec should include:

1. A coverage inventory mapping every item above to one of: `covered`, `new assertion`, `deferred`, or `retired`.
2. A scenario/test-plan design for each `covered` or `new assertion` item.
3. Stable assertion IDs for each migrated/new check.
4. Runner/secret/platform requirements for each scenario, including GPU, Brev, DGX Spark, macOS, WSL, and NVIDIA API key requirements where applicable.
5. A validation spec that states which commands/workflows must pass, which scenario IDs are expected to be runnable, and which are intentionally skipped/deferred due to unavailable platform/secrets.
6. Clear pass/fail expectations for the implementation PR: what should pass on the PR branch, what should fail on main or remain unimplemented if the PR is only adding coverage, and what evidence should be attached.

## Validation expectations for the implementation PR

The PR opened from this work should include evidence that:

- Scenario framework/unit validation passes locally for resolver/schema/suite/coverage-report behavior.
- `run-scenario.sh <id> --plan-only` works for each new or changed platform/remote scenario.
- The GitHub **E2E scenario workflow** runs on the PR branch and shows the relevant scenario jobs/checks passing, failing, skipped, or deferred according to the validation spec.
- Any expected failures are intentional and documented in the PR body or validation artifact, with clear follow-up items if they cannot be made green in the same PR.
- Platform-specific scenarios that cannot run on the default PR infrastructure are still represented with metadata and plan-only validation, and are marked as deferred/manual with the exact runner/secret requirement.

## Acceptance criteria

- Domain primitive helpers exist and are used by migrated suite steps.
- At least the highest-value assertions from the listed legacy coverage are mapped to stable scenario assertion IDs.
- New merged platform/remote bugs/features listed above are explicitly accepted into scope, deferred, or rejected with rationale.
- Remaining legacy assertions are explicitly classified as `deferred` or `retired` with layer/domain metadata.
- Scenario framework tests pass for resolver/schema/suite/coverage-report validation.
- The coverage report makes this domain visible as covered, deferred, or retired.
- The implementation PR includes the validation evidence described above, including E2E scenario workflow results on the PR branch.


Existing asset	Coverage to consider migrating
`test/e2e/test-gpu-e2e.sh`	GPU host preflight; Ollama install/start; sandbox GPU enabled; GPU proof logs; auth proxy token persistence and permissions; proxy reject/accept auth behavior; proxy recovery from persisted token; direct Ollama inference; sandbox `inference.local` to Ollama inference; destroy/uninstall cleanup.
`test/e2e/test-gpu-double-onboard.sh`	Re-onboard with Ollama keeps the persisted proxy token valid and sandbox inference still works. This is the core regression from #2606 / PR #2617.
`test/e2e/test-launchable-smoke.sh`	Brev launchable bootstrap artifacts; sentinel file; `nemoclaw`/`openshell` availability; non-interactive cloud onboard; sandbox health; `inference.local` routing; `openclaw agent` mediated inference.
`test/e2e/test-spark-install.sh`	Linux/DGX Spark install path succeeds; CLI and OpenShell are on PATH; non-interactive install works.
`test/e2e/brev-e2e.test.ts`	Clean Brev CPU/GPU VM branch validation; source install; remote setup; selected suite dispatch; Brev GPU runtime/network preparation.
`.github/workflows/macos-e2e.yaml`	macOS platform workflow/runner requirements and dispatch metadata.
`.github/workflows/wsl-e2e.yaml`	Windows/WSL platform workflow/runner requirements and dispatch metadata.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(e2e): migrate platform and remote coverage to scenario suites #3816

Goal

Scope definition: `platform-remote`

Legacy / current E2E coverage to absorb

Existing E2E assets and expected migrated assertions

New merged/platform-remote work to evaluate for scenario coverage

Architecture contract

Spec requirements for `/vd_spec`

Validation expectations for the implementation PR

Acceptance criteria

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue / PR	Existing E2E added?	Coverage decision needed for #3816
#3975 / PR #4180	No	DGX Spark/aarch64 OpenShell-managed runtime health: delivery-chain health should be accepted when direct in-container `127.0.0.1` probe fails but gateway process/forward is healthy.
#4178 / PR #4186	No	DGX Spark old Ollama upgrade loop: old host Ollama should trigger explicit upgrade path, not silent reuse/validation loop.
#4113 / PR #4132	No	DGX Spark/Ollama model selection: local model selection should use available memory, not total memory.
#4114 / PR #4135	No	Headless/non-interactive Linux Ollama install should support user-local fallback when sudo/system install is unavailable.
#3989 / PR #4060	No	WSL/source install should bootstrap OpenShell before onboard instead of failing with circular advice.
#3974 / PR #4101	No	Fresh Windows ARM with WSL enabled but no distro should install/register Ubuntu 24.04 or emit actionable failure.
#3986 / PR #4106	No	WSL idle OpenShell gateway recovery: maintenance/list/backup path should recover named gateway and retry.
#3988 / PR #4062	No	Windows ARM fake GPU detection: WDDM placeholder/non-NVIDIA GPU names should not pass NVIDIA GPU preflight.
#4177 / PR #4183	No	Sandbox build context permissions: staged `/opt/nemoclaw` files should be readable by sandbox user so OpenClaw plugin install can succeed.
PR #4008	No	Jetson GPU backend: Jetson/Tegra sandbox GPU mode uses NVIDIA runtime path rather than NVML/CDI assumptions.
#3473 / #3710 / PR #3965	No	Jetson forced GPU passthrough should fail early with guidance; default/auto should stay CPU path.
PR #3963	No	Spark GPU recreate should preserve the `nemoclaw-start` sandbox command and avoid stale Hermes runtime lock failure.
#3959 / PR #3960	Partially	Brev GPU bridge gateway reachability: existing E2E infra was hardened in `brev-e2e.test.ts` and `test-gpu-e2e.sh`; decide whether a scenario assertion is still needed. #3959 remains open, so this may be deferred until live validation.
PR #4214	Partially	Public installer E2Es must install the target ref, not silently install `main`; existing workflow/script validation was added, but scenario workflow metadata may need to preserve this.
PR #4046	Workflow only	macOS/WSL/nightly workflow Node/action runtime updates; likely metadata only, not a behavior assertion.
PR #4038	Yes	Existing E2E OpenClaw JSON parsing was hardened. Migrate only the stable platform/remote behavior if needed; helper parsing itself is not the domain behavior.
PR #4039	Yes	Launchable smoke agent probe was hardened with `--thinking off` and better failure evidence; fold into launchable-smoke migrated assertions if retained.

test(e2e): migrate platform and remote coverage to scenario suites #3816

Description

Goal

Scope definition: platform-remote

Legacy / current E2E coverage to absorb

Existing E2E assets and expected migrated assertions

New merged/platform-remote work to evaluate for scenario coverage

Architecture contract

Spec requirements for /vd_spec

Validation expectations for the implementation PR

Acceptance criteria

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

Scope definition: `platform-remote`

Spec requirements for `/vd_spec`