Parent epic: #3588
Goal
Migrate the platform-remote E2E coverage area into the layered scenario framework without porting legacy scripts line-for-line. Add the missing primitive layer first, then move assertions into scenario plans/suites with stable IDs.
This issue is also the input for /vd_spec: create an implementation spec and a validation spec for platform/remote scenario coverage. The spec should make it unambiguous which checks are expected to pass, which are intentionally deferred/skipped, and what evidence the implementation PR must show.
Scope definition: platform-remote
This domain covers E2E behaviors tied to platform-specific or remote execution paths, including:
- GPU/local Ollama host flows
- Brev CPU/GPU remote branch validation and launchable install flows
- DGX Spark / DGX Station style local-model hosts
- macOS and Windows/WSL workflow/platform execution paths
- OpenShell Docker-driver gateway/network topology where it differs by platform
- platform-specific install/onboard preflight and recovery behavior
Out of scope unless explicitly pulled in by the spec: messaging-provider behavior, general policy presets, generic negative-path onboarding, and non-platform-specific CLI parsing.
Legacy / current E2E coverage to absorb
Migrate the highest-value assertions from these existing E2E assets into scenario-suite coverage. Do not port scripts line-for-line; extract the stable behaviors and classify the rest as covered, deferred, or retired.
Existing E2E assets and expected migrated assertions
| Existing asset | Coverage to consider migrating |
| --- | --- |
| test/e2e/test-gpu-e2e.sh | GPU host preflight; Ollama install/start; sandbox GPU enabled; GPU proof logs; auth proxy token persistence and permissions; proxy reject/accept auth behavior; proxy recovery from persisted token; direct Ollama inference; sandbox inference.local to Ollama inference; destroy/uninstall cleanup. |
| test/e2e/test-gpu-double-onboard.sh | Re-onboard with Ollama keeps the persisted proxy token valid and sandbox inference still works. This is the core regression from #2606 / PR #2617. |
| test/e2e/test-launchable-smoke.sh | Brev launchable bootstrap artifacts; sentinel file; nemoclaw/openshell availability; non-interactive cloud onboard; sandbox health; inference.local routing; openclaw agent mediated inference. |
| test/e2e/test-spark-install.sh | Linux/DGX Spark install path succeeds; CLI and OpenShell are on PATH; non-interactive install works. |
| test/e2e/brev-e2e.test.ts | Clean Brev CPU/GPU VM branch validation; source install; remote setup; selected suite dispatch; Brev GPU runtime/network preparation. |
| .github/workflows/macos-e2e.yaml | macOS platform workflow/runner requirements and dispatch metadata. |
| .github/workflows/wsl-e2e.yaml | Windows/WSL platform workflow/runner requirements and dispatch metadata. |
New merged/platform-remote work to evaluate for scenario coverage
These are not necessarily already covered by existing E2E. The spec should decide whether each becomes a scenario assertion, onboarding assertion, workflow metadata item, or deferred item.
| Issue / PR | Existing E2E added? | Coverage decision needed for #3816 |
| --- | --- | --- |
| #3975 / PR #4180 | No | DGX Spark/aarch64 OpenShell-managed runtime health: delivery-chain health should be accepted when direct in-container 127.0.0.1 probe fails but gateway process/forward is healthy. |
| #4178 / PR #4186 | No | DGX Spark old Ollama upgrade loop: old host Ollama should trigger explicit upgrade path, not silent reuse/validation loop. |
| #4113 / PR #4132 | No | DGX Spark/Ollama model selection: local model selection should use available memory, not total memory. |
| #4114 / PR #4135 | No | Headless/non-interactive Linux Ollama install should support user-local fallback when sudo/system install is unavailable. |
| #3989 / PR #4060 | No | WSL/source install should bootstrap OpenShell before onboard instead of failing with circular advice. |
| #3974 / PR #4101 | No | Fresh Windows ARM with WSL enabled but no distro should install/register Ubuntu 24.04 or emit actionable failure. |
| #3986 / PR #4106 | No | WSL idle OpenShell gateway recovery: maintenance/list/backup path should recover named gateway and retry. |
| #3988 / PR #4062 | No | Windows ARM fake GPU detection: WDDM placeholder/non-NVIDIA GPU names should not pass NVIDIA GPU preflight. |
| #4177 / PR #4183 | No | Sandbox build context permissions: staged /opt/nemoclaw files should be readable by sandbox user so OpenClaw plugin install can succeed. |
| PR #4008 | No | Jetson GPU backend: Jetson/Tegra sandbox GPU mode uses NVIDIA runtime path rather than NVML/CDI assumptions. |
| #3473 / #3710 / PR #3965 | No | Jetson forced GPU passthrough should fail early with guidance; default/auto should stay CPU path. |
| PR #3963 | No | Spark GPU recreate should preserve the nemoclaw-start sandbox command and avoid stale Hermes runtime lock failure. |
| #3959 / PR #3960 | Partially | Brev GPU bridge gateway reachability: existing E2E infra was hardened in brev-e2e.test.ts and test-gpu-e2e.sh; decide whether a scenario assertion is still needed. #3959 remains open, so this may be deferred until live validation. |
| PR #4214 | Partially | Public installer E2Es must install the target ref, not silently install main; existing workflow/script validation was added, but scenario workflow metadata may need to preserve this. |
| PR #4046 | Workflow only | macOS/WSL/nightly workflow Node/action runtime updates; likely metadata only, not a behavior assertion. |
| PR #4038 | Yes | Existing E2E OpenClaw JSON parsing was hardened. Migrate only the stable platform/remote behavior if needed; helper parsing itself is not the domain behavior. |
| PR #4039 | Yes | Launchable smoke agent probe was hardened with --thinking off and better failure evidence; fold into launchable-smoke migrated assertions if retained. |
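
As one illustration of how a row above might become a checkable behavior, take the #4113 / PR #4132 item (available memory versus total memory). The sketch below is not NemoClaw's selection logic. It only shows the distinction, using the standard MemTotal and MemAvailable fields of /proc/meminfo. The function names and the fixture-file parameter are hypothetical.

```shell
# Hypothetical helpers; names and the fixture-file parameter are assumptions.

# Print MemAvailable in MiB from a /proc/meminfo-format file.
pr_mem_available_mib() {
  awk '$1 == "MemAvailable:" { print int($2 / 1024); found = 1 }
       END { exit !found }' "${1:-/proc/meminfo}"
}

# Succeed only if a model needing $1 MiB fits in available (not total) memory.
pr_model_fits_available() {
  local need_mib="$1" avail_mib
  avail_mib="$(pr_mem_available_mib "${2:-/proc/meminfo}")" || return 2
  [ "$need_mib" -le "$avail_mib" ]
}
```

Passing a fixture file instead of reading the live /proc/meminfo lets a scenario assert the behavior on any runner: a model that would fit in total memory but not in available memory should be rejected.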
Architecture contract
- Add or extend the domain primitive library: test/e2e/validation_suites/lib/platform_remote.sh.
- Helpers must consume $E2E_CONTEXT_DIR/context.env; suites must not reinstall, onboard, or rediscover setup state.
- Add/extend suite family entries in test/e2e/validation_suites/suites.yaml.
- Add onboarding profiles/test plans/onboarding assertions only when the behavior belongs before expected-state validation.
- Emit stable assertion IDs using <layer>.<domain>.<behavior>.
- Preserve compatibility with existing run-scenario.sh <id> --plan-only behavior.
- If parity-map metadata still exists in the target branch, update test/e2e/docs/parity-map.yaml with layer, gap_domain, owner, and runner/secret requirements where applicable. If the parity workflow/map has been removed, capture the same metadata in the current scenario coverage/reporting mechanism instead.
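
A minimal sketch of what the primitive layer could look like under this contract. Everything here is illustrative: the function names, the KEY=value format assumed for context.env, and the "STATUS id" output line are assumptions, not the framework's actual interface.

```shell
# Hypothetical sketch for test/e2e/validation_suites/lib/platform_remote.sh.
# Function names, the KEY=value context format, and the output line are assumptions.

# Load setup state written by an earlier layer; helpers never rediscover it.
pr_load_context() {
  if [ -z "${E2E_CONTEXT_DIR:-}" ]; then
    echo "E2E_CONTEXT_DIR is not set" >&2
    return 1
  fi
  local ctx="$E2E_CONTEXT_DIR/context.env"
  [ -r "$ctx" ] || { echo "missing context file: $ctx" >&2; return 1; }
  set -a          # export every variable the file defines
  . "$ctx"
  set +a
}

# Check that an ID has exactly three non-empty segments: <layer>.<domain>.<behavior>.
pr_assertion_id_valid() {
  case "$1" in
    *[!a-z0-9._-]*) return 1 ;;   # character outside the assumed alphabet
    *.*.*.*)        return 1 ;;   # more than three segments
    .*|*.|*..*)     return 1 ;;   # empty segment
    *.*.*)          return 0 ;;
    *)              return 1 ;;
  esac
}

# Emit one result line keyed by a stable assertion ID.
pr_emit() {
  local status="$1" id="$2"
  pr_assertion_id_valid "$id" || { echo "invalid assertion id: $id" >&2; return 2; }
  printf '%s %s\n' "$status" "$id"
}
```

A suite step would then call pr_load_context once and report through pr_emit, for example `pr_emit PASS suite.platform-remote.gpu-proxy-token` (an invented ID), without touching install or onboard logic.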
Spec requirements for /vd_spec
The generated spec should include:
- A coverage inventory mapping every item above to one of: covered, new assertion, deferred, or retired.
- A scenario/test-plan design for each covered or new assertion item.
- Stable assertion IDs for each migrated/new check.
- Runner/secret/platform requirements for each scenario, including GPU, Brev, DGX Spark, macOS, WSL, and NVIDIA API key requirements where applicable.
- A validation spec that states which commands/workflows must pass, which scenario IDs are expected to be runnable, and which are intentionally skipped/deferred due to unavailable platform/secrets.
- Clear pass/fail expectations for the implementation PR: what should pass on the PR branch, what should fail on main or remain unimplemented if the PR is only adding coverage, and what evidence should be attached.
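
The coverage-inventory requirement lends itself to a mechanical check. The sketch below assumes a hypothetical inventory file with one tab-separated "item, status" row per line. The file format and the function name are assumptions, not something /vd_spec is known to produce.

```shell
# Hypothetical check: every inventory row must carry one of the four allowed statuses.
# Assumed row format: <item><TAB><status>
pr_check_inventory() {
  awk -F '\t' '
    $2 != "covered" && $2 != "new assertion" && $2 != "deferred" && $2 != "retired" {
      printf "line %d: %s has invalid status \"%s\"\n", NR, $1, $2
      bad = 1
    }
    END { exit bad + 0 }
  ' "$1"
}
```

A check like this fails as soon as an asset or issue from the tables above is listed without a classification, which is the ambiguity the spec is meant to rule out.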
Validation expectations for the implementation PR
The PR opened from this work should include evidence that:
- Scenario framework/unit validation passes locally for resolver/schema/suite/coverage-report behavior.
- run-scenario.sh <id> --plan-only works for each new or changed platform/remote scenario.
- The GitHub E2E scenario workflow runs on the PR branch and shows the relevant scenario jobs/checks passing, failing, skipped, or deferred according to the validation spec.
- Any expected failures are intentional and documented in the PR body or validation artifact, with clear follow-up items if they cannot be made green in the same PR.
- Platform-specific scenarios that cannot run on the default PR infrastructure are still represented with metadata and plan-only validation, and are marked as deferred/manual with the exact runner/secret requirement.
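
The plan-only evidence could be gathered with a small loop like the one below. The function name is hypothetical, and the runner is passed in as a parameter because the location of run-scenario.sh in the target branch is not stated here. Only the `<id> --plan-only` invocation shape comes from this issue.

```shell
# Hypothetical helper: run "<runner> <id> --plan-only" for each scenario ID
# and report per-ID results; returns non-zero if any plan fails.
pr_plan_only_all() {
  local runner="$1" id failed=0
  shift
  for id in "$@"; do
    if "$runner" "$id" --plan-only >/dev/null 2>&1; then
      echo "plan-only ok: $id"
    else
      echo "plan-only FAILED: $id"
      failed=1
    fi
  done
  return "$failed"
}
```

Because it only plans, the same loop can cover deferred/manual scenarios that need GPU, Brev, DGX Spark, macOS, or WSL runners, and its output can be attached to the PR as evidence.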
Acceptance criteria
- Domain primitive helpers exist and are used by migrated suite steps.
- At least the highest-value assertions from the listed legacy coverage are mapped to stable scenario assertion IDs.
- New merged platform/remote bugs/features listed above are explicitly accepted into scope, deferred, or rejected with rationale.
- Remaining legacy assertions are explicitly classified as deferred or retired with layer/domain metadata.
- Scenario framework tests pass for resolver/schema/suite/coverage-report validation.
- The coverage report makes this domain visible as covered, deferred, or retired.
- The implementation PR includes the validation evidence described above, including E2E scenario workflow results on the PR branch.