
test(e2e): migrate platform and remote coverage to scenario suites #3816

@jyaunches

Parent epic: #3588

Goal

Migrate the platform-remote E2E coverage area into the layered scenario framework without porting legacy scripts line-for-line. Add the missing primitive layer first, then move assertions into scenario plans/suites with stable IDs.

This issue is also the input for /vd_spec: create an implementation spec and a validation spec for platform/remote scenario coverage. The spec should make it unambiguous which checks are expected to pass, which are intentionally deferred/skipped, and what evidence the implementation PR must show.

Scope definition: platform-remote

This domain covers E2E behaviors tied to platform-specific or remote execution paths, including:

  • GPU/local Ollama host flows
  • Brev CPU/GPU remote branch validation and launchable install flows
  • DGX Spark / DGX Station style local-model hosts
  • macOS and Windows/WSL workflow/platform execution paths
  • OpenShell Docker-driver gateway/network topology where it differs by platform
  • platform-specific install/onboard preflight and recovery behavior

Out of scope unless explicitly pulled in by the spec: messaging-provider behavior, general policy presets, generic negative-path onboarding, and non-platform-specific CLI parsing.

Legacy / current E2E coverage to absorb

Migrate the highest-value assertions from these existing E2E assets into scenario-suite coverage. Do not port scripts line-for-line; extract the stable behaviors and classify the rest as covered, deferred, or retired.

Existing E2E assets and expected migrated assertions

| Existing asset | Coverage to consider migrating |
| --- | --- |
| test/e2e/test-gpu-e2e.sh | GPU host preflight; Ollama install/start; sandbox GPU enabled; GPU proof logs; auth proxy token persistence and permissions; proxy reject/accept auth behavior; proxy recovery from persisted token; direct Ollama inference; sandbox inference.local to Ollama inference; destroy/uninstall cleanup. |
| test/e2e/test-gpu-double-onboard.sh | Re-onboard with Ollama keeps the persisted proxy token valid and sandbox inference still works. This is the core regression from #2606 / PR #2617. |
| test/e2e/test-launchable-smoke.sh | Brev launchable bootstrap artifacts; sentinel file; nemoclaw/openshell availability; non-interactive cloud onboard; sandbox health; inference.local routing; openclaw agent mediated inference. |
| test/e2e/test-spark-install.sh | Linux/DGX Spark install path succeeds; CLI and OpenShell are on PATH; non-interactive install works. |
| test/e2e/brev-e2e.test.ts | Clean Brev CPU/GPU VM branch validation; source install; remote setup; selected suite dispatch; Brev GPU runtime/network preparation. |
| .github/workflows/macos-e2e.yaml | macOS platform workflow/runner requirements and dispatch metadata. |
| .github/workflows/wsl-e2e.yaml | Windows/WSL platform workflow/runner requirements and dispatch metadata. |

New merged/platform-remote work to evaluate for scenario coverage

These are not necessarily already covered by existing E2E. The spec should decide whether each becomes a scenario assertion, onboarding assertion, workflow metadata item, or deferred item.

| Issue / PR | Existing E2E added? | Coverage decision needed for #3816 |
| --- | --- | --- |
| #3975 / PR #4180 | No | DGX Spark/aarch64 OpenShell-managed runtime health: delivery-chain health should be accepted when the direct in-container 127.0.0.1 probe fails but the gateway process/forward is healthy. |
| #4178 / PR #4186 | No | DGX Spark old Ollama upgrade loop: an old host Ollama should trigger an explicit upgrade path, not a silent reuse/validation loop. |
| #4113 / PR #4132 | No | DGX Spark/Ollama model selection: local model selection should use available memory, not total memory. |
| #4114 / PR #4135 | No | Headless/non-interactive Linux Ollama install should support a user-local fallback when sudo/system install is unavailable. |
| #3989 / PR #4060 | No | WSL/source install should bootstrap OpenShell before onboard instead of failing with circular advice. |
| #3974 / PR #4101 | No | A fresh Windows ARM machine with WSL enabled but no distro should install/register Ubuntu 24.04 or emit an actionable failure. |
| #3986 / PR #4106 | No | WSL idle OpenShell gateway recovery: the maintenance/list/backup path should recover the named gateway and retry. |
| #3988 / PR #4062 | No | Windows ARM fake GPU detection: WDDM placeholder/non-NVIDIA GPU names should not pass the NVIDIA GPU preflight. |
| #4177 / PR #4183 | No | Sandbox build context permissions: staged /opt/nemoclaw files should be readable by the sandbox user so the OpenClaw plugin install can succeed. |
| PR #4008 | No | Jetson GPU backend: Jetson/Tegra sandbox GPU mode uses the NVIDIA runtime path rather than NVML/CDI assumptions. |
| #3473 / #3710 / PR #3965 | No | Jetson forced GPU passthrough should fail early with guidance; default/auto mode should stay on the CPU path. |
| PR #3963 | No | Spark GPU recreate should preserve the nemoclaw-start sandbox command and avoid a stale Hermes runtime lock failure. |
| #3959 / PR #3960 | Partially | Brev GPU bridge gateway reachability: existing E2E infra was hardened in brev-e2e.test.ts and test-gpu-e2e.sh; decide whether a scenario assertion is still needed. #3959 remains open, so this may be deferred until live validation. |
| PR #4214 | Partially | Public installer E2Es must install the target ref, not silently install main; existing workflow/script validation was added, but scenario workflow metadata may need to preserve this. |
| PR #4046 | Workflow only | macOS/WSL/nightly workflow Node/action runtime updates; likely metadata only, not a behavior assertion. |
| PR #4038 | Yes | Existing E2E OpenClaw JSON parsing was hardened. Migrate only the stable platform/remote behavior if needed; the helper parsing itself is not the domain behavior. |
| PR #4039 | Yes | The launchable smoke agent probe was hardened with --thinking off and better failure evidence; fold it into the launchable-smoke migrated assertions if retained. |

Architecture contract

  • Add or extend the domain primitive library: test/e2e/validation_suites/lib/platform_remote.sh.
  • Helpers must consume $E2E_CONTEXT_DIR/context.env; suites must not reinstall, onboard, or rediscover setup state.
  • Add/extend suite family entries in test/e2e/validation_suites/suites.yaml.
  • Add onboarding profiles/test plans/onboarding assertions only when the behavior belongs before expected-state validation.
  • Emit stable assertion IDs using <layer>.<domain>.<behavior>.
  • Preserve compatibility with existing run-scenario.sh <id> --plan-only behavior.
  • If parity-map metadata still exists in the target branch, update test/e2e/docs/parity-map.yaml with layer, gap_domain, owner, and runner/secret requirements where applicable. If the parity workflow/map has been removed, capture the same metadata in the current scenario coverage/reporting mechanism instead.
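The context contract in the second bullet can be illustrated with a small sketch, written in Python for readability rather than as the shell helper itself. It assumes context.env is a flat, shell-sourceable file of KEY=VALUE lines (optionally prefixed with `export` and optionally quoted), and it uses hypothetical key names such as E2E_SANDBOX_NAME; the real keys are whatever the setup/onboarding layer writes. What it shows: a suite step reads state that setup already recorded and fails fast when a key is missing, instead of re-running install or onboard to rediscover it.

```python
import os
from pathlib import Path


def parse_context_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines (assumed shell-sourceable format) into a dict."""
    context: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed context line: {raw!r}")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]  # drop one layer of matching quotes
        context[key.strip()] = value
    return context


def require_context(context: dict[str, str], keys: list[str]) -> dict[str, str]:
    """Fail fast on missing state instead of re-running install/onboard."""
    missing = [k for k in keys if not context.get(k)]
    if missing:
        raise RuntimeError(
            "context.env is missing " + ", ".join(missing)
            + "; fix the setup/onboarding layer rather than rediscovering state here"
        )
    return {k: context[k] for k in keys}


def load_context(required: list[str]) -> dict[str, str]:
    """Read $E2E_CONTEXT_DIR/context.env and return only the keys a step needs."""
    path = Path(os.environ["E2E_CONTEXT_DIR"]) / "context.env"
    return require_context(parse_context_env(path.read_text()), required)
```

A step would call something like load_context(["E2E_SANDBOX_NAME"]) (hypothetical key). Returning only the requested keys keeps each step's dependency on setup state explicit.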

Spec requirements for /vd_spec

The generated spec should include:

  1. A coverage inventory mapping every item above to one of: covered, new assertion, deferred, or retired.
  2. A scenario/test-plan design for each covered or new assertion item.
  3. Stable assertion IDs for each migrated/new check.
  4. Runner/secret/platform requirements for each scenario, including GPU, Brev, DGX Spark, macOS, WSL, and NVIDIA API key requirements where applicable.
  5. A validation spec that states which commands/workflows must pass, which scenario IDs are expected to be runnable, and which are intentionally skipped/deferred due to unavailable platform/secrets.
  6. Clear pass/fail expectations for the implementation PR: what should pass on the PR branch, what should fail on main or remain unimplemented if the PR is only adding coverage, and what evidence should be attached.
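Items 1 and 3 can be checked mechanically. The sketch below is one possible shape for that check. The record fields, the status spellings, the layer name `suite`, and the example IDs and classifications are all hypothetical; only the four status categories and the <layer>.<domain>.<behavior> ID shape come from this issue. The check requires that every listed source item is classified, that covered and new-assertion items carry well-formed IDs, and that deferred and retired items carry a rationale.

```python
import re
from dataclasses import dataclass, field

STATUSES = {"covered", "new_assertion", "deferred", "retired"}
# <layer>.<domain>.<behavior>; the lowercase/digit/_/- charset is an assumption
ASSERTION_ID = re.compile(r"^[a-z][a-z0-9_-]*\.[a-z][a-z0-9_-]*\.[a-z][a-z0-9_-]*$")


@dataclass
class InventoryItem:
    source: str                 # legacy asset path or issue/PR reference
    status: str                 # one of STATUSES
    assertion_ids: list[str] = field(default_factory=list)
    rationale: str = ""         # required for deferred/retired


def validate_inventory(items: list[InventoryItem], expected_sources: list[str]) -> list[str]:
    """Return human-readable errors; an empty list means the inventory is complete."""
    errors = []
    seen = {item.source for item in items}
    for source in sorted(set(expected_sources) - seen):
        errors.append(f"{source}: not classified")
    for item in items:
        if item.status not in STATUSES:
            errors.append(f"{item.source}: unknown status {item.status!r}")
        elif item.status in {"covered", "new_assertion"}:
            if not item.assertion_ids:
                errors.append(f"{item.source}: {item.status} but no assertion ID")
            for aid in item.assertion_ids:
                if not ASSERTION_ID.match(aid):
                    errors.append(f"{item.source}: malformed ID {aid!r}")
        elif not item.rationale:
            errors.append(f"{item.source}: {item.status} without rationale")
    return errors
```

Returning a list of errors rather than raising on the first one lets a single run report every gap in the inventory.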

Validation expectations for the implementation PR

The PR opened from this work should include evidence that:

  • Scenario framework/unit validation passes locally for resolver/schema/suite/coverage-report behavior.
  • run-scenario.sh <id> --plan-only works for each new or changed platform/remote scenario.
  • The GitHub E2E scenario workflow runs on the PR branch and shows the relevant scenario jobs/checks passing, failing, skipped, or deferred according to the validation spec.
  • Any expected failures are intentional and documented in the PR body or validation artifact, with clear follow-up items if they cannot be made green in the same PR.
  • Platform-specific scenarios that cannot run on the default PR infrastructure are still represented with metadata and plan-only validation, and are marked as deferred/manual with the exact runner/secret requirement.
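The last bullet implies a per-scenario decision: plan-only validation must pass everywhere, and a live run happens only when the runner provides every required capability and secret; otherwise the scenario is reported as deferred/manual with the exact missing requirement. The sketch below assumes illustrative capability labels (such as gpu) and secret names (such as NVIDIA_API_KEY), hypothetical scenario IDs, and a stubbed plan-only result instead of an actual run-scenario.sh invocation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioRequirements:
    scenario_id: str
    capabilities: frozenset[str]  # e.g. {"gpu"}, {"wsl"} (illustrative labels)
    secrets: frozenset[str]       # e.g. {"NVIDIA_API_KEY"} (illustrative names)


def classify_run(req: ScenarioRequirements, runner_capabilities: set[str],
                 available_secrets: set[str], plan_only_ok: bool) -> tuple[str, str]:
    """Return (status, detail) for one scenario on one runner."""
    if not plan_only_ok:
        # plan-only validation must pass even when the live run is deferred
        return ("fail", f"{req.scenario_id}: --plan-only validation failed")
    missing_caps = sorted(req.capabilities - runner_capabilities)
    missing_secrets = sorted(req.secrets - available_secrets)
    if missing_caps or missing_secrets:
        needs = [f"runner:{c}" for c in missing_caps] + [f"secret:{s}" for s in missing_secrets]
        return ("deferred/manual", f"{req.scenario_id}: requires " + ", ".join(needs))
    return ("run", req.scenario_id)
```

Reporting the missing runner and secret names verbatim gives the PR body the "exact runner/secret requirement" the bullet asks for, without special-casing each platform.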

Acceptance criteria

  • Domain primitive helpers exist and are used by migrated suite steps.
  • At least the highest-value assertions from the listed legacy coverage are mapped to stable scenario assertion IDs.
  • New merged platform/remote bugs/features listed above are explicitly accepted into scope, deferred, or rejected with rationale.
  • Remaining legacy assertions are explicitly classified as deferred or retired with layer/domain metadata.
  • Scenario framework tests pass for resolver/schema/suite/coverage-report validation.
  • The coverage report makes this domain visible as covered, deferred, or retired.
  • The implementation PR includes the validation evidence described above, including E2E scenario workflow results on the PR branch.
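For the coverage-report criterion, one way to keep the domain visible is to roll assertion records up by gap domain and status, and to flag any expected domain that has no records at all. This sketch assumes a flat list of (gap_domain, status) pairs and uses hypothetical domain names; the real input is whatever the current scenario coverage/reporting mechanism emits.

```python
from collections import Counter, defaultdict


def summarize_coverage(records: list[tuple[str, str]],
                       expected_domains: list[str]) -> tuple[dict[str, dict[str, int]], list[str]]:
    """Count statuses per domain and list expected domains with no records."""
    by_domain: dict[str, Counter] = defaultdict(Counter)
    for domain, status in records:
        by_domain[domain][status] += 1
    invisible = sorted(d for d in expected_domains if d not in by_domain)
    summary = {d: dict(sorted(c.items())) for d, c in sorted(by_domain.items())}
    return summary, invisible


def render(summary: dict[str, dict[str, int]]) -> list[str]:
    """One report line per domain, e.g. 'platform-remote: covered=2, deferred=1'."""
    return [f"{d}: " + ", ".join(f"{s}={n}" for s, n in counts.items())
            for d, counts in summary.items()]
```

A non-empty `invisible` list is a useful failure signal: it means a domain is expected but has no covered, deferred, or retired entries.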

Metadata

Assignees: none
Labels: area: e2e (End-to-end tests, nightly failures, or validation infrastructure)
Projects: none
Milestone: none
Relationships: none yet
Development: no branches or pull requests