
Phase 2: Onboarding and Installer Audit Coverage (E2E audit-coverage) #4348

@jyaunches


Phase 2: Onboarding and Installer Audit Coverage

Parent epic: #3588

Goal

Cover every audited assertion and control-flow row from the legacy onboarding and installer E2E scripts. The full OpenClaw cloud onboarding path must be proven through three distinct inference surfaces (direct provider, sandbox inference.local, OpenClaw-mediated agent). Negative onboarding paths must fail closed with no forbidden side effects. Public/launchable install sources cannot be satisfied by repo-current install manifests.

Audit rows in scope

| AQ | Phase | Legacy subject | Required coverage | Boundary | Planned scenario/assertion | Fixtures/actions |
|---|---|---|---|---|---|---|
| AQ-010 | 2 | test-full-e2e.sh | Full OpenClaw path covers install, onboard, gateway, sandbox, credentials, policy, and inference surfaces. | host CLI/sandbox/provider | full e2e scenario contract | install/onboard actions |
| AQ-011 | 2 | test-cloud-onboard-e2e.sh | Cloud onboard covers OpenClaw setup, sandbox state, gateway route, credentials, and expected policy presets. | sandbox/provider | cloud onboard assertions | cloud provider fixture / live secrets |
| AQ-012 | 2 | test/e2e/e2e-cloud-experimental/checks/*.sh | Delegated cloud checks cover inference-local HTTP, security checks, and Landlock/read-only behavior. | sandbox/security-policy | cloud delegated check assertions | cloud sandbox fixture |
| AQ-013 | 2 | test-cloud-inference-e2e.sh | Cloud inference proves direct provider chat, sandbox inference.local, and OpenClaw-mediated response as distinct evidence. | provider/integration | cloud inference surface assertions | fake/live provider |
| AQ-014 | 2 | test-hermes-e2e.sh | Hermes onboarding validates agent selection, sandbox readiness, inference, and Hermes-specific health. | agent runtime | hermes onboard assertions | hermes sandbox fixture |
| AQ-015 | 2 | test-double-onboard.sh and test-gpu-double-onboard.sh | Repeated onboarding preserves/updates the registry correctly and rejects stale or duplicate state. | durable state | double-onboard state assertions | staged registry fixture |
| AQ-016 | 2 | test-onboard-negative-paths.sh | Invalid key, Docker/preflight failure, gateway port conflict, and bad input fail closed without forbidden side effects. | host CLI / durable state | negative onboard assertions | bad key / port-holder fixtures |
| AQ-017 | 2 | test-onboard-resume.sh and test-onboard-repair.sh | Resume/repair preserves expected state and repairs incomplete onboarding artifacts. | durable state | resume/repair assertions | staged session fixture |
| AQ-018 | 2 | test-launchable-smoke.sh and Brev launchable flow | Public/launchable install path is not satisfied by repo-current; proves launchable sentinel/readiness. | host CLI | launchable smoke assertions | fake download/Brev fixture |
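Each validation scenario begins by filtering the audit queue to one row or a row range. A minimal TypeScript sketch of that filter, assuming rows are keyed by three-digit `AQ-NNN` IDs; the `AuditRow` shape and the `rowNumber`/`filterRows` helpers are hypothetical, not code from test/e2e-scenario/:

```typescript
// Hypothetical shape of one audit-queue row; field names are illustrative only.
interface AuditRow {
  id: string; // e.g. "AQ-013"
  phase: number;
  legacySubject: string;
  boundary: string;
}

// Numeric suffix of an "AQ-NNN" identifier, or NaN if the ID is malformed.
function rowNumber(id: string): number {
  const match = /^AQ-(\d{3})$/.exec(id);
  return match ? Number(match[1]) : Number.NaN;
}

// Select rows whose IDs fall in the inclusive range [from, to].
// Rows with malformed IDs are dropped because NaN comparisons are false.
function filterRows(rows: AuditRow[], from: string, to: string): AuditRow[] {
  const lo = rowNumber(from);
  const hi = rowNumber(to);
  if (Number.isNaN(lo) || Number.isNaN(hi)) {
    throw new Error(`malformed row range ${from}..${to}`);
  }
  return rows.filter((row) => {
    const n = rowNumber(row.id);
    return n >= lo && n <= hi;
  });
}
```

For Scenario 2.2 the same helper selects a single row, e.g. `filterRows(queue, "AQ-016", "AQ-016")`.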

Required manifests to add (test/e2e-scenario/nemoclaw_scenarios/manifests/)

  • openclaw-nvidia.yaml
  • openclaw-nvidia-public-curl.yaml
  • openclaw-nvidia-cloud-inference.yaml (when explicit model evidence is required)
  • openclaw-openai-compatible-double-onboard.yaml
  • openclaw-nvidia-invalid-key-negative.yaml
  • openclaw-nvidia-gateway-port-conflict.yaml
  • openclaw-nvidia-custom-policies.yaml
  • openclaw-nvidia-resume-after-interrupt.yaml
  • openclaw-nvidia-repair-existing-config.yaml
  • launchable-cloud-nvidia-openclaw.yaml
  • dgx-spark-install-only.yaml, or an explicit setup-only (no-manifest) scenario

Required fixtures / runtime actions

  • Public installer source/ref/log verification fixture
  • Fake OpenAI endpoint for double-onboard
  • Port-holder fixture for gateway port conflict
  • Bad-key fixture (injected by scenario, not stored in manifest)
  • Interrupted session fixture for resume/repair
  • Missing recorded sandbox fixture for repair
  • Launchable clone/sentinel fixture
  • Direct cloud / sandbox route / OpenClaw-mediated prompt payload fixtures
  • Hermes health/config/log fixtures
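Several of these fixtures (the port holder, the interrupted session, the missing recorded sandbox) mutate host or repo state, and the cleanup gate requires restore logic for them. One way to structure that, sketched in TypeScript under the assumption that each fixture's setup returns its own restore callback; `Fixture` and `withFixtures` are hypothetical names:

```typescript
// Hypothetical fixture contract: setup mutates host/repo state and returns
// a callback that restores it.
interface Fixture {
  name: string;
  setup: () => () => void;
}

// Run `body` with all fixtures installed. Cleanups always run, in reverse
// setup order, even if a later fixture's setup or the body throws.
function withFixtures<T>(fixtures: Fixture[], body: () => T): T {
  const cleanups: Array<() => void> = [];
  try {
    for (const fixture of fixtures) {
      cleanups.push(fixture.setup());
    }
    return body();
  } finally {
    const errors: unknown[] = [];
    while (cleanups.length > 0) {
      const cleanup = cleanups.pop()!;
      try {
        cleanup();
      } catch (err) {
        errors.push(err);
      }
    }
    if (errors.length > 0) {
      // Surface cleanup failures so a leaked port holder or session is not silent.
      throw new Error(`${errors.length} fixture cleanup(s) failed`);
    }
  }
}
```

Cleanups run in reverse order so that, for example, a session fixture layered on top of a port holder is torn down before the port is released. In this sketch a failing cleanup throws from `finally`, which takes precedence over any error raised by the body.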

Required assertions

  • Install source/ref correctness
  • CLI/OpenShell availability
  • Direct NVIDIA chat, sandbox inference.local chat, and OpenClaw-mediated agent response as three distinct passing assertions
  • Hermes runtime health, config, and log checks as distinct assertions
  • Gateway reuse and no port conflicts during double-onboard
  • Stale registry reconciliation and lifecycle guidance
  • Resume cached-step skipping and session completion
  • Repair recreates missing recorded sandbox and rejects conflicting resume requests
  • Launchable artifacts and sentinel readiness
  • Delegated cloud-experimental check PASS/FAIL outcomes (inference-local HTTP, security checks, Landlock read-only)
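The resume assertion (cached-step skipping plus session completion) reduces to comparing the steps recorded as done before the interrupt with the steps actually executed on resume. A sketch, assuming the scenario can observe both lists; `OnboardSession`, `checkResume`, and the step names used with it are hypothetical:

```typescript
// Hypothetical record of an interrupted onboarding session.
interface OnboardSession {
  steps: string[]; // ordered onboarding steps
  completed: Set<string>; // steps cached as done before the interrupt
}

// Returns a list of problems; an empty list means the resumed run skipped
// every cached step and executed every remaining step once, in order.
function checkResume(session: OnboardSession, executedOnResume: string[]): string[] {
  const problems: string[] = [];
  const expected = session.steps.filter((step) => !session.completed.has(step));
  for (const step of executedOnResume) {
    if (session.completed.has(step)) {
      problems.push(`cached step re-run: ${step}`);
    }
  }
  if (executedOnResume.join("\n") !== expected.join("\n")) {
    problems.push(`expected [${expected.join(", ")}], got [${executedOnResume.join(", ")}]`);
  }
  return problems;
}
```

Returning a problem list rather than a boolean gives the evidence artifact a readable reason when the assertion fails.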

Validation scenarios — all must pass in PR workflow artifacts

Scenario 2.1 — Cloud OpenClaw onboarding is complete only with all three inference surfaces (Happy Path)

  • Given a live or hermetic OpenClaw cloud onboarding scenario that completes onboarding.
  • When direct provider chat, sandbox inference.local chat, and OpenClaw-mediated agent response assertions all run.
  • Then onboarding can be marked complete, with a distinct evidence path for each surface.
  • Steps
    1. Filter rows AQ-010…AQ-018; provision the OpenClaw cloud onboarding contract with declared secrets or a hermetic fake provider; collect PR workflow evidence.
    2. Scenario runner / Vitest workflow jobs run onboarding and the three inference assertion modules.
    3. Verify each matched row is present with passing stable assertion IDs/artifact paths; onboarding is complete only after all three surface assertions pass.
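The completion rule in step 3 can be made mechanical: onboarding counts as complete only when each of the three surfaces has its own passing inference assertion (not a health probe) with a stable ID and a distinct evidence path. A TypeScript sketch; the `SurfaceEvidence` record, its `probe` field, and the surface labels are assumptions for illustration:

```typescript
// The three inference surfaces named in this phase (labels are hypothetical).
type Surface = "direct-provider" | "sandbox-inference-local" | "openclaw-mediated";

const REQUIRED_SURFACES: Surface[] = ["direct-provider", "sandbox-inference-local", "openclaw-mediated"];

// Hypothetical evidence record emitted by one assertion module.
interface SurfaceEvidence {
  surface: Surface;
  assertionId: string; // stable assertion ID
  evidencePath: string; // artifact path in the PR workflow
  passed: boolean;
  probe: "inference" | "health"; // generic /health probes never count
}

function onboardingComplete(evidence: SurfaceEvidence[]): boolean {
  const paths = new Set<string>();
  for (const surface of REQUIRED_SURFACES) {
    const hit = evidence.find((e) => e.surface === surface && e.passed && e.probe === "inference");
    if (!hit || !hit.assertionId || !hit.evidencePath) return false;
    paths.add(hit.evidencePath);
  }
  // Each surface must point at its own artifact, not one shared file.
  return paths.size === REQUIRED_SURFACES.length;
}
```

The per-surface loop rejects single-surface evidence, and the distinct-path check rejects one artifact reused for all three surfaces.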

Scenario 2.2 — Negative onboarding leaves no forbidden side effects (Sad Path)

  • Given an invalid NVIDIA key or gateway port-conflict fixture, injected by scenario setup rather than by product manifests.
  • When onboarding is executed.
  • Then it exits with the expected message, no stack trace, and no sandbox, gateway, or credential side effects.
  • Steps
    1. Filter row AQ-016; stage bad-key or port-holder fixture; collect PR workflow evidence for negative onboarding.
    2. Run onboarding action.
    3. Verify AQ-016 remains incomplete unless workflow evidence shows the expected message, no stack trace, and no side effects.
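The three fail-closed conditions in step 3 can be checked together. A sketch, assuming the scenario captures the CLI exit code and output and snapshots sandbox, gateway, and credential names before and after the run; all names here are hypothetical, the expected message is supplied by the caller, and the stack-trace regex is only a heuristic for Node- and Python-style traces:

```typescript
// Hypothetical snapshot of state that negative onboarding must not touch.
interface SideEffectSnapshot {
  sandboxes: string[];
  gateways: string[];
  credentials: string[]; // credential entry names only, never values
}

// Hypothetical observation of one failed onboarding run.
interface NegativeRun {
  exitCode: number;
  output: string; // combined stdout/stderr
  before: SideEffectSnapshot;
  after: SideEffectSnapshot;
}

// Heuristic: indented "at fn (file:line:col)" frames or a Python traceback header.
const STACK_TRACE = /^[ \t]+at .+:\d+:\d+\)?$|^Traceback \(most recent call last\):/m;

function sameSet(a: string[], b: string[]): boolean {
  const sa = new Set(a);
  return sa.size === new Set(b).size && b.every((x) => sa.has(x));
}

// Returns a list of problems; empty means the run failed closed.
function checkFailClosed(run: NegativeRun, expectedMessage: string): string[] {
  const problems: string[] = [];
  if (run.exitCode === 0) problems.push("onboarding exited 0");
  if (!run.output.includes(expectedMessage)) problems.push("expected message missing");
  if (STACK_TRACE.test(run.output)) problems.push("stack trace in output");
  for (const key of ["sandboxes", "gateways", "credentials"] as const) {
    if (!sameSet(run.before[key], run.after[key])) problems.push(`${key} changed`);
  }
  return problems;
}
```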

Scenario 2.3 — Public installer and launchable flows are not satisfied by repo-current (Sad Path)

  • Given a public-curl, launchable, Spark, or installer scenario wired to a repo-current install manifest.
  • When contract validation runs.
  • Then validation fails and asks for explicit install source/ref/log evidence or a setup-only scenario.
  • Steps
    1. Filter row AQ-018; create invalid install-source metadata fixture.
    2. Run manifest/contract validation in workflow.
    3. Verify workflow evidence shows repo-current substitution is rejected and AQ-018 remains unresolved until public/launchable evidence exists.
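The contract check in Scenario 2.3 amounts to refusing a repo-current install for any public-curl, launchable, Spark, or installer scenario unless it is explicitly setup-only. A sketch with a hypothetical `ScenarioContract` shape; the real manifest schema may differ:

```typescript
// Hypothetical install-source metadata; field names are illustrative only.
type InstallSource =
  | { kind: "repo-current" }
  | { kind: "public"; source: string; ref: string; logPath: string };

interface ScenarioContract {
  name: string;
  flavor: "public-curl" | "launchable" | "spark" | "installer" | "other";
  setupOnly: boolean; // explicit setup-only, no-manifest scenario
  install: InstallSource;
}

// Returns an error message, or null if the contract is acceptable.
function validateInstallSource(c: ScenarioContract): string | null {
  if (c.flavor === "other" || c.setupOnly) return null;
  const install = c.install;
  if (install.kind === "repo-current") {
    return `${c.name}: ${c.flavor} scenario cannot use a repo-current install; declare source/ref/log evidence or mark it setup-only`;
  }
  const { source, ref, logPath } = install;
  if (!source || !ref || !logPath) {
    return `${c.name}: install source, ref, and log path are all required`;
  }
  return null;
}
```

Modelling the install source as a discriminated union means a repo-current entry cannot carry a ref or log path at all, so the rejection cannot be bypassed by filling in placeholder fields.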

Acceptance criteria — issue is NOT DONE until ALL are true

  1. PR landed in test/e2e-scenario/ adding all manifests, fixtures, runtime actions, and assertion modules above.
  2. PR CI passing:
    • All 9 audit-row scenarios run in workflow and emit PASS: markers
    • Happy-path OpenClaw cloud onboarding asserts all three inference surfaces — single-surface evidence is rejected
    • Negative onboarding asserts failure message + no stack trace + no forbidden side effects
  3. Validation Scenarios 2.1–2.3 all pass in PR workflow artifacts.
  4. Audit work queue updated: AQ-010 through AQ-018 flipped to evidence-complete with stable assertion IDs and evidence paths.
  5. Phase-specific validation gate (from spec): happy-path onboarding is complete only when all three inference surfaces are covered; negative onboarding is complete only when the failure message, the absence of a stack trace, and the absence of forbidden side effects are all asserted; public installer/launchable flows cannot be satisfied by repo-current install manifests.
  6. No-cheat gate: generic /health or single-surface inference cannot satisfy AQ-013/AQ-014; repo-current cannot satisfy AQ-018.
  7. Secret gate: bad-key fixture is scenario-injected, never in product manifest; no raw secrets in manifests/logs.
  8. PR description references the AQ rows covered and links this issue.
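Criterion 2 requires every audit-row scenario to emit a `PASS:` marker. A sketch of a log check, assuming each marker sits at the start of a line as `PASS: AQ-NNN` followed by free text; that line format beyond the `PASS:` prefix is an assumption, as are the helper names:

```typescript
// Assumed marker format: "PASS: AQ-NNN <free text>" at the start of a line.
const PASS_MARKER = /^PASS: (AQ-\d{3})\b/;

// Rows in scope for this phase: AQ-010 through AQ-018.
const PHASE_2_ROWS: string[] = Array.from({ length: 9 }, (_, i) => `AQ-0${10 + i}`);

// Returns the required rows that have no PASS marker anywhere in the log.
function missingPassMarkers(log: string, requiredRows: string[]): string[] {
  const passed = new Set<string>();
  for (const line of log.split(/\r?\n/)) {
    const match = PASS_MARKER.exec(line.trim());
    if (match) passed.add(match[1]);
  }
  return requiredRows.filter((row) => !passed.has(row));
}
```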

Dependencies

Out of scope

  • Provider/routing/config-shape (Phase 3)
  • GPU/Ollama (Phase 4)
  • Messaging lifecycle (Phase 5+)
  • Hermes Discord/Slack deep flow (Phase 6)
  • Security/credentials (Phase 7)

Cross-phase acceptance gates (apply to every phase)

  1. Setup gate — scenario contract declares environment, manifest or no-manifest reason, fixtures, runtime actions, assertions.
  2. No-cheat gate — preview/dry-run output cannot mark an audit row complete.
  3. Boundary gate — assertions touch the same SUT boundary as the legacy script.
  4. Evidence gate — every assertion emits an evidence path and stable assertion ID.
  5. Secret gate — no manifest, log, report, or fixture file contains raw secrets.
  6. Cleanup gate — fixtures that mutate host or repo state have restore/cleanup logic and tests.
  7. Audit completeness gate — every assigned audit row has owner, planned scenario/assertion, phase assignment, evidence status.
  8. Phase completion gate — phase complete only when every assigned row has executable evidence (or independent audit amendment).
  9. Executable assertion gate — completed scenarios point to concrete suite steps / assertion modules, not pendingStep(...), TODOs, generic probes, or prose.
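Parts of the evidence gate and the executable assertion gate can be enforced mechanically: a stable assertion ID, an evidence path, and a step body that is not a `pendingStep(...)` call or a TODO. A sketch with a hypothetical `AssertionRef` shape; generic probes and prose-only steps still need review, since a pattern match cannot detect them:

```typescript
// Hypothetical reference from a completed scenario to the code that proves it.
interface AssertionRef {
  assertionId: string;
  evidencePath: string;
  stepSource: string; // source text of the suite step / assertion module entry
}

// Markers of placeholder steps that cannot complete an audit row.
const NON_EXECUTABLE: Array<{ label: string; pattern: RegExp }> = [
  { label: "pendingStep(...)", pattern: /\bpendingStep\s*\(/ },
  { label: "TODO", pattern: /\bTODO\b/ },
];

// Returns gate violations; an empty list means the reference passes these checks.
function gateViolations(ref: AssertionRef): string[] {
  const out: string[] = [];
  if (!ref.assertionId.trim()) out.push("missing stable assertion ID");
  if (!ref.evidencePath.trim()) out.push("missing evidence path");
  for (const { label, pattern } of NON_EXECUTABLE) {
    if (pattern.test(ref.stepSource)) out.push(`non-executable step: ${label}`);
  }
  return out;
}
```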

Metadata


Assignees

No one assigned

    Labels

    area: e2e (End-to-end tests, nightly failures, or validation infrastructure)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
