Phase 1: Environment, Manifest, Fixture, and Runtime Action Primitives (E2E audit-coverage) #4347

@jyaunches

Parent epic: #3588

Goal

Establish the foundational primitives that the entire scenario E2E framework depends on. This phase introduces four things: the five-part scenario contract schema (environment, manifest or no-manifest reason, fixtures, runtime actions, assertions); a setup-only/host-only scenario type that does not require a NemoClawInstance manifest; the runtime action runner with ordered evidence; and reusable fake-service, state-staging, and dangerous-fixture-cleanup primitives that downstream phases consume.

This phase also delivers the audit-row coverage for legacy root scripts that are setup-heavy but assertion-light, where the entire test can be modeled as a hermetic / host-only scenario contract.
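The five-part contract can be pictured as a typed record in which the manifest slot is either a real manifest or an explicit no-manifest reason. The following is a minimal sketch only: every type name, field name, and the `"hermetic" | "host-only"` environment values are assumptions made for illustration, not the framework's actual schema.

```typescript
// Hypothetical shapes; names and fields are illustrative assumptions.
type ManifestDecl =
  | { kind: "manifest"; path: string }
  | { kind: "no-manifest"; reason: string };

interface ScenarioContract {
  id: string;
  environment: "hermetic" | "host-only";
  manifest: ManifestDecl;
  fixtures: string[];
  runtimeActions: string[];
  assertions: string[];
}

// Returns a list of problems; an empty list means the contract is well-formed.
function validateContract(c: ScenarioContract): string[] {
  const errors: string[] = [];
  if (c.id.trim() === "") errors.push("scenario ID must be non-empty");
  if (c.manifest.kind === "no-manifest" && c.manifest.reason.trim() === "") {
    errors.push("no-manifest scenarios must state a reason");
  }
  if (c.manifest.kind === "manifest" && c.manifest.path.trim() === "") {
    errors.push("manifest scenarios must give a manifest path");
  }
  if (c.assertions.length === 0) errors.push("at least one assertion is required");
  return errors;
}

// Example: a hermetic scenario that declares why it has no manifest.
const driftPreflight: ScenarioContract = {
  id: "gateway-drift-preflight",
  environment: "hermetic",
  manifest: { kind: "no-manifest", reason: "hermetic preflight; no product instance involved" },
  fixtures: ["fake-gateway-config"],
  runtimeActions: [],
  assertions: ["gateway-drift"],
};

console.log(validateContract(driftPreflight)); // no problems reported for this contract
```

Modeling the manifest slot as a tagged union means a scenario cannot silently omit both the manifest and the reason; it has to pick one branch.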

Audit rows in scope

| AQ | Phase | Legacy subject | Required coverage | Boundary | Planned scenario/assertion | Fixtures/actions |
| --- | --- | --- | --- | --- | --- | --- |
| AQ-001 | 1 | Setup and onboarding manifest audit section | Scenario contracts declare environment, manifest or no-manifest reason, fixtures, runtime actions, assertions, expected failures, and runner requirements. | host CLI | contract schema/resolver assertions | none |
| AQ-002 | 1 | test-gateway-drift-preflight.sh | Hermetic preflight contract detects gateway drift without requiring product manifest. | gateway | gateway drift assertion module | fake gateway/config fixture |
| AQ-003 | 1 | test-gateway-health-honest.sh | Gateway health scenario distinguishes honest healthy/unhealthy states and cannot pass from a generic HTTP probe alone. | gateway | gateway health-honest assertion module | fake gateway fixture |
| AQ-004 | 1 | test-openshell-version-pin.sh | Host-only scenario verifies expected OpenShell version pin behavior. | host CLI | openshell version assertion module | fake openshell CLI fixture |
| AQ-005 | 1 | test-onboard-inference-smoke.sh | Host-only/onboard smoke contract proves inference smoke setup without live provider secrets. | provider/integration | onboard inference smoke assertion module | fake provider fixture |
| AQ-006 | 1 | test-docs-validation.sh | Docs validation is represented as setup-only/host-only scenario with real command assertions. | host CLI | docs validation assertion module | docs fixture |
| AQ-007 | 1 | test-ollama-auth-proxy-e2e.sh setup requirements | Ollama auth proxy setup declares port, token, cleanup, and proxy fixture requirements. | provider/integration | ollama auth proxy setup contract | ollama/proxy fixture |
| AQ-008 | 1 | Dangerous fixtures | Docker daemon, /etc/hosts, policy, blueprint, image, and port mutations require cleanup/restore tests. | cleanup | fixture cleanup validator | mutation fixtures |
| AQ-009 | 1 | Runtime actions | Ordered lifecycle actions such as channels.add, inference.set, snapshot.create, rebuild, and upgrade emit ordered evidence. | host CLI | runtime action evidence assertions | action runner |

Required scenario contracts to add

Each lands with a stable scenario ID, no fake NemoClawInstance manifest, and real assertions emitting PASS:/FAIL: markers and evidence paths.

  • gateway-drift-preflight — hermetic, fake gateway/config fixture
  • gateway-health-honest — hermetic, fake gateway fixture, distinguishes honest healthy/unhealthy
  • installer-openshell-version-pin — host-only, fake OpenShell CLI fixture
  • onboard-inference-smoke — host-only, fake provider fixture
  • docs-validation — host-only, docs fixture
  • host-ollama-auth-proxy — host-only, Ollama/proxy fixture, declares port/token/cleanup
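As a rough illustration of the marker requirement above, an assertion result could be rendered into a single `PASS:` or `FAIL:` line that carries the stable assertion ID and the evidence path. Only the `PASS:`/`FAIL:` prefixes come from this issue; the rest of the line format, the `AssertionResult` shape, and the example ID and path are assumptions.

```typescript
// Hypothetical result shape; field names are illustrative.
interface AssertionResult {
  assertionId: string;   // stable across runs
  passed: boolean;
  evidencePath: string;  // where the supporting artifact was written
  detail?: string;
}

// Renders one marker line; the layout after the prefix is an assumed format.
function formatMarker(r: AssertionResult): string {
  const status = r.passed ? "PASS:" : "FAIL:";
  const detail = r.detail ? ` (${r.detail})` : "";
  return `${status} ${r.assertionId} evidence=${r.evidencePath}${detail}`;
}

console.log(
  formatMarker({
    assertionId: "gateway-drift.detects-drift",
    passed: true,
    evidencePath: "artifacts/gateway-drift/evidence.json",
  }),
);
```

Keeping the ID and evidence path on the same line as the marker makes it possible to check evidence completeness by scanning workflow logs alone.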

Required primitives to add

Fixtures (test/e2e-scenario/nemoclaw_scenarios/fixtures/)

  • Fake services lifecycle: OpenAI-compatible, Bedrock-compatible, Kimi, Discord Gateway, Slack API, Telegram, model router
  • Fake CLI/client fixtures: fake OpenShell, fake Docker, fake installer/download tools
  • State staging: ~/.nemoclaw/sandboxes.json, onboard-session.json, legacy credentials.json, provider records
  • Port holders + port probes
  • Old image fixtures: OpenClaw / Hermes / rebuild / upgrade
  • Crash shim fixture for openshell-gateway
  • Cleanup/restore obligations declared and tested for every dangerous fixture (Docker daemon mutation, /etc/hosts, blueprint, policy, image, port mutations)
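One way to keep state staging hermetic is to write the staged file under a throwaway home directory and hand back a cleanup function alongside it. The sketch below assumes a hypothetical `stageState` helper and a placeholder file body; the real `sandboxes.json` schema is not described in this issue.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

interface StagedState {
  home: string;        // temporary stand-in for the user's home directory
  filePath: string;    // absolute path of the staged file
  cleanup: () => void; // removes everything that was staged
}

// Stages a JSON file under a fresh temp directory rather than the real ~/.nemoclaw.
function stageState(relativePath: string, content: unknown): StagedState {
  const home = fs.mkdtempSync(path.join(os.tmpdir(), "nemoclaw-home-"));
  const filePath = path.join(home, relativePath);
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  fs.writeFileSync(filePath, JSON.stringify(content, null, 2));
  return {
    home,
    filePath,
    cleanup: () => fs.rmSync(home, { recursive: true, force: true }),
  };
}

// Placeholder content only; the actual file schema is an unknown here.
const staged = stageState(".nemoclaw/sandboxes.json", { sandboxes: [] });
try {
  console.log(fs.existsSync(staged.filePath));
} finally {
  staged.cleanup(); // cleanup runs even if the scenario body throws
}
```

Returning the cleanup function from the same call that performs the mutation makes it harder for a fixture to exist without a restore path.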

Runtime actions (test/e2e-scenario/nemoclaw_scenarios/runtime-actions/)

  • Runtime action runner with ordered evidence emission
  • Action primitives stubbed for downstream phases: channels.add, inference.set, snapshot.create, rebuild, upgrade
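A runner that emits ordered evidence can be as small as a loop that appends one record per declared action and lets later handlers read earlier records. This is a sketch under assumptions: `runActions`, `ActionEvidence`, the handler signature, and the stub outputs are all hypothetical, and only the action names come from this issue.

```typescript
interface ActionEvidence {
  index: number;   // position in declaration order
  action: string;
  output: unknown;
}

// A handler may inspect evidence from actions that ran before it.
type ActionHandler = (prior: readonly ActionEvidence[]) => unknown;

function runActions(
  declared: readonly string[],
  handlers: Record<string, ActionHandler>,
): ActionEvidence[] {
  const evidence: ActionEvidence[] = [];
  for (const action of declared) {
    const handler = handlers[action];
    if (!handler) throw new Error(`no handler registered for runtime action "${action}"`);
    evidence.push({ index: evidence.length, action, output: handler(evidence) });
  }
  return evidence;
}

// Stub handlers standing in for real primitives; outputs are made up.
const stubs: Record<string, ActionHandler> = {
  "channels.add": () => ({ channel: "stub-channel" }),
  "inference.set": () => ({ provider: "stub-provider" }),
  "snapshot.create": (prior) => ({ snapshotOf: prior.map((e) => e.action) }),
};

const recorded = runActions(["channels.add", "inference.set", "snapshot.create"], stubs);
console.log(recorded.map((e) => `${e.index}:${e.action}`));
```

Because the runner is sequential and appends as it goes, evidence order matches declaration order by construction, and an undeclared or unregistered action fails loudly instead of being skipped.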

Assertion modules (test/e2e-scenario/validation_suites/assert/)

  • Gateway drift assertion module
  • Gateway health-honest assertion module
  • OpenShell version assertion module
  • Onboard inference smoke assertion module
  • Docs validation assertion module
  • Ollama auth proxy setup contract assertion module
  • Fixture cleanup validator
  • Runtime action evidence assertion module
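To illustrate what "health-honest" might mean for an assertion module, the sketch below refuses to report a gateway as healthy on the strength of an HTTP 200 alone and requires the response body to self-report its state. The payload shape (a `status` string with `"ok"` meaning healthy) and the three-way verdict are assumptions for illustration; the real gateway health payload is not specified in this issue.

```typescript
interface ProbeResult {
  httpStatus: number;
  body: unknown; // parsed response body, if any
}

type HealthVerdict = "healthy" | "unhealthy" | "inconclusive";

// A bare 2xx is treated as inconclusive: the gateway must say it is healthy.
function classifyHealth(probe: ProbeResult): HealthVerdict {
  if (probe.httpStatus < 200 || probe.httpStatus >= 300) return "unhealthy";
  const body = probe.body;
  if (typeof body !== "object" || body === null) return "inconclusive";
  const status = (body as Record<string, unknown>)["status"];
  if (typeof status !== "string") return "inconclusive";
  return status === "ok" ? "healthy" : "unhealthy";
}

console.log(classifyHealth({ httpStatus: 200, body: "OK" }));               // generic probe only
console.log(classifyHealth({ httpStatus: 200, body: { status: "ok" } }));   // explicit self-report
```

An assertion built on this would pass only on `"healthy"`, so a fake gateway fixture that answers 200 with an unrelated body cannot satisfy it.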

Validation scenarios — all must pass in PR workflow artifacts

Scenario 1.1 — Host-only hermetic scripts run without product manifests (Happy Path)

  • Given Gateway drift, gateway health honest, OpenShell version pin, docs validation, or Ollama auth proxy host-only scenarios declare explicit no-manifest reasons.
  • When Scenario resolution and preview run.
  • Then The resolved contract includes environment, fixtures, runtime actions if needed, real assertions, and no fake NemoClawInstance manifest.
  • Steps
    1. Filter audit rows AQ-001…AQ-009 for Phase 1; collect the PR workflow evidence report for the planned contract/schema/assertion modules.
    2. Run npx vitest run test/e2e-scenario/framework-tests, exactly as the PR workflow runs it.
    3. Verify each matched audit row is present in workflow evidence with stable assertion IDs, artifact paths, and passing status; verify no-manifest reason is present and scenario is not blocked by missing product manifest.
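Step 3 above amounts to a coverage check over the audit rows. A minimal sketch, assuming a hypothetical `EvidenceRecord` shape for entries in the workflow evidence report (the report's real format is not given in this issue):

```typescript
// Hypothetical evidence entry; field names are illustrative.
interface EvidenceRecord {
  auditRow: string;      // e.g. "AQ-002"
  assertionId: string;
  artifactPath: string;
  status: "pass" | "fail";
}

// Builds row IDs such as AQ-001 ... AQ-009.
function auditRowIds(first: number, last: number): string[] {
  const ids: string[] = [];
  for (let n = first; n <= last; n++) ids.push("AQ-" + ("00" + n).slice(-3));
  return ids;
}

// A row counts as covered only if some record passes and carries both an ID and a path.
function uncoveredRows(rows: readonly string[], evidence: readonly EvidenceRecord[]): string[] {
  return rows.filter((row) => {
    const complete = evidence.some(
      (e) =>
        e.auditRow === row &&
        e.status === "pass" &&
        e.assertionId !== "" &&
        e.artifactPath !== "",
    );
    return !complete;
  });
}

console.log(auditRowIds(1, 9));
```

Calling `uncoveredRows(auditRowIds(1, 9), report)` on a parsed report would list every Phase 1 row that still lacks passing evidence.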

Scenario 1.2 — Dangerous fixtures cannot omit cleanup (Sad Path)

  • Given A fixture mutates Docker daemon config, /etc/hosts, policies, blueprint files, images, or ports.
  • When Fixture validation runs.
  • Then Validation fails unless cleanup/restore obligations and tests are declared.
  • Steps
    1. Filter row AQ-008; create mutation fixture metadata with cleanup omitted.
    2. Run fixture validation as the PR workflow would.
    3. Verify validation fails, names the mutation type and missing cleanup obligation, and keeps AQ-008 unresolved.
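The sad path above could be enforced by a validator that rejects any mutating fixture whose metadata lacks a restore step or a cleanup test, and that names both the mutation type and the missing obligation. The `FixtureMetadata` shape, the mutation type labels, and the example paths are assumptions for illustration.

```typescript
type MutationType = "docker-daemon" | "etc-hosts" | "policy" | "blueprint" | "image" | "port";

// Hypothetical fixture metadata; field names are illustrative.
interface FixtureMetadata {
  name: string;
  mutates: MutationType[];
  cleanup?: { restore: string; test: string };
}

// Returns one error per (mutation type, missing obligation) pair.
function validateFixture(f: FixtureMetadata): string[] {
  const missing: string[] = [];
  if (!f.cleanup || f.cleanup.restore.trim() === "") missing.push("restore");
  if (!f.cleanup || f.cleanup.test.trim() === "") missing.push("test");

  const errors: string[] = [];
  for (const mutation of f.mutates) {
    for (const obligation of missing) {
      errors.push(`fixture "${f.name}" mutates ${mutation} but declares no cleanup ${obligation}`);
    }
  }
  return errors;
}

// Cleanup omitted on purpose: validation should fail and name etc-hosts.
console.log(validateFixture({ name: "hosts-override", mutates: ["etc-hosts"] }));
```

A fixture that mutates nothing produces no errors, so the validator only burdens the dangerous cases.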

Scenario 1.3 — Runtime action evidence preserves declared order (Happy Path)

  • Given A scenario declares ordered runtime actions such as channels.add, inference.set, snapshot.create, rebuild.
  • When The runtime action runner creates a plan or executes hermetically.
  • Then Evidence records appear in declaration order and assertions can reference action outputs.
  • Steps
    1. Filter row AQ-009; collect PR workflow evidence for scenarios with multiple runtime actions.
    2. Run runtime action planner/runner tests as workflow jobs.
    3. Verify AQ-009 is present in workflow evidence with stable assertion IDs/artifact paths and ordering/output-dependency assertions pass.
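The ordering part of step 3 can be expressed as a comparison between the declared action list and the action names found in the evidence records, reporting the first position where they diverge. `checkEvidenceOrder` and its result shape are hypothetical names used only for this sketch.

```typescript
interface OrderCheck {
  ok: boolean;
  message: string;
}

// Compares declared and recorded action names position by position.
function checkEvidenceOrder(
  declared: readonly string[],
  recorded: readonly string[],
): OrderCheck {
  const length = Math.max(declared.length, recorded.length);
  for (let i = 0; i < length; i++) {
    if (declared[i] !== recorded[i]) {
      const want = declared[i] ?? "(none)";
      const got = recorded[i] ?? "(none)";
      return { ok: false, message: `position ${i}: declared ${want}, recorded ${got}` };
    }
  }
  return { ok: true, message: `${declared.length} actions recorded in declaration order` };
}

const declaredActions = ["channels.add", "inference.set", "snapshot.create", "rebuild"];
console.log(checkEvidenceOrder(declaredActions, declaredActions).message);
```

Using the longer of the two lengths means missing or extra evidence records are reported as a mismatch rather than ignored.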

Acceptance criteria — issue is NOT DONE until ALL are true

  1. PR landed in test/e2e-scenario/ adding all scenario contracts, fixtures, runtime actions, and assertion modules listed above.
  2. PR CI passing:
    • npx vitest run test/e2e-scenario/framework-tests — passes
    • All 6 hermetic / host-only scenarios run in workflow and emit PASS: markers with stable assertion IDs
  3. Validation Scenarios 1.1–1.3 all pass in PR workflow artifacts (Given/When/Then above).
  4. Audit work queue updated: rows AQ-001 through AQ-009 in docs/e2e-audit-work-queue.md flipped from not-started to evidence-complete, with stable assertion IDs and evidence artifact paths populated.
  5. Phase-specific validation gate (from spec): host-only/hermetic scenarios do not require fake product manifests; fixture setup/teardown tested without live cloud secrets; dangerous fixtures include cleanup/restore tests; the four named scripts (test-gateway-drift-preflight.sh, test-gateway-health-honest.sh, test-openshell-version-pin.sh, test-onboard-inference-smoke.sh) represented as hermetic scenario contracts with real assertions.
  6. No-cheat gate: no row marked complete by preview/dry-run/metadata-only output.
  7. Cleanup gate: every dangerous fixture has restore logic + test (AQ-008).
  8. Secret gate: no raw secrets, fake bad keys, or assertion IDs in any product-facing manifest.
  9. PR description references AQ rows covered, links this issue.

Out of scope

  • Product manifest expansion (Phase 2+)
  • Onboarding/install flow assertions (Phase 2)
  • Provider/routing/inference assertions (Phase 3)
  • Legacy script deletion (Phase 11+)

Dependencies

  • None. This is the foundation phase; everything else waits on it.

Cross-phase acceptance gates (apply to every phase)

  1. Setup gate — scenario contract declares environment, manifest or no-manifest reason, fixtures, runtime actions, assertions.
  2. No-cheat gate — preview/dry-run output cannot mark an audit row complete.
  3. Boundary gate — assertions touch the same SUT boundary as the legacy script.
  4. Evidence gate — every assertion emits an evidence path and stable assertion ID.
  5. Secret gate — no manifest, log, report, or fixture file contains raw secrets.
  6. Cleanup gate — fixtures that mutate host or repo state have restore/cleanup logic and tests.
  7. Audit completeness gate — every assigned audit row has owner, planned scenario/assertion, phase assignment, evidence status.
  8. Phase completion gate — phase complete only when every assigned row has executable evidence (or independent audit amendment).
  9. Executable assertion gate — completed scenarios point to concrete suite steps / assertion modules, not pendingStep(...), TODOs, generic probes, or prose.
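Several of these gates can be combined into one check that decides whether an audit row may be marked complete. The sketch below covers the no-cheat, executable assertion, and evidence gates only; the `RowEvidence` shape and the `mode` values are assumptions about how evidence might be labeled, not the framework's actual model.

```typescript
type EvidenceMode = "executed" | "preview" | "dry-run" | "metadata-only";

// Hypothetical per-row evidence summary; field names are illustrative.
interface RowEvidence {
  auditRow: string;
  mode: EvidenceMode;
  assertionId: string;
  evidencePath: string;
  pending: boolean; // true while the step is still a pendingStep(...) placeholder or TODO
}

interface GateDecision {
  allowed: boolean;
  reason: string;
}

function canMarkComplete(e: RowEvidence): GateDecision {
  if (e.mode !== "executed") {
    return { allowed: false, reason: `no-cheat gate: ${e.mode} output is not evidence` };
  }
  if (e.pending) {
    return { allowed: false, reason: "executable assertion gate: step is still pending" };
  }
  if (e.assertionId === "" || e.evidencePath === "") {
    return { allowed: false, reason: "evidence gate: missing assertion ID or evidence path" };
  }
  return { allowed: true, reason: "all checked gates satisfied" };
}

console.log(
  canMarkComplete({
    auditRow: "AQ-009",
    mode: "dry-run",
    assertionId: "runtime-actions.order",
    evidencePath: "artifacts/runtime-actions/order.json",
    pending: false,
  }).reason,
);
```

Returning a reason with each refusal makes the failing gate visible in workflow output instead of leaving a row silently unflipped.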

    Labels

    area: e2e (End-to-end tests, nightly failures, or validation infrastructure)