The target architecture has shifted from YAML-defined E2E scenarios to a hybrid model:
Onboarding configuration YAML is product-facing desired setup/onboarding state. It should be backup/update-friendly and useful for materializing a NemoClaw instance outside of tests.
E2E scenarios are deterministic typed builders in code. They define stable scenario IDs, matrix combinations, and assertion group composition.
Assertions are logical reusable code modules, not YAML. Scenario builders compose assertion groups, and --plan-only shows the expanded assertion list before execution.
Phase orchestrators own phase-local actions and assertions: environment, onboarding, and runtime.
Shared E2E clients/adapters wrap real NemoClaw product boundaries for reusable act/observe primitives.
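As a rough illustration of the hybrid model described above, the sketch below shows how a typed scenario builder might compose assertion groups and expand them into the list a --plan-only preview would print. Every type, function, assertion ID other than base.cli.installed, and the onboarding YAML path is hypothetical and chosen for illustration; none of them are taken from the NemoClaw codebase.

```typescript
// Hypothetical types for illustration; none of these names come from the NemoClaw codebase.
type Phase = "environment" | "onboarding" | "runtime";

interface Assertion {
  id: string; // stable assertion ID
  phase: Phase;
}

interface AssertionGroup {
  id: string;
  assertions: Assertion[];
}

interface Scenario {
  id: string; // stable scenario ID
  onboardingConfig: string; // path to the product-facing onboarding YAML (assumed layout)
  groups: AssertionGroup[];
}

const PHASE_ORDER: Phase[] = ["environment", "onboarding", "runtime"];

// Expand the composed groups into the flat, phase-ordered list a plan preview would print.
// Nothing is executed; an assertion ID that appears in several groups is listed once.
function expandAssertions(scenario: Scenario): string[] {
  const seen = new Set<string>();
  const lines: string[] = [];
  for (const phase of PHASE_ORDER) {
    for (const group of scenario.groups) {
      for (const assertion of group.assertions) {
        if (assertion.phase !== phase || seen.has(assertion.id)) continue;
        seen.add(assertion.id);
        lines.push(`${phase}: ${assertion.id}`);
      }
    }
  }
  return lines;
}

const cliInstalled: AssertionGroup = {
  id: "environment.cli",
  assertions: [{ id: "base.cli.installed", phase: "environment" }],
};

const providerConfigured: AssertionGroup = {
  id: "onboarding.provider",
  assertions: [
    { id: "onboarding.provider.configured", phase: "onboarding" },
    { id: "base.cli.installed", phase: "environment" }, // repeated on purpose
  ],
};

const scenario: Scenario = {
  id: "ubuntu-repo-cloud-openclaw",
  onboardingConfig: "onboarding/cloud-openclaw.yaml", // hypothetical path
  groups: [providerConfigured, cliInstalled],
};

const planPreview = expandAssertions(scenario);
console.log(planPreview.join("\n"));
```

Because the expansion is a pure function of the scenario definition, the same scenario ID always yields the same ordered assertion list, which is the property the plan preview relies on.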
%%{init: {"flowchart": {"htmlLabels": true, "nodeSpacing": 70, "rankSpacing": 95, "curve": "basis"}}}%%
flowchart LR
%% NemoClaw E2E architecture — hybrid scenario builders + onboarding manifests
classDef yaml fill:#f8fafc,stroke:#475569,stroke-width:2px,color:#0f172a
classDef builder fill:#eef8e8,stroke:#76B900,stroke-width:3px,color:#10220a
classDef module fill:#eff6ff,stroke:#2563eb,stroke-width:2px,color:#102040
classDef orch fill:#f0fdf4,stroke:#16a34a,stroke-width:2px,color:#052e16
classDef client fill:#f5f3ff,stroke:#7c3aed,stroke-width:2px,color:#24103f
classDef sut fill:#fff7ed,stroke:#ea580c,stroke-width:2px,color:#431407
classDef state fill:#ecfeff,stroke:#0891b2,stroke-width:2px,color:#083344
classDef output fill:#dcfce7,stroke:#15803d,stroke-width:3px,color:#052e16
classDef note fill:#ffffff,stroke:#334155,stroke-width:1.5px,color:#0f172a
%% ----------------------------------------------------------------------
%% 1. Inputs
%% ----------------------------------------------------------------------
subgraph C1["1. Inputs"]
direction TB
Manifest["<b>Onboarding configuration YAML</b><br/>Product-facing desired setup, not an E2E scenario<br/><br/>• install/runtime choices<br/>• agent/provider/model route<br/>• policy/messaging/lifecycle<br/>• durable refs for backup/update"]:::yaml
Scenarios["<b>Deterministic scenario builders</b><br/>E2E scenarios are typed code<br/><br/>• stable scenario IDs<br/>• environment/onboarding combinations<br/>• matrix rules<br/>• GitHub targeted execution"]:::builder
Assertions["<b>Assertion modules</b><br/>Logical reusable groups in code, not YAML<br/><br/>• environment groups<br/>• onboarding groups<br/>• runtime/domain groups<br/>• stable IDs + evidence output"]:::module
end
%% ----------------------------------------------------------------------
%% 2. Compile / Preview
%% ----------------------------------------------------------------------
subgraph C2["2. Compile / Preview"]
direction TB
Compiler["<b>Plan compiler</b><br/>Combines builder + onboarding YAML<br/><br/>• loads manifest<br/>• resolves selected scenario<br/>• expands assertion groups<br/>• validates phase compatibility"]:::orch
Plan["<b>Plan preview / run plan</b><br/>Visible before execution<br/><br/>• setup/onboarding actions<br/>• ordered phases<br/>• expanded assertion list<br/>• selected SUT boundaries"]:::state
end
%% ----------------------------------------------------------------------
%% 3. Phase-owned execution
%% ----------------------------------------------------------------------
subgraph C3["3. Phase-owned Execution"]
direction TB
Runner["<div style='min-width:760px'><b>E2E runner</b><br/>Coordinates the full run: orders phases, delegates to every phase orchestrator, passes prior phase results forward, aggregates final results</div>"]:::orch
subgraph PhaseOrchestrators["Managed phase orchestrators"]
direction LR
EnvPhase["<b>Environment Orchestrator</b><br/>Runs setup actions<br/>Runs environment assertions<br/>Emits environment.result"]:::orch
OnboardPhase["<b>Onboarding Orchestrator</b><br/>Consumes onboarding config from YAML<br/>Runs onboarding setup/decisions<br/>Runs onboarding assertions<br/>Emits onboarding.result"]:::orch
RuntimePhase["<b>Runtime Orchestrator</b><br/>Runs runtime actions/suites<br/>Runs runtime assertions<br/>Emits runtime.result"]:::orch
end
PhaseAccess["<b>Phase act/observe requests</b><br/>All phase access to NemoClaw goes through shared clients"]:::state
Runner --> EnvPhase
Runner -- "onboarding setup / decisions" --> OnboardPhase
Runner --> RuntimePhase
EnvPhase --> PhaseAccess
OnboardPhase --> PhaseAccess
RuntimePhase --> PhaseAccess
end
%% ----------------------------------------------------------------------
%% 4. Access layer
%% ----------------------------------------------------------------------
subgraph C4["4. Access Layer"]
direction TB
Clients["<b>Shared E2E clients / adapters</b><br/>Framework wrappers around product boundaries<br/><br/>• HostCliClient<br/>• GatewayClient<br/>• SandboxClient<br/>• AgentClient<br/>• ProviderClient<br/>• StateClient<br/><br/><i>Clients expose act/observe primitives;<br/>phases decide workflow and pass/fail meaning.</i>"]:::client
end
%% ----------------------------------------------------------------------
%% 5. System Under Test
%% ----------------------------------------------------------------------
subgraph C5["5. System Under Test"]
direction TB
Host["<b>Host Control Plane</b><br/>NemoClaw CLI<br/>install/update scripts<br/>local config/state<br/>Docker/image/cache"]:::sut
Gateway["<b>OpenShell Gateway</b><br/>process/API<br/>credential store / broker boundary<br/>inference routing<br/>policy/proxy enforcement<br/>sandbox lifecycle API"]:::sut
Sandbox["<b>Sandbox Runtime</b><br/>container boundary<br/>workspace mount<br/>env / CA / proxy config<br/>generated agent config<br/>logs/files"]:::sut
Agent["<b>Agent Runtime</b><br/>OpenClaw or Hermes<br/>plugins/tools<br/>agent home/config/state<br/>agent behavior surface"]:::sut
Providers["<b>Provider / Integration Plane</b><br/>NVIDIA · Ollama · compatible API<br/>Slack · Discord · Telegram<br/>Brave/web/search<br/>managed/brokered gateways"]:::sut
Durable["<b>Durable State Boundary</b><br/>backup/update-relevant state<br/>config snapshots<br/>credential metadata, not raw secrets<br/>workspace refs<br/>image/runtime versions"]:::sut
Host -- "starts/configures" --> Gateway
Gateway -- "creates/manages" --> Sandbox
Sandbox -- "runs" --> Agent
Agent -- "calls through routing/policy" --> Providers
Host -- "contributes state" --> Durable
Gateway -- "contributes state" --> Durable
Sandbox -- "contributes state" --> Durable
Agent -- "contributes state" --> Durable
end
%% ----------------------------------------------------------------------
%% 6. Outputs
%% ----------------------------------------------------------------------
subgraph C6["6. Outputs"]
direction TB
PhaseResults["<b>Phase results</b><br/>environment.result<br/>onboarding.result<br/>runtime.result"]:::state
Result["<b>result.yaml</b><br/>observed outcome<br/>assertion summaries<br/>artifact pointers<br/>failure layer"]:::output
Reports["<b>Human reports</b><br/>plan preview<br/>GitHub Step Summary<br/>operator notes"]:::output
Backup["<b>Future backup / update workflow</b><br/>onboarding YAML + observed result<br/>state diff<br/>restore / migration / update validation"]:::output
PhaseResults --> Result --> Reports
Result --> Backup
end
%% Main flow: keep lines mostly horizontal and non-overlapping.
Manifest -- "desired setup/onboarding config" --> Compiler
Scenarios -- "selected scenario ID / matrix rule" --> Compiler
Assertions -- "assertion groups" --> Compiler
Compiler -- "compile" --> Plan
Plan -- "execute" --> Runner
RuntimePhase -- "runtime.result" --> PhaseResults
%% Access flow.
PhaseAccess -- "act/observe requests" --> Clients
Clients -- "wraps" --> Host
Clients -- "wraps" --> Gateway
Clients -- "wraps" --> Sandbox
Clients -- "wraps" --> Agent
Clients -- "wraps" --> Providers
Clients -- "wraps" --> Durable
%% Onboarding YAML drives onboarding decisions; result YAML supports future backup/update.
Durable -- "observed durable state" --> Backup
%% Guardrails kept at bottom to avoid crossing the main flow.
G1["<b>Architectural Note</b><br/>YAML describes setup/onboarding desired state; it is not the test scenario."]:::note
G2["<b>Architectural Note</b><br/>Scenarios and assertion composition are deterministic code."]:::note
G3["<b>Architectural Note</b><br/>Phase orchestrators own phase assertions; clients only wrap SUT boundaries."]:::note
Manifest -- "clarifies" --> G1
Scenarios -- "clarifies" --> G2
Assertions -- "clarifies" --> G2
Clients -- "clarifies" --> G3
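The split between phase orchestrators and shared clients in the diagram can be sketched roughly as follows. This is a simplified model under stated assumptions: BoundaryClient, InMemoryClient, runPhase, runAll, and the onboarding assertion IDs are hypothetical names, the client is an in-memory fake rather than a wrapper around a real product boundary, and stopping at the first failing phase is an assumption rather than documented runner behavior.

```typescript
// Hypothetical sketch: clients expose act/observe, phases decide pass/fail meaning.
type PhaseName = "environment" | "onboarding" | "runtime";

interface PhaseResult {
  phase: PhaseName;
  passed: string[];
  failed: string[];
}

// A client only performs actions and reports observations; it never judges them.
interface BoundaryClient {
  act(action: string): void;
  observe(query: string): boolean;
}

interface Check {
  id: string; // stable assertion ID
  query: string; // what to observe through the client
}

interface PhaseOrchestrator {
  phase: PhaseName;
  actions: string[];
  checks: Check[];
}

// In-memory stand-in for a real client: observing an action name is true once it ran.
class InMemoryClient implements BoundaryClient {
  private done = new Set<string>();
  act(action: string): void {
    this.done.add(action);
  }
  observe(query: string): boolean {
    return this.done.has(query);
  }
}

// A phase runs its own actions, then its own assertions, and emits one result.
function runPhase(o: PhaseOrchestrator, client: BoundaryClient): PhaseResult {
  for (const action of o.actions) client.act(action);
  const result: PhaseResult = { phase: o.phase, passed: [], failed: [] };
  for (const check of o.checks) {
    const bucket = client.observe(check.query) ? result.passed : result.failed;
    bucket.push(check.id);
  }
  return result;
}

// The runner orders phases and aggregates results.
// Assumption: a failing phase stops the run so later phases never see a broken setup.
function runAll(orchestrators: PhaseOrchestrator[], client: BoundaryClient): PhaseResult[] {
  const results: PhaseResult[] = [];
  for (const o of orchestrators) {
    const r = runPhase(o, client);
    results.push(r);
    if (r.failed.length > 0) break;
  }
  return results;
}

const results = runAll(
  [
    {
      phase: "environment",
      actions: ["install-cli"],
      checks: [{ id: "base.cli.installed", query: "install-cli" }],
    },
    {
      phase: "onboarding",
      actions: ["onboard"],
      checks: [
        { id: "onboarding.completed", query: "onboard" },
        { id: "onboarding.policy.selected", query: "select-policy" }, // never performed
      ],
    },
    { phase: "runtime", actions: ["start-agent"], checks: [] },
  ],
  new InMemoryClient(),
);
console.log(JSON.stringify(results, null, 2));
```

In this run the onboarding phase reports one failed assertion, so the runtime phase is never started and the aggregated output contains two phase results.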
Overview & Objectives
NemoClaw's scenario-based E2E migration has reached the point where live execution is exposing real setup, onboarding, and feature-validation failures. The current framework is directionally correct, but it still treats a "scenario" as a single combined unit: platform + install + runtime + onboarding choices + expected state + post-onboard suites. That makes the matrix hard to expand, hard to report, and hard to use for coverage-gap discovery.
This specification restructures the E2E model into explicit layers:
The current model already has useful structure, but there are several gaps:
Scenario IDs hide layer boundaries. ubuntu-repo-cloud-openclaw includes base setup and onboarding in one name.
Base setup cannot be reported independently. There is no direct answer to "which install methods run on which platforms before onboarding?"
Onboarding choices are not matrixed cleanly. Provider, agent, endpoint, messaging, policy, and lifecycle variants are embedded in profiles or deferred to future scenarios.
Onboarding assertions are under-modeled. The runner validates final state and then suites run, but there is no explicit onboarding-stage assertion group for prompts, provider config, credential placement, policy selection, or resume/repair/double-onboard behavior.
Post-onboard suites are currently thin. The present suite list covers smoke, cloud inference, credentials-present, local Ollama checks, Ollama proxy, platform smoke, and Hermes health.
Parity gaps are large and not yet organized by layer. Current parity-map status counts are approximately:
mapped: 165
deferred: 1642
retired: 125
Deferred parity assertions are visible but not yet actionable enough. They need to be classified as base setup, onboarding flow, expected state, post-onboard suite, negative/failure mode, or retire.
GitHub visibility is incomplete. Parity compare uploads JSON and logs as artifacts, but does not currently publish a concise report to $GITHUB_STEP_SUMMARY.
High-value deferred areas
The largest deferred areas in test/e2e/docs/parity-map.yaml currently include:
| Legacy area | Deferred assertions | Likely layer |
| --- | --- | --- |
| test-messaging-providers.sh | 108 | onboarding + post-onboard messaging |
| test-double-onboard.sh | 81 | onboarding lifecycle |
| test-shields-config.sh | 78 | onboarding security + post-onboard security |
| test-sandbox-survival.sh | 71 | post-onboard lifecycle |
| test-gpu-e2e.sh | 60 | base GPU + local inference |
| test-ollama-auth-proxy-e2e.sh | 59 | onboarding/provider + post-onboard proxy |
| test-token-rotation.sh | 55 | onboarding lifecycle + messaging |
| test-gpu-double-onboard.sh | 54 | base GPU + onboarding lifecycle |
| test-credential-sanitization.sh | 50 | onboarding security + post-onboard security |
| test-inference-routing.sh | 49 | onboarding/provider + post-onboard inference |
| test-hermes-e2e.sh | 48 | onboarding + Hermes feature checks |
| test-onboard-resume.sh | 48 | onboarding lifecycle |
| test-onboard-repair.sh | 46 | onboarding lifecycle |
These counts are not a one-to-one list of tests to write. They are extracted legacy assertions that must be mapped, consolidated, implemented, gated, or retired.
Architecture Design
Conceptual entities
1. Base environment scenarios
A base environment scenario describes what exists before onboarding decisions are applied.
This avoids breaking current workflow dispatches while moving the source of truth to layered test plans.
4. Onboarding-stage assertions
Onboarding assertions run after install/onboard operations and before post-onboard feature suites. They are distinct from post-onboard suites because they validate setup decisions and state transitions.
Feature suites consume the context produced by base setup and onboarding. They must not install, onboard, mutate onboarding choices, or rediscover scenario state except through $E2E_CONTEXT_DIR/context.env.
Suites continue to declare requires_state and are selected by each test plan.
Updated runner flow
flowchart TD
A[run-scenario.sh plan-id or legacy alias] --> B[Resolve alias]
B --> C[Load base_scenarios]
C --> D[Load onboarding_profiles]
D --> E[Load test_plans]
E --> F[Validate base + onboarding compatibility]
F --> G[Validate onboarding assertions]
G --> H[Validate suite requires_state]
H --> I[Print layered plan]
I --> J[Run base setup / install]
J --> K[Run onboarding profile]
K --> L[Emit context.env]
L --> M[Run onboarding-stage assertions]
M --> N[Validate expected state]
N --> O[Run post-onboard suites]
O --> P[Emit coverage + parity + gap reports]
Compatibility rules
The resolver must fail fast with clear messages when:
a test plan references a missing base scenario
a test plan references a missing onboarding profile
a test plan references a missing expected state
a test plan references a missing onboarding assertion
a test plan references a missing suite
a suite requires_state key is incompatible with the selected expected state
an onboarding profile requires a runner/secret not available through the base plan
a negative base scenario is combined with a positive onboarding profile without expected_failure
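The reference checks in the list above could be sketched as a fail-fast validator along these lines. It covers only the five missing-reference rules; the requires_state, runner/secret, and expected_failure rules need metadata that is not shown here. TestPlan, Catalog, validatePlan, and the IDs cloud-openclaw-ready, onboarding.provider.configured, and hermes-health are hypothetical, and the real resolver schema may look different.

```typescript
// Hypothetical shapes; the real resolver schema may differ.
interface TestPlan {
  id: string;
  base: string;
  onboarding: string;
  expectedState: string;
  onboardingAssertions: string[];
  suites: string[];
}

interface Catalog {
  baseScenarios: Set<string>;
  onboardingProfiles: Set<string>;
  expectedStates: Set<string>;
  onboardingAssertions: Set<string>;
  suites: Set<string>;
}

// Throw on the first unknown reference, naming the plan, the kind, and the missing ID.
function requireKnown(planId: string, kind: string, id: string, known: Set<string>): void {
  if (!known.has(id)) {
    throw new Error(`test plan "${planId}" references missing ${kind} "${id}"`);
  }
}

function validatePlan(plan: TestPlan, catalog: Catalog): void {
  requireKnown(plan.id, "base scenario", plan.base, catalog.baseScenarios);
  requireKnown(plan.id, "onboarding profile", plan.onboarding, catalog.onboardingProfiles);
  requireKnown(plan.id, "expected state", plan.expectedState, catalog.expectedStates);
  for (const id of plan.onboardingAssertions) {
    requireKnown(plan.id, "onboarding assertion", id, catalog.onboardingAssertions);
  }
  for (const id of plan.suites) {
    requireKnown(plan.id, "suite", id, catalog.suites);
  }
}

const catalog: Catalog = {
  baseScenarios: new Set(["ubuntu-repo-docker"]),
  onboardingProfiles: new Set(["cloud-nvidia-openclaw"]),
  expectedStates: new Set(["cloud-openclaw-ready"]),
  onboardingAssertions: new Set(["onboarding.provider.configured"]),
  suites: new Set(["smoke"]),
};

const plan: TestPlan = {
  id: "ubuntu-repo-docker__cloud-nvidia-openclaw",
  base: "ubuntu-repo-docker",
  onboarding: "cloud-nvidia-openclaw",
  expectedState: "cloud-openclaw-ready",
  onboardingAssertions: ["onboarding.provider.configured"],
  suites: ["smoke", "hermes-health"], // "hermes-health" is not in the catalog
};

let failure = "";
try {
  validatePlan(plan, catalog);
} catch (error) {
  failure = (error as Error).message;
}
console.log(failure);
```

Throwing on the first problem keeps the message tied to a single fix; a variant that collects every error before failing would suit a lint-style report better.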
Gap classification model
Extend parity metadata so every deferred assertion has a layer classification:
- legacy: "NemoClaw installed"
  status: mapped
  id: base.cli.installed
  layer: base-environment
- legacy: "sandbox shell env does not expose the real key"
  status: deferred
  layer: onboarding-flow
  gap_domain: credential-security
  owner: e2e-maintainers
  runner_requirement: sandbox runner with NemoClaw/OpenShell CLIs
- legacy: "agent web-search returned a real Brave result"
  status: deferred
  layer: post-onboard-suite
  gap_domain: brave-search
  secret_requirement: BRAVE_API_KEY
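To show how this metadata could feed a gap report, the sketch below counts deferred entries by layer and gap domain. The three entries mirror the examples above; ParityEntry, countDeferredGaps, and the "unclassified" fallback label are hypothetical and not part of the existing parity tooling.

```typescript
// Hypothetical model of a parity-map entry, limited to the fields used for aggregation.
type Layer =
  | "base-environment"
  | "onboarding-flow"
  | "expected-state"
  | "post-onboard-suite"
  | "negative-failure-mode"
  | "retired";

interface ParityEntry {
  legacy: string;
  status: "mapped" | "deferred" | "retired";
  layer: Layer;
  gap_domain?: string;
}

// Count deferred assertions per "layer/gap_domain" key.
// Entries without a gap_domain fall under the assumed label "unclassified".
function countDeferredGaps(entries: ParityEntry[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const entry of entries) {
    if (entry.status !== "deferred") continue;
    const key = `${entry.layer}/${entry.gap_domain || "unclassified"}`;
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

const entries: ParityEntry[] = [
  { legacy: "NemoClaw installed", status: "mapped", layer: "base-environment" },
  {
    legacy: "sandbox shell env does not expose the real key",
    status: "deferred",
    layer: "onboarding-flow",
    gap_domain: "credential-security",
  },
  {
    legacy: "agent web-search returned a real Brave result",
    status: "deferred",
    layer: "post-onboard-suite",
    gap_domain: "brave-search",
  },
];

const gapCounts = countDeferredGaps(entries);
console.log(gapCounts);
```

Typing layer as a closed union means an entry with an unknown layer is rejected at compile time, which is one way to enforce the classification.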
No new required environment variables are introduced in Phase 1.
Existing environment variables remain relevant:
E2E_CONTEXT_DIR
E2E_SUITE_FILTER
E2E_VALIDATE_EXPECTED_STATE
NEMOCLAW_RECREATE_SANDBOX
NVIDIA_API_KEY
Potential future optional filters:
E2E_BASE_FILTER
E2E_ONBOARDING_FILTER
E2E_LAYER_FILTER
E2E_GAP_DOMAIN_FILTER
These should not be added until a concrete workflow needs them.
Implementation Phases
Phase 1: Layered Terminology and Schema Planning
Introduce the layered terminology and schema support while preserving current scenario IDs and behavior. This phase is intentionally documentation-first plus plan-only resolver work: future contributors should learn the new mental model before feature migration continues.
Implementation
Update test/e2e/docs/README.md and test/e2e/docs/MIGRATION.md to define:
base environment = platform + install + runtime
onboarding profile = user choices during onboarding
feature suite = post-onboard behavior
Extend scenarios.yaml with:
base_scenarios
onboarding_profiles
test_plans
setup_scenarios.<id>.alias_for_plan
Add layered equivalents for all existing scenarios:
ubuntu-repo-cloud-openclaw
ubuntu-repo-cloud-hermes
gpu-repo-local-ollama-openclaw
macos-repo-cloud-openclaw
wsl-repo-cloud-openclaw
brev-launchable-cloud-openclaw
ubuntu-no-docker-preflight-negative
Update resolver schema to accept both old and new forms.
Update resolver plan output to include:
base ID
onboarding ID
expected state ID
onboarding assertion IDs
suite IDs
Keep run-scenario.sh <old-id> working through aliases.
Acceptance Criteria
E2E docs explain base environments, onboarding profiles, test plans, onboarding assertions, expected states, and post-onboard feature suites.
bash test/e2e/runtime/run-scenario.sh ubuntu-repo-cloud-openclaw --plan-only still succeeds.
Specification: New E2E Model
Architecture Update: Hybrid Scenario Builders + Onboarding Configuration YAML
Overview & Objectives
NemoClaw's scenario-based E2E migration has reached the point where live execution is exposing real setup, onboarding, and feature-validation failures. The current framework is directionally correct, but it still treats a "scenario" as a single combined unit: platform + install + runtime + onboarding choices + expected state + post-onboard suites. That makes the matrix hard to expand, hard to report, and hard to use for coverage-gap discovery.
This specification restructures the E2E model into explicit layers:
flowchart TB
Base[Base environment scenario]
Base --> Platform[Platform / hardware]
Base --> Install[Install source]
Base --> Runtime[Container/runtime prerequisites]
Onboard[Onboarding profile]
Onboard --> Agent[Agent]
Onboard --> Provider[Inference provider]
Onboard --> Decisions[Policy, messaging, endpoint, lifecycle choices]
Plan[Test plan]
Base --> Plan
Onboard --> Plan
Plan --> SetupRun[Run install + onboarding]
SetupRun --> OnboardAssertions[Onboarding-stage assertions]
OnboardAssertions --> State[Expected state validation]
State --> Suites[Post-onboard feature suites]
Suites --> Reports[Coverage + parity + gap reports]
Objectives
Current State Analysis
Current scenario documentation describes this flow:
The current YAML files are:
test/e2e/nemoclaw_scenarios/scenarios.yaml
test/e2e/nemoclaw_scenarios/expected-states.yaml
test/e2e/validation_suites/suites.yaml
test/e2e/docs/parity-map.yaml
Current setup_scenarios combine these dimensions:
platforms: ubuntu-local, macos-local, wsl-local, gpu-runner, brev-launchable, dgx-spark
installs: repo-current, public-curl, launchable, release, upgrade-from-version
runtimes: docker-running, gpu-docker-cdi, docker-missing
onboarding: cloud-openclaw, cloud-hermes, local-ollama-openclaw, openai-compatible-openclaw
Current scenario IDs include:
ubuntu-repo-cloud-openclaw
ubuntu-repo-cloud-hermes
gpu-repo-local-ollama-openclaw
macos-repo-cloud-openclaw
wsl-repo-cloud-openclaw
brev-launchable-cloud-openclaw
ubuntu-no-docker-preflight-negative
The current model already has useful structure, but there are several gaps:
Scenario IDs hide layer boundaries. ubuntu-repo-cloud-openclaw includes base setup and onboarding in one name.
Base setup cannot be reported independently. There is no direct answer to "which install methods run on which platforms before onboarding?"
Onboarding choices are not matrixed cleanly. Provider, agent, endpoint, messaging, policy, and lifecycle variants are embedded in profiles or deferred to future scenarios.
Onboarding assertions are under-modeled. The runner validates final state and then suites run, but there is no explicit onboarding-stage assertion group for prompts, provider config, credential placement, policy selection, or resume/repair/double-onboard behavior.
Post-onboard suites are currently thin. The present suite list covers smoke, cloud inference, credentials-present, local Ollama checks, Ollama proxy, platform smoke, and Hermes health.
Parity gaps are large and not yet organized by layer. Current parity-map status counts are approximately:
Deferred parity assertions are visible but not yet actionable enough. They need to be classified as base setup, onboarding flow, expected state, post-onboard suite, negative/failure mode, or retire.
GitHub visibility is incomplete. Parity compare uploads JSON and logs as artifacts, but does not currently publish a concise report to $GITHUB_STEP_SUMMARY.
High-value deferred areas
The largest deferred areas in test/e2e/docs/parity-map.yaml currently include:
test-messaging-providers.sh
test-double-onboard.sh
test-shields-config.sh
test-sandbox-survival.sh
test-gpu-e2e.sh
test-ollama-auth-proxy-e2e.sh
test-token-rotation.sh
test-gpu-double-onboard.sh
test-credential-sanitization.sh
test-inference-routing.sh
test-hermes-e2e.sh
test-onboard-resume.sh
test-onboard-repair.sh
These counts are not a one-to-one list of tests to write. They are extracted legacy assertions that must be mapped, consolidated, implemented, gated, or retired.
Architecture Design
Conceptual entities
1. Base environment scenarios
A base environment scenario describes what exists before onboarding decisions are applied.
This layer answers:
Example base IDs:
This layer verifies:
2. Onboarding profiles
An onboarding profile describes user choices made during onboarding.
This layer answers:
Example onboarding IDs:
This layer verifies onboarding decisions and transitions, including:
3. Test plans
A test plan combines a base scenario, an onboarding profile, an expected state, onboarding assertions, and post-onboard suites.
Existing scenario IDs can remain as aliases during migration:
This avoids breaking current workflow dispatches while moving the source of truth to layered test plans.
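A minimal sketch of how alias resolution might work is shown below. It assumes that a legacy entry under setup_scenarios carries an alias_for_plan field, that ubuntu-repo-cloud-openclaw maps to ubuntu-repo-docker__cloud-nvidia-openclaw, and that a layered plan ID joins a base ID and an onboarding ID with a double underscore. The mapping and the separator convention are inferred from the IDs used elsewhere in this specification, not confirmed behavior, and resolvePlanId and splitPlanId are hypothetical names.

```typescript
// Hypothetical alias table; the mapping below is an assumption for illustration.
interface SetupScenario {
  alias_for_plan?: string;
}

const setupScenarios: Record<string, SetupScenario> = {
  "ubuntu-repo-cloud-openclaw": {
    alias_for_plan: "ubuntu-repo-docker__cloud-nvidia-openclaw",
  },
};

// Return the layered plan ID for a legacy alias, or the input unchanged
// when it is not a known alias (it is then treated as a plan ID already).
function resolvePlanId(id: string, aliases: Record<string, SetupScenario>): string {
  const entry = aliases[id];
  if (entry !== undefined && entry.alias_for_plan !== undefined) {
    return entry.alias_for_plan;
  }
  return id;
}

// Split "<base>__<onboarding>" into its two layers (assumed naming convention).
function splitPlanId(planId: string): { base: string; onboarding: string } {
  const at = planId.indexOf("__");
  if (at < 0) {
    throw new Error(`plan ID "${planId}" is not of the form <base>__<onboarding>`);
  }
  return { base: planId.slice(0, at), onboarding: planId.slice(at + 2) };
}

const resolved = resolvePlanId("ubuntu-repo-cloud-openclaw", setupScenarios);
const layers = splitPlanId(resolved);
console.log(resolved, layers);
```

Keeping the alias table as data means a workflow that still dispatches the old ID needs no change while the layered plan becomes the source of truth.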
4. Onboarding-stage assertions
Onboarding assertions run after install/onboard operations and before post-onboard feature suites. They are distinct from post-onboard suites because they validate setup decisions and state transitions.
Initial assertion groups:
Each assertion emits stable markers:
These IDs are mapped from parity-map.yaml and included in gap reports.
5. Post-onboard feature suites
Feature suites run after expected state validation and must not install or onboard.
Suite families should be organized by feature domain:
Canonical suite IDs should include at least:
Feature suites consume the context produced by base setup and onboarding. They must not install, onboard, mutate onboarding choices, or rediscover scenario state except through $E2E_CONTEXT_DIR/context.env.
Suites continue to declare requires_state and are selected by each test plan.
Updated runner flow
flowchart TD
A[run-scenario.sh plan-id or legacy alias] --> B[Resolve alias]
B --> C[Load base_scenarios]
C --> D[Load onboarding_profiles]
D --> E[Load test_plans]
E --> F[Validate base + onboarding compatibility]
F --> G[Validate onboarding assertions]
G --> H[Validate suite requires_state]
H --> I[Print layered plan]
I --> J[Run base setup / install]
J --> K[Run onboarding profile]
K --> L[Emit context.env]
L --> M[Run onboarding-stage assertions]
M --> N[Validate expected state]
N --> O[Run post-onboard suites]
O --> P[Emit coverage + parity + gap reports]
Compatibility rules
The resolver must fail fast with clear messages when:
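a test plan references a missing base scenario
a test plan references a missing onboarding profile
a test plan references a missing expected state
a test plan references a missing onboarding assertion
a test plan references a missing suite
a suite requires_state key is incompatible with the selected expected state
an onboarding profile requires a runner/secret not available through the base plan
a negative base scenario is combined with a positive onboarding profile without expected_failure
Gap classification model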
Extend parity metadata so every deferred assertion has a layer classification:
Allowed layers:
base-environment
onboarding-flow
expected-state
post-onboard-suite
negative-failure-mode
retired
Reports should aggregate by layer and gap domain.
Reporting design
Generate reports in .e2e/reports/:
The GitHub workflows should append summary.md to $GITHUB_STEP_SUMMARY.
Minimum visible summary:
Configuration & Deployment Changes
Files to modify
test/e2e/nemoclaw_scenarios/scenarios.yaml: base_scenarios, onboarding_profiles, and test_plans; platforms, installs, and runtimes profiles; setup_scenarios as alias compatibility until final cleanup.
test/e2e/nemoclaw_scenarios/expected-states.yaml
test/e2e/validation_suites/suites.yaml
test/e2e/runtime/resolver/schema.ts
test/e2e/runtime/resolver/load.ts
test/e2e/runtime/resolver/plan.ts
test/e2e/runtime/resolver/coverage.ts
test/e2e/runtime/resolver/index.ts
test/e2e/runtime/run-scenario.sh
test/e2e/runtime/run-suites.sh
test/e2e/runtime/coverage-report.sh
scripts/e2e/check-parity-map.ts: layer and gap_domain metadata for deferred assertions.
scripts/e2e/compare-parity.sh
.github/workflows/e2e-scenarios.yaml: $GITHUB_STEP_SUMMARY.
.github/workflows/e2e-parity-compare.yaml: $GITHUB_STEP_SUMMARY.
test/e2e/docs/README.md
test/e2e/docs/MIGRATION.md
New files / directories
Environment variables
No new required environment variables are introduced in Phase 1.
Existing environment variables remain relevant:
E2E_CONTEXT_DIR
E2E_SUITE_FILTER
E2E_VALIDATE_EXPECTED_STATE
NEMOCLAW_RECREATE_SANDBOX
NVIDIA_API_KEY
Potential future optional filters:
E2E_BASE_FILTER
E2E_ONBOARDING_FILTER
E2E_LAYER_FILTER
E2E_GAP_DOMAIN_FILTER
These should not be added until a concrete workflow needs them.
Implementation Phases
Phase 1: Layered Terminology and Schema Planning
Introduce the layered terminology and schema support while preserving current scenario IDs and behavior. This phase is intentionally documentation-first plus plan-only resolver work: future contributors should learn the new mental model before feature migration continues.
Implementation
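Update test/e2e/docs/README.md and test/e2e/docs/MIGRATION.md to define:
base environment = platform + install + runtime
onboarding profile = user choices during onboarding
feature suite = post-onboard behavior
Extend scenarios.yaml with:
base_scenarios
onboarding_profiles
test_plans
setup_scenarios.<id>.alias_for_plan
Add layered equivalents for all existing scenarios:
ubuntu-repo-cloud-openclaw
ubuntu-repo-cloud-hermes
gpu-repo-local-ollama-openclaw
macos-repo-cloud-openclaw
wsl-repo-cloud-openclaw
brev-launchable-cloud-openclaw
ubuntu-no-docker-preflight-negative
Update resolver schema to accept both old and new forms.
Update resolver plan output to include:
base ID
onboarding ID
expected state ID
onboarding assertion IDs
suite IDs
Keep run-scenario.sh <old-id> working through aliases.
Acceptance Criteria
E2E docs explain base environments, onboarding profiles, test plans, onboarding assertions, expected states, and post-onboard feature suites.
bash test/e2e/runtime/run-scenario.sh ubuntu-repo-cloud-openclaw --plan-only still succeeds.
bash test/e2e/runtime/run-scenario.sh ubuntu-repo-docker__cloud-nvidia-openclaw --plan-only succeeds.
base, onboarding, expected_state, and suites sections.
Phase 2: Layered Coverage and Gap Reports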
Make the existing coverage and parity data visible by layer.
Implementation
parity-map.yaml validation.
coverage-report.sh / resolver coverage logic to render:
.e2e/reports/summary.md generation.
e2e-scenarios.yaml and e2e-parity-compare.yaml to append summary markdown to $GITHUB_STEP_SUMMARY.
Acceptance Criteria
bash test/e2e/runtime/coverage-report.sh includes sections for base scenarios, onboarding profiles, test plans, suites, and parity by layer.
layer fields.
Phase 3: Onboarding Assertion Stage
Add a first-class onboarding assertion stage between onboarding execution and expected-state validation.
Implementation
test/e2e/onboarding_assertions/ structure.
onboarding_assertions section to scenarios.yaml.
run-scenario.sh to execute selected onboarding assertions after onboarding and before expected-state validation.
PASS:/FAIL: IDs.
Acceptance Criteria
onboarding-assertions stage.
Phase 4: Onboarding Matrix Expansion
Move onboarding lifecycle and provider variants into explicit onboarding profiles/test plans.
Implementation
Acceptance Criteria
Phase 5: Post-Onboard Suite Reorganization
Reorganize feature validation into clearer suite families and migrate high-value deferred areas.
Implementation
validation_suites/suites.yaml with suite families:
gateway-health
sandbox-shell
sandbox-lifecycle
sandbox-operations
cloud-inference
local-ollama-inference
ollama-auth-proxy
openai-compatible-inference
inference-routing
inference-switch
kimi-compatibility
messaging-telegram
messaging-discord
messaging-slack
messaging-token-rotation
security-credentials
security-policy
security-shields
security-injection
snapshot
rebuild
upgrade
diagnostics
docs-validation
Acceptance Criteria
Phase 6: Workflow and Report Visibility
Make layered E2E output visible to maintainers without downloading artifacts.
Implementation
gap-report.json and human-readable gap-report.md.
Acceptance Criteria
Phase 7: Clean the House
Remove transitional compatibility once layered plans are stable.
Implementation
setup_scenarios entries that only duplicate test_plans, or keep only explicit aliases required by public workflows.
test/e2e/docs/README.md
test/e2e/docs/MIGRATION.md
AGENTS.md guidance if E2E workflow instructions change
test/e2e/test-*.sh entrypoints were added.
Acceptance Criteria
npx prek run --all-files passes or has documented unrelated failures.