| Legacy script | Current coverage gap | Required migration |
| --- | --- | --- |
| test-diagnostics.sh | Debug archive, extraction, credential-leak scan, config readability, status/model fields, and credential reset behavior are mostly deferred or only partially mapped. | Add diagnostics suite steps for nemoclaw --version format, debug --quick, full debug archive creation/extraction, debug tarball secret scan, agent config readability, and status/model assertions. Decide whether destructive credentials reset belongs here or is retired/deferred. |
| test-docs-validation.sh | nemoclaw on PATH is mapped by smoke; CLI/docs parity and link validation remain deferred. | Add docs-validation suite steps invoking the current docs parity/link validation path, updated for MDX/Fern docs. Preserve clear pass/fail propagation. |
| test-state-backup-restore.sh | Workspace marker setup, backup, destroy, re-onboard, restore, and file/content verification are all deferred. | Add a state backup/restore suite that writes marker files/directories, runs backup, destroys/re-onboards, restores, and verifies all marker files and memory directory contents. |
| test-tunnel-lifecycle.sh | nemoclaw tunnel start/status/stop, tunnel URL extraction, local dashboard readiness, remote dashboard probe, stale URL cleanup, and Cloudflare external-flake classification are deferred or missed. | Add a tunnel lifecycle suite with local dashboard precheck, start/status URL assertion, remote URL/dashboard marker probe, stop/status cleanup, and explicit Cloudflare transient skip/expected-external classification. |
| test-runtime-overrides.sh | All runtime override assertions are deferred: model, context window, max tokens, reasoning, CORS, invalid values, and rollback/no-partial-write. | Add runtime override suite steps that verify valid overrides patch config and hash correctly, invalid overrides are rejected, and rejected overrides leave config unchanged. |
| test-overlayfs-autofix.sh | Only Docker-running is mapped. Most overlayfs/containerd-snapshotter behavior is deferred; several applicability SKIP branches are missed; two brittle negatives are already retired. | Decide in the spec whether overlayfs autofix remains #3817 scope. If retained, model Docker storage-driver/containerd-snapshotter applicability and migrate detection, patched-image, gateway-image, log-cleanliness, idempotency, and disabled-autofix negative assertions. If not retained, explicitly defer/retire with rationale. |
| test-device-auth-health.sh | Basic install/CLI/sandbox readiness maps to smoke, but device-auth-specific /health == 200, / == 401, status not Offline, and gateway recovery/status behavior are missed. | Add device-auth health suite steps for sandbox-exec /health, root auth response, nemoclaw status not reporting Offline, host forward health, and any retained recovery behavior. Retire brittle install-log text checks unless required. |
| test-skill-agent-e2e.sh | CLI install checks map to baseline; injected skill fixture and agent verification are missed. Recent model/tool-call flake classification is not in scenarios. | Add skill-agent suite steps for fixture injection/queryability and live agent verification. If live model/tool-call behavior remains nondeterministic, encode explicit external/inconclusive classification or split deterministic fixture checks from optional live-agent proof. |

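The runtime-overrides entry above asks that rejected overrides leave config unchanged. One way to check this, as a minimal sketch, is to hash the config file before and after a command that is expected to fail. The helper names `config_digest` and `assert_rejected_without_write` are hypothetical and do not come from the repository. The override command is passed in by the caller because its exact CLI syntax is not stated in this issue.

```shell
# Hypothetical helpers for illustration only; not part of the NemoClaw repo.

# Print the SHA-256 digest of a file. Falls back to shasum where
# sha256sum is not installed (for example on macOS).
config_digest() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

# Run a command that is expected to FAIL and confirm the config file is
# byte-for-byte unchanged afterwards.
# Usage: assert_rejected_without_write <config-file> <command> [args...]
assert_rejected_without_write() {
  local config="$1"; shift
  local before after
  before="$(config_digest "$config")"
  if "$@" >/dev/null 2>&1; then
    echo "FAIL: command was accepted but should have been rejected" >&2
    return 1
  fi
  after="$(config_digest "$config")"
  if [ "$before" != "$after" ]; then
    echo "FAIL: rejected command modified $config" >&2
    return 1
  fi
  echo "PASS: rejected without partial write"
}
```

Because the command is passed as arguments, the same helper could cover each invalid-value case (model, context window, max tokens, reasoning, CORS) without hard-coding any CLI flags.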
Parent epic: #3588
Goal
Migrate the runtime-services E2E coverage area into the layered scenario framework without porting legacy scripts line-for-line. Add the missing primitive layer first, then move assertions into scenario plans/suites with stable IDs.

Legacy / current coverage to absorb
- test-diagnostics.sh
- test-docs-validation.sh
- test-state-backup-restore.sh
- test-tunnel-lifecycle.sh
- test-runtime-overrides.sh
- test-overlayfs-autofix.sh
- test-device-auth-health.sh
- test-skill-agent-e2e.sh

Architecture contract
- test/e2e/validation_suites/lib/runtime_services.sh.
- $E2E_CONTEXT_DIR/context.env; suites must not reinstall, onboard, or rediscover setup state.
- test/e2e/validation_suites/suites.yaml.
- <layer>.<domain>.<behavior>.
- test/e2e/docs/parity-map.yaml metadata with layer, gap_domain, owner, and runner/secret requirements where applicable.
- run-scenario.sh <id> --plan-only behavior.

Acceptance criteria
- deferred or retired with layer/domain metadata.

2026-05-26 scope refresh: current assertion audit and validation expectations
The legacy runtime-services scripts have continued to change since this issue was opened. Before implementing this migration, treat the current origin/main scripts as the source of truth and migrate/classify every current assertion.

Current source scripts in scope
Keep the original #3817 scope, using the current versions of:
- test/e2e/test-diagnostics.sh
- test/e2e/test-docs-validation.sh
- test/e2e/test-state-backup-restore.sh
- test/e2e/test-tunnel-lifecycle.sh
- test/e2e/test-runtime-overrides.sh
- test/e2e/test-overlayfs-autofix.sh
- test/e2e/test-device-auth-health.sh
- test/e2e/test-skill-agent-e2e.sh

Current coverage audit summary
The current scenario framework does not yet provide migration coverage for every assertion in these scripts. Existing coverage is mostly baseline smoke plus partial generic checks; domain assertions must either be migrated, explicitly deferred, or retired with rationale.
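The rule that every assertion is migrated, deferred, or retired with rationale can be checked mechanically. The sketch below assumes each audited assertion is reduced to four fields (status, assertion ID, owner, reason). That field layout is an illustration and is not the parity-map.yaml schema. The status names and the `<layer>.<domain>.<behavior>` ID shape come from this issue. The allowed character set per segment and the function names are assumptions.

```shell
# Hypothetical audit-record check; field layout and character set are
# assumptions for illustration, not the parity-map.yaml schema.

# Succeeds when the argument looks like <layer>.<domain>.<behavior>:
# exactly three dot-separated, non-empty segments of [a-z0-9_-].
is_assertion_id() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9_-]+\.[a-z0-9_-]+\.[a-z0-9_-]+$'
}

# Usage: check_classification <status> <assertion-id> <owner> <reason>
# Prints "ok" and returns 0 when the record carries what its status requires.
check_classification() {
  local status="$1" id="$2" owner="$3" reason="$4"
  case "$status" in
    mapped|expected_failure)
      is_assertion_id "$id" || {
        echo "invalid: $status needs a stable assertion ID"; return 1; } ;;
    deferred)
      [ -n "$reason" ] && [ -n "$owner" ] || {
        echo "invalid: deferred needs a reason and an owner"; return 1; } ;;
    retired)
      [ -n "$reason" ] || {
        echo "invalid: retired needs a reason"; return 1; } ;;
    *)
      echo "invalid: unknown status '$status'"; return 1 ;;
  esac
  echo "ok"
}
```

A record with an unknown status fails the check, so an unclassified assertion cannot pass unnoticed.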
Adjacent current E2E changes to account for
These files were not all new since this issue opened, but current versions include relevant behavior that must be preserved or classified:
- test-tunnel-lifecycle.sh: #4154 ("test(e2e): classify quick tunnel flakes as external") Cloudflare quick-tunnel external classification; #4196 ("test(e2e): update cloudflared tunnel pin") cloudflared pin update.
- test-skill-agent-e2e.sh: #4157 ("test(e2e): classify skill agent model flakes") model/tool-call flake classification plus fixture-presence recheck.
- test-docs-validation.sh: #3837 ("docs: remove legacy markdown docs and refresh MDX checks") MDX/Fern docs validation expectation refresh.
- test/e2e/lib/openclaw-json.sh: #4038 ("test: tolerate OpenClaw JSON envelope changes") added OpenClaw JSON envelope parsing. This helper is new since issue creation and should be reused where runtime-service assertions need OpenClaw agent JSON parsing.

Required implementation shape
- test/e2e/validation_suites/lib/runtime_services.sh for shared runtime-service helpers.
- test/e2e/validation_suites/suites.yaml with explicit suites for diagnostics, docs validation, state backup/restore, tunnel lifecycle, runtime overrides, device-auth health, skill-agent behavior, and optionally overlayfs autofix.
- mapped to a stable scenario assertion ID, deferred with reason and owner, retired with reason, expected_failure where the scenario intentionally validates a negative/failure outcome.
- <layer>.<domain>.<behavior> and be specific enough to distinguish pass/fail/expected-failure behavior.

Validation spec requirements for /vd_spec
The spec for this issue must include a validation section that makes expected pass/fail behavior explicit:
- pass, fail, skip/external, expected_failure, deferred, or retired) and the scenario assertion ID that owns it.

Acceptance criteria additions
- run-scenario.sh <id> --plan-only works for each new/updated runtime-services scenario.
- e2e-scenarios workflow run against the PR branch with runtime-services suites selected or included.
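
The plan-only criterion above could be exercised with a small loop. This is a sketch under stated assumptions: `plan_only_check` is a hypothetical name, the runner is passed in as the first argument rather than hard-coded, and the only runner behaviour relied on is the `<id> --plan-only` invocation quoted in this issue.

```shell
# Hypothetical wrapper for illustration; not part of the repository.
# Usage: plan_only_check <runner> <scenario-id>...
# Invokes "<runner> <id> --plan-only" for every ID, prints one result line
# per ID, and returns non-zero if any invocation failed.
plan_only_check() {
  local runner="$1"; shift
  local failed=0 id
  for id in "$@"; do
    if "$runner" "$id" --plan-only >/dev/null 2>&1; then
      echo "plan-ok   $id"
    else
      echo "plan-fail $id"
      failed=1
    fi
  done
  return "$failed"
}
```

A call might look like `plan_only_check ./run-scenario.sh <id-1> <id-2>`, with the real scenario IDs substituted once they are defined. The loop keeps going after a failure so that a single run reports every scenario whose plan does not resolve.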