test(e2e): migrate diagnostics, state, and runtime service coverage #3817

@jyaunches

Description

Parent epic: #3588

Goal

Migrate the runtime-services E2E coverage area into the layered scenario framework without porting legacy scripts line-for-line. Add the missing primitive layer first, then move assertions into scenario plans/suites with stable IDs.

Legacy / current coverage to absorb

  • test-diagnostics.sh
  • test-docs-validation.sh
  • test-state-backup-restore.sh
  • test-tunnel-lifecycle.sh
  • test-runtime-overrides.sh
  • test-overlayfs-autofix.sh
  • test-device-auth-health.sh
  • test-skill-agent-e2e.sh

Architecture contract

  • Add or extend the domain primitive library: test/e2e/validation_suites/lib/runtime_services.sh.
  • Helpers must consume $E2E_CONTEXT_DIR/context.env; suites must not reinstall, onboard, or rediscover setup state.
  • Add/extend suite family entries in test/e2e/validation_suites/suites.yaml.
  • Add onboarding profiles/test plans/onboarding assertions only when the behavior belongs before expected-state validation.
  • Emit stable assertion IDs using <layer>.<domain>.<behavior>.
  • Update test/e2e/docs/parity-map.yaml metadata with layer, gap_domain, owner, and runner/secret requirements where applicable.
  • Preserve compatibility with existing run-scenario.sh <id> --plan-only behavior.
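As a rough illustration of the contract above, a primitive library could load the shared context once and report every check under a stable ID. This is only a sketch: the function names (`rs_load_context`, `rs_emit`, `rs_assert`), the `ASSERT ...` output line format, and the example layer name `expected_state` are assumptions, not existing framework API.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for test/e2e/validation_suites/lib/runtime_services.sh.
# Names and output format are illustrative assumptions, not framework API.

# Load setup state written by earlier layers; fail rather than rediscover it.
rs_load_context() {
  if [ -z "${E2E_CONTEXT_DIR:-}" ]; then
    echo "runtime_services: E2E_CONTEXT_DIR is not set" >&2
    return 1
  fi
  local ctx="$E2E_CONTEXT_DIR/context.env"
  if [ ! -r "$ctx" ]; then
    echo "runtime_services: missing context file: $ctx" >&2
    return 1
  fi
  # shellcheck disable=SC1090
  . "$ctx"
}

# Emit one result line keyed by a <layer>.<domain>.<behavior> ID.
rs_emit() {
  local id="$1" status="$2" detail="${3:-}"
  if [ -n "$detail" ]; then
    printf 'ASSERT %s %s %s\n' "$id" "$status" "$detail"
  else
    printf 'ASSERT %s %s\n' "$id" "$status"
  fi
}

# Run a command and report its outcome under the given assertion ID.
rs_assert() {
  local id="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    rs_emit "$id" pass
  else
    rs_emit "$id" fail "command failed: $*"
    return 1
  fi
}
```

A suite step would then call `rs_load_context` once and wrap each check, for example `rs_assert expected_state.diagnostics.version_format <command>`, so the step itself never reinstalls or onboards.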

Acceptance criteria

  • Domain primitive helpers exist and are used by migrated suite steps.
  • At least the highest-value assertions from the listed legacy coverage are mapped to stable scenario assertion IDs.
  • Remaining legacy assertions are explicitly classified as deferred or retired with layer/domain metadata.
  • Scenario framework tests pass for resolver/schema/suite/parity-map validation.
  • The coverage report makes this domain visible as covered, deferred, or retired.

2026-05-26 scope refresh: current assertion audit and validation expectations

The legacy runtime-services scripts have continued to change since this issue was opened. Before implementing this migration, treat the current origin/main scripts as the source of truth and migrate/classify every current assertion.

Current source scripts in scope

Keep the original #3817 scope, using the current versions of:

  • test/e2e/test-diagnostics.sh
  • test/e2e/test-docs-validation.sh
  • test/e2e/test-state-backup-restore.sh
  • test/e2e/test-tunnel-lifecycle.sh
  • test/e2e/test-runtime-overrides.sh
  • test/e2e/test-overlayfs-autofix.sh
  • test/e2e/test-device-auth-health.sh
  • test/e2e/test-skill-agent-e2e.sh

Current coverage audit summary

The current scenario framework does not yet provide migration coverage for every assertion in these scripts. Existing coverage is mostly baseline smoke plus partial generic checks; domain assertions must either be migrated, explicitly deferred, or retired with rationale.

| Legacy script | Current assertion state | Migration expectation |
| --- | --- | --- |
| test-diagnostics.sh | Debug archive, extraction, credential-leak scan, config readability, status/model fields, and credential reset behavior are mostly deferred or only partially mapped. | Add diagnostics suite steps for nemoclaw --version format, debug --quick, full debug archive creation/extraction, debug tarball secret scan, agent config readability, and status/model assertions. Decide whether destructive credentials reset belongs here or is retired/deferred. |
| test-docs-validation.sh | nemoclaw on PATH is mapped by smoke; CLI/docs parity and link validation remain deferred. | Add docs-validation suite steps invoking the current docs parity/link validation path, updated for MDX/Fern docs. Preserve clear pass/fail propagation. |
| test-state-backup-restore.sh | Workspace marker setup, backup, destroy, re-onboard, restore, and file/content verification are all deferred. | Add a state backup/restore suite that writes marker files/directories, runs backup, destroys/re-onboards, restores, and verifies all marker files and memory directory contents. |
| test-tunnel-lifecycle.sh | nemoclaw tunnel start/status/stop, tunnel URL extraction, local dashboard readiness, remote dashboard probe, stale URL cleanup, and Cloudflare external-flake classification are deferred or missed. | Add a tunnel lifecycle suite with local dashboard precheck, start/status URL assertion, remote URL/dashboard marker probe, stop/status cleanup, and explicit Cloudflare transient skip/expected-external classification. |
| test-runtime-overrides.sh | All runtime override assertions are deferred: model, context window, max tokens, reasoning, CORS, invalid values, and rollback/no-partial-write. | Add runtime override suite steps that verify valid overrides patch config and hash correctly, invalid overrides are rejected, and rejected overrides leave config unchanged. |
| test-overlayfs-autofix.sh | Only Docker-running is mapped. Most overlayfs/containerd-snapshotter behavior is deferred; several applicability SKIP branches are missed; two brittle negatives are already retired. | Decide in the spec whether overlayfs autofix remains #3817 scope. If retained, model Docker storage-driver/containerd-snapshotter applicability and migrate detection, patched-image, gateway-image, log-cleanliness, idempotency, and disabled-autofix negative assertions. If not retained, explicitly defer/retire with rationale. |
| test-device-auth-health.sh | Basic install/CLI/sandbox readiness maps to smoke, but device-auth-specific /health == 200, / == 401, status not Offline, and gateway recovery/status behavior are missed. | Add device-auth health suite steps for sandbox-exec /health, root auth response, nemoclaw status not reporting Offline, host forward health, and any retained recovery behavior. Retire brittle install-log text checks unless required. |
| test-skill-agent-e2e.sh | CLI install checks map to baseline; injected skill fixture and agent verification are missed. Recent model/tool-call flake classification is not in scenarios. | Add skill-agent suite steps for fixture injection/queryability and live agent verification. If live model/tool-call behavior remains nondeterministic, encode explicit external/inconclusive classification or split deterministic fixture checks from optional live-agent proof. |
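The runtime-overrides expectation that rejected overrides leave config unchanged could be checked with a hash comparison around the rejected command. The sketch below is an assumption about how such a helper might look: `rs_file_hash` and `rs_assert_rejected_without_write` are hypothetical names, the config path and override command are supplied by the caller, and nothing here reflects the real nemoclaw override interface. It relies on `sha256sum` from GNU coreutils.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; the caller supplies the config
# path and the override command under test.

rs_file_hash() {
  # sha256sum prints "<hash>  <path>"; keep only the hash.
  sha256sum "$1" | cut -d' ' -f1
}

# Usage: rs_assert_rejected_without_write <config-file> <command> [args...]
# Succeeds only if the command fails AND the config file hash is unchanged.
rs_assert_rejected_without_write() {
  local config="$1"
  shift
  local before after
  before="$(rs_file_hash "$config")" || return 2
  if "$@" >/dev/null 2>&1; then
    echo "expected rejection, but command succeeded: $*" >&2
    return 1
  fi
  after="$(rs_file_hash "$config")" || return 2
  if [ "$before" != "$after" ]; then
    echo "rejected override modified $config" >&2
    return 1
  fi
}
```

Comparing hashes rather than individual fields catches partial writes anywhere in the file, which is the no-partial-write property the legacy script asserts.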

Adjacent current E2E changes to account for

Not all of these files are new since this issue was opened, but their current versions include relevant behavior that must be preserved or classified:

Required implementation shape

  • Add or extend test/e2e/validation_suites/lib/runtime_services.sh for shared runtime-service helpers.
  • Add focused suite directories/steps rather than aliasing runtime-service suites to generic smoke steps.
  • Update test/e2e/validation_suites/suites.yaml with explicit suites for diagnostics, docs validation, state backup/restore, tunnel lifecycle, runtime overrides, device-auth health, skill-agent behavior, and optionally overlayfs autofix.
  • Update scenario metadata and coverage report inputs so every current assertion from the scripts above is visible as one of:
    • mapped to a stable scenario assertion ID,
    • deferred with reason and owner,
    • retired with reason,
    • or expected_failure where the scenario intentionally validates a negative/failure outcome.
  • Stable assertion IDs should follow <layer>.<domain>.<behavior> and be specific enough to distinguish pass/fail/expected-failure behavior.
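The classification rules above lend themselves to a small lint over each assertion entry. The following is a sketch under stated assumptions: the helper names are hypothetical, and the allowed ID character set (lowercase letters, digits, underscores) is a guess, since the issue only fixes the three-part `<layer>.<domain>.<behavior>` shape.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the allowed ID character set are assumptions.

# Accept exactly three dot-separated segments: <layer>.<domain>.<behavior>.
rs_valid_assertion_id() {
  printf '%s\n' "$1" |
    grep -Eq '^[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*$'
}

# Usage: rs_check_entry <id> <classification> [reason] [owner]
# deferred needs a reason and an owner; retired needs a reason.
rs_check_entry() {
  local id="$1" class="$2" reason="${3:-}" owner="${4:-}"
  rs_valid_assertion_id "$id" || {
    echo "invalid assertion ID: $id" >&2
    return 1
  }
  case "$class" in
    mapped | expected_failure) ;;
    deferred)
      [ -n "$reason" ] && [ -n "$owner" ] || {
        echo "$id: deferred requires reason and owner" >&2
        return 1
      }
      ;;
    retired)
      [ -n "$reason" ] || {
        echo "$id: retired requires reason" >&2
        return 1
      }
      ;;
    *)
      echo "$id: unknown classification: $class" >&2
      return 1
      ;;
  esac
}
```

For example, `rs_check_entry expected_state.tunnel.remote_probe deferred "external flake" "owner-name"` passes, while the same call without an owner fails.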

Validation spec requirements for /vd_spec

The spec for this issue must include a validation section that makes expected pass/fail behavior explicit:

  1. Produce an assertion matrix for every current assertion in the scoped scripts, including expected result (pass, fail, skip/external, expected_failure, deferred, or retired) and the scenario assertion ID that owns it.
  2. Define which scenario IDs/suites are expected to pass on a healthy PR branch.
  3. Define which negative scenarios are expected to fail during setup/execution but be reported as expected failures by the scenario framework.
  4. Define which assertions are intentionally not executable in PR CI and why, including owner and follow-up path.
  5. Require the implementation PR to run the E2E scenario workflow on the PR head and include evidence that:
    • the runtime-services scenario suites pass where expected,
    • expected-failure scenarios are reported as expected failures, not unclassified failures,
    • no current assertion from the scoped legacy scripts is missing from the coverage report,
    • any deferred/retired assertion is visible in the report with rationale.
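The evidence that no current assertion is missing from the coverage report could be produced by diffing the assertion matrix against the report. This sketch assumes both have been reduced to plain text files with one assertion ID per line; that intermediate format and the helper names are hypothetical, not something the framework is known to emit.

```shell
#!/usr/bin/env bash
# Sketch only: assumes one assertion ID per line in both input files.

# Print IDs listed in the matrix that do not appear in the report.
# -F: fixed strings (dots are literal), -x: whole-line match, -v: invert.
rs_missing_from_report() {
  local matrix="$1" report="$2"
  grep -Fxv -f "$report" "$matrix" | sort -u
}

# Fail, listing the gaps, if any matrix ID is absent from the report.
rs_assert_full_coverage() {
  local missing
  missing="$(rs_missing_from_report "$1" "$2")"
  if [ -n "$missing" ]; then
    echo "assertions missing from coverage report:" >&2
    printf '%s\n' "$missing" >&2
    return 1
  fi
}
```

A check like this only proves presence; whether each ID carries the right classification and rationale still has to be verified separately.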

Acceptance criteria additions

  • run-scenario.sh <id> --plan-only works for each new/updated runtime-services scenario.
  • Scenario framework tests pass for resolver/schema/suite/coverage-report validation.
  • The PR shows an e2e-scenarios workflow run against the PR branch with runtime-services suites selected or included.
  • The PR description links the workflow run and includes the assertion matrix / coverage report excerpt showing pass, expected-failure, deferred, and retired classifications.
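The plan-only criterion can be exercised for a batch of scenarios with a small loop. In this sketch the runner is passed in as the first argument because the issue does not state where run-scenario.sh lives; the function name and any scenario IDs passed to it are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the runner path and scenario IDs are supplied by the caller.

# Usage: rs_plan_only_all <runner> <scenario-id>...
# Runs "<runner> <id> --plan-only" for every ID and reports each result.
# Returns non-zero if any scenario fails to plan.
rs_plan_only_all() {
  local runner="$1"
  shift
  local id failed=0
  for id in "$@"; do
    if "$runner" "$id" --plan-only >/dev/null 2>&1; then
      echo "plan-only ok: $id"
    else
      echo "plan-only FAILED: $id"
      failed=1
    fi
  done
  return "$failed"
}
```

The loop keeps going after a failure so a single run reports every scenario that does not plan cleanly.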

Metadata

Labels

  • area: e2e (End-to-end tests, nightly failures, or validation infrastructure)
