test(e2e): extend migration inventory to scenario runner internals#5052
`liveScenarioSupport` previously rejected any scenario that declared an `environment.lifecycle`, so post-onboard host mutations (reboot, rebuild, upgrade, drift) could not surface in the live Vitest matrix at all.

Replace the unconditional reject with a `SUPPORTED_LIFECYCLES` whitelist that starts with the single profile the upcoming post-reboot-recovery fixture dispatches: `post-reboot-recovery`. Future profiles must land the dispatcher branch and an expected-state in the same change set, so the whitelist stays in lockstep with what the runner can actually execute.

Prepares the runner for #4423's failing-test-first guard, which needs a post-reboot lifecycle scenario to demonstrate registry preservation + Docker-backed sandbox recovery on Linux/Spark Docker-driver hosts.

Refs #4423
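The whitelist check described above can be sketched roughly as follows. This is a minimal illustration, not the runner's code: the `ScenarioLike` and `SupportResult` shapes and the rejection message are assumptions, and the real `liveScenarioSupport` presumably checks more than the lifecycle.

```typescript
// Hypothetical shapes: the real types live in the scenario framework and may differ.
interface ScenarioLike {
  id: string;
  environment: { lifecycle?: string };
}

type SupportResult =
  | { supported: true }
  | { supported: false; reason: string };

// Only profiles the runner can actually dispatch belong in this set.
const SUPPORTED_LIFECYCLES: ReadonlySet<string> = new Set(["post-reboot-recovery"]);

function liveScenarioSupport(scenario: ScenarioLike): SupportResult {
  const lifecycle = scenario.environment.lifecycle;
  // Scenarios without a lifecycle are unaffected by the whitelist.
  if (lifecycle === undefined) return { supported: true };
  if (SUPPORTED_LIFECYCLES.has(lifecycle)) return { supported: true };
  // ASSUMPTION: the wording of the rejection reason is illustrative.
  return { supported: false, reason: `unsupported lifecycle profile: ${lifecycle}` };
}
```

Keeping the set next to the support check means a new profile cannot be accepted without an explicit edit to the whitelist.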
Adds two host-side state-validation probes the live runner needs to express the regression target tracked by #4423:

* `local-registry-entry-present` reads `~/.nemoclaw/sandboxes.json` and asserts the scenario's sandbox name is still recorded. This is deliberately orthogonal to `sandbox.expected`: post-reboot bugs can wipe the local registry while the live OpenShell gateway is healthy, and only a host-side probe catches the data-loss regression.
* `docker-sandbox-container-present` runs `docker ps -a --filter label=openshell.ai/sandbox-name=<name>` and accepts running, stopped, or `*-nemoclaw-gpu-backup-*` sibling containers. The label filter mirrors `OPENSHELL_SANDBOX_NAME_LABEL` used by `findOpenShellDockerSandboxContainerIds` in `src/lib/onboard/docker-gpu-patch.ts`, so the probe stays in lockstep with how OpenShell labels containers today.

Probe wiring:

* `StateProbeId` extended with the two new probe ids.
* `ExpectedState` gains `localRegistry` and `dockerSandboxContainer` optional dimensions; `probesForState` emits the new probes only for `expected: "present"`. Negative-direction probes are intentionally omitted today and pinned by a `probesForState` test.
* `StateValidationPhaseFixture.from()` now accepts either an expected-state ID or an inline `ExpectedState`, so unit tests can drive new probes without registering synthetic states in the typed registry. The live runner still calls `from(id, instance)`.
* Fixture takes an optional `ProbeIO` injection so tests can stub the registry reader without touching `~/.nemoclaw`.

No callers of the existing typed registry are affected: every shipped expected-state leaves `localRegistry` and `dockerSandboxContainer` unset, so `probesForState` returns the same probe lists as before.

Refs #4423
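A minimal sketch of the two host-side probes, assuming a stubbed reader in the spirit of the `ProbeIO` injection. The `ProbeIO` member name, the `ProbeResult` shape, and in particular the registry layout (`{ "sandboxes": { "<name>": ... } }`) are assumptions made for illustration; only the probe id and the `docker ps` arguments come from the description above.

```typescript
// Hypothetical injection seam; the real ProbeIO may expose different members.
interface ProbeIO {
  // Returns the raw text of ~/.nemoclaw/sandboxes.json, or undefined if absent.
  readRegistry(): string | undefined;
}

interface ProbeResult {
  id: string;
  ok: boolean;
  detail: string;
}

// ASSUMPTION: the registry is JSON shaped like { "sandboxes": { "<name>": {...} } }.
function localRegistryEntryPresent(io: ProbeIO, sandboxName: string): ProbeResult {
  const id = "local-registry-entry-present";
  const raw = io.readRegistry();
  if (raw === undefined) {
    return { id, ok: false, detail: "registry file not found" };
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { id, ok: false, detail: "registry file is not valid JSON" };
  }
  const sandboxes =
    typeof parsed === "object" && parsed !== null
      ? (parsed as { sandboxes?: unknown }).sandboxes
      : undefined;
  const ok =
    typeof sandboxes === "object" &&
    sandboxes !== null &&
    Object.prototype.hasOwnProperty.call(sandboxes, sandboxName);
  return { id, ok, detail: ok ? `found ${sandboxName}` : `no entry for ${sandboxName}` };
}

// Builds the argv for the label-filtered container listing quoted in the description.
function dockerSandboxContainerArgv(sandboxName: string): string[] {
  return ["docker", "ps", "-a", "--filter", `label=openshell.ai/sandbox-name=${sandboxName}`];
}
```

Because the reader is injected, a unit test can pass `{ readRegistry: () => "..." }` and never touch the real `~/.nemoclaw` directory.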
Adds a Vitest phase fixture that mutates host state between onboarding
and state-validation, so live scenarios can express post-onboard
invariants the legacy bash runner has no equivalent for.
`LifecyclePhaseFixture.simulate("post-reboot-recovery", instance, opts)`
reproduces the host-side conditions of a DGX Spark / Linux Docker-driver
reboot in two modes:
* `stop-original` (default): `openshell gateway stop` + `docker stop`
of the labeled sandbox container. Models the common reboot outcome
where OpenShell forgets the sandbox while Docker keeps the container
exited but labeled.
* `rename-to-gpu-backup`: additionally `docker rename`s the container
to a `*-nemoclaw-gpu-backup-<ts>` sibling, mirroring the GPU-patch
reboot path in `src/lib/onboard/docker-gpu-patch.ts`.
Both modes register cleanups (in reverse order) to restore the
container so test teardown leaves Docker in a usable state.
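The two modes and the reverse-order cleanup can be sketched as below. This is a simplified model under stated assumptions: the command runner is injected so nothing is executed, the container is passed by name rather than discovered by label, gateway-stop tolerance is omitted, and the restore commands (`docker rename` back, then `docker start`) are a guess at what "restore the container" involves.

```typescript
type Run = (argv: string[]) => void;
type RebootMode = "stop-original" | "rename-to-gpu-backup";

// Minimal last-in-first-out cleanup queue (hypothetical stand-in for the shared registry).
class CleanupRegistry {
  private readonly tasks: Array<() => void> = [];

  add(task: () => void): void {
    this.tasks.push(task);
  }

  // Runs the most recently registered task first.
  runAll(): void {
    for (let task = this.tasks.pop(); task !== undefined; task = this.tasks.pop()) {
      task();
    }
  }
}

function simulatePostReboot(
  run: Run,
  cleanup: CleanupRegistry,
  containerName: string,
  backupName: string,
  mode: RebootMode = "stop-original",
): void {
  run(["openshell", "gateway", "stop"]);
  run(["docker", "stop", containerName]);
  // ASSUMPTION: teardown restarts the container it stopped.
  cleanup.add(() => run(["docker", "start", containerName]));
  if (mode === "rename-to-gpu-backup") {
    run(["docker", "rename", containerName, backupName]);
    // Registered last, so it runs first: the original name is restored before the start.
    cleanup.add(() => run(["docker", "rename", backupName, containerName]));
  }
}
```

Running cleanups in reverse matters in the rename mode: `docker start` by the original name would fail if the rename had not been undone first.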
Wiring:
* `framework/phases/index.ts` re-exports the fixture and types.
* `framework/e2e-test.ts` registers a `lifecycle` Vitest fixture on
`E2EScenarioFixtures`, wired with the shared `host`, `sandbox`,
and `cleanup` registries.
* `live/registry-scenarios.test.ts` invokes
`lifecycle.simulate(profile, instance)` between `onboard.from(...)`
and `stateValidation.from(...)` whenever the scenario declares a
whitelisted `environment.lifecycle`. Scenarios that omit lifecycle
are unaffected. A scenario whose lifecycle is whitelisted by
`runtime-support.ts` but NOT dispatched by the fixture fails fast
with a clear error so the whitelist and dispatcher stay in
lockstep.
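The fail-fast behavior in the last wiring bullet can be illustrated with a small dispatcher table. The table, handler signature, and error text are hypothetical; the sketch only shows how a whitelisted profile without a dispatcher branch can be made to fail loudly rather than silently skip the lifecycle phase.

```typescript
type LifecycleHandler = (sandboxName: string) => string;

// Hypothetical dispatcher table; the real fixture may branch differently.
const LIFECYCLE_DISPATCH = new Map<string, LifecycleHandler>([
  ["post-reboot-recovery", (sandboxName) => `simulated post-reboot-recovery for ${sandboxName}`],
]);

function dispatchLifecycle(profile: string, sandboxName: string): string {
  const handler = LIFECYCLE_DISPATCH.get(profile);
  if (handler === undefined) {
    // Fail fast: a whitelisted profile without a branch is a wiring bug.
    throw new Error(`lifecycle profile "${profile}" has no dispatcher branch`);
  }
  return handler(sandboxName);
}

// Returns whitelisted profiles that the dispatcher cannot execute.
function undispatchedProfiles(whitelist: readonly string[]): string[] {
  return whitelist.filter((profile) => !LIFECYCLE_DISPATCH.has(profile));
}
```

A framework test could assert that `undispatchedProfiles` returns an empty list for the real whitelist, which is one way to pin the lockstep requirement.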
Coverage in `e2e-phase-lifecycle.test.ts` exercises both modes,
gateway-stop tolerance, the no-labeled-container failure case, the
docker-discover failure case, the unsupported-profile rejection,
the cleanup queue order, and `buildBackupContainerName` truncation.
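For readers unfamiliar with the truncation case, here is one way a backup-name builder could keep the `-nemoclaw-gpu-backup-<ts>` suffix intact while shortening the original name. This is not the implementation in `docker-gpu-patch.ts`: the signature and the 63-character limit are assumptions chosen purely for illustration.

```typescript
const BACKUP_MARKER = "-nemoclaw-gpu-backup-";

// ASSUMPTION: illustrative limit; the real helper may use a different bound.
const MAX_NAME_LENGTH = 63;

// Hypothetical re-implementation: truncates the original name, never the suffix.
function buildBackupContainerName(
  original: string,
  timestamp: number,
  maxLength: number = MAX_NAME_LENGTH,
): string {
  const suffix = `${BACKUP_MARKER}${timestamp}`;
  // Space left for the original name once the suffix is reserved.
  const room = Math.max(0, maxLength - suffix.length);
  return `${original.slice(0, room)}${suffix}`;
}
```

Truncating the prefix rather than the suffix keeps the result matchable by a `*-nemoclaw-gpu-backup-*` pattern regardless of how long the original name was.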
The fixture is intentionally narrow on profiles: only
`post-reboot-recovery` is dispatched today. Adding rebuild, upgrade,
or drift profiles is a separate, equally narrow change set that must
land the dispatcher branch and `SUPPORTED_LIFECYCLES` whitelist
together.
Refs #4423
Registers the failing-test-first guard for #4423 in the typed scenario registry so the live Vitest matrix from #5006 fans it out as a dedicated CI job. Builds on the framework primitives added earlier in this PR (lifecycle phase fixture, host-side probes, lifecycle whitelist).

Additions:

* `post-reboot-recovery-ready` expected-state in `scenarios/expected-states.ts` declaring the user-visible invariants that must hold after a `nemoclaw <name> status` call on a freshly-rebooted DGX Spark / Linux Docker-driver host:
  - cli installed,
  - gateway healthy (the user-systemd unit from #4580 brings it back up before status runs),
  - sandbox running (recovery completed in time),
  - localRegistry entry preserved (the user-visible regression target, destroyed on unfixed `main`),
  - dockerSandboxContainer present (recovery didn't delete the labeled container or its `*-nemoclaw-gpu-backup-*` sibling).
* `ubuntu-repo-docker-post-reboot-recovery` scenario in `scenarios/scenarios/baseline.ts` wiring `ubuntuRepoDockerLifecycle("cloud-openclaw", "post-reboot-recovery")` against the new expected-state and a smoke suite. Carries a description that explains the RED/GREEN contract and points to the PR-A fix landing in `src/lib/`.
* `manifests/openclaw-nvidia-post-reboot-recovery.yaml` declares `lifecycle: post-reboot-recovery` and the same NVIDIA_API_KEY credential ref the cloud-openclaw scenarios use.
* `.github/workflows/e2e-scenarios.yaml` ROUTES table gains the new scenario so the workflow-boundary test (`e2e-scenarios-workflow.test.ts`) routes every typed id.

Test pinning:

* `e2e-scenario-matrix.test.ts` updated from a 1-entry to a 2-entry live matrix expectation. The new entry asserts on `expectedStateId: "post-reboot-recovery-ready"` so a future accidental dropped-lifecycle change to the scenario regresses loudly.
* `e2e-live-registry-discovery.test.ts` swaps the synthetic whitelist-coverage test for an assertion against the real `ubuntu-repo-docker-post-reboot-recovery` registry entry.

Behavior:

* On unfixed `main`, the live runner's lifecycle phase stops the OpenShell gateway runtime and `docker stop`s the labeled sandbox container. State-validation then runs `nemoclaw <name> status` (which restarts the gateway via systemd) and the destructive `missing` branch in `src/lib/actions/sandbox/status.ts` wipes the local registry entry. The `local-registry-entry-present` probe fails. Scenario goes RED.
* On the PR-A fix branch, the new Docker-driver sandbox recovery helper restarts the labeled container before stale-removal can fire, registry survives, all five probes pass. Scenario flips GREEN.

The bash-side legacy compiler emits a `lifecycle.profile.post-reboot-recovery` PhaseAction pointing at `nemoclaw_scenarios/lifecycle/dispatch.sh`, but the legacy bash worker is intentionally not provided: this scenario is Vitest-only. The typed runner's `LifecyclePhaseFixture` handles dispatch directly. If the legacy runner is invoked against this scenario it errors out at the dispatcher; that's the right failure mode while the bash side stays on its own retirement clock.

Refs #4423
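To make the "five probes" concrete, the sketch below shows how an expected-state with five dimensions could expand into five probes. The `localRegistry` and `dockerSandboxContainer` field names and the two host-side probe ids come from this PR's description; the other field shapes and the `cli-installed`, `gateway-healthy`, and `sandbox-running` ids are hypothetical placeholders.

```typescript
type Presence = "present" | "absent";

// ASSUMPTION: illustrative shape; the real ExpectedState has more detail.
interface ExpectedState {
  cli?: { expected: "installed" };
  gateway?: { expected: "healthy" };
  sandbox?: { expected: "running" };
  localRegistry?: { expected: Presence };
  dockerSandboxContainer?: { expected: Presence };
}

function probesForState(state: ExpectedState): string[] {
  const probes: string[] = [];
  if (state.cli) probes.push("cli-installed"); // hypothetical id
  if (state.gateway) probes.push("gateway-healthy"); // hypothetical id
  if (state.sandbox) probes.push("sandbox-running"); // hypothetical id
  // The host-side probes are emitted only in the positive direction.
  if (state.localRegistry?.expected === "present") {
    probes.push("local-registry-entry-present");
  }
  if (state.dockerSandboxContainer?.expected === "present") {
    probes.push("docker-sandbox-container-present");
  }
  return probes;
}

// Sketch of the `post-reboot-recovery-ready` expected-state.
const postRebootRecoveryReady: ExpectedState = {
  cli: { expected: "installed" },
  gateway: { expected: "healthy" },
  sandbox: { expected: "running" },
  localRegistry: { expected: "present" },
  dockerSandboxContainer: { expected: "present" },
};
```

Under this model, the RED run on unfixed `main` corresponds to `local-registry-entry-present` failing while the other probes may still pass.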
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
E2E Advisor Recommendation
Required E2E: None

E2E Scenario Advisor Recommendation
Required scenario E2E: None
PR Review Advisor
Findings: 0 needs attention, 0 worth checking, 0 nice ideas
Consider writing more tests for …
This is an automated advisory review. A human maintainer must make the final merge decision.
Prek hook auto-fixed formatting in 6 files added/touched by this PR. No behavior change.
The biome-format commit accidentally added a node_modules symlink alongside the formatting fixes. Remove it; the directory is already in .gitignore.
…nventory-internals

# Conflicts:
# test/e2e-scenario/framework-tests/e2e-phase-lifecycle.test.ts
# test/e2e-scenario/framework/phases/lifecycle.ts
## Summary

Adds the typed inference runtime helper surface for the Vitest E2E scenario runner.

## Related Issue

Refs #4941
Refs #4990
Refs #4349
Depends on #5046, #5052, and the shared runtime-suite base stack.
Stacked on branch `codex/e2e-fanout-01-inventory-internals`.

## Changes

- Added `RuntimePhaseFixture` and the `runtime` Vitest fixture for inference runtime probes.
- Added reusable helpers for sandbox-side `inference.local` models, chat completion, and HTTP status checks.
- Added trusted-provider compatible endpoint helpers for models/chat probes while preserving shell-probe artifact capture and redaction.
- Validate model-list responses for OpenAI-style `{ data: [...] }` and Ollama-style `{ models: [...] }` payloads so readiness helpers cannot pass on `{}` or error-only JSON.
- Auto-redact sensitive custom header values and honor provider `curlMaxTimeSeconds` as `curl --max-time`.
- Extended `ProviderClient` with a request-level JSON API that returns both parsed JSON and the captured `ShellProbeResult`.
- Added framework tests for route normalization, argv construction, redaction values, provider-compatible requests, model-list validation, provider curl timeout propagation, and malformed response handling.

## Type of Change

- [x] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification

- [ ] `npx prek run --all-files` passes
- [ ] `npm test` passes
- [x] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [ ] Docs updated for user-facing behavior changes
- [ ] `npm run docs` builds without warnings (doc changes only)
- [ ] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

Verified locally:

- `npx vitest run --project e2e-scenario-framework test/e2e-scenario/framework-tests/e2e-phase-runtime.test.ts test/e2e-scenario/framework-tests/e2e-clients.test.ts --silent=false --reporter=default`
- `npx vitest run --project e2e-scenario-framework --silent=false --reporter=default`
- `npm run typecheck:cli`
- `npx prek run --files test/e2e-scenario/framework/clients/provider.ts test/e2e-scenario/framework/clients/index.ts test/e2e-scenario/framework/e2e-test.ts test/e2e-scenario/framework/phases/index.ts test/e2e-scenario/framework/phases/runtime.ts test/e2e-scenario/framework-tests/e2e-phase-runtime.test.ts --skip test-cli`
- `git diff --check`

CI/advisor evidence:

- Required PR checks are green on the PR head.
- PR review advisor: 0 needs attention, 0 worth checking, 0 nice ideas.
- E2E recommendation advisor: no product E2E required.
- E2E scenario advisor requested `e2e-scenarios-all`; dispatched run https://github.com/NVIDIA/NemoClaw/actions/runs/27241683412. The relevant `ubuntu-repo-cloud-openclaw` scenario passed. The all-run is red due to pre-existing scenario-runner coverage gaps outside this PR's helper surface, including generated scenarios whose onboarding profile ids are not yet implemented by `test/e2e-scenario/nemoclaw_scenarios/onboard/dispatch.sh` (for example `openai-compatible-openclaw`, `cloud-nvidia-openclaw-resume-after-interrupt`) and a Hermes-specific `runtime.hermes.history-writable` assertion that fails after onboarding/inference pass because it cannot determine shield state.

Note: the full pre-commit hook's `test-cli` step still fails locally in `test/release-latest-tag.test.ts` because this machine's global Git config enables SSH commit signing but the private signing key is unavailable. The focused E2E framework suite and CLI typecheck pass.

---

<!-- DCO sign-off required by CI. Run: git config user.name && git config user.email -->
Signed-off-by: Carlos Villela <cvillela@nvidia.com>

---------

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Co-authored-by: Julie Yaunches <jyaunches@nvidia.com>
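The model-list validation, header redaction, and timeout propagation listed under Changes can be sketched as follows. None of these function names are taken from the framework; the sensitive-header list, the `[REDACTED]` placeholder, and the decision to accept an empty array as a valid list shape are all assumptions. The `curl` flags used (`-sS`, `--max-time`, `-H`) are standard.

```typescript
// Accepts OpenAI-style { data: [...] } or Ollama-style { models: [...] } payloads.
// ASSUMPTION: an empty array still counts as a valid list shape.
function isModelListPayload(payload: unknown): boolean {
  if (typeof payload !== "object" || payload === null) return false;
  const record = payload as Record<string, unknown>;
  return Array.isArray(record["data"]) || Array.isArray(record["models"]);
}

// ASSUMPTION: illustrative set of header names treated as sensitive.
const SENSITIVE_HEADERS = new Set(["authorization", "x-api-key", "api-key"]);

// Collects header values that should never appear in captured artifacts.
function sensitiveHeaderValues(headers: Record<string, string>): string[] {
  return Object.entries(headers)
    .filter(([name, value]) => SENSITIVE_HEADERS.has(name.toLowerCase()) && value.length > 0)
    .map(([, value]) => value);
}

// Replaces every occurrence of each secret with a placeholder.
function redact(text: string, secrets: string[]): string {
  return secrets.reduce((out, secret) => out.split(secret).join("[REDACTED]"), text);
}

// Builds a curl argv, forwarding the provider timeout as --max-time when set.
function curlArgv(
  url: string,
  headers: Record<string, string>,
  curlMaxTimeSeconds?: number,
): string[] {
  const argv = ["curl", "-sS"];
  if (curlMaxTimeSeconds !== undefined) {
    argv.push("--max-time", String(curlMaxTimeSeconds));
  }
  for (const [name, value] of Object.entries(headers)) {
    argv.push("-H", `${name}: ${value}`);
  }
  argv.push(url);
  return argv;
}
```

Checking for an array under `data` or `models` is what prevents `{}` or `{ "error": ... }` from being mistaken for a ready endpoint.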
Summary
Extends the E2E migration inventory beyond direct legacy `test/e2e/test-*.sh` entrypoints so internal legacy runner surfaces are also guarded before deletion. The inventory now tracks coarse runner-internal groups for shell scenario workers, validation suites, onboarding assertion workers, TypeScript shell-runner orchestrators, and runtime helper libraries.

This branch also merges the updated #5046 base to pick up the accidental `node_modules` symlink removal and carries one formatter-only wrap in `src/commands/sandbox/agents/list.ts` so the all-files static hook stays green on this stack branch.

Related Issue
Refs #4941
Refs #4990
Refs #4357
Depends on #5046 and the shared runtime-suite base stack.
Changes
- `internalSurfaces` records to `test/e2e-scenario/migration/legacy-inventory.json` for legacy runner internals.

Type of Change

Verification

Focused verification run:
`npx vitest run --project e2e-scenario-framework test/e2e-scenario/framework-tests/e2e-migration-inventory.test.ts test/e2e-scenario/framework-tests/e2e-migration-inventory-lock.test.ts --silent=false --reporter=default`

Static-check parity run:
`npm run validate:configs && npx prek run --all-files --stage pre-push --skip tsc-plugin --skip tsc-js --skip tsc-cli --skip version-tag-sync --skip test-cli --skip test-plugin --skip source-shape-test-budget --skip test-file-size-budget --skip test-skills-yaml && npm run source-shape:check && npm run test-size:check && npx vitest run test/skills-frontmatter.test.ts && python3 scripts/generate-platform-docs.py --check`

Additional local check:
`npx vitest run --project cli test/docker-abstraction-guard.test.ts --silent=false --reporter=default`

- `npx prek run --all-files` passes
- `npm test` passes
- `npm run docs` builds without warnings (doc changes only)

Signed-off-by: Carlos Villela <cvillela@nvidia.com>