test(e2e): add inference runtime helpers#5056
`liveScenarioSupport` previously rejected any scenario that declared an `environment.lifecycle`, so post-onboard host mutations (reboot, rebuild, upgrade, drift) could not surface in the live Vitest matrix at all. Replace the unconditional reject with a `SUPPORTED_LIFECYCLES` whitelist that starts with the single profile the upcoming post-reboot-recovery fixture dispatches: `post-reboot-recovery`. Future profiles must land the dispatcher branch and an expected-state in the same change set, so the whitelist stays in lockstep with what the runner can actually execute. Prepares the runner for #4423's failing-test-first guard, which needs a post-reboot lifecycle scenario to demonstrate registry preservation + Docker-backed sandbox recovery on Linux/Spark Docker-driver hosts. Refs #4423
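The whitelist check described in this commit could look roughly like the sketch below. Only `liveScenarioSupport`, `SUPPORTED_LIFECYCLES`, `environment.lifecycle`, and the `post-reboot-recovery` profile come from the commit message; `ScenarioLike`, `SupportResult`, and the error wording are assumptions made for illustration and may not match the real runner.

```typescript
// Hypothetical shapes: the real scenario and result types are not shown in this PR text.
interface ScenarioLike {
  id: string;
  environment: { lifecycle?: string };
}

type SupportResult =
  | { supported: true }
  | { supported: false; reason: string };

// The whitelist starts with the single profile the fixture can dispatch.
const SUPPORTED_LIFECYCLES: ReadonlySet<string> = new Set(["post-reboot-recovery"]);

function liveScenarioSupport(scenario: ScenarioLike): SupportResult {
  const lifecycle = scenario.environment.lifecycle;
  // Scenarios without a lifecycle are unaffected by the whitelist.
  if (lifecycle !== undefined && !SUPPORTED_LIFECYCLES.has(lifecycle)) {
    return {
      supported: false,
      reason: `lifecycle "${lifecycle}" is not in SUPPORTED_LIFECYCLES`,
    };
  }
  return { supported: true };
}
```

Keeping the whitelist as data rather than a chain of conditionals means a new profile is a one-line addition, which makes it easier to review alongside its dispatcher branch.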
Adds two host-side state-validation probes the live runner needs to express the regression target tracked by #4423:
* `local-registry-entry-present` reads `~/.nemoclaw/sandboxes.json` and asserts the scenario's sandbox name is still recorded. This is deliberately orthogonal to `sandbox.expected`: post-reboot bugs can wipe the local registry while the live OpenShell gateway is healthy, and only a host-side probe catches the data-loss regression.
* `docker-sandbox-container-present` runs `docker ps -a --filter label=openshell.ai/sandbox-name=<name>` and accepts running, stopped, or `*-nemoclaw-gpu-backup-*` sibling containers. The label filter mirrors `OPENSHELL_SANDBOX_NAME_LABEL` used by `findOpenShellDockerSandboxContainerIds` in `src/lib/onboard/docker-gpu-patch.ts`, so the probe stays in lockstep with how OpenShell labels containers today.
Probe wiring:
* `StateProbeId` extended with the two new probe ids.
* `ExpectedState` gains `localRegistry` and `dockerSandboxContainer` optional dimensions; `probesForState` emits the new probes only for `expected: "present"`. Negative-direction probes are intentionally omitted today and pinned by a `probesForState` test.
* `StateValidationPhaseFixture.from()` now accepts either an expected-state ID or an inline `ExpectedState`, so unit tests can drive new probes without registering synthetic states in the typed registry. The live runner still calls `from(id, instance)`.
* Fixture takes an optional `ProbeIO` injection so tests can stub the registry reader without touching `~/.nemoclaw`.
No callers of the existing typed registry are affected: every shipped expected-state leaves `localRegistry` and `dockerSandboxContainer` unset, so `probesForState` returns the same probe lists as before.
Refs #4423
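A minimal sketch of the probe selection and the stubbed registry read described above. The probe ids and the `localRegistry` / `dockerSandboxContainer` dimension names come from the commit message. The `"absent"` value, the `readRegistry` method on `ProbeIO`, and the `{ sandboxes: { <name>: ... } }` layout of `sandboxes.json` are assumptions for illustration only.

```typescript
type StateProbeId =
  | "local-registry-entry-present"
  | "docker-sandbox-container-present";

// "absent" is an assumed value; the commit only confirms "present".
interface ExpectedState {
  localRegistry?: "present" | "absent";
  dockerSandboxContainer?: "present" | "absent";
}

// Emit the new probes only for the positive direction.
function probesForState(state: ExpectedState): StateProbeId[] {
  const probes: StateProbeId[] = [];
  if (state.localRegistry === "present") probes.push("local-registry-entry-present");
  if (state.dockerSandboxContainer === "present") probes.push("docker-sandbox-container-present");
  return probes;
}

// Hypothetical ProbeIO shape: returns the raw registry file, or undefined if missing.
interface ProbeIO {
  readRegistry(): string | undefined;
}

// Assumes a registry layout of { "sandboxes": { "<name>": {...} } }.
function localRegistryEntryPresent(io: ProbeIO, sandboxName: string): boolean {
  const raw = io.readRegistry();
  if (raw === undefined) return false;
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) return false;
    const sandboxes = (parsed as { sandboxes?: unknown }).sandboxes;
    if (typeof sandboxes !== "object" || sandboxes === null) return false;
    return Object.prototype.hasOwnProperty.call(sandboxes, sandboxName);
  } catch {
    // Unparseable registry content counts as "entry not present".
    return false;
  }
}
```

Because the reader is injected, a unit test can pass `{ readRegistry: () => '{"sandboxes":{}}' }` and never touch the real home directory.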
Adds a Vitest phase fixture that mutates host state between onboarding
and state-validation, so live scenarios can express post-onboard
invariants the legacy bash runner has no equivalent for.
`LifecyclePhaseFixture.simulate("post-reboot-recovery", instance, opts)`
reproduces the host-side conditions of a DGX Spark / Linux Docker-driver
reboot in two modes:
* `stop-original` (default): `openshell gateway stop` + `docker stop` of the labeled sandbox container. Models the common reboot outcome where OpenShell forgets the sandbox while Docker keeps the container exited but labeled.
* `rename-to-gpu-backup`: additionally `docker rename`s the container to a `*-nemoclaw-gpu-backup-<ts>` sibling, mirroring the GPU-patch reboot path in `src/lib/onboard/docker-gpu-patch.ts`.
Both modes register cleanups (in reverse order) to restore the
container so test teardown leaves Docker in a usable state.
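The two modes and the reverse-order cleanup can be sketched as follows. This is an illustration under stated assumptions, not the fixture's code: `Run`, `CleanupQueue`, `SimulateOptions`, and `simulatePostReboot` are hypothetical names, the command runner is injected so nothing touches a real Docker daemon, and the exact restore steps (rename back, then `docker start`) are a guess at what "restore the container" means.

```typescript
type RebootMode = "stop-original" | "rename-to-gpu-backup";

// Hypothetical injected command runner, so tests can record calls instead of executing them.
type Run = (cmd: string, args: string[]) => void;

// Hypothetical cleanup registry that runs tasks in reverse registration order (LIFO).
class CleanupQueue {
  private readonly tasks: Array<() => void> = [];
  register(task: () => void): void {
    this.tasks.push(task);
  }
  runAll(): void {
    while (this.tasks.length > 0) {
      const task = this.tasks.pop();
      if (task) task();
    }
  }
}

interface SimulateOptions {
  containerId: string;
  originalName: string;
  mode?: RebootMode;
  backupName?: string; // defaults to a timestamped sibling name
}

function simulatePostReboot(run: Run, cleanup: CleanupQueue, opts: SimulateOptions): void {
  const mode = opts.mode ?? "stop-original";
  run("openshell", ["gateway", "stop"]);
  run("docker", ["stop", opts.containerId]);
  // Assumed restore step: bring the container back up at teardown.
  cleanup.register(() => run("docker", ["start", opts.containerId]));
  if (mode === "rename-to-gpu-backup") {
    const backup = opts.backupName ?? `${opts.originalName}-nemoclaw-gpu-backup-${Date.now()}`;
    run("docker", ["rename", opts.containerId, backup]);
    // Registered last, so it runs first: the name is restored before the container starts.
    cleanup.register(() => run("docker", ["rename", opts.containerId, opts.originalName]));
  }
}
```

Running cleanups last-in, first-out undoes the mutations in the opposite order they were applied, so the rename is reverted before the container is started again.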
Wiring:
* `framework/phases/index.ts` re-exports the fixture and types.
* `framework/e2e-test.ts` registers a `lifecycle` Vitest fixture on
`E2EScenarioFixtures`, wired with the shared `host`, `sandbox`,
and `cleanup` registries.
* `live/registry-scenarios.test.ts` invokes
`lifecycle.simulate(profile, instance)` between `onboard.from(...)`
and `stateValidation.from(...)` whenever the scenario declares a
whitelisted `environment.lifecycle`. Scenarios that omit lifecycle
are unaffected. A scenario whose lifecycle is whitelisted by
`runtime-support.ts` but NOT dispatched by the fixture fails fast
with a clear error so the whitelist and dispatcher stay in
lockstep.
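The fail-fast behavior for a profile that is whitelisted but has no dispatcher branch might be sketched like this. `LifecycleHandler`, `dispatchLifecycle`, and `assertLockstep` are hypothetical names, and the handler map stands in for whatever branching the real fixture uses.

```typescript
type LifecycleHandler = () => void;

// Dispatch a profile, failing fast with a clear error when no branch exists.
function dispatchLifecycle(
  profile: string,
  handlers: ReadonlyMap<string, LifecycleHandler>,
): void {
  const handler = handlers.get(profile);
  if (handler === undefined) {
    throw new Error(
      `lifecycle profile "${profile}" is whitelisted but has no dispatcher branch`,
    );
  }
  handler();
}

// Optional guard a unit test could call: every whitelisted profile must be dispatchable.
function assertLockstep(
  whitelist: readonly string[],
  handlers: ReadonlyMap<string, LifecycleHandler>,
): void {
  const missing = whitelist.filter((profile) => !handlers.has(profile));
  if (missing.length > 0) {
    throw new Error(`whitelisted profiles without a dispatcher: ${missing.join(", ")}`);
  }
}
```

A check like `assertLockstep` in the framework tests would catch drift at unit-test time, while the runtime throw covers the case where the check is skipped.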
Coverage in `e2e-phase-lifecycle.test.ts` exercises both modes,
gateway-stop tolerance, the no-labeled-container failure case, the
docker-discover failure case, the unsupported-profile rejection,
the cleanup queue order, and `buildBackupContainerName` truncation.
The fixture is intentionally narrow on profiles: only
`post-reboot-recovery` is dispatched today. Adding rebuild, upgrade,
or drift profiles is a separate, equally narrow change set that must
land the dispatcher branch and `SUPPORTED_LIFECYCLES` whitelist
together.
Refs #4423
Registers the failing-test-first guard for #4423 in the typed scenario registry so the live Vitest matrix from #5006 fans it out as a dedicated CI job. Builds on the framework primitives added earlier in this PR (lifecycle phase fixture, host-side probes, lifecycle whitelist).
Additions:
* `post-reboot-recovery-ready` expected-state in `scenarios/expected-states.ts` declaring the user-visible invariants that must hold after a `nemoclaw <name> status` call on a freshly-rebooted DGX Spark / Linux Docker-driver host:
  - cli installed,
  - gateway healthy (the user-systemd unit from #4580 brings it back up before status runs),
  - sandbox running (recovery completed in time),
  - localRegistry entry preserved (the user-visible regression target, destroyed on unfixed `main`),
  - dockerSandboxContainer present (recovery didn't delete the labeled container or its `*-nemoclaw-gpu-backup-*` sibling).
* `ubuntu-repo-docker-post-reboot-recovery` scenario in `scenarios/scenarios/baseline.ts` wiring `ubuntuRepoDockerLifecycle("cloud-openclaw", "post-reboot-recovery")` against the new expected-state and a smoke suite. Carries a description that explains the RED/GREEN contract and points to the PR-A fix landing in `src/lib/`.
* `manifests/openclaw-nvidia-post-reboot-recovery.yaml` declares `lifecycle: post-reboot-recovery` and the same NVIDIA_API_KEY credential ref the cloud-openclaw scenarios use.
* `.github/workflows/e2e-scenarios.yaml` ROUTES table gains the new scenario so the workflow-boundary test (`e2e-scenarios-workflow.test.ts`) routes every typed id.
Test pinning:
* `e2e-scenario-matrix.test.ts` updated from a 1-entry to a 2-entry live matrix expectation. The new entry asserts on `expectedStateId: "post-reboot-recovery-ready"` so a future accidental dropped-lifecycle change to the scenario regresses loudly.
* `e2e-live-registry-discovery.test.ts` swaps the synthetic whitelist-coverage test for an assertion against the real `ubuntu-repo-docker-post-reboot-recovery` registry entry.
Behavior:
* On unfixed `main`, the live runner's lifecycle phase stops the OpenShell gateway runtime and `docker stop`s the labeled sandbox container. State-validation then runs `nemoclaw <name> status` (which restarts the gateway via systemd) and the destructive `missing` branch in `src/lib/actions/sandbox/status.ts` wipes the local registry entry. The `local-registry-entry-present` probe fails. Scenario goes RED.
* On the PR-A fix branch, the new Docker-driver sandbox recovery helper restarts the labeled container before stale-removal can fire, registry survives, all five probes pass. Scenario flips GREEN.
The bash-side legacy compiler emits a `lifecycle.profile.post-reboot-recovery` PhaseAction pointing at `nemoclaw_scenarios/lifecycle/dispatch.sh`, but the legacy bash worker is intentionally not provided: this scenario is Vitest-only. The typed runner's `LifecyclePhaseFixture` handles dispatch directly. If the legacy runner is invoked against this scenario it errors out at the dispatcher; that's the right failure mode while the bash side stays on its own retirement clock.
Refs #4423
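To make the RED/GREEN contract concrete, here is a small sketch that compares an observed host state against the five invariants listed above. The constant name, the `cli` / `gateway` / `sandbox` field names and their string values, and `failingDimensions` are all assumptions for illustration; the real `ExpectedState` type in `scenarios/expected-states.ts` may be shaped differently.

```typescript
// Assumed encoding of the five invariants from the commit message.
const POST_REBOOT_RECOVERY_READY = {
  cli: "installed",
  gateway: "healthy",
  sandbox: "running",
  localRegistry: "present",
  dockerSandboxContainer: "present",
} as const;

type Dimension = keyof typeof POST_REBOOT_RECOVERY_READY;

// Return every dimension whose observed value differs from the expected one.
function failingDimensions(observed: Record<Dimension, string>): Dimension[] {
  const dimensions = Object.keys(POST_REBOOT_RECOVERY_READY) as Dimension[];
  return dimensions.filter((d) => observed[d] !== POST_REBOOT_RECOVERY_READY[d]);
}

// Unfixed `main`: status wipes the registry entry, so one dimension fails (RED).
const onUnfixedMain = failingDimensions({
  cli: "installed",
  gateway: "healthy",
  sandbox: "running",
  localRegistry: "absent",
  dockerSandboxContainer: "present",
});

// Fix branch: every dimension matches, so nothing fails (GREEN).
const onFixBranch = failingDimensions({ ...POST_REBOOT_RECOVERY_READY });
```

In this sketch the scenario is RED whenever `failingDimensions` returns a non-empty list, which matches the description of the `local-registry-entry-present` probe being the one that fails on unfixed `main`.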
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
E2E Advisor Recommendation
Required E2E: None
PR Review Advisor
Findings: 0 needs attention, 2 worth checking, 1 nice idea.
This is an automated advisory review. A human maintainer must make the final merge decision.
Prek hook auto-fixed formatting in 6 files added/touched by this PR. No behavior change.
The biome-format commit accidentally added a node_modules symlink alongside the formatting fixes. Remove it; the directory is already in .gitignore.
…nventory-internals
# Conflicts:
#	test/e2e-scenario/framework-tests/e2e-phase-lifecycle.test.ts
#	test/e2e-scenario/framework/phases/lifecycle.ts
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Summary
Adds the typed inference runtime helper surface for the Vitest E2E scenario runner.
Related Issue
Refs #4941
Refs #4990
Refs #4349
Depends on #5046, #5052, and the shared runtime-suite base stack.
Stacked on branch `codex/e2e-fanout-01-inventory-internals`.
Changes
* `RuntimePhaseFixture` and the `runtime` Vitest fixture for inference runtime probes.
* `inference.local` models, chat completion, and HTTP status checks.
* `{ data: [...] }` and Ollama-style `{ models: [...] }` payloads so readiness helpers cannot pass on `{}` or error-only JSON.
* `curlMaxTimeSeconds` as `curl --max-time`.
* `ProviderClient` with a request-level JSON API that returns both parsed JSON and the captured `ShellProbeResult`.
Type of Change
Verification
* `npx prek run --all-files` passes
* `npm test` passes
* `npm run docs` builds without warnings (doc changes only)
Verified locally:
* `npx vitest run --project e2e-scenario-framework test/e2e-scenario/framework-tests/e2e-phase-runtime.test.ts test/e2e-scenario/framework-tests/e2e-clients.test.ts --silent=false --reporter=default`
* `npx vitest run --project e2e-scenario-framework --silent=false --reporter=default`
* `npm run typecheck:cli`
* `npx prek run --files test/e2e-scenario/framework/clients/provider.ts test/e2e-scenario/framework/clients/index.ts test/e2e-scenario/framework/e2e-test.ts test/e2e-scenario/framework/phases/index.ts test/e2e-scenario/framework/phases/runtime.ts test/e2e-scenario/framework-tests/e2e-phase-runtime.test.ts --skip test-cli`
* `git diff --check`
CI/advisor evidence:
* `e2e-scenarios-all`; dispatched run https://github.com/NVIDIA/NemoClaw/actions/runs/27241683412. The relevant `ubuntu-repo-cloud-openclaw` scenario passed. The all-run is red due to pre-existing scenario-runner coverage gaps outside this PR's helper surface, including generated scenarios whose onboarding profile ids are not yet implemented by `test/e2e-scenario/nemoclaw_scenarios/onboard/dispatch.sh` (for example `openai-compatible-openclaw`, `cloud-nvidia-openclaw-resume-after-interrupt`) and a Hermes-specific `runtime.hermes.history-writable` assertion that fails after onboarding/inference pass because it cannot determine shield state.
Note: the full pre-commit hook's `test-cli` step still fails locally in `test/release-latest-tag.test.ts` because this machine's global Git config enables SSH commit signing but the private signing key is unavailable. The focused E2E framework suite and CLI typecheck pass.
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
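The payload check and timeout handling listed under Changes can be illustrated with the sketch below. It is not the PR's code: `isModelsPayload` and `curlArgs` are hypothetical names, the treatment of an empty list as acceptable is an assumption, and only the `{ data: [...] }` / `{ models: [...] }` shapes, the `curlMaxTimeSeconds` option, and the `curl --max-time` flag come from the description.

```typescript
// Accept a models payload only when it carries a list under `data`
// or `models`. `{}` and error-only JSON are rejected.
// Assumption: an empty list still counts as a well-formed payload.
function isModelsPayload(payload: unknown): boolean {
  if (typeof payload !== "object" || payload === null) return false;
  const { data, models } = payload as { data?: unknown; models?: unknown };
  return Array.isArray(data) || Array.isArray(models);
}

// Build curl arguments, forwarding the optional timeout as `--max-time`.
// `-sS` (silent, but still show errors) is an illustrative choice.
function curlArgs(url: string, curlMaxTimeSeconds?: number): string[] {
  const args = ["-sS"];
  if (curlMaxTimeSeconds !== undefined) {
    args.push("--max-time", String(curlMaxTimeSeconds));
  }
  args.push(url);
  return args;
}
```

Checking for the list explicitly, rather than for "any parseable JSON", is what stops a readiness helper from passing when the endpoint answers with `{}` or `{ "error": ... }`.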