test(e2e): add post-reboot-recovery regression guard for #4423#5046
Conversation
`liveScenarioSupport` previously rejected any scenario that declared an `environment.lifecycle`, so post-onboard host mutations (reboot, rebuild, upgrade, drift) could not surface in the live Vitest matrix at all. Replace the unconditional reject with a `SUPPORTED_LIFECYCLES` whitelist that starts with the single profile the upcoming post-reboot-recovery fixture dispatches: `post-reboot-recovery`. Future profiles must land the dispatcher branch and an expected-state in the same change set, so the whitelist stays in lockstep with what the runner can actually execute. Prepares the runner for #4423's failing-test-first guard, which needs a post-reboot lifecycle scenario to demonstrate registry preservation + Docker-backed sandbox recovery on Linux/Spark Docker-driver hosts. Refs #4423
Adds two host-side state-validation probes the live runner needs to express the regression target tracked by #4423: * `local-registry-entry-present` reads `~/.nemoclaw/sandboxes.json` and asserts the scenario's sandbox name is still recorded. This is deliberately orthogonal to `sandbox.expected`: post-reboot bugs can wipe the local registry while the live OpenShell gateway is healthy, and only a host-side probe catches the data-loss regression. * `docker-sandbox-container-present` runs `docker ps -a --filter label=openshell.ai/sandbox-name=<name>` and accepts running, stopped, or `*-nemoclaw-gpu-backup-*` sibling containers. The label filter mirrors `OPENSHELL_SANDBOX_NAME_LABEL` used by `findOpenShellDockerSandboxContainerIds` in `src/lib/onboard/docker-gpu-patch.ts`, so the probe stays in lock- step with how OpenShell labels containers today. Probe wiring: * `StateProbeId` extended with the two new probe ids. * `ExpectedState` gains `localRegistry` and `dockerSandboxContainer` optional dimensions; `probesForState` emits the new probes only for `expected: "present"`. Negative-direction probes are intentionally omitted today and pinned by a probesForState test. * `StateValidationPhaseFixture.from()` now accepts either an expected-state ID or an inline `ExpectedState`, so unit tests can drive new probes without registering synthetic states in the typed registry. The live runner still calls `from(id, instance)`. * Fixture takes an optional `ProbeIO` injection so tests can stub the registry reader without touching `~/.nemoclaw`. No callers of the existing typed registry are affected: every shipped expected-state leaves `localRegistry` and `dockerSandboxContainer` unset, so `probesForState` returns the same probe lists as before. Refs #4423
Adds a Vitest phase fixture that mutates host state between onboarding
and state-validation, so live scenarios can express post-onboard
invariants the legacy bash runner has no equivalent for.
`LifecyclePhaseFixture.simulate("post-reboot-recovery", instance, opts)`
reproduces the host-side conditions of a DGX Spark / Linux Docker-driver
reboot in two modes:
* `stop-original` (default) — `openshell gateway stop` + `docker
stop` of the labeled sandbox
container. Models the common reboot
outcome where OpenShell forgets the
sandbox while Docker keeps the
container exited but labeled.
* `rename-to-gpu-backup` — additionally `docker rename`s the
container to a `*-nemoclaw-gpu-
backup-<ts>` sibling, mirroring the
GPU-patch reboot path in
`src/lib/onboard/docker-gpu-patch.ts`.
Both modes register cleanups (in reverse order) to restore the
container so test teardown leaves Docker in a usable state.
Wiring:
* `framework/phases/index.ts` re-exports the fixture and types.
* `framework/e2e-test.ts` registers a `lifecycle` Vitest fixture on
`E2EScenarioFixtures`, wired with the shared `host`, `sandbox`,
and `cleanup` registries.
* `live/registry-scenarios.test.ts` invokes
`lifecycle.simulate(profile, instance)` between `onboard.from(...)`
and `stateValidation.from(...)` whenever the scenario declares a
whitelisted `environment.lifecycle`. Scenarios that omit lifecycle
are unaffected. A scenario whose lifecycle is whitelisted by
`runtime-support.ts` but NOT dispatched by the fixture fails fast
with a clear error so the whitelist and dispatcher stay in lock-
step.
Coverage in `e2e-phase-lifecycle.test.ts` exercises both modes,
gateway-stop tolerance, the no-labeled-container failure case, the
docker-discover failure case, the unsupported-profile rejection,
the cleanup queue order, and `buildBackupContainerName` truncation.
The fixture is intentionally narrow on profiles: only
`post-reboot-recovery` is dispatched today. Adding rebuild, upgrade,
or drift profiles is a separate, equally narrow change set that must
land the dispatcher branch and `SUPPORTED_LIFECYCLES` whitelist
together.
Refs #4423
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
📝 WalkthroughWalkthroughAdds a LifecyclePhaseFixture for post-reboot recovery, host-observable expected-state probes (registry and Docker container), fixture wiring and allowlist-based lifecycle support, scenario/manifest/matrix/workflow routing for a new scenario, and tests for lifecycle flows and probe validation. ChangesLifecycle Phase Fixture and Host-Side Probe Support
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsStopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a Comment |
E2E Advisor RecommendationRequired E2E: Dispatch hint: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
E2E Scenario Advisor RecommendationRequired scenario E2E: Dispatch required scenario E2E:
Full scenario advisor summaryE2E Scenario AdvisorBase: Required scenario E2E
Optional scenario E2E
Relevant changed files
|
PR Review AdvisorFindings: 2 needs attention, 10 worth checking, 0 nice ideas Review findings🛠️ Needs attention
🔎 Worth checking
🌱 Nice ideas
Consider writing more tests for
Since last review detailsCurrent findings:
This is an automated advisory review. A human maintainer must make the final merge decision. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e-scenario/framework/phases/lifecycle.ts`:
- Around line 123-124: The code deterministically picks a primary container with
const originalName = containerNames[0], which is unstable; instead select the
intended original container by filtering containerNames for the exact/non-backup
label (e.g., choose the name that does NOT include '-nemoclaw-gpu-backup-' or
that matches the original label value) using containerNames.find(...) and fall
back with a clear error if none found; apply the same deterministic selection to
the other spots referenced in the block covering the later usages (the code
around lines 169-196) so references to originalName always point to the
non-backup/original container.
- Around line 109-113: The lifecycle phase currently treats any non-zero result
from the gateway-stop step as acceptable, which can hide real errors; update the
handling around the gateway-stop step (the steps.push call that uses id:
"gateway-stop" and the gatewayStop result) so it only tolerates the specific "no
runtime" failure mode (e.g., the NoSuchProcess error indicator or the specific
exit code/message your CLI emits when no runtime exists). Concretely, replace
the broad acceptance with logic that inspects gatewayStop's result/error
(exitCode, error.name, or stderr text) and only marks the step as non-fatal when
that inspection matches the expected NoSuchProcess/no-runtime signature; for all
other non-zero results let the lifecycle phase fail as before. Ensure the
matching is robust (check both name and stderr substring or exact exit code) and
keep the step ID "gateway-stop" and the gatewayStop variable name so it’s easy
to locate.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: be3abf06-7822-4def-8b49-b16f2bc57a67
📒 Files selected for processing (12)
test/e2e-scenario/framework-tests/e2e-expected-state.test.tstest/e2e-scenario/framework-tests/e2e-live-registry-discovery.test.tstest/e2e-scenario/framework-tests/e2e-phase-lifecycle.test.tstest/e2e-scenario/framework-tests/e2e-phase-state-validation.test.tstest/e2e-scenario/framework/e2e-test.tstest/e2e-scenario/framework/phases/index.tstest/e2e-scenario/framework/phases/lifecycle.tstest/e2e-scenario/framework/phases/state-validation.tstest/e2e-scenario/live/registry-scenarios.test.tstest/e2e-scenario/scenarios/expected-states.tstest/e2e-scenario/scenarios/runtime-support.tstest/e2e-scenario/scenarios/types.ts
| // gateway stop is best-effort: a fresh-start/no-runtime gateway | ||
| // will exit non-zero with NoSuchProcess, which is exactly the | ||
| // post-reboot state we want to simulate. Don't fail the lifecycle | ||
| // phase on it. | ||
| steps.push({ id: "gateway-stop", results: [gatewayStop] }); |
There was a problem hiding this comment.
Narrow gateway-stop tolerance to the expected “no runtime” failure mode.
Lines 109-113 currently allow any non-zero gateway-stop result. That can mask real failures (CLI unavailable, permission issues, timeout) and continue lifecycle simulation in a bad state.
Proposed fix
steps.push({ id: "gateway-stop", results: [gatewayStop] });
+ if (gatewayStop.exitCode !== 0) {
+ const combinedOutput = `${gatewayStop.stdout}\n${gatewayStop.stderr}`;
+ const isExpectedNoRuntime = /no\s+such\s+process|no\s+gateway\s+runtime/i.test(
+ combinedOutput,
+ );
+ if (!isExpectedNoRuntime) {
+ throw new Error(
+ `lifecycle.post-reboot-recovery failed to stop gateway (exit ${gatewayStop.exitCode}).`,
+ );
+ }
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // gateway stop is best-effort: a fresh-start/no-runtime gateway | |
| // will exit non-zero with NoSuchProcess, which is exactly the | |
| // post-reboot state we want to simulate. Don't fail the lifecycle | |
| // phase on it. | |
| steps.push({ id: "gateway-stop", results: [gatewayStop] }); | |
| // gateway stop is best-effort: a fresh-start/no-runtime gateway | |
| // will exit non-zero with NoSuchProcess, which is exactly the | |
| // post-reboot state we want to simulate. Don't fail the lifecycle | |
| // phase on it. | |
| steps.push({ id: "gateway-stop", results: [gatewayStop] }); | |
| if (gatewayStop.exitCode !== 0) { | |
| const combinedOutput = `${gatewayStop.stdout}\n${gatewayStop.stderr}`; | |
| const isExpectedNoRuntime = /no\s+such\s+process|no\s+gateway\s+runtime/i.test( | |
| combinedOutput, | |
| ); | |
| if (!isExpectedNoRuntime) { | |
| throw new Error( | |
| `lifecycle.post-reboot-recovery failed to stop gateway (exit ${gatewayStop.exitCode}).`, | |
| ); | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/e2e-scenario/framework/phases/lifecycle.ts` around lines 109 - 113, The
lifecycle phase currently treats any non-zero result from the gateway-stop step
as acceptable, which can hide real errors; update the handling around the
gateway-stop step (the steps.push call that uses id: "gateway-stop" and the
gatewayStop result) so it only tolerates the specific "no runtime" failure mode
(e.g., the NoSuchProcess error indicator or the specific exit code/message your
CLI emits when no runtime exists). Concretely, replace the broad acceptance with
logic that inspects gatewayStop's result/error (exitCode, error.name, or stderr
text) and only marks the step as non-fatal when that inspection matches the
expected NoSuchProcess/no-runtime signature; for all other non-zero results let
the lifecycle phase fail as before. Ensure the matching is robust (check both
name and stderr substring or exact exit code) and keep the step ID
"gateway-stop" and the gatewayStop variable name so it’s easy to locate.
| const originalName = containerNames[0]; | ||
|
|
There was a problem hiding this comment.
Avoid nondeterministic primary-container selection when multiple labeled containers exist.
Line 123 picks containerNames[0] directly. docker ps -a ordering is not a stable contract; with both original and *-nemoclaw-gpu-backup-* siblings labeled, this can stop/rename the wrong container and register incorrect cleanup targets.
Proposed fix
- const originalName = containerNames[0];
+ const primaryCandidates = containerNames.filter(
+ (name) => !name.includes("-nemoclaw-gpu-backup-"),
+ );
+ if (primaryCandidates.length !== 1) {
+ throw new Error(
+ `lifecycle.post-reboot-recovery expected exactly one primary labeled container ` +
+ `(found ${primaryCandidates.length}, total labeled ${containerNames.length}).`,
+ );
+ }
+ const originalName = primaryCandidates[0];Also applies to: 169-196
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/e2e-scenario/framework/phases/lifecycle.ts` around lines 123 - 124, The
code deterministically picks a primary container with const originalName =
containerNames[0], which is unstable; instead select the intended original
container by filtering containerNames for the exact/non-backup label (e.g.,
choose the name that does NOT include '-nemoclaw-gpu-backup-' or that matches
the original label value) using containerNames.find(...) and fall back with a
clear error if none found; apply the same deterministic selection to the other
spots referenced in the block covering the later usages (the code around lines
169-196) so references to originalName always point to the non-backup/original
container.
|
Tracking link: this PR is part of the #4941 / #4990 Vitest E2E scenario migration path, specifically the lifecycle-fixture prerequisite after #5002-#5006. It also provides framework guardrails needed for #4423; the follow-up scenario/expected-state PR can carry the RED guard once these primitives land. Leaving the current static-check cleanup with @jyaunches. |
Registers the failing-test-first guard for #4423 in the typed scenario registry so the live Vitest matrix from #5006 fans it out as a dedicated CI job. Builds on the framework primitives added earlier in this PR (lifecycle phase fixture, host-side probes, lifecycle whitelist). Additions: * `post-reboot-recovery-ready` expected-state in `scenarios/expected-states.ts` declaring the user-visible invariants that must hold after a `nemoclaw <name> status` call on a freshly-rebooted DGX Spark / Linux Docker-driver host: - cli installed, - gateway healthy (the user-systemd unit from #4580 brings it back up before status runs), - sandbox running (recovery completed in time), - localRegistry entry preserved (the user-visible regression target — destroyed on unfixed `main`), - dockerSandboxContainer present (recovery didn't delete the labeled container or its `*-nemoclaw-gpu-backup-*` sibling). * `ubuntu-repo-docker-post-reboot-recovery` scenario in `scenarios/scenarios/baseline.ts` wiring `ubuntuRepoDockerLifecycle("cloud-openclaw", "post-reboot-recovery")` against the new expected-state and a smoke suite. Carries a description that explains the RED/GREEN contract and points to the PR-A fix landing in `src/lib/`. * `manifests/openclaw-nvidia-post-reboot-recovery.yaml` declares `lifecycle: post-reboot-recovery` and the same NVIDIA_API_KEY credential ref the cloud-openclaw scenarios use. * `.github/workflows/e2e-scenarios.yaml` ROUTES table gains the new scenario so the workflow-boundary test (`e2e-scenarios-workflow.test.ts`) routes every typed id. Test pinning: * `e2e-scenario-matrix.test.ts` updated from a 1-entry to a 2-entry live matrix expectation. The new entry asserts on `expectedStateId: "post-reboot-recovery-ready"` so a future accidental dropped-lifecycle change to the scenario regresses loudly. * `e2e-live-registry-discovery.test.ts` swaps the synthetic whitelist-coverage test for an assertion against the real `ubuntu-repo-docker-post-reboot-recovery` registry entry. Behavior: * On unfixed `main`, the live runner's lifecycle phase stops the OpenShell gateway runtime and `docker stop`s the labeled sandbox container. State-validation then runs `nemoclaw <name> status` (which restarts the gateway via systemd) and the destructive `missing` branch in `src/lib/actions/sandbox/status.ts` wipes the local registry entry. The `local-registry-entry-present` probe fails. Scenario goes RED. * On the PR-A fix branch, the new Docker-driver sandbox recovery helper restarts the labeled container before stale-removal can fire, registry survives, all five probes pass. Scenario flips GREEN. The bash-side legacy compiler emits a `lifecycle.profile.post-reboot-recovery` PhaseAction pointing at `nemoclaw_scenarios/lifecycle/dispatch.sh`, but the legacy bash worker is intentionally not provided: this scenario is Vitest-only. The typed runner's `LifecyclePhaseFixture` handles dispatch directly. If the legacy runner is invoked against this scenario it errors out at the dispatcher; that's the right failure mode while the bash side stays on its own retirement clock. Refs #4423
Prek hook auto-fixed formatting in 6 files added/touched by this PR. No behavior change.
The biome-format commit accidentally added a node_modules symlink alongside the formatting fixes. Remove it; the directory is already in .gitignore.
The first dispatch of the new post-reboot-recovery scenario surfaced a flow gap: state-validation asserted gateway-healthy immediately after lifecycle.simulate stopped the gateway, never running the user-visible action that actually triggers #4423's regression. Add `nemoclaw <name> status` as the final step of simulatePostReboot. This is faithful to the bug semantics — "after a reboot, the FIRST status invocation is what destroys state" — and lets state-validation observe the post-status host invariants: * On unfixed code: status restarts the gateway via the user-systemd unit from #4580, sees sandbox NotFound, runs the destructive `missing` branch in src/lib/actions/sandbox/status.ts, registry is wiped → `local-registry-entry-present` probe fails → scenario RED. * On the PR-A fix branch: status restarts the gateway, recovery helper restarts the labeled container before stale-removal can fire, registry survives → all probes pass → scenario GREEN. Status exit code is intentionally not asserted: the bug is precisely that status "succeeds" at destroying state. The post-action invariants are owned by state-validation, not lifecycle. Refs #4423
…-up to #5046) (#5087) ## Summary Follow-up to #5046. Tightens the `ubuntu-repo-docker-post-reboot-recovery` scenario into a stable RED-on-regression / GREEN-on-main guard while PR-A (the `#4423` source fix) is in flight. ## Related Issue Refs #4423 ## Why this is needed Post-merge dispatches of #5046's `ubuntu-repo-docker-post-reboot-recovery` job against `ubuntu-latest` revealed two issues: 1. **The `openshell gateway stop` step in `LifecyclePhaseFixture` was counterproductive.** `#4578`'s mitigation in `src/lib/actions/sandbox/status.ts:296` guards the destructive branch behind a `healthy_named` gateway. Stopping the gateway routed every dispatch through the safe path and masked the regression target. There is no `openshell gateway start` subcommand on `ubuntu-latest` (the user-systemd autostart from #4580 is the real mechanism), so the original lifecycle couldn't restart the gateway cleanly to keep it `healthy_named`. 2. **The `post-reboot-recovery-ready` expected-state asserted on runtime liveness probes** (`gateway-healthy`, `sandbox-running`) that are environmental on `ubuntu-latest` after `docker stop`. They varied independently of the `#4423` regression target and made every dispatch RED for the wrong reason — concretely, `gateway-healthy` failing before the host-side probes had a chance to run. ## Changes ### `LifecyclePhaseFixture.simulatePostReboot` - Drop the `openshell gateway stop` step. The lifecycle now stops only the labeled sandbox container and then runs `nemoclaw <name> status`. Gateway is left `healthy_named`, which is the precondition for the destructive branch we want PR-A to neutralize. - Drop the `GATEWAY_STOP_TIMEOUT_MS` constant (no longer referenced). - Updated docstring + 2 unit tests removed (gateway-stop-failure tolerance tests are no longer applicable). ### `post-reboot-recovery-ready` expected-state Lock down only the host-side preservation invariants this scenario can faithfully exercise: - `cli-installed` - `local-registry-entry-present` - `docker-sandbox-container-present` (running OR stopped OR `*-nemoclaw-gpu-backup-*` sibling) Drop `gateway: { expected: "present", health: "healthy" }` and `sandbox: { expected: "present", status: "running", agent: "openclaw" }`. A follow-up scenario on a more controlled runner (real DGX Spark / Brev) can extend the expected state with runtime liveness probes once PR-A has landed and a true post-reboot environment is available. ### `probesForState` ordering Reorder so host-side observables (`local-registry-entry-present`, `docker-sandbox-container-present`) emit BEFORE runtime-derived gateway/sandbox probes everywhere. The state-validation orchestrator short-circuits on the first probe failure, so host-side regression targets must be observed first. Pre-existing scenarios (`cloud-openclaw-ready`, `preflight-failure-*`, etc.) are unaffected — none declare host-side dimensions, so their probe lists are unchanged. ## RED / GREEN contract - **On `main`** (this PR's HEAD): scenario goes 🟢 GREEN. Locks in `#4578`'s mitigation as a passive guard — registry preserved through `docker stop` + `nemoclaw status` round-trip. - **On a regression PR** that re-introduces unconditional `registry.removeSandbox` in `status.ts:296` or `gateway-state.ts:412`: scenario goes 🔴 RED at `local-registry-entry-present`. - **On PR-A's fix branch** that adds Docker-driver sandbox recovery: scenario stays 🟢, demonstrating the new helper does not regress either host-side invariant. ## Type of Change - [x] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## Verification - [x] `npx prek run --all-files` passes - [x] `npm test` passes — 142 tests across the directly affected files all green; no regressions on the rest of the e2e-scenario framework suite. - [x] Live dispatch (`gh workflow run e2e-vitest-scenarios.yaml -f scenarios=ubuntu-repo-docker-post-reboot-recovery --ref e2e-post-reboot-scope-host-side`) goes GREEN: lifecycle (`docker stop` + `nemoclaw status`) → host-side probes pass → cleanup restores Docker. Will dispatch as a verification step once this PR is open. - [x] Tests added or updated for new or changed behavior. - [x] No secrets, API keys, or credentials committed. ## Follow-ups (post PR-A) - Once PR-A's Docker-driver recovery helper is in place, extend the post-reboot expected-state with gateway/sandbox runtime probes on a controlled runner. - Consider a `post-reboot-spark` lifecycle profile for hardware-runner dispatch that can do a real reboot and exercise the full bug class. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Tests** * Added a validation test for post-reboot-recovery expected-state focusing on host-side invariants. * Updated post-reboot-recovery phase tests to remove the gateway-stop step and assert container stop + host status checks. * **Documentation** * Clarified lifecycle and scenario comments/manifests to describe container-stop + host-status-driven recovery and probe ordering. * **Behavior** * Expected-state and probe emission now prioritize host-side checks and omit runtime gateway/sandbox expectations. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
…5052) ## Summary Extends the E2E migration inventory beyond direct legacy `test/e2e/test-*.sh` entrypoints so internal legacy runner surfaces are also guarded before deletion. The inventory now tracks coarse runner-internal groups for shell scenario workers, validation suites, onboarding assertion workers, TypeScript shell-runner orchestrators, and runtime helper libraries. This branch also merges the updated #5046 base to pick up the accidental `node_modules` symlink removal and carries one formatter-only wrap in `src/commands/sandbox/agents/list.ts` so the all-files static hook stays green on this stack branch. ## Related Issue Refs #4941 Refs #4990 Refs #4357 Depends on #5046 and the shared runtime-suite base stack. ## Changes - Added `internalSurfaces` records to `test/e2e-scenario/migration/legacy-inventory.json` for legacy runner internals. - Extended the migration inventory gate test to validate internal-surface IDs, path existence, owner issues, replacement surfaces, deletion readiness, and path coverage. - Documented that the inventory remains a deletion gate, not a progress dashboard, while now covering coarse internal runner surfaces as well as direct scripts. - Applied one formatter-only wrap required by the all-files static hook. ## Type of Change - [ ] Code change (feature, bug fix, or refactor) - [x] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## Verification Focused verification run: `npx vitest run --project e2e-scenario-framework test/e2e-scenario/framework-tests/e2e-migration-inventory.test.ts test/e2e-scenario/framework-tests/e2e-migration-inventory-lock.test.ts --silent=false --reporter=default` Static-check parity run: `npm run validate:configs && npx prek run --all-files --stage pre-push --skip tsc-plugin --skip tsc-js --skip tsc-cli --skip version-tag-sync --skip test-cli --skip test-plugin --skip source-shape-test-budget --skip test-file-size-budget --skip test-skills-yaml && npm run source-shape:check && npm run test-size:check && npx vitest run test/skills-frontmatter.test.ts && python3 scripts/generate-platform-docs.py --check` Additional local check: `npx vitest run --project cli test/docker-abstraction-guard.test.ts --silent=false --reporter=default` - [ ] `npx prek run --all-files` passes - [ ] `npm test` passes - [x] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [ ] Docs updated for user-facing behavior changes - [ ] `npm run docs` builds without warnings (doc changes only) - [ ] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) --- <!-- DCO sign-off required by CI. Run: git config user.name && git config user.email --> Signed-off-by: Carlos Villela <cvillela@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Documentation** * Clarified requirements for migration inventory and deletion-readiness rules * Improved formatting of migration tracking documentation * **Tests** * Extended migration inventory validation to include internal migration surfaces * Enhanced test logic for migration deletion-readiness verification * **Chores** * Updated legacy inventory registry to track internal migration surfaces <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Carlos Villela <cvillela@nvidia.com> Co-authored-by: Julie Yaunches <jyaunches@nvidia.com>
## Summary Adds the typed inference runtime helper surface for the Vitest E2E scenario runner. ## Related Issue Refs #4941 Refs #4990 Refs #4349 Depends on #5046, #5052, and the shared runtime-suite base stack. Stacked on branch `codex/e2e-fanout-01-inventory-internals`. ## Changes - Added `RuntimePhaseFixture` and the `runtime` Vitest fixture for inference runtime probes. - Added reusable helpers for sandbox-side `inference.local` models, chat completion, and HTTP status checks. - Added trusted-provider compatible endpoint helpers for models/chat probes while preserving shell-probe artifact capture and redaction. - Validate model-list responses for OpenAI-style `{ data: [...] }` and Ollama-style `{ models: [...] }` payloads so readiness helpers cannot pass on `{}` or error-only JSON. - Auto-redact sensitive custom header values and honor provider `curlMaxTimeSeconds` as `curl --max-time`. - Extended `ProviderClient` with a request-level JSON API that returns both parsed JSON and the captured `ShellProbeResult`. - Added framework tests for route normalization, argv construction, redaction values, provider-compatible requests, model-list validation, provider curl timeout propagation, and malformed response handling. ## Type of Change - [x] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## Verification - [ ] `npx prek run --all-files` passes - [ ] `npm test` passes - [x] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [ ] Docs updated for user-facing behavior changes - [ ] `npm run docs` builds without warnings (doc changes only) - [ ] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) Verified locally: - `npx vitest run --project e2e-scenario-framework test/e2e-scenario/framework-tests/e2e-phase-runtime.test.ts test/e2e-scenario/framework-tests/e2e-clients.test.ts --silent=false --reporter=default` - `npx vitest run --project e2e-scenario-framework --silent=false --reporter=default` - `npm run typecheck:cli` - `npx prek run --files test/e2e-scenario/framework/clients/provider.ts test/e2e-scenario/framework/clients/index.ts test/e2e-scenario/framework/e2e-test.ts test/e2e-scenario/framework/phases/index.ts test/e2e-scenario/framework/phases/runtime.ts test/e2e-scenario/framework-tests/e2e-phase-runtime.test.ts --skip test-cli` - `git diff --check` CI/advisor evidence: - Required PR checks are green on the PR head. - PR review advisor: 0 needs attention, 0 worth checking, 0 nice ideas. - E2E recommendation advisor: no product E2E required. - E2E scenario advisor requested `e2e-scenarios-all`; dispatched run https://github.com/NVIDIA/NemoClaw/actions/runs/27241683412. The relevant `ubuntu-repo-cloud-openclaw` scenario passed. The all-run is red due to pre-existing scenario-runner coverage gaps outside this PR's helper surface, including generated scenarios whose onboarding profile ids are not yet implemented by `test/e2e-scenario/nemoclaw_scenarios/onboard/dispatch.sh` (for example `openai-compatible-openclaw`, `cloud-nvidia-openclaw-resume-after-interrupt`) and a Hermes-specific `runtime.hermes.history-writable` assertion that fails after onboarding/inference pass because it cannot determine shield state. Note: the full pre-commit hook's `test-cli` step still fails locally in `test/release-latest-tag.test.ts` because this machine's global Git config enables SSH commit signing but the private signing key is unavailable. The focused E2E framework suite and CLI typecheck pass. --- <!-- DCO sign-off required by CI. Run: git config user.name && git config user.email --> Signed-off-by: Carlos Villela <cvillela@nvidia.com> --------- Signed-off-by: Carlos Villela <cvillela@nvidia.com> Co-authored-by: Julie Yaunches <jyaunches@nvidia.com>
Summary
Failing-test-first regression scaffolding for #4423, plus the framework primitives that make post-reboot-style lifecycle scenarios expressible in the live Vitest registry. After this PR lands, the live matrix from #5006 fans out a dedicated
ubuntu-repo-docker-post-reboot-recoveryjob that locks the host-side preservation invariants on every dispatch and is the running surface for PR-A's Docker-driver recovery helper to extend.Related Issue
Refs #4423
What this PR adds
Framework primitives (commits 1–3)
SUPPORTED_LIFECYCLESwhitelist inscenarios/runtime-support.ts. Replaces the unconditional reject forenvironment.lifecyclewith a lock-step whitelist, so lifecycle scenarios can graduate one profile at a time. Initial whitelist:post-reboot-recovery.Two host-side state-validation probes:
local-registry-entry-present— reads~/.nemoclaw/sandboxes.jsonand asserts the scenario's sandbox name is recorded. This is the user-visible regression target for [DGX Spark][Sandbox] post-reboot first 'nemoclaw <name> status' destroys sandbox via "Removed stale local registry entry" #4423.docker-sandbox-container-present— runsdocker ps -a --filter label=openshell.ai/sandbox-name=…and accepts running, stopped, and*-nemoclaw-gpu-backup-*siblings. MirrorsOPENSHELL_SANDBOX_NAME_LABEL/findOpenShellDockerSandboxContainerIdsinsrc/lib/onboard/docker-gpu-patch.ts.New
ExpectedState.localRegistryandExpectedState.dockerSandboxContainerdimensions;probesForStateemits the probes only forexpected: "present".StateValidationPhaseFixture.from()now accepts an inlineExpectedStateso unit tests don't need to register synthetic states. OptionalProbeIOinjection for the registry reader.LifecyclePhaseFixtureinframework/phases/lifecycle.ts. Dispatchessimulate(profile, instance, opts). Today the only profile ispost-reboot-recovery, with two modes:stop-original(default) —docker stopof the labeled sandbox container, then invokesnemoclaw <name> status(the user-visible action that exposed [DGX Spark][Sandbox] post-reboot first 'nemoclaw <name> status' destroys sandbox via "Removed stale local registry entry" #4423).rename-to-gpu-backup— additionally renames the container to a*-nemoclaw-gpu-backup-<ts>sibling, mirroringbuildBackupContainerName()insrc/lib/onboard/docker-gpu-patch.ts.Both modes register reverse-order cleanups so teardown leaves Docker usable. Wired into
E2EScenarioFixturesand intolive/registry-scenarios.test.tsbetweenonboard.from()andstateValidation.from()whenever the scenario declaresenvironment.lifecycle.The regression scaffold (commits 4–8)
ubuntu-repo-docker-post-reboot-recoveryscenario +post-reboot-recovery-readyexpected-state + manifest YAML. The expected-state is intentionally scoped to the host-side preservation invariants this scenario can faithfully exercise fromubuntu-latest:cliinstalled,localRegistryentry preserved (the user-visible regression target),dockerSandboxContainer(labeled, including*-nemoclaw-gpu-backup-*siblings) present..github/workflows/e2e-scenarios.yamlROUTES table gains the new id soe2e-scenarios-workflow.test.tskeeps every typed id routed.Honest scope
While iterating end-to-end against
ubuntu-latestI confirmed the full#4423bug class cannot be reproduced from CI without a real reboot:docker stopof the labeled sandbox container puts OpenShell intoPhase: Provisioning, not literalNotFound— so the destructivemissingbranch insrc/lib/actions/sandbox/status.ts:308does not fire on unfixed code.ubuntu-latestdoes not expose agateway startsubcommand (the user-systemd autostart from fix(onboard): use OpenShell gateway user service #4580 is the real autostart path), so we can't cleanly reset the gateway to a fresh-state-with-no-sandbox-memory configuration that yields trueNotFound.The expected-state therefore locks the host-side invariants (
localRegistry,dockerSandboxContainer) rather than asserting on gateway/sandbox runtime liveness, which is environmental onubuntu-latestafterdocker stopand would mask the real signal. This makes the scenario:#4578's mitigation (passivestatusdoes not destroy registry onNotFound).Type of Change
Verification
npx prek run --all-filespassesnpm testpasses — 142 tests across the directly affected files all green; full e2e-scenario framework suite passes other than the pre-existingshell probe reaps timed-out command process groupsload-flake ine2e-fixture-context.test.ts(unchanged onmain).--emit-live-matrixnow emits bothubuntu-repo-cloud-openclawandubuntu-repo-docker-post-reboot-recoveryas supported live scenarios; the workflow fan-out from ci(e2e): fan out Vitest scenarios from registry #5006 dispatches each as its own CI job.ubuntu-latestexercises the full pipeline: onboard → lifecycle (docker stop+nemoclaw status) → host-side probes → cleanup.npm run docsbuilds without warnings (doc changes only).Boundaries kept honest
src/lib/**intoframework/**. The two duplicated constants (OPENSHELL_SANDBOX_NAME_LABEL,~/.nemoclaw/sandboxes.jsonpath) carry comments pointing back to the CLI source-of-truth.LifecyclePhaseFixtureprofile dispatch is exhaustive (never-checked). New profiles must add the dispatcher branch and theSUPPORTED_LIFECYCLESwhitelist together.expected: "absent"forlocalRegistry/dockerSandboxContainer) deliberately emit nothing today, with aprobesForStatetest pinning the gap.nemoclaw_scenarios/lifecycle/dispatch.sh) is intentionally not updated forpost-reboot-recovery. This scenario is Vitest-only; the typed runner'sLifecyclePhaseFixturehandles dispatch directly.Follow-ups (post PR-A)
framework/phases/lifecycle-postreboot-spark.tsprofile for hardware-runner dispatch that can do a real reboot and exercise the full bug class.