test(e2e): extend migration inventory to scenario runner internals by cv · Pull Request #5052 · NVIDIA/NemoClaw

cv · 2026-06-09T18:06:22Z

Summary

Extends the E2E migration inventory beyond direct legacy test/e2e/test-*.sh entrypoints so internal legacy runner surfaces are also guarded before deletion. The inventory now tracks coarse runner-internal groups for shell scenario workers, validation suites, onboarding assertion workers, TypeScript shell-runner orchestrators, and runtime helper libraries.

This branch also merges the updated #5046 base to pick up the accidental node_modules symlink removal and carries one formatter-only wrap in src/commands/sandbox/agents/list.ts so the all-files static hook stays green on this stack branch.

Related Issue

Refs #4941
Refs #4990
Refs #4357
Depends on #5046 and the shared runtime-suite base stack.

Changes

Added internalSurfaces records to test/e2e-scenario/migration/legacy-inventory.json for legacy runner internals.
Extended the migration inventory gate test to validate internal-surface IDs, path existence, owner issues, replacement surfaces, deletion readiness, and path coverage.
Documented that the inventory remains a deletion gate, not a progress dashboard, while now covering coarse internal runner surfaces as well as direct scripts.
Applied one formatter-only wrap required by the all-files static hook.
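Below is a minimal sketch of the checks the extended gate test applies to `internalSurfaces` records. Several details are assumptions for illustration:

- the record shape (`id`, `paths`, `ownerIssue`, `replacement`, `deletionReady`);
- the kebab-case ID rule;
- the injected `pathExists` callback.

The real schema is defined by `legacy-inventory.json` and the inventory test, and may differ.

```typescript
// Hypothetical shape of one internalSurfaces record; field names are assumptions.
interface InternalSurface {
  id: string;
  paths: string[];
  ownerIssue: string; // e.g. "#4357"
  replacement: string | null; // replacement Vitest surface, if any
  deletionReady: boolean;
}

// Returns human-readable problems; an empty array means the gate passes.
function validateInternalSurfaces(
  surfaces: InternalSurface[],
  pathExists: (path: string) => boolean,
  legacyFiles: string[],
): string[] {
  const problems: string[] = [];
  const seenIds = new Set<string>();
  const pathOwner = new Map<string, string>();

  for (const surface of surfaces) {
    if (!/^[a-z0-9]+(-[a-z0-9]+)*$/.test(surface.id)) {
      problems.push(`${surface.id}: id must be kebab-case`);
    }
    if (seenIds.has(surface.id)) {
      problems.push(`${surface.id}: duplicate id`);
    }
    seenIds.add(surface.id);

    if (!/^#\d+$/.test(surface.ownerIssue)) {
      problems.push(`${surface.id}: ownerIssue must look like #1234`);
    }
    if (surface.paths.length === 0) {
      problems.push(`${surface.id}: at least one path is required`);
    }
    for (const path of surface.paths) {
      if (!pathExists(path)) {
        problems.push(`${surface.id}: missing path ${path}`);
      }
      const owner = pathOwner.get(path);
      if (owner !== undefined && owner !== surface.id) {
        problems.push(`${surface.id}: ${path} already owned by ${owner}`);
      }
      pathOwner.set(path, surface.id);
    }
    // Deletion gate: nothing is deletion-ready without a replacement surface.
    if (surface.deletionReady && surface.replacement === null) {
      problems.push(`${surface.id}: deletionReady requires a replacement`);
    }
  }

  // Coverage: every known legacy runner file must belong to some surface.
  for (const file of legacyFiles) {
    if (!pathOwner.has(file)) {
      problems.push(`uncovered legacy path: ${file}`);
    }
  }
  return problems;
}
```

Injecting `pathExists` keeps the check pure, so it can be unit-tested without touching the filesystem. A real gate would pass something like `fs.existsSync`.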

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

Focused verification run:
npx vitest run --project e2e-scenario-framework test/e2e-scenario/framework-tests/e2e-migration-inventory.test.ts test/e2e-scenario/framework-tests/e2e-migration-inventory-lock.test.ts --silent=false --reporter=default

Static-check parity run:
npm run validate:configs && npx prek run --all-files --stage pre-push --skip tsc-plugin --skip tsc-js --skip tsc-cli --skip version-tag-sync --skip test-cli --skip test-plugin --skip source-shape-test-budget --skip test-file-size-budget --skip test-skills-yaml && npm run source-shape:check && npm run test-size:check && npx vitest run test/skills-frontmatter.test.ts && python3 scripts/generate-platform-docs.py --check

Additional local check:
npx vitest run --project cli test/docker-abstraction-guard.test.ts --silent=false --reporter=default

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
npm run docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Carlos Villela cvillela@nvidia.com

Summary by CodeRabbit

Documentation
- Clarified requirements for migration inventory and deletion-readiness rules
- Improved formatting of migration tracking documentation
Tests
- Extended migration inventory validation to include internal migration surfaces
- Enhanced test logic for migration deletion-readiness verification
Chores
- Updated legacy inventory registry to track internal migration surfaces

`liveScenarioSupport` previously rejected any scenario that declared an `environment.lifecycle`, so post-onboard host mutations (reboot, rebuild, upgrade, drift) could not surface in the live Vitest matrix at all. Replace the unconditional reject with a `SUPPORTED_LIFECYCLES` whitelist that starts with the single profile the upcoming post-reboot-recovery fixture dispatches: `post-reboot-recovery`. Future profiles must land the dispatcher branch and an expected-state in the same change set, so the whitelist stays in lockstep with what the runner can actually execute. Prepares the runner for #4423's failing-test-first guard, which needs a post-reboot lifecycle scenario to demonstrate registry preservation + Docker-backed sandbox recovery on Linux/Spark Docker-driver hosts. Refs #4423
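Below is a sketch of the whitelist check, assuming a simplified scenario shape. `liveScenarioSupport` and `SUPPORTED_LIFECYCLES` are the names used in this change. The signature and the discriminated return type here are hypothetical.

```typescript
// Whitelist of lifecycle profiles the live runner can actually dispatch.
const SUPPORTED_LIFECYCLES: ReadonlySet<string> = new Set(["post-reboot-recovery"]);

// Hypothetical, simplified scenario shape.
interface ScenarioLike {
  id: string;
  environment: { lifecycle?: string };
}

type Support = { supported: true } | { supported: false; reason: string };

function liveScenarioSupport(scenario: ScenarioLike): Support {
  const lifecycle = scenario.environment.lifecycle;
  // Scenarios without a lifecycle keep their previous behavior.
  if (lifecycle === undefined) return { supported: true };
  if (SUPPORTED_LIFECYCLES.has(lifecycle)) return { supported: true };
  return {
    supported: false,
    reason: `${scenario.id}: lifecycle "${lifecycle}" is not in SUPPORTED_LIFECYCLES`,
  };
}
```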

Adds two host-side state-validation probes the live runner needs to express the regression target tracked by #4423:

* `local-registry-entry-present` reads `~/.nemoclaw/sandboxes.json` and asserts the scenario's sandbox name is still recorded. This is deliberately orthogonal to `sandbox.expected`: post-reboot bugs can wipe the local registry while the live OpenShell gateway is healthy, and only a host-side probe catches the data-loss regression.
* `docker-sandbox-container-present` runs `docker ps -a --filter label=openshell.ai/sandbox-name=<name>` and accepts running, stopped, or `*-nemoclaw-gpu-backup-*` sibling containers. The label filter mirrors `OPENSHELL_SANDBOX_NAME_LABEL` used by `findOpenShellDockerSandboxContainerIds` in `src/lib/onboard/docker-gpu-patch.ts`, so the probe stays in lockstep with how OpenShell labels containers today.

Probe wiring:

* `StateProbeId` extended with the two new probe ids.
* `ExpectedState` gains `localRegistry` and `dockerSandboxContainer` optional dimensions; `probesForState` emits the new probes only for `expected: "present"`. Negative-direction probes are intentionally omitted today and pinned by a `probesForState` test.
* `StateValidationPhaseFixture.from()` now accepts either an expected-state ID or an inline `ExpectedState`, so unit tests can drive new probes without registering synthetic states in the typed registry. The live runner still calls `from(id, instance)`.
* Fixture takes an optional `ProbeIO` injection so tests can stub the registry reader without touching `~/.nemoclaw`.

No callers of the existing typed registry are affected: every shipped expected-state leaves `localRegistry` and `dockerSandboxContainer` unset, so `probesForState` returns the same probe lists as before. Refs #4423
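Below is a sketch of how `probesForState` can emit the new probes only for the `present` direction.

- The two new probe ids come from this change.
- `gateway-healthy` and the simplified `ExpectedState` fields are hypothetical stand-ins for the existing typed registry.

```typescript
// The last two ids are from this change; "gateway-healthy" is a hypothetical stand-in.
type StateProbeId =
  | "gateway-healthy"
  | "local-registry-entry-present"
  | "docker-sandbox-container-present";

// Simplified, hypothetical ExpectedState with the two new optional dimensions.
interface ExpectedState {
  gateway?: { expected: "healthy" };
  localRegistry?: { expected: "present" | "absent" };
  dockerSandboxContainer?: { expected: "present" | "absent" };
}

function probesForState(state: ExpectedState): StateProbeId[] {
  const probes: StateProbeId[] = [];
  if (state.gateway) probes.push("gateway-healthy");
  // Only the "present" direction emits a probe; "absent" is intentionally unprobed for now.
  if (state.localRegistry?.expected === "present") {
    probes.push("local-registry-entry-present");
  }
  if (state.dockerSandboxContainer?.expected === "present") {
    probes.push("docker-sandbox-container-present");
  }
  return probes;
}
```

Because the new dimensions are optional, a state that leaves them unset produces exactly the probe list it produced before.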

Adds a Vitest phase fixture that mutates host state between onboarding and state-validation, so live scenarios can express post-onboard invariants the legacy bash runner has no equivalent for.

`LifecyclePhaseFixture.simulate("post-reboot-recovery", instance, opts)` reproduces the host-side conditions of a DGX Spark / Linux Docker-driver reboot in two modes:

* `stop-original` (default): `openshell gateway stop` + `docker stop` of the labeled sandbox container. Models the common reboot outcome where OpenShell forgets the sandbox while Docker keeps the container exited but labeled.
* `rename-to-gpu-backup`: additionally `docker rename`s the container to a `*-nemoclaw-gpu-backup-<ts>` sibling, mirroring the GPU-patch reboot path in `src/lib/onboard/docker-gpu-patch.ts`.

Both modes register cleanups (in reverse order) to restore the container so test teardown leaves Docker in a usable state.

Wiring:

* `framework/phases/index.ts` re-exports the fixture and types.
* `framework/e2e-test.ts` registers a `lifecycle` Vitest fixture on `E2EScenarioFixtures`, wired with the shared `host`, `sandbox`, and `cleanup` registries.
* `live/registry-scenarios.test.ts` invokes `lifecycle.simulate(profile, instance)` between `onboard.from(...)` and `stateValidation.from(...)` whenever the scenario declares a whitelisted `environment.lifecycle`. Scenarios that omit lifecycle are unaffected. A scenario whose lifecycle is whitelisted by `runtime-support.ts` but NOT dispatched by the fixture fails fast with a clear error so the whitelist and dispatcher stay in lockstep.

Coverage in `e2e-phase-lifecycle.test.ts` exercises both modes, gateway-stop tolerance, the no-labeled-container failure case, the docker-discover failure case, the unsupported-profile rejection, the cleanup queue order, and `buildBackupContainerName` truncation.

The fixture is intentionally narrow on profiles: only `post-reboot-recovery` is dispatched today. Adding rebuild, upgrade, or drift profiles is a separate, equally narrow change set that must land the dispatcher branch and `SUPPORTED_LIFECYCLES` whitelist together. Refs #4423
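Below is a sketch of the fail-fast dispatch and reverse-order cleanup behavior. It is a synchronous, hypothetical model: it records host commands as strings instead of running `openshell` or `docker`. `CleanupRegistry` and `simulate` here are illustrative names, not the fixture's API.

```typescript
type Cleanup = () => void;

// Hypothetical cleanup registry: runs registered cleanups last-in, first-out,
// and keeps going if one fails so teardown restores as much as possible.
class CleanupRegistry {
  private readonly stack: { label: string; fn: Cleanup }[] = [];

  add(label: string, fn: Cleanup): void {
    this.stack.push({ label, fn });
  }

  runAll(): string[] {
    const ran: string[] = [];
    const failures: string[] = [];
    while (this.stack.length > 0) {
      const entry = this.stack.pop()!;
      try {
        entry.fn();
        ran.push(entry.label);
      } catch (err) {
        failures.push(`${entry.label}: ${String(err)}`);
      }
    }
    if (failures.length > 0) throw new Error(`cleanup failed: ${failures.join("; ")}`);
    return ran;
  }
}

type Mode = "stop-original" | "rename-to-gpu-backup";

// Records the commands a post-reboot simulation would run (hypothetical model).
function simulate(profile: string, mode: Mode, cleanup: CleanupRegistry, log: string[]): void {
  // Fail fast when a profile is whitelisted but has no dispatcher branch.
  if (profile !== "post-reboot-recovery") {
    throw new Error(`lifecycle profile "${profile}" is not dispatched by this fixture`);
  }
  log.push("openshell gateway stop");
  log.push("docker stop <container>");
  cleanup.add("docker start", () => {
    log.push("docker start <container>");
  });
  if (mode === "rename-to-gpu-backup") {
    log.push("docker rename <container> <backup>");
    // Registered last, so it runs first: restore the name before restarting.
    cleanup.add("docker rename back", () => {
      log.push("docker rename <backup> <container>");
    });
  }
}
```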

Registers the failing-test-first guard for #4423 in the typed scenario registry so the live Vitest matrix from #5006 fans it out as a dedicated CI job. Builds on the framework primitives added earlier in this PR (lifecycle phase fixture, host-side probes, lifecycle whitelist).

Additions:

* `post-reboot-recovery-ready` expected-state in `scenarios/expected-states.ts` declaring the user-visible invariants that must hold after a `nemoclaw <name> status` call on a freshly rebooted DGX Spark / Linux Docker-driver host:
  - cli installed,
  - gateway healthy (the user-systemd unit from #4580 brings it back up before status runs),
  - sandbox running (recovery completed in time),
  - localRegistry entry preserved (the user-visible regression target, destroyed on unfixed `main`),
  - dockerSandboxContainer present (recovery didn't delete the labeled container or its `*-nemoclaw-gpu-backup-*` sibling).
* `ubuntu-repo-docker-post-reboot-recovery` scenario in `scenarios/scenarios/baseline.ts` wiring `ubuntuRepoDockerLifecycle("cloud-openclaw", "post-reboot-recovery")` against the new expected-state and a smoke suite. Carries a description that explains the RED/GREEN contract and points to the PR-A fix landing in `src/lib/`.
* `manifests/openclaw-nvidia-post-reboot-recovery.yaml` declares `lifecycle: post-reboot-recovery` and the same NVIDIA_API_KEY credential ref the cloud-openclaw scenarios use.
* `.github/workflows/e2e-scenarios.yaml` ROUTES table gains the new scenario so the workflow-boundary test (`e2e-scenarios-workflow.test.ts`) routes every typed id.

Test pinning:

* `e2e-scenario-matrix.test.ts` updated from a 1-entry to a 2-entry live matrix expectation. The new entry asserts on `expectedStateId: "post-reboot-recovery-ready"` so a future change that accidentally drops the lifecycle from the scenario regresses loudly.
* `e2e-live-registry-discovery.test.ts` swaps the synthetic whitelist-coverage test for an assertion against the real `ubuntu-repo-docker-post-reboot-recovery` registry entry.

Behavior:

* On unfixed `main`, the live runner's lifecycle phase stops the OpenShell gateway runtime and `docker stop`s the labeled sandbox container. State-validation then runs `nemoclaw <name> status` (which restarts the gateway via systemd), and the destructive `missing` branch in `src/lib/actions/sandbox/status.ts` wipes the local registry entry. The `local-registry-entry-present` probe fails. Scenario goes RED.
* On the PR-A fix branch, the new Docker-driver sandbox recovery helper restarts the labeled container before stale-removal can fire, the registry survives, and all five probes pass. Scenario flips GREEN.

The bash-side legacy compiler emits a `lifecycle.profile.post-reboot-recovery` PhaseAction pointing at `nemoclaw_scenarios/lifecycle/dispatch.sh`, but the legacy bash worker is intentionally not provided: this scenario is Vitest-only. The typed runner's `LifecyclePhaseFixture` handles dispatch directly. If the legacy runner is invoked against this scenario it errors out at the dispatcher; that's the right failure mode while the bash side stays on its own retirement clock. Refs #4423
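The RED/GREEN contract above reduces to checking all five invariants of `post-reboot-recovery-ready`. Below is a minimal sketch, assuming a flat map of probe results. Only `local-registry-entry-present` and `docker-sandbox-container-present` are ids from this change. The other three invariant ids are hypothetical labels for the remaining bullets, not the registry's real probe ids.

```typescript
// Invariants of post-reboot-recovery-ready; the first three ids are hypothetical labels.
type Invariant =
  | "cli-installed"
  | "gateway-healthy"
  | "sandbox-running"
  | "local-registry-entry-present"
  | "docker-sandbox-container-present";

const POST_REBOOT_RECOVERY_READY: readonly Invariant[] = [
  "cli-installed",
  "gateway-healthy",
  "sandbox-running",
  "local-registry-entry-present",
  "docker-sandbox-container-present",
];

type Verdict = { status: "GREEN" } | { status: "RED"; failed: Invariant[] };

// GREEN only when every invariant holds; otherwise report which ones failed.
function evaluate(results: Record<Invariant, boolean>): Verdict {
  const failed = POST_REBOOT_RECOVERY_READY.filter((inv) => !results[inv]);
  return failed.length === 0 ? { status: "GREEN" } : { status: "RED", failed };
}
```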

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

copy-pr-bot · 2026-06-09T18:06:26Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-06-09T18:06:30Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9a464025-a2c9-4237-a8fb-6c57a53b4779

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

github-actions · 2026-06-09T18:07:14Z

E2E Advisor Recommendation

Required E2E: None
Optional E2E: None


E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

None. No live E2E is needed. The changes are limited to E2E migration documentation, a machine-readable deletion-readiness inventory, and a Vitest framework test for that inventory. They do not alter runtime/user flows or any code path that can affect install, onboarding, credentials, sandbox lifecycle, network policy, inference routing, deployment, or assistant behavior. The relevant validation is the normal e2e-scenario-framework test project, not a live E2E job.

Optional E2E

None.

New E2E recommendations

None.

github-actions · 2026-06-09T18:07:15Z

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None


E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

None. Changes are limited to scenario documentation, migration deletion-gate inventory data, and framework tests for that inventory. They do not alter dispatchable scenario routes, scenario catalog metadata, expected-state contracts, suite metadata/scripts, onboarding/install helpers, or shared scenario runtime behavior, so no scenario E2E job is required.

Optional scenario E2E

None.

Relevant changed files

test/e2e-scenario/docs/MIGRATION.md
test/e2e-scenario/docs/README.md
test/e2e-scenario/framework-tests/e2e-migration-inventory.test.ts
test/e2e-scenario/migration/legacy-inventory.json

github-actions · 2026-06-09T18:10:35Z

PR Review Advisor

Findings: 0 needs attention, 0 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 0 still apply, 0 new items found

Consider writing more tests for

**Acceptance clause:** Refs #4941 (Adopt Vitest fixtures as the E2E scenario execution model): add test evidence or identify existing coverage. The deterministic context did not include the body or comments of issue #4941, so literal issue-clause mapping could not be completed. The diff is consistent with the documented Vitest/single-runner migration direction.
**Acceptance clause:** Refs #4990 (Adopt phase fixtures + registry-driven test discovery for Vitest E2E scenarios): add test evidence or identify existing coverage. The deterministic context did not include the body or comments of issue #4990, so literal issue-clause mapping could not be completed.
**Acceptance clause:** Refs #4357 (Phase 11: Final Audit Reconciliation (E2E audit-coverage)): add test evidence or identify existing coverage. The deterministic context did not include the body or comments of issue #4357, so literal issue-clause mapping could not be completed. The diff repeatedly gates deletion readiness on #4357 and adds internal-surface inventory coverage.


This is an automated advisory review. A human maintainer must make the final merge decision.


copy-pr-bot · 2026-06-09T19:44:32Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

## Summary

Adds the typed inference runtime helper surface for the Vitest E2E scenario runner.

## Related Issue

Refs #4941
Refs #4990
Refs #4349

Depends on #5046, #5052, and the shared runtime-suite base stack. Stacked on branch `codex/e2e-fanout-01-inventory-internals`.

## Changes

- Added `RuntimePhaseFixture` and the `runtime` Vitest fixture for inference runtime probes.
- Added reusable helpers for sandbox-side `inference.local` models, chat completion, and HTTP status checks.
- Added trusted-provider compatible endpoint helpers for models/chat probes while preserving shell-probe artifact capture and redaction.
- Validate model-list responses for OpenAI-style `{ data: [...] }` and Ollama-style `{ models: [...] }` payloads so readiness helpers cannot pass on `{}` or error-only JSON.
- Auto-redact sensitive custom header values and honor provider `curlMaxTimeSeconds` as `curl --max-time`.
- Extended `ProviderClient` with a request-level JSON API that returns both parsed JSON and the captured `ShellProbeResult`.
- Added framework tests for route normalization, argv construction, redaction values, provider-compatible requests, model-list validation, provider curl timeout propagation, and malformed response handling.

## Type of Change

- [x] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification

- [ ] `npx prek run --all-files` passes
- [ ] `npm test` passes
- [x] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [ ] Docs updated for user-facing behavior changes
- [ ] `npm run docs` builds without warnings (doc changes only)
- [ ] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

Verified locally:

- `npx vitest run --project e2e-scenario-framework test/e2e-scenario/framework-tests/e2e-phase-runtime.test.ts test/e2e-scenario/framework-tests/e2e-clients.test.ts --silent=false --reporter=default`
- `npx vitest run --project e2e-scenario-framework --silent=false --reporter=default`
- `npm run typecheck:cli`
- `npx prek run --files test/e2e-scenario/framework/clients/provider.ts test/e2e-scenario/framework/clients/index.ts test/e2e-scenario/framework/e2e-test.ts test/e2e-scenario/framework/phases/index.ts test/e2e-scenario/framework/phases/runtime.ts test/e2e-scenario/framework-tests/e2e-phase-runtime.test.ts --skip test-cli`
- `git diff --check`

CI/advisor evidence:

- Required PR checks are green on the PR head.
- PR review advisor: 0 needs attention, 0 worth checking, 0 nice ideas.
- E2E recommendation advisor: no product E2E required.
- E2E scenario advisor requested `e2e-scenarios-all`; dispatched run https://github.com/NVIDIA/NemoClaw/actions/runs/27241683412. The relevant `ubuntu-repo-cloud-openclaw` scenario passed. The all-run is red due to pre-existing scenario-runner coverage gaps outside this PR's helper surface, including generated scenarios whose onboarding profile ids are not yet implemented by `test/e2e-scenario/nemoclaw_scenarios/onboard/dispatch.sh` (for example `openai-compatible-openclaw`, `cloud-nvidia-openclaw-resume-after-interrupt`) and a Hermes-specific `runtime.hermes.history-writable` assertion that fails after onboarding/inference pass because it cannot determine shield state.

Note: the full pre-commit hook's `test-cli` step still fails locally in `test/release-latest-tag.test.ts` because this machine's global Git config enables SSH commit signing but the private signing key is unavailable. The focused E2E framework suite and CLI typecheck pass.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Co-authored-by: Julie Yaunches <jyaunches@nvidia.com>
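Below is a sketch of the model-list validation and `curl --max-time` handling described above. This is a minimal sketch, not the `RuntimePhaseFixture` or `ProviderClient` code. The following are all assumptions:

- the helper names (`isValidModelList`, `buildCurlArgv`, `redact`);
- the rule that an empty model list does not count as ready;
- the set of header names treated as sensitive.

```typescript
// Accepts OpenAI-style { data: [...] } or Ollama-style { models: [...] } payloads.
// Assumption: an empty list is treated as "not ready".
function isValidModelList(payload: unknown): boolean {
  if (typeof payload !== "object" || payload === null) return false;
  const record = payload as Record<string, unknown>;
  const list = Array.isArray(record.data)
    ? record.data
    : Array.isArray(record.models)
      ? record.models
      : undefined;
  return list !== undefined && list.length > 0;
}

interface CurlRequest {
  url: string;
  headers: Record<string, string>;
  maxTimeSeconds?: number; // e.g. a provider's curlMaxTimeSeconds
}

// Assumption: which header names count as sensitive.
const SENSITIVE_HEADER = /^(authorization|x-api-key|api-key|cookie)$/i;

// Builds curl argv and collects header values that must be redacted from artifacts.
function buildCurlArgv(req: CurlRequest): { argv: string[]; redactions: string[] } {
  const argv = ["curl", "-sS"];
  if (req.maxTimeSeconds !== undefined) {
    argv.push("--max-time", String(req.maxTimeSeconds));
  }
  const redactions: string[] = [];
  for (const [name, value] of Object.entries(req.headers)) {
    argv.push("-H", `${name}: ${value}`);
    if (SENSITIVE_HEADER.test(name)) redactions.push(value);
  }
  argv.push(req.url);
  return { argv, redactions };
}

// Replaces every occurrence of each secret before text is written to logs.
function redact(text: string, secrets: string[]): string {
  let out = text;
  for (const secret of secrets) {
    if (secret.length > 0) out = out.split(secret).join("[REDACTED]");
  }
  return out;
}
```

Returning the redaction values alongside the argv lets the caller scrub both the logged command line and any captured response output with the same list.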

jyaunches and others added 6 commits June 9, 2026 12:24

Merge branch 'main' into e2e-scenario-lifecycle-fixture-prereq

e276ef3

chore(e2e): scaffold inventory internals migration draft

34466dd

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv self-assigned this Jun 9, 2026

cv added the area: e2e (End-to-end tests, nightly failures, or validation infrastructure), area: architecture (Architecture, design debt, major refactors, or maintainability), and chore (Build, CI, dependency, or tooling maintenance) labels Jun 9, 2026

This was referenced Jun 9, 2026

test(e2e): migrate Hermes messaging scenarios #5065

Closed

test(e2e): migrate WhatsApp QR compact guard #5066

Closed

test(e2e): add security and policy runtime helpers #5067

Closed

jyaunches and others added 4 commits June 9, 2026 14:22

chore(e2e): apply biome formatting

06b84f3

Prek hook auto-fixed formatting in 6 files added/touched by this PR. No behavior change.

chore(e2e): drop accidental node_modules symlink

839b7bf

The biome-format commit accidentally added a node_modules symlink alongside the formatting fixes. Remove it; the directory is already in .gitignore.

test(e2e): extend migration inventory to runner internals

5ed1735

merge existing draft PR history

f3e6f16

cv mentioned this pull request Jun 9, 2026

Phase 11: Final Audit Reconciliation (E2E audit-coverage) #4357

Closed

15 tasks

cv added 2 commits June 9, 2026 11:34

merge updated lifecycle fixture prerequisite

a08d680

chore(e2e): apply static formatting

24b7e89

Base automatically changed from e2e-scenario-lifecycle-fixture-prereq to main June 9, 2026 18:42

Merge remote-tracking branch 'origin/main' into codex/e2e-fanout-01-inventory-internals

e268958

# Conflicts:
# test/e2e-scenario/framework-tests/e2e-phase-lifecycle.test.ts
# test/e2e-scenario/framework/phases/lifecycle.ts

Merge branch 'main' into codex/e2e-fanout-01-inventory-internals

3c04849

cv marked this pull request as ready for review June 10, 2026 00:31

Merge branch 'main' into codex/e2e-fanout-01-inventory-internals

c722a7f

cv merged commit bf817c4 into main Jun 10, 2026
31 checks passed

cv deleted the codex/e2e-fanout-01-inventory-internals branch June 10, 2026 02:41

This was referenced Jun 10, 2026

Epic: Migrate legacy bash E2E into the Vitest E2E system #5098

Open

test(e2e): typed-shell-runner cutover (parity → retirement) #5106

Merged

This was referenced Jun 10, 2026

test(e2e): migrate test-openshell-version-pin.sh to free-standing Vitest live test #5107

Merged

test(onboard): add helper-level Vitest coverage for docker-unreachable gateway-start abort (#4355) #5109

Merged

coderabbitai Bot mentioned this pull request Jun 10, 2026

test(e2e): retire docker-unreachable gateway script #5119

Merged

12 tasks

cv added the v0.0.63 (Release target) label Jun 10, 2026


cv merged 15 commits into main from codex/e2e-fanout-01-inventory-internals

3 participants
