Skip to content

test(e2e): add post-reboot-recovery regression guard for #4423#5046

Merged
cv merged 9 commits into
mainfrom
e2e-scenario-lifecycle-fixture-prereq
Jun 9, 2026
Merged

test(e2e): add post-reboot-recovery regression guard for #4423#5046
cv merged 9 commits into
mainfrom
e2e-scenario-lifecycle-fixture-prereq

Conversation

@jyaunches

@jyaunches jyaunches commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Summary

Failing-test-first regression scaffolding for #4423, plus the framework primitives that make post-reboot-style lifecycle scenarios expressible in the live Vitest registry. After this PR lands, the live matrix from #5006 fans out a dedicated ubuntu-repo-docker-post-reboot-recovery job that locks the host-side preservation invariants on every dispatch and is the running surface for PR-A's Docker-driver recovery helper to extend.

Related Issue

Refs #4423

What this PR adds

Framework primitives (commits 1–3)

  1. SUPPORTED_LIFECYCLES whitelist in scenarios/runtime-support.ts. Replaces the unconditional reject for environment.lifecycle with a lock-step whitelist, so lifecycle scenarios can graduate one profile at a time. Initial whitelist: post-reboot-recovery.

  2. Two host-side state-validation probes:

    New ExpectedState.localRegistry and ExpectedState.dockerSandboxContainer dimensions; probesForState emits the probes only for expected: "present". StateValidationPhaseFixture.from() now accepts an inline ExpectedState so unit tests don't need to register synthetic states. Optional ProbeIO injection for the registry reader.

  3. LifecyclePhaseFixture in framework/phases/lifecycle.ts. Dispatches simulate(profile, instance, opts). Today the only profile is post-reboot-recovery, with two modes:

    Both modes register reverse-order cleanups so teardown leaves Docker usable. Wired into E2EScenarioFixtures and into live/registry-scenarios.test.ts between onboard.from() and stateValidation.from() whenever the scenario declares environment.lifecycle.

The regression scaffold (commits 4–8)

  1. ubuntu-repo-docker-post-reboot-recovery scenario + post-reboot-recovery-ready expected-state + manifest YAML. The expected-state is intentionally scoped to the host-side preservation invariants this scenario can faithfully exercise from ubuntu-latest:

    • cli installed,
    • localRegistry entry preserved (the user-visible regression target),
    • dockerSandboxContainer (labeled, including *-nemoclaw-gpu-backup-* siblings) present.

    .github/workflows/e2e-scenarios.yaml ROUTES table gains the new id so e2e-scenarios-workflow.test.ts keeps every typed id routed.

Honest scope

While iterating end-to-end against ubuntu-latest I confirmed the full #4423 bug class cannot be reproduced from CI without a real reboot:

  • docker stop of the labeled sandbox container puts OpenShell into Phase: Provisioning, not literal NotFound — so the destructive missing branch in src/lib/actions/sandbox/status.ts:308 does not fire on unfixed code.
  • The OpenShell CLI on ubuntu-latest does not expose a gateway start subcommand (the user-systemd autostart from fix(onboard): use OpenShell gateway user service #4580 is the real autostart path), so we can't cleanly reset the gateway to a fresh-state-with-no-sandbox-memory configuration that yields true NotFound.

The expected-state therefore locks the host-side invariants (localRegistry, dockerSandboxContainer) rather than asserting on gateway/sandbox runtime liveness, which is environmental on ubuntu-latest after docker stop and would mask the real signal. This makes the scenario:

  • A lock-in guard for #4578's mitigation (passive status does not destroy registry on NotFound).
  • The running surface PR-A's Docker-driver recovery helper extends. PR-A's logic changes will exercise this scenario; if PR-A regresses either host-side invariant, this scenario goes RED.
  • A base for follow-up scenarios on more controlled runners (real DGX Spark / Brev) that can add gateway/sandbox runtime probes against a true post-reboot environment.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes — 142 tests across the directly affected files all green; full e2e-scenario framework suite passes other than the pre-existing shell probe reaps timed-out command process groups load-flake in e2e-fixture-context.test.ts (unchanged on main).
  • --emit-live-matrix now emits both ubuntu-repo-cloud-openclaw and ubuntu-repo-docker-post-reboot-recovery as supported live scenarios; the workflow fan-out from ci(e2e): fan out Vitest scenarios from registry #5006 dispatches each as its own CI job.
  • Live scenario dispatch on ubuntu-latest exercises the full pipeline: onboard → lifecycle (docker stop + nemoclaw status) → host-side probes → cleanup.
  • Tests added or updated for new or changed behavior.
  • No secrets, API keys, or credentials committed.
  • Docs updated for user-facing behavior changes.
  • npm run docs builds without warnings (doc changes only).

Boundaries kept honest

  • No imports from src/lib/** into framework/**. The two duplicated constants (OPENSHELL_SANDBOX_NAME_LABEL, ~/.nemoclaw/sandboxes.json path) carry comments pointing back to the CLI source-of-truth.
  • LifecyclePhaseFixture profile dispatch is exhaustive (never-checked). New profiles must add the dispatcher branch and the SUPPORTED_LIFECYCLES whitelist together.
  • Negative-direction host probes (expected: "absent" for localRegistry / dockerSandboxContainer) deliberately emit nothing today, with a probesForState test pinning the gap.
  • The legacy bash dispatcher (nemoclaw_scenarios/lifecycle/dispatch.sh) is intentionally not updated for post-reboot-recovery. This scenario is Vitest-only; the typed runner's LifecyclePhaseFixture handles dispatch directly.

Follow-ups (post PR-A)

  • Once PR-A's Docker-driver recovery helper is in place, extend the post-reboot expected-state with gateway/sandbox runtime probes on a controlled runner.
  • Consider a framework/phases/lifecycle-postreboot-spark.ts profile for hardware-runner dispatch that can do a real reboot and exercise the full bug class.

jyaunches added 3 commits June 9, 2026 12:24
`liveScenarioSupport` previously rejected any scenario that declared an
`environment.lifecycle`, so post-onboard host mutations (reboot, rebuild,
upgrade, drift) could not surface in the live Vitest matrix at all.

Replace the unconditional reject with a `SUPPORTED_LIFECYCLES` whitelist
that starts with the single profile the upcoming post-reboot-recovery
fixture dispatches: `post-reboot-recovery`. Future profiles must land the
dispatcher branch and an expected-state in the same change set, so the
whitelist stays in lockstep with what the runner can actually execute.

Prepares the runner for #4423's failing-test-first guard, which needs a
post-reboot lifecycle scenario to demonstrate registry preservation +
Docker-backed sandbox recovery on Linux/Spark Docker-driver hosts.

Refs #4423
Adds two host-side state-validation probes the live runner needs to
express the regression target tracked by #4423:

  * `local-registry-entry-present` reads `~/.nemoclaw/sandboxes.json`
    and asserts the scenario's sandbox name is still recorded. This is
    deliberately orthogonal to `sandbox.expected`: post-reboot bugs
    can wipe the local registry while the live OpenShell gateway is
    healthy, and only a host-side probe catches the data-loss
    regression.

  * `docker-sandbox-container-present` runs
    `docker ps -a --filter label=openshell.ai/sandbox-name=<name>` and
    accepts running, stopped, or `*-nemoclaw-gpu-backup-*` sibling
    containers. The label filter mirrors `OPENSHELL_SANDBOX_NAME_LABEL`
    used by `findOpenShellDockerSandboxContainerIds` in
    `src/lib/onboard/docker-gpu-patch.ts`, so the probe stays in lock-
    step with how OpenShell labels containers today.

Probe wiring:

  * `StateProbeId` extended with the two new probe ids.
  * `ExpectedState` gains `localRegistry` and `dockerSandboxContainer`
    optional dimensions; `probesForState` emits the new probes only
    for `expected: "present"`. Negative-direction probes are
    intentionally omitted today and pinned by a probesForState test.
  * `StateValidationPhaseFixture.from()` now accepts either an
    expected-state ID or an inline `ExpectedState`, so unit tests can
    drive new probes without registering synthetic states in the
    typed registry. The live runner still calls `from(id, instance)`.
  * Fixture takes an optional `ProbeIO` injection so tests can stub
    the registry reader without touching `~/.nemoclaw`.

No callers of the existing typed registry are affected: every shipped
expected-state leaves `localRegistry` and `dockerSandboxContainer`
unset, so `probesForState` returns the same probe lists as before.

Refs #4423
Adds a Vitest phase fixture that mutates host state between onboarding
and state-validation, so live scenarios can express post-onboard
invariants the legacy bash runner has no equivalent for.

`LifecyclePhaseFixture.simulate("post-reboot-recovery", instance, opts)`
reproduces the host-side conditions of a DGX Spark / Linux Docker-driver
reboot in two modes:

  * `stop-original` (default)   — `openshell gateway stop` + `docker
                                   stop` of the labeled sandbox
                                   container. Models the common reboot
                                   outcome where OpenShell forgets the
                                   sandbox while Docker keeps the
                                   container exited but labeled.

  * `rename-to-gpu-backup`      — additionally `docker rename`s the
                                   container to a `*-nemoclaw-gpu-
                                   backup-<ts>` sibling, mirroring the
                                   GPU-patch reboot path in
                                   `src/lib/onboard/docker-gpu-patch.ts`.

Both modes register cleanups (in reverse order) to restore the
container so test teardown leaves Docker in a usable state.

Wiring:

  * `framework/phases/index.ts` re-exports the fixture and types.
  * `framework/e2e-test.ts` registers a `lifecycle` Vitest fixture on
    `E2EScenarioFixtures`, wired with the shared `host`, `sandbox`,
    and `cleanup` registries.
  * `live/registry-scenarios.test.ts` invokes
    `lifecycle.simulate(profile, instance)` between `onboard.from(...)`
    and `stateValidation.from(...)` whenever the scenario declares a
    whitelisted `environment.lifecycle`. Scenarios that omit lifecycle
    are unaffected. A scenario whose lifecycle is whitelisted by
    `runtime-support.ts` but NOT dispatched by the fixture fails fast
    with a clear error so the whitelist and dispatcher stay in lock-
    step.

Coverage in `e2e-phase-lifecycle.test.ts` exercises both modes,
gateway-stop tolerance, the no-labeled-container failure case, the
docker-discover failure case, the unsupported-profile rejection,
the cleanup queue order, and `buildBackupContainerName` truncation.

The fixture is intentionally narrow on profiles: only
`post-reboot-recovery` is dispatched today. Adding rebuild, upgrade,
or drift profiles is a separate, equally narrow change set that must
land the dispatcher branch and `SUPPORTED_LIFECYCLES` whitelist
together.

Refs #4423
@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6d8dcdd7-88f4-4096-b002-32f9c186dad5

📥 Commits

Reviewing files that changed from the base of the PR and between 7d78354 and 034cc8d.

📒 Files selected for processing (1)
  • src/commands/sandbox/agents/list.ts
✅ Files skipped from review due to trivial changes (1)
  • src/commands/sandbox/agents/list.ts

📝 Walkthrough

Walkthrough

Adds a LifecyclePhaseFixture for post-reboot recovery, host-observable expected-state probes (registry and Docker container), fixture wiring and allowlist-based lifecycle support, scenario/manifest/matrix/workflow routing for a new scenario, and tests for lifecycle flows and probe validation.

Changes

Lifecycle Phase Fixture and Host-Side Probe Support

Layer / File(s) Summary
Type schema for lifecycle and host-side expectations
test/e2e-scenario/scenarios/types.ts
StateProbeId extended with local-registry-entry-present and docker-sandbox-container-present; ExpectedState gains localRegistry? and dockerSandboxContainer?.
LifecyclePhaseFixture implementation
test/e2e-scenario/framework/phases/lifecycle.ts
Adds LifecyclePhaseFixture with simulate() dispatcher and simulatePostReboot() orchestration: gateway stop, Docker discovery/stop, optional rename-to-gpu-backup, cleanup registration; includes buildBackupContainerName() and discovery helpers.
State validation phase host-side probe support
test/e2e-scenario/framework/phases/state-validation.ts
StateValidationPhaseFixture accepts optional ProbeIO; from() accepts `string
Fixture exports and test context wiring
test/e2e-scenario/framework/phases/index.ts, test/e2e-scenario/framework/e2e-test.ts
Re-exports lifecycle fixture/types; E2EScenarioFixtures adds lifecycle and base.extend wires LifecyclePhaseFixture using host, sandbox, and cleanup.
Lifecycle profile allowlist and probe mapping
test/e2e-scenario/scenarios/runtime-support.ts, test/e2e-scenario/scenarios/expected-states.ts
Adds SUPPORTED_LIFECYCLES (includes post-reboot-recovery) and updates liveScenarioSupport() gating; registers post-reboot-recovery-ready expected state and maps host-side probes when present.
LifecyclePhaseFixture behavior and contract tests
test/e2e-scenario/framework-tests/e2e-phase-lifecycle.test.ts
Vitest tests with fakes validate stop-original and rename-to-gpu-backup flows, cleanup registration, profile dispatch errors, and buildBackupContainerName() truncation.
Host-side probe implementation and tests
test/e2e-scenario/framework-tests/e2e-phase-state-validation.test.ts
Fixture helper accepts io override; tests validate registry presence probe and docker-labeled-container probe (including gpu-backup siblings and failure cases).
StateProbeId mapping validation tests
test/e2e-scenario/framework-tests/e2e-expected-state.test.ts
Adds expectations that localRegistry.expected=present emits local-registry-entry-present, dockerSandboxContainer.expected=present emits docker-sandbox-container-present, and both absent currently emit no probes.
Lifecycle execution in live scenario runs
test/e2e-scenario/live/registry-scenarios.test.ts
Adds lifecycle profile typing and type-guard, conditionally calls lifecycle.simulate() before state validation, and records optional lifecycle result in scenario-result.json.
Scenario manifests, canonical inputs, matrix and workflow routing
test/e2e-scenario/manifests/openclaw-nvidia-post-reboot-recovery.yaml, test/e2e-scenario/scenarios/scenarios/baseline.ts, .github/workflows/e2e-scenarios.yaml, test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts
Adds new manifest and canonical scenario input for post-reboot recovery, updates workflow ROUTES mapping, and extends live matrix tests to include ubuntu-repo-docker-post-reboot-recovery.
Lifecycle profile support classification tests
test/e2e-scenario/framework-tests/e2e-live-registry-discovery.test.ts
Tests liveScenarioSupport() gating: non-whitelisted lifecycle profiles rejected with reasons; whitelisted post-reboot-recovery accepted.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#5004: Both PRs are directly related: #5004 introduces the StateValidationPhaseFixture/state-validation probe pipeline, and this PR extends that fixture with new host-side probe types and ProbeIO support.
  • NVIDIA/NemoClaw#5005: Related changes to live scenario gating and tests that touch liveScenarioSupport() and registry discovery tests.
  • NVIDIA/NemoClaw#5006: Related live Vitest scenario/matrix framework updates that this PR expands with a post-reboot-recovery scenario.

Suggested labels

E2E

Suggested reviewers

  • cv

Poem

A lifecycle awakens post-reboot's haze,
Gateways pause while Docker shifts their ways.
Registry keeps a kindly little trace,
Backup names bloom, a timestamped embrace.
Rabbits cheer the tests that keep things right — hop hop! 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 8.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: adding a post-reboot-recovery regression guard test for issue #4423, which aligns with the PR's primary objective of introducing framework primitives and a failing-test-first regression guard.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch e2e-scenario-lifecycle-fixture-prereq

Warning

Review ran into problems

🔥 Problems

Stopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a @coderabbit review after the pipeline has finished.


Comment @coderabbitai help to get the list of available commands and usage tips.

@jyaunches jyaunches added the v0.0.62 Release target label Jun 9, 2026
@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: e2e-vitest-scenarios:ubuntu-repo-docker-post-reboot-recovery, e2e-scenarios:ubuntu-repo-docker-post-reboot-recovery
Optional E2E: e2e-vitest-scenarios:ubuntu-repo-cloud-openclaw, e2e-scenarios-all:matrix-generation

Dispatch hint: scenarios=ubuntu-repo-docker-post-reboot-recovery

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • e2e-vitest-scenarios:ubuntu-repo-docker-post-reboot-recovery (medium-high; live Docker onboarding and recovery flow on ubuntu-latest): Primary required validation for this PR: exercises the newly wired live Vitest lifecycle scenario, including onboarding, simulated post-reboot recovery, nemoclaw <name> status, and the new local-registry/Docker-container state-validation probes. Requires NVIDIA_API_KEY.
  • e2e-scenarios:ubuntu-repo-docker-post-reboot-recovery (medium-high; live Docker onboarding and scenario-runner lifecycle flow on ubuntu-latest): Required because .github/workflows/e2e-scenarios.yaml was changed to route this canonical scenario id. Running the workflow-dispatched scenario validates the new route, typed scenario compilation, lifecycle phase dispatch, and state-validation dispatch path used by the scenario runner. This is especially important because the PR adds typed/live Vitest support for new probes and lifecycle behavior while the shell-dispatched scenario runner must also handle the id it now accepts.

Optional E2E

  • e2e-vitest-scenarios:ubuntu-repo-cloud-openclaw (medium; live Docker onboarding on ubuntu-latest): Useful regression check for the existing baseline live Vitest scenario after adding the lifecycle fixture to the shared test context and modifying state-validation fixture internals. It confirms scenarios without a lifecycle profile still onboard and validate normally.
  • e2e-scenarios-all:matrix-generation (low; matrix generation only): Optional broad confidence for registry/workflow matrix changes: the new scenario should appear in the all-scenarios matrix with the expected ubuntu-latest runner, without requiring every matrix tile to run.

New E2E recommendations

  • workflow-dispatched scenario runner parity (high): The PR adds ubuntu-repo-docker-post-reboot-recovery to .github/workflows/e2e-scenarios.yaml, but the existing shell lifecycle dispatcher under test/e2e-scenario/nemoclaw_scenarios/lifecycle/dispatch.sh only shows rebuild-current-version, and the shell probe dispatcher has no evident workers for local-registry-entry-present or docker-sandbox-container-present. Add parity coverage/implementation for the workflow-dispatched runner, or keep the new scenario limited to the live Vitest workflow until that path is supported.
    • Suggested test: Add scenario-runner shell lifecycle/probe support and an E2E dispatch check for ubuntu-repo-docker-post-reboot-recovery through .github/workflows/e2e-scenarios.yaml.
  • GPU backup-container recovery variant (medium): The new lifecycle fixture unit-tests rename-to-gpu-backup, but the live scenario defaults to stop-original. If the GPU patch backup-sibling path is a supported recovery mode, it needs live E2E coverage because Docker labels/names and recovery behavior can differ from mocked fixture tests.
    • Suggested test: Add a live post-reboot recovery scenario variant that exercises the rename-to-gpu-backup mode and validates registry/container preservation after recovery.

Dispatch hint

  • Workflow: .github/workflows/e2e-vitest-scenarios.yaml
  • jobs input: scenarios=ubuntu-repo-docker-post-reboot-recovery

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

E2E Scenario Advisor Recommendation

Required scenario E2E: e2e-scenarios-all
Optional scenario E2E: None

Dispatch required scenario E2E:

  • gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • e2e-scenarios-all: Run the full scenario fan-out because this PR changes the scenario workflow routing table and shared scenario framework/runtime, live runner lifecycle support, typed scenario catalog, expected-state/probe contracts, and scenario manifests. These are global scenario E2E surfaces and require all dispatchable scenarios to validate routing, catalog compatibility, and runner behavior.
    • Dispatch: gh workflow run e2e-scenarios-all.yaml --ref <pr-head-ref>

Optional scenario E2E

  • None.

Relevant changed files

  • .github/workflows/e2e-scenarios.yaml
  • test/e2e-scenario/framework-tests/e2e-expected-state.test.ts
  • test/e2e-scenario/framework-tests/e2e-live-registry-discovery.test.ts
  • test/e2e-scenario/framework-tests/e2e-phase-lifecycle.test.ts
  • test/e2e-scenario/framework-tests/e2e-phase-state-validation.test.ts
  • test/e2e-scenario/framework-tests/e2e-scenario-matrix.test.ts
  • test/e2e-scenario/framework/e2e-test.ts
  • test/e2e-scenario/framework/phases/index.ts
  • test/e2e-scenario/framework/phases/lifecycle.ts
  • test/e2e-scenario/framework/phases/state-validation.ts
  • test/e2e-scenario/live/registry-scenarios.test.ts
  • test/e2e-scenario/manifests/openclaw-nvidia-post-reboot-recovery.yaml
  • test/e2e-scenario/scenarios/expected-states.ts
  • test/e2e-scenario/scenarios/runtime-support.ts
  • test/e2e-scenario/scenarios/scenarios/baseline.ts
  • test/e2e-scenario/scenarios/types.ts

@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 2 needs attention, 10 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 6 still apply, 0 new items found

Review findings

🛠️ Needs attention

  • Compiled workflow route is unwired for post-reboot recovery (.github/workflows/e2e-scenarios.yaml:81): Adding ubuntu-repo-docker-post-reboot-recovery to e2e-scenarios.yaml makes it executable through the compiled scenario workflow, which invokes test/e2e-scenario/scenarios/run.ts. For this scenario, compileRunPlans emits shell-dispatched state-validation probes and a lifecycle profile that do not exist in the bash dispatcher path, so the workflow can fail before exercising the intended Vitest-only regression guard.
    • Recommendation: Either keep ubuntu-repo-docker-post-reboot-recovery out of .github/workflows/e2e-scenarios.yaml and route it only through the live Vitest scenario workflow, or fully implement the compiled shell path by adding post-reboot-recovery to nemoclaw_scenarios/lifecycle/dispatch.sh plus local-registry-entry-present.sh and docker-sandbox-container-present.sh probe scripts. Add a compiled-runner contract test proving every emitted action for this scenario is dispatchable.
    • Evidence: .github/workflows/e2e-scenarios.yaml adds [ubuntu-repo-docker-post-reboot-recovery]=ubuntu-latest and later runs npx tsx test/e2e-scenario/scenarios/run.ts --scenarios. probesForState now emits local-registry-entry-present and docker-sandbox-container-present, and phaseActions emits lifecycle.profile.post-reboot-recovery. test/e2e-scenario/nemoclaw_scenarios/lifecycle/dispatch.sh only handles rebuild-current-version, and test/e2e-scenario/nemoclaw_scenarios/probes/ contains only cli-installed, gateway-*, and sandbox-* scripts.
  • Compiled plans validate state before the lifecycle mutation (test/e2e-scenario/framework-tests/e2e-expected-state.test.ts:129): The live Vitest runner correctly runs lifecycle.simulate() after onboarding and before stateValidation.from(), but the compiled runner still orders state-validation before lifecycle and the updated test continues to pin that order. If the new scenario remains routed through e2e-scenarios.yaml, the compiled path checks pre-mutation state and only then runs the post-reboot mutation/status action, so it cannot prove the post-reboot preservation invariant.
    • Recommendation: Make lifecycle scenarios compile as environment -> onboarding -> lifecycle -> state-validation -> runtime, or explicitly exclude this Vitest-only scenario from the compiled workflow. Update the phase-order test and comments so live and compiled runners share one lifecycle ordering contract.
    • Evidence: The test asserts phase order is environment -> onboarding -> state-validation -> lifecycle -> runtime. Nearby compiler.ts still defines PHASES in that order. In contrast, test/e2e-scenario/live/registry-scenarios.test.ts invokes lifecycle.simulate(profile, instance) before stateValidation.from(scenario.expectedStateId, instance).

🔎 Worth checking

  • Source-of-truth review needed: Workflow route for ubuntu-repo-docker-post-reboot-recovery: The advisor marked localized patch analysis as missing.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: .github/workflows/e2e-scenarios.yaml routes the id to run.ts; lifecycle/dispatch.sh and probes/ do not implement the emitted post-reboot-recovery, local-registry-entry-present, or docker-sandbox-container-present actions.
  • Source-of-truth review needed: Live runner lifecycle ordering versus compiled run plan: The advisor marked localized patch analysis as missing.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: e2e-expected-state.test.ts asserts environment -> onboarding -> state-validation -> lifecycle -> runtime, while live/registry-scenarios.test.ts runs lifecycle before stateValidation.
  • Source-of-truth review needed: Docker container discovery for lifecycle and state-validation probes: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: Lifecycle and state-validation use only --filter label=openshell.ai/sandbox-name=<name>. Production docker-gpu-patch.ts filters on openshell.ai/managed-by=openshell plus sandbox-name.
  • Source-of-truth review needed: Registry reader tolerant parsing: The advisor marked localized patch analysis as missing.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: defaultReadRegistry() returns null only when the file does not exist; catch returns { entries: {} }.
  • Source-of-truth review needed: Duplicated framework constants for labels, registry path, and backup-name behavior: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: lifecycle.ts and state-validation.ts duplicate OPENSHELL_SANDBOX_NAME_LABEL, ~/.nemoclaw/sandboxes.json, and Docker backup name behavior; the fixture omits the production managed-by filter.
  • Source-of-truth review needed: Lifecycle whitelist and expected-state invariants: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: SUPPORTED_LIFECYCLES includes post-reboot-recovery; liveScenarioSupport() only checks the whitelist and expectedStateId presence.
  • Docker lifecycle discovery trusts only a spoofable sandbox-name label (test/e2e-scenario/framework/phases/lifecycle.ts:170): The lifecycle fixture discovers a Docker container with only openshell.ai/sandbox-name=<name> and then stops or renames the first returned container. The state-validation Docker probe uses the same one-label boundary. On a shared or dirty Docker host, a spoofed container with the predictable sandbox-name label could make validation pass or be mutated by the lifecycle phase. The new fixture also does not validate instance.sandboxName before using it in Docker label filters and artifact names.
    • Recommendation: Mirror the production Docker provenance boundary by requiring both openshell.ai/managed-by=openshell and openshell.ai/sandbox-name=<name>, validate sandbox names at lifecycle and state-validation fixture boundaries, and reject ambiguous multiple matches. Add negative tests for spoofed labels, malformed names, and multiple labeled containers.
    • Evidence: LifecyclePhaseFixture.discoverLabeledContainerNames() and StateValidationPhaseFixture.expectDockerSandboxContainerPresent() call docker ps -a --filter label=openshell.ai/sandbox-name=... --format {{.Names}}. Production src/lib/onboard/docker-gpu-patch.ts defines OPENSHELL_MANAGED_BY_LABEL/OPENSHELL_MANAGED_BY_VALUE and filters on managed-by plus sandbox-name.
  • Malformed registry state is silently treated like an empty registry (test/e2e-scenario/framework/phases/state-validation.ts:45): The local registry probe distinguishes a missing registry file from an existing file, but any read or JSON parse error is caught and converted to { entries: {} }. That makes a corrupt or unreadable ~/.nemoclaw/sandboxes.json look like a normal missing-entry condition, weakening the evidence for the registry-preservation regression this scenario is intended to guard.
    • Recommendation: Report malformed JSON and read errors distinctly from a missing registry file or a missing sandbox entry. Add tests for invalid JSON and registry read failures.
    • Evidence: defaultReadRegistry() returns null when the registry file does not exist, but its catch block returns { entries: {} }. Tests cover missing file and missing sandbox entry, but not invalid JSON or read errors.
  • Lifecycle support can be whitelisted without post-lifecycle invariants (test/e2e-scenario/scenarios/runtime-support.ts:15): The runtime-support comment says a lifecycle profile is supported only after both the dispatcher and an expected state with post-lifecycle host invariants land together. The implementation only checks that the lifecycle profile is whitelisted and that some expectedStateId exists. The new canonical scenario uses the intended expected state, but future post-reboot scenarios could be classified supported while referencing an expected state that lacks localRegistry.present and dockerSandboxContainer.present.
    • Recommendation: Enforce that post-reboot-recovery scenarios reference an expected state with localRegistry.expected: "present" and dockerSandboxContainer.expected: "present", or weaken the comment to match the implementation. Add a negative test for a post-reboot scenario whose expected state lacks those invariants.
    • Evidence: SUPPORTED_LIFECYCLES includes post-reboot-recovery, but liveScenarioSupport() does not inspect the referenced ExpectedState. post-reboot-recovery-ready contains the invariants, but a scenario using cloud-openclaw-ready would still pass the current support classifier.
  • Duplicated product constants lack a drift guard (test/e2e-scenario/framework/phases/lifecycle.ts:10): The fixture layer intentionally duplicates product-owned Docker labels, registry path, and backup-name behavior to avoid importing src/lib code. The comments identify the sources of truth, but there is no lightweight parity or behavioral contract guard for label/path/name drift, and the duplicated Docker boundary omits the production managed-by provenance label.
    • Recommendation: Add parity or behavioral contract tests where feasible, or make the follow-up dependency explicit in code/tests so drift in product labels, registry path, backup-name behavior, and Docker provenance cannot silently invalidate the probes.
    • Evidence: lifecycle.ts mirrors OPENSHELL_SANDBOX_NAME_LABEL and MAX_DOCKER_CONTAINER_NAME_LENGTH/buildBackupContainerName from src/lib/onboard/docker-gpu-patch.ts. state-validation.ts mirrors ~/.nemoclaw/sandboxes.json from src/lib/state/registry.ts. Production Docker discovery also uses openshell.ai/managed-by=openshell, which the fixture does not mirror.

🌱 Nice ideas

  • None.
Consider writing more tests for
  • **Runtime validation** — compiled ubuntu-repo-docker-post-reboot-recovery has dispatchable lifecycle and state-validation shell actions, or is excluded from e2e-scenarios.yaml. Unit coverage is substantial for the live fixture path, but the PR changes workflow routing, host Docker mutation, gateway lifecycle simulation, and post-reboot recovery validation. The compiled workflow path and host-level Docker provenance boundaries need targeted behavioral coverage before the new route can be trusted.
  • **Runtime validation** — compiled lifecycle scenarios order onboarding -> lifecycle -> state-validation -> runtime. Unit coverage is substantial for the live fixture path, but the PR changes workflow routing, host Docker mutation, gateway lifecycle simulation, and post-reboot recovery validation. The compiled workflow path and host-level Docker provenance boundaries need targeted behavioral coverage before the new route can be trusted.
  • **Runtime validation** — liveScenarioSupport rejects post-reboot-recovery when expectedStateId lacks localRegistry.present and dockerSandboxContainer.present. Unit coverage is substantial for the live fixture path, but the PR changes workflow routing, host Docker mutation, gateway lifecycle simulation, and post-reboot recovery validation. The compiled workflow path and host-level Docker provenance boundaries need targeted behavioral coverage before the new route can be trusted.
  • **Runtime validation** — LifecyclePhaseFixture rejects malformed sandbox names before building Docker label filters or artifact names. Unit coverage is substantial for the live fixture path, but the PR changes workflow routing, host Docker mutation, gateway lifecycle simulation, and post-reboot recovery validation. The compiled workflow path and host-level Docker provenance boundaries need targeted behavioral coverage before the new route can be trusted.
  • **Runtime validation** — LifecyclePhaseFixture ignores or rejects containers missing openshell.ai/managed-by=openshell. Unit coverage is substantial for the live fixture path, but the PR changes workflow routing, host Docker mutation, gateway lifecycle simulation, and post-reboot recovery validation. The compiled workflow path and host-level Docker provenance boundaries need targeted behavioral coverage before the new route can be trusted.
  • **Lifecycle support can be whitelisted without post-lifecycle invariants** — Enforce that post-reboot-recovery scenarios reference an expected state with localRegistry.expected: "present" and dockerSandboxContainer.expected: "present", or weaken the comment to match the implementation. Add a negative test for a post-reboot scenario whose expected state lacks those invariants.
  • **Acceptance clause:** Refs [DGX Spark][Sandbox] post-reboot first 'nemoclaw <name> status' destroys sandbox via "Removed stale local registry entry" #4423 — add test evidence or identify existing coverage. No trusted linked issue text or issue comments were provided in the deterministic context, so the actual [DGX Spark][Sandbox] post-reboot first 'nemoclaw <name> status' destroys sandbox via "Removed stale local registry entry" #4423 acceptance clauses could not be mapped literally.
  • **Acceptance clause:** 2. **Two host-side state-validation probes**: - `local-registry-entry-present` — reads `~/.nemoclaw/sandboxes.json` and asserts the scenario's sandbox name is recorded. - `docker-sandbox-container-present` — runs `docker ps -a --filter label=openshell.ai/sandbox-name=…` and accepts running, stopped, and `*-nemoclaw-gpu-backup-*` siblings. — add test evidence or identify existing coverage. The live StateValidationPhaseFixture implements these probes and unit tests cover present, missing, and docker-query-failure cases. However, probesForState also makes the compiled runner emit state-validation.local-registry-entry-present and state-validation.docker-sandbox-container-present, and the corresponding shell probe scripts are absent.
Since last review details

Current findings:

  • Source-of-truth review needed: Workflow route for ubuntu-repo-docker-post-reboot-recovery: The advisor marked localized patch analysis as missing.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: .github/workflows/e2e-scenarios.yaml routes the id to run.ts; lifecycle/dispatch.sh and probes/ do not implement the emitted post-reboot-recovery, local-registry-entry-present, or docker-sandbox-container-present actions.
  • Source-of-truth review needed: Live runner lifecycle ordering versus compiled run plan: The advisor marked localized patch analysis as missing.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: e2e-expected-state.test.ts asserts environment -> onboarding -> state-validation -> lifecycle -> runtime, while live/registry-scenarios.test.ts runs lifecycle before stateValidation.
  • Source-of-truth review needed: Docker container discovery for lifecycle and state-validation probes: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: Lifecycle and state-validation use only --filter label=openshell.ai/sandbox-name=<name>. Production docker-gpu-patch.ts filters on openshell.ai/managed-by=openshell plus sandbox-name.
  • Source-of-truth review needed: Registry reader tolerant parsing: The advisor marked localized patch analysis as missing.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: defaultReadRegistry() returns null only when the file does not exist; catch returns { entries: {} }.
  • Source-of-truth review needed: Duplicated framework constants for labels, registry path, and backup-name behavior: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: lifecycle.ts and state-validation.ts duplicate OPENSHELL_SANDBOX_NAME_LABEL, ~/.nemoclaw/sandboxes.json, and Docker backup name behavior; the fixture omits the production managed-by filter.
  • Source-of-truth review needed: Lifecycle whitelist and expected-state invariants: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: SUPPORTED_LIFECYCLES includes post-reboot-recovery; liveScenarioSupport() only checks the whitelist and expectedStateId presence.
  • Compiled workflow route is unwired for post-reboot recovery (.github/workflows/e2e-scenarios.yaml:81): Adding ubuntu-repo-docker-post-reboot-recovery to e2e-scenarios.yaml makes it executable through the compiled scenario workflow, which invokes test/e2e-scenario/scenarios/run.ts. For this scenario, compileRunPlans emits shell-dispatched state-validation probes and a lifecycle profile that do not exist in the bash dispatcher path, so the workflow can fail before exercising the intended Vitest-only regression guard.
    • Recommendation: Either keep ubuntu-repo-docker-post-reboot-recovery out of .github/workflows/e2e-scenarios.yaml and route it only through the live Vitest scenario workflow, or fully implement the compiled shell path by adding post-reboot-recovery to nemoclaw_scenarios/lifecycle/dispatch.sh plus local-registry-entry-present.sh and docker-sandbox-container-present.sh probe scripts. Add a compiled-runner contract test proving every emitted action for this scenario is dispatchable.
    • Evidence: .github/workflows/e2e-scenarios.yaml adds [ubuntu-repo-docker-post-reboot-recovery]=ubuntu-latest and later runs npx tsx test/e2e-scenario/scenarios/run.ts --scenarios. probesForState now emits local-registry-entry-present and docker-sandbox-container-present, and phaseActions emits lifecycle.profile.post-reboot-recovery. test/e2e-scenario/nemoclaw_scenarios/lifecycle/dispatch.sh only handles rebuild-current-version, and test/e2e-scenario/nemoclaw_scenarios/probes/ contains only cli-installed, gateway-*, and sandbox-* scripts.
  • Compiled plans validate state before the lifecycle mutation (test/e2e-scenario/framework-tests/e2e-expected-state.test.ts:129): The live Vitest runner correctly runs lifecycle.simulate() after onboarding and before stateValidation.from(), but the compiled runner still orders state-validation before lifecycle and the updated test continues to pin that order. If the new scenario remains routed through e2e-scenarios.yaml, the compiled path checks pre-mutation state and only then runs the post-reboot mutation/status action, so it cannot prove the post-reboot preservation invariant.
    • Recommendation: Make lifecycle scenarios compile as environment -> onboarding -> lifecycle -> state-validation -> runtime, or explicitly exclude this Vitest-only scenario from the compiled workflow. Update the phase-order test and comments so live and compiled runners share one lifecycle ordering contract.
    • Evidence: The test asserts phase order is environment -> onboarding -> state-validation -> lifecycle -> runtime. Nearby compiler.ts still defines PHASES in that order. In contrast, test/e2e-scenario/live/registry-scenarios.test.ts invokes lifecycle.simulate(profile, instance) before stateValidation.from(scenario.expectedStateId, instance).
  • Docker lifecycle discovery trusts only a spoofable sandbox-name label (test/e2e-scenario/framework/phases/lifecycle.ts:170): The lifecycle fixture discovers a Docker container with only openshell.ai/sandbox-name=<name> and then stops or renames the first returned container. The state-validation Docker probe uses the same one-label boundary. On a shared or dirty Docker host, a spoofed container with the predictable sandbox-name label could make validation pass or be mutated by the lifecycle phase. The new fixture also does not validate instance.sandboxName before using it in Docker label filters and artifact names.
    • Recommendation: Mirror the production Docker provenance boundary by requiring both openshell.ai/managed-by=openshell and openshell.ai/sandbox-name=<name>, validate sandbox names at lifecycle and state-validation fixture boundaries, and reject ambiguous multiple matches. Add negative tests for spoofed labels, malformed names, and multiple labeled containers.
    • Evidence: LifecyclePhaseFixture.discoverLabeledContainerNames() and StateValidationPhaseFixture.expectDockerSandboxContainerPresent() call docker ps -a --filter label=openshell.ai/sandbox-name=... --format {{.Names}}. Production src/lib/onboard/docker-gpu-patch.ts defines OPENSHELL_MANAGED_BY_LABEL/OPENSHELL_MANAGED_BY_VALUE and filters on managed-by plus sandbox-name.
  • Malformed registry state is silently treated like an empty registry (test/e2e-scenario/framework/phases/state-validation.ts:45): The local registry probe distinguishes a missing registry file from an existing file, but any read or JSON parse error is caught and converted to { entries: {} }. That makes a corrupt or unreadable ~/.nemoclaw/sandboxes.json look like a normal missing-entry condition, weakening the evidence for the registry-preservation regression this scenario is intended to guard.
    • Recommendation: Report malformed JSON and read errors distinctly from a missing registry file or a missing sandbox entry. Add tests for invalid JSON and registry read failures.
    • Evidence: defaultReadRegistry() returns null when the registry file does not exist, but its catch block returns { entries: {} }. Tests cover missing file and missing sandbox entry, but not invalid JSON or read errors.
  • Lifecycle support can be whitelisted without post-lifecycle invariants (test/e2e-scenario/scenarios/runtime-support.ts:15): The runtime-support comment says a lifecycle profile is supported only after both the dispatcher and an expected state with post-lifecycle host invariants land together. The implementation only checks that the lifecycle profile is whitelisted and that some expectedStateId exists. The new canonical scenario uses the intended expected state, but future post-reboot scenarios could be classified supported while referencing an expected state that lacks localRegistry.present and dockerSandboxContainer.present.
    • Recommendation: Enforce that post-reboot-recovery scenarios reference an expected state with localRegistry.expected: "present" and dockerSandboxContainer.expected: "present", or weaken the comment to match the implementation. Add a negative test for a post-reboot scenario whose expected state lacks those invariants.
    • Evidence: SUPPORTED_LIFECYCLES includes post-reboot-recovery, but liveScenarioSupport() does not inspect the referenced ExpectedState. post-reboot-recovery-ready contains the invariants, but a scenario using cloud-openclaw-ready would still pass the current support classifier.
  • Duplicated product constants lack a drift guard (test/e2e-scenario/framework/phases/lifecycle.ts:10): The fixture layer intentionally duplicates product-owned Docker labels, registry path, and backup-name behavior to avoid importing src/lib code. The comments identify the sources of truth, but there is no lightweight parity or behavioral contract guard for label/path/name drift, and the duplicated Docker boundary omits the production managed-by provenance label.
    • Recommendation: Add parity or behavioral contract tests where feasible, or make the follow-up dependency explicit in code/tests so drift in product labels, registry path, backup-name behavior, and Docker provenance cannot silently invalidate the probes.
    • Evidence: lifecycle.ts mirrors OPENSHELL_SANDBOX_NAME_LABEL and MAX_DOCKER_CONTAINER_NAME_LENGTH/buildBackupContainerName from src/lib/onboard/docker-gpu-patch.ts. state-validation.ts mirrors ~/.nemoclaw/sandboxes.json from src/lib/state/registry.ts. Production Docker discovery also uses openshell.ai/managed-by=openshell, which the fixture does not mirror.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e-scenario/framework/phases/lifecycle.ts`:
- Around line 123-124: The code deterministically picks a primary container with
const originalName = containerNames[0], which is unstable; instead select the
intended original container by filtering containerNames for the exact/non-backup
label (e.g., choose the name that does NOT include '-nemoclaw-gpu-backup-' or
that matches the original label value) using containerNames.find(...) and fall
back with a clear error if none found; apply the same deterministic selection to
the other spots referenced in the block covering the later usages (the code
around lines 169-196) so references to originalName always point to the
non-backup/original container.
- Around line 109-113: The lifecycle phase currently treats any non-zero result
from the gateway-stop step as acceptable, which can hide real errors; update the
handling around the gateway-stop step (the steps.push call that uses id:
"gateway-stop" and the gatewayStop result) so it only tolerates the specific "no
runtime" failure mode (e.g., the NoSuchProcess error indicator or the specific
exit code/message your CLI emits when no runtime exists). Concretely, replace
the broad acceptance with logic that inspects gatewayStop's result/error
(exitCode, error.name, or stderr text) and only marks the step as non-fatal when
that inspection matches the expected NoSuchProcess/no-runtime signature; for all
other non-zero results let the lifecycle phase fail as before. Ensure the
matching is robust (check both name and stderr substring or exact exit code) and
keep the step ID "gateway-stop" and the gatewayStop variable name so it’s easy
to locate.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: be3abf06-7822-4def-8b49-b16f2bc57a67

📥 Commits

Reviewing files that changed from the base of the PR and between bdcf6a0 and 0516112.

📒 Files selected for processing (12)
  • test/e2e-scenario/framework-tests/e2e-expected-state.test.ts
  • test/e2e-scenario/framework-tests/e2e-live-registry-discovery.test.ts
  • test/e2e-scenario/framework-tests/e2e-phase-lifecycle.test.ts
  • test/e2e-scenario/framework-tests/e2e-phase-state-validation.test.ts
  • test/e2e-scenario/framework/e2e-test.ts
  • test/e2e-scenario/framework/phases/index.ts
  • test/e2e-scenario/framework/phases/lifecycle.ts
  • test/e2e-scenario/framework/phases/state-validation.ts
  • test/e2e-scenario/live/registry-scenarios.test.ts
  • test/e2e-scenario/scenarios/expected-states.ts
  • test/e2e-scenario/scenarios/runtime-support.ts
  • test/e2e-scenario/scenarios/types.ts

Comment on lines +109 to +113
// gateway stop is best-effort: a fresh-start/no-runtime gateway
// will exit non-zero with NoSuchProcess, which is exactly the
// post-reboot state we want to simulate. Don't fail the lifecycle
// phase on it.
steps.push({ id: "gateway-stop", results: [gatewayStop] });

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Narrow gateway-stop tolerance to the expected “no runtime” failure mode.

Lines 109-113 currently allow any non-zero gateway-stop result. That can mask real failures (CLI unavailable, permission issues, timeout) and continue lifecycle simulation in a bad state.

Proposed fix
     steps.push({ id: "gateway-stop", results: [gatewayStop] });
+    if (gatewayStop.exitCode !== 0) {
+      const combinedOutput = `${gatewayStop.stdout}\n${gatewayStop.stderr}`;
+      const isExpectedNoRuntime = /no\s+such\s+process|no\s+gateway\s+runtime/i.test(
+        combinedOutput,
+      );
+      if (!isExpectedNoRuntime) {
+        throw new Error(
+          `lifecycle.post-reboot-recovery failed to stop gateway (exit ${gatewayStop.exitCode}).`,
+        );
+      }
+    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// gateway stop is best-effort: a fresh-start/no-runtime gateway
// will exit non-zero with NoSuchProcess, which is exactly the
// post-reboot state we want to simulate. Don't fail the lifecycle
// phase on it.
steps.push({ id: "gateway-stop", results: [gatewayStop] });
// gateway stop is best-effort: a fresh-start/no-runtime gateway
// will exit non-zero with NoSuchProcess, which is exactly the
// post-reboot state we want to simulate. Don't fail the lifecycle
// phase on it.
steps.push({ id: "gateway-stop", results: [gatewayStop] });
if (gatewayStop.exitCode !== 0) {
const combinedOutput = `${gatewayStop.stdout}\n${gatewayStop.stderr}`;
const isExpectedNoRuntime = /no\s+such\s+process|no\s+gateway\s+runtime/i.test(
combinedOutput,
);
if (!isExpectedNoRuntime) {
throw new Error(
`lifecycle.post-reboot-recovery failed to stop gateway (exit ${gatewayStop.exitCode}).`,
);
}
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-scenario/framework/phases/lifecycle.ts` around lines 109 - 113, The
lifecycle phase currently treats any non-zero result from the gateway-stop step
as acceptable, which can hide real errors; update the handling around the
gateway-stop step (the steps.push call that uses id: "gateway-stop" and the
gatewayStop result) so it only tolerates the specific "no runtime" failure mode
(e.g., the NoSuchProcess error indicator or the specific exit code/message your
CLI emits when no runtime exists). Concretely, replace the broad acceptance with
logic that inspects gatewayStop's result/error (exitCode, error.name, or stderr
text) and only marks the step as non-fatal when that inspection matches the
expected NoSuchProcess/no-runtime signature; for all other non-zero results let
the lifecycle phase fail as before. Ensure the matching is robust (check both
name and stderr substring or exact exit code) and keep the step ID
"gateway-stop" and the gatewayStop variable name so it’s easy to locate.

Comment on lines +123 to +124
const originalName = containerNames[0];

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid nondeterministic primary-container selection when multiple labeled containers exist.

Line 123 picks containerNames[0] directly. docker ps -a ordering is not a stable contract; with both original and *-nemoclaw-gpu-backup-* siblings labeled, this can stop/rename the wrong container and register incorrect cleanup targets.

Proposed fix
-    const originalName = containerNames[0];
+    const primaryCandidates = containerNames.filter(
+      (name) => !name.includes("-nemoclaw-gpu-backup-"),
+    );
+    if (primaryCandidates.length !== 1) {
+      throw new Error(
+        `lifecycle.post-reboot-recovery expected exactly one primary labeled container ` +
+          `(found ${primaryCandidates.length}, total labeled ${containerNames.length}).`,
+      );
+    }
+    const originalName = primaryCandidates[0];

Also applies to: 169-196

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-scenario/framework/phases/lifecycle.ts` around lines 123 - 124, The
code deterministically picks a primary container with const originalName =
containerNames[0], which is unstable; instead select the intended original
container by filtering containerNames for the exact/non-backup label (e.g.,
choose the name that does NOT include '-nemoclaw-gpu-backup-' or that matches
the original label value) using containerNames.find(...) and fall back with a
clear error if none found; apply the same deterministic selection to the other
spots referenced in the block covering the later usages (the code around lines
169-196) so references to originalName always point to the non-backup/original
container.

@cv

cv commented Jun 9, 2026

Copy link
Copy Markdown
Collaborator

Tracking link: this PR is part of the #4941 / #4990 Vitest E2E scenario migration path, specifically the lifecycle-fixture prerequisite after #5002-#5006. It also provides framework guardrails needed for #4423; the follow-up scenario/expected-state PR can carry the RED guard once these primitives land. Leaving the current static-check cleanup with @jyaunches.

@cv cv added chore Build, CI, dependency, or tooling maintenance area: e2e End-to-end tests, nightly failures, or validation infrastructure area: architecture Architecture, design debt, major refactors, or maintainability labels Jun 9, 2026
Registers the failing-test-first guard for #4423 in the typed scenario
registry so the live Vitest matrix from #5006 fans it out as a
dedicated CI job. Builds on the framework primitives added earlier in
this PR (lifecycle phase fixture, host-side probes, lifecycle whitelist).

Additions:

  * `post-reboot-recovery-ready` expected-state in
    `scenarios/expected-states.ts` declaring the user-visible
    invariants that must hold after a `nemoclaw <name> status` call
    on a freshly-rebooted DGX Spark / Linux Docker-driver host:
      - cli installed,
      - gateway healthy (the user-systemd unit from #4580 brings it
        back up before status runs),
      - sandbox running (recovery completed in time),
      - localRegistry entry preserved (the user-visible regression
        target — destroyed on unfixed `main`),
      - dockerSandboxContainer present (recovery didn't delete the
        labeled container or its `*-nemoclaw-gpu-backup-*` sibling).

  * `ubuntu-repo-docker-post-reboot-recovery` scenario in
    `scenarios/scenarios/baseline.ts` wiring
    `ubuntuRepoDockerLifecycle("cloud-openclaw", "post-reboot-recovery")`
    against the new expected-state and a smoke suite. Carries a
    description that explains the RED/GREEN contract and points to the
    PR-A fix landing in `src/lib/`.

  * `manifests/openclaw-nvidia-post-reboot-recovery.yaml` declares
    `lifecycle: post-reboot-recovery` and the same NVIDIA_API_KEY
    credential ref the cloud-openclaw scenarios use.

  * `.github/workflows/e2e-scenarios.yaml` ROUTES table gains the new
    scenario so the workflow-boundary test
    (`e2e-scenarios-workflow.test.ts`) routes every typed id.

Test pinning:

  * `e2e-scenario-matrix.test.ts` updated from a 1-entry to a 2-entry
    live matrix expectation. The new entry asserts on
    `expectedStateId: "post-reboot-recovery-ready"` so a future
    accidental dropped-lifecycle change to the scenario regresses
    loudly.

  * `e2e-live-registry-discovery.test.ts` swaps the synthetic
    whitelist-coverage test for an assertion against the real
    `ubuntu-repo-docker-post-reboot-recovery` registry entry.

Behavior:

  * On unfixed `main`, the live runner's lifecycle phase stops the
    OpenShell gateway runtime and `docker stop`s the labeled sandbox
    container. State-validation then runs `nemoclaw <name> status`
    (which restarts the gateway via systemd) and the destructive
    `missing` branch in `src/lib/actions/sandbox/status.ts` wipes the
    local registry entry. The `local-registry-entry-present` probe
    fails. Scenario goes RED.

  * On the PR-A fix branch, the new Docker-driver sandbox recovery
    helper restarts the labeled container before stale-removal can
    fire, registry survives, all five probes pass. Scenario flips
    GREEN.

The bash-side legacy compiler emits a
`lifecycle.profile.post-reboot-recovery` PhaseAction pointing at
`nemoclaw_scenarios/lifecycle/dispatch.sh`, but the legacy bash worker
is intentionally not provided: this scenario is Vitest-only. The
typed runner's `LifecyclePhaseFixture` handles dispatch directly. If
the legacy runner is invoked against this scenario it errors out at
the dispatcher; that's the right failure mode while the bash side
stays on its own retirement clock.

Refs #4423
jyaunches added 2 commits June 9, 2026 14:22
Prek hook auto-fixed formatting in 6 files added/touched by this PR.
No behavior change.
The biome-format commit accidentally added a node_modules symlink
alongside the formatting fixes. Remove it; the directory is already
in .gitignore.
jyaunches added 2 commits June 9, 2026 14:34
The first dispatch of the new post-reboot-recovery scenario surfaced a
flow gap: state-validation asserted gateway-healthy immediately after
lifecycle.simulate stopped the gateway, never running the user-visible
action that actually triggers #4423's regression.

Add `nemoclaw <name> status` as the final step of simulatePostReboot.
This is faithful to the bug semantics — "after a reboot, the FIRST
status invocation is what destroys state" — and lets state-validation
observe the post-status host invariants:

  * On unfixed code: status restarts the gateway via the user-systemd
    unit from #4580, sees sandbox NotFound, runs the destructive
    `missing` branch in src/lib/actions/sandbox/status.ts, registry is
    wiped → `local-registry-entry-present` probe fails → scenario RED.
  * On the PR-A fix branch: status restarts the gateway, recovery
    helper restarts the labeled container before stale-removal can
    fire, registry survives → all probes pass → scenario GREEN.

Status exit code is intentionally not asserted: the bug is precisely
that status "succeeds" at destroying state. The post-action invariants
are owned by state-validation, not lifecycle.

Refs #4423
@cv cv merged commit f483a49 into main Jun 9, 2026
39 checks passed
@cv cv deleted the e2e-scenario-lifecycle-fixture-prereq branch June 9, 2026 18:42
cv pushed a commit that referenced this pull request Jun 9, 2026
…-up to #5046) (#5087)

## Summary

Follow-up to #5046. Tightens the
`ubuntu-repo-docker-post-reboot-recovery` scenario into a stable
RED-on-regression / GREEN-on-main guard while PR-A (the `#4423` source
fix) is in flight.

## Related Issue

Refs #4423

## Why this is needed

Post-merge dispatches of #5046's
`ubuntu-repo-docker-post-reboot-recovery` job against `ubuntu-latest`
revealed two issues:

1. **The `openshell gateway stop` step in `LifecyclePhaseFixture` was
counterproductive.** `#4578`'s mitigation in
`src/lib/actions/sandbox/status.ts:296` guards the destructive branch
behind a `healthy_named` gateway. Stopping the gateway routed every
dispatch through the safe path and masked the regression target. There
is no `openshell gateway start` subcommand on `ubuntu-latest` (the
user-systemd autostart from #4580 is the real mechanism), so the
original lifecycle couldn't restart the gateway cleanly to keep it
`healthy_named`.

2. **The `post-reboot-recovery-ready` expected-state asserted on runtime
liveness probes** (`gateway-healthy`, `sandbox-running`) that are
environmental on `ubuntu-latest` after `docker stop`. They varied
independently of the `#4423` regression target and made every dispatch
RED for the wrong reason — concretely, `gateway-healthy` failing before
the host-side probes had a chance to run.

## Changes

### `LifecyclePhaseFixture.simulatePostReboot`

- Drop the `openshell gateway stop` step. The lifecycle now stops only
the labeled sandbox container and then runs `nemoclaw <name> status`.
Gateway is left `healthy_named`, which is the precondition for the
destructive branch we want PR-A to neutralize.
- Drop the `GATEWAY_STOP_TIMEOUT_MS` constant (no longer referenced).
- Updated docstring + 2 unit tests removed (gateway-stop-failure
tolerance tests are no longer applicable).

### `post-reboot-recovery-ready` expected-state

Lock down only the host-side preservation invariants this scenario can
faithfully exercise:

- `cli-installed`
- `local-registry-entry-present`
- `docker-sandbox-container-present` (running OR stopped OR
`*-nemoclaw-gpu-backup-*` sibling)

Drop `gateway: { expected: "present", health: "healthy" }` and `sandbox:
{ expected: "present", status: "running", agent: "openclaw" }`. A
follow-up scenario on a more controlled runner (real DGX Spark / Brev)
can extend the expected state with runtime liveness probes once PR-A has
landed and a true post-reboot environment is available.

### `probesForState` ordering

Reorder so host-side observables (`local-registry-entry-present`,
`docker-sandbox-container-present`) emit BEFORE runtime-derived
gateway/sandbox probes everywhere. The state-validation orchestrator
short-circuits on the first probe failure, so host-side regression
targets must be observed first. Pre-existing scenarios
(`cloud-openclaw-ready`, `preflight-failure-*`, etc.) are unaffected —
none declare host-side dimensions, so their probe lists are unchanged.

## RED / GREEN contract

- **On `main`** (this PR's HEAD): scenario goes 🟢 GREEN. Locks in
`#4578`'s mitigation as a passive guard — registry preserved through
`docker stop` + `nemoclaw status` round-trip.
- **On a regression PR** that re-introduces unconditional
`registry.removeSandbox` in `status.ts:296` or `gateway-state.ts:412`:
scenario goes 🔴 RED at `local-registry-entry-present`.
- **On PR-A's fix branch** that adds Docker-driver sandbox recovery:
scenario stays 🟢, demonstrating the new helper does not regress either
host-side invariant.

## Type of Change

- [x] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification

- [x] `npx prek run --all-files` passes
- [x] `npm test` passes — 142 tests across the directly affected files
all green; no regressions on the rest of the e2e-scenario framework
suite.
- [x] Live dispatch (`gh workflow run e2e-vitest-scenarios.yaml -f
scenarios=ubuntu-repo-docker-post-reboot-recovery --ref
e2e-post-reboot-scope-host-side`) goes GREEN: lifecycle (`docker stop` +
`nemoclaw status`) → host-side probes pass → cleanup restores Docker.
Will dispatch as a verification step once this PR is open.
- [x] Tests added or updated for new or changed behavior.
- [x] No secrets, API keys, or credentials committed.

## Follow-ups (post PR-A)

- Once PR-A's Docker-driver recovery helper is in place, extend the
post-reboot expected-state with gateway/sandbox runtime probes on a
controlled runner.
- Consider a `post-reboot-spark` lifecycle profile for hardware-runner
dispatch that can do a real reboot and exercise the full bug class.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Tests**
* Added a validation test for post-reboot-recovery expected-state
focusing on host-side invariants.
* Updated post-reboot-recovery phase tests to remove the gateway-stop
step and assert container stop + host status checks.

* **Documentation**
* Clarified lifecycle and scenario comments/manifests to describe
container-stop + host-status-driven recovery and probe ordering.

* **Behavior**
* Expected-state and probe emission now prioritize host-side checks and
omit runtime gateway/sandbox expectations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
cv added a commit that referenced this pull request Jun 10, 2026
…5052)

## Summary
Extends the E2E migration inventory beyond direct legacy
`test/e2e/test-*.sh` entrypoints so internal legacy runner surfaces are
also guarded before deletion. The inventory now tracks coarse
runner-internal groups for shell scenario workers, validation suites,
onboarding assertion workers, TypeScript shell-runner orchestrators, and
runtime helper libraries.

This branch also merges the updated #5046 base to pick up the accidental
`node_modules` symlink removal and carries one formatter-only wrap in
`src/commands/sandbox/agents/list.ts` so the all-files static hook stays
green on this stack branch.

## Related Issue
Refs #4941
Refs #4990
Refs #4357
Depends on #5046 and the shared runtime-suite base stack.

## Changes
- Added `internalSurfaces` records to
`test/e2e-scenario/migration/legacy-inventory.json` for legacy runner
internals.
- Extended the migration inventory gate test to validate
internal-surface IDs, path existence, owner issues, replacement
surfaces, deletion readiness, and path coverage.
- Documented that the inventory remains a deletion gate, not a progress
dashboard, while now covering coarse internal runner surfaces as well as
direct scripts.
- Applied one formatter-only wrap required by the all-files static hook.

## Type of Change

- [ ] Code change (feature, bug fix, or refactor)
- [x] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification
Focused verification run:
`npx vitest run --project e2e-scenario-framework
test/e2e-scenario/framework-tests/e2e-migration-inventory.test.ts
test/e2e-scenario/framework-tests/e2e-migration-inventory-lock.test.ts
--silent=false --reporter=default`

Static-check parity run:
`npm run validate:configs && npx prek run --all-files --stage pre-push
--skip tsc-plugin --skip tsc-js --skip tsc-cli --skip version-tag-sync
--skip test-cli --skip test-plugin --skip source-shape-test-budget
--skip test-file-size-budget --skip test-skills-yaml && npm run
source-shape:check && npm run test-size:check && npx vitest run
test/skills-frontmatter.test.ts && python3
scripts/generate-platform-docs.py --check`

Additional local check:
`npx vitest run --project cli test/docker-abstraction-guard.test.ts
--silent=false --reporter=default`

- [ ] `npx prek run --all-files` passes
- [ ] `npm test` passes
- [x] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [ ] Docs updated for user-facing behavior changes
- [ ] `npm run docs` builds without warnings (doc changes only)
- [ ] Doc pages follow the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md)
(doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

---
<!-- DCO sign-off required by CI. Run: git config user.name && git
config user.email -->
Signed-off-by: Carlos Villela <cvillela@nvidia.com>


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Documentation**
* Clarified requirements for migration inventory and deletion-readiness
rules
  * Improved formatting of migration tracking documentation

* **Tests**
* Extended migration inventory validation to include internal migration
surfaces
  * Enhanced test logic for migration deletion-readiness verification

* **Chores**
* Updated legacy inventory registry to track internal migration surfaces

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Co-authored-by: Julie Yaunches <jyaunches@nvidia.com>
cv added a commit that referenced this pull request Jun 10, 2026
## Summary
Adds the typed inference runtime helper surface for the Vitest E2E
scenario runner.

## Related Issue
Refs #4941
Refs #4990
Refs #4349
Depends on #5046, #5052, and the shared runtime-suite base stack.
Stacked on branch `codex/e2e-fanout-01-inventory-internals`.

## Changes
- Added `RuntimePhaseFixture` and the `runtime` Vitest fixture for
inference runtime probes.
- Added reusable helpers for sandbox-side `inference.local` models, chat
completion, and HTTP status checks.
- Added trusted-provider compatible endpoint helpers for models/chat
probes while preserving shell-probe artifact capture and redaction.
- Validate model-list responses for OpenAI-style `{ data: [...] }` and
Ollama-style `{ models: [...] }` payloads so readiness helpers cannot
pass on `{}` or error-only JSON.
- Auto-redact sensitive custom header values and honor provider
`curlMaxTimeSeconds` as `curl --max-time`.
- Extended `ProviderClient` with a request-level JSON API that returns
both parsed JSON and the captured `ShellProbeResult`.
- Added framework tests for route normalization, argv construction,
redaction values, provider-compatible requests, model-list validation,
provider curl timeout propagation, and malformed response handling.

## Type of Change

- [x] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification
- [ ] `npx prek run --all-files` passes
- [ ] `npm test` passes
- [x] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [ ] Docs updated for user-facing behavior changes
- [ ] `npm run docs` builds without warnings (doc changes only)
- [ ] Doc pages follow the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md)
(doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

Verified locally:
- `npx vitest run --project e2e-scenario-framework
test/e2e-scenario/framework-tests/e2e-phase-runtime.test.ts
test/e2e-scenario/framework-tests/e2e-clients.test.ts --silent=false
--reporter=default`
- `npx vitest run --project e2e-scenario-framework --silent=false
--reporter=default`
- `npm run typecheck:cli`
- `npx prek run --files test/e2e-scenario/framework/clients/provider.ts
test/e2e-scenario/framework/clients/index.ts
test/e2e-scenario/framework/e2e-test.ts
test/e2e-scenario/framework/phases/index.ts
test/e2e-scenario/framework/phases/runtime.ts
test/e2e-scenario/framework-tests/e2e-phase-runtime.test.ts --skip
test-cli`
- `git diff --check`

CI/advisor evidence:
- Required PR checks are green on the PR head.
- PR review advisor: 0 needs attention, 0 worth checking, 0 nice ideas.
- E2E recommendation advisor: no product E2E required.
- E2E scenario advisor requested `e2e-scenarios-all`; dispatched run
https://github.com/NVIDIA/NemoClaw/actions/runs/27241683412. The
relevant `ubuntu-repo-cloud-openclaw` scenario passed. The all-run is
red due to pre-existing scenario-runner coverage gaps outside this PR's
helper surface, including generated scenarios whose onboarding profile
ids are not yet implemented by
`test/e2e-scenario/nemoclaw_scenarios/onboard/dispatch.sh` (for example
`openai-compatible-openclaw`,
`cloud-nvidia-openclaw-resume-after-interrupt`) and a Hermes-specific
`runtime.hermes.history-writable` assertion that fails after
onboarding/inference pass because it cannot determine shield state.

Note: the full pre-commit hook's `test-cli` step still fails locally in
`test/release-latest-tag.test.ts` because this machine's global Git
config enables SSH commit signing but the private signing key is
unavailable. The focused E2E framework suite and CLI typecheck pass.

---
<!-- DCO sign-off required by CI. Run: git config user.name && git
config user.email -->
Signed-off-by: Carlos Villela <cvillela@nvidia.com>

---------

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Co-authored-by: Julie Yaunches <jyaunches@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: architecture Architecture, design debt, major refactors, or maintainability area: e2e End-to-end tests, nightly failures, or validation infrastructure chore Build, CI, dependency, or tooling maintenance v0.0.62 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants