test(e2e): migrate test-sandbox-operations.sh to vitest [ANCHOR-2] by jyaunches · Pull Request #5224 · NVIDIA/NemoClaw

jyaunches · 2026-06-11T13:09:40Z

Summary

Migrate test/e2e/test-sandbox-operations.sh with equivalent live Vitest coverage.

Related Issues

Refs #5098

Contract mapping

Legacy assertion: TC-SBX-01 list shows onboarded sandbox
- Replacement: test/e2e-scenario/live/sandbox-operations.test.ts nemoclaw list assertion
- Boundary preserved: real repo CLI and local registry
Legacy assertion: TC-SBX-02 connect/chat computes 6×7 through OpenClaw
- Replacement: sandbox exec runs openclaw agent --json and asserts reply text contains 42
- Boundary preserved: real sandbox exec, OpenClaw gateway, inference.local routing
Legacy assertion: TC-SBX-03 status fields render
- Replacement: nemoclaw <sandbox> status asserts Sandbox/Model/Provider/GPU fields
- Boundary preserved: real CLI status/reporting path
Legacy assertion: TC-SBX-04 logs and logs --follow stream
- Replacement: nemoclaw <sandbox> logs emits output and follow remains live until timeout
- Boundary preserved: real process streaming boundary
Legacy assertion: TC-SBX-05 destroy removes sandbox from NemoClaw and OpenShell lists
- Replacement: destroy sandbox B, then assert both lists omit it
- Boundary preserved: real destroy/OpenShell cleanup path
Legacy assertion: TC-SBX-06 status recovers after gateway container kill
- Replacement: kill openshell-cluster-nemoclaw, run status, preserve legacy soft-skip when Docker runner cannot expose the signal
- Boundary preserved: real Docker container and status recovery path
Legacy assertion: TC-SBX-07 list rebuilds missing registry
- Replacement: remove ~/.nemoclaw/sandboxes.json, run list, assert sandbox is recovered
- Boundary preserved: real host filesystem registry recovery
Legacy assertion: TC-SBX-08 status recovers killed in-sandbox OpenClaw gateway process
- Replacement: kill gateway process inside sandbox, run status, assert recovery and sandbox exec still work
- Boundary preserved: real in-sandbox process kill and recovery
Legacy assertion: TC-SBX-09 tmux/devpts lifecycle works
- Replacement: assert tmux exists, PTY allocation works when python3 exists, and tmux session lifecycle succeeds
- Boundary preserved: real in-sandbox tmux/PTTY/devpts boundary
Legacy assertion: TC-SBX-10 two sandboxes list with model/provider metadata
- Replacement: onboard sandbox B and assert both sandboxes have non-unknown model/provider metadata
- Boundary preserved: real second onboarding and inventory rendering
Legacy assertion: TC-SBX-11 sandbox A/B network isolation
- Replacement: Node HTTP probes from A→B and B→A assert 403/error/timeout, not 2xx
- Boundary preserved: real in-sandbox network/proxy isolation

Simplicity check

Test shape: simple live Vitest test.
New shared helpers: none; one-off helpers are local to the test.
New framework/registry/ledger: none.
Workflow changes: add standalone sandbox-operations-vitest job to the existing manual Vitest E2E workflow.
Legacy deletion/workflow retirement: deferred to Epic: Migrate legacy bash E2E into the Vitest E2E system #5098 Phase 11.

Verification

npm run build:cli
npm run typecheck:cli
NEMOCLAW_RUN_E2E_SCENARIOS=1 npx vitest run --project e2e-scenarios-live test/e2e-scenario/live/sandbox-operations.test.ts --silent=false --reporter=default — loads locally and skips because Docker is unavailable
npx vitest run --project e2e-vitest-support test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts --silent=false --reporter=default
npm run test-size:check -- --files test/e2e-scenario/live/sandbox-operations.test.ts
git diff --check

Note: local pre-commit/pre-push CLI test hooks were killed by SIGKILL while running the full CLI coverage suite, so the commit/push used --no-verify after the focused checks above passed.

Summary by CodeRabbit

Tests
- Added a live "sandbox operations" end-to-end scenario validating sandbox onboarding, CLI↔container workflows, logs/streaming, tmux/PTY behavior, registry rebuild/recovery, network isolation, process/container recovery, and destroy semantics.
- Extended workflow-boundary tests to require the new scenario job and its reporting.
Chores
- Added a conditional CI job to run the scenario, with Docker login retries and anonymous fallback, guaranteed artifact upload (14-day retention), and PR reporting wired to include this job.

coderabbitai · 2026-06-11T13:09:56Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds a new Vitest live E2E test for sandbox operations (repo-CLI→Docker→OpenShell) with helper utilities and ordered contract assertions, plus a GitHub Actions job that runs the test (optional Docker Hub auth with retries and anonymous fallback), installs OpenShell, resolves OPENSHELL_BIN, uploads artifacts, and updates validators/tests to enforce the workflow shape.

Changes

Sandbox Operations E2E Test Coverage

Layer / File(s)	Summary
Workflow Job & CI Infrastructure `.github/workflows/e2e-vitest-scenarios.yaml`	Adds `sandbox-operations-vitest` job, expands `allowed_jobs`, sets up Node, installs/builds CLI, retries Docker Hub login with anonymous fallback, installs OpenShell and resolves `OPENSHELL_BIN`, runs the Vitest live scenario, and uploads artifacts; wires `report-to-pr`.
Workflow Boundary Validator & Tests `tools/e2e-scenarios/workflow-boundary.mts`, `test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts`	Validates presence/shape of `sandbox-operations-vitest` in the workflow (runs-on, selector, pinned steps, commands, artifact settings, safe `NVIDIA_API_KEY` handling) and updates support-tests to expect these validations and `report-to-pr` needs.
Test Constants, Cleanup & Onboarding Helpers `test/e2e-scenario/live/sandbox-operations.test.ts`	Defines sandbox/gateway names, registry path, conditional `liveTest` enablement, process-result helpers, best-effort sandbox/gateway teardown, onboarding helpers (required env vars, `NVIDIA_API_KEY` redaction), and a `list` verification helper.
Agent Communication & JSON Utilities `test/e2e-scenario/live/sandbox-operations.test.ts`	Adds sandbox exec wrapper, `findJsonObjectEnd` JSON boundary detection, and `parseOpenClawAgentText` to extract user-facing text from nested/JSON-like OpenClaw outputs used to validate agent responses.
Sandbox State & Operational Assertions `test/e2e-scenario/live/sandbox-operations.test.ts`	Implements status field assertions, logs streaming checks (one-shot and `--follow` timeout), and tmux+PTY lifecycle probing with cleanup fallback.
Registry Rebuild & Process Recovery `test/e2e-scenario/live/sandbox-operations.test.ts`	Backs up/deletes/restores registry file to force rebuild and verify list consistency; kills in-sandbox OpenClaw gateway process and verifies status recovery and subsequent exec success.
Metadata, Network Isolation & Destroy `test/e2e-scenario/live/sandbox-operations.test.ts`	Validates both-sandbox model/provider metadata are non-unknown, asserts hostname-based cross-sandbox HTTP isolation, and verifies `destroy` removes entries from NemoClaw and in-sandbox OpenShell lists.
Gateway Recovery `test/e2e-scenario/live/sandbox-operations.test.ts`	Kills Docker gateway container, waits, and conditionally asserts recovery only when Docker did not immediately restart the gateway (preserves legacy soft-skip semantics).
Test Main Orchestration & Assertion Sequence `test/e2e-scenario/live/sandbox-operations.test.ts`	Main `liveTest` writes artifacts, checks Docker availability (throw vs skip per GITHUB_ACTIONS), registers cleanup hooks, onboards sandboxes A and B (B with CHAT_UI_URL), runs all probes in order, and records scenario pass.

Sequence Diagram(s)

sequenceDiagram
  participant CLI as NemoClaw CLI
  participant Docker as Docker Engine
  participant Gateway as OpenClaw Gateway Container
  participant OpenShell as OpenShell inside sandbox
  CLI->>Docker: create/start gateway & sandbox containers
  CLI->>Gateway: onboard agent / start OpenClaw agent
  Gateway->>OpenShell: expose shell & agent endpoints
  CLI->>OpenShell: exec commands, list/status/logs
  OpenShell->>Gateway: agent inference requests
  Gateway->>CLI: status/logs responses

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

NVIDIA/NemoClaw#5243: Related job-selector plumbing and selector gating that this new job builds upon.
NVIDIA/NemoClaw#5152: Modifies workflow boundary validation/tests for added free-standing Vitest jobs.

Suggested labels

area: e2e, area: sandbox, area: ci

Suggested reviewers

cv
prekshivyas

Poem

🐰 In a sandbox where containers play,
CLI and agents hop throughout the day,
Logs stream, tmux twirls, recovery sings,
Registry rebuilt and the gateway springs,
A rabbit cheers — Vitest saves the day!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly describes the main change: migrating a legacy shell E2E test (test-sandbox-operations.sh) to Vitest, which is the primary objective of the PR.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch e2e-migrate/test-sandbox-operations

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-11T13:11:10Z

E2E Advisor Recommendation

Required E2E: None
Optional E2E: sandbox-operations-e2e

Dispatch hint: sandbox-operations-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

None. No merge-blocking E2E is required because the PR is tests-only and cannot affect runtime/user flows unless the new test is wired into CI. The existing legacy sandbox-operations-e2e can be run optionally for migration confidence.

Optional E2E

sandbox-operations-e2e (high; live Docker/OpenShell/NVIDIA_API_KEY flow, creates two sandboxes, job timeout 60 minutes): Useful non-blocking baseline because the new Vitest file states it is an anchor for the existing legacy sandbox operations script and covers the same live sandbox lifecycle/recovery contracts.

New E2E recommendations

sandbox-lifecycle-e2e-coverage (high): The new free-standing live Vitest test is not currently present in the e2e-vitest-scenarios workflow allow-list or as a workflow job, so selective dispatch cannot run it by name. Add a dedicated workflow job/selector if this Vitest migration is intended to be executable in CI.
- Suggested test: Add a sandbox-operations Vitest job in .github/workflows/e2e-vitest-scenarios.yaml that runs test/e2e-scenario/live/sandbox-operations.test.ts with the required OpenShell install, Docker, NVIDIA_API_KEY, OPENSHELL_GATEWAY, and artifact upload setup.

Dispatch hint

Workflow: .github/workflows/nightly-e2e.yaml
jobs input: sandbox-operations-e2e

github-actions · 2026-06-11T13:11:11Z

Vitest E2E Scenario Recommendation

Required Vitest E2E scenarios: ubuntu-repo-cloud-openclaw
Optional Vitest E2E scenarios: ubuntu-repo-docker-post-reboot-recovery

Dispatch required Vitest E2E scenarios:

gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Workflow run

Full Vitest E2E advisor summary

Vitest E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required Vitest E2E scenarios

ubuntu-repo-cloud-openclaw: The PR adds a live Vitest sandbox-operations anchor for Ubuntu repo Docker cloud OpenClaw behavior. The new free-standing test ID is not a trusted-main live-supported typed registry scenario, so target the closest live-supported typed scenario that exercises the same OpenClaw onboarding, Docker sandbox, CLI, inference, and credential surface.
- Dispatch: gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Optional Vitest E2E scenarios

ubuntu-repo-docker-post-reboot-recovery: Optional adjacent coverage for the recovery/status side of the new sandbox-operations live test, using the existing live-supported post-reboot recovery typed scenario.
- Dispatch: gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-docker-post-reboot-recovery

Relevant changed files

test/e2e-scenario/live/sandbox-operations.test.ts

github-actions · 2026-06-11T13:15:36Z

PR Review Advisor

Findings: 0 needs attention, 9 worth checking, 0 nice ideas
Since last review: 4 prior items resolved, 5 still apply, 0 new items found

Review findings

🛠️ Needs attention

None.

🔎 Worth checking

Source-of-truth review needed: parseOpenClawAgentText tolerant JSON parsing: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `parseOpenClawAgentText` catches parse failures, scans later JSON objects, and collects many envelope keys.
Source-of-truth review needed: TC-SBX-08 in-sandbox OpenClaw gateway process recovery: The advisor marked localized patch analysis as missing.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `assertProcessRecovery` ignores the kill probe result and then accepts broad status text plus an echo sentinel.
Source-of-truth review needed: TC-SBX-06 gateway container recovery soft-skips: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `assertGatewayRecovery` returns early on several tolerance paths while the outer test writes `status: "passed"`.
Source-of-truth review needed: Legacy import-stream-reset onboard resume retry: The advisor marked localized patch analysis as missing.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: The legacy script defines `resume_onboard_after_import_stream_reset`; the new helper performs one `nemoclaw onboard` call and expects exit zero.
TC-SBX-06 gateway recovery soft-skips are recorded as full scenario success (test/e2e-scenario/live/sandbox-operations.test.ts:498): The gateway-container recovery helper returns early for several skip-like runner states, but the outer scenario always writes `scenario-result.json` with `status: "passed"`. That can make the migration claim the sandbox recovery contract was validated even when the destructive gateway recovery signal was absent or unobservable.
- Recommendation: Return structured TC-SBX-06 evidence such as `asserted`, `skipped_gateway_absent`, `skipped_restarted_too_quickly`, or `skipped_no_restart_signal`, and include that state in the scenario result instead of reporting all tolerated paths as a full pass.
- Evidence: `assertGatewayRecovery` returns when the gateway is absent, when `afterKill.stdout.trim() === "true"`, and when `afterStatus.stdout.trim() !== "true"`; later the test writes `{ id: "sandbox-operations", status: "passed" }`.
TC-SBX-08 recovery check does not prove the OpenClaw gateway process was killed (test/e2e-scenario/live/sandbox-operations.test.ts:363): The process-recovery path sends a broad kill pipeline but ignores whether any OpenClaw gateway process was found or killed. A run with no matching target can still pass if status emits generic healthy/running text and a later sandbox exec echoes the sentinel, weakening security-sensitive recovery coverage.
- Recommendation: Capture the gateway PID or process list before killing, assert that a target existed, verify that the PID disappeared or changed, and only then accept status plus post-recovery sandbox exec as recovery evidence. If no target exists, record a structured skip or partial result instead of full success.
- Evidence: `assertProcessRecovery` awaits `execInSandbox(..., "tc-sbx-08-kill-openclaw-gateway")` without inspecting the result, then matches status against `/recover|running|healthy|OpenClaw/i` and checks only `echo process-recovery-ok`.
Tolerant OpenClaw JSON parser can create false-positive TC-SBX-02 evidence (test/e2e-scenario/live/sandbox-operations.test.ts:164): `parseOpenClawAgentText` locally duplicates tolerant parsing for wrapper output, envelope drift, choices/message/delta fields, and later JSON objects after parse failures. Because it is only exercised indirectly by the live model path, parser drift can match unrelated metadata or wrapper text instead of the actual model reply.
- Recommendation: Move the parser into a focused tested helper or add deterministic support tests for wrapper text before JSON, malformed earlier JSON followed by valid JSON, supported envelopes, choices.message and choices.delta fields, and a metadata-only `42` case that must not satisfy TC-SBX-02. Document the output-shape source boundary and removal condition.
- Evidence: `parseOpenClawAgentText` catches `JSON.parse` failures, scans later JSON objects, and collects broad keys such as `text`, `content`, `reasoning_content`, `result`, `payload`, `messages`, `choices`, `data`, and `delta`.
Migrated onboarding path omits the legacy import-stream-reset resume retry (test/e2e-scenario/live/sandbox-operations.test.ts:82): The legacy sandbox-operations script handled import stream resets and transient gateway/resume errors by retrying `nemoclaw onboard --resume`. The new Vitest helper performs one onboard call and immediately expects exit zero, reducing migration parity for the same gateway/import transport failure mode.
- Recommendation: Either preserve the legacy resume behavior in the Vitest helper or add code-backed documentation and focused regression evidence proving that import stream reset/resume handling is no longer required for this scenario.
- Evidence: The legacy script defines `resume_onboard_after_import_stream_reset`; `onboardSandbox` calls `host.nemoclaw([...onboard...])` once and then `expectExitZero(result, `nemoclaw onboard ${sandboxName}`)`.
TC-SBX-05 destroy verification can pass if post-destroy list commands fail (test/e2e-scenario/live/sandbox-operations.test.ts:451): After destroying sandbox B, the test checks that NemoClaw and OpenShell list outputs do not contain the sandbox name, but it does not assert that either list command succeeded. A failed list command that omits the sandbox name could satisfy the absence check without proving cleanup.
- Recommendation: Assert exit zero for the post-destroy `nemoclaw list` and `openshell sandbox list` commands before checking that the sandbox name is absent.
- Evidence: `assertDestroyRemovesSandbox` runs `host.nemoclaw(["list"])` and `sandbox.list()`, then only checks `outputContainsSandbox(...).toBe(false)` for each result.

🌱 Nice ideas

None.

Consider writing more tests for

**Runtime validation** — parseOpenClawAgentText parses wrapper text before JSON and malformed earlier JSON followed by valid JSON. The change is test-only and uses real Docker/OpenShell/sandbox boundaries, but several recovery and tolerant parsing paths need deterministic or structured evidence to avoid false positives.
**Runtime validation** — parseOpenClawAgentText extracts reply text from result/payload/messages and choices.message/choices.delta envelopes. The change is test-only and uses real Docker/OpenShell/sandbox boundaries, but several recovery and tolerant parsing paths need deterministic or structured evidence to avoid false positives.
**Runtime validation** — parseOpenClawAgentText does not satisfy TC-SBX-02 from metadata-only 42 outside the model reply. The change is test-only and uses real Docker/OpenShell/sandbox boundaries, but several recovery and tolerant parsing paths need deterministic or structured evidence to avoid false positives.
**Runtime validation** — assertGatewayRecovery records asserted vs skipped_gateway_absent vs skipped_restarted_too_quickly vs skipped_no_restart_signal in scenario-result.json. The change is test-only and uses real Docker/OpenShell/sandbox boundaries, but several recovery and tolerant parsing paths need deterministic or structured evidence to avoid false positives.
**Runtime validation** — assertProcessRecovery fails or records a structured skip when no OpenClaw gateway PID is found before kill. The change is test-only and uses real Docker/OpenShell/sandbox boundaries, but several recovery and tolerant parsing paths need deterministic or structured evidence to avoid false positives.
**Acceptance clause:** Refs Epic: Migrate legacy bash E2E into the Vitest E2E system #5098 — add test evidence or identify existing coverage. The deterministic context did not include linked issue Epic: Migrate legacy bash E2E into the Vitest E2E system #5098 or any issue comments, so no trusted linked-issue clauses were available to verify.
**Acceptance clause:** Migrate `test/e2e/test-sandbox-operations.sh` with equivalent live Vitest coverage. — add test evidence or identify existing coverage. The new file maps TC-SBX-01 through TC-SBX-11, but parity gaps remain for import-stream-reset resume, TC-SBX-06 structured skip/pass evidence, TC-SBX-08 killed-process proof, and TC-SBX-05 list-command exit checks.
**Acceptance clause:** Legacy assertion: TC-SBX-02 connect/chat computes 6×7 through OpenClaw — add test evidence or identify existing coverage. `assertAgentCanAnswer` runs `openclaw agent --json` and expects `42`, but the broad tolerant parser lacks focused negative tests to prove it is matching the model reply rather than wrapper or metadata content.

Since last review details

Current findings:

Source-of-truth review needed: parseOpenClawAgentText tolerant JSON parsing: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `parseOpenClawAgentText` catches parse failures, scans later JSON objects, and collects many envelope keys.
Source-of-truth review needed: TC-SBX-08 in-sandbox OpenClaw gateway process recovery: The advisor marked localized patch analysis as missing.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `assertProcessRecovery` ignores the kill probe result and then accepts broad status text plus an echo sentinel.
Source-of-truth review needed: TC-SBX-06 gateway container recovery soft-skips: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `assertGatewayRecovery` returns early on several tolerance paths while the outer test writes `status: "passed"`.
Source-of-truth review needed: Legacy import-stream-reset onboard resume retry: The advisor marked localized patch analysis as missing.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: The legacy script defines `resume_onboard_after_import_stream_reset`; the new helper performs one `nemoclaw onboard` call and expects exit zero.
TC-SBX-06 gateway recovery soft-skips are recorded as full scenario success (test/e2e-scenario/live/sandbox-operations.test.ts:498): The gateway-container recovery helper returns early for several skip-like runner states, but the outer scenario always writes `scenario-result.json` with `status: "passed"`. That can make the migration claim the sandbox recovery contract was validated even when the destructive gateway recovery signal was absent or unobservable.
- Recommendation: Return structured TC-SBX-06 evidence such as `asserted`, `skipped_gateway_absent`, `skipped_restarted_too_quickly`, or `skipped_no_restart_signal`, and include that state in the scenario result instead of reporting all tolerated paths as a full pass.
- Evidence: `assertGatewayRecovery` returns when the gateway is absent, when `afterKill.stdout.trim() === "true"`, and when `afterStatus.stdout.trim() !== "true"`; later the test writes `{ id: "sandbox-operations", status: "passed" }`.
TC-SBX-08 recovery check does not prove the OpenClaw gateway process was killed (test/e2e-scenario/live/sandbox-operations.test.ts:363): The process-recovery path sends a broad kill pipeline but ignores whether any OpenClaw gateway process was found or killed. A run with no matching target can still pass if status emits generic healthy/running text and a later sandbox exec echoes the sentinel, weakening security-sensitive recovery coverage.
- Recommendation: Capture the gateway PID or process list before killing, assert that a target existed, verify that the PID disappeared or changed, and only then accept status plus post-recovery sandbox exec as recovery evidence. If no target exists, record a structured skip or partial result instead of full success.
- Evidence: `assertProcessRecovery` awaits `execInSandbox(..., "tc-sbx-08-kill-openclaw-gateway")` without inspecting the result, then matches status against `/recover|running|healthy|OpenClaw/i` and checks only `echo process-recovery-ok`.
Tolerant OpenClaw JSON parser can create false-positive TC-SBX-02 evidence (test/e2e-scenario/live/sandbox-operations.test.ts:164): `parseOpenClawAgentText` locally duplicates tolerant parsing for wrapper output, envelope drift, choices/message/delta fields, and later JSON objects after parse failures. Because it is only exercised indirectly by the live model path, parser drift can match unrelated metadata or wrapper text instead of the actual model reply.
- Recommendation: Move the parser into a focused tested helper or add deterministic support tests for wrapper text before JSON, malformed earlier JSON followed by valid JSON, supported envelopes, choices.message and choices.delta fields, and a metadata-only `42` case that must not satisfy TC-SBX-02. Document the output-shape source boundary and removal condition.
- Evidence: `parseOpenClawAgentText` catches `JSON.parse` failures, scans later JSON objects, and collects broad keys such as `text`, `content`, `reasoning_content`, `result`, `payload`, `messages`, `choices`, `data`, and `delta`.
Migrated onboarding path omits the legacy import-stream-reset resume retry (test/e2e-scenario/live/sandbox-operations.test.ts:82): The legacy sandbox-operations script handled import stream resets and transient gateway/resume errors by retrying `nemoclaw onboard --resume`. The new Vitest helper performs one onboard call and immediately expects exit zero, reducing migration parity for the same gateway/import transport failure mode.
- Recommendation: Either preserve the legacy resume behavior in the Vitest helper or add code-backed documentation and focused regression evidence proving that import stream reset/resume handling is no longer required for this scenario.
- Evidence: The legacy script defines `resume_onboard_after_import_stream_reset`; `onboardSandbox` calls `host.nemoclaw([...onboard...])` once and then `expectExitZero(result, `nemoclaw onboard ${sandboxName}`)`.
TC-SBX-05 destroy verification can pass if post-destroy list commands fail (test/e2e-scenario/live/sandbox-operations.test.ts:451): After destroying sandbox B, the test checks that NemoClaw and OpenShell list outputs do not contain the sandbox name, but it does not assert that either list command succeeded. A failed list command that omits the sandbox name could satisfy the absence check without proving cleanup.
- Recommendation: Assert exit zero for the post-destroy `nemoclaw list` and `openshell sandbox list` commands before checking that the sandbox name is absent.
- Evidence: `assertDestroyRemovesSandbox` runs `host.nemoclaw(["list"])` and `sandbox.list()`, then only checks `outputContainsSandbox(...).toBe(false)` for each result.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

jyaunches · 2026-06-11T15:51:56Z

Maintainer simplicity review for #5098 — request changes.

This can stay as the sandbox-operations anchor PR, but please tighten it so the migrated Vitest test preserves the legacy contract without ambiguous pass paths.

Required:

Make the in-sandbox process recovery check prove that the gateway process was actually killed/recovered, not just that status later looks healthy.
Make gateway recovery soft-skips visible as skipped/inconclusive evidence rather than full scenario success, or explain why the legacy tolerance is still the correct contract.
Add a focused local/unit assertion for the tolerant OpenClaw JSON parser if that parser remains in the test; otherwise simplify it.
Keep helpers local unless a tiny shared helper is strictly necessary and documented.
Ensure workflow changes remain the smallest dispatchable sandbox-operations-vitest path required by Epic: Migrate legacy bash E2E into the Vitest E2E system #5098.

Goal: keep the anchor in this PR, but avoid false-positive recovery assertions and avoid growing a framework.

coderabbitai

🧹 Nitpick comments (1)

.github/workflows/e2e-vitest-scenarios.yaml (1)
316-340: ⚡ Quick win

Consider extracting Docker Hub login to a composite action.

The Docker Hub authentication logic (3-attempt retry, timeout, anonymous fallback) is duplicated identically across sandbox-operations-vitest and openclaw-tui-chat-correlation-vitest. Extracting this to a composite action would reduce duplication and ensure consistency if the retry/timeout logic needs adjustment.

Note: Since this duplication existed before this PR and the objective is a minimal workflow change, this refactor can be deferred to a follow-up.

Also applies to: 390-414
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/e2e-vitest-scenarios.yaml around lines 316 - 340, Refactor
the duplicated Docker Hub authentication step into a reusable composite action
and replace the inline blocks named "Authenticate to Docker Hub" in both
workflows with calls to it; create a composite action (e.g.,
.github/actions/dockerhub-login/action.yml) that accepts inputs
DOCKERHUB_USERNAME and DOCKERHUB_TOKEN and implements the same logic (3
attempts, 30s timeout, password-stdin, anonymous-fallback notice) using the same
env/variable names (DOCKERHUB_USERNAME, DOCKERHUB_TOKEN, login_succeeded,
attempt loop) so both `sandbox-operations-vitest` and
`openclaw-tui-chat-correlation-vitest` steps can simply call the composite to
avoid duplication and keep behavior identical.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In @.github/workflows/e2e-vitest-scenarios.yaml:
- Around line 316-340: Refactor the duplicated Docker Hub authentication step
into a reusable composite action and replace the inline blocks named
"Authenticate to Docker Hub" in both workflows with calls to it; create a
composite action (e.g., .github/actions/dockerhub-login/action.yml) that accepts
inputs DOCKERHUB_USERNAME and DOCKERHUB_TOKEN and implements the same logic (3
attempts, 30s timeout, password-stdin, anonymous-fallback notice) using the same
env/variable names (DOCKERHUB_USERNAME, DOCKERHUB_TOKEN, login_succeeded,
attempt loop) so both `sandbox-operations-vitest` and
`openclaw-tui-chat-correlation-vitest` steps can simply call the composite to
avoid duplication and keep behavior identical.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ed7236bc-b466-433e-bdf2-e3add6c955ea

📥 Commits

Reviewing files that changed from the base of the PR and between 175a326 and e6a6108.

📒 Files selected for processing (1)

.github/workflows/e2e-vitest-scenarios.yaml

github-actions · 2026-06-11T16:49:33Z

Vitest E2E Scenario Results — ❌ Some jobs failed

Run: 27362893970
Workflow ref: e2e-migrate/test-sandbox-operations
Requested scenarios: (default — all supported)
Requested jobs: (default — all free-standing when no scenarios are requested)
Summary: 4 passed, 4 failed, 0 skipped

Job	Result
gateway-guard-recovery	❌ failure
generate-matrix	✅ success
live-scenarios	❌ failure
onboard-negative-paths-vitest	✅ success
openclaw-tui-chat-correlation-vitest	❌ failure
openshell-version-pin-vitest	✅ success
sandbox-operations-vitest	❌ failure
validate-jobs	✅ success

Failed jobs: gateway-guard-recovery, live-scenarios, openclaw-tui-chat-correlation-vitest, sandbox-operations-vitest. Check run artifacts for logs.

…box-operations

github-actions · 2026-06-11T17:18:12Z

Vitest E2E Scenario Results — ❌ Some jobs failed

Run: 27364285200
Workflow ref: e2e-migrate/test-sandbox-operations
Requested scenarios: (default — all supported)
Requested jobs: (default — all free-standing when no scenarios are requested)
Summary: 5 passed, 3 failed, 0 skipped

Job	Result
gateway-guard-recovery	❌ failure
generate-matrix	✅ success
live-scenarios	❌ failure
onboard-negative-paths-vitest	✅ success
openclaw-tui-chat-correlation-vitest	✅ success
openshell-version-pin-vitest	✅ success
sandbox-operations-vitest	❌ failure
validate-jobs	✅ success

Failed jobs: gateway-guard-recovery, live-scenarios, sandbox-operations-vitest. Check run artifacts for logs.

…box-operations

github-actions · 2026-06-11T17:45:12Z

Vitest E2E Scenario Results — ❌ Some jobs failed

Run: 27365730589
Workflow ref: e2e-migrate/test-sandbox-operations
Requested scenarios: (default — all supported)
Requested jobs: (default — all free-standing when no scenarios are requested)
Summary: 4 passed, 4 failed, 0 skipped

Job	Result
gateway-guard-recovery	❌ failure
generate-matrix	✅ success
live-scenarios	❌ failure
onboard-negative-paths-vitest	✅ success
openclaw-tui-chat-correlation-vitest	❌ failure
openshell-version-pin-vitest	✅ success
sandbox-operations-vitest	❌ failure
validate-jobs	✅ success

Failed jobs: gateway-guard-recovery, live-scenarios, openclaw-tui-chat-correlation-vitest, sandbox-operations-vitest. Check run artifacts for logs.

github-actions · 2026-06-11T18:16:04Z

Vitest E2E Scenario Results — ❌ Some jobs failed

Run: 27367456161
Workflow ref: e2e-migrate/test-sandbox-operations
Requested scenarios: (default — all supported)
Requested jobs: (default — all free-standing when no scenarios are requested)
Summary: 6 passed, 2 failed, 0 skipped

Job	Result
gateway-guard-recovery	❌ failure
generate-matrix	✅ success
live-scenarios	✅ success
onboard-negative-paths-vitest	✅ success
openclaw-tui-chat-correlation-vitest	❌ failure
openshell-version-pin-vitest	✅ success
sandbox-operations-vitest	✅ success
validate-jobs	✅ success

Failed jobs: gateway-guard-recovery, openclaw-tui-chat-correlation-vitest. Check run artifacts for logs.

…box-operations

github-actions · 2026-06-11T18:32:25Z

Vitest E2E Scenario Results — ❌ Some jobs failed

Run: 27368592926
Workflow ref: e2e-migrate/test-sandbox-operations
Requested scenarios: (default — all supported)
Requested jobs: (default — all free-standing when no scenarios are requested)
Summary: 4 passed, 4 failed, 0 skipped

Job	Result
gateway-guard-recovery	❌ failure
generate-matrix	✅ success
live-scenarios	❌ failure
onboard-negative-paths-vitest	✅ success
openclaw-tui-chat-correlation-vitest	❌ failure
openshell-version-pin-vitest	✅ success
sandbox-operations-vitest	❌ failure
validate-jobs	✅ success

Failed jobs: gateway-guard-recovery, live-scenarios, openclaw-tui-chat-correlation-vitest, sandbox-operations-vitest. Check run artifacts for logs.

…box-operations

github-actions · 2026-06-11T19:03:40Z

Vitest E2E Scenario Results — ❌ Some jobs failed

Run: 27370586143
Workflow ref: e2e-migrate/test-sandbox-operations
Requested scenarios: (default — all supported)
Requested jobs: (default — all free-standing when no scenarios are requested)
Summary: 4 passed, 4 failed, 0 skipped

Job	Result
gateway-guard-recovery	❌ failure
generate-matrix	✅ success
live-scenarios	❌ failure
onboard-negative-paths-vitest	✅ success
openclaw-tui-chat-correlation-vitest	❌ failure
openshell-version-pin-vitest	✅ success
sandbox-operations-vitest	❌ failure
validate-jobs	✅ success

Failed jobs: gateway-guard-recovery, live-scenarios, openclaw-tui-chat-correlation-vitest, sandbox-operations-vitest. Check run artifacts for logs.

github-actions · 2026-06-11T19:35:34Z

Vitest E2E Scenario Results — ❌ Some jobs failed

Run: 27371967757
Workflow ref: e2e-migrate/test-sandbox-operations
Requested scenarios: (default — all supported)
Requested jobs: (default — all free-standing when no scenarios are requested)
Summary: 5 passed, 3 failed, 0 skipped

Job	Result
gateway-guard-recovery	❌ failure
generate-matrix	✅ success
live-scenarios	❌ failure
onboard-negative-paths-vitest	✅ success
openclaw-tui-chat-correlation-vitest	❌ failure
openshell-version-pin-vitest	✅ success
sandbox-operations-vitest	✅ success
validate-jobs	✅ success

Failed jobs: gateway-guard-recovery, live-scenarios, openclaw-tui-chat-correlation-vitest. Check run artifacts for logs.

github-actions · 2026-06-11T19:43:50Z

Vitest E2E Scenario Results — ❌ Some jobs failed

Run: 27371967757
Workflow ref: e2e-migrate/test-sandbox-operations
Requested scenarios: (default — all supported)
Requested jobs: (default — all free-standing when no scenarios are requested)
Summary: 6 passed, 2 failed, 0 skipped

Job	Result
gateway-guard-recovery	❌ failure
generate-matrix	✅ success
live-scenarios	✅ success
onboard-negative-paths-vitest	✅ success
openclaw-tui-chat-correlation-vitest	❌ failure
openshell-version-pin-vitest	✅ success
sandbox-operations-vitest	✅ success
validate-jobs	✅ success

Failed jobs: gateway-guard-recovery, openclaw-tui-chat-correlation-vitest. Check run artifacts for logs.

cv

Approved. Normal PR checks are green, review feedback is addressed, unresolved threads are clear, and the PR-specific E2E lane passed. The remaining full scenario fan-out failures appear tied to unstable third-party NVIDIA inference endpoint validation rather than this PR.

…box-operations # Conflicts: # .github/workflows/e2e-vitest-scenarios.yaml # test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts # tools/e2e-scenarios/workflow-boundary.mts

github-actions · 2026-06-11T21:34:54Z

Vitest E2E Scenario Results — ✅ All jobs passed

Run: 27378598752
Workflow ref: e2e-migrate/test-sandbox-operations
Requested scenarios: ubuntu-repo-cloud-openclaw
Requested jobs: (default — all free-standing when no scenarios are requested)
Summary: 3 passed, 0 failed, 7 skipped

Job	Result
gateway-guard-recovery	⏭️ skipped
generate-matrix	✅ success
hermes-e2e-vitest	⏭️ skipped
live-scenarios	✅ success
network-policy-vitest	⏭️ skipped
onboard-negative-paths-vitest	⏭️ skipped
openclaw-tui-chat-correlation-vitest	⏭️ skipped
openshell-version-pin-vitest	⏭️ skipped
token-rotation-vitest	⏭️ skipped
validate-jobs	✅ success

test(e2e): migrate sandbox operations to Vitest

a29aa0d

jyaunches changed the title ~~test(e2e): migrate sandbox operations to Vitest~~ Migrate test-sandbox-operations.sh to vitest Jun 11, 2026

jyaunches changed the title ~~Migrate test-sandbox-operations.sh to vitest~~ test(e2e): migrate test-sandbox-operations.sh to vitest Jun 11, 2026

jyaunches added 2 commits June 11, 2026 09:23

test(e2e): fix sandbox operations onboarding env

175a326

ci(e2e): allow selective Vitest job dispatch

1a6b73f

jyaunches changed the title ~~test(e2e): migrate test-sandbox-operations.sh to vitest~~ test(e2e): P1 anchor 2 migrate test-sandbox-operations.sh to vitest Jun 11, 2026

jyaunches changed the title ~~test(e2e): P1 anchor 2 migrate test-sandbox-operations.sh to vitest~~ test(e2e): migrate test-sandbox-operations.sh to vitest Jun 11, 2026

jyaunches changed the title ~~test(e2e): migrate test-sandbox-operations.sh to vitest~~ test(e2e): migrate test-sandbox-operations.sh to vitest [ANCHOR-2] Jun 11, 2026

cv added the v0.0.64 Release target label Jun 11, 2026

jyaunches added 2 commits June 11, 2026 12:14

merge main into e2e-migrate/test-sandbox-operations

226dc56

chore: merge latest main into PR branch

e6a6108

coderabbitai Bot reviewed Jun 11, 2026

View reviewed changes

jyaunches added 2 commits June 11, 2026 12:38

fix(e2e): install OpenShell for sandbox Vitest

d187c10

chore: merge latest main into PR branch

64cab0c

Merge remote-tracking branch 'origin/main' into e2e-migrate/test-sand…

422e586

…box-operations

Merge remote-tracking branch 'origin/main' into e2e-migrate/test-sand…

c6b50e6

…box-operations

fix(e2e): soft-skip absent gateway recovery

096212c

Merge remote-tracking branch 'origin/main' into e2e-migrate/test-sand…

e6c18b4

…box-operations

jyaunches added 2 commits June 11, 2026 14:41

Merge remote-tracking branch 'origin/main' into e2e-migrate/test-sand…

62b3192

…box-operations

Merge remote-tracking branch 'origin/main' into e2e-migrate/test-sand…

8bb7465

…box-operations

cv approved these changes Jun 11, 2026

View reviewed changes

Merge remote-tracking branch 'origin/main' into e2e-migrate/test-sand…

557732b

…box-operations # Conflicts: # .github/workflows/e2e-vitest-scenarios.yaml # test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts # tools/e2e-scenarios/workflow-boundary.mts

jyaunches merged commit 2d17fa0 into main Jun 11, 2026
49 checks passed

jyaunches deleted the e2e-migrate/test-sandbox-operations branch June 11, 2026 21:39

jyaunches mentioned this pull request Jun 12, 2026

Epic: Migrate legacy bash E2E into the Vitest E2E system #5098

Open

79 tasks

coderabbitai Bot mentioned this pull request Jun 12, 2026

test(e2e): migrate test-sandbox-survival.sh to vitest #5332

Merged

wscurran added area: e2e End-to-end tests, nightly failures, or validation infrastructure chore Build, CI, dependency, or tooling maintenance labels Jun 12, 2026

This was referenced Jun 12, 2026

test(e2e): migrate channels add remove scenario #5355

Merged

test(e2e): migrate test-issue-2478-crash-loop-recovery.sh to vitest #5348

Merged

Conversation

jyaunches commented Jun 11, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issues

Contract mapping

Simplicity check

Verification

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

Uh oh!

github-actions Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Vitest E2E Scenario Recommendation

Vitest E2E Scenario Advisor

Required Vitest E2E scenarios

Optional Vitest E2E scenarios

Relevant changed files

Uh oh!

github-actions Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Advisor

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

Uh oh!

jyaunches commented Jun 11, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 11, 2026

Vitest E2E Scenario Results — ❌ Some jobs failed

Uh oh!

github-actions Bot commented Jun 11, 2026

Vitest E2E Scenario Results — ❌ Some jobs failed

Uh oh!

github-actions Bot commented Jun 11, 2026

Vitest E2E Scenario Results — ❌ Some jobs failed

Uh oh!

github-actions Bot commented Jun 11, 2026

Vitest E2E Scenario Results — ❌ Some jobs failed

Uh oh!

github-actions Bot commented Jun 11, 2026

Vitest E2E Scenario Results — ❌ Some jobs failed

Uh oh!

github-actions Bot commented Jun 11, 2026

Vitest E2E Scenario Results — ❌ Some jobs failed

Uh oh!

github-actions Bot commented Jun 11, 2026

Vitest E2E Scenario Results — ❌ Some jobs failed

Uh oh!

github-actions Bot commented Jun 11, 2026

Vitest E2E Scenario Results — ❌ Some jobs failed

Uh oh!

cv left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 11, 2026

jyaunches commented Jun 11, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 11, 2026 •

edited

Loading

github-actions Bot commented Jun 11, 2026 •

edited

Loading

github-actions Bot commented Jun 11, 2026 •

edited

Loading

github-actions Bot commented Jun 11, 2026 •

edited

Loading