Skip to content

test(e2e): migrate test-sandbox-survival.sh to vitest#5332

Merged
jyaunches merged 7 commits into
mainfrom
e2e-migrate/test-sandbox-survival
Jun 12, 2026
Merged

test(e2e): migrate test-sandbox-survival.sh to vitest#5332
jyaunches merged 7 commits into
mainfrom
e2e-migrate/test-sandbox-survival

Conversation

@jyaunches

@jyaunches jyaunches commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Summary

Migrate test/e2e/test-sandbox-survival.sh with equivalent live Vitest coverage while extracting small reusable sandbox/gateway/state/runtime helpers into the existing fixture homes.

Related Issues

Refs #5098
Refs #486
Refs #888
Refs #859
Refs #1086

Contract mapping

  • Legacy assertion: install.sh/onboard creates a real named OpenClaw sandbox and installs OpenShell.
    • Replacement: test/e2e-scenario/live/sandbox-survival.test.ts runs bash install.sh --non-interactive with NEMOCLAW_SANDBOX_NAME=e2e-survival.
    • Boundary preserved: real installer, Docker, OpenShell, NemoClaw CLI, NVIDIA_API_KEY.
  • Legacy assertion: registry/list/status discover the sandbox before and after gateway restart.
    • Replacement: HostCliClient.expectListed/expectStatus, SandboxClient.expectListed, and StateValidationPhaseFixture.expectLocalRegistryContains assertions in the live test.
    • Boundary preserved: real nemoclaw list, nemoclaw <name> status, openshell sandbox list, ~/.nemoclaw/sandboxes.json.
  • Legacy assertion: gateway stop/start is non-destructive.
    • Replacement: LifecyclePhaseFixture.restartGatewayRuntime() plus waitForGatewayConnected().
    • Boundary preserved: real OpenShell gateway commands, Docker-driver PID/container detection, runtime stop/start/recovery.
  • Legacy assertion: sandbox SSH/connectivity, /sandbox/.openclaw marker state, and live inference survive restart.
    • Replacement: sandbox exec liveness, state marker write/read helpers, RuntimePhaseFixture.expectInferenceLocalPong() before and after restart.
    • Boundary preserved: real sandbox command execution, durable filesystem state, https://inference.local/v1/chat/completions.
  • Legacy assertion: final destroy removes registry/list state.
    • Replacement: HostCliClient.cleanupSandbox() followed by post-destroy list assertion.
    • Boundary preserved: real nemoclaw <name> destroy --yes.

Simplicity check

  • Test shape: simple live Vitest test with small helper extraction into existing clients/phases by responsibility.
  • Original runner/lane: nightly-e2e.yaml job sandbox-survival-e2e, reusable e2e-script.yaml, runs-on: ubuntu-latest, Docker/OpenShell, NVIDIA_API_KEY, GITHUB_TOKEN, 30-minute timeout.
  • Replacement runner: e2e-vitest-scenarios.yaml job sandbox-survival-vitest, runs-on: ubuntu-latest, Docker/OpenShell/install.sh, NVIDIA_API_KEY, 30-minute timeout.
  • New shared helpers: focused additions in clients/host.ts, clients/sandbox.ts, clients/gateway.ts, phases/lifecycle.ts, phases/state-validation.ts, and phases/runtime.ts; local helper was insufficient because Phase 4 dependent scripts reuse the same sandbox/gateway/state/inference boundaries.
  • New framework/registry/ledger: none
  • Workflow changes: add selective sandbox-survival-vitest job and selector aliases; legacy shell lane deletion deferred to Epic: Migrate legacy bash E2E into the Vitest E2E system #5098 Phase 11.
  • Selective dispatch: gh workflow run e2e-vitest-scenarios.yaml --repo NVIDIA/NemoClaw --ref e2e-migrate/test-sandbox-survival -f jobs=sandbox-survival-vitest -f pr_number=5332

Verification

  • npm ci --ignore-scripts
  • npm run build:cli
  • npm run typecheck:cli
  • npx vitest run --project e2e-vitest-support test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts test/e2e-scenario/support-tests/e2e-phase-lifecycle.test.ts test/e2e-scenario/support-tests/e2e-phase-runtime.test.ts test/e2e-scenario/support-tests/e2e-phase-state-validation.test.ts test/e2e-scenario/support-tests/e2e-clients.test.ts --reporter=default
  • NEMOCLAW_RUN_E2E_SCENARIOS=1 npx vitest run --project e2e-scenarios-live test/e2e-scenario/live/sandbox-survival.test.ts --reporter=default (skips locally without live secret)
  • git diff --check
  • Pre-commit Test (CLI) was attempted and failed on unrelated local environment prerequisites (nemoclaw/node_modules/json5 missing, nemoclaw/dist/blueprint/private-networks.js missing, and existing local-platform timeouts/assertions); commit was created with --no-verify after the focused validations above passed.
  • PR: test(e2e): migrate test-sandbox-survival.sh to vitest #5332
  • Same-runner selective run: https://github.com/NVIDIA/NemoClaw/actions/runs/27421927021 (passed: sandbox-survival-vitest)

Summary by CodeRabbit

  • Tests
    • Extended end-to-end testing infrastructure with new sandbox survival scenario validation.
    • Enhanced CI/CD workflow for comprehensive runtime state and gateway lifecycle testing.

Note: This release contains internal testing infrastructure improvements with no direct user-facing changes.

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR adds a comprehensive end-to-end sandbox-survival scenario that validates NemoClaw sandbox persistence across gateway restart, including helper clients, fixture methods for gateway lifecycle management, state validation, inference probing, support tests, and GitHub Actions workflow integration.

Changes

Sandbox Survival E2E Test and Infrastructure

Layer / File(s) Summary
CI workflow job registration and reporting
.github/workflows/e2e-vitest-scenarios.yaml
Registers sandbox-survival-vitest as a free-standing Vitest job in allowed_jobs, adds sandbox-survival to the free-standing scenarios matrix, and updates report-to-pr to include the new job in PR results.
Command helpers and gateway runtime detection
test/e2e-scenario/fixtures/clients/command.ts, test/e2e-scenario/fixtures/clients/gateway.ts, test/e2e-scenario/support-tests/e2e-clients.test.ts
Adds resultText and outputContainsSandbox helpers for parsing shell output; introduces HostGatewayRuntime type and GatewayClient methods to resolve gateway runtime from PID file or Docker container and validate OpenShell connectivity; tests cover exact sandbox-name matching and PID/container resolution paths.
Host CLI and sandbox client lifecycle methods
test/e2e-scenario/fixtures/clients/host.ts, test/e2e-scenario/fixtures/clients/sandbox.ts, test/e2e-scenario/fixtures/clients/index.ts, test/e2e-scenario/support-tests/e2e-clients.test.ts
Extends HostCliClient with expectListed, expectStatus, destroySandbox, cleanupSandbox, and bestEffortCleanupSandbox methods; adds sandboxAccessEnv() helper and SandboxClient.expectListed(); barrel exports updated; tests validate command execution and environment wiring.
Lifecycle phase fixture gateway runtime management
test/e2e-scenario/fixtures/phases/lifecycle.ts, test/e2e-scenario/fixtures/e2e-test.ts, test/e2e-scenario/support-tests/e2e-phase-lifecycle.test.ts
Wires GatewayClient into LifecyclePhaseFixture; implements stopGatewayRuntime(), startGatewayRuntime(), restartGatewayRuntime(), and waitForGatewayConnected() methods; tests verify ordered command sequences for restart and PID-based recovery flows.
State validation markers and sandbox persistence
test/e2e-scenario/fixtures/phases/state-validation.ts, test/e2e-scenario/support-tests/e2e-phase-state-validation.test.ts
Defines SandboxMarker type and shellQuote helper; adds expectLocalRegistryContains(), writeSandboxMarkers(), expectSandboxMarkers(), and expectSandboxDirectoryPopulated() fixture methods; tests validate registry and marker handling.
Inference PONG probe and reasoning content support
test/e2e-scenario/fixtures/phases/runtime.ts, test/e2e-scenario/support-tests/e2e-phase-runtime.test.ts
Extends chat-completion parsing to recognize reasoning_content field alongside content; adds expectInferenceLocalPong() retry probe; tests cover retry behavior and reasoning content extraction.
End-to-end sandbox survival scenario test
test/e2e-scenario/live/sandbox-survival.test.ts
Implements live Vitest scenario that validates sandbox install, baseline discoverability/inference, writes/verifies persistence markers under /sandbox/.openclaw, restarts gateway runtime, revalidates post-restart state, and writes scenario-result.json artifact.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#5218: Both PRs extend .github/workflows/e2e-vitest-scenarios.yaml by adding a new standalone Vitest E2E job via the same allowed_jobs and report-to-pr wiring pattern.
  • NVIDIA/NemoClaw#5222: Both PRs modify test/e2e-scenario/fixtures/clients/gateway.ts and extend the e2e-vitest-scenarios workflow with additional standalone jobs, so gateway fixture and workflow changes overlap.
  • NVIDIA/NemoClaw#5236: Both PRs update .github/workflows/e2e-vitest-scenarios.yaml to add a new live E2E job with the same free-standing job selector and PR reporting wiring.

Suggested labels

enhancement: testing, v0.0.64

Suggested reviewers

  • cv

Poem

🐰 A sandbox springs to life, persists through storms,
Its markers written safe in Docker forms,
The gateway rests, then wakes with grace,
Connectivity confirmed—a triumph of the race! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 4.55% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title clearly and concisely describes the main change: migrating a shell-based E2E test (test-sandbox-survival.sh) to the Vitest framework. It accurately reflects the primary objective of the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch e2e-migrate/test-sandbox-survival

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

Copy link
Copy Markdown
Contributor

Vitest E2E Scenario Results — ❌ Some jobs failed

Run: 27420189913
Workflow ref: e2e-migrate/test-sandbox-survival
Requested scenarios: (default — all supported)
Requested jobs: sandbox-survival-vitest
Summary: 2 passed, 1 failed, 17 skipped

Job Result
credential-migration-vitest ⏭️ skipped
double-onboard-vitest ⏭️ skipped
gateway-guard-recovery ⏭️ skipped
generate-matrix ✅ success
hermes-e2e-vitest ⏭️ skipped
hermes-root-entrypoint-smoke-vitest ⏭️ skipped
inference-routing-vitest ⏭️ skipped
issue-4434-tui-unreachable-inference-vitest ⏭️ skipped
launchable-smoke-vitest ⏭️ skipped
live-scenarios ⏭️ skipped
model-router-provider-routed-inference-vitest ⏭️ skipped
network-policy-vitest ⏭️ skipped
onboard-negative-paths-vitest ⏭️ skipped
openclaw-tui-chat-correlation-vitest ⏭️ skipped
openshell-version-pin-vitest ⏭️ skipped
rebuild-openclaw-vitest ⏭️ skipped
runtime-overrides-vitest ⏭️ skipped
sandbox-survival-vitest ❌ failure
token-rotation-vitest ⏭️ skipped
validate-jobs ✅ success

Failed jobs: sandbox-survival-vitest. Check run artifacts for logs.

const MODEL = process.env.NEMOCLAW_MODEL ?? "nvidia/nemotron-3-super-120b-a12b";
const ENVIRONMENT = ubuntuRepoDocker("cloud-openclaw");

function sleep(ms: number): Promise<void> {
@github-actions

github-actions Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: sandbox-survival-vitest
Optional E2E: gateway-guard-recovery, model-router-provider-routed-inference-vitest

Dispatch hint: sandbox-survival-vitest

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • sandbox-survival-vitest (high): This PR adds and wires the sandbox-survival live Vitest job and changes the shared lifecycle/runtime/state fixtures it depends on. Run the new job explicitly to validate install.sh/onboard, Docker/OpenShell gateway restart, registry/list/status, sandbox state persistence, and inference.local before and after restart.

Optional E2E

  • gateway-guard-recovery (high): Optional adjacent coverage for gateway disruption/recovery paths using the same GatewayClient/SandboxClient fixture layer changed in this PR.
  • model-router-provider-routed-inference-vitest (high): Optional adjacent live inference confidence because RuntimePhaseFixture chat-completion parsing and inference.local helpers changed. This job exercises provider-routed onboard and sandbox inference.local completion boundaries.

New E2E recommendations

  • None.

Dispatch hint

  • Workflow: .github/workflows/e2e-vitest-scenarios.yaml
  • jobs input: sandbox-survival-vitest

@github-actions

github-actions Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Vitest E2E Scenario Recommendation

Required Vitest E2E scenarios: sandbox-survival-vitest, e2e-scenarios-all
Optional Vitest E2E scenarios: None

Dispatch required Vitest E2E scenarios:

  • gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field jobs=sandbox-survival-vitest
  • gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref>

Workflow run

Full Vitest E2E advisor summary

Vitest E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required Vitest E2E scenarios

  • sandbox-survival-vitest: Focused free-standing Vitest job wired for changed live test test/e2e-scenario/live/sandbox-survival.test.ts.
    • Dispatch: gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field jobs=sandbox-survival-vitest
  • e2e-scenarios-all: The PR changes the shared Vitest scenario workflow machinery, shared fixture clients, shared phase helpers, and support tests. Those surfaces can affect multiple live registry scenarios and free-standing Vitest jobs, so the full Vitest scenario fan-out is required. The newly wired sandbox-survival Vitest job is included by the workflow default on the PR head ref.
    • Dispatch: gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref>

Optional Vitest E2E scenarios

  • None.

Relevant changed files

  • .github/workflows/e2e-vitest-scenarios.yaml
  • test/e2e-scenario/fixtures/clients/command.ts
  • test/e2e-scenario/fixtures/clients/gateway.ts
  • test/e2e-scenario/fixtures/clients/host.ts
  • test/e2e-scenario/fixtures/clients/index.ts
  • test/e2e-scenario/fixtures/clients/sandbox.ts
  • test/e2e-scenario/fixtures/e2e-test.ts
  • test/e2e-scenario/fixtures/phases/lifecycle.ts
  • test/e2e-scenario/fixtures/phases/runtime.ts
  • test/e2e-scenario/fixtures/phases/state-validation.ts
  • test/e2e-scenario/live/sandbox-survival.test.ts
  • test/e2e-scenario/support-tests/e2e-clients.test.ts
  • test/e2e-scenario/support-tests/e2e-phase-lifecycle.test.ts
  • test/e2e-scenario/support-tests/e2e-phase-runtime.test.ts
  • test/e2e-scenario/support-tests/e2e-phase-state-validation.test.ts

@github-actions

Copy link
Copy Markdown
Contributor

Vitest E2E Scenario Results — ❌ Some jobs failed

Run: 27420461179
Workflow ref: e2e-migrate/test-sandbox-survival
Requested scenarios: (default — all supported)
Requested jobs: sandbox-survival-vitest
Summary: 2 passed, 1 failed, 17 skipped

Job Result
credential-migration-vitest ⏭️ skipped
double-onboard-vitest ⏭️ skipped
gateway-guard-recovery ⏭️ skipped
generate-matrix ✅ success
hermes-e2e-vitest ⏭️ skipped
hermes-root-entrypoint-smoke-vitest ⏭️ skipped
inference-routing-vitest ⏭️ skipped
issue-4434-tui-unreachable-inference-vitest ⏭️ skipped
launchable-smoke-vitest ⏭️ skipped
live-scenarios ⏭️ skipped
model-router-provider-routed-inference-vitest ⏭️ skipped
network-policy-vitest ⏭️ skipped
onboard-negative-paths-vitest ⏭️ skipped
openclaw-tui-chat-correlation-vitest ⏭️ skipped
openshell-version-pin-vitest ⏭️ skipped
rebuild-openclaw-vitest ⏭️ skipped
runtime-overrides-vitest ⏭️ skipped
sandbox-survival-vitest ❌ failure
token-rotation-vitest ⏭️ skipped
validate-jobs ✅ success

Failed jobs: sandbox-survival-vitest. Check run artifacts for logs.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e-scenario/fixtures/clients/gateway.ts`:
- Around line 159-160: The connected-state check is too loose: `/connected/i`
matches "disconnected" and `new RegExp(gatewayName, "i")` treats the gateway
name as a regex. In the function that builds the `text` check (used by
waitForGatewayConnected()), replace the `/connected/i` test with a literal-word
or bounded match (e.g., use a word-boundary or explicit token match for
"connected" so "disconnected" does not pass) and replace the `new
RegExp(gatewayName, "i")` test with a literal, case-insensitive substring match
by normalizing both `text` and `gatewayName` (or escape the gatewayName before
constructing a RegExp) to ensure the gateway name is matched as a plain string.

In `@test/e2e-scenario/fixtures/phases/lifecycle.ts`:
- Around line 295-301: The health check is incorrectly treating "disconnected"
as healthy because it uses /connected/i; change the check after running
this.host.command("openshell", ["status"]) (the variable status and the
assertExitZero("openshell status") block) to verify the exact healthy token
emitted by openshell status (e.g., match the line that equals "connected" or use
a word-boundary/line-anchored check rather than /connected/i) so only a true
"connected" result passes the probe; keep using status.stdout/status.stderr and
buildAvailabilityProbeEnv() unchanged.

In `@test/e2e-scenario/fixtures/phases/runtime.ts`:
- Around line 188-205: chatCompletionText currently merges visible output
(message.content / choice.text) and message.reasoning_content indiscriminately,
letting reasoning_content alone satisfy probes like expectInferenceLocalPong;
update chatCompletionText to collect visibleParts (message.content and
choice.text) and reasoningParts (message.reasoning_content) separately and
return visibleParts.join("\n") if any visibleParts exist, otherwise fall back to
reasoningParts.join("\n") so reasoning_content is only used when the API truly
omitted a final visible message; apply the same change to the analogous block
around lines 328-349.

In `@test/e2e-scenario/live/sandbox-survival.test.ts`:
- Around line 136-165: The pre-install cleanup calls like
host.bestEffortCleanupSandbox, sandbox.openshell([... "sandbox", "delete",
...]), and the direct host.command("openshell", ["gateway", "destroy", ...])
must be guarded so they don't run when the CLI binaries are missing; update the
block to first check binary availability (e.g., command -v/openShellExists
check) or wrap the CLI calls with a safe no-op when the binary is absent, and
only call bestEffortCleanupSandbox, sandbox.openshell and
host.command("openshell", ...) when the corresponding binary is present to avoid
spawning ENOENT before install.sh runs.
- Around line 282-283: The test interpolates SANDBOX_NAME directly into a RegExp
causing special chars to be treated as regex meta-characters; update the
assertion around resultText(afterDestroyList) to escape SANDBOX_NAME first
(e.g., add a regexEscape helper or inline replace like
SANDBOX_NAME.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')) and then build the RegExp
from the escaped value so the check reliably matches the literal sandbox name.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 3e2d4b2c-f0c5-4c00-81cd-83adb3b00074

📥 Commits

Reviewing files that changed from the base of the PR and between 561bd2f and 227a963.

📒 Files selected for processing (15)
  • .github/workflows/e2e-vitest-scenarios.yaml
  • test/e2e-scenario/fixtures/clients/command.ts
  • test/e2e-scenario/fixtures/clients/gateway.ts
  • test/e2e-scenario/fixtures/clients/host.ts
  • test/e2e-scenario/fixtures/clients/index.ts
  • test/e2e-scenario/fixtures/clients/sandbox.ts
  • test/e2e-scenario/fixtures/e2e-test.ts
  • test/e2e-scenario/fixtures/phases/lifecycle.ts
  • test/e2e-scenario/fixtures/phases/runtime.ts
  • test/e2e-scenario/fixtures/phases/state-validation.ts
  • test/e2e-scenario/live/sandbox-survival.test.ts
  • test/e2e-scenario/support-tests/e2e-clients.test.ts
  • test/e2e-scenario/support-tests/e2e-phase-lifecycle.test.ts
  • test/e2e-scenario/support-tests/e2e-phase-runtime.test.ts
  • test/e2e-scenario/support-tests/e2e-phase-state-validation.test.ts

Comment on lines +159 to +160
const text = `${result.stdout}\n${result.stderr}`;
if (!/connected/i.test(text) || !new RegExp(gatewayName, "i").test(text)) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Tighten the connected-state check.

/connected/i also matches "disconnected", so waitForGatewayConnected() in test/e2e-scenario/fixtures/phases/lifecycle.ts:281-310 can stop retrying while the gateway is still offline. new RegExp(gatewayName, "i") also treats the gateway name as regex syntax instead of a literal string.

Proposed fix
     const text = `${result.stdout}\n${result.stderr}`;
-    if (!/connected/i.test(text) || !new RegExp(gatewayName, "i").test(text)) {
+    const escapedGatewayName = gatewayName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
+    if (
+      !/\bconnected\b/i.test(text) ||
+      /\bdisconnected\b/i.test(text) ||
+      !new RegExp(`\\b${escapedGatewayName}\\b`, "i").test(text)
+    ) {
       throw new Error(`openshell status did not report connected gateway '${gatewayName}'.`);
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const text = `${result.stdout}\n${result.stderr}`;
if (!/connected/i.test(text) || !new RegExp(gatewayName, "i").test(text)) {
const text = `${result.stdout}\n${result.stderr}`;
const escapedGatewayName = gatewayName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
if (
!/\bconnected\b/i.test(text) ||
/\bdisconnected\b/i.test(text) ||
!new RegExp(`\\b${escapedGatewayName}\\b`, "i").test(text)
) {
throw new Error(`openshell status did not report connected gateway '${gatewayName}'.`);
}
🧰 Tools
🪛 ast-grep (0.43.0)

[warning] 159-159: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(gatewayName, "i")
Note: [CWE-1333] Inefficient Regular Expression Complexity

(regexp-from-variable)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-scenario/fixtures/clients/gateway.ts` around lines 159 - 160, The
connected-state check is too loose: `/connected/i` matches "disconnected" and
`new RegExp(gatewayName, "i")` treats the gateway name as a regex. In the
function that builds the `text` check (used by waitForGatewayConnected()),
replace the `/connected/i` test with a literal-word or bounded match (e.g., use
a word-boundary or explicit token match for "connected" so "disconnected" does
not pass) and replace the `new RegExp(gatewayName, "i")` test with a literal,
case-insensitive substring match by normalizing both `text` and `gatewayName`
(or escape the gatewayName before constructing a RegExp) to ensure the gateway
name is matched as a plain string.

Source: Linters/SAST tools

Comment on lines +295 to +301
const status = await this.host.command("openshell", ["status"], {
artifactName: `lifecycle-gateway-health-${attempt}`,
env: buildAvailabilityProbeEnv(),
timeoutMs: 30_000,
});
assertExitZero(status, "openshell status");
if (/connected/i.test(`${status.stdout}\n${status.stderr}`)) return;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don't treat openshell status "disconnected" output as healthy.

The fallback path uses /connected/i, which also matches strings like disconnected, so this method can return before the gateway has actually recovered. Tighten the check to the exact healthy token/line that openshell status emits.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-scenario/fixtures/phases/lifecycle.ts` around lines 295 - 301, The
health check is incorrectly treating "disconnected" as healthy because it uses
/connected/i; change the check after running this.host.command("openshell",
["status"]) (the variable status and the assertExitZero("openshell status")
block) to verify the exact healthy token emitted by openshell status (e.g.,
match the line that equals "connected" or use a word-boundary/line-anchored
check rather than /connected/i) so only a true "connected" result passes the
probe; keep using status.stdout/status.stderr and buildAvailabilityProbeEnv()
unchanged.

Comment on lines +188 to +205
function chatCompletionText(json: unknown): string {
if (!json || typeof json !== "object") return "";
const choices = (json as { choices?: unknown }).choices;
if (!Array.isArray(choices)) return "";
const parts: string[] = [];
for (const choice of choices) {
if (!choice || typeof choice !== "object") continue;
const message = (choice as { message?: unknown }).message;
if (message && typeof message === "object") {
const content = (message as { content?: unknown; reasoning_content?: unknown }).content;
const reasoning = (message as { reasoning_content?: unknown }).reasoning_content;
if (typeof content === "string") parts.push(content);
if (typeof reasoning === "string") parts.push(reasoning);
}
const text = (choice as { text?: unknown }).text;
if (typeof text === "string") parts.push(text);
}
return parts.join("\n");

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't let reasoning_content satisfy the PONG check by itself.

chatCompletionText() merges visible output and reasoning into one string, and expectInferenceLocalPong() only looks for PONG anywhere in that merged text. A reply like content: "OK" plus reasoning_content: "Need to answer PONG" now passes even though the final assistant output is wrong. For this probe, assert against content/text first and only fall back to reasoning_content if the API genuinely omits a final message field.

Also applies to: 328-349

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-scenario/fixtures/phases/runtime.ts` around lines 188 - 205,
chatCompletionText currently merges visible output (message.content /
choice.text) and message.reasoning_content indiscriminately, letting
reasoning_content alone satisfy probes like expectInferenceLocalPong; update
chatCompletionText to collect visibleParts (message.content and choice.text) and
reasoningParts (message.reasoning_content) separately and return
visibleParts.join("\n") if any visibleParts exist, otherwise fall back to
reasoningParts.join("\n") so reasoning_content is only used when the API truly
omitted a final visible message; apply the same change to the analogous block
around lines 328-349.

Comment thread test/e2e-scenario/live/sandbox-survival.test.ts
Comment on lines +282 to +283
expect(resultText(afterDestroyList), "sandbox still listed after destroy").not.toMatch(
new RegExp(`(^|\\s)${SANDBOX_NAME}(\\s|$)`, "m"),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Escape SANDBOX_NAME before interpolating it into the regex.

SANDBOX_NAME comes from the environment, so characters like ., [, or + change the match and can make the post-destroy assertion pass or fail incorrectly.

🧰 Tools
🪛 ast-grep (0.43.0)

[warning] 282-282: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp((^|\\s)${SANDBOX_NAME}(\\s|$), "m")
Note: [CWE-1333] Inefficient Regular Expression Complexity

(regexp-from-variable)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-scenario/live/sandbox-survival.test.ts` around lines 282 - 283, The
test interpolates SANDBOX_NAME directly into a RegExp causing special chars to
be treated as regex meta-characters; update the assertion around
resultText(afterDestroyList) to escape SANDBOX_NAME first (e.g., add a
regexEscape helper or inline replace like
SANDBOX_NAME.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')) and then build the RegExp
from the escaped value so the check reliably matches the literal sandbox name.

Source: Linters/SAST tools

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
test/e2e-scenario/live/sandbox-survival.test.ts (1)

338-341: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Escape SANDBOX_NAME before building the regex.

This still interpolates the env-provided sandbox name directly into new RegExp(...), so characters like ., [, or + change the match and can make the post-destroy assertion pass or fail incorrectly. Build the pattern from an escaped literal instead.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e-scenario/live/sandbox-survival.test.ts` around lines 338 - 341, The
test builds a RegExp from the env-provided SANDBOX_NAME which can contain regex
metacharacters; change the assertion to use an escaped literal version of
SANDBOX_NAME before constructing new RegExp so characters like . or + are
treated literally. Add or reuse an escape function (e.g., escapeRegExp) and
apply it to SANDBOX_NAME, then use the escaped value in the
expect(...).not.toMatch(new RegExp(...)) call to ensure the post-destroy
assertion matches the literal sandbox name.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@test/e2e-scenario/live/sandbox-survival.test.ts`:
- Around line 338-341: The test builds a RegExp from the env-provided
SANDBOX_NAME which can contain regex metacharacters; change the assertion to use
an escaped literal version of SANDBOX_NAME before constructing new RegExp so
characters like . or + are treated literally. Add or reuse an escape function
(e.g., escapeRegExp) and apply it to SANDBOX_NAME, then use the escaped value in
the expect(...).not.toMatch(new RegExp(...)) call to ensure the post-destroy
assertion matches the literal sandbox name.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 981506cb-34ad-4fdd-8bc0-ae9d70cca78d

📥 Commits

Reviewing files that changed from the base of the PR and between 227a963 and 332468a.

📒 Files selected for processing (1)
  • test/e2e-scenario/live/sandbox-survival.test.ts

@github-actions

github-actions Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 3 needs attention, 9 worth checking, 0 nice ideas
Since last review: 1 prior item resolved, 6 still apply, 1 new item found

Review findings

🛠️ Needs attention

  • Migrated sandbox survival test still does not exercise raw SSH handshake recovery (test/e2e-scenario/live/sandbox-survival.test.ts:231): The legacy sandbox-survival contract explicitly obtains `openshell sandbox ssh-config` and then runs raw `ssh -F ... echo alive` before and after gateway restart. That is the acceptance signal for the Gateway restart regenerates TLS certificates, breaking existing sandbox connections #888/[WSL2/Ubuntu] SSH Handshake Failure After Host Reboot Forcing Destructive Onboarding #1086 handshake-secret failure mode. The Vitest migration uses `openshell sandbox exec`, which proves OpenShell-mediated command execution but not the raw SSH config, persisted secret, or handshake path.
    • Recommendation: Add raw SSH coverage to the Vitest test: obtain `openshell sandbox ssh-config`, run raw `ssh -F <config> ... echo alive` before and after `restartGatewayRuntime()`, and keep the current sandbox exec probe only as additional diagnostics.
    • Evidence: Legacy `test/e2e/test-sandbox-survival.sh` defines `setup_ssh()` and validates raw SSH in Phases 4 and 8. The new test's liveness helper calls `sandbox.exec(SANDBOX_NAME, ["sh", "-lc", script])`; grep found no `ssh-config` or `ssh -F` invocation in `sandbox-survival.test.ts`.
  • sandbox-survival workflow selectors are not reflected in the workflow-boundary source of truth (tools/e2e-scenarios/workflow-boundary.mts:21): The workflow adds a new secret-bearing `sandbox-survival-vitest` job, selector aliases, Docker Hub auth, `NVIDIA_API_KEY`, artifact upload, cleanup, and `report-to-pr.needs` wiring. The workflow-boundary validator still does not map `sandbox-survival` to `sandbox-survival-vitest` or validate the new job's secrets, artifacts, Docker auth location, cleanup, or reporting boundary.
    • Recommendation: Update `tools/e2e-scenarios/workflow-boundary.mts` and workflow support tests to include `sandbox-survival` / `sandbox-survival-vitest`. Validate selector behavior, `report-to-pr.needs`, `NVIDIA_API_KEY` placement, Docker auth location, artifact path and hidden-file policy, and cleanup.
    • Evidence: The workflow adds `sandbox-survival-vitest` to `allowed_jobs`, `free_standing_scenarios`, the job list, and `report-to-pr.needs`. `workflow-boundary.mts` still has `FREE_STANDING_SCENARIO_JOBS` ending at `model-router-provider-routed-inference` and `ALLOWED_FREE_STANDING_JOBS` without `sandbox-survival-vitest`; support tests likewise have no sandbox-survival selector assertion.
  • Final destroy does not directly prove registry cleanup (test/e2e-scenario/live/sandbox-survival.test.ts:292): The legacy cleanup contract checks that the sandbox name is no longer present in `~/.nemoclaw/sandboxes.json` after destroy. The new Vitest test calls `cleanupSandbox()` and checks `nemoclaw list`, but it does not directly assert registry absence after final destroy.
    • Recommendation: Add a direct registry absence assertion after `host.cleanupSandbox(SANDBOX_NAME)`, alongside the existing `nemoclaw list` absence check. A small `expectLocalRegistryMissing()` helper with support tests would make the acceptance boundary explicit.
    • Evidence: The new test calls `stateValidation.expectLocalRegistryContains(SANDBOX_NAME)` before and after restart, then after final destroy only runs `host.nemoclaw(["list"])` and a regex absence assertion. Legacy Phase 11 checks `$REGISTRY` directly for the sandbox name after destroy.

🔎 Worth checking

  • Source-of-truth review needed: E2E Vitest workflow selector and boundary validator: The advisor marked localized patch analysis as missing.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: Workflow YAML includes `sandbox-survival-vitest`; `workflow-boundary.mts` lacks `sandbox-survival` / `sandbox-survival-vitest` in the selector maps and report-to-pr required needs list.
  • Source-of-truth review needed: HostCliClient best-effort sandbox cleanup: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `bestEffortCleanupSandbox()` has `try { await this.cleanupSandbox(...) } catch { }`; support tests only cover happy cleanup.
  • Source-of-truth review needed: Gateway runtime restart recovery helper: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `startGatewayRuntime()` recovers PID runtime by running `nemoclaw <sandbox> status`; support tests mock that path, but the live test does not prove raw SSH recovery.
  • Source-of-truth review needed: Direct registry survival helper: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `expectLocalRegistryContains()` is used before and after restart; after final destroy the test only checks `nemoclaw list` output.
  • Source-of-truth review needed: Environment-derived sandbox name in shell cleanup: The advisor marked localized patch analysis as missing.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `sandbox-survival.test.ts` interpolates `${SANDBOX_NAME}` into `openshell sandbox delete ${SANDBOX_NAME}` inside `sh -lc`.
  • Env-derived sandbox name is embedded in a shell string (test/e2e-scenario/live/sandbox-survival.test.ts:138): `SANDBOX_NAME` comes from `process.env.NEMOCLAW_SANDBOX_NAME` and is interpolated into a `sh -lc` pre-cleanup command. The workflow supplies a safe constant, but local or manual runs can override the value and inject shell metacharacters. The final post-destroy regular expression also embeds the env-derived name without escaping.
    • Recommendation: Validate `SANDBOX_NAME` with the fixture sandbox-name validator before use, or avoid `sh -lc` by passing the sandbox name as an argv element. Escape the final absence regex or reuse the shared sandbox-name output matcher.
    • Evidence: `const SANDBOX_NAME = process.env.NEMOCLAW_SANDBOX_NAME ?? "e2e-survival"`; pre-cleanup builds ``openshell sandbox delete ${SANDBOX_NAME}`` inside `sh -lc`; final assertion builds `new RegExp(...${SANDBOX_NAME}...)` without escaping.
  • Docker auth is stored in the checkout workspace for sandbox survival (.github/workflows/e2e-vitest-scenarios.yaml:1317): The new sandbox-survival job stores Docker Hub credentials under `${{ github.workspace }}/.docker-config-sandbox-survival`. Cleanup and hidden-file artifact filtering reduce immediate leakage risk, but keeping secret-bearing Docker auth under the checkout workspace increases the chance that a future artifact, debug, or broad upload step captures auth material.
    • Recommendation: Prefer `${{ runner.temp }}/docker-config-sandbox-survival` or a `$RUNNER_TEMP`-backed `$GITHUB_ENV` assignment, matching the safer model-router pattern. Enforce the chosen location in the workflow-boundary validator.
    • Evidence: `sandbox-survival-vitest` sets `DOCKER_CONFIG: ${{ github.workspace }}/.docker-config-sandbox-survival`; the Docker login step writes auth there and cleanup removes it later with `rm -rf "${DOCKER_CONFIG}"`.
  • Best-effort sandbox cleanup suppresses all errors without failure-path coverage (test/e2e-scenario/fixtures/clients/host.ts:130): `bestEffortCleanupSandbox()` catches every error from sandbox cleanup with an empty catch block. This is understandable for teardown, but in sandbox lifecycle tests it can silently hide unrelated destroy failures or leave stale sandbox/gateway state, and the support tests only cover the happy cleanup path.
    • Recommendation: Add tests proving `cleanupSandbox()` tolerates only known missing-sandbox messages while propagating unrelated failures, and keep `bestEffortCleanupSandbox()` limited to explicit best-effort call sites. Consider recording suppressed cleanup errors in artifacts for diagnostics.
    • Evidence: `bestEffortCleanupSandbox()` wraps `cleanupSandbox()` in `try { ... } catch { }`. `e2e-clients.test.ts` covers list/status/cleanup happy path but not missing-sandbox tolerance, unrelated destroy failure propagation, or best-effort suppression diagnostics.
  • Post-restart sandbox listing does not prove Running/Ready state (test/e2e-scenario/live/sandbox-survival.test.ts:265): The legacy script treats a sandbox that is merely listed as insufficient and polls until the sandbox row shows `running|ready`. The Vitest migration asserts presence in `openshell sandbox list`, but does not verify that the sandbox reaches a running or ready state after restart.
    • Recommendation: Extend `SandboxClient.expectListed()` or add a dedicated helper for this scenario that verifies the sandbox row reaches Running/Ready after gateway restart.
    • Evidence: Legacy Phase 7 has `sandbox_phase=$(openshell sandbox list ... | grep -oiE 'running|ready')` and fails if it never appears. The new post-restart assertion calls `sandbox.expectListed(SANDBOX_NAME)`, which only checks that the sandbox name appears in output.

🌱 Nice ideas

  • None.
Consider writing more tests for
  • **Runtime validation** — sandbox survival obtains `openshell sandbox ssh-config` and raw `ssh -F` echo alive succeeds before and after gateway restart. This PR migrates a live installer/OpenShell/Docker/gateway/sandbox/inference scenario and changes a secret-bearing workflow. Unit support tests cover many helper paths, but runtime parity and workflow-boundary validation still need behavior-specific coverage.
  • **Runtime validation** — sandbox survival final destroy removes the sandbox from `~/.nemoclaw/sandboxes.json` as well as `nemoclaw list`. This PR migrates a live installer/OpenShell/Docker/gateway/sandbox/inference scenario and changes a secret-bearing workflow. Unit support tests cover many helper paths, but runtime parity and workflow-boundary validation still need behavior-specific coverage.
  • **Runtime validation** — sandbox survival post-restart OpenShell sandbox list reaches Running or Ready, not just listed. This PR migrates a live installer/OpenShell/Docker/gateway/sandbox/inference scenario and changes a secret-bearing workflow. Unit support tests cover many helper paths, but runtime parity and workflow-boundary validation still need behavior-specific coverage.
  • **Runtime validation** — workflow boundary maps `scenarios=sandbox-survival` and `jobs=sandbox-survival-vitest` to `sandbox-survival-vitest`. This PR migrates a live installer/OpenShell/Docker/gateway/sandbox/inference scenario and changes a secret-bearing workflow. Unit support tests cover many helper paths, but runtime parity and workflow-boundary validation still need behavior-specific coverage.
  • **Runtime validation** — workflow boundary validates `sandbox-survival-vitest` `NVIDIA_API_KEY` placement, Docker auth location, artifact path, hidden-file policy, cleanup, and `report-to-pr.needs`. This PR migrates a live installer/OpenShell/Docker/gateway/sandbox/inference scenario and changes a secret-bearing workflow. Unit support tests cover many helper paths, but runtime parity and workflow-boundary validation still need behavior-specific coverage.
  • **Post-restart sandbox listing does not prove Running/Ready state** — Extend `SandboxClient.expectListed()` or add a dedicated helper for this scenario that verifies the sandbox row reaches Running/Ready after gateway restart.
  • **Acceptance clause:** Refs Epic: Migrate legacy bash E2E into the Vitest E2E system #5098 — add test evidence or identify existing coverage. No linked issue body/comments were included in deterministic context. PR body says legacy shell lane deletion is deferred to Epic: Migrate legacy bash E2E into the Vitest E2E system #5098 Phase 11, and this PR does not delete the legacy bash lane.
  • **Acceptance clause:** Refs Sandbox Not Available Anymore After Machine Restart #486 — add test evidence or identify existing coverage. The Vitest test checks registry/list/status before and after restart, but the workflow-boundary source of truth is missing the new selector/job and the post-restart sandbox check does not prove Running/Ready state.
Since last review details

Current findings:

  • Source-of-truth review needed: E2E Vitest workflow selector and boundary validator: The advisor marked localized patch analysis as missing.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: Workflow YAML includes `sandbox-survival-vitest`; `workflow-boundary.mts` lacks `sandbox-survival` / `sandbox-survival-vitest` in the selector maps and report-to-pr required needs list.
  • Source-of-truth review needed: HostCliClient best-effort sandbox cleanup: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `bestEffortCleanupSandbox()` has `try { await this.cleanupSandbox(...) } catch { }`; support tests only cover happy cleanup.
  • Source-of-truth review needed: Gateway runtime restart recovery helper: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `startGatewayRuntime()` recovers PID runtime by running `nemoclaw <sandbox> status`; support tests mock that path, but the live test does not prove raw SSH recovery.
  • Source-of-truth review needed: Direct registry survival helper: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `expectLocalRegistryContains()` is used before and after restart; after final destroy the test only checks `nemoclaw list` output.
  • Source-of-truth review needed: Environment-derived sandbox name in shell cleanup: The advisor marked localized patch analysis as missing.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: `sandbox-survival.test.ts` interpolates `${SANDBOX_NAME}` into `openshell sandbox delete ${SANDBOX_NAME}` inside `sh -lc`.
  • Migrated sandbox survival test still does not exercise raw SSH handshake recovery (test/e2e-scenario/live/sandbox-survival.test.ts:231): The legacy sandbox-survival contract explicitly obtains `openshell sandbox ssh-config` and then runs raw `ssh -F ... echo alive` before and after gateway restart. That is the acceptance signal for the Gateway restart regenerates TLS certificates, breaking existing sandbox connections #888/[WSL2/Ubuntu] SSH Handshake Failure After Host Reboot Forcing Destructive Onboarding #1086 handshake-secret failure mode. The Vitest migration uses `openshell sandbox exec`, which proves OpenShell-mediated command execution but not the raw SSH config, persisted secret, or handshake path.
    • Recommendation: Add raw SSH coverage to the Vitest test: obtain `openshell sandbox ssh-config`, run raw `ssh -F <config> ... echo alive` before and after `restartGatewayRuntime()`, and keep the current sandbox exec probe only as additional diagnostics.
    • Evidence: Legacy `test/e2e/test-sandbox-survival.sh` defines `setup_ssh()` and validates raw SSH in Phases 4 and 8. The new test's liveness helper calls `sandbox.exec(SANDBOX_NAME, ["sh", "-lc", script])`; grep found no `ssh-config` or `ssh -F` invocation in `sandbox-survival.test.ts`.
  • sandbox-survival workflow selectors are not reflected in the workflow-boundary source of truth (tools/e2e-scenarios/workflow-boundary.mts:21): The workflow adds a new secret-bearing `sandbox-survival-vitest` job, selector aliases, Docker Hub auth, `NVIDIA_API_KEY`, artifact upload, cleanup, and `report-to-pr.needs` wiring. The workflow-boundary validator still does not map `sandbox-survival` to `sandbox-survival-vitest` or validate the new job's secrets, artifacts, Docker auth location, cleanup, or reporting boundary.
    • Recommendation: Update `tools/e2e-scenarios/workflow-boundary.mts` and workflow support tests to include `sandbox-survival` / `sandbox-survival-vitest`. Validate selector behavior, `report-to-pr.needs`, `NVIDIA_API_KEY` placement, Docker auth location, artifact path and hidden-file policy, and cleanup.
    • Evidence: The workflow adds `sandbox-survival-vitest` to `allowed_jobs`, `free_standing_scenarios`, the job list, and `report-to-pr.needs`. `workflow-boundary.mts` still has `FREE_STANDING_SCENARIO_JOBS` ending at `model-router-provider-routed-inference` and `ALLOWED_FREE_STANDING_JOBS` without `sandbox-survival-vitest`; support tests likewise have no sandbox-survival selector assertion.
  • Final destroy does not directly prove registry cleanup (test/e2e-scenario/live/sandbox-survival.test.ts:292): The legacy cleanup contract checks that the sandbox name is no longer present in `~/.nemoclaw/sandboxes.json` after destroy. The new Vitest test calls `cleanupSandbox()` and checks `nemoclaw list`, but it does not directly assert registry absence after final destroy.
    • Recommendation: Add a direct registry absence assertion after `host.cleanupSandbox(SANDBOX_NAME)`, alongside the existing `nemoclaw list` absence check. A small `expectLocalRegistryMissing()` helper with support tests would make the acceptance boundary explicit.
    • Evidence: The new test calls `stateValidation.expectLocalRegistryContains(SANDBOX_NAME)` before and after restart, then after final destroy only runs `host.nemoclaw(["list"])` and a regex absence assertion. Legacy Phase 11 checks `$REGISTRY` directly for the sandbox name after destroy.
  • Env-derived sandbox name is embedded in a shell string (test/e2e-scenario/live/sandbox-survival.test.ts:138): `SANDBOX_NAME` comes from `process.env.NEMOCLAW_SANDBOX_NAME` and is interpolated into a `sh -lc` pre-cleanup command. The workflow supplies a safe constant, but local or manual runs can override the value and inject shell metacharacters. The final post-destroy regular expression also embeds the env-derived name without escaping.
    • Recommendation: Validate `SANDBOX_NAME` with the fixture sandbox-name validator before use, or avoid `sh -lc` by passing the sandbox name as an argv element. Escape the final absence regex or reuse the shared sandbox-name output matcher.
    • Evidence: `const SANDBOX_NAME = process.env.NEMOCLAW_SANDBOX_NAME ?? "e2e-survival"`; pre-cleanup builds ``openshell sandbox delete ${SANDBOX_NAME}`` inside `sh -lc`; final assertion builds `new RegExp(...${SANDBOX_NAME}...)` without escaping.
  • Docker auth is stored in the checkout workspace for sandbox survival (.github/workflows/e2e-vitest-scenarios.yaml:1317): The new sandbox-survival job stores Docker Hub credentials under `${{ github.workspace }}/.docker-config-sandbox-survival`. Cleanup and hidden-file artifact filtering reduce immediate leakage risk, but keeping secret-bearing Docker auth under the checkout workspace increases the chance that a future artifact, debug, or broad upload step captures auth material.
    • Recommendation: Prefer `${{ runner.temp }}/docker-config-sandbox-survival` or a `$RUNNER_TEMP`-backed `$GITHUB_ENV` assignment, matching the safer model-router pattern. Enforce the chosen location in the workflow-boundary validator.
    • Evidence: `sandbox-survival-vitest` sets `DOCKER_CONFIG: ${{ github.workspace }}/.docker-config-sandbox-survival`; the Docker login step writes auth there and cleanup removes it later with `rm -rf "${DOCKER_CONFIG}"`.
  • Best-effort sandbox cleanup suppresses all errors without failure-path coverage (test/e2e-scenario/fixtures/clients/host.ts:130): `bestEffortCleanupSandbox()` catches every error from sandbox cleanup with an empty catch block. This is understandable for teardown, but in sandbox lifecycle tests it can silently hide unrelated destroy failures or leave stale sandbox/gateway state, and the support tests only cover the happy cleanup path.
    • Recommendation: Add tests proving `cleanupSandbox()` tolerates only known missing-sandbox messages while propagating unrelated failures, and keep `bestEffortCleanupSandbox()` limited to explicit best-effort call sites. Consider recording suppressed cleanup errors in artifacts for diagnostics.
    • Evidence: `bestEffortCleanupSandbox()` wraps `cleanupSandbox()` in `try { ... } catch { }`. `e2e-clients.test.ts` covers list/status/cleanup happy path but not missing-sandbox tolerance, unrelated destroy failure propagation, or best-effort suppression diagnostics.
  • Post-restart sandbox listing does not prove Running/Ready state (test/e2e-scenario/live/sandbox-survival.test.ts:265): The legacy script treats a sandbox that is merely listed as insufficient and polls until the sandbox row shows `running|ready`. The Vitest migration asserts presence in `openshell sandbox list`, but does not verify that the sandbox reaches a running or ready state after restart.
    • Recommendation: Extend `SandboxClient.expectListed()` or add a dedicated helper for this scenario that verifies the sandbox row reaches Running/Ready after gateway restart.
    • Evidence: Legacy Phase 7 has `sandbox_phase=$(openshell sandbox list ... | grep -oiE 'running|ready')` and fails if it never appears. The new post-restart assertion calls `sandbox.expectListed(SANDBOX_NAME)`, which only checks that the sandbox name appears in output.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@github-actions

Copy link
Copy Markdown
Contributor

Vitest E2E Scenario Results — ❌ Some jobs failed

Run: 27420633022
Workflow ref: e2e-migrate/test-sandbox-survival
Requested scenarios: (default — all supported)
Requested jobs: sandbox-survival-vitest
Summary: 2 passed, 1 failed, 17 skipped

Job Result
credential-migration-vitest ⏭️ skipped
double-onboard-vitest ⏭️ skipped
gateway-guard-recovery ⏭️ skipped
generate-matrix ✅ success
hermes-e2e-vitest ⏭️ skipped
hermes-root-entrypoint-smoke-vitest ⏭️ skipped
inference-routing-vitest ⏭️ skipped
issue-4434-tui-unreachable-inference-vitest ⏭️ skipped
launchable-smoke-vitest ⏭️ skipped
live-scenarios ⏭️ skipped
model-router-provider-routed-inference-vitest ⏭️ skipped
network-policy-vitest ⏭️ skipped
onboard-negative-paths-vitest ⏭️ skipped
openclaw-tui-chat-correlation-vitest ⏭️ skipped
openshell-version-pin-vitest ⏭️ skipped
rebuild-openclaw-vitest ⏭️ skipped
runtime-overrides-vitest ⏭️ skipped
sandbox-survival-vitest ❌ failure
token-rotation-vitest ⏭️ skipped
validate-jobs ✅ success

Failed jobs: sandbox-survival-vitest. Check run artifacts for logs.

@github-actions

Copy link
Copy Markdown
Contributor

Vitest E2E Scenario Results — ✅ All jobs passed

Run: 27421563168
Workflow ref: e2e-migrate/test-sandbox-survival
Requested scenarios: (default — all supported)
Requested jobs: sandbox-survival-vitest
Summary: 2 passed, 0 failed, 17 skipped

Job Result
credential-migration-vitest ⏭️ skipped
double-onboard-vitest ⏭️ skipped
gateway-guard-recovery ⏭️ skipped
generate-matrix ✅ success
hermes-e2e-vitest ⏭️ skipped
hermes-root-entrypoint-smoke-vitest ⏭️ skipped
inference-routing-vitest ⏭️ skipped
issue-4434-tui-unreachable-inference-vitest ⏭️ skipped
launchable-smoke-vitest ⏭️ skipped
live-scenarios ⏭️ skipped
model-router-provider-routed-inference-vitest ⏭️ skipped
network-policy-vitest ⏭️ skipped
onboard-negative-paths-vitest ⏭️ skipped
openclaw-tui-chat-correlation-vitest ⏭️ skipped
openshell-version-pin-vitest ⏭️ skipped
rebuild-openclaw-vitest ⏭️ skipped
runtime-overrides-vitest ⏭️ skipped
sandbox-survival-vitest ⚠️ cancelled
token-rotation-vitest ⏭️ skipped
validate-jobs ✅ success

@github-actions

Copy link
Copy Markdown
Contributor

Vitest E2E Scenario Results — ✅ All jobs passed

Run: 27421927021
Workflow ref: e2e-migrate/test-sandbox-survival
Requested scenarios: (default — all supported)
Requested jobs: sandbox-survival-vitest
Summary: 3 passed, 0 failed, 17 skipped

Job Result
credential-migration-vitest ⏭️ skipped
double-onboard-vitest ⏭️ skipped
gateway-guard-recovery ⏭️ skipped
generate-matrix ✅ success
hermes-e2e-vitest ⏭️ skipped
hermes-root-entrypoint-smoke-vitest ⏭️ skipped
inference-routing-vitest ⏭️ skipped
issue-4434-tui-unreachable-inference-vitest ⏭️ skipped
launchable-smoke-vitest ⏭️ skipped
live-scenarios ⏭️ skipped
model-router-provider-routed-inference-vitest ⏭️ skipped
network-policy-vitest ⏭️ skipped
onboard-negative-paths-vitest ⏭️ skipped
openclaw-tui-chat-correlation-vitest ⏭️ skipped
openshell-version-pin-vitest ⏭️ skipped
rebuild-openclaw-vitest ⏭️ skipped
runtime-overrides-vitest ⏭️ skipped
sandbox-survival-vitest ✅ success
token-rotation-vitest ⏭️ skipped
validate-jobs ✅ success

@wscurran wscurran added area: e2e End-to-end tests, nightly failures, or validation infrastructure chore Build, CI, dependency, or tooling maintenance labels Jun 12, 2026
@jyaunches jyaunches added the v0.0.65 Release target label Jun 12, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.github/workflows/e2e-vitest-scenarios.yaml (1)

1316-1318: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Move DOCKER_CONFIG out of the workspace.

Line 1317 writes Docker credentials under ${{ github.workspace }} even though this lane runs the real installer/Docker flow. If any step builds from the repo root, .docker-config-sandbox-survival/config.json is part of the build context and can leak the Docker Hub token into the build or image. Cleanup at Lines 1392-1397 happens too late to prevent that. Mirror the safer pattern already used at Lines 1237-1238 and keep DOCKER_CONFIG under ${{ runner.temp }} instead.

🔒 Proposed fix
   sandbox-survival-vitest:
     needs: [validate-jobs, generate-matrix]
     if: ${{ (inputs.jobs == '' && inputs.scenarios == '') || contains(format(',{0},', inputs.jobs), ',sandbox-survival-vitest,') || contains(format(',{0},', inputs.scenarios), ',sandbox-survival,') }}
     runs-on: ubuntu-latest
     timeout-minutes: 30
     env:
-      DOCKER_CONFIG: ${{ github.workspace }}/.docker-config-sandbox-survival
+      DOCKER_CONFIG: ${{ runner.temp }}/docker-config-sandbox-survival
       E2E_ARTIFACT_DIR: ${{ github.workspace }}/e2e-artifacts/vitest/sandbox-survival
       NEMOCLAW_RUN_E2E_SCENARIOS: "1"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/e2e-vitest-scenarios.yaml around lines 1316 - 1318, Change
the DOCKER_CONFIG env to place the Docker config under the runner's temp
directory instead of the workspace: replace DOCKER_CONFIG: ${{ github.workspace
}}/.docker-config-sandbox-survival with DOCKER_CONFIG: ${{ runner.temp
}}/.docker-config-sandbox-survival so the created
.docker-config-sandbox-survival/config.json is not included in repo build
contexts; leave E2E_ARTIFACT_DIR as-is and ensure any existing cleanup steps
still remove files under ${{ runner.temp }}/.docker-config-sandbox-survival if
they reference the old path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In @.github/workflows/e2e-vitest-scenarios.yaml:
- Around line 1316-1318: Change the DOCKER_CONFIG env to place the Docker config
under the runner's temp directory instead of the workspace: replace
DOCKER_CONFIG: ${{ github.workspace }}/.docker-config-sandbox-survival with
DOCKER_CONFIG: ${{ runner.temp }}/.docker-config-sandbox-survival so the created
.docker-config-sandbox-survival/config.json is not included in repo build
contexts; leave E2E_ARTIFACT_DIR as-is and ensure any existing cleanup steps
still remove files under ${{ runner.temp }}/.docker-config-sandbox-survival if
they reference the old path.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 557ae478-6697-42e2-84f6-81ba543c406d

📥 Commits

Reviewing files that changed from the base of the PR and between 5152319 and 47dca0c.

📒 Files selected for processing (1)
  • .github/workflows/e2e-vitest-scenarios.yaml

@github-actions

Copy link
Copy Markdown
Contributor

Vitest E2E Scenario Results — ✅ All jobs passed

Run: 27423404351
Workflow ref: e2e-migrate/test-sandbox-survival
Requested scenarios: (default — all supported)
Requested jobs: sandbox-survival-vitest
Summary: 3 passed, 0 failed, 18 skipped

Job Result
credential-migration-vitest ⏭️ skipped
double-onboard-vitest ⏭️ skipped
gateway-guard-recovery ⏭️ skipped
generate-matrix ✅ success
hermes-e2e-vitest ⏭️ skipped
hermes-root-entrypoint-smoke-vitest ⏭️ skipped
inference-routing-vitest ⏭️ skipped
issue-4434-tui-unreachable-inference-vitest ⏭️ skipped
launchable-smoke-vitest ⏭️ skipped
live-scenarios ⏭️ skipped
model-router-provider-routed-inference-vitest ⏭️ skipped
network-policy-vitest ⏭️ skipped
onboard-negative-paths-vitest ⏭️ skipped
openclaw-tui-chat-correlation-vitest ⏭️ skipped
openshell-version-pin-vitest ⏭️ skipped
rebuild-openclaw-vitest ⏭️ skipped
runtime-overrides-vitest ⏭️ skipped
sandbox-survival-vitest ✅ success
skill-agent-vitest ⏭️ skipped
token-rotation-vitest ⏭️ skipped
validate-jobs ✅ success

@jyaunches jyaunches merged commit bffcd71 into main Jun 12, 2026
64 checks passed
@jyaunches jyaunches deleted the e2e-migrate/test-sandbox-survival branch June 12, 2026 15:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: e2e End-to-end tests, nightly failures, or validation infrastructure chore Build, CI, dependency, or tooling maintenance v0.0.65 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants