test(e2e): migrate test-sandbox-survival.sh to vitest#5332
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR adds a comprehensive end-to-end sandbox-survival scenario that validates NemoClaw sandbox persistence across gateway restart, including helper clients, fixture methods for gateway lifecycle management, state validation, inference probing, support tests, and GitHub Actions workflow integration. ChangesSandbox Survival E2E Test and Infrastructure
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Vitest E2E Scenario Results — ❌ Some jobs failedRun: 27420189913
|
| const MODEL = process.env.NEMOCLAW_MODEL ?? "nvidia/nemotron-3-super-120b-a12b"; | ||
| const ENVIRONMENT = ubuntuRepoDocker("cloud-openclaw"); | ||
|
|
||
| function sleep(ms: number): Promise<void> { |
E2E Advisor RecommendationRequired E2E: Dispatch hint: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
Vitest E2E Scenario RecommendationRequired Vitest E2E scenarios: Dispatch required Vitest E2E scenarios:
Full Vitest E2E advisor summaryVitest E2E Scenario AdvisorBase: Required Vitest E2E scenarios
Optional Vitest E2E scenarios
Relevant changed files
|
Vitest E2E Scenario Results — ❌ Some jobs failedRun: 27420461179
|
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e-scenario/fixtures/clients/gateway.ts`:
- Around line 159-160: The connected-state check is too loose: `/connected/i`
matches "disconnected" and `new RegExp(gatewayName, "i")` treats the gateway
name as a regex. In the function that builds the `text` check (used by
waitForGatewayConnected()), replace the `/connected/i` test with a literal-word
or bounded match (e.g., use a word-boundary or explicit token match for
"connected" so "disconnected" does not pass) and replace the `new
RegExp(gatewayName, "i")` test with a literal, case-insensitive substring match
by normalizing both `text` and `gatewayName` (or escape the gatewayName before
constructing a RegExp) to ensure the gateway name is matched as a plain string.
In `@test/e2e-scenario/fixtures/phases/lifecycle.ts`:
- Around line 295-301: The health check is incorrectly treating "disconnected"
as healthy because it uses /connected/i; change the check after running
this.host.command("openshell", ["status"]) (the variable status and the
assertExitZero("openshell status") block) to verify the exact healthy token
emitted by openshell status (e.g., match the line that equals "connected" or use
a word-boundary/line-anchored check rather than /connected/i) so only a true
"connected" result passes the probe; keep using status.stdout/status.stderr and
buildAvailabilityProbeEnv() unchanged.
In `@test/e2e-scenario/fixtures/phases/runtime.ts`:
- Around line 188-205: chatCompletionText currently merges visible output
(message.content / choice.text) and message.reasoning_content indiscriminately,
letting reasoning_content alone satisfy probes like expectInferenceLocalPong;
update chatCompletionText to collect visibleParts (message.content and
choice.text) and reasoningParts (message.reasoning_content) separately and
return visibleParts.join("\n") if any visibleParts exist, otherwise fall back to
reasoningParts.join("\n") so reasoning_content is only used when the API truly
omitted a final visible message; apply the same change to the analogous block
around lines 328-349.
In `@test/e2e-scenario/live/sandbox-survival.test.ts`:
- Around line 136-165: The pre-install cleanup calls like
host.bestEffortCleanupSandbox, sandbox.openshell([... "sandbox", "delete",
...]), and the direct host.command("openshell", ["gateway", "destroy", ...])
must be guarded so they don't run when the CLI binaries are missing; update the
block to first check binary availability (e.g., command -v/openShellExists
check) or wrap the CLI calls with a safe no-op when the binary is absent, and
only call bestEffortCleanupSandbox, sandbox.openshell and
host.command("openshell", ...) when the corresponding binary is present to avoid
spawning ENOENT before install.sh runs.
- Around line 282-283: The test interpolates SANDBOX_NAME directly into a RegExp
causing special chars to be treated as regex meta-characters; update the
assertion around resultText(afterDestroyList) to escape SANDBOX_NAME first
(e.g., add a regexEscape helper or inline replace like
SANDBOX_NAME.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')) and then build the RegExp
from the escaped value so the check reliably matches the literal sandbox name.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 3e2d4b2c-f0c5-4c00-81cd-83adb3b00074
📒 Files selected for processing (15)
.github/workflows/e2e-vitest-scenarios.yamltest/e2e-scenario/fixtures/clients/command.tstest/e2e-scenario/fixtures/clients/gateway.tstest/e2e-scenario/fixtures/clients/host.tstest/e2e-scenario/fixtures/clients/index.tstest/e2e-scenario/fixtures/clients/sandbox.tstest/e2e-scenario/fixtures/e2e-test.tstest/e2e-scenario/fixtures/phases/lifecycle.tstest/e2e-scenario/fixtures/phases/runtime.tstest/e2e-scenario/fixtures/phases/state-validation.tstest/e2e-scenario/live/sandbox-survival.test.tstest/e2e-scenario/support-tests/e2e-clients.test.tstest/e2e-scenario/support-tests/e2e-phase-lifecycle.test.tstest/e2e-scenario/support-tests/e2e-phase-runtime.test.tstest/e2e-scenario/support-tests/e2e-phase-state-validation.test.ts
| const text = `${result.stdout}\n${result.stderr}`; | ||
| if (!/connected/i.test(text) || !new RegExp(gatewayName, "i").test(text)) { |
There was a problem hiding this comment.
Tighten the connected-state check.
/connected/i also matches "disconnected", so waitForGatewayConnected() in test/e2e-scenario/fixtures/phases/lifecycle.ts:281-310 can stop retrying while the gateway is still offline. new RegExp(gatewayName, "i") also treats the gateway name as regex syntax instead of a literal string.
Proposed fix
const text = `${result.stdout}\n${result.stderr}`;
- if (!/connected/i.test(text) || !new RegExp(gatewayName, "i").test(text)) {
+ const escapedGatewayName = gatewayName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
+ if (
+ !/\bconnected\b/i.test(text) ||
+ /\bdisconnected\b/i.test(text) ||
+ !new RegExp(`\\b${escapedGatewayName}\\b`, "i").test(text)
+ ) {
throw new Error(`openshell status did not report connected gateway '${gatewayName}'.`);
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const text = `${result.stdout}\n${result.stderr}`; | |
| if (!/connected/i.test(text) || !new RegExp(gatewayName, "i").test(text)) { | |
| const text = `${result.stdout}\n${result.stderr}`; | |
| const escapedGatewayName = gatewayName.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); | |
| if ( | |
| !/\bconnected\b/i.test(text) || | |
| /\bdisconnected\b/i.test(text) || | |
| !new RegExp(`\\b${escapedGatewayName}\\b`, "i").test(text) | |
| ) { | |
| throw new Error(`openshell status did not report connected gateway '${gatewayName}'.`); | |
| } |
🧰 Tools
🪛 ast-grep (0.43.0)
[warning] 159-159: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(gatewayName, "i")
Note: [CWE-1333] Inefficient Regular Expression Complexity
(regexp-from-variable)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/e2e-scenario/fixtures/clients/gateway.ts` around lines 159 - 160, The
connected-state check is too loose: `/connected/i` matches "disconnected" and
`new RegExp(gatewayName, "i")` treats the gateway name as a regex. In the
function that builds the `text` check (used by waitForGatewayConnected()),
replace the `/connected/i` test with a literal-word or bounded match (e.g., use
a word-boundary or explicit token match for "connected" so "disconnected" does
not pass) and replace the `new RegExp(gatewayName, "i")` test with a literal,
case-insensitive substring match by normalizing both `text` and `gatewayName`
(or escape the gatewayName before constructing a RegExp) to ensure the gateway
name is matched as a plain string.
Source: Linters/SAST tools
| const status = await this.host.command("openshell", ["status"], { | ||
| artifactName: `lifecycle-gateway-health-${attempt}`, | ||
| env: buildAvailabilityProbeEnv(), | ||
| timeoutMs: 30_000, | ||
| }); | ||
| assertExitZero(status, "openshell status"); | ||
| if (/connected/i.test(`${status.stdout}\n${status.stderr}`)) return; |
There was a problem hiding this comment.
Don't treat openshell status "disconnected" output as healthy.
The fallback path uses /connected/i, which also matches strings like disconnected, so this method can return before the gateway has actually recovered. Tighten the check to the exact healthy token/line that openshell status emits.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/e2e-scenario/fixtures/phases/lifecycle.ts` around lines 295 - 301, The
health check is incorrectly treating "disconnected" as healthy because it uses
/connected/i; change the check after running this.host.command("openshell",
["status"]) (the variable status and the assertExitZero("openshell status")
block) to verify the exact healthy token emitted by openshell status (e.g.,
match the line that equals "connected" or use a word-boundary/line-anchored
check rather than /connected/i) so only a true "connected" result passes the
probe; keep using status.stdout/status.stderr and buildAvailabilityProbeEnv()
unchanged.
| function chatCompletionText(json: unknown): string { | ||
| if (!json || typeof json !== "object") return ""; | ||
| const choices = (json as { choices?: unknown }).choices; | ||
| if (!Array.isArray(choices)) return ""; | ||
| const parts: string[] = []; | ||
| for (const choice of choices) { | ||
| if (!choice || typeof choice !== "object") continue; | ||
| const message = (choice as { message?: unknown }).message; | ||
| if (message && typeof message === "object") { | ||
| const content = (message as { content?: unknown; reasoning_content?: unknown }).content; | ||
| const reasoning = (message as { reasoning_content?: unknown }).reasoning_content; | ||
| if (typeof content === "string") parts.push(content); | ||
| if (typeof reasoning === "string") parts.push(reasoning); | ||
| } | ||
| const text = (choice as { text?: unknown }).text; | ||
| if (typeof text === "string") parts.push(text); | ||
| } | ||
| return parts.join("\n"); |
There was a problem hiding this comment.
Don't let reasoning_content satisfy the PONG check by itself.
chatCompletionText() merges visible output and reasoning into one string, and expectInferenceLocalPong() only looks for PONG anywhere in that merged text. A reply like content: "OK" plus reasoning_content: "Need to answer PONG" now passes even though the final assistant output is wrong. For this probe, assert against content/text first and only fall back to reasoning_content if the API genuinely omits a final message field.
Also applies to: 328-349
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/e2e-scenario/fixtures/phases/runtime.ts` around lines 188 - 205,
chatCompletionText currently merges visible output (message.content /
choice.text) and message.reasoning_content indiscriminately, letting
reasoning_content alone satisfy probes like expectInferenceLocalPong; update
chatCompletionText to collect visibleParts (message.content and choice.text) and
reasoningParts (message.reasoning_content) separately and return
visibleParts.join("\n") if any visibleParts exist, otherwise fall back to
reasoningParts.join("\n") so reasoning_content is only used when the API truly
omitted a final visible message; apply the same change to the analogous block
around lines 328-349.
| expect(resultText(afterDestroyList), "sandbox still listed after destroy").not.toMatch( | ||
| new RegExp(`(^|\\s)${SANDBOX_NAME}(\\s|$)`, "m"), |
There was a problem hiding this comment.
Escape SANDBOX_NAME before interpolating it into the regex.
SANDBOX_NAME comes from the environment, so characters like ., [, or + change the match and can make the post-destroy assertion pass or fail incorrectly.
🧰 Tools
🪛 ast-grep (0.43.0)
[warning] 282-282: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp((^|\\s)${SANDBOX_NAME}(\\s|$), "m")
Note: [CWE-1333] Inefficient Regular Expression Complexity
(regexp-from-variable)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/e2e-scenario/live/sandbox-survival.test.ts` around lines 282 - 283, The
test interpolates SANDBOX_NAME directly into a RegExp causing special chars to
be treated as regex meta-characters; update the assertion around
resultText(afterDestroyList) to escape SANDBOX_NAME first (e.g., add a
regexEscape helper or inline replace like
SANDBOX_NAME.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')) and then build the RegExp
from the escaped value so the check reliably matches the literal sandbox name.
Source: Linters/SAST tools
There was a problem hiding this comment.
♻️ Duplicate comments (1)
test/e2e-scenario/live/sandbox-survival.test.ts (1)
338-341:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winEscape
SANDBOX_NAMEbefore building the regex.This still interpolates the env-provided sandbox name directly into
new RegExp(...), so characters like.,[, or+change the match and can make the post-destroy assertion pass or fail incorrectly. Build the pattern from an escaped literal instead.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e-scenario/live/sandbox-survival.test.ts` around lines 338 - 341, The test builds a RegExp from the env-provided SANDBOX_NAME which can contain regex metacharacters; change the assertion to use an escaped literal version of SANDBOX_NAME before constructing new RegExp so characters like . or + are treated literally. Add or reuse an escape function (e.g., escapeRegExp) and apply it to SANDBOX_NAME, then use the escaped value in the expect(...).not.toMatch(new RegExp(...)) call to ensure the post-destroy assertion matches the literal sandbox name.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@test/e2e-scenario/live/sandbox-survival.test.ts`:
- Around line 338-341: The test builds a RegExp from the env-provided
SANDBOX_NAME which can contain regex metacharacters; change the assertion to use
an escaped literal version of SANDBOX_NAME before constructing new RegExp so
characters like . or + are treated literally. Add or reuse an escape function
(e.g., escapeRegExp) and apply it to SANDBOX_NAME, then use the escaped value in
the expect(...).not.toMatch(new RegExp(...)) call to ensure the post-destroy
assertion matches the literal sandbox name.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 981506cb-34ad-4fdd-8bc0-ae9d70cca78d
📒 Files selected for processing (1)
test/e2e-scenario/live/sandbox-survival.test.ts
PR Review AdvisorFindings: 3 needs attention, 9 worth checking, 0 nice ideas Review findings🛠️ Needs attention
🔎 Worth checking
🌱 Nice ideas
Consider writing more tests for
Since last review detailsCurrent findings:
This is an automated advisory review. A human maintainer must make the final merge decision. |
Vitest E2E Scenario Results — ❌ Some jobs failedRun: 27420633022
|
Vitest E2E Scenario Results — ✅ All jobs passedRun: 27421563168
|
Vitest E2E Scenario Results — ✅ All jobs passedRun: 27421927021
|
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/e2e-vitest-scenarios.yaml (1)
1316-1318:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMove
DOCKER_CONFIGout of the workspace.Line 1317 writes Docker credentials under
${{ github.workspace }}even though this lane runs the real installer/Docker flow. If any step builds from the repo root,.docker-config-sandbox-survival/config.jsonis part of the build context and can leak the Docker Hub token into the build or image. Cleanup at Lines 1392-1397 happens too late to prevent that. Mirror the safer pattern already used at Lines 1237-1238 and keepDOCKER_CONFIGunder${{ runner.temp }}instead.🔒 Proposed fix
sandbox-survival-vitest: needs: [validate-jobs, generate-matrix] if: ${{ (inputs.jobs == '' && inputs.scenarios == '') || contains(format(',{0},', inputs.jobs), ',sandbox-survival-vitest,') || contains(format(',{0},', inputs.scenarios), ',sandbox-survival,') }} runs-on: ubuntu-latest timeout-minutes: 30 env: - DOCKER_CONFIG: ${{ github.workspace }}/.docker-config-sandbox-survival + DOCKER_CONFIG: ${{ runner.temp }}/docker-config-sandbox-survival E2E_ARTIFACT_DIR: ${{ github.workspace }}/e2e-artifacts/vitest/sandbox-survival NEMOCLAW_RUN_E2E_SCENARIOS: "1"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/e2e-vitest-scenarios.yaml around lines 1316 - 1318, Change the DOCKER_CONFIG env to place the Docker config under the runner's temp directory instead of the workspace: replace DOCKER_CONFIG: ${{ github.workspace }}/.docker-config-sandbox-survival with DOCKER_CONFIG: ${{ runner.temp }}/.docker-config-sandbox-survival so the created .docker-config-sandbox-survival/config.json is not included in repo build contexts; leave E2E_ARTIFACT_DIR as-is and ensure any existing cleanup steps still remove files under ${{ runner.temp }}/.docker-config-sandbox-survival if they reference the old path.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In @.github/workflows/e2e-vitest-scenarios.yaml:
- Around line 1316-1318: Change the DOCKER_CONFIG env to place the Docker config
under the runner's temp directory instead of the workspace: replace
DOCKER_CONFIG: ${{ github.workspace }}/.docker-config-sandbox-survival with
DOCKER_CONFIG: ${{ runner.temp }}/.docker-config-sandbox-survival so the created
.docker-config-sandbox-survival/config.json is not included in repo build
contexts; leave E2E_ARTIFACT_DIR as-is and ensure any existing cleanup steps
still remove files under ${{ runner.temp }}/.docker-config-sandbox-survival if
they reference the old path.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 557ae478-6697-42e2-84f6-81ba543c406d
📒 Files selected for processing (1)
.github/workflows/e2e-vitest-scenarios.yaml
Vitest E2E Scenario Results — ✅ All jobs passedRun: 27423404351
|
Summary
Migrate
test/e2e/test-sandbox-survival.shwith equivalent live Vitest coverage while extracting small reusable sandbox/gateway/state/runtime helpers into the existing fixture homes.Related Issues
Refs #5098
Refs #486
Refs #888
Refs #859
Refs #1086
Contract mapping
test/e2e-scenario/live/sandbox-survival.test.tsrunsbash install.sh --non-interactivewithNEMOCLAW_SANDBOX_NAME=e2e-survival.HostCliClient.expectListed/expectStatus,SandboxClient.expectListed, andStateValidationPhaseFixture.expectLocalRegistryContainsassertions in the live test.nemoclaw list,nemoclaw <name> status,openshell sandbox list,~/.nemoclaw/sandboxes.json.LifecyclePhaseFixture.restartGatewayRuntime()pluswaitForGatewayConnected()./sandbox/.openclawmarker state, and live inference survive restart.RuntimePhaseFixture.expectInferenceLocalPong()before and after restart.https://inference.local/v1/chat/completions.HostCliClient.cleanupSandbox()followed by post-destroy list assertion.nemoclaw <name> destroy --yes.Simplicity check
nightly-e2e.yamljobsandbox-survival-e2e, reusablee2e-script.yaml,runs-on: ubuntu-latest, Docker/OpenShell,NVIDIA_API_KEY,GITHUB_TOKEN, 30-minute timeout.e2e-vitest-scenarios.yamljobsandbox-survival-vitest,runs-on: ubuntu-latest, Docker/OpenShell/install.sh,NVIDIA_API_KEY, 30-minute timeout.clients/host.ts,clients/sandbox.ts,clients/gateway.ts,phases/lifecycle.ts,phases/state-validation.ts, andphases/runtime.ts; local helper was insufficient because Phase 4 dependent scripts reuse the same sandbox/gateway/state/inference boundaries.sandbox-survival-vitestjob and selector aliases; legacy shell lane deletion deferred to Epic: Migrate legacy bash E2E into the Vitest E2E system #5098 Phase 11.gh workflow run e2e-vitest-scenarios.yaml --repo NVIDIA/NemoClaw --ref e2e-migrate/test-sandbox-survival -f jobs=sandbox-survival-vitest -f pr_number=5332Verification
npm ci --ignore-scriptsnpm run build:clinpm run typecheck:clinpx vitest run --project e2e-vitest-support test/e2e-scenario/support-tests/e2e-scenarios-workflow.test.ts test/e2e-scenario/support-tests/e2e-phase-lifecycle.test.ts test/e2e-scenario/support-tests/e2e-phase-runtime.test.ts test/e2e-scenario/support-tests/e2e-phase-state-validation.test.ts test/e2e-scenario/support-tests/e2e-clients.test.ts --reporter=defaultNEMOCLAW_RUN_E2E_SCENARIOS=1 npx vitest run --project e2e-scenarios-live test/e2e-scenario/live/sandbox-survival.test.ts --reporter=default(skips locally without live secret)git diff --checkTest (CLI)was attempted and failed on unrelated local environment prerequisites (nemoclaw/node_modules/json5missing,nemoclaw/dist/blueprint/private-networks.jsmissing, and existing local-platform timeouts/assertions); commit was created with--no-verifyafter the focused validations above passed.Summary by CodeRabbit
Note: This release contains internal testing infrastructure improvements with no direct user-facing changes.