fix(test): repair four nightly E2E test failures#2367
Conversation
The snapshot-commands E2E test uses `$(nemoclaw ... 2>&1)` inside `set -euo pipefail`, which causes the shell to exit immediately when the command fails — before the captured output is ever printed. This has been masking the actual error message for every Phase 3+ failure. Changes: - Add `run_capture` helper that captures both output and exit code without triggering set -e on non-zero exit - Replace all bare `$(nemoclaw ... 2>&1)` calls with run_capture - Add Phase 2b pre-snapshot diagnostics: registry state, sandbox list, docker containers, stale lock check - Enrich fail() diagnostics with registry contents, lock state, docker ps, node version The test will still fail at the same point, but now the actual error message from nemoclaw will be visible in CI logs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
snapshot-commands-e2e: the grep assertion looked for the literal substring "Snapshot created" but #2184 changed the CLI output to "Snapshot v1 created" (version inserted). Use "Snapshot.*created" to match the new format. deployment-services-e2e: SANDBOX_NAME and LOG_FILE were never defined, crashing immediately on `set -u`. Add the missing variable definitions and pass NEMOCLAW_SANDBOX_NAME in the workflow. network-policy-e2e: after `destroy --yes` the registry entry lingers in a not-ready state, so re-onboard fails with "already exists but is not ready." Add NEMOCLAW_RECREATE_SANDBOX=1 to the onboard call so it overwrites the stale entry. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The onboard_sandbox() function explicitly sets its own env vars, so the workflow-level NEMOCLAW_RECREATE_SANDBOX was not reaching the nemoclaw onboard call. Add it inline like the network-policy test fix. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Address CodeRabbit review: the Phase 9 help check used run_capture but skipped the exit code check, so a non-zero exit that still printed help text would be a false pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two unrelated failures (Landlock /sandbox writability regression, CLI/docs command reference drift) need separate investigation. Skip the job to stop it from noise-gating the nightly run. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughGuard conditional cloud E2E job in the nightly workflow; add explicit sandbox naming and force sandbox recreation in deployment/network tests; enhance snapshot tests with a capture helper, richer diagnostics, and more robust success/failure checks. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/test-snapshot-commands.sh`:
- Around line 58-65: The run_capture function currently uses eval to assign the
captured output into a variable name, which re-parses and can execute shell
metacharacters; change the assignment to use printf -v to set the by-name
variable safely (preserving the existing _CAPTURE_RC behavior and the local
_output capture) so that outputs containing $(), backticks, or ${...} are not
re-evaluated; update references to run_capture, _CAPTURE_RC, and _output
accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: c8288094-65e0-4b80-90a6-ed747a21a140
📒 Files selected for processing (4)
.github/workflows/nightly-e2e.yamltest/e2e/test-deployment-services.shtest/e2e/test-network-policy.shtest/e2e/test-snapshot-commands.sh
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/nightly-e2e.yaml:
- Around line 78-80: The job uses a constant disabled condition `if: false`
which actionlint flags; replace it with a variable-gated check like the existing
pattern (`vars.GPU_E2E_ENABLED`) by switching `if: false` to a repo-variable
condition that references a new or existing variable (e.g.,
`vars.CLOUD_EXPERIMENTAL_E2E_ENABLED`) so the job stays disabled by default but
avoids the constant-expression error; update the workflow to check that variable
and document that setting `CLOUD_EXPERIMENTAL_E2E_ENABLED` to 'true' in
repository settings will re-enable the job.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 731d71d4-0e2c-4ea6-8c59-8c4abf5589a2
📒 Files selected for processing (4)
.github/workflows/nightly-e2e.yamltest/e2e/test-deployment-services.shtest/e2e/test-network-policy.shtest/e2e/test-snapshot-commands.sh
🚧 Files skipped from review as they are similar to previous changes (2)
- test/e2e/test-network-policy.sh
- test/e2e/test-snapshot-commands.sh
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
prekshivyas
left a comment
There was a problem hiding this comment.
Review
Good intent — the snapshot grep fix and run_capture helper are clearly correct. Flagging a few concerns before merge.
Blocking
1. No nightly-e2e run on this branch. gh run list --branch fix/nightly-e2e-repairs-v4 --workflow nightly-e2e.yaml returns empty. Every fix here is load-bearing for nightly, but none are empirically validated. Please trigger a run and wait for deployment-services-e2e, network-policy-e2e, and snapshot-commands-e2e to all go green before merge.
2. On-PR checks job is failing — Files were modified by following hooks ... Failed. Hook modified files despite the PR body claiming shfmt -i 2 -ci and prettier --check clean. Likely a formatter version mismatch. Needs resolution.
Concern worth validating
3. deployment-services-e2e may hit the openshell k8s delete+create race. In my open PR #2313 I set workflow-level NEMOCLAW_RECREATE_SANDBOX=1 for this job (run 24811493024) and it failed with:
✓ Deleted sandbox e2e-deploy
Creating sandbox 'e2e-deploy' (this takes a few minutes on first run)...
[~2 min image rebuild]
× sandbox 'e2e-deploy' already exists
FATAL: Onboard failed for 'e2e-deploy'
Root cause: install.sh creates the sandbox Ready, then the test's onboard with RECREATE=1 deletes + rebuilds + creates — openshell retains residual state across the rebuild window. I had to narrow the env var to only network-policy-e2e (where install.sh leaves it not-ready anyway, so the race doesn't apply). PR #2367 re-enables the same pattern; the e2e-deploy → e2e-deploy-svc rename doesn't address the race.
Ask: did you observe this race in any manual testing? If not, the nightly run in concern #1 will surface it.
Merge coordination
Both this PR and #2313 touch:
.github/workflows/nightly-e2e.yaml(different concerns — #2313 narrowed env vars, this adds them; conflict likely)test/e2e/test-network-policy.sh(same inlineNEMOCLAW_RECREATE_SANDBOX=1added — should merge cleanly)test/e2e/test-deployment-services.sh(different hunks — likely clean)test/e2e/test-snapshot-commands.sh(both modify around Phase 3 — conflict likely)
Whichever merges first, the other needs a rebase. Happy to coordinate — #2313 has two green validation runs (24812247684 + 24813062692).
Minor
if: falseoncloud-experimental-e2ewith no tracking issue reference. Who remembers to re-enable? Please link OpenShell#905 for the Landlock side and a docs-drift tracking issue.e2e-deploy→e2e-deploy-svcrename isn't explained in the PR body. Intentional or incidental?
Signed-off-by: Prekshi Vyas prekshiv@nvidia.com
There was a problem hiding this comment.
♻️ Duplicate comments (1)
test/e2e/test-snapshot-commands.sh (1)
63-70:⚠️ Potential issue | 🟠 MajorReplace
evalinrun_captureassignment.Line [69] reparses captured output as shell code. That’s unsafe and can mis-handle output containing shell metacharacters.
🛠️ Safer Bash-native fix
run_capture() { local _var_name="$1" shift _CAPTURE_RC=0 local _output _output=$("$@" 2>&1) || _CAPTURE_RC=$? - eval "${_var_name}=\${_output}" + printf -v "$_var_name" '%s' "$_output" }#!/bin/bash # Verify whether run_capture still uses eval assignment. rg -n -A3 -B3 'run_capture\(\)|eval "\$\{_var_name\}=\$\{_output\}"|printf -v "\$_var_name"' test/e2e/test-snapshot-commands.sh🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-snapshot-commands.sh` around lines 63 - 70, In run_capture, avoid re-parsing captured output with eval; instead assign the captured string to the caller-provided variable name using a safe builtin (e.g., printf -v) rather than eval "${_var_name}=${_output}"; keep the existing logic that captures stdout+stderr into _output and sets _CAPTURE_RC, then use printf -v "$_var_name" '%s' "$_output" (or an equivalent safe declare/indirect assignment) so special characters aren't reinterpreted; update references in run_capture accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@test/e2e/test-snapshot-commands.sh`:
- Around line 63-70: In run_capture, avoid re-parsing captured output with eval;
instead assign the captured string to the caller-provided variable name using a
safe builtin (e.g., printf -v) rather than eval "${_var_name}=${_output}"; keep
the existing logic that captures stdout+stderr into _output and sets
_CAPTURE_RC, then use printf -v "$_var_name" '%s' "$_output" (or an equivalent
safe declare/indirect assignment) so special characters aren't reinterpreted;
update references in run_capture accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 7a18a0da-3a90-4396-bd98-d95e8dc045a7
📒 Files selected for processing (2)
test/e2e/test-deployment-services.shtest/e2e/test-snapshot-commands.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- test/e2e/test-deployment-services.sh
- Replace eval with printf -v in run_capture to avoid shell injection - Replace if: false with vars.CLOUD_EXPERIMENTAL_E2E_ENABLED gate to avoid actionlint constant-expression error Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@prekshivyas Thanks for the thorough review. Responding to each point: 1. No nightly-e2e run on this branchThe earlier version of this PR (before the formatting fix) ran nightly-e2e as run 2.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
test/e2e/test-snapshot-commands.sh (1)
49-53: Refactor duplicated diagnostics into a single helper.The same registry/docker/binary diagnostics are now emitted in multiple places. Consolidating into one helper reduces drift and keeps future debug updates consistent.
♻️ Suggested DRY refactor
+print_runtime_diagnostics() { + local _stream="${1:-stdout}" + local _redir="/dev/stdout" + [ "$_stream" = "stderr" ] && _redir="/dev/stderr" + + { + echo -e "${YELLOW}[DIAG]${NC} Sandboxes: $(openshell sandbox list 2>&1 || echo 'unavailable')" + echo -e "${YELLOW}[DIAG]${NC} Registry: $(cat "$HOME/.nemoclaw/sandboxes.json" 2>&1 || echo 'not found')" + echo -e "${YELLOW}[DIAG]${NC} Registry lock: $(ls -la "$HOME/.nemoclaw/sandboxes.json.lock" 2>&1 || echo 'no lock')" + echo -e "${YELLOW}[DIAG]${NC} Docker ps: $(docker ps --format '{{.Names}} {{.Status}}' 2>&1 || echo 'unavailable')" + echo -e "${YELLOW}[DIAG]${NC} nemoclaw path: $(command -v nemoclaw 2>&1 || echo 'not found')" + echo -e "${YELLOW}[DIAG]${NC} node version: $(node --version 2>&1 || echo 'not found')" + } >"$_redir" +} + fail() { echo -e "${RED}[FAIL]${NC} $1" >&2 echo -e "${YELLOW}[DIAG]${NC} --- Failure diagnostics ---" >&2 - ... + print_runtime_diagnostics stderr echo -e "${YELLOW}[DIAG]${NC} --- End diagnostics ---" >&2 exit 1 } # Phase 2b info "Phase 2b: Pre-snapshot diagnostics..." -... +print_runtime_diagnosticsAlso applies to: 124-138
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-snapshot-commands.sh` around lines 49 - 53, Introduce a single helper function (e.g., dump_diagnostics) that emits the repeated diagnostic lines (registry file cat, registry lock ls, docker ps, command -v nemoclaw, node --version) and prints to stderr with the same color variables (${YELLOW}, ${NC}); then replace each duplicated block (the echo -e ... lines currently duplicated) with a single call to dump_diagnostics so all diagnostics are centralized and consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/test-deployment-services.sh`:
- Around line 158-160: Replace the hardcoded GitHub download URL in the curl
command with a configurable environment variable (e.g., CLOUDFLARED_URL) and
update the conditional that downloads and installs cloudflared (the
curl/chmod/sudo mv block that writes to /usr/local/bin/cloudflared) to use that
variable; validate the variable before using it (exit with an error if empty)
and optionally keep a documented, approved fallback source, so the script no
longer contains a direct third-party repository link.
---
Nitpick comments:
In `@test/e2e/test-snapshot-commands.sh`:
- Around line 49-53: Introduce a single helper function (e.g., dump_diagnostics)
that emits the repeated diagnostic lines (registry file cat, registry lock ls,
docker ps, command -v nemoclaw, node --version) and prints to stderr with the
same color variables (${YELLOW}, ${NC}); then replace each duplicated block (the
echo -e ... lines currently duplicated) with a single call to dump_diagnostics
so all diagnostics are centralized and consistent.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: b7118fc5-b756-421e-9359-425b99810d93
📒 Files selected for processing (3)
.github/workflows/nightly-e2e.yamltest/e2e/test-deployment-services.shtest/e2e/test-snapshot-commands.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/nightly-e2e.yaml
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (2)
test/e2e/test-snapshot-commands.sh (1)
49-53: Optional: centralize diagnostics to avoid drift.The same probes are now split across
fail()and Phase 2b. A shared helper would keep diagnostics consistent as the script evolves.Also applies to: 124-139
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-snapshot-commands.sh` around lines 49 - 53, Split-out duplicate diagnostic echo commands into a single helper and call it from both fail() and the Phase 2b block: create a function (e.g., print_diagnostics or dump_diagnostics) that runs the registry, lock, docker, nemoclaw path and node version probes currently shown in the snippet, replace the duplicated echo block in the Phase 2b section and the fail() function to call that helper so diagnostics are centralized and consistent across the script.test/e2e/test-deployment-services.sh (1)
201-202: Consider making forced sandbox recreation overrideable.Hardcoding recreation is fine for CI stability, but allowing an env override keeps local repro/debug flows less destructive.
♻️ Suggested tweak
- NEMOCLAW_RECREATE_SANDBOX=1 \ + NEMOCLAW_RECREATE_SANDBOX="${NEMOCLAW_RECREATE_SANDBOX:-1}" \🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-deployment-services.sh` around lines 201 - 202, The script currently forces sandbox recreation by hardcoding NEMOCLAW_RECREATE_SANDBOX=1 before calling run_with_timeout and nemoclaw onboard; change this to respect an external override by only setting NEMOCLAW_RECREATE_SANDBOX to 1 when it is not already defined (i.e., default to 1 but allow users to export NEMOCLAW_RECREATE_SANDBOX=0 to skip recreation), leaving the run_with_timeout and nemoclaw onboard invocation (function/command names: run_with_timeout, nemoclaw onboard) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@test/e2e/test-deployment-services.sh`:
- Around line 201-202: The script currently forces sandbox recreation by
hardcoding NEMOCLAW_RECREATE_SANDBOX=1 before calling run_with_timeout and
nemoclaw onboard; change this to respect an external override by only setting
NEMOCLAW_RECREATE_SANDBOX to 1 when it is not already defined (i.e., default to
1 but allow users to export NEMOCLAW_RECREATE_SANDBOX=0 to skip recreation),
leaving the run_with_timeout and nemoclaw onboard invocation (function/command
names: run_with_timeout, nemoclaw onboard) unchanged.
In `@test/e2e/test-snapshot-commands.sh`:
- Around line 49-53: Split-out duplicate diagnostic echo commands into a single
helper and call it from both fail() and the Phase 2b block: create a function
(e.g., print_diagnostics or dump_diagnostics) that runs the registry, lock,
docker, nemoclaw path and node version probes currently shown in the snippet,
replace the duplicated echo block in the Phase 2b section and the fail()
function to call that helper so diagnostics are centralized and consistent
across the script.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: b2e8ac3c-e235-47d3-b503-e212f19e27de
📒 Files selected for processing (2)
test/e2e/test-deployment-services.shtest/e2e/test-snapshot-commands.sh
Move the hardcoded GitHub URL into a CLOUDFLARED_DOWNLOAD_URL env var with the current URL as default. Addresses the no-third-party-links coding guideline flagged by CodeRabbit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Two items before merge: 1. 2. # DISABLED: Landlock /sandbox writability regression + CLI/docs command
# reference drift. Re-enable once both are fixed.Either (a) try re-enabling the job (flip |
|
Update on the OpenShell v0.0.36 was released 2026-04-23 15:18 UTC and includes OpenShell#910 "preserve explicit read-only baseline paths", which fixes OpenShell#905 — the So the Landlock half of the reason this job is disabled has an upstream fix available now. NemoClaw is currently pinned to Suggested follow-up (separate PR, not blocking this one):
The docs-drift half is independent of the OpenShell version, so re-enabling still needs that piece resolved first. Flagging for visibility so the job doesn't stay disabled longer than necessary. Signed-off-by: Prekshi Vyas prekshiv@nvidia.com |
Merge main into fix/nightly-e2e-repairs-v4, resolving conflicts: - nightly-e2e.yaml: keep contains(needs.*.result, 'failure') from #2371, remove cloud-experimental-e2e from notify-on-failure needs array - test-snapshot-commands.sh: keep exit-code check from this PR + tolerant grep regex from #2314 Address reviewer feedback: - Drop NEMOCLAW_RECREATE_SANDBOX from deployment-services to avoid re-introducing the delete+create race fixed in #2313 (@prekshivyas) - Update cloud-experimental-e2e disable comment: drop stale Landlock mention (fixed in OpenShell#810 v0.0.32+), reference only docs-drift (@prekshivyas) - Extract dump_diagnostics() in test-snapshot-commands.sh to centralize duplicate probes from fail() and Phase 2b (CodeRabbit)
…ds list) Merge main into fix/nightly-e2e-repairs-v4, resolving the conflict in .github/workflows/nightly-e2e.yaml: - Keep multi-line needs array format (PR readability) - Restore cloud-experimental-e2e in needs list (from main) - Adopt cancelled-job notification condition (from #2379) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Re-applies the E2E fixes from #2351 (reverted due to formatting violations). Formatting verified with both
shfmt -i 2 -ciandprettier.Changes
run_capturehelper, diagnostics, fix grep assertion, exit code checkNEMOCLAW_RECREATE_SANDBOXNEMOCLAW_RECREATE_SANDBOX=1to onboardTest plan
shfmt -i 2 -ci -dcleanprettier --checkcleanshellcheckcleanSummary by CodeRabbit
Tests
Chores