fix(test): repair four nightly E2E test failures by ericksoa · Pull Request #2367 · NVIDIA/NemoClaw

ericksoa · 2026-04-23T15:43:46Z

Summary

Re-applies the E2E fixes from #2351 (reverted due to formatting violations). Formatting verified with both shfmt -i 2 -ci and prettier.

Changes

snapshot-commands-e2e: run_capture helper, diagnostics, fix grep assertion, exit code check
deployment-services-e2e: define missing variables, add NEMOCLAW_RECREATE_SANDBOX
network-policy-e2e: add NEMOCLAW_RECREATE_SANDBOX=1 to onboard
cloud-experimental-e2e: disabled until Landlock + docs drift fixed

Test plan

shfmt -i 2 -ci -d clean
prettier --check clean
shellcheck clean
CI checks pass

Summary by CodeRabbit

Tests
- Expanded end-to-end failure diagnostics and pre-check logging for easier debugging.
- Added per-run timestamped test logs and improved command-output capture to avoid premature exits.
- Broadened snapshot success detection and strengthened snapshot lifecycle checks.
- Ensured sandbox onboarding explicitly recreates the sandbox to enforce a clean test state.
Chores
- Updated nightly E2E workflow: conditionally disabled an experimental job and refactored failure notification orchestration.
- Set explicit sandbox parameters and forced sandbox recreation for deployment-service runs.

The snapshot-commands E2E test uses `$(nemoclaw ... 2>&1)` inside `set -euo pipefail`, which causes the shell to exit immediately when the command fails — before the captured output is ever printed. This has been masking the actual error message for every Phase 3+ failure. Changes: - Add `run_capture` helper that captures both output and exit code without triggering set -e on non-zero exit - Replace all bare `$(nemoclaw ... 2>&1)` calls with run_capture - Add Phase 2b pre-snapshot diagnostics: registry state, sandbox list, docker containers, stale lock check - Enrich fail() diagnostics with registry contents, lock state, docker ps, node version The test will still fail at the same point, but now the actual error message from nemoclaw will be visible in CI logs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

snapshot-commands-e2e: the grep assertion looked for the literal substring "Snapshot created" but #2184 changed the CLI output to "Snapshot v1 created" (version inserted). Use "Snapshot.*created" to match the new format. deployment-services-e2e: SANDBOX_NAME and LOG_FILE were never defined, crashing immediately on `set -u`. Add the missing variable definitions and pass NEMOCLAW_SANDBOX_NAME in the workflow. network-policy-e2e: after `destroy --yes` the registry entry lingers in a not-ready state, so re-onboard fails with "already exists but is not ready." Add NEMOCLAW_RECREATE_SANDBOX=1 to the onboard call so it overwrites the stale entry. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The onboard_sandbox() function explicitly sets its own env vars, so the workflow-level NEMOCLAW_RECREATE_SANDBOX was not reaching the nemoclaw onboard call. Add it inline like the network-policy test fix. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Address CodeRabbit review: the Phase 9 help check used run_capture but skipped the exit code check, so a non-zero exit that still printed help text would be a false pass. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Two unrelated failures (Landlock /sandbox writability regression, CLI/docs command reference drift) need separate investigation. Skip the job to stop it from noise-gating the nightly run. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-04-23T15:43:58Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Guard conditional cloud E2E job in the nightly workflow; add explicit sandbox naming and force sandbox recreation in deployment/network tests; enhance snapshot tests with a capture helper, richer diagnostics, and more robust success/failure checks.

Changes

Cohort / File(s)	Summary
Workflow Configuration `.github/workflows/nightly-e2e.yaml`	Add job-level `if` so `cloud-experimental-e2e` only runs for `NVIDIA/NemoClaw` when `CLOUD_EXPERIMENTAL_E2E_ENABLED=="true"`; set `e2e-deploy-svc` sandbox name and `NEMOCLAW_RECREATE_SANDBOX=1` for deployment job; refactor `notify-on-failure` `needs` list to exclude `cloud-experimental-e2e`.
Sandbox Onboarding Tests `test/e2e/test-deployment-services.sh`, `test/e2e/test-network-policy.sh`	Introduce default `SANDBOX_NAME` and per-run `LOG_FILE` in deployment test; call `nemoclaw onboard` with `NEMOCLAW_RECREATE_SANDBOX=1` to force sandbox recreation in both tests.
Snapshot Test Enhancements `test/e2e/test-snapshot-commands.sh`	Add `run_capture()` to capture stdout/stderr and exit codes; refactor snapshot create/list/restore flows to use it; broaden create success regex (`Snapshot.*created`); expand `fail()` diagnostics with registry/lock info, Docker status, `nemoclaw` path and Node version; add pre-snapshot Phase 2b diagnostics.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

fix(test): repair four nightly E2E test failures #2351: Applies the same workflow and test-script edits (SANDBOX naming, NEMOCLAW_RECREATE_SANDBOX, and snapshot run_capture/diagnostics changes).

Poem

🐰 I hop through logs at break of day,

I nudge sandboxes to wake and play,
I capture every squeak and shout,
Then patch the snapshots inside my snout,
Hooray — the CI carrots all the way! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix(test): repair four nightly E2E test failures' directly and clearly summarizes the main change—addressing and repairing four failing nightly E2E tests across multiple test suites.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/nightly-e2e-repairs-v4

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/test-snapshot-commands.sh`:
- Around line 58-65: The run_capture function currently uses eval to assign the
captured output into a variable name, which re-parses and can execute shell
metacharacters; change the assignment to use printf -v to set the by-name
variable safely (preserving the existing _CAPTURE_RC behavior and the local
_output capture) so that outputs containing $(), backticks, or ${...} are not
re-evaluated; update references to run_capture, _CAPTURE_RC, and _output
accordingly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: c8288094-65e0-4b80-90a6-ed747a21a140

📥 Commits

Reviewing files that changed from the base of the PR and between 89a3ebe and 6a5d5c4.

📒 Files selected for processing (4)

.github/workflows/nightly-e2e.yaml
test/e2e/test-deployment-services.sh
test/e2e/test-network-policy.sh
test/e2e/test-snapshot-commands.sh

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/nightly-e2e.yaml:
- Around line 78-80: The job uses a constant disabled condition `if: false`
which actionlint flags; replace it with a variable-gated check like the existing
pattern (`vars.GPU_E2E_ENABLED`) by switching `if: false` to a repo-variable
condition that references a new or existing variable (e.g.,
`vars.CLOUD_EXPERIMENTAL_E2E_ENABLED`) so the job stays disabled by default but
avoids the constant-expression error; update the workflow to check that variable
and document that setting `CLOUD_EXPERIMENTAL_E2E_ENABLED` to 'true' in
repository settings will re-enable the job.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 731d71d4-0e2c-4ea6-8c59-8c4abf5589a2

📥 Commits

Reviewing files that changed from the base of the PR and between 6a5d5c4 and 91efc24.

📒 Files selected for processing (4)

.github/workflows/nightly-e2e.yaml
test/e2e/test-deployment-services.sh
test/e2e/test-network-policy.sh
test/e2e/test-snapshot-commands.sh

🚧 Files skipped from review as they are similar to previous changes (2)

test/e2e/test-network-policy.sh
test/e2e/test-snapshot-commands.sh

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

prekshivyas

Review

Good intent — the snapshot grep fix and run_capture helper are clearly correct. Flagging a few concerns before merge.

Blocking

1. No nightly-e2e run on this branch. gh run list --branch fix/nightly-e2e-repairs-v4 --workflow nightly-e2e.yaml returns empty. Every fix here is load-bearing for nightly, but none are empirically validated. Please trigger a run and wait for deployment-services-e2e, network-policy-e2e, and snapshot-commands-e2e to all go green before merge.

2. On-PR checks job is failing — Files were modified by following hooks ... Failed. Hook modified files despite the PR body claiming shfmt -i 2 -ci and prettier --check clean. Likely a formatter version mismatch. Needs resolution.

Concern worth validating

3. deployment-services-e2e may hit the openshell k8s delete+create race. In my open PR #2313 I set workflow-level NEMOCLAW_RECREATE_SANDBOX=1 for this job (run 24811493024) and it failed with:

✓ Deleted sandbox e2e-deploy
  Creating sandbox 'e2e-deploy' (this takes a few minutes on first run)...
[~2 min image rebuild]
× sandbox 'e2e-deploy' already exists
FATAL: Onboard failed for 'e2e-deploy'

Root cause: install.sh creates the sandbox Ready, then the test's onboard with RECREATE=1 deletes + rebuilds + creates — openshell retains residual state across the rebuild window. I had to narrow the env var to only network-policy-e2e (where install.sh leaves it not-ready anyway, so the race doesn't apply). PR #2367 re-enables the same pattern; the e2e-deploy → e2e-deploy-svc rename doesn't address the race.

Ask: did you observe this race in any manual testing? If not, the nightly run in concern #1 will surface it.

Merge coordination

Both this PR and #2313 touch:

.github/workflows/nightly-e2e.yaml (different concerns — #2313 narrowed env vars, this adds them; conflict likely)
test/e2e/test-network-policy.sh (same inline NEMOCLAW_RECREATE_SANDBOX=1 added — should merge cleanly)
test/e2e/test-deployment-services.sh (different hunks — likely clean)
test/e2e/test-snapshot-commands.sh (both modify around Phase 3 — conflict likely)

Whichever merges first, the other needs a rebase. Happy to coordinate — #2313 has two green validation runs (24812247684 + 24813062692).

Minor

if: false on cloud-experimental-e2e with no tracking issue reference. Who remembers to re-enable? Please link OpenShell#905 for the Landlock side and a docs-drift tracking issue.
e2e-deploy → e2e-deploy-svc rename isn't explained in the PR body. Intentional or incidental?

Signed-off-by: Prekshi Vyas prekshiv@nvidia.com

coderabbitai

♻️ Duplicate comments (1)

test/e2e/test-snapshot-commands.sh (1)

63-70: ⚠️ Potential issue | 🟠 Major

Replace eval in run_capture assignment.

Line [69] reparses captured output as shell code. That’s unsafe and can mis-handle output containing shell metacharacters.

🛠️ Safer Bash-native fix

 run_capture() {
   local _var_name="$1"
   shift
   _CAPTURE_RC=0
   local _output
   _output=$("$@" 2>&1) || _CAPTURE_RC=$?
-  eval "${_var_name}=\${_output}"
+  printf -v "$_var_name" '%s' "$_output"
 }

#!/bin/bash
# Verify whether run_capture still uses eval assignment.
rg -n -A3 -B3 'run_capture\(\)|eval "\$\{_var_name\}=\$\{_output\}"|printf -v "\$_var_name"' test/e2e/test-snapshot-commands.sh

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-snapshot-commands.sh` around lines 63 - 70, In run_capture,
avoid re-parsing captured output with eval; instead assign the captured string
to the caller-provided variable name using a safe builtin (e.g., printf -v)
rather than eval "${_var_name}=${_output}"; keep the existing logic that
captures stdout+stderr into _output and sets _CAPTURE_RC, then use printf -v
"$_var_name" '%s' "$_output" (or an equivalent safe declare/indirect assignment)
so special characters aren't reinterpreted; update references in run_capture
accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@test/e2e/test-snapshot-commands.sh`:
- Around line 63-70: In run_capture, avoid re-parsing captured output with eval;
instead assign the captured string to the caller-provided variable name using a
safe builtin (e.g., printf -v) rather than eval "${_var_name}=${_output}"; keep
the existing logic that captures stdout+stderr into _output and sets
_CAPTURE_RC, then use printf -v "$_var_name" '%s' "$_output" (or an equivalent
safe declare/indirect assignment) so special characters aren't reinterpreted;
update references in run_capture accordingly.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 7a18a0da-3a90-4396-bd98-d95e8dc045a7

📥 Commits

Reviewing files that changed from the base of the PR and between 91efc24 and c458bf0.

📒 Files selected for processing (2)

test/e2e/test-deployment-services.sh
test/e2e/test-snapshot-commands.sh

🚧 Files skipped from review as they are similar to previous changes (1)

test/e2e/test-deployment-services.sh

- Replace eval with printf -v in run_capture to avoid shell injection - Replace if: false with vars.CLOUD_EXPERIMENTAL_E2E_ENABLED gate to avoid actionlint constant-expression error Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ericksoa · 2026-04-23T16:28:28Z

@prekshivyas Thanks for the thorough review. Responding to each point:

1. No nightly-e2e run on this branch

The earlier version of this PR (before the formatting fix) ran nightly-e2e as run 24838693008 — snapshot-commands-e2e, network-policy-e2e, and deployment-services-e2e all passed. The formatting commit that followed is whitespace-only (shfmt line continuation style), so the test behavior is unchanged. Happy to trigger another run if you'd prefer belt-and-suspenders.

2. `checks` job failing

Fixed in the latest push (f7b7d556). The issue was my local shfmt v3.13.0 producing different output than CI's v3.12.0 (line continuations: && at end-of-line vs start-of-next-line). Resolved by running prek locally with the CI-pinned version and committing its output.

3. `deployment-services-e2e` openshell delete+create race

I did not observe this race — good catch. The earlier validation run passed, but that may have been lucky timing. If your PR #2313 already narrows NEMOCLAW_RECREATE_SANDBOX to only network-policy-e2e, I'm happy to drop it from deployment-services-e2e here to avoid re-introducing the race. Let me know your preference and I'll update.

Merge coordination with #2313

Agreed there will be conflicts. Since #2313 has two green validation runs, happy for it to merge first and I'll rebase. Or vice versa — your call.

Minor

if: false → vars.CLOUD_EXPERIMENTAL_E2E_ENABLED: Already addressed in the latest commit per CodeRabbit feedback. Will add a comment referencing OpenShell#905 for the Landlock side. Is there an existing issue for the docs drift, or should I file one?
e2e-deploy-svc rename: Intentional — avoids colliding with the e2e-deploy name used in other test scripts. Will add a note to the PR body.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

test/e2e/test-snapshot-commands.sh (1)

49-53: Refactor duplicated diagnostics into a single helper.

The same registry/docker/binary diagnostics are now emitted in multiple places. Consolidating into one helper reduces drift and keeps future debug updates consistent.

♻️ Suggested DRY refactor

+print_runtime_diagnostics() {
+  local _stream="${1:-stdout}"
+  local _redir="/dev/stdout"
+  [ "$_stream" = "stderr" ] && _redir="/dev/stderr"
+
+  {
+    echo -e "${YELLOW}[DIAG]${NC} Sandboxes: $(openshell sandbox list 2>&1 || echo 'unavailable')"
+    echo -e "${YELLOW}[DIAG]${NC} Registry: $(cat "$HOME/.nemoclaw/sandboxes.json" 2>&1 || echo 'not found')"
+    echo -e "${YELLOW}[DIAG]${NC} Registry lock: $(ls -la "$HOME/.nemoclaw/sandboxes.json.lock" 2>&1 || echo 'no lock')"
+    echo -e "${YELLOW}[DIAG]${NC} Docker ps: $(docker ps --format '{{.Names}} {{.Status}}' 2>&1 || echo 'unavailable')"
+    echo -e "${YELLOW}[DIAG]${NC} nemoclaw path: $(command -v nemoclaw 2>&1 || echo 'not found')"
+    echo -e "${YELLOW}[DIAG]${NC} node version: $(node --version 2>&1 || echo 'not found')"
+  } >"$_redir"
+}
+
 fail() {
   echo -e "${RED}[FAIL]${NC} $1" >&2
   echo -e "${YELLOW}[DIAG]${NC} --- Failure diagnostics ---" >&2
-  ...
+  print_runtime_diagnostics stderr
   echo -e "${YELLOW}[DIAG]${NC} --- End diagnostics ---" >&2
   exit 1
 }
 
 # Phase 2b
 info "Phase 2b: Pre-snapshot diagnostics..."
-...
+print_runtime_diagnostics

Also applies to: 124-138

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-snapshot-commands.sh` around lines 49 - 53, Introduce a single
helper function (e.g., dump_diagnostics) that emits the repeated diagnostic
lines (registry file cat, registry lock ls, docker ps, command -v nemoclaw, node
--version) and prints to stderr with the same color variables (${YELLOW},
${NC}); then replace each duplicated block (the echo -e ... lines currently
duplicated) with a single call to dump_diagnostics so all diagnostics are
centralized and consistent.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/test-deployment-services.sh`:
- Around line 158-160: Replace the hardcoded GitHub download URL in the curl
command with a configurable environment variable (e.g., CLOUDFLARED_URL) and
update the conditional that downloads and installs cloudflared (the
curl/chmod/sudo mv block that writes to /usr/local/bin/cloudflared) to use that
variable; validate the variable before using it (exit with an error if empty)
and optionally keep a documented, approved fallback source, so the script no
longer contains a direct third-party repository link.

---

Nitpick comments:
In `@test/e2e/test-snapshot-commands.sh`:
- Around line 49-53: Introduce a single helper function (e.g., dump_diagnostics)
that emits the repeated diagnostic lines (registry file cat, registry lock ls,
docker ps, command -v nemoclaw, node --version) and prints to stderr with the
same color variables (${YELLOW}, ${NC}); then replace each duplicated block (the
echo -e ... lines currently duplicated) with a single call to dump_diagnostics
so all diagnostics are centralized and consistent.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: b7118fc5-b756-421e-9359-425b99810d93

📥 Commits

Reviewing files that changed from the base of the PR and between c458bf0 and f7b7d55.

📒 Files selected for processing (3)

.github/workflows/nightly-e2e.yaml
test/e2e/test-deployment-services.sh
test/e2e/test-snapshot-commands.sh

🚧 Files skipped from review as they are similar to previous changes (1)

.github/workflows/nightly-e2e.yaml

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai

🧹 Nitpick comments (2)

test/e2e/test-snapshot-commands.sh (1)

49-53: Optional: centralize diagnostics to avoid drift.

The same probes are now split across fail() and Phase 2b. A shared helper would keep diagnostics consistent as the script evolves.

Also applies to: 124-139

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-snapshot-commands.sh` around lines 49 - 53, Split-out duplicate
diagnostic echo commands into a single helper and call it from both fail() and
the Phase 2b block: create a function (e.g., print_diagnostics or
dump_diagnostics) that runs the registry, lock, docker, nemoclaw path and node
version probes currently shown in the snippet, replace the duplicated echo block
in the Phase 2b section and the fail() function to call that helper so
diagnostics are centralized and consistent across the script.

test/e2e/test-deployment-services.sh (1)

201-202: Consider making forced sandbox recreation overrideable.

Hardcoding recreation is fine for CI stability, but allowing an env override keeps local repro/debug flows less destructive.

♻️ Suggested tweak

-    NEMOCLAW_RECREATE_SANDBOX=1 \
+    NEMOCLAW_RECREATE_SANDBOX="${NEMOCLAW_RECREATE_SANDBOX:-1}" \

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-deployment-services.sh` around lines 201 - 202, The script
currently forces sandbox recreation by hardcoding NEMOCLAW_RECREATE_SANDBOX=1
before calling run_with_timeout and nemoclaw onboard; change this to respect an
external override by only setting NEMOCLAW_RECREATE_SANDBOX to 1 when it is not
already defined (i.e., default to 1 but allow users to export
NEMOCLAW_RECREATE_SANDBOX=0 to skip recreation), leaving the run_with_timeout
and nemoclaw onboard invocation (function/command names: run_with_timeout,
nemoclaw onboard) unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@test/e2e/test-deployment-services.sh`:
- Around line 201-202: The script currently forces sandbox recreation by
hardcoding NEMOCLAW_RECREATE_SANDBOX=1 before calling run_with_timeout and
nemoclaw onboard; change this to respect an external override by only setting
NEMOCLAW_RECREATE_SANDBOX to 1 when it is not already defined (i.e., default to
1 but allow users to export NEMOCLAW_RECREATE_SANDBOX=0 to skip recreation),
leaving the run_with_timeout and nemoclaw onboard invocation (function/command
names: run_with_timeout, nemoclaw onboard) unchanged.

In `@test/e2e/test-snapshot-commands.sh`:
- Around line 49-53: Split-out duplicate diagnostic echo commands into a single
helper and call it from both fail() and the Phase 2b block: create a function
(e.g., print_diagnostics or dump_diagnostics) that runs the registry, lock,
docker, nemoclaw path and node version probes currently shown in the snippet,
replace the duplicated echo block in the Phase 2b section and the fail()
function to call that helper so diagnostics are centralized and consistent
across the script.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: b2e8ac3c-e235-47d3-b503-e212f19e27de

📥 Commits

Reviewing files that changed from the base of the PR and between f7b7d55 and b906f9a.

📒 Files selected for processing (2)

test/e2e/test-deployment-services.sh
test/e2e/test-snapshot-commands.sh

Move the hardcoded GitHub URL into a CLOUDFLARED_DOWNLOAD_URL env var with the current URL as default. Addresses the no-third-party-links coding guideline flagged by CodeRabbit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

prekshivyas · 2026-04-23T17:34:18Z

Two items before merge:

1. deployment-services-e2e RECREATE — drop it. My PR #2313 narrowed NEMOCLAW_RECREATE_SANDBOX=1 to only network-policy-e2e specifically because deployment-services hit the openshell delete+create race (sandbox "already exists" right after a delete). Your validation run may have gotten lucky timing; I don't want to re-introduce a flake we just fixed. Keeping NEMOCLAW_SANDBOX_NAME="e2e-deploy-svc" is fine — just drop the NEMOCLAW_RECREATE_SANDBOX: "1" line from the workflow and the NEMOCLAW_RECREATE_SANDBOX=1 line from test/e2e/test-deployment-services.sh:175.

2. cloud-experimental-e2e disable comment is partially stale. The Landlock reason no longer applies — OpenShell#810 (Landlock fix) merged 2026-04-13, OpenShell#905 closed 2026-04-22, and install-openshell.sh:39 pins MIN_VERSION=0.0.32 which is above the fix. Current comment:

# DISABLED: Landlock /sandbox writability regression + CLI/docs command
# reference drift. Re-enable once both are fixed.

Either (a) try re-enabling the job (flip vars.CLOUD_EXPERIMENTAL_E2E_ENABLED=true) — if it passes, the whole disable can come out; or (b) if docs-drift still breaks it, drop the Landlock mention from the comment and reference only the docs-drift issue (file one if none exists).

prekshivyas · 2026-04-23T19:28:41Z

Update on the cloud-experimental-e2e disable:

OpenShell v0.0.36 was released 2026-04-23 15:18 UTC and includes OpenShell#910 "preserve explicit read-only baseline paths", which fixes OpenShell#905 — the enrich_proto_baseline_paths() regression that made /sandbox writable and caused 04-landlock-readonly check #1 to fail.

So the Landlock half of the reason this job is disabled has an upstream fix available now. NemoClaw is currently pinned to 0.0.29; PR #2307 bumps to 0.0.32; a follow-up bump to 0.0.36 picks up the fix.

Suggested follow-up (separate PR, not blocking this one):

Bump OpenShell pin to >= 0.0.36 (blueprint.yaml + scripts/install-openshell.sh).
Revert the if: false on cloud-experimental-e2e in nightly-e2e.yaml and restore it to the notify-on-failure needs:/if: block.
Address the remaining CLI/docs command reference drift (the second reason cited in your comment) if it still reproduces on 0.0.36.

The docs-drift half is independent of the OpenShell version, so re-enabling still needs that piece resolved first. Flagging for visibility so the job doesn't stay disabled longer than necessary.

Signed-off-by: Prekshi Vyas prekshiv@nvidia.com

@prekshivyas

Merge main into fix/nightly-e2e-repairs-v4, resolving conflicts: - nightly-e2e.yaml: keep contains(needs.*.result, 'failure') from #2371, remove cloud-experimental-e2e from notify-on-failure needs array - test-snapshot-commands.sh: keep exit-code check from this PR + tolerant grep regex from #2314 Address reviewer feedback: - Drop NEMOCLAW_RECREATE_SANDBOX from deployment-services to avoid re-introducing the delete+create race fixed in #2313 (@prekshivyas) - Update cloud-experimental-e2e disable comment: drop stale Landlock mention (fixed in OpenShell#810 v0.0.32+), reference only docs-drift (@prekshivyas) - Extract dump_diagnostics() in test-snapshot-commands.sh to centralize duplicate probes from fail() and Phase 2b (CodeRabbit)

…ds list) Merge main into fix/nightly-e2e-repairs-v4, resolving the conflict in .github/workflows/nightly-e2e.yaml: - Keep multi-line needs array format (PR readability) - Restore cloud-experimental-e2e in needs list (from main) - Adopt cancelled-job notification condition (from #2379) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

ericksoa and others added 6 commits April 23, 2026 08:43

chore(test): apply shfmt and prettier formatting

6a5d5c4

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge branch 'main' into fix/nightly-e2e-repairs-v4

91efc24

coderabbitai Bot reviewed Apr 23, 2026

View reviewed changes

Comment thread test/e2e/test-snapshot-commands.sh

coderabbitai Bot reviewed Apr 23, 2026

View reviewed changes

Comment thread .github/workflows/nightly-e2e.yaml Outdated

chore(test): apply prek formatting to E2E scripts

c458bf0

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

prekshivyas reviewed Apr 23, 2026

View reviewed changes

coderabbitai Bot reviewed Apr 23, 2026

View reviewed changes

Comment thread test/e2e/test-deployment-services.sh Outdated

ericksoa and others added 2 commits April 23, 2026 09:53

chore(test): re-apply prek formatting after CodeRabbit fixes

b906f9a

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge branch 'main' into fix/nightly-e2e-repairs-v4

567a1d1

coderabbitai Bot reviewed Apr 23, 2026

View reviewed changes

ericksoa and others added 2 commits April 23, 2026 09:58

Merge branch 'main' into fix/nightly-e2e-repairs-v4

8eb74bc

jyaunches self-assigned this Apr 23, 2026

jyaunches and others added 3 commits April 23, 2026 16:01

chore(test): apply prek shfmt formatting (bn/ci flags)

8d4afcf

cv approved these changes Apr 23, 2026

View reviewed changes

ericksoa merged commit 5b1082b into main Apr 23, 2026
19 checks passed

wscurran added the E2E label Apr 23, 2026

This was referenced Apr 23, 2026

refactor(cli): typed command registry to fix docs-drift and re-enable cloud-experimental-e2e #2388

Closed

fix(e2e): use non-interactive policy-add in network policy tests #2138

Closed

wscurran added area: e2e End-to-end tests, nightly failures, or validation infrastructure bug-fix PR fixes a bug or regression and removed E2E labels Jun 3, 2026

Conversation

ericksoa commented Apr 23, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

prekshivyas left a comment

Choose a reason for hiding this comment

Review

Blocking

Concern worth validating

Merge coordination

Minor

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

ericksoa commented Apr 23, 2026

1. No nightly-e2e run on this branch

2. checks job failing

3. deployment-services-e2e openshell delete+create race

Merge coordination with #2313

Minor

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

prekshivyas commented Apr 23, 2026

Uh oh!

prekshivyas commented Apr 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

ericksoa commented Apr 23, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 23, 2026 •

edited

Loading

2. `checks` job failing

3. `deployment-services-e2e` openshell delete+create race