Skip to content

fix(test): repair four nightly E2E test failures#2367

Merged
ericksoa merged 16 commits into
mainfrom
fix/nightly-e2e-repairs-v4
Apr 23, 2026
Merged

fix(test): repair four nightly E2E test failures#2367
ericksoa merged 16 commits into
mainfrom
fix/nightly-e2e-repairs-v4

Conversation

@ericksoa

@ericksoa ericksoa commented Apr 23, 2026

Copy link
Copy Markdown
Contributor

Summary

Re-applies the E2E fixes from #2351 (reverted due to formatting violations). Formatting verified with both shfmt -i 2 -ci and prettier.

Changes

  • snapshot-commands-e2e: run_capture helper, diagnostics, fix grep assertion, exit code check
  • deployment-services-e2e: define missing variables, add NEMOCLAW_RECREATE_SANDBOX
  • network-policy-e2e: add NEMOCLAW_RECREATE_SANDBOX=1 to onboard
  • cloud-experimental-e2e: disabled until Landlock + docs drift fixed

Test plan

  • shfmt -i 2 -ci -d clean
  • prettier --check clean
  • shellcheck clean
  • CI checks pass

Summary by CodeRabbit

  • Tests

    • Expanded end-to-end failure diagnostics and pre-check logging for easier debugging.
    • Added per-run timestamped test logs and improved command-output capture to avoid premature exits.
    • Broadened snapshot success detection and strengthened snapshot lifecycle checks.
    • Ensured sandbox onboarding explicitly recreates the sandbox to enforce a clean test state.
  • Chores

    • Updated nightly E2E workflow: conditionally disabled an experimental job and refactored failure notification orchestration.
    • Set explicit sandbox parameters and forced sandbox recreation for deployment-service runs.

ericksoa and others added 6 commits April 23, 2026 08:43
The snapshot-commands E2E test uses `$(nemoclaw ... 2>&1)` inside
`set -euo pipefail`, which causes the shell to exit immediately when
the command fails — before the captured output is ever printed. This
has been masking the actual error message for every Phase 3+ failure.

Changes:
- Add `run_capture` helper that captures both output and exit code
  without triggering set -e on non-zero exit
- Replace all bare `$(nemoclaw ... 2>&1)` calls with run_capture
- Add Phase 2b pre-snapshot diagnostics: registry state, sandbox
  list, docker containers, stale lock check
- Enrich fail() diagnostics with registry contents, lock state,
  docker ps, node version

The test will still fail at the same point, but now the actual error
message from nemoclaw will be visible in CI logs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
snapshot-commands-e2e: the grep assertion looked for the literal
substring "Snapshot created" but #2184 changed the CLI output to
"Snapshot v1 created" (version inserted). Use "Snapshot.*created"
to match the new format.

deployment-services-e2e: SANDBOX_NAME and LOG_FILE were never
defined, crashing immediately on `set -u`. Add the missing variable
definitions and pass NEMOCLAW_SANDBOX_NAME in the workflow.

network-policy-e2e: after `destroy --yes` the registry entry
lingers in a not-ready state, so re-onboard fails with "already
exists but is not ready." Add NEMOCLAW_RECREATE_SANDBOX=1 to the
onboard call so it overwrites the stale entry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The onboard_sandbox() function explicitly sets its own env vars,
so the workflow-level NEMOCLAW_RECREATE_SANDBOX was not reaching
the nemoclaw onboard call. Add it inline like the network-policy
test fix.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Address CodeRabbit review: the Phase 9 help check used run_capture
but skipped the exit code check, so a non-zero exit that still
printed help text would be a false pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two unrelated failures (Landlock /sandbox writability regression,
CLI/docs command reference drift) need separate investigation.
Skip the job to stop it from noise-gating the nightly run.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Apr 23, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Guard conditional cloud E2E job in the nightly workflow; add explicit sandbox naming and force sandbox recreation in deployment/network tests; enhance snapshot tests with a capture helper, richer diagnostics, and more robust success/failure checks.

Changes

Cohort / File(s) Summary
Workflow Configuration
.github/workflows/nightly-e2e.yaml
Add job-level if so cloud-experimental-e2e only runs for NVIDIA/NemoClaw when CLOUD_EXPERIMENTAL_E2E_ENABLED=="true"; set e2e-deploy-svc sandbox name and NEMOCLAW_RECREATE_SANDBOX=1 for deployment job; refactor notify-on-failure needs list to exclude cloud-experimental-e2e.
Sandbox Onboarding Tests
test/e2e/test-deployment-services.sh, test/e2e/test-network-policy.sh
Introduce default SANDBOX_NAME and per-run LOG_FILE in deployment test; call nemoclaw onboard with NEMOCLAW_RECREATE_SANDBOX=1 to force sandbox recreation in both tests.
Snapshot Test Enhancements
test/e2e/test-snapshot-commands.sh
Add run_capture() to capture stdout/stderr and exit codes; refactor snapshot create/list/restore flows to use it; broaden create success regex (Snapshot.*created); expand fail() diagnostics with registry/lock info, Docker status, nemoclaw path and Node version; add pre-snapshot Phase 2b diagnostics.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

Poem

🐰 I hop through logs at break of day,

I nudge sandboxes to wake and play,
I capture every squeak and shout,
Then patch the snapshots inside my snout,
Hooray — the CI carrots all the way! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(test): repair four nightly E2E test failures' directly and clearly summarizes the main change—addressing and repairing four failing nightly E2E tests across multiple test suites.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/nightly-e2e-repairs-v4

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/test-snapshot-commands.sh`:
- Around line 58-65: The run_capture function currently uses eval to assign the
captured output into a variable name, which re-parses and can execute shell
metacharacters; change the assignment to use printf -v to set the by-name
variable safely (preserving the existing _CAPTURE_RC behavior and the local
_output capture) so that outputs containing $(), backticks, or ${...} are not
re-evaluated; update references to run_capture, _CAPTURE_RC, and _output
accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: c8288094-65e0-4b80-90a6-ed747a21a140

📥 Commits

Reviewing files that changed from the base of the PR and between 89a3ebe and 6a5d5c4.

📒 Files selected for processing (4)
  • .github/workflows/nightly-e2e.yaml
  • test/e2e/test-deployment-services.sh
  • test/e2e/test-network-policy.sh
  • test/e2e/test-snapshot-commands.sh

Comment thread test/e2e/test-snapshot-commands.sh

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/nightly-e2e.yaml:
- Around line 78-80: The job uses a constant disabled condition `if: false`
which actionlint flags; replace it with a variable-gated check like the existing
pattern (`vars.GPU_E2E_ENABLED`) by switching `if: false` to a repo-variable
condition that references a new or existing variable (e.g.,
`vars.CLOUD_EXPERIMENTAL_E2E_ENABLED`) so the job stays disabled by default but
avoids the constant-expression error; update the workflow to check that variable
and document that setting `CLOUD_EXPERIMENTAL_E2E_ENABLED` to 'true' in
repository settings will re-enable the job.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 731d71d4-0e2c-4ea6-8c59-8c4abf5589a2

📥 Commits

Reviewing files that changed from the base of the PR and between 6a5d5c4 and 91efc24.

📒 Files selected for processing (4)
  • .github/workflows/nightly-e2e.yaml
  • test/e2e/test-deployment-services.sh
  • test/e2e/test-network-policy.sh
  • test/e2e/test-snapshot-commands.sh
🚧 Files skipped from review as they are similar to previous changes (2)
  • test/e2e/test-network-policy.sh
  • test/e2e/test-snapshot-commands.sh

Comment thread .github/workflows/nightly-e2e.yaml Outdated
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@prekshivyas prekshivyas left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

Good intent — the snapshot grep fix and run_capture helper are clearly correct. Flagging a few concerns before merge.

Blocking

1. No nightly-e2e run on this branch. gh run list --branch fix/nightly-e2e-repairs-v4 --workflow nightly-e2e.yaml returns empty. Every fix here is load-bearing for nightly, but none are empirically validated. Please trigger a run and wait for deployment-services-e2e, network-policy-e2e, and snapshot-commands-e2e to all go green before merge.

2. On-PR checks job is failingFiles were modified by following hooks ... Failed. Hook modified files despite the PR body claiming shfmt -i 2 -ci and prettier --check clean. Likely a formatter version mismatch. Needs resolution.

Concern worth validating

3. deployment-services-e2e may hit the openshell k8s delete+create race. In my open PR #2313 I set workflow-level NEMOCLAW_RECREATE_SANDBOX=1 for this job (run 24811493024) and it failed with:

✓ Deleted sandbox e2e-deploy
  Creating sandbox 'e2e-deploy' (this takes a few minutes on first run)...
[~2 min image rebuild]
× sandbox 'e2e-deploy' already exists
FATAL: Onboard failed for 'e2e-deploy'

Root cause: install.sh creates the sandbox Ready, then the test's onboard with RECREATE=1 deletes + rebuilds + creates — openshell retains residual state across the rebuild window. I had to narrow the env var to only network-policy-e2e (where install.sh leaves it not-ready anyway, so the race doesn't apply). PR #2367 re-enables the same pattern; the e2e-deploye2e-deploy-svc rename doesn't address the race.

Ask: did you observe this race in any manual testing? If not, the nightly run in concern #1 will surface it.

Merge coordination

Both this PR and #2313 touch:

  • .github/workflows/nightly-e2e.yaml (different concerns — #2313 narrowed env vars, this adds them; conflict likely)
  • test/e2e/test-network-policy.sh (same inline NEMOCLAW_RECREATE_SANDBOX=1 added — should merge cleanly)
  • test/e2e/test-deployment-services.sh (different hunks — likely clean)
  • test/e2e/test-snapshot-commands.sh (both modify around Phase 3 — conflict likely)

Whichever merges first, the other needs a rebase. Happy to coordinate — #2313 has two green validation runs (24812247684 + 24813062692).

Minor

  • if: false on cloud-experimental-e2e with no tracking issue reference. Who remembers to re-enable? Please link OpenShell#905 for the Landlock side and a docs-drift tracking issue.
  • e2e-deploye2e-deploy-svc rename isn't explained in the PR body. Intentional or incidental?

Signed-off-by: Prekshi Vyas prekshiv@nvidia.com

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
test/e2e/test-snapshot-commands.sh (1)

63-70: ⚠️ Potential issue | 🟠 Major

Replace eval in run_capture assignment.

Line [69] reparses captured output as shell code. That’s unsafe and can mis-handle output containing shell metacharacters.

🛠️ Safer Bash-native fix
 run_capture() {
   local _var_name="$1"
   shift
   _CAPTURE_RC=0
   local _output
   _output=$("$@" 2>&1) || _CAPTURE_RC=$?
-  eval "${_var_name}=\${_output}"
+  printf -v "$_var_name" '%s' "$_output"
 }
#!/bin/bash
# Verify whether run_capture still uses eval assignment.
rg -n -A3 -B3 'run_capture\(\)|eval "\$\{_var_name\}=\$\{_output\}"|printf -v "\$_var_name"' test/e2e/test-snapshot-commands.sh
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-snapshot-commands.sh` around lines 63 - 70, In run_capture,
avoid re-parsing captured output with eval; instead assign the captured string
to the caller-provided variable name using a safe builtin (e.g., printf -v)
rather than eval "${_var_name}=${_output}"; keep the existing logic that
captures stdout+stderr into _output and sets _CAPTURE_RC, then use printf -v
"$_var_name" '%s' "$_output" (or an equivalent safe declare/indirect assignment)
so special characters aren't reinterpreted; update references in run_capture
accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@test/e2e/test-snapshot-commands.sh`:
- Around line 63-70: In run_capture, avoid re-parsing captured output with eval;
instead assign the captured string to the caller-provided variable name using a
safe builtin (e.g., printf -v) rather than eval "${_var_name}=${_output}"; keep
the existing logic that captures stdout+stderr into _output and sets
_CAPTURE_RC, then use printf -v "$_var_name" '%s' "$_output" (or an equivalent
safe declare/indirect assignment) so special characters aren't reinterpreted;
update references in run_capture accordingly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 7a18a0da-3a90-4396-bd98-d95e8dc045a7

📥 Commits

Reviewing files that changed from the base of the PR and between 91efc24 and c458bf0.

📒 Files selected for processing (2)
  • test/e2e/test-deployment-services.sh
  • test/e2e/test-snapshot-commands.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/e2e/test-deployment-services.sh

- Replace eval with printf -v in run_capture to avoid shell injection
- Replace if: false with vars.CLOUD_EXPERIMENTAL_E2E_ENABLED gate
  to avoid actionlint constant-expression error

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ericksoa

Copy link
Copy Markdown
Contributor Author

@prekshivyas Thanks for the thorough review. Responding to each point:

1. No nightly-e2e run on this branch

The earlier version of this PR (before the formatting fix) ran nightly-e2e as run 24838693008 — snapshot-commands-e2e, network-policy-e2e, and deployment-services-e2e all passed. The formatting commit that followed is whitespace-only (shfmt line continuation style), so the test behavior is unchanged. Happy to trigger another run if you'd prefer belt-and-suspenders.

2. checks job failing

Fixed in the latest push (f7b7d556). The issue was my local shfmt v3.13.0 producing different output than CI's v3.12.0 (line continuations: && at end-of-line vs start-of-next-line). Resolved by running prek locally with the CI-pinned version and committing its output.

3. deployment-services-e2e openshell delete+create race

I did not observe this race — good catch. The earlier validation run passed, but that may have been lucky timing. If your PR #2313 already narrows NEMOCLAW_RECREATE_SANDBOX to only network-policy-e2e, I'm happy to drop it from deployment-services-e2e here to avoid re-introducing the race. Let me know your preference and I'll update.

Merge coordination with #2313

Agreed there will be conflicts. Since #2313 has two green validation runs, happy for it to merge first and I'll rebase. Or vice versa — your call.

Minor

  • if: falsevars.CLOUD_EXPERIMENTAL_E2E_ENABLED: Already addressed in the latest commit per CodeRabbit feedback. Will add a comment referencing OpenShell#905 for the Landlock side. Is there an existing issue for the docs drift, or should I file one?
  • e2e-deploy-svc rename: Intentional — avoids colliding with the e2e-deploy name used in other test scripts. Will add a note to the PR body.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
test/e2e/test-snapshot-commands.sh (1)

49-53: Refactor duplicated diagnostics into a single helper.

The same registry/docker/binary diagnostics are now emitted in multiple places. Consolidating into one helper reduces drift and keeps future debug updates consistent.

♻️ Suggested DRY refactor
+print_runtime_diagnostics() {
+  local _stream="${1:-stdout}"
+  local _redir="/dev/stdout"
+  [ "$_stream" = "stderr" ] && _redir="/dev/stderr"
+
+  {
+    echo -e "${YELLOW}[DIAG]${NC} Sandboxes: $(openshell sandbox list 2>&1 || echo 'unavailable')"
+    echo -e "${YELLOW}[DIAG]${NC} Registry: $(cat "$HOME/.nemoclaw/sandboxes.json" 2>&1 || echo 'not found')"
+    echo -e "${YELLOW}[DIAG]${NC} Registry lock: $(ls -la "$HOME/.nemoclaw/sandboxes.json.lock" 2>&1 || echo 'no lock')"
+    echo -e "${YELLOW}[DIAG]${NC} Docker ps: $(docker ps --format '{{.Names}} {{.Status}}' 2>&1 || echo 'unavailable')"
+    echo -e "${YELLOW}[DIAG]${NC} nemoclaw path: $(command -v nemoclaw 2>&1 || echo 'not found')"
+    echo -e "${YELLOW}[DIAG]${NC} node version: $(node --version 2>&1 || echo 'not found')"
+  } >"$_redir"
+}
+
 fail() {
   echo -e "${RED}[FAIL]${NC} $1" >&2
   echo -e "${YELLOW}[DIAG]${NC} --- Failure diagnostics ---" >&2
-  ...
+  print_runtime_diagnostics stderr
   echo -e "${YELLOW}[DIAG]${NC} --- End diagnostics ---" >&2
   exit 1
 }
 
 # Phase 2b
 info "Phase 2b: Pre-snapshot diagnostics..."
-...
+print_runtime_diagnostics

Also applies to: 124-138

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-snapshot-commands.sh` around lines 49 - 53, Introduce a single
helper function (e.g., dump_diagnostics) that emits the repeated diagnostic
lines (registry file cat, registry lock ls, docker ps, command -v nemoclaw, node
--version) and prints to stderr with the same color variables (${YELLOW},
${NC}); then replace each duplicated block (the echo -e ... lines currently
duplicated) with a single call to dump_diagnostics so all diagnostics are
centralized and consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/test-deployment-services.sh`:
- Around line 158-160: Replace the hardcoded GitHub download URL in the curl
command with a configurable environment variable (e.g., CLOUDFLARED_URL) and
update the conditional that downloads and installs cloudflared (the
curl/chmod/sudo mv block that writes to /usr/local/bin/cloudflared) to use that
variable; validate the variable before using it (exit with an error if empty)
and optionally keep a documented, approved fallback source, so the script no
longer contains a direct third-party repository link.

---

Nitpick comments:
In `@test/e2e/test-snapshot-commands.sh`:
- Around line 49-53: Introduce a single helper function (e.g., dump_diagnostics)
that emits the repeated diagnostic lines (registry file cat, registry lock ls,
docker ps, command -v nemoclaw, node --version) and prints to stderr with the
same color variables (${YELLOW}, ${NC}); then replace each duplicated block (the
echo -e ... lines currently duplicated) with a single call to dump_diagnostics
so all diagnostics are centralized and consistent.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: b7118fc5-b756-421e-9359-425b99810d93

📥 Commits

Reviewing files that changed from the base of the PR and between c458bf0 and f7b7d55.

📒 Files selected for processing (3)
  • .github/workflows/nightly-e2e.yaml
  • test/e2e/test-deployment-services.sh
  • test/e2e/test-snapshot-commands.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/nightly-e2e.yaml

Comment thread test/e2e/test-deployment-services.sh Outdated
ericksoa and others added 2 commits April 23, 2026 09:53

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
test/e2e/test-snapshot-commands.sh (1)

49-53: Optional: centralize diagnostics to avoid drift.

The same probes are now split across fail() and Phase 2b. A shared helper would keep diagnostics consistent as the script evolves.

Also applies to: 124-139

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-snapshot-commands.sh` around lines 49 - 53, Split-out duplicate
diagnostic echo commands into a single helper and call it from both fail() and
the Phase 2b block: create a function (e.g., print_diagnostics or
dump_diagnostics) that runs the registry, lock, docker, nemoclaw path and node
version probes currently shown in the snippet, replace the duplicated echo block
in the Phase 2b section and the fail() function to call that helper so
diagnostics are centralized and consistent across the script.
test/e2e/test-deployment-services.sh (1)

201-202: Consider making forced sandbox recreation overrideable.

Hardcoding recreation is fine for CI stability, but allowing an env override keeps local repro/debug flows less destructive.

♻️ Suggested tweak
-    NEMOCLAW_RECREATE_SANDBOX=1 \
+    NEMOCLAW_RECREATE_SANDBOX="${NEMOCLAW_RECREATE_SANDBOX:-1}" \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-deployment-services.sh` around lines 201 - 202, The script
currently forces sandbox recreation by hardcoding NEMOCLAW_RECREATE_SANDBOX=1
before calling run_with_timeout and nemoclaw onboard; change this to respect an
external override by only setting NEMOCLAW_RECREATE_SANDBOX to 1 when it is not
already defined (i.e., default to 1 but allow users to export
NEMOCLAW_RECREATE_SANDBOX=0 to skip recreation), leaving the run_with_timeout
and nemoclaw onboard invocation (function/command names: run_with_timeout,
nemoclaw onboard) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@test/e2e/test-deployment-services.sh`:
- Around line 201-202: The script currently forces sandbox recreation by
hardcoding NEMOCLAW_RECREATE_SANDBOX=1 before calling run_with_timeout and
nemoclaw onboard; change this to respect an external override by only setting
NEMOCLAW_RECREATE_SANDBOX to 1 when it is not already defined (i.e., default to
1 but allow users to export NEMOCLAW_RECREATE_SANDBOX=0 to skip recreation),
leaving the run_with_timeout and nemoclaw onboard invocation (function/command
names: run_with_timeout, nemoclaw onboard) unchanged.

In `@test/e2e/test-snapshot-commands.sh`:
- Around line 49-53: Split-out duplicate diagnostic echo commands into a single
helper and call it from both fail() and the Phase 2b block: create a function
(e.g., print_diagnostics or dump_diagnostics) that runs the registry, lock,
docker, nemoclaw path and node version probes currently shown in the snippet,
replace the duplicated echo block in the Phase 2b section and the fail()
function to call that helper so diagnostics are centralized and consistent
across the script.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: b2e8ac3c-e235-47d3-b503-e212f19e27de

📥 Commits

Reviewing files that changed from the base of the PR and between f7b7d55 and b906f9a.

📒 Files selected for processing (2)
  • test/e2e/test-deployment-services.sh
  • test/e2e/test-snapshot-commands.sh

ericksoa and others added 2 commits April 23, 2026 09:58
Move the hardcoded GitHub URL into a CLOUDFLARED_DOWNLOAD_URL env var
with the current URL as default. Addresses the no-third-party-links
coding guideline flagged by CodeRabbit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@prekshivyas

Copy link
Copy Markdown
Contributor

Two items before merge:

1. deployment-services-e2e RECREATE — drop it. My PR #2313 narrowed NEMOCLAW_RECREATE_SANDBOX=1 to only network-policy-e2e specifically because deployment-services hit the openshell delete+create race (sandbox "already exists" right after a delete). Your validation run may have gotten lucky timing; I don't want to re-introduce a flake we just fixed. Keeping NEMOCLAW_SANDBOX_NAME="e2e-deploy-svc" is fine — just drop the NEMOCLAW_RECREATE_SANDBOX: "1" line from the workflow and the NEMOCLAW_RECREATE_SANDBOX=1 line from test/e2e/test-deployment-services.sh:175.

2. cloud-experimental-e2e disable comment is partially stale. The Landlock reason no longer applies — OpenShell#810 (Landlock fix) merged 2026-04-13, OpenShell#905 closed 2026-04-22, and install-openshell.sh:39 pins MIN_VERSION=0.0.32 which is above the fix. Current comment:

# DISABLED: Landlock /sandbox writability regression + CLI/docs command
# reference drift. Re-enable once both are fixed.

Either (a) try re-enabling the job (flip vars.CLOUD_EXPERIMENTAL_E2E_ENABLED=true) — if it passes, the whole disable can come out; or (b) if docs-drift still breaks it, drop the Landlock mention from the comment and reference only the docs-drift issue (file one if none exists).

@jyaunches jyaunches self-assigned this Apr 23, 2026
@prekshivyas

Copy link
Copy Markdown
Contributor

Update on the cloud-experimental-e2e disable:

OpenShell v0.0.36 was released 2026-04-23 15:18 UTC and includes OpenShell#910 "preserve explicit read-only baseline paths", which fixes OpenShell#905 — the enrich_proto_baseline_paths() regression that made /sandbox writable and caused 04-landlock-readonly check #1 to fail.

So the Landlock half of the reason this job is disabled has an upstream fix available now. NemoClaw is currently pinned to 0.0.29; PR #2307 bumps to 0.0.32; a follow-up bump to 0.0.36 picks up the fix.

Suggested follow-up (separate PR, not blocking this one):

  1. Bump OpenShell pin to >= 0.0.36 (blueprint.yaml + scripts/install-openshell.sh).
  2. Revert the if: false on cloud-experimental-e2e in nightly-e2e.yaml and restore it to the notify-on-failure needs:/if: block.
  3. Address the remaining CLI/docs command reference drift (the second reason cited in your comment) if it still reproduces on 0.0.36.

The docs-drift half is independent of the OpenShell version, so re-enabling still needs that piece resolved first. Flagging for visibility so the job doesn't stay disabled longer than necessary.

Signed-off-by: Prekshi Vyas prekshiv@nvidia.com

jyaunches and others added 3 commits April 23, 2026 16:01
Merge main into fix/nightly-e2e-repairs-v4, resolving conflicts:
- nightly-e2e.yaml: keep contains(needs.*.result, 'failure') from #2371,
  remove cloud-experimental-e2e from notify-on-failure needs array
- test-snapshot-commands.sh: keep exit-code check from this PR + tolerant
  grep regex from #2314

Address reviewer feedback:
- Drop NEMOCLAW_RECREATE_SANDBOX from deployment-services to avoid
  re-introducing the delete+create race fixed in #2313 (@prekshivyas)
- Update cloud-experimental-e2e disable comment: drop stale Landlock
  mention (fixed in OpenShell#810 v0.0.32+), reference only docs-drift
  (@prekshivyas)
- Extract dump_diagnostics() in test-snapshot-commands.sh to centralize
  duplicate probes from fail() and Phase 2b (CodeRabbit)
…ds list)

Merge main into fix/nightly-e2e-repairs-v4, resolving the conflict in
.github/workflows/nightly-e2e.yaml:
- Keep multi-line needs array format (PR readability)
- Restore cloud-experimental-e2e in needs list (from main)
- Adopt cancelled-job notification condition (from #2379)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ericksoa ericksoa merged commit 5b1082b into main Apr 23, 2026
19 checks passed
@wscurran wscurran added the E2E label Apr 23, 2026
@wscurran wscurran added area: e2e End-to-end tests, nightly failures, or validation infrastructure bug-fix PR fixes a bug or regression and removed E2E labels Jun 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: e2e End-to-end tests, nightly failures, or validation infrastructure bug-fix PR fixes a bug or regression

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants