refactor(e2e): enhance test of cloudflared tunnel and test of backup/restore state#3517
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughSplit the prior Deployment & Services E2E into two focused suites: a Tunnel Lifecycle job and a State Backup/Restore job. Added a new backup/restore test script, narrowed and rewired the tunnel script, updated parity/migration docs and generated inventory, and wired two new jobs into the nightly-e2e workflow and reviews config. ChangesTunnel Lifecycle E2E
State Backup/Restore E2E + CI / docs wiring
Sequence diagram (high-level Tunnel Lifecycle flow) sequenceDiagram
participant CI as "GitHub Actions (nightly-e2e)"
participant Runner as "Workflow runner"
participant Host as "E2E host / Backup storage"
participant Sandbox as "NemoClaw sandbox"
participant Cloudflare as "cloudflared / trycloudflare edge"
CI->>Runner: dispatch `tunnel-lifecycle-e2e`
Runner->>Sandbox: onboard sandbox / start local OpenClaw
Runner->>Sandbox: wait_local_dashboard_ready()
Runner->>Sandbox: run `nemoclaw tunnel start`
Sandbox->>Cloudflare: cloudflared starts
Cloudflare-->>Sandbox: provide trycloudflare URL (or exit)
Runner->>Cloudflare: probe public URL (exponential backoff)
alt URL healthy and contains dashboard markers
Runner->>Runner: PASS TC-DEPLOY-01b
else URL absent or bad
Runner->>Runner: classify_cloudflared_log()
Runner->>Sandbox: stop tunnel, capture logs
Runner->>CI: upload failure artifacts
end
Estimated code review effort 🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
| # Record a skipped test. (Kept for symmetry with pass/fail; unused after SKIP→FAIL conversion.) | ||
| # shellcheck disable=SC2329 | ||
| skip() { | ||
| ((SKIP += 1)) |
There was a problem hiding this comment.
Removed the unreachable line(s)
| # shellcheck disable=SC2329 | ||
| skip() { | ||
| ((SKIP += 1)) | ||
| ((TOTAL += 1)) |
There was a problem hiding this comment.
Removed the unreachable line(s)
| skip() { | ||
| ((SKIP += 1)) | ||
| ((TOTAL += 1)) | ||
| echo -e "${YELLOW} SKIP${NC} $1 — $2" | tee -a "$LOG_FILE" |
There was a problem hiding this comment.
Removed the unreachable line(s)
E2E Advisor RecommendationRequired E2E: Dispatch hint: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
|
Cross-link from a related PR: #3537 fixes the CLI-side of the same |
|
Got it @prekshivyas , thanks for the cross-link! |
…/Inference fields (#3537) > Re-signed replacement for #3534. Same code, single squashed commit, this time SSH-signed (the original branch had `commit.gpgsign=false` set as a local override on my clone — the prior PR's commits ended up unverified). Force-push is blocked on the original branch, so this is a fresh branch + new PR; #3534 will be closed in favour of this one. ## Summary Fixes #2604. Both @wangericnv (2026-05-11) and @cv (2026-05-14) re-reported the same symptom on v0.0.38 / v0.0.41 — `nemoclaw status` prints `● cloudflared (stopped)` in all three failure modes (no PID file, garbage PID, dead/wrong-process PID) with no cause and no remediation. The bare command also omits the `Connected:` / `Inference:` labeled fields the original bug requested. Both symptoms are addressed here. ## Root cause - **Cloudflared diagnostic.** `src/lib/actions/sandbox/doctor.ts` already distinguished three states and emitted matching hints. `showStatus()` in `src/lib/tunnel/services.ts` was written before that and only had `isRunning() ? ok : "(stopped)"` — every failure mode collapsed into the same un-actionable line. - **Bare-status fields.** PR #2884 added `Inference:` / `Connected:` labels to the per-sandbox `nemoclaw <name> status` but left bare `nemoclaw status` showing only the model in parens. ## What this PR does 1. **Share the cloudflared state machine.** New `readCloudflaredState(pidDir)` in `src/lib/tunnel/services.ts` returns a discriminated union `{ kind: "running" | "stopped" | "stale-pid-file" | "stale-pid-process" }`. `showStatus()` switches on it and emits a coloured marker + one-line remediation. `cloudflaredDoctorCheck()` consumes the same function and translates each state into its `DoctorCheck`, removing duplicated PID-file / `/proc/<pid>/cmdline` logic. 2. **Remediation wording — `no cloudflared process; restart with ...`.** All three failure modes lead with the cause and point at `nemoclaw tunnel start`. Both reporters asked for that exact shape. `nemoclaw tunnel start` already handles all three states because `isRunning()` returns false and `startService` proceeds and overwrites any stale PID file — so one command recovers in every case. The same wording is used in `doctor.ts` for consistency. 3. **Bare-status `Inference:` and `Connected:` lines.** `showStatusCommand` in `src/lib/inventory/index.ts` now prints labeled `Inference: <provider> / <model>` and `Connected: yes (N session) / no` under each sandbox row. Provider/model prefer live gateway values for the default sandbox (consistent with the existing `(model)` rendering, #2369). `getActiveSessionCount` is wired through `buildStatusCommandDeps`, mirroring the cached SSH-process probe already used by `buildListCommandDeps`. 4. **Tests** (15 new). Three cases each for `showStatus` failure-mode rendering, five cases for `readCloudflaredState`, seven for bare-status `Inference:` / `Connected:` rendering across live/stored/missing dep variants. All existing cli-doctor tests still pass against the refactored shared function. ## Behavioural diff Before (v0.0.41 baseline): ``` ● cloudflared (stopped) # identical output for all three failure modes ``` After: ``` ● cloudflared (stopped) no cloudflared process; run `nemoclaw tunnel start` to start it ● cloudflared (stale PID file) no cloudflared process (stored PID is invalid); run `nemoclaw tunnel start` to restart it ● cloudflared (stale PID 999999999) no cloudflared process (PID 999999999 is dead or not cloudflared); run `nemoclaw tunnel start` to restart it ``` Bare status sandbox row: ``` # Before test-sandbox * (qwen2.5:7b) :18789 # After test-sandbox * (qwen2.5:7b) :18789 Inference: ollama-local / qwen2.5:7b Connected: no ``` ## Brev reproduction — 3× per case, baseline and fix (Run on the prior PR #3534 branch, same code as here.) Fresh `n2d-standard-2` (Ubuntu 22.04 / Linux 6.8 GCP), v0.0.41 from tag, faked registry, three PID-dir states 3× each. Re-installed from this branch. <details> <summary><b>Baseline v0.0.41 — 9/9 runs reproduce the bug</b></summary> ``` ### Case: stopped — no PID file --- Run 1 --- ● cloudflared (stopped) --- Run 2 --- ● cloudflared (stopped) --- Run 3 --- ● cloudflared (stopped) ### Case: stale-pid-file — garbage PID contents --- Run 1 --- ● cloudflared (stopped) --- Run 2 --- ● cloudflared (stopped) --- Run 3 --- ● cloudflared (stopped) ### Case: stale-pid-process — PID is dead --- Run 1 --- ● cloudflared (stopped) --- Run 2 --- ● cloudflared (stopped) --- Run 3 --- ● cloudflared (stopped) ``` </details> <details> <summary><b>Fix branch — 9/9 runs emit a state-specific remediation</b></summary> Same three-case 3× harness, with the new wording. Identical output across reruns within each case — no flake. </details> ## Out of scope (filed/to-file as follow-ups) - **Auto-restart on crash.** Needs a watchdog daemon / systemd unit / `while true; cloudflared || sleep 5` shim. Real design work, separate PR. Manual `pkill cloudflared` is not preventable from inside the CLI either way; the right answer there is actionable status output, which this PR provides. - **Intermittent cloudflared exits in nightly E2E (#3494).** Different scope — CI-infra fault attribution. #3517 covers it; touches disjoint files. ## Test plan - [x] `npm run build:cli` clean - [x] `npx tsc -p tsconfig.cli.json` clean - [x] `vitest run src/lib/tunnel/services.test.ts` — 23 pass - [x] `vitest run src/lib/inventory/index.test.ts` — 31 pass - [x] `vitest run test/cli.test.ts -t doctor` — 4/4 pass (covers refactored `cloudflaredDoctorCheck`) - [x] Brev `n2d-standard-2` repro: 9/9 baseline reproduces; 9/9 fix shows the new diagnostic + remediation lines, no flake - [x] Commit SSH-signed and verified by GitHub (`reason: valid`) Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes * **New Features** * Status command now displays inference provider and model per sandbox. * Status command includes active SSH session count when available. * **Bug Fixes** * Enhanced cloudflared health checks with refined state detection (running, stopped, stale PID). * Improved remediation hints for cloudflared diagnostic messages. * **Tests** * Added tests for expanded status output with provider, model, and session information. * Added tests for cloudflared state detection across various scenarios. <!-- review_stack_entry_start --> [](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/3537) <!-- review_stack_entry_end --> <!-- end of auto-generated comment: release notes by coderabbit.ai --> Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com>
…DIA/NemoClaw into refactor/intermittent-tunnel-failure
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.coderabbit.yaml:
- Around line 246-258: The .coderabbit.yaml entry for E2E hints only lists
"src/lib/deploy/**" but omits state files; add a matching path_instructions
entry that references the state implementation (e.g., "src/lib/state/sandbox.ts"
and any other state restore owners) and include the same
`state-backup-restore-e2e` instructions block so reviewers of Sandbox/restore
logic see the E2E recommendation; update the YAML to include these state paths
with the same instructions content as the deploy entry.
In `@test/e2e/docs/MIGRATION.md`:
- Around line 124-125: The entries in MIGRATION.md use emoji checklist markers
"⬜" which violates the repo's MD style; replace those emoji markers with
plain-text checkbox markers such as "[ ]" for the two lines referencing
"test-state-backup-restore.sh (378) → lifecycle/state-backup-restore/" and
"test-tunnel-lifecycle.sh (472) → lifecycle/tunnel-lifecycle/" so the lines read
with "[ ]" instead of "⬜", preserving all other text and punctuation.
In `@test/e2e/test-state-backup-restore.sh`:
- Around line 194-201: The backup and restore invocations currently mask
failures by appending "|| true" and only inspecting stdout with grep; remove the
"|| true", capture the command exit status (e.g., run backup_output=$(bash
"$REPO_ROOT/scripts/backup-workspace.sh" backup "$SANDBOX_NAME" 2>&1); rc=$?),
and if rc is non‑zero call fail with the captured output (use the same pattern
for the restore invocation), otherwise proceed to validate success text (grep -q
"Backup saved"). Also apply the same change to the corresponding restore block
(lines referencing the restore call around 276-283) so you don't rely solely on
substring matching when the process actually failed.
- Around line 186-189: The current check only fails when files_written or
memory_written are zero; change it to require full expected counts before
proceeding by comparing files_written against the expected total (5) and
memory_written against its expected total (1) and call fail (same fail
function/TC-STATE-01) if either does not equal the expected value so partial
setups (e.g., 3/5) are rejected; update the conditional that references
files_written and memory_written accordingly and keep the existing fail message
semantics.
In `@test/e2e/test-tunnel-lifecycle.sh`:
- Around line 160-205: get_cloudflared_log_path can cause the script to exit
under set -euo pipefail when the glob matches nothing; change its command
substitution to prevent non-zero exit (e.g., capture ls output with fallback:
log=$(ls -t /tmp/nemoclaw-services-*/cloudflared.log 2>/dev/null || true)) and
make the function always return 0 (echo "" and return 0 instead of returning 1)
so callers like classify_cloudflared_log and show_cloudflared_log don't
terminate the script; ensure callers treat an empty string as "no log" (they
already check -z) so behavior stays the same.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 96fabd70-9a3e-4e13-ba30-f2f78dbdc5ee
📒 Files selected for processing (7)
.coderabbit.yaml.github/workflows/nightly-e2e.yamltest/e2e/docs/MIGRATION.mdtest/e2e/docs/parity-inventory.generated.jsontest/e2e/docs/parity-map.yamltest/e2e/test-state-backup-restore.shtest/e2e/test-tunnel-lifecycle.sh
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
test/e2e/test-state-backup-restore.sh (1)
193-200:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDo not mask backup/restore command failures.
Line 193 and Line 275 swallow non-zero exits with
|| true, then rely on output text. This can misattribute failures and weaken diagnostics.Suggested fix
- local backup_output - backup_output=$(bash "$REPO_ROOT/scripts/backup-workspace.sh" backup "$SANDBOX_NAME" 2>&1) || true + local backup_output backup_rc=0 + backup_output=$(bash "$REPO_ROOT/scripts/backup-workspace.sh" backup "$SANDBOX_NAME" 2>&1) || backup_rc=$? log " Backup output: ${backup_output}" - if echo "$backup_output" | grep -q "Backup saved"; then + if [[ $backup_rc -eq 0 ]] && echo "$backup_output" | grep -q "Backup saved"; then pass "TC-STATE-01: Backup completed successfully" else - fail "TC-STATE-01: Backup" "backup-workspace.sh did not report success" + fail "TC-STATE-01: Backup" "backup-workspace.sh backup failed (exit=$backup_rc) or did not report success" return fi @@ - local restore_output - restore_output=$(bash "$REPO_ROOT/scripts/backup-workspace.sh" restore "$SANDBOX_NAME" 2>&1) || true + local restore_output restore_rc=0 + restore_output=$(bash "$REPO_ROOT/scripts/backup-workspace.sh" restore "$SANDBOX_NAME" 2>&1) || restore_rc=$? log " Restore output: ${restore_output}" - if echo "$restore_output" | grep -q "Restored"; then + if [[ $restore_rc -eq 0 ]] && echo "$restore_output" | grep -q "Restored"; then pass "TC-STATE-01: Restore completed successfully" else - fail "TC-STATE-01: Restore" "backup-workspace.sh restore did not report success" + fail "TC-STATE-01: Restore" "backup-workspace.sh restore failed (exit=$restore_rc) or did not report success" return fiAlso applies to: 275-283
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/test-state-backup-restore.sh` around lines 193 - 200, The test currently masks failures by running the backup command with "|| true" and only inspecting backup_output; instead run the backup command without swallowing errors, capture both its stdout/stderr into backup_output and its exit code (e.g., via bash -c or by saving $? immediately after), log the output (backup_output) and then assert success based on the exit code rather than just grepping for "Backup saved"; apply the same change for the restore invocation (restore-workspace.sh / restore_output) so failures are detected and reported with both output and a non-zero exit status.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e/test-state-backup-restore.sh`:
- Line 140: The test forcibly removes the onboard lock file with rm -f
"$HOME/.nemoclaw/onboard.lock" which can clobber a legitimately held lock;
delete that rm invocation from test/e2e/test-state-backup-restore.sh and instead
rely on the existing onboard stale-lock handling described in the teardown
notes, or replace it with a safe ownership check that only removes the lock if
it is owned by the current test (e.g., read the PID/owner from the onboard.lock
and compare before deleting) so you no longer unconditionally remove
onboard.lock.
---
Duplicate comments:
In `@test/e2e/test-state-backup-restore.sh`:
- Around line 193-200: The test currently masks failures by running the backup
command with "|| true" and only inspecting backup_output; instead run the backup
command without swallowing errors, capture both its stdout/stderr into
backup_output and its exit code (e.g., via bash -c or by saving $? immediately
after), log the output (backup_output) and then assert success based on the exit
code rather than just grepping for "Backup saved"; apply the same change for the
restore invocation (restore-workspace.sh / restore_output) so failures are
detected and reported with both output and a non-zero exit status.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 6d8ce6fd-d22c-4e31-b8ab-539f1cc9ed11
📒 Files selected for processing (2)
.coderabbit.yamltest/e2e/test-state-backup-restore.sh
…lock and fix soft-failure in cloudflared-log helpers under set -euo pipefail
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/e2e/test-tunnel-lifecycle.sh (1)
415-417:⚠️ Potential issue | 🟠 Major | ⚡ Quick winThe new onboard-lock rationale is still contradicted earlier in this file.
These lines say
~/.nemoclaw/onboard.lockis PID-aware and should not be unlinked, butonboard_sandbox()still deletes it on Line 147. With nightly E2E now split into multiple jobs, one run can clear another run's lock and race onboarding. Remove the pre-onboard unlink too so the lock contract is consistent end to end.Suggested fix
diff --git a/test/e2e/test-tunnel-lifecycle.sh b/test/e2e/test-tunnel-lifecycle.sh @@ - rm -f "$HOME/.nemoclaw/onboard.lock" 2>/dev/null || true NEMOCLAW_SANDBOX_NAME="$name" \ NEMOCLAW_NON_INTERACTIVE=1 \ NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE=1 \🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/test-tunnel-lifecycle.sh` around lines 415 - 417, The file documents that ~/.nemoclaw/onboard.lock is PID-ownership-aware and must not be unlinked during teardown, but the function onboard_sandbox() still deletes that lock before onboarding; remove the pre-onboard unlink of ~/.nemoclaw/onboard.lock from onboard_sandbox() (or guard it so it does not remove that specific file) so the lock contract is consistent across jobs and only the sandbox-teardown logic manages stale locks.
♻️ Duplicate comments (1)
test/e2e/test-state-backup-restore.sh (1)
192-199:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDon’t mask backup/restore command failures.
On Line 192 and Line 274,
|| truedrops non-zero exit codes, so the later text checks can misreport command failures. Capture and assert exit status first, then validate success text.Proposed fix
- local backup_output - backup_output=$(bash "$REPO_ROOT/scripts/backup-workspace.sh" backup "$SANDBOX_NAME" 2>&1) || true + local backup_output backup_rc=0 + backup_output=$(bash "$REPO_ROOT/scripts/backup-workspace.sh" backup "$SANDBOX_NAME" 2>&1) || backup_rc=$? log " Backup output: ${backup_output}" - if echo "$backup_output" | grep -q "Backup saved"; then + if [[ $backup_rc -eq 0 ]] && echo "$backup_output" | grep -q "Backup saved"; then pass "TC-STATE-01: Backup completed successfully" else - fail "TC-STATE-01: Backup" "backup-workspace.sh did not report success" + fail "TC-STATE-01: Backup" "backup-workspace.sh backup failed (exit=$backup_rc) or did not report success" return fi @@ - local restore_output - restore_output=$(bash "$REPO_ROOT/scripts/backup-workspace.sh" restore "$SANDBOX_NAME" 2>&1) || true + local restore_output restore_rc=0 + restore_output=$(bash "$REPO_ROOT/scripts/backup-workspace.sh" restore "$SANDBOX_NAME" 2>&1) || restore_rc=$? log " Restore output: ${restore_output}" - if echo "$restore_output" | grep -q "Restored"; then + if [[ $restore_rc -eq 0 ]] && echo "$restore_output" | grep -q "Restored"; then pass "TC-STATE-01: Restore completed successfully" else - fail "TC-STATE-01: Restore" "backup-workspace.sh restore did not report success" + fail "TC-STATE-01: Restore" "backup-workspace.sh restore failed (exit=$restore_rc) or did not report success" return fiAlso applies to: 274-281
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/test-state-backup-restore.sh` around lines 192 - 199, The test is masking backup/restore failures by using "|| true" when running the scripts and only inspecting output text; update the commands that invoke backup-workspace.sh (the assignment to backup_output) and the corresponding restore invocation to remove "|| true", capture the command's exit status immediately (e.g., via $? or using an if/then construct around the invocation), assert/fail the test if the exit code is non-zero before proceeding, and only then validate the stdout string (e.g., check backup_output for "Backup saved"); apply the same pattern to the restore invocation so failures are not hidden.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e/test-state-backup-restore.sh`:
- Around line 216-217: The marker checks use regex grep which can falsely match
metacharacters; update the grep invocations that test for
"${marker_content}_${f}" (used alongside variables backup_dir, f, marker_content
and incrementing backup_files_ok) to use fixed-string matching by replacing
"grep -q" with "grep -Fq --" so the marker is treated literally; apply the same
change to the other similar grep calls in this script (the occurrences around
the other marker checks).
In `@test/e2e/test-tunnel-lifecycle.sh`:
- Around line 213-219: The readiness probe function probe_local_dashboard
currently appends a fallback echo which causes duplicate "000000" values on
failure; change the logic to capture curl's output into code and only set
code="000" if curl actually fails (e.g., run curl, check its exit status, and
assign "000" on non-zero), so the probe returns exactly "000" on failure. Also
fix the lock-handling discrepancy: do not unconditionally unlink
~/.nemoclaw/onboard.lock inside onboard_sandbox; instead only remove the lock if
the current process owns it (check PID ownership) or defer removal to the
teardown logic that is PID-aware, keeping the contract between onboard_sandbox
and teardown consistent.
---
Outside diff comments:
In `@test/e2e/test-tunnel-lifecycle.sh`:
- Around line 415-417: The file documents that ~/.nemoclaw/onboard.lock is
PID-ownership-aware and must not be unlinked during teardown, but the function
onboard_sandbox() still deletes that lock before onboarding; remove the
pre-onboard unlink of ~/.nemoclaw/onboard.lock from onboard_sandbox() (or guard
it so it does not remove that specific file) so the lock contract is consistent
across jobs and only the sandbox-teardown logic manages stale locks.
---
Duplicate comments:
In `@test/e2e/test-state-backup-restore.sh`:
- Around line 192-199: The test is masking backup/restore failures by using "||
true" when running the scripts and only inspecting output text; update the
commands that invoke backup-workspace.sh (the assignment to backup_output) and
the corresponding restore invocation to remove "|| true", capture the command's
exit status immediately (e.g., via $? or using an if/then construct around the
invocation), assert/fail the test if the exit code is non-zero before
proceeding, and only then validate the stdout string (e.g., check backup_output
for "Backup saved"); apply the same pattern to the restore invocation so
failures are not hidden.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e79d9e38-bca0-4dfb-ad34-cc3e1d75daff
📒 Files selected for processing (2)
test/e2e/test-state-backup-restore.shtest/e2e/test-tunnel-lifecycle.sh
There was a problem hiding this comment.
♻️ Duplicate comments (1)
test/e2e/test-state-backup-restore.sh (1)
216-217:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUse fixed-string grep for strict marker assertions.
These assertions are strict/literal checks; regex
grep -qcan false-match marker strings containing metacharacters (notably.from.mdfilenames).Proposed patch
- if [[ -f "${backup_dir}/${f}" ]] && grep -q "${marker_content}_${f}" "${backup_dir}/${f}" 2>/dev/null; then + if [[ -f "${backup_dir}/${f}" ]] && grep -Fq -- "${marker_content}_${f}" "${backup_dir}/${f}" 2>/dev/null; then @@ - if ! grep -q "${marker_content}_daily" "${backup_dir}/memory/2026-04-20.md" 2>/dev/null; then + if ! grep -Fq -- "${marker_content}_daily" "${backup_dir}/memory/2026-04-20.md" 2>/dev/null; then @@ - if echo "$restored_content" | grep -q "${marker_content}_${f}"; then + if echo "$restored_content" | grep -Fq -- "${marker_content}_${f}"; then @@ - if grep -q "${marker_content}_daily" <<<"$memory_probe"; then + if grep -Fq -- "${marker_content}_daily" <<<"$memory_probe"; thenAlso applies to: 233-234, 289-290, 307-307
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/e2e/test-state-backup-restore.sh` around lines 216 - 217, The grep checks that look for literal marker strings (using grep -q "${marker_content}_${f}") can false-match due to regex metacharacters; update these to use fixed-string matching (grep -F or grep --fixed-strings) so the checks are literal, and keep the conditional logic that increments backup_files_ok and other similar checks intact; apply the same change for the other occurrences referenced (the checks around lines 233-234, 289-290, and 307) that use marker_content and backup_dir variables.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@test/e2e/test-state-backup-restore.sh`:
- Around line 216-217: The grep checks that look for literal marker strings
(using grep -q "${marker_content}_${f}") can false-match due to regex
metacharacters; update these to use fixed-string matching (grep -F or grep
--fixed-strings) so the checks are literal, and keep the conditional logic that
increments backup_files_ok and other similar checks intact; apply the same
change for the other occurrences referenced (the checks around lines 233-234,
289-290, and 307) that use marker_content and backup_dir variables.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 78d3bec7-5dfc-4d3b-9d65-3320e87642f8
📒 Files selected for processing (1)
test/e2e/test-state-backup-restore.sh
…or strict marker assertions and fix the 000000 false-positive in the local readiness probe
…DIA/NemoClaw into refactor/intermittent-tunnel-failure
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e/test-state-backup-restore.sh`:
- Around line 307-310: The current success check for the memory restore probe
only looks for the marker in memory_probe; update the conditional to require
both STATE=EXISTS and the marker match: when testing memory_probe, first ensure
grep -q "^STATE=EXISTS" matches and also grep -Fq -- "${marker_content}_daily"
matches before calling pass ("TC-STATE-01..." ); keep the existing STATE=MISSING
branch that calls print_restore_output_for_diag("$restore_output") and ensure
any other non-EXISTS/non-MISSING case is handled consistently (e.g., treated as
failure) so restore success strictly requires STATE=EXISTS plus marker presence,
referencing memory_probe and marker_content in the change.
- Line 203: The current assignment to backup_dir uses lexical sort which can
pick the wrong directory; change it to select the most-recently modified
directory by sorting on mtime instead of name: build the find pipeline that
prints each directory's modification timestamp with its path, sort numerically
descending (newest first), take the first entry and extract just the path, and
assign that to the backup_dir variable so backup_dir always points to the most
recently modified backup directory.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 3db9f20a-bae2-48a3-92c8-8fb306a95842
📒 Files selected for processing (2)
test/e2e/test-state-backup-restore.shtest/e2e/test-tunnel-lifecycle.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- test/e2e/test-tunnel-lifecycle.sh
…DIA/NemoClaw into refactor/intermittent-tunnel-failure
Selective E2E Results — ❌ Some jobs failedRun: 25893893790
|
|
This extraction looks really great and is the right direction — splitting the old deployment-services E2E into focused tunnel and backup/restore jobs makes failures much easier to triage. I reviewed the current head ( Blockers
Warnings
Suggestions
Again, the decomposition is a solid move — the main thing is preserving coverage parity and making sure the new required E2E jobs are green before merge. |
|
A quick note on the E2E Advisor flow here: The Advisor did run on this PR and recommended these required jobs:
Right now, the Advisor posts the recommendations in the PR comments, but the selective E2E auto-dispatch loop is not fully tightened yet. In this run, auto-dispatch skipped with:
So for now, when you see Advisor recommendations in the PR comments, ask your agent to manually rerun those tests, for example: gh workflow run nightly-e2e.yaml \
--ref refactor/intermittent-tunnel-failure \
-f jobs=state-backup-restore-e2e,tunnel-lifecycle-e2eI manually dispatched that run for this PR as
We have plans in flight to tighten this selective-E2E loop so Advisor-required tests are easier to trigger/observe directly from the PR, but for now the PR comment recommendation is the source of truth and the selective run needs to be kicked off explicitly when auto-dispatch skips. |
…DIA/NemoClaw into refactor/intermittent-tunnel-failure
|
@jyaunches Thank you for the quick note on the E2E Advisor flow and the thorough review.
|
|
This PR is ready now. |
Summary
Split the deployment-services E2E into two focused, single-purpose files:
test-state-backup-restore.sh(workspace backup/restore lifecycle) andtest-tunnel-lifecycle.sh(cloudflared tunnel start/probe/stop). The tunnel test adds fault-attribution diagnostics that label every failure[NemoClaw fault]or[Cloudflare fault]— addressing intermittent flakes in issue #3494.Related Issue
Closes #3494
Changes
test/e2e/test-state-backup-restore.sh— TC-STATE-01 (workspace backup → destroy → recreate → restore lifecycle):FILES=(SOUL, USER, IDENTITY, AGENTS, MEMORY)— partial restore is treated as a real failure (no "partial tolerance" pass).DIRS=(memory)directory restore —MemoryDirRestoreis a hard FAIL (no SKIP-masks-bug).BackupCaptureFiles+BackupCaptureDirchecks before destroy, so a silent drop in the download chain surfaces immediately instead of as an ambiguous restore failure.MemoryDirRestoreprobe (STATE=EXISTS+ marker match /STATE=MISSING/ catch-all with SSH-rc diagnostic).FilesRestore,MemoryDirRestore.test/e2e/test-tunnel-lifecycle.sh— TC-DEPLOY-01a/b/c (cloudflared tunnel start / probe / stop), covering 4 of 5 suggestions in nightly-e2e: intermittent cloudflared tunnel failures in deployment-services-e2e #3494:localhost:${NEMOCLAW_DASHBOARD_PORT:-18789}before tunnel start; fast-fails withLocalReadinessif origin not serving — avoids ~50s wasted on a Cloudflare red herring./tmp/nemoclaw-services-<sandbox>/cloudflared.logand attributes Step 2 failures to one ofNoSpawn/CaptureBug/LocalOrigin/CloudflareRegister.CloudflareEdge) from local regression (LocalRegression).[NemoClaw fault]/[Cloudflare fault]/[Unclassified].test/e2e/test-deployment-services.sh— its concerns are now covered by the two split files above..github/workflows/nightly-e2e.yaml: replace the singledeployment-services-e2ejob with two jobs —state-backup-restore-e2e(runstest-state-backup-restore.sh) andtunnel-lifecycle-e2e(runstest-tunnel-lifecycle.sh); update the job choice list and downstreamneeds:dependencies accordingly.test-deployment-services.shentry fromtest/e2e/docs/parity-map.yaml.test-state-backup-restore.sh+ 17 fortest-tunnel-lifecycle.sh) understatus: deferredwith the standard e2e-maintainers ownership.test/e2e/docs/parity-inventory.generated.jsonviascripts/e2e/extract-legacy-assertions.ts(now 49 entrypoints / 1942 assertions); strictcheck-parity-map.tspasses.Type of Change
Verification
npx prek run --all-filespassesnpm testpassesmake docsbuilds without warnings (doc changes only)Signed-off-by: Hung Le hple@nvidia.com
Summary by CodeRabbit
Tests
Chores
Documentation