refactor(e2e): enhance test of cloudflared tunnel and test of backup/restore state by hunglp6d · Pull Request #3517 · NVIDIA/NemoClaw

hunglp6d · 2026-05-14T15:33:31Z

Summary

Split the deployment-services E2E into two focused, single-purpose files: test-state-backup-restore.sh (workspace backup/restore lifecycle) and test-tunnel-lifecycle.sh (cloudflared tunnel start/probe/stop). The tunnel test adds fault-attribution diagnostics that label every failure [NemoClaw fault] or [Cloudflare fault] — addressing intermittent flakes in issue #3494.

Related Issue

Closes #3494

Changes

Add test/e2e/test-state-backup-restore.sh — TC-STATE-01 (workspace backup → destroy → recreate → restore lifecycle):
- Strict 5/5 verification for files under FILES=(SOUL, USER, IDENTITY, AGENTS, MEMORY) — partial restore is treated as a real failure (no "partial tolerance" pass).
- Strict verification for DIRS=(memory) directory restore — MemoryDirRestore is a hard FAIL (no SKIP-masks-bug).
- Host-side BackupCaptureFiles + BackupCaptureDir checks before destroy, so a silent drop in the download chain surfaces immediately instead of as an ambiguous restore failure.
- 3-mode MemoryDirRestore probe (STATE=EXISTS + marker match / STATE=MISSING / catch-all with SSH-rc diagnostic).
- Goal-oriented assertion labels: FilesRestore, MemoryDirRestore.
Add test/e2e/test-tunnel-lifecycle.sh — TC-DEPLOY-01a/b/c (cloudflared tunnel start / probe / stop), covering 4 of 5 suggestions in nightly-e2e: intermittent cloudflared tunnel failures in deployment-services-e2e #3494:
- Local dashboard pre-check (suggestion 3): probes localhost:${NEMOCLAW_DASHBOARD_PORT:-18789} before tunnel start; fast-fails with LocalReadiness if origin not serving — avoids ~50s wasted on a Cloudflare red herring.
- Cloudflared log classifier (suggestions 1 and 2): reads /tmp/nemoclaw-services-<sandbox>/cloudflared.log and attributes Step 2 failures to one of NoSpawn / CaptureBug / LocalOrigin / CloudflareRegister.
- Exponential backoff + mid-probe re-verify (suggestion 4): retries up to 15 times with 2 → 4 → 8 → 16 → 30s (capped); re-checks local before each retry log line to distinguish Cloudflare edge flap (CloudflareEdge) from local regression (LocalRegression).
- Fault-attribution prefixes on every fail message: [NemoClaw fault] / [Cloudflare fault] / [Unclassified].
Delete the original test/e2e/test-deployment-services.sh — its concerns are now covered by the two split files above.
Wire the new tests into .github/workflows/nightly-e2e.yaml: replace the single deployment-services-e2e job with two jobs — state-backup-restore-e2e (runs test-state-backup-restore.sh) and tunnel-lifecycle-e2e (runs test-tunnel-lifecycle.sh); update the job choice list and downstream needs: dependencies accordingly.
Update parity tracking to reflect the split:
- Remove the test-deployment-services.sh entry from test/e2e/docs/parity-map.yaml.
- Add 37 new assertion entries (20 for test-state-backup-restore.sh + 17 for test-tunnel-lifecycle.sh) under status: deferred with the standard e2e-maintainers ownership.
- Regenerate test/e2e/docs/parity-inventory.generated.json via scripts/e2e/extract-legacy-assertions.ts (now 49 entrypoints / 1942 assertions); strict check-parity-map.ts passes.

Coverage note: The legacy TC-DEPLOY-03: uninstall --keep-openshell assertion is intentionally retired. It bundled an unrelated destructive concern into the deployment-services script and no other E2E exercised it; an upcoming dedicated can be added in a follow-up PR.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
make docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Hung Le hple@nvidia.com

Summary by CodeRabbit

Tests
- Added E2E for workspace backup & restore.
- Added focused E2E for tunnel lifecycle (start/stop).
- Replaced legacy deployment-services suite with two narrower suites.
Chores
- CI workflow updated to add the new E2E jobs and include them in failure reporting and scorecard totals.
Documentation
- E2E migration docs and parity/mapping updated to reflect the new tests.

…e test

copy-pr-bot · 2026-05-14T15:33:35Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-05-14T15:33:38Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Split the prior Deployment & Services E2E into two focused suites: a Tunnel Lifecycle job and a State Backup/Restore job. Added a new backup/restore test script, narrowed and rewired the tunnel script, updated parity/migration docs and generated inventory, and wired two new jobs into the nightly-e2e workflow and reviews config.

Changes

Tunnel Lifecycle E2E

Layer / File(s)	Summary
Script scope, config, banners, entrypoint `test/e2e/test-tunnel-lifecycle.sh`	Contract and entrypoint narrowed to TC-DEPLOY-01a/b/c; new defaults (`SANDBOX_NAME`, `LOG_FILE`, `LOCAL_DASHBOARD_PORT`); banner/results labels updated; entrypoint now runs only `test_tunnel_lifecycle`.
Helpers, start/verify/stop flow, diagnostics `test/e2e/test-tunnel-lifecycle.sh`	Added `get_cloudflared_log_path`, `classify_cloudflared_log`, `show_cloudflared_log`, `probe_local_dashboard`, `wait_local_dashboard_ready`, and `test_tunnel_lifecycle`. Introduced local-dashboard readiness gating before tunnel start, exponential-backoff URL verification, log-based failure classification, and improved stop/log printing for CI triage.
Behavior removed `test/e2e/test-tunnel-lifecycle.sh`	Removed prior sandbox helper `sandbox_exec()` and removed legacy/destructive tests previously in this file (TC-STATE-02 backup/restore, TC-DEPLOY-03 uninstall, and older start/stop variants).

State Backup/Restore E2E + CI / docs wiring

Layer / File(s)	Summary
New backup/restore test script and lifecycle implementation `test/e2e/test-state-backup-restore.sh`	New TC-STATE-01 end-to-end script: strict bash, preflight (Docker, NVIDIA_API_KEY), `install_nemoclaw`, onboarding, write marker files, run `scripts/backup-workspace.sh backup`, validate host backup contents, destroy sandbox (retries), re-onboard, run restore, verify restored files via SSH, diagnostic dump on failure, teardown trap, aggregated PASS/FAIL/SKIP/TOTAL, timestamped logs.
Parity, migration, and generated inventory updates `test/e2e/docs/parity-map.yaml`, `test/e2e/docs/MIGRATION.md`, `test/e2e/docs/parity-inventory.generated.json`	Removed legacy `test-deployment-services.sh` entry; added `test-state-backup-restore.sh` and `test-tunnel-lifecycle.sh` mappings and generated inventory entries; updated Wave 9 migration list and totals in generated inventory.
CI workflow additions and orchestration wiring `.github/workflows/nightly-e2e.yaml`, `.coderabbit.yaml`	Added `state-backup-restore-e2e` and `tunnel-lifecycle-e2e` job definitions and selective-dispatch predicates to nightly-e2e; wired both jobs into downstream `needs` lists (`notify-on-failure`, `report-to-pr`, `scorecard`); updated `.coderabbit.yaml` reviews.path_instructions to recommend the new job names and adjusted `gh workflow run nightly-e2e.yaml` `inputs.jobs` allowlist snippets accordingly.

Sequence diagram (high-level Tunnel Lifecycle flow)

sequenceDiagram
  participant CI as "GitHub Actions (nightly-e2e)"
  participant Runner as "Workflow runner"
  participant Host as "E2E host / Backup storage"
  participant Sandbox as "NemoClaw sandbox"
  participant Cloudflare as "cloudflared / trycloudflare edge"

  CI->>Runner: dispatch `tunnel-lifecycle-e2e`
  Runner->>Sandbox: onboard sandbox / start local OpenClaw
  Runner->>Sandbox: wait_local_dashboard_ready()
  Runner->>Sandbox: run `nemoclaw tunnel start`
  Sandbox->>Cloudflare: cloudflared starts
  Cloudflare-->>Sandbox: provide trycloudflare URL (or exit)
  Runner->>Cloudflare: probe public URL (exponential backoff)
  alt URL healthy and contains dashboard markers
    Runner->>Runner: PASS TC-DEPLOY-01b
  else URL absent or bad
    Runner->>Runner: classify_cloudflared_log()
    Runner->>Sandbox: stop tunnel, capture logs
    Runner->>CI: upload failure artifacts
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

NVIDIA/NemoClaw#3426: Dispatcher/auto-dispatch logic that derives selectable nightly jobs from the same inputs.jobs predicate; job-set changes here affect that dispatcher.

Suggested labels

refactor, E2E, enhancement: testing, CI/CD

Suggested reviewers

jyaunches
prekshivyas
ericksoa

"A rabbit cheers the nightly run,
Tunnels hum and backups spun,
Logs whisper why the start ran cold,
Helpers parse the tail of old,
Tests now focused — hop, job done!"

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main changes: refactoring E2E tests by enhancing tunnel lifecycle and backup/restore state testing.
Linked Issues check	✅ Passed	The PR implements all coding requirements from issue `#3494`: cloudflared logging capture, local dashboard health checks before tunnel start, exponential backoff for URL polling, fault attribution, and separation of tunnel and backup/restore tests.
Out of Scope Changes check	✅ Passed	All changes are scoped to E2E test refactoring and CI configuration updates directly addressing issue `#3494`; no unrelated changes detected.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch refactor/intermittent-tunnel-failure

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

hunglp6d · 2026-05-14T20:23:39Z

+# Record a skipped test. (Kept for symmetry with pass/fail; unused after SKIP→FAIL conversion.)
+# shellcheck disable=SC2329
+skip() {
+  ((SKIP += 1))


Removed the unreachable line(s)

hunglp6d · 2026-05-14T20:23:44Z

+# shellcheck disable=SC2329
+skip() {
+  ((SKIP += 1))
+  ((TOTAL += 1))


Removed the unreachable line(s)

hunglp6d · 2026-05-14T20:23:22Z

+skip() {
+  ((SKIP += 1))
+  ((TOTAL += 1))
+  echo -e "${YELLOW}  SKIP${NC} $1 — $2" | tee -a "$LOG_FILE"


Removed the unreachable line(s)

github-actions · 2026-05-14T15:35:54Z

E2E Advisor Recommendation

Required E2E: state-backup-restore-e2e, tunnel-lifecycle-e2e
Optional E2E: None

Dispatch hint: state-backup-restore-e2e,tunnel-lifecycle-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

state-backup-restore-e2e (high): Validates the newly added workflow job and script for backup/restore workspace state across sandbox destroy and re-onboard. Required because this PR changes the actual E2E job definition and test script for a sandbox state lifecycle flow.
tunnel-lifecycle-e2e (high): Validates the newly added workflow job and script for cloudflared tunnel start/probe/stop. Required because this PR changes the actual E2E job definition and test script for deployment tunnel lifecycle coverage.

Optional E2E

None.

New E2E recommendations

None.

Dispatch hint

Workflow: nightly-e2e.yaml
jobs input: state-backup-restore-e2e,tunnel-lifecycle-e2e

prekshivyas · 2026-05-14T18:53:42Z

Cross-link from a related PR: #3537 fixes the CLI-side of the same cloudflared (stopped) symptom your evidence table cites in #3494 (run 25615293545 etc.). Different root cause from #3494 (manual kill in #2604 vs CI quick-tunnel flake here), different layer (CLI rendering vs shell-script test attribution), and we touch disjoint files — but with #3537 merged, the renderer would distinguish (stale PID N) from (stopped) so your cloudflared.log-based classifier in this PR gets a complementary signal straight from nemoclaw status. No dependency either direction; flagging for awareness.

…unnel lifecycle diagnostics

hunglp6d · 2026-05-14T19:22:24Z

Got it @prekshivyas , thanks for the cross-link!
This PR is a refactor of test-deployment-services.sh into two focused files with diagnostic enhancements (structured logging, fault-attribution prefixes, cloudflared.log classifier, local pre-check + backoff).
Agreed #3537 is complementary — once it lands, the (stale PID N) vs (stopped) distinction pairs nicely with the log-level classifier here. Happy to do a follow-up PR to add a StalePID label after #3537 merges.
Thanks for flagging!

@wangericnv

…/Inference fields (#3537) > Re-signed replacement for #3534. Same code, single squashed commit, this time SSH-signed (the original branch had `commit.gpgsign=false` set as a local override on my clone — the prior PR's commits ended up unverified). Force-push is blocked on the original branch, so this is a fresh branch + new PR; #3534 will be closed in favour of this one. ## Summary Fixes #2604. Both @wangericnv (2026-05-11) and @cv (2026-05-14) re-reported the same symptom on v0.0.38 / v0.0.41 — `nemoclaw status` prints `● cloudflared (stopped)` in all three failure modes (no PID file, garbage PID, dead/wrong-process PID) with no cause and no remediation. The bare command also omits the `Connected:` / `Inference:` labeled fields the original bug requested. Both symptoms are addressed here. ## Root cause - **Cloudflared diagnostic.** `src/lib/actions/sandbox/doctor.ts` already distinguished three states and emitted matching hints. `showStatus()` in `src/lib/tunnel/services.ts` was written before that and only had `isRunning() ? ok : "(stopped)"` — every failure mode collapsed into the same un-actionable line. - **Bare-status fields.** PR #2884 added `Inference:` / `Connected:` labels to the per-sandbox `nemoclaw <name> status` but left bare `nemoclaw status` showing only the model in parens. ## What this PR does 1. **Share the cloudflared state machine.** New `readCloudflaredState(pidDir)` in `src/lib/tunnel/services.ts` returns a discriminated union `{ kind: "running" | "stopped" | "stale-pid-file" | "stale-pid-process" }`. `showStatus()` switches on it and emits a coloured marker + one-line remediation. `cloudflaredDoctorCheck()` consumes the same function and translates each state into its `DoctorCheck`, removing duplicated PID-file / `/proc/<pid>/cmdline` logic. 2. **Remediation wording — `no cloudflared process; restart with ...`.** All three failure modes lead with the cause and point at `nemoclaw tunnel start`. Both reporters asked for that exact shape. `nemoclaw tunnel start` already handles all three states because `isRunning()` returns false and `startService` proceeds and overwrites any stale PID file — so one command recovers in every case. The same wording is used in `doctor.ts` for consistency. 3. **Bare-status `Inference:` and `Connected:` lines.** `showStatusCommand` in `src/lib/inventory/index.ts` now prints labeled `Inference: <provider> / <model>` and `Connected: yes (N session) / no` under each sandbox row. Provider/model prefer live gateway values for the default sandbox (consistent with the existing `(model)` rendering, #2369). `getActiveSessionCount` is wired through `buildStatusCommandDeps`, mirroring the cached SSH-process probe already used by `buildListCommandDeps`. 4. **Tests** (15 new). Three cases each for `showStatus` failure-mode rendering, five cases for `readCloudflaredState`, seven for bare-status `Inference:` / `Connected:` rendering across live/stored/missing dep variants. All existing cli-doctor tests still pass against the refactored shared function. ## Behavioural diff Before (v0.0.41 baseline): ``` ● cloudflared (stopped) # identical output for all three failure modes ``` After: ``` ● cloudflared (stopped) no cloudflared process; run `nemoclaw tunnel start` to start it ● cloudflared (stale PID file) no cloudflared process (stored PID is invalid); run `nemoclaw tunnel start` to restart it ● cloudflared (stale PID 999999999) no cloudflared process (PID 999999999 is dead or not cloudflared); run `nemoclaw tunnel start` to restart it ``` Bare status sandbox row: ``` # Before test-sandbox * (qwen2.5:7b) :18789 # After test-sandbox * (qwen2.5:7b) :18789 Inference: ollama-local / qwen2.5:7b Connected: no ``` ## Brev reproduction — 3× per case, baseline and fix (Run on the prior PR #3534 branch, same code as here.) Fresh `n2d-standard-2` (Ubuntu 22.04 / Linux 6.8 GCP), v0.0.41 from tag, faked registry, three PID-dir states 3× each. Re-installed from this branch. <details> <summary><b>Baseline v0.0.41 — 9/9 runs reproduce the bug</b></summary> ``` ### Case: stopped — no PID file --- Run 1 --- ● cloudflared (stopped) --- Run 2 --- ● cloudflared (stopped) --- Run 3 --- ● cloudflared (stopped) ### Case: stale-pid-file — garbage PID contents --- Run 1 --- ● cloudflared (stopped) --- Run 2 --- ● cloudflared (stopped) --- Run 3 --- ● cloudflared (stopped) ### Case: stale-pid-process — PID is dead --- Run 1 --- ● cloudflared (stopped) --- Run 2 --- ● cloudflared (stopped) --- Run 3 --- ● cloudflared (stopped) ``` </details> <details> <summary><b>Fix branch — 9/9 runs emit a state-specific remediation</b></summary> Same three-case 3× harness, with the new wording. Identical output across reruns within each case — no flake. </details> ## Out of scope (filed/to-file as follow-ups) - **Auto-restart on crash.** Needs a watchdog daemon / systemd unit / `while true; cloudflared || sleep 5` shim. Real design work, separate PR. Manual `pkill cloudflared` is not preventable from inside the CLI either way; the right answer there is actionable status output, which this PR provides. - **Intermittent cloudflared exits in nightly E2E (#3494).** Different scope — CI-infra fault attribution. #3517 covers it; touches disjoint files. ## Test plan - [x] `npm run build:cli` clean - [x] `npx tsc -p tsconfig.cli.json` clean - [x] `vitest run src/lib/tunnel/services.test.ts` — 23 pass - [x] `vitest run src/lib/inventory/index.test.ts` — 31 pass - [x] `vitest run test/cli.test.ts -t doctor` — 4/4 pass (covers refactored `cloudflaredDoctorCheck`) - [x] Brev `n2d-standard-2` repro: 9/9 baseline reproduces; 9/9 fix shows the new diagnostic + remediation lines, no flake - [x] Commit SSH-signed and verified by GitHub (`reason: valid`) Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com>  ## Summary by CodeRabbit ## Release Notes * **New Features** * Status command now displays inference provider and model per sandbox. * Status command includes active SSH session count when available. * **Bug Fixes** * Enhanced cloudflared health checks with refined state detection (running, stopped, stale PID). * Improved remediation hints for cloudflared diagnostic messages. * **Tests** * Added tests for expanded status output with provider, model, and session information. * Added tests for cloudflared state detection across various scenarios.  [![Review Change Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/3537)   Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com>

…DIA/NemoClaw into refactor/intermittent-tunnel-failure

coderabbitai

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.coderabbit.yaml:
- Around line 246-258: The .coderabbit.yaml entry for E2E hints only lists
"src/lib/deploy/**" but omits state files; add a matching path_instructions
entry that references the state implementation (e.g., "src/lib/state/sandbox.ts"
and any other state restore owners) and include the same
`state-backup-restore-e2e` instructions block so reviewers of Sandbox/restore
logic see the E2E recommendation; update the YAML to include these state paths
with the same instructions content as the deploy entry.

In `@test/e2e/docs/MIGRATION.md`:
- Around line 124-125: The entries in MIGRATION.md use emoji checklist markers
"⬜" which violates the repo's MD style; replace those emoji markers with
plain-text checkbox markers such as "[ ]" for the two lines referencing
"test-state-backup-restore.sh (378) → lifecycle/state-backup-restore/" and
"test-tunnel-lifecycle.sh (472) → lifecycle/tunnel-lifecycle/" so the lines read
with "[ ]" instead of "⬜", preserving all other text and punctuation.

In `@test/e2e/test-state-backup-restore.sh`:
- Around line 194-201: The backup and restore invocations currently mask
failures by appending "|| true" and only inspecting stdout with grep; remove the
"|| true", capture the command exit status (e.g., run backup_output=$(bash
"$REPO_ROOT/scripts/backup-workspace.sh" backup "$SANDBOX_NAME" 2>&1); rc=$?),
and if rc is non‑zero call fail with the captured output (use the same pattern
for the restore invocation), otherwise proceed to validate success text (grep -q
"Backup saved"). Also apply the same change to the corresponding restore block
(lines referencing the restore call around 276-283) so you don't rely solely on
substring matching when the process actually failed.
- Around line 186-189: The current check only fails when files_written or
memory_written are zero; change it to require full expected counts before
proceeding by comparing files_written against the expected total (5) and
memory_written against its expected total (1) and call fail (same fail
function/TC-STATE-01) if either does not equal the expected value so partial
setups (e.g., 3/5) are rejected; update the conditional that references
files_written and memory_written accordingly and keep the existing fail message
semantics.

In `@test/e2e/test-tunnel-lifecycle.sh`:
- Around line 160-205: get_cloudflared_log_path can cause the script to exit
under set -euo pipefail when the glob matches nothing; change its command
substitution to prevent non-zero exit (e.g., capture ls output with fallback:
log=$(ls -t /tmp/nemoclaw-services-*/cloudflared.log 2>/dev/null || true)) and
make the function always return 0 (echo "" and return 0 instead of returning 1)
so callers like classify_cloudflared_log and show_cloudflared_log don't
terminate the script; ensure callers treat an empty string as "no log" (they
already check -z) so behavior stays the same.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 96fabd70-9a3e-4e13-ba30-f2f78dbdc5ee

📥 Commits

Reviewing files that changed from the base of the PR and between f9bc5f1 and fc76854.

📒 Files selected for processing (7)

.coderabbit.yaml
.github/workflows/nightly-e2e.yaml
test/e2e/docs/MIGRATION.md
test/e2e/docs/parity-inventory.generated.json
test/e2e/docs/parity-map.yaml
test/e2e/test-state-backup-restore.sh
test/e2e/test-tunnel-lifecycle.sh

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

test/e2e/test-state-backup-restore.sh (1)

193-200: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not mask backup/restore command failures.

Line 193 and Line 275 swallow non-zero exits with || true, then rely on output text. This can misattribute failures and weaken diagnostics.

Suggested fix

-  local backup_output
-  backup_output=$(bash "$REPO_ROOT/scripts/backup-workspace.sh" backup "$SANDBOX_NAME" 2>&1) || true
+  local backup_output backup_rc=0
+  backup_output=$(bash "$REPO_ROOT/scripts/backup-workspace.sh" backup "$SANDBOX_NAME" 2>&1) || backup_rc=$?
   log "  Backup output: ${backup_output}"

-  if echo "$backup_output" | grep -q "Backup saved"; then
+  if [[ $backup_rc -eq 0 ]] && echo "$backup_output" | grep -q "Backup saved"; then
     pass "TC-STATE-01: Backup completed successfully"
   else
-    fail "TC-STATE-01: Backup" "backup-workspace.sh did not report success"
+    fail "TC-STATE-01: Backup" "backup-workspace.sh backup failed (exit=$backup_rc) or did not report success"
     return
   fi
@@
-  local restore_output
-  restore_output=$(bash "$REPO_ROOT/scripts/backup-workspace.sh" restore "$SANDBOX_NAME" 2>&1) || true
+  local restore_output restore_rc=0
+  restore_output=$(bash "$REPO_ROOT/scripts/backup-workspace.sh" restore "$SANDBOX_NAME" 2>&1) || restore_rc=$?
   log "  Restore output: ${restore_output}"

-  if echo "$restore_output" | grep -q "Restored"; then
+  if [[ $restore_rc -eq 0 ]] && echo "$restore_output" | grep -q "Restored"; then
     pass "TC-STATE-01: Restore completed successfully"
   else
-    fail "TC-STATE-01: Restore" "backup-workspace.sh restore did not report success"
+    fail "TC-STATE-01: Restore" "backup-workspace.sh restore failed (exit=$restore_rc) or did not report success"
     return
   fi

Also applies to: 275-283

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/test-state-backup-restore.sh` around lines 193 - 200, The test
currently masks failures by running the backup command with "|| true" and only
inspecting backup_output; instead run the backup command without swallowing
errors, capture both its stdout/stderr into backup_output and its exit code
(e.g., via bash -c or by saving $? immediately after), log the output
(backup_output) and then assert success based on the exit code rather than just
grepping for "Backup saved"; apply the same change for the restore invocation
(restore-workspace.sh / restore_output) so failures are detected and reported
with both output and a non-zero exit status.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/test-state-backup-restore.sh`:
- Line 140: The test forcibly removes the onboard lock file with rm -f
"$HOME/.nemoclaw/onboard.lock" which can clobber a legitimately held lock;
delete that rm invocation from test/e2e/test-state-backup-restore.sh and instead
rely on the existing onboard stale-lock handling described in the teardown
notes, or replace it with a safe ownership check that only removes the lock if
it is owned by the current test (e.g., read the PID/owner from the onboard.lock
and compare before deleting) so you no longer unconditionally remove
onboard.lock.

---

Duplicate comments:
In `@test/e2e/test-state-backup-restore.sh`:
- Around line 193-200: The test currently masks failures by running the backup
command with "|| true" and only inspecting backup_output; instead run the backup
command without swallowing errors, capture both its stdout/stderr into
backup_output and its exit code (e.g., via bash -c or by saving $? immediately
after), log the output (backup_output) and then assert success based on the exit
code rather than just grepping for "Backup saved"; apply the same change for the
restore invocation (restore-workspace.sh / restore_output) so failures are
detected and reported with both output and a non-zero exit status.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6d8ce6fd-d22c-4e31-b8ab-539f1cc9ed11

📥 Commits

Reviewing files that changed from the base of the PR and between fc76854 and 48574cb.

📒 Files selected for processing (2)

.coderabbit.yaml
test/e2e/test-state-backup-restore.sh

…lock and fix soft-failure in cloudflared-log helpers under set -euo pipefail

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

test/e2e/test-tunnel-lifecycle.sh (1)
415-417: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

The new onboard-lock rationale is still contradicted earlier in this file.

These lines say ~/.nemoclaw/onboard.lock is PID-aware and should not be unlinked, but onboard_sandbox() still deletes it on Line 147. With nightly E2E now split into multiple jobs, one run can clear another run's lock and race onboarding. Remove the pre-onboard unlink too so the lock contract is consistent end to end.
Suggested fix
diff --git a/test/e2e/test-tunnel-lifecycle.sh b/test/e2e/test-tunnel-lifecycle.sh
@@
-  rm -f "$HOME/.nemoclaw/onboard.lock" 2>/dev/null || true
   NEMOCLAW_SANDBOX_NAME="$name" \
     NEMOCLAW_NON_INTERACTIVE=1 \
     NEMOCLAW_ACCEPT_THIRD_PARTY_SOFTWARE=1 \
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/test-tunnel-lifecycle.sh` around lines 415 - 417, The file documents
that ~/.nemoclaw/onboard.lock is PID-ownership-aware and must not be unlinked
during teardown, but the function onboard_sandbox() still deletes that lock
before onboarding; remove the pre-onboard unlink of ~/.nemoclaw/onboard.lock
from onboard_sandbox() (or guard it so it does not remove that specific file) so
the lock contract is consistent across jobs and only the sandbox-teardown logic
manages stale locks.

♻️ Duplicate comments (1)

test/e2e/test-state-backup-restore.sh (1)

192-199: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t mask backup/restore command failures.

On Line 192 and Line 274, || true drops non-zero exit codes, so the later text checks can misreport command failures. Capture and assert exit status first, then validate success text.

Proposed fix

-  local backup_output
-  backup_output=$(bash "$REPO_ROOT/scripts/backup-workspace.sh" backup "$SANDBOX_NAME" 2>&1) || true
+  local backup_output backup_rc=0
+  backup_output=$(bash "$REPO_ROOT/scripts/backup-workspace.sh" backup "$SANDBOX_NAME" 2>&1) || backup_rc=$?
   log "  Backup output: ${backup_output}"
 
-  if echo "$backup_output" | grep -q "Backup saved"; then
+  if [[ $backup_rc -eq 0 ]] && echo "$backup_output" | grep -q "Backup saved"; then
     pass "TC-STATE-01: Backup completed successfully"
   else
-    fail "TC-STATE-01: Backup" "backup-workspace.sh did not report success"
+    fail "TC-STATE-01: Backup" "backup-workspace.sh backup failed (exit=$backup_rc) or did not report success"
     return
   fi
@@
-  local restore_output
-  restore_output=$(bash "$REPO_ROOT/scripts/backup-workspace.sh" restore "$SANDBOX_NAME" 2>&1) || true
+  local restore_output restore_rc=0
+  restore_output=$(bash "$REPO_ROOT/scripts/backup-workspace.sh" restore "$SANDBOX_NAME" 2>&1) || restore_rc=$?
   log "  Restore output: ${restore_output}"
 
-  if echo "$restore_output" | grep -q "Restored"; then
+  if [[ $restore_rc -eq 0 ]] && echo "$restore_output" | grep -q "Restored"; then
     pass "TC-STATE-01: Restore completed successfully"
   else
-    fail "TC-STATE-01: Restore" "backup-workspace.sh restore did not report success"
+    fail "TC-STATE-01: Restore" "backup-workspace.sh restore failed (exit=$restore_rc) or did not report success"
     return
   fi

Also applies to: 274-281

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/test-state-backup-restore.sh` around lines 192 - 199, The test is
masking backup/restore failures by using "|| true" when running the scripts and
only inspecting output text; update the commands that invoke backup-workspace.sh
(the assignment to backup_output) and the corresponding restore invocation to
remove "|| true", capture the command's exit status immediately (e.g., via $? or
using an if/then construct around the invocation), assert/fail the test if the
exit code is non-zero before proceeding, and only then validate the stdout
string (e.g., check backup_output for "Backup saved"); apply the same pattern to
the restore invocation so failures are not hidden.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/test-state-backup-restore.sh`:
- Around line 216-217: The marker checks use regex grep which can falsely match
metacharacters; update the grep invocations that test for
"${marker_content}_${f}" (used alongside variables backup_dir, f, marker_content
and incrementing backup_files_ok) to use fixed-string matching by replacing
"grep -q" with "grep -Fq --" so the marker is treated literally; apply the same
change to the other similar grep calls in this script (the occurrences around
the other marker checks).

In `@test/e2e/test-tunnel-lifecycle.sh`:
- Around line 213-219: The readiness probe function probe_local_dashboard
currently appends a fallback echo which causes duplicate "000000" values on
failure; change the logic to capture curl's output into code and only set
code="000" if curl actually fails (e.g., run curl, check its exit status, and
assign "000" on non-zero), so the probe returns exactly "000" on failure. Also
fix the lock-handling discrepancy: do not unconditionally unlink
~/.nemoclaw/onboard.lock inside onboard_sandbox; instead only remove the lock if
the current process owns it (check PID ownership) or defer removal to the
teardown logic that is PID-aware, keeping the contract between onboard_sandbox
and teardown consistent.

---

Outside diff comments:
In `@test/e2e/test-tunnel-lifecycle.sh`:
- Around line 415-417: The file documents that ~/.nemoclaw/onboard.lock is
PID-ownership-aware and must not be unlinked during teardown, but the function
onboard_sandbox() still deletes that lock before onboarding; remove the
pre-onboard unlink of ~/.nemoclaw/onboard.lock from onboard_sandbox() (or guard
it so it does not remove that specific file) so the lock contract is consistent
across jobs and only the sandbox-teardown logic manages stale locks.

---

Duplicate comments:
In `@test/e2e/test-state-backup-restore.sh`:
- Around line 192-199: The test is masking backup/restore failures by using "||
true" when running the scripts and only inspecting output text; update the
commands that invoke backup-workspace.sh (the assignment to backup_output) and
the corresponding restore invocation to remove "|| true", capture the command's
exit status immediately (e.g., via $? or using an if/then construct around the
invocation), assert/fail the test if the exit code is non-zero before
proceeding, and only then validate the stdout string (e.g., check backup_output
for "Backup saved"); apply the same pattern to the restore invocation so
failures are not hidden.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e79d9e38-bca0-4dfb-ad34-cc3e1d75daff

📥 Commits

Reviewing files that changed from the base of the PR and between 48574cb and 59f3c61.

📒 Files selected for processing (2)

test/e2e/test-state-backup-restore.sh
test/e2e/test-tunnel-lifecycle.sh

… failures

coderabbitai

♻️ Duplicate comments (1)

test/e2e/test-state-backup-restore.sh (1)

216-217: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use fixed-string grep for strict marker assertions.

These assertions are strict/literal checks; regex grep -q can false-match marker strings containing metacharacters (notably . from .md filenames).

Proposed patch

-    if [[ -f "${backup_dir}/${f}" ]] && grep -q "${marker_content}_${f}" "${backup_dir}/${f}" 2>/dev/null; then
+    if [[ -f "${backup_dir}/${f}" ]] && grep -Fq -- "${marker_content}_${f}" "${backup_dir}/${f}" 2>/dev/null; then
@@
-  if ! grep -q "${marker_content}_daily" "${backup_dir}/memory/2026-04-20.md" 2>/dev/null; then
+  if ! grep -Fq -- "${marker_content}_daily" "${backup_dir}/memory/2026-04-20.md" 2>/dev/null; then
@@
-    if echo "$restored_content" | grep -q "${marker_content}_${f}"; then
+    if echo "$restored_content" | grep -Fq -- "${marker_content}_${f}"; then
@@
-  if grep -q "${marker_content}_daily" <<<"$memory_probe"; then
+  if grep -Fq -- "${marker_content}_daily" <<<"$memory_probe"; then

Also applies to: 233-234, 289-290, 307-307

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/test-state-backup-restore.sh` around lines 216 - 217, The grep
checks that look for literal marker strings (using grep -q
"${marker_content}_${f}") can false-match due to regex metacharacters; update
these to use fixed-string matching (grep -F or grep --fixed-strings) so the
checks are literal, and keep the conditional logic that increments
backup_files_ok and other similar checks intact; apply the same change for the
other occurrences referenced (the checks around lines 233-234, 289-290, and 307)
that use marker_content and backup_dir variables.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@test/e2e/test-state-backup-restore.sh`:
- Around line 216-217: The grep checks that look for literal marker strings
(using grep -q "${marker_content}_${f}") can false-match due to regex
metacharacters; update these to use fixed-string matching (grep -F or grep
--fixed-strings) so the checks are literal, and keep the conditional logic that
increments backup_files_ok and other similar checks intact; apply the same
change for the other occurrences referenced (the checks around lines 233-234,
289-290, and 307) that use marker_content and backup_dir variables.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 78d3bec7-5dfc-4d3b-9d65-3320e87642f8

📥 Commits

Reviewing files that changed from the base of the PR and between 59f3c61 and 07c0565.

📒 Files selected for processing (1)

test/e2e/test-state-backup-restore.sh

…or strict marker assertions and fix the 000000 false-positive in the local readiness probe

…DIA/NemoClaw into refactor/intermittent-tunnel-failure

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/test-state-backup-restore.sh`:
- Around line 307-310: The current success check for the memory restore probe
only looks for the marker in memory_probe; update the conditional to require
both STATE=EXISTS and the marker match: when testing memory_probe, first ensure
grep -q "^STATE=EXISTS" matches and also grep -Fq -- "${marker_content}_daily"
matches before calling pass ("TC-STATE-01..." ); keep the existing STATE=MISSING
branch that calls print_restore_output_for_diag("$restore_output") and ensure
any other non-EXISTS/non-MISSING case is handled consistently (e.g., treated as
failure) so restore success strictly requires STATE=EXISTS plus marker presence,
referencing memory_probe and marker_content in the change.
- Line 203: The current assignment to backup_dir uses lexical sort which can
pick the wrong directory; change it to select the most-recently modified
directory by sorting on mtime instead of name: build the find pipeline that
prints each directory's modification timestamp with its path, sort numerically
descending (newest first), take the first entry and extract just the path, and
assign that to the backup_dir variable so backup_dir always points to the most
recently modified backup directory.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 3db9f20a-bae2-48a3-92c8-8fb306a95842

📥 Commits

Reviewing files that changed from the base of the PR and between 07c0565 and 673e789.

📒 Files selected for processing (2)

test/e2e/test-state-backup-restore.sh
test/e2e/test-tunnel-lifecycle.sh

🚧 Files skipped from review as they are similar to previous changes (1)

test/e2e/test-tunnel-lifecycle.sh

…DIA/NemoClaw into refactor/intermittent-tunnel-failure

…-tunnel-failure

github-actions · 2026-05-15T00:56:13Z

Selective E2E Results — ❌ Some jobs failed

Run: 25893893790
Target ref: refactor/intermittent-tunnel-failure
Requested jobs: state-backup-restore-e2e,tunnel-lifecycle-e2e
Summary: 1 passed, 1 failed, 0 skipped

Job	Result
state-backup-restore-e2e	❌ failure
tunnel-lifecycle-e2e	✅ success

Failed jobs: state-backup-restore-e2e. Check run artifacts for logs.

jyaunches · 2026-05-15T01:15:50Z

This extraction looks really great and is the right direction — splitting the old deployment-services E2E into focused tunnel and backup/restore jobs makes failures much easier to triage.

I reviewed the current head (1a7e7c8b) after merging main, and I think there are a few things to address before merge.

Blockers

Dropped uninstall coverage — .github/workflows/nightly-e2e.yaml
- The old deployment-services-e2e documented TC-DEPLOY-03: uninstall --keep-openshell coverage.
- The new split jobs cover backup/restore and tunnel lifecycle, but no longer cover uninstall.
- Please either add a dedicated uninstall lifecycle E2E, keep that assertion in one of the split jobs, or explicitly document that this coverage is intentionally retired.
Required tunnel E2E can pass without validating cloudflared — test/e2e/test-tunnel-lifecycle.sh
- If cloudflared install fails, preflight logs a warning, then the test records a skip and can exit 0.
- Since this job is the required tunnel validation, it should fail closed when cloudflared is unavailable.
Unpinned cloudflared download in a secret-bearing job — test/e2e/test-tunnel-lifecycle.sh
- The workflow runs with NVIDIA_API_KEY in scope, but downloads cloudflared from releases/latest and executes it.
- Please pin to an exact version + checksum, or use a trusted/pinned install path.
Required E2E did not pass — run 25893893790
- tunnel-lifecycle-e2e passed.
- state-backup-restore-e2e failed at TC-STATE-01: MemoryDirRestore.
- The log shows backup captured memory/2026-04-20.md, restore reported uploading memory/, but the file did not exist in the sandbox afterward. This may be an openshell sandbox upload directory restore issue or a restore path semantics mismatch, but either way the new required test is currently red.

Warnings

Floating action refs in secret-bearing jobs — .github/workflows/nightly-e2e.yaml
- This follows existing workflow style, but these jobs use secrets and still reference actions by floating tags (actions/checkout@v6, actions/upload-artifact@v4). Full SHA pinning would be safer.
Backup selection is global newest directory — test/e2e/test-state-backup-restore.sh
- The test finds the newest directory under ~/.nemoclaw/backups.
- That is probably fine on isolated CI, but extracting the backup path from backup_output would be less fragile.

Suggestions

Consider whether state-backup-restore-e2e should be part of this Cloudflare-focused PR, or split into a follow-up if the stricter restore assertions expose unrelated backup/upload bugs.
If the stricter memory-directory assertion is intentional, it would be helpful to link/file the underlying restore/upload issue so this PR’s scope is clear.

Again, the decomposition is a solid move — the main thing is preserving coverage parity and making sure the new required E2E jobs are green before merge.

jyaunches · 2026-05-15T01:15:52Z

A quick note on the E2E Advisor flow here:

The Advisor did run on this PR and recommended these required jobs:

state-backup-restore-e2e
tunnel-lifecycle-e2e

Right now, the Advisor posts the recommendations in the PR comments, but the selective E2E auto-dispatch loop is not fully tightened yet. In this run, auto-dispatch skipped with:

no required advisor recommendations matched dispatchable jobs in the target workflow

So for now, when you see Advisor recommendations in the PR comments, ask your agent to manually rerun those tests, for example:

gh workflow run nightly-e2e.yaml \
  --ref refactor/intermittent-tunnel-failure \
  -f jobs=state-backup-restore-e2e,tunnel-lifecycle-e2e

I manually dispatched that run for this PR as 25893893790. Result:

tunnel-lifecycle-e2e: passed
state-backup-restore-e2e: failed

We have plans in flight to tighten this selective-E2E loop so Advisor-required tests are easier to trigger/observe directly from the PR, but for now the PR comment recommendation is the source of truth and the selective run needs to be kicked off explicitly when auto-dispatch skips.

…DIA/NemoClaw into refactor/intermittent-tunnel-failure

hunglp6d · 2026-05-15T05:01:54Z

@jyaunches Thank you for the quick note on the E2E Advisor flow and the thorough review.
Status on the blockers:

Dropped uninstall coverage → Put a Coverage note in the PR body; this test
would be implemented in a follow-up if essential.
Required tunnel E2E can pass without validating cloudflared → Done, fails
closed at two layers (preflight exit 1 on install fail + test guard returns
FAIL if cloudflared is still unavailable).
Unpinned cloudflared download in a secret-bearing job → Done. Switched to
Cloudflare's GPG-signed APT repo (matches your "trusted/pinned install path"
alternative); APT verifies signed Release → package SHA256, no per-version SHA
pin needed.
Related issue created: [Backup/Restore] scripts/backup-workspace.sh restore nests memory/ directory creating wrong path on sandbox (silent data loss) #3555 — the strict MemoryDirRestore assertion in
this PR is what surfaced the upstream restore bug; this PR is intentionally
left red until [Backup/Restore] scripts/backup-workspace.sh restore nests memory/ directory creating wrong path on sandbox (silent data loss) #3555 lands.

hunglp6d · 2026-05-15T05:04:25Z

This PR is ready now.

hunglp6d added 2 commits May 14, 2026 16:31

refactor(e2e): enhance test of cloudflared tunnel | separating files

d102139

refactor(e2e): clasify tunnel logs and rename function in back/restor…

f463c4d

…e test

github-advanced-security AI found potential problems May 14, 2026

View reviewed changes

This was referenced May 14, 2026

fix(status): explain cloudflared-stopped reason and surface Connected/Inference fields #3534

Closed

fix(status): explain cloudflared-stopped reason and surface Connected/Inference fields #3537

Merged

hunglp6d and others added 2 commits May 15, 2026 00:42

refactor(e2e): Update parity files | Tighten state backup/restore | T…

89c92fa

…unnel lifecycle diagnostics

Merge branch 'main' into refactor/intermittent-tunnel-failure

f837eb5

hunglp6d and others added 4 commits May 15, 2026 01:07

refactor(e2e): update the old references

c75e5ed

Merge branch 'refactor/intermittent-tunnel-failure' of github.com:NVI…

44b0373

…DIA/NemoClaw into refactor/intermittent-tunnel-failure

Merge branch 'main' into refactor/intermittent-tunnel-failure

dc20cef

refactor(e2e): apply shfmt to new test files

fc76854

hunglp6d marked this pull request as ready for review May 14, 2026 19:56

coderabbitai Bot reviewed May 14, 2026

View reviewed changes

Comment thread .coderabbit.yaml

Comment thread test/e2e/docs/MIGRATION.md

Comment thread test/e2e/test-state-backup-restore.sh Outdated

Comment thread test/e2e/test-state-backup-restore.sh Outdated

Comment thread test/e2e/test-tunnel-lifecycle.sh

refactor(e2e): resolve rabbit comments 1 and 3

48574cb

coderabbitai Bot reviewed May 14, 2026

View reviewed changes

Comment thread test/e2e/test-state-backup-restore.sh Outdated

refactor(e2e): resolve rabbit comments of avoid deleting the onboard …

59f3c61

…lock and fix soft-failure in cloudflared-log helpers under set -euo pipefail

coderabbitai Bot reviewed May 14, 2026

View reviewed changes

Comment thread test/e2e/test-state-backup-restore.sh Outdated

Comment thread test/e2e/test-tunnel-lifecycle.sh

hunglp6d and others added 2 commits May 15, 2026 02:00

refactor(e2e): resolve rabbit comments of mask backup/restore command…

6287be6

… failures

Merge branch 'main' into refactor/intermittent-tunnel-failure

07c0565

coderabbitai Bot reviewed May 14, 2026

View reviewed changes

hunglp6d added 2 commits May 15, 2026 02:21

refactor(e2e): resolve rabbit comments of use fixed-string matching f…

2d9ea06

…or strict marker assertions and fix the 000000 false-positive in the local readiness probe

Merge branch 'refactor/intermittent-tunnel-failure' of github.com:NVI…

673e789

…DIA/NemoClaw into refactor/intermittent-tunnel-failure

coderabbitai Bot reviewed May 14, 2026

View reviewed changes

Comment thread test/e2e/test-state-backup-restore.sh Outdated

Comment thread test/e2e/test-state-backup-restore.sh Outdated

refactor(e2e): resolve rabbit comments

d34275e

hunglp6d and others added 4 commits May 15, 2026 02:41

Merge branch 'refactor/intermittent-tunnel-failure' of github.com:NVI…

b10bb6e

…DIA/NemoClaw into refactor/intermittent-tunnel-failure

refactor(e2e): update stale parity

604112c

Merge branch 'main' into refactor/intermittent-tunnel-failure

0b28f7e

Merge remote-tracking branch 'origin/main' into refactor/intermittent…

1a7e7c8

…-tunnel-failure

hunglp6d mentioned this pull request May 15, 2026

[Backup/Restore] scripts/backup-workspace.sh restore nests memory/ directory creating wrong path on sandbox (silent data loss) #3555

Closed

2 tasks

hunglp6d and others added 5 commits May 15, 2026 10:06

Merge branch 'main' into refactor/intermittent-tunnel-failure

aa95752

refactor(e2e): pin cloudflared via signed APT repo + fail-closed

3b47290

Merge branch 'main' into refactor/intermittent-tunnel-failure

c760562

refactor(e2e): apply formatting

83590fa

Merge branch 'refactor/intermittent-tunnel-failure' of github.com:NVI…

6fa1626

…DIA/NemoClaw into refactor/intermittent-tunnel-failure

hunglp6d self-assigned this May 15, 2026

hunglp6d added E2E VRDC Issues and PRs submitted by NVIDIA VRDC test team. labels May 15, 2026

hunglp6d and others added 2 commits May 15, 2026 14:08

Merge branch 'main' into refactor/intermittent-tunnel-failure

6ce56a3

Merge branch 'main' into refactor/intermittent-tunnel-failure

692059d

cv enabled auto-merge (squash) May 15, 2026 13:49

cv approved these changes May 15, 2026

View reviewed changes

cv merged commit a28f365 into main May 15, 2026
21 checks passed

This was referenced May 15, 2026

fix(e2e): stabilize backup and dashboard regressions #3622

Merged

fix(security): harden low-risk code scanning findings #3657

Merged

feat(e2e): migrate baseline onboarding coverage #3897

Merged

wscurran added area: e2e End-to-end tests, nightly failures, or validation infrastructure refactor PR restructures code without intended behavior change and removed E2E labels Jun 3, 2026

Conversation

hunglp6d commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Type of Change

Verification

Summary by CodeRabbit

Uh oh!

copy-pr-bot Bot commented May 14, 2026

Uh oh!

coderabbitai Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Uh oh!

hunglp6d May 14, 2026

Choose a reason for hiding this comment

Uh oh!

hunglp6d May 14, 2026

Choose a reason for hiding this comment

Uh oh!

hunglp6d May 14, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

Uh oh!

prekshivyas commented May 14, 2026

Uh oh!

hunglp6d commented May 14, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented May 15, 2026

Selective E2E Results — ❌ Some jobs failed

Uh oh!

jyaunches commented May 15, 2026

Blockers

Warnings

Suggestions

Uh oh!

jyaunches commented May 15, 2026

Uh oh!

hunglp6d commented May 15, 2026

Uh oh!

hunglp6d commented May 15, 2026

Uh oh!

Uh oh!

hunglp6d commented May 14, 2026 •

edited

Loading

coderabbitai Bot commented May 14, 2026 •

edited

Loading

github-actions Bot commented May 14, 2026 •

edited

Loading