Skip to content

fix(test): stop swallowing error output in snapshot E2E#2350

Closed
ericksoa wants to merge 1 commit into
mainfrom
fix/snapshot-e2e-diagnostics
Closed

fix(test): stop swallowing error output in snapshot E2E#2350
ericksoa wants to merge 1 commit into
mainfrom
fix/snapshot-e2e-diagnostics

Conversation

@ericksoa

@ericksoa ericksoa commented Apr 23, 2026

Copy link
Copy Markdown
Contributor

Summary

  • The snapshot-commands-e2e nightly job has been failing consistently at Phase 3 with zero diagnostic output — the actual error from nemoclaw snapshot create is captured but never printed because set -e kills the script first
  • This PR adds a run_capture helper that captures output and exit code separately, so the error message is always visible in CI logs
  • Adds Phase 2b pre-snapshot diagnostics (registry state, sandbox list, docker containers, stale lock detection)
  • Enriches the fail() function with additional context (registry contents, lock state, docker ps, node version)

Root cause analysis

The test uses this pattern:

set -euo pipefail
SNAPSHOT_OUTPUT=$(nemoclaw "${SANDBOX_NAME}" snapshot create 2>&1)
echo "$SNAPSHOT_OUTPUT"  # never reached if the command fails

When nemoclaw exits non-zero, set -e terminates the script at the assignment line. The error output is captured in $SNAPSHOT_OUTPUT but the echo on the next line is never executed. The fail() handler is also never reached.

This has been masking the failure since at least 2026-04-22. The test will still fail at the same point, but now we'll see exactly why.

Test plan

  • Merge and trigger nightly E2E — the snapshot test will still fail but now with visible error output
  • shellcheck test/e2e/test-snapshot-commands.sh — clean

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Tests
    • Enhanced end-to-end snapshot test diagnostics with comprehensive pre-test environment logging including version information, sandbox configuration, registry status, and container state.
    • Improved error reporting and debugging for snapshot commands by capturing and displaying detailed output and exit codes in failure messages.
    • Strengthened test infrastructure with improved command execution and failure handling.

The snapshot-commands E2E test uses `$(nemoclaw ... 2>&1)` inside
`set -euo pipefail`, which causes the shell to exit immediately when
the command fails — before the captured output is ever printed. This
has been masking the actual error message for every Phase 3+ failure.

Changes:
- Add `run_capture` helper that captures both output and exit code
  without triggering set -e on non-zero exit
- Replace all bare `$(nemoclaw ... 2>&1)` calls with run_capture
- Add Phase 2b pre-snapshot diagnostics: registry state, sandbox
  list, docker containers, stale lock check
- Enrich fail() diagnostics with registry contents, lock state,
  docker ps, node version

The test will still fail at the same point, but now the actual error
message from nemoclaw will be visible in CI logs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Apr 23, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

Adds run_capture() utility function to capture command output and exit codes without premature script termination. Implements pre-snapshot diagnostics phase logging environment and local state. Updates snapshot lifecycle steps to use the new utility with explicit failure handling, and enhances failure diagnostics with additional registry and Docker information.

Changes

Cohort / File(s) Summary
Test Snapshot Enhancements
test/e2e/test-snapshot-commands.sh
Added run_capture() utility to safely capture command stdout/stderr and exit codes. Introduced pre-snapshot diagnostics phase logging NemoClaw version, openshell sandbox, registry files, config directory, and Docker container status. Updated snapshot lifecycle steps (create/list/restore/help) to use run_capture() with explicit failure handling including captured output in error messages. Enhanced fail() diagnostics with additional registry and Docker/node/nemoclaw path information.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A script that captures with care,
Diagnostics logged everywhere,
No early exit, no surprise,
Just better snapshots, test-wise!
—Yours truly, the code-testing hare 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix(test): stop swallowing error output in snapshot E2E' directly and clearly summarizes the main objective of the changeset: preventing error output from being lost in the snapshot E2E tests.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/snapshot-e2e-diagnostics

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/test-snapshot-commands.sh`:
- Around line 256-263: run_capture's exit code (_CAPTURE_RC) must be checked
before asserting HELP_OUTPUT contents: after calling run_capture (and before
grepping HELP_OUTPUT), verify that _CAPTURE_RC is 0 and call fail with a
descriptive message if not; only if _CAPTURE_RC is 0 proceed to test HELP_OUTPUT
for "snapshot create", "snapshot list", and "snapshot restore" and call pass or
fail accordingly. Use the existing symbols run_capture, _CAPTURE_RC,
HELP_OUTPUT, pass, and fail to implement the early exit on non-zero capture
return.
- Around line 58-64: The run_capture function uses eval to assign captured
output to a dynamic variable name, which can execute shell code in the output;
replace the unsafe eval "${_var_name}=\${_output}" with a safe dynamic
assignment using printf -v (e.g. printf -v "$_var_name" '%s' "$_output") so the
output is stored literally without evaluation, keeping the rest of run_capture
(variables _CAPTURE_RC, _output, and invocation) unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: f6276c4d-2ca5-442a-975d-9c3c2c54cb18

📥 Commits

Reviewing files that changed from the base of the PR and between b1afc50 and 9baf085.

📒 Files selected for processing (1)
  • test/e2e/test-snapshot-commands.sh

Comment on lines +58 to +64
run_capture() {
local _var_name="$1"; shift
_CAPTURE_RC=0
local _output
_output=$("$@" 2>&1) || _CAPTURE_RC=$?
eval "${_var_name}=\${_output}"
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Avoid eval in run_capture; it can execute captured output.

On Line 63, eval is applied to command output. If output contains shell syntax like $()/backticks, it can be executed during assignment.

Suggested fix
 run_capture() {
   local _var_name="$1"; shift
   _CAPTURE_RC=0
   local _output
   _output=$("$@" 2>&1) || _CAPTURE_RC=$?
-  eval "${_var_name}=\${_output}"
+  printf -v "$_var_name" '%s' "$_output"
 }
#!/bin/bash
set -euo pipefail

# Verify current usage in the file
rg -n -C2 'run_capture|eval' test/e2e/test-snapshot-commands.sh

# Demonstrate why eval on captured output is unsafe
bash -c 'set -euo pipefail; _output='\''$(echo "EVAL_EXECUTED" >&2)'\''; eval "x=\${_output}"; printf "x=%s\n" "$x"'

Expected: second command prints EVAL_EXECUTED, proving evaluation occurred.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-snapshot-commands.sh` around lines 58 - 64, The run_capture
function uses eval to assign captured output to a dynamic variable name, which
can execute shell code in the output; replace the unsafe eval
"${_var_name}=\${_output}" with a safe dynamic assignment using printf -v (e.g.
printf -v "$_var_name" '%s' "$_output") so the output is stored literally
without evaluation, keeping the rest of run_capture (variables _CAPTURE_RC,
_output, and invocation) unchanged.

Comment on lines +256 to 263
run_capture HELP_OUTPUT nemoclaw "${SANDBOX_NAME}" snapshot
if echo "$HELP_OUTPUT" | grep -q "snapshot create" \
&& echo "$HELP_OUTPUT" | grep -q "snapshot list" \
&& echo "$HELP_OUTPUT" | grep -q "snapshot restore"; then
pass "snapshot help shows create/list/restore"
else
fail "snapshot help incomplete: ${HELP_OUTPUT}"
fail "snapshot help incomplete (code $_CAPTURE_RC): ${HELP_OUTPUT}"
fi

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Check _CAPTURE_RC before asserting help content.

At Line 256 onward, run_capture is used but non-zero exit is not handled first. A failing command could still satisfy string checks and incorrectly pass Phase 9.

Suggested fix
 run_capture HELP_OUTPUT nemoclaw "${SANDBOX_NAME}" snapshot
+if [ "$_CAPTURE_RC" -ne 0 ]; then
+  fail "snapshot help exited with code $_CAPTURE_RC: ${HELP_OUTPUT}"
+fi
 if echo "$HELP_OUTPUT" | grep -q "snapshot create" \
   && echo "$HELP_OUTPUT" | grep -q "snapshot list" \
   && echo "$HELP_OUTPUT" | grep -q "snapshot restore"; then
   pass "snapshot help shows create/list/restore"
 else
-  fail "snapshot help incomplete (code $_CAPTURE_RC): ${HELP_OUTPUT}"
+  fail "snapshot help incomplete: ${HELP_OUTPUT}"
 fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
run_capture HELP_OUTPUT nemoclaw "${SANDBOX_NAME}" snapshot
if echo "$HELP_OUTPUT" | grep -q "snapshot create" \
&& echo "$HELP_OUTPUT" | grep -q "snapshot list" \
&& echo "$HELP_OUTPUT" | grep -q "snapshot restore"; then
pass "snapshot help shows create/list/restore"
else
fail "snapshot help incomplete: ${HELP_OUTPUT}"
fail "snapshot help incomplete (code $_CAPTURE_RC): ${HELP_OUTPUT}"
fi
run_capture HELP_OUTPUT nemoclaw "${SANDBOX_NAME}" snapshot
if [ "$_CAPTURE_RC" -ne 0 ]; then
fail "snapshot help exited with code $_CAPTURE_RC: ${HELP_OUTPUT}"
fi
if echo "$HELP_OUTPUT" | grep -q "snapshot create" \
&& echo "$HELP_OUTPUT" | grep -q "snapshot list" \
&& echo "$HELP_OUTPUT" | grep -q "snapshot restore"; then
pass "snapshot help shows create/list/restore"
else
fail "snapshot help incomplete: ${HELP_OUTPUT}"
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-snapshot-commands.sh` around lines 256 - 263, run_capture's
exit code (_CAPTURE_RC) must be checked before asserting HELP_OUTPUT contents:
after calling run_capture (and before grepping HELP_OUTPUT), verify that
_CAPTURE_RC is 0 and call fail with a descriptive message if not; only if
_CAPTURE_RC is 0 proceed to test HELP_OUTPUT for "snapshot create", "snapshot
list", and "snapshot restore" and call pass or fail accordingly. Use the
existing symbols run_capture, _CAPTURE_RC, HELP_OUTPUT, pass, and fail to
implement the early exit on non-zero capture return.

@ericksoa

Copy link
Copy Markdown
Contributor Author

Superseded by the new PR from fix/nightly-e2e-repairs which includes these diagnostics plus fixes for the remaining three E2E failures.

@ericksoa ericksoa closed this Apr 23, 2026
ericksoa added a commit that referenced this pull request Apr 23, 2026
## Summary

Fixes four nightly E2E failures that have been masking each other since
April 21–22. Supersedes #2350 (which added the diagnostics that helped
identify these).

### snapshot-commands-e2e
- **Root cause:** PR #2184 changed the CLI success output from `Snapshot
created` to `Snapshot v1 created` (version string inserted), but the
test's `grep -q "Snapshot created"` assertion was never updated
- **Previous masking:** This was hidden behind the `safeTarExtract`
symlink regression (#2268, fixed by #2308) — the snapshot never got far
enough to print the success message. Now that #2308 landed, the snapshot
succeeds but the grep fails
- **Fix:** Change grep to `"Snapshot.*created"` to match the versioned
output format
- Also includes the `run_capture` helper and Phase 2b diagnostics from
#2350 so future failures are visible

### deployment-services-e2e
- **Root cause:** `SANDBOX_NAME` and `LOG_FILE` are never defined — the
script uses `set -euo pipefail` (`-u` = nounset) and crashes immediately
at line 70
- **Fix:** Add variable definitions matching the pattern used by other
E2E tests; pass `NEMOCLAW_SANDBOX_NAME` and `NEMOCLAW_RECREATE_SANDBOX`
in the workflow

### network-policy-e2e
- **Root cause:** `setup_sandbox()` destroys the sandbox, then calls
`nemoclaw onboard` which detects the stale registry entry and fails with
"Sandbox already exists but is not ready"
- **Fix:** Add `NEMOCLAW_RECREATE_SANDBOX=1` to the onboard env so it
overwrites the stale entry

### cloud-experimental-e2e (not addressed here)
Two unrelated failures remain: Landlock regression on `/sandbox`
writability and CLI/docs command reference drift. These need separate
investigation.

## Test plan

- [ ] Trigger nightly E2E on this branch — snapshot-commands,
deployment-services, and network-policy jobs should pass
- [x] `shellcheck` clean on all three modified test scripts

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Tests**
* More reliable E2E runs via consistent sandbox naming and automatic
sandbox recreation.
* Timestamped logs, a capture wrapper that preserves command output and
exit codes.
* Expanded pre-snapshot diagnostics and richer failure logs for clearer
snapshot troubleshooting.
* **Chores**
* CI updated to disable an experimental E2E job and adjust failure
notifications to avoid false triggers.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@wscurran wscurran added the bug-fix PR fixes a bug or regression label Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug-fix PR fixes a bug or regression

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants