fix(sandbox): allow intra-sandbox symlinks in safeTarExtract audit (#2268)#2308
Conversation
…2268) Closes #2268 (P1 regression in v0.0.22). PR #2163's post-extraction symlink audit treated every target outside the extraction temp dir as an escape. The sandbox base image legitimately places intra-sandbox symlinks (e.g. /sandbox/.openclaw → /sandbox/.openclaw-data) — those point outside the host extraction dir but INSIDE the canonical /sandbox root, where they'll be correctly resolved on restore. The audit flagged them as escapes, nuked the extraction, and broke every rebuild + snapshot create on v0.0.22. Fix: auditExtractedSymlinks now accepts an array of allowed roots. safeTarExtract passes [targetDir, "/sandbox"] so intra-sandbox layout is honored while genuine escapes (symlinks to /etc/passwd, ../../.ssh, etc.) still abort. Two new tests lock in both behaviours. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThe symlink validation logic in sandbox extraction is enhanced to accept multiple allowed root paths instead of a single root. The Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~15 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
ericksoa
left a comment
There was a problem hiding this comment.
LGTM. The allowedRoots array approach is the right fix — it preserves the security guardrail for genuine escape attempts while correctly allowing intra-sandbox absolute symlinks that are legitimate in the restored sandbox filesystem. The two new tests properly cover both the regression (intra-sandbox symlinks pass) and the security invariant (escape to /etc/passwd still blocked). All 20 CI checks green.
## Summary Fixes four nightly E2E failures that have been masking each other since April 21–22. Supersedes #2350 (which added the diagnostics that helped identify these). ### snapshot-commands-e2e - **Root cause:** PR #2184 changed the CLI success output from `Snapshot created` to `Snapshot v1 created` (version string inserted), but the test's `grep -q "Snapshot created"` assertion was never updated - **Previous masking:** This was hidden behind the `safeTarExtract` symlink regression (#2268, fixed by #2308) — the snapshot never got far enough to print the success message. Now that #2308 landed, the snapshot succeeds but the grep fails - **Fix:** Change grep to `"Snapshot.*created"` to match the versioned output format - Also includes the `run_capture` helper and Phase 2b diagnostics from #2350 so future failures are visible ### deployment-services-e2e - **Root cause:** `SANDBOX_NAME` and `LOG_FILE` are never defined — the script uses `set -euo pipefail` (`-u` = nounset) and crashes immediately at line 70 - **Fix:** Add variable definitions matching the pattern used by other E2E tests; pass `NEMOCLAW_SANDBOX_NAME` and `NEMOCLAW_RECREATE_SANDBOX` in the workflow ### network-policy-e2e - **Root cause:** `setup_sandbox()` destroys the sandbox, then calls `nemoclaw onboard` which detects the stale registry entry and fails with "Sandbox already exists but is not ready" - **Fix:** Add `NEMOCLAW_RECREATE_SANDBOX=1` to the onboard env so it overwrites the stale entry ### cloud-experimental-e2e (not addressed here) Two unrelated failures remain: Landlock regression on `/sandbox` writability and CLI/docs command reference drift. These need separate investigation. ## Test plan - [ ] Trigger nightly E2E on this branch — snapshot-commands, deployment-services, and network-policy jobs should pass - [x] `shellcheck` clean on all three modified test scripts 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Tests** * More reliable E2E runs via consistent sandbox naming and automatic sandbox recreation. * Timestamped logs, a capture wrapper that preserves command output and exit codes. * Expanded pre-snapshot diagnostics and richer failure logs for clearer snapshot troubleshooting. * **Chores** * CI updated to disable an experimental E2E job and adjust failure notifications to avoid false triggers. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
) ## Summary The \`test-snapshot-commands.sh\` nightly E2E was failing despite \`nemoclaw snapshot create\` actually succeeding. The test greps for the literal string \`\"Snapshot created\"\` but the real success output includes a version token between: \`\`\` ✓ Snapshot v1 created (12 directories) \`\`\` Grep didn't match → false red in nightly despite the command succeeding. Switching to regex \`\"Snapshot v[0-9]+.*created\"\` that tolerates the version token. ## Why found now Investigating #2268 revealed this test reporting FAIL on a run where the operation had actually succeeded per the log. Pre-existing on main — predates both #2308 and any recent PR. ## Test plan - [x] Shell syntax valid (\`bash -n\`) - [ ] Nightly E2E shows \`snapshot-commands-e2e\` green after merge 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Bug Fixes** * Improved snapshot creation detection to recognize success messages that include version tokens, making snapshot-related validation more tolerant and tests more reliable. <!-- end of auto-generated comment: release notes by coderabbit.ai --> Co-authored-by: Prekshi Vyas <34834085+prekshivyas@users.noreply.github.com>
Summary
Closes #2268 — P1 regression in v0.0.22.
`safeTarExtract`'s post-extraction symlink audit (added in #2163 to block tar path-traversal attacks) flagged every symlink whose target resolved outside the extraction temp directory. The sandbox base image legitimately ships intra-sandbox symlinks like `/sandbox/.openclaw → /sandbox/.openclaw-data` — those resolve outside the host temp dir but inside the canonical `/sandbox` root, where they'll be correctly resolved when the backup is restored into a fresh sandbox.
Result: the audit nuked every backup, breaking `nemoclaw rebuild` and `nemoclaw snapshot create` on v0.0.22. The only workaround was `destroy + onboard` (loses all state).
Fix
The security guardrail is intact: `/sandbox` is a subtree root, so crafted symlinks with absolute targets outside `/sandbox` (e.g. `/etc/passwd`) are still rejected.
Test plan
🤖 Generated with Claude Code
Summary by CodeRabbit