fix(rebuild): handle partial tar backup when files are root-owned#2734
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughSSH+tar backups now accept GNU tar exit codes 0/1/2 when archive bytes are produced, attempt partial extraction, attribute read errors to specific state directories, and persist verified ChangesBackup extraction, manifest, and rebuild flow
Sequence DiagramsequenceDiagram
participant CLI
participant Sandbox as "Sandbox (SSH+tar)"
participant Extractor as "Local extractor\n(safeTarExtract + FS)"
participant Manifest as "Manifest store\n(write/read)"
CLI->>Sandbox: run tar -cf - -C /sandbox/.openclaw-data <stateDirs>
Sandbox-->>CLI: stdout=archive bytes, stderr=tar errors, exit=0|1|2
CLI->>Extractor: pass archive bytes to safeTarExtract
Extractor->>Extractor: extract files -> compute extractedDirs (FS)
Extractor->>Extractor: parse tar stderr -> attributedFailedDirs
Extractor->>Manifest: compute backedUpDirs and failedDirs
Manifest-->>Extractor: persist manifest.backedUpDirs
CLI->>Manifest: on restore, read manifest.backedUpDirs ?? stateDirs
CLI->>Extractor: restore only existing backedUpDirs from backupPath
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lib/sandbox-state.ts`:
- Around line 766-776: The check that treats a directory as successfully backed
up if existsSync(path.join(backupPath, d)) is too weak because tar can create an
empty directory header; update the logic in the loop that populates
backedUpDirs/failedDirs (references: existingDirs, backedUpDirs, failedDirs) to
verify actual archived content under that directory instead of just directory
existence—e.g., inspect the extracted directory tree at path.join(backupPath, d)
(using readdir/traverse) and only mark as backed up if it contains at least one
regular file or non-empty subtree; otherwise log and push into failedDirs so
sandboxRebuild() behaves correctly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 4349f3fa-12ab-48d2-96f1-f471fce921d6
📒 Files selected for processing (2)
src/lib/sandbox-state.tssrc/nemoclaw.ts
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/lib/sandbox-state.ts (1)
766-775:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
readdirSync(...).length > 0is still the wrong success signal here.Line 774 only proves the extracted top-level directory has an entry, not that any restorable file data survived.
tarcan still leave you with nested empty directory headers and make this path look successful, while a legitimately empty state dir now gets marked failed. That meansbackedUpDirs.lengthcan still be wrong in both directions, which is risky becausesrc/nemoclaw.ts:3239-3260uses it to decide whether rebuild aborts or proceeds.Please switch this to a stronger check: recursively confirm at least one extracted file exists, and separately treat source dirs that were known-empty before backup as successful.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 72879221-de48-465e-a1d5-dd23b0fd231d
📒 Files selected for processing (1)
src/lib/sandbox-state.ts
|
✨ Thanks for submitting this PR that fixes a bug where Related open issues: |
|
Re: CodeRabbit's concern about checking actual archived content vs directory existence — The current implementation already handles this. Lines 773-778 use let hasContent = false;
try {
hasContent = readdirSync(dirPath).length > 0;
} catch {
/* dir doesn't exist at all */
}
if (hasContent) {
backedUpDirs.push(d);
} else {
_log(`Dir ${d} empty or missing from backup after extraction — marking failed`);
failedDirs.push(d);
}This correctly distinguishes between an empty directory header (tar exit 2 with permission-denied children) and a directory with actual backed-up content. |
…IDIA#2727) GNU tar exits with code 2 when it encounters permission-denied errors, but still writes all accessible files to stdout. The backup code previously treated any non-zero tar exit as a complete failure, marking all state dirs as failed and aborting the rebuild. Changes: - sandbox-state.ts: Accept tar exit codes 0, 1, and 2 when stdout contains data. After extraction, check which dirs actually landed on disk to distinguish per-dir success from failure. - nemoclaw.ts: Only abort rebuild on total backup failure (zero dirs backed up). When some dirs succeeded, proceed with a warning so a handful of root-owned files don't block the entire rebuild.
Address CodeRabbit feedback: tar can create empty directory headers even when all files inside were permission-denied. Use readdirSync to confirm at least one entry exists before marking a dir as backed up.
5f15266 to
a333331
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/lib/sandbox-state.ts (1)
1032-1048:⚠️ Potential issue | 🟠 Major | ⚡ Quick winVerify extracted directories before marking them as backed up.
This branch marks every non-attributed directory as successful, but nothing checks that the directory actually landed under
backupPathafter extraction. If tar aborts before later args are written, or stderr attribution misses a path,manifest.backedUpDirscan contain false positives and rebuild will proceed on an incomplete backup.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/sandbox-state.ts` around lines 1032 - 1048, The current branch marks any directory in existingDirs that isn't listed in tarFailedDirs as backed up without confirming it actually exists under backupPath; change the logic in the block that handles tarFailedDirs (around failedDirsFromTarStderr, existingDirs, backedUpDirs, and backupPath) to verify each candidate directory was actually extracted to backupPath (e.g., check filesystem presence or expected marker) before pushing it into backedUpDirs/manifest.backedUpDirs, and if the directory is missing treat it as failed (push to failedDirs and log an error), preserving the existing tar-failure handling and logs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/sandbox-state.ts`:
- Around line 167-168: The manifest validation currently only checks
backedUpDirs with isStringArray and then later uses it in the tar restore, which
allows path traversal; update validation around backedUpDirs to mirror
stateFiles checks: normalize each entry (using path.normalize/posix), reject
absolute paths and any entry that includes ".." or resolves outside the expected
base, and then intersect the validated backedUpDirs with stateDirs (or the
normalized list used for stateDirs) so only known state directories are allowed
as restore sources; apply the same stricter validation at both locations
referenced (the isStringArray check at backedUpDirs and the separate check
around lines 1148-1149) and ensure any failure causes manifest rejection rather
than trusting the input for the tar command.
---
Duplicate comments:
In `@src/lib/sandbox-state.ts`:
- Around line 1032-1048: The current branch marks any directory in existingDirs
that isn't listed in tarFailedDirs as backed up without confirming it actually
exists under backupPath; change the logic in the block that handles
tarFailedDirs (around failedDirsFromTarStderr, existingDirs, backedUpDirs, and
backupPath) to verify each candidate directory was actually extracted to
backupPath (e.g., check filesystem presence or expected marker) before pushing
it into backedUpDirs/manifest.backedUpDirs, and if the directory is missing
treat it as failed (push to failedDirs and log an error), preserving the
existing tar-failure handling and logs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 54f3c417-e608-4379-84bb-77f341d1bc4a
📒 Files selected for processing (3)
src/lib/sandbox-state.tssrc/nemoclaw.tstest/snapshot.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- src/nemoclaw.ts
# Conflicts: # src/nemoclaw.ts
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/sandbox-state.ts`:
- Around line 601-608: The helper existingBackupDirs currently uses existsSync
and can return files or broken symlinks; change it to call lstatSync on
path.join(backupPath, dirName) and only push dirName if stats.isDirectory() is
true, wrapping the lstatSync call in a try/catch to ignore non-existent/broken
entries or errors. Update function existingBackupDirs to perform this
directory-only check so restoreSandboxState and downstream localDirs only see
real directories.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: a348755c-a89f-490a-84eb-cf1961b75427
📒 Files selected for processing (2)
src/lib/sandbox-state.tstest/snapshot.test.ts
ericksoa
left a comment
There was a problem hiding this comment.
Approved. I reviewed the current head ea9f322 and the focused fix now directly addresses #2727: partial tar output is preserved without treating clean empty dirs as failures, failed dirs are excluded from restore, manifest-backed restorable dirs are validated, and restore only uses real extracted directories. Local build/typecheck/focused tests pass, and the targeted nightly-e2e run for snapshot-commands-e2e and rebuild-openclaw-e2e is green.
Replays blueprint policy additions application and strict Ollama chat-completions tool-call validation on current main. Drops the rebuild backup portion because NVIDIA#2734 already covers NVIDIA#2727.
Summary
Fixes #2727.
nemoclaw rebuildaborts whentarencounters root-owned files inside.openclaw-data. GNU tar exits with code 2 (errors encountered) but still writes all accessible files to stdout. The backup code previously treated any non-zero tar exit as a complete failure, marking all state dirs as failed and aborting the rebuild — even though the archive contained most of the data.Changes
src/lib/sandbox-state.tsexistsSync) to distinguish per-dir success from failure — instead of marking all dirs as failed on any tar errorsrc/nemoclaw.ts⚠warning so a handful of root-owned files don't block the entire rebuildBehavior change
How to test
nemoclaw <name> rebuild --yesSummary by CodeRabbit
Bug Fixes
Tests
Signed-off-by: Aaron Erickson aerickson@nvidia.com