fix(rebuild): handle partial tar backup when files are root-owned by kagura-agent · Pull Request #2734 · NVIDIA/NemoClaw

kagura-agent · 2026-04-30T03:27:28Z

Summary

nemoclaw rebuild aborts when tar encounters root-owned files inside .openclaw-data. GNU tar exits with code 2 (errors encountered) but still writes all accessible files to stdout. The backup code previously treated any non-zero tar exit as a complete failure, marking all state dirs as failed and aborting the rebuild — even though the archive contained most of the data.

Changes

`src/lib/sandbox-state.ts`

Accept tar exit codes 0, 1, and 2 when stdout contains data (previously only exit 0)
After extraction, check which dirs actually landed on disk (existsSync) to distinguish per-dir success from failure — instead of marking all dirs as failed on any tar error

`src/nemoclaw.ts`

Only abort rebuild on total backup failure (zero dirs backed up)
When some dirs succeeded (partial backup), proceed with a ⚠ warning so a handful of root-owned files don't block the entire rebuild

Behavior change

Scenario	Before	After
tar exit 0 (all files OK)	✅ Rebuild proceeds	✅ Rebuild proceeds (no change)
tar exit 2, some dirs backed up	❌ Rebuild aborts	⚠️ Warning + rebuild proceeds
tar exit 2, zero dirs backed up	❌ Rebuild aborts	❌ Rebuild aborts (no change)
tar exit >2 or no stdout	❌ Rebuild aborts	❌ Rebuild aborts (no change)

How to test

Exec into a sandbox as root and create a permission-denied file:

kubectl exec <sandbox> -- sh -c 'echo "{}" > /sandbox/.openclaw-data/agents/main/sessions/sessions.json && chmod 600 /sandbox/.openclaw-data/agents/main/sessions/sessions.json'

Run nemoclaw <name> rebuild --yes
Before: rebuild aborts with "Failed to back up sandbox state"
After: rebuild proceeds with a partial-backup warning

Summary by CodeRabbit

Bug Fixes
- Salvage archives even when the backup tool exits non‑zero but produced data; attempts partial extraction.
- Extraction now verifies actual extracted dirs: only verified dirs are recorded as backed up; read‑error or missing dirs are excluded and reported.
- Rebuild no longer aborts on partial backup failure — continues with warnings listing counts and names of failed dirs/files.
- Restores prefer manifest-recorded backed-up directories to restore only verified data.
Tests
- Added tests covering partial/failed backup and restore semantics and unsafe/invalid manifest cases.

Signed-off-by: Aaron Erickson aerickson@nvidia.com

copy-pr-bot · 2026-04-30T03:27:31Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-04-30T03:27:40Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

SSH+tar backups now accept GNU tar exit codes 0/1/2 when archive bytes are produced, attempt partial extraction, attribute read errors to specific state directories, and persist verified backedUpDirs in the manifest. Restore prefers backedUpDirs. Rebuild only aborts if nothing was backed up; partial backups warn and continue.

Changes

Backup extraction, manifest, and rebuild flow

Layer / File(s)	Summary
Data shape / Manifest `src/lib/sandbox-state.ts`	`RebuildManifest` gains optional `backedUpDirs?: string[]`; `manifest.backedUpDirs` is written after backup.
Validation / Utilities `src/lib/sandbox-state.ts`	Adds `isSafeStateDirPath`, `isStateDirArray`, `isBackedUpDirArray`, and `existingBackupDirs` to validate paths and list existing dirs under a backup.
SSH+tar handling `src/lib/sandbox-state.ts`	Treats tar exit codes 0,1,2 with non-empty stdout as eligible for extraction (`tarExitedWithData`) and logs an “attempting partial extraction” message for non-zero exits with data.
Extraction / Attribution `src/lib/sandbox-state.ts`	Runs `safeTarExtract` on received archive bytes; computes `extractedDirs` from disk; uses `failedDirsFromTarStderr(stderr, existingDirs)` to attribute read failures to specific dirs; for non-attributed partial results, missing dirs are marked failed (or all failed if no attribution).
Restore wiring `src/lib/sandbox-state.ts`	`restoreSandboxState` selects restorable dirs from `manifest.backedUpDirs ?? manifest.stateDirs` and filters by `existingBackupDirs(backupPath, ...)` before restoring.
Rebuild orchestration / logging `src/nemoclaw.ts`	Abort-on-backup-failure changed: only abort if backup failed and no backed-up dirs/files exist; otherwise log warnings listing backed/failed dirs/files and continue; unchanged full-success path preserved.
Tests / Fixtures `test/snapshot.test.ts`	New helpers `writeOpenClawRegistry`, `writeFakeOpenshell`; expanded `listBackups` tests for unsafe or non-subset `backedUpDirs`; new `sandbox directory backup semantics` tests simulate tar success, tar-per-dir failures, and non-attributed partial extractions to verify manifest.backedUpDirs, failedDirs attribution, and restore behavior.

Sequence Diagram

sequenceDiagram
  participant CLI
  participant Sandbox as "Sandbox (SSH+tar)"
  participant Extractor as "Local extractor\n(safeTarExtract + FS)"
  participant Manifest as "Manifest store\n(write/read)"

  CLI->>Sandbox: run tar -cf - -C /sandbox/.openclaw-data <stateDirs>
  Sandbox-->>CLI: stdout=archive bytes, stderr=tar errors, exit=0|1|2
  CLI->>Extractor: pass archive bytes to safeTarExtract
  Extractor->>Extractor: extract files -> compute extractedDirs (FS)
  Extractor->>Extractor: parse tar stderr -> attributedFailedDirs
  Extractor->>Manifest: compute backedUpDirs and failedDirs
  Manifest-->>Extractor: persist manifest.backedUpDirs
  CLI->>Manifest: on restore, read manifest.backedUpDirs ?? stateDirs
  CLI->>Extractor: restore only existing backedUpDirs from backupPath

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I munched through tar crumbs, whiskers full of bits,

Some burrows were blocked, some treasures still fit.
I flagged the dark tunnels, kept the bright in store,
Hopped on with a warning, and backed up a few more.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 11.76% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title 'fix(rebuild): handle partial tar backup when files are root-owned' accurately describes the main change: handling partial tar backups when encountering root-owned files.
Linked Issues check	✅ Passed	The PR comprehensively addresses all objectives from issue `#2727`: treats tar exits 1/2 as partial success when stdout has data, marks per-directory results via readdirSync verification, allows rebuild to continue with warnings on partial backups, and only aborts on total backup failure.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to fixing the partial tar backup handling: modifications to sandbox-state.ts backup logic, nemoclaw.ts rebuild abort conditions, RebuildManifest interface, and corresponding test coverage for the new behavior.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/sandbox-state.ts`:
- Around line 766-776: The check that treats a directory as successfully backed
up if existsSync(path.join(backupPath, d)) is too weak because tar can create an
empty directory header; update the logic in the loop that populates
backedUpDirs/failedDirs (references: existingDirs, backedUpDirs, failedDirs) to
verify actual archived content under that directory instead of just directory
existence—e.g., inspect the extracted directory tree at path.join(backupPath, d)
(using readdir/traverse) and only mark as backed up if it contains at least one
regular file or non-empty subtree; otherwise log and push into failedDirs so
sandboxRebuild() behaves correctly.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4349f3fa-12ab-48d2-96f1-f471fce921d6

📥 Commits

Reviewing files that changed from the base of the PR and between 16794d5 and 9afa0f2.

📒 Files selected for processing (2)

src/lib/sandbox-state.ts
src/nemoclaw.ts

coderabbitai

♻️ Duplicate comments (1)

src/lib/sandbox-state.ts (1)

766-775: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

readdirSync(...).length > 0 is still the wrong success signal here.

Line 774 only proves the extracted top-level directory has an entry, not that any restorable file data survived. tar can still leave you with nested empty directory headers and make this path look successful, while a legitimately empty state dir now gets marked failed. That means backedUpDirs.length can still be wrong in both directions, which is risky because src/nemoclaw.ts:3239-3260 uses it to decide whether rebuild aborts or proceeds.

Please switch this to a stronger check: recursively confirm at least one extracted file exists, and separately treat source dirs that were known-empty before backup as successful.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 72879221-de48-465e-a1d5-dd23b0fd231d

📥 Commits

Reviewing files that changed from the base of the PR and between 9afa0f2 and 5f15266.

📒 Files selected for processing (1)

src/lib/sandbox-state.ts

wscurran · 2026-04-30T14:39:27Z

✨ Thanks for submitting this PR that fixes a bug where nemoclaw rebuild aborts when encountering root-owned files. This proposes a change to the backup code to handle partial tar backups and proceed with a warning when some dirs succeed.

Related open issues:

#2727 nemoclaw rebuild aborts when files in .openclaw-data are root-owned

kagura-agent · 2026-05-04T08:15:07Z

Re: CodeRabbit's concern about checking actual archived content vs directory existence —

The current implementation already handles this. Lines 773-778 use readdirSync(dirPath).length > 0 rather than existsSync() to verify that the directory contains actual extracted files, not just an empty tar header:

let hasContent = false;
try {
  hasContent = readdirSync(dirPath).length > 0;
} catch {
  /* dir doesn't exist at all */
}
if (hasContent) {
  backedUpDirs.push(d);
} else {
  _log(`Dir ${d} empty or missing from backup after extraction — marking failed`);
  failedDirs.push(d);
}

This correctly distinguishes between an empty directory header (tar exit 2 with permission-denied children) and a directory with actual backed-up content.

…IDIA#2727) GNU tar exits with code 2 when it encounters permission-denied errors, but still writes all accessible files to stdout. The backup code previously treated any non-zero tar exit as a complete failure, marking all state dirs as failed and aborting the rebuild. Changes: - sandbox-state.ts: Accept tar exit codes 0, 1, and 2 when stdout contains data. After extraction, check which dirs actually landed on disk to distinguish per-dir success from failure. - nemoclaw.ts: Only abort rebuild on total backup failure (zero dirs backed up). When some dirs succeeded, proceed with a warning so a handful of root-owned files don't block the entire rebuild.

Address CodeRabbit feedback: tar can create empty directory headers even when all files inside were permission-denied. Use readdirSync to confirm at least one entry exists before marking a dir as backed up.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

src/lib/sandbox-state.ts (1)
1032-1048: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Verify extracted directories before marking them as backed up.

This branch marks every non-attributed directory as successful, but nothing checks that the directory actually landed under backupPath after extraction. If tar aborts before later args are written, or stderr attribution misses a path, manifest.backedUpDirs can contain false positives and rebuild will proceed on an incomplete backup.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/sandbox-state.ts` around lines 1032 - 1048, The current branch marks
any directory in existingDirs that isn't listed in tarFailedDirs as backed up
without confirming it actually exists under backupPath; change the logic in the
block that handles tarFailedDirs (around failedDirsFromTarStderr, existingDirs,
backedUpDirs, and backupPath) to verify each candidate directory was actually
extracted to backupPath (e.g., check filesystem presence or expected marker)
before pushing it into backedUpDirs/manifest.backedUpDirs, and if the directory
is missing treat it as failed (push to failedDirs and log an error), preserving
the existing tar-failure handling and logs.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/sandbox-state.ts`:
- Around line 167-168: The manifest validation currently only checks
backedUpDirs with isStringArray and then later uses it in the tar restore, which
allows path traversal; update validation around backedUpDirs to mirror
stateFiles checks: normalize each entry (using path.normalize/posix), reject
absolute paths and any entry that includes ".." or resolves outside the expected
base, and then intersect the validated backedUpDirs with stateDirs (or the
normalized list used for stateDirs) so only known state directories are allowed
as restore sources; apply the same stricter validation at both locations
referenced (the isStringArray check at backedUpDirs and the separate check
around lines 1148-1149) and ensure any failure causes manifest rejection rather
than trusting the input for the tar command.

---

Duplicate comments:
In `@src/lib/sandbox-state.ts`:
- Around line 1032-1048: The current branch marks any directory in existingDirs
that isn't listed in tarFailedDirs as backed up without confirming it actually
exists under backupPath; change the logic in the block that handles
tarFailedDirs (around failedDirsFromTarStderr, existingDirs, backedUpDirs, and
backupPath) to verify each candidate directory was actually extracted to
backupPath (e.g., check filesystem presence or expected marker) before pushing
it into backedUpDirs/manifest.backedUpDirs, and if the directory is missing
treat it as failed (push to failedDirs and log an error), preserving the
existing tar-failure handling and logs.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 54f3c417-e608-4379-84bb-77f341d1bc4a

📥 Commits

Reviewing files that changed from the base of the PR and between a333331 and 9cb76e7.

📒 Files selected for processing (3)

src/lib/sandbox-state.ts
src/nemoclaw.ts
test/snapshot.test.ts

🚧 Files skipped from review as they are similar to previous changes (1)

src/nemoclaw.ts

# Conflicts: # src/nemoclaw.ts

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/sandbox-state.ts`:
- Around line 601-608: The helper existingBackupDirs currently uses existsSync
and can return files or broken symlinks; change it to call lstatSync on
path.join(backupPath, dirName) and only push dirName if stats.isDirectory() is
true, wrapping the lstatSync call in a try/catch to ignore non-existent/broken
entries or errors. Update function existingBackupDirs to perform this
directory-only check so restoreSandboxState and downstream localDirs only see
real directories.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a348755c-a89f-490a-84eb-cf1961b75427

📥 Commits

Reviewing files that changed from the base of the PR and between 9cb76e7 and d97d986.

📒 Files selected for processing (2)

src/lib/sandbox-state.ts
test/snapshot.test.ts

ericksoa

Approved. I reviewed the current head ea9f322 and the focused fix now directly addresses #2727: partial tar output is preserved without treating clean empty dirs as failures, failed dirs are excluded from restore, manifest-backed restorable dirs are validated, and restore only uses real extracted directories. Local build/typecheck/focused tests pass, and the targeted nightly-e2e run for snapshot-commands-e2e and rebuild-openclaw-e2e is green.

Replays blueprint policy additions application and strict Ollama chat-completions tool-call validation on current main. Drops the rebuild backup portion because NVIDIA#2734 already covers NVIDIA#2727.

coderabbitai Bot reviewed Apr 30, 2026

View reviewed changes

Comment thread src/lib/sandbox-state.ts Outdated

coderabbitai Bot reviewed Apr 30, 2026

View reviewed changes

wscurran added bug Something fails against expected or documented behavior NemoClaw CLI dependencies Pull requests that update a dependency file labels Apr 30, 2026

wscurran removed the dependencies Pull requests that update a dependency file label Apr 30, 2026

kagura-agent added 2 commits May 4, 2026 20:24

fix: verify backed-up dirs contain files, not just empty headers

a333331

Address CodeRabbit feedback: tar can create empty directory headers even when all files inside were permission-denied. Use readdirSync to confirm at least one entry exists before marking a dir as backed up.

kagura-agent force-pushed the fix/rebuild-partial-tar-2727 branch from 5f15266 to a333331 Compare May 4, 2026 12:26

fix(rebuild): track restorable partial tar backups

9cb76e7

coderabbitai Bot reviewed May 4, 2026

View reviewed changes

Comment thread src/lib/sandbox-state.ts Outdated

ericksoa added 2 commits May 4, 2026 15:16

fix(rebuild): validate restorable backup dirs

d97d986

Merge remote-tracking branch 'origin/main' into HEAD

b10b902

# Conflicts: # src/nemoclaw.ts

coderabbitai Bot reviewed May 4, 2026

View reviewed changes

Comment thread src/lib/sandbox-state.ts

fix(rebuild): require extracted backup dirs

ea9f322

ericksoa approved these changes May 4, 2026

View reviewed changes

ericksoa enabled auto-merge (squash) May 4, 2026 23:04

ericksoa merged commit 75ee317 into NVIDIA:main May 4, 2026
45 of 48 checks passed

wscurran added area: cli Command line interface, flags, terminal UX, or output bug-fix PR fixes a bug or regression and removed NemoClaw CLI bug Something fails against expected or documented behavior labels Jun 3, 2026

Conversation

kagura-agent commented Apr 30, 2026 • edited by ericksoa Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

src/lib/sandbox-state.ts

src/nemoclaw.ts

Behavior change

How to test

Summary by CodeRabbit

Uh oh!

copy-pr-bot Bot commented Apr 30, 2026

Uh oh!

coderabbitai Bot commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

wscurran commented Apr 30, 2026

Uh oh!

kagura-agent commented May 4, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ericksoa left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

kagura-agent commented Apr 30, 2026 •

edited by ericksoa

Loading

`src/lib/sandbox-state.ts`

`src/nemoclaw.ts`

coderabbitai Bot commented Apr 30, 2026 •

edited

Loading