Skip to content

fix(backup): gracefully skip session files deleted during backup create#67472

Open
zhouhe-xydt wants to merge 1 commit intoopenclaw:mainfrom
zhouhe-xydt:fix/67417-backup-enoent-race
Open

fix(backup): gracefully skip session files deleted during backup create#67472
zhouhe-xydt wants to merge 1 commit intoopenclaw:mainfrom
zhouhe-xydt:fix/67417-backup-enoent-race

Conversation

@zhouhe-xydt
Copy link
Copy Markdown
Contributor

Summary

openclaw backup create could fail with ENOENT when the gateway's session compaction deleted a session file between enumeration and tar archiving. This change replaces tar.c with the underlying tar.Pack API so that ENOENT errors during archiving are caught and skipped rather than failing the entire backup. Skipped files are reported in the backup summary with the reason "cleaned up during backup".

Changes

  • src/infra/backup-create.ts: add createTarArchiveWithMissingSkip helper using tar.Pack with a custom error handler that ignores ENOENT and records skipped paths; replace the tar.c call in createBackupArchive
  • src/commands/backup.test-support.ts: update tar mock to include Pack so tests continue to work with the new implementation

Related issue

Fixes #67417

`openclaw backup create` could fail with ENOENT when the gateway's
session compaction deleted a session file between enumeration and
tar archiving. Replace `tar.c` with `tar.Pack` so that ENOENT errors
during archiving are caught and skipped rather than failing the
entire backup. Skipped files are reported in the backup summary.

Fixes openclaw#67417
@openclaw-barnacle openclaw-barnacle Bot added commands Command implementations size: S labels Apr 16, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ca60d00251

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines 397 to +399
await fs.writeFile(manifestPath, `${JSON.stringify(manifest, null, 2)}\n`, "utf8");

await tar.c(
const skippedPaths = await createTarArchiveWithMissingSkip(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Regenerate manifest after archive-time path skips

createBackupArchive writes manifest.json before createTarArchiveWithMissingSkip discovers ENOENT paths and appends them to result.skipped. If an entire planned asset path disappears between planning and packing (for example, an external workspace dir), the archive omits that payload but manifest.assets still declares it present, which later fails verification in verifyManifestAgainstEntries (src/commands/backup-verify.ts:236-247) and leaves non-verified backups internally inconsistent. Rebuild or rewrite the manifest after applying skippedPaths (or drop missing assets from assets) before publishing.

Useful? React with 👍 / 👎.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 16, 2026

Greptile Summary

This PR fixes a race condition in openclaw backup create where session files deleted by gateway compaction between enumeration and archiving caused the entire backup to fail with ENOENT. The fix replaces tar.c with tar.Pack and a custom error handler that catches ENOENT/ENOTDIR, records the skipped paths, and lets the archive continue.

Three minor observations:

  • The file option passed to tar.Pack is stored but never used to open a WriteStream by Pack itself (only tar.c/createFile does that); the write is driven entirely by the explicit pack.pipe(createWriteStream(...)). Harmless but misleading.
  • The manifest is serialised before archiving, so ENOENT-skipped entries are never captured inside the in-archive manifest (they do appear in the CLI summary).
  • MockPack.end() always propagates rejections to stream.emit("error"), including ENOENT errors the pack listener intentionally swallows, making it impossible to write a passing regression test for the new skip path.

Confidence Score: 5/5

Safe to merge; all findings are P2 style or test-quality observations that don't affect production correctness.

The production fix is logically sound — tar.Pack emits file-level errors without erroring the write stream, so the Promise correctly resolves via stream.on('close') even when ENOENT entries are skipped. No P0 or P1 issues were found. The three noted issues are style (redundant file option), documentation (in-archive manifest gap), and test quality (mock can't exercise the new path), none of which affect runtime correctness.

src/commands/backup.test-support.ts — MockPack.end() propagates all errors to the stream, blocking future regression tests for the ENOENT-skip path.

Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/commands/backup.test-support.ts
Line: 41-47

Comment:
**Mock propagates ENOENT to stream, preventing testing of skip behavior**

In `MockPack.end()`, any `tarCreateMock` rejection always reaches `this.stream?.emit("error", err)` — even for ENOENT errors that the pack error listener (line 155–161 in `backup-create.ts`) intentionally swallows by returning early. In production, a file-level ENOENT emitted on the Pack doesn't error the write stream; the stream keeps going and eventually closes. In the mock, that same ENOENT would still fire `stream.on("error", reject)` on line 154, causing the Promise to reject instead of resolving with the populated `skipped` array.

This means no regression test can be written for the primary fix using the current mock — any test that simulates a missing file via `tarCreateMock.mockRejectedValue({ code: "ENOENT", path: "/..." })` would see the operation fail rather than succeed with a skipped entry. The mock needs to distinguish pack-level file errors (which don't propagate to the write stream in production) from genuine stream errors.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/infra/backup-create.ts
Line: 142-150

Comment:
**`file` option is unused by `tar.Pack` — only `tar.c`/`createFile` opens the stream**

`tar.Pack` stores `file` in its options object but never opens a WriteStream for it; that behaviour belongs exclusively to the higher-level `tar.c` / `createFile` helpers. The actual output here is driven entirely by `pack.pipe(createWriteStream(params.file))` two lines below. Passing `file` to the Pack constructor is harmless but misleading — it looks like it controls where the archive is written (as it does in `tar.c`), but it doesn't.

```suggestion
  const pack = new tar.Pack({
    gzip: params.gzip,
    portable: params.portable,
    preservePaths: params.preservePaths,
    onWriteEntry: params.onWriteEntry,
  });
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/infra/backup-create.ts
Line: 397-414

Comment:
**Manifest written before archiving won't reflect ENOENT-skipped files**

The manifest is serialised and written to `manifestPath` before `createTarArchiveWithMissingSkip` runs. ENOENT-skipped entries are appended to `result.skipped` only after archiving completes, so the manifest embedded in the archive will never list them. Any restore tool (or human) inspecting the in-archive manifest would see the sessions asset as fully present even when individual files were silently omitted.

This is a trade-off inherent in writing the manifest first so it can be included in the archive, and the CLI summary output does report the skipped files correctly. Worth an inline comment explaining the gap so future maintainers don't assume the in-archive manifest is a complete record of what was actually archived vs. skipped.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(backup): gracefully skip session fil..." | Re-trigger Greptile

Comment on lines +41 to +47
} catch (err) {
for (const listener of this.errorListeners) {
listener(err);
}
this.stream?.emit("error" as never, err);
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Mock propagates ENOENT to stream, preventing testing of skip behavior

In MockPack.end(), any tarCreateMock rejection always reaches this.stream?.emit("error", err) — even for ENOENT errors that the pack error listener (line 155–161 in backup-create.ts) intentionally swallows by returning early. In production, a file-level ENOENT emitted on the Pack doesn't error the write stream; the stream keeps going and eventually closes. In the mock, that same ENOENT would still fire stream.on("error", reject) on line 154, causing the Promise to reject instead of resolving with the populated skipped array.

This means no regression test can be written for the primary fix using the current mock — any test that simulates a missing file via tarCreateMock.mockRejectedValue({ code: "ENOENT", path: "/..." }) would see the operation fail rather than succeed with a skipped entry. The mock needs to distinguish pack-level file errors (which don't propagate to the write stream in production) from genuine stream errors.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/commands/backup.test-support.ts
Line: 41-47

Comment:
**Mock propagates ENOENT to stream, preventing testing of skip behavior**

In `MockPack.end()`, any `tarCreateMock` rejection always reaches `this.stream?.emit("error", err)` — even for ENOENT errors that the pack error listener (line 155–161 in `backup-create.ts`) intentionally swallows by returning early. In production, a file-level ENOENT emitted on the Pack doesn't error the write stream; the stream keeps going and eventually closes. In the mock, that same ENOENT would still fire `stream.on("error", reject)` on line 154, causing the Promise to reject instead of resolving with the populated `skipped` array.

This means no regression test can be written for the primary fix using the current mock — any test that simulates a missing file via `tarCreateMock.mockRejectedValue({ code: "ENOENT", path: "/..." })` would see the operation fail rather than succeed with a skipped entry. The mock needs to distinguish pack-level file errors (which don't propagate to the write stream in production) from genuine stream errors.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +142 to +150
const pack = new tar.Pack({
file: params.file,
gzip: params.gzip,
portable: params.portable,
preservePaths: params.preservePaths,
onWriteEntry: params.onWriteEntry,
});
const stream = createWriteStream(params.file);
pack.pipe(stream);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 file option is unused by tar.Pack — only tar.c/createFile opens the stream

tar.Pack stores file in its options object but never opens a WriteStream for it; that behaviour belongs exclusively to the higher-level tar.c / createFile helpers. The actual output here is driven entirely by pack.pipe(createWriteStream(params.file)) two lines below. Passing file to the Pack constructor is harmless but misleading — it looks like it controls where the archive is written (as it does in tar.c), but it doesn't.

Suggested change
const pack = new tar.Pack({
file: params.file,
gzip: params.gzip,
portable: params.portable,
preservePaths: params.preservePaths,
onWriteEntry: params.onWriteEntry,
});
const stream = createWriteStream(params.file);
pack.pipe(stream);
const pack = new tar.Pack({
gzip: params.gzip,
portable: params.portable,
preservePaths: params.preservePaths,
onWriteEntry: params.onWriteEntry,
});
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/infra/backup-create.ts
Line: 142-150

Comment:
**`file` option is unused by `tar.Pack` — only `tar.c`/`createFile` opens the stream**

`tar.Pack` stores `file` in its options object but never opens a WriteStream for it; that behaviour belongs exclusively to the higher-level `tar.c` / `createFile` helpers. The actual output here is driven entirely by `pack.pipe(createWriteStream(params.file))` two lines below. Passing `file` to the Pack constructor is harmless but misleading — it looks like it controls where the archive is written (as it does in `tar.c`), but it doesn't.

```suggestion
  const pack = new tar.Pack({
    gzip: params.gzip,
    portable: params.portable,
    preservePaths: params.preservePaths,
    onWriteEntry: params.onWriteEntry,
  });
```

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 397 to 414
await fs.writeFile(manifestPath, `${JSON.stringify(manifest, null, 2)}\n`, "utf8");

await tar.c(
const skippedPaths = await createTarArchiveWithMissingSkip(
{
file: tempArchivePath,
gzip: true,
portable: true,
preservePaths: true,
onWriteEntry: (entry) => {
entry.path = remapArchiveEntryPath({
entryPath: entry.path,
(entry as { path: string }).path = remapArchiveEntryPath({
entryPath: (entry as { path: string }).path,
manifestPath,
archiveRoot,
});
},
},
[manifestPath, ...result.assets.map((asset) => asset.sourcePath)],
);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Manifest written before archiving won't reflect ENOENT-skipped files

The manifest is serialised and written to manifestPath before createTarArchiveWithMissingSkip runs. ENOENT-skipped entries are appended to result.skipped only after archiving completes, so the manifest embedded in the archive will never list them. Any restore tool (or human) inspecting the in-archive manifest would see the sessions asset as fully present even when individual files were silently omitted.

This is a trade-off inherent in writing the manifest first so it can be included in the archive, and the CLI summary output does report the skipped files correctly. Worth an inline comment explaining the gap so future maintainers don't assume the in-archive manifest is a complete record of what was actually archived vs. skipped.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/infra/backup-create.ts
Line: 397-414

Comment:
**Manifest written before archiving won't reflect ENOENT-skipped files**

The manifest is serialised and written to `manifestPath` before `createTarArchiveWithMissingSkip` runs. ENOENT-skipped entries are appended to `result.skipped` only after archiving completes, so the manifest embedded in the archive will never list them. Any restore tool (or human) inspecting the in-archive manifest would see the sessions asset as fully present even when individual files were silently omitted.

This is a trade-off inherent in writing the manifest first so it can be included in the archive, and the CLI summary output does report the skipped files correctly. Worth an inline comment explaining the gap so future maintainers don't assume the in-archive manifest is a complete record of what was actually archived vs. skipped.

How can I resolve this? If you propose a fix, please make it concise.

@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented Apr 27, 2026

Codex review: found issues before merge.

What this changes:

The PR replaces backup creation's tar.c archiving call with a tar.Pack helper that records missing paths as skipped and adds a Pack mock to backup test support.

Maintainer follow-up before merge:

This is an open contributor PR with real correctness findings and an overlapping open backup-race PR; maintainers should choose whether to revise this Pack-based approach, fold in volatile filtering, or supersede it with #72251 before an automated repair lane attempts code changes.

Security review:

Security review cleared: Security review cleared: the diff changes local backup archiving code and a Vitest mock only, with no new dependency, workflow, package script, permission, publishing, secret-handling, or downloaded-artifact surface.

Review findings:

  • [P1] Complete skipped Pack jobs before resolving — src/infra/backup-create.ts:155-160
  • [P2] Keep skipped assets out of the manifest — src/infra/backup-create.ts:415-422
  • [P3] Separate Pack errors from stream errors in the mock — src/commands/backup.test-support.ts:37-40
Review details

Best possible solution:

Revise or replace the patch with a backup archiving path that either avoids feeding known-volatile session/log files to tar or handles archiver errors in a way that actually drains/completes the tar job. The shipped behavior should keep result, manifest, and --verify semantics consistent, add a focused disappearing-session regression test, include the user-facing changelog entry, and reconcile the remaining volatile-file work with #72251.

Do we have a high-confidence way to reproduce the issue?

Yes. #67417 gives a concrete reproduction path: include live session transcripts in a backup plan, delete one before tar archives it, and current main can fail through tar.c. For this PR's patch, the dependency-level reproduction is to emit ENOENT from tar.Pack during stat/open and observe that returning from the Pack error listener does not complete the queued Pack job.

Is this the best way to solve the issue?

No. tar.Pack may be a useful lower-level seam, but this implementation is not the best solution because it swallows Pack errors without completing jobs and appends skipped paths after the archive manifest has already been written. A safer solution should filter or retry volatile paths while preserving manifest and verification integrity.

Full review comments:

  • [P1] Complete skipped Pack jobs before resolving — src/infra/backup-create.ts:155-160
    When node-tar fails a Pack stat/open/read with ENOENT, Pack emits error but does not remove the queued PackJob; JOBDONE only runs on entry end. Returning from this handler leaves the queue non-empty and the write stream may never close, so backup create can hang instead of succeeding with a skipped file.
    Confidence: 0.88
  • [P2] Keep skipped assets out of the manifest — src/infra/backup-create.ts:415-422
    The manifest is written before archive-time skips are discovered, then result.skipped is updated after archiving. If an entire planned asset is skipped, --verify can fail the archive it just created because manifest.assets still declares missing payload.
    Confidence: 0.78
  • [P3] Separate Pack errors from stream errors in the mock — src/commands/backup.test-support.ts:37-40
    MockPack invokes Pack error listeners and then emits the same error on the output stream. That makes an ENOENT skip reject through stream.on("error") even though the production helper relies on Pack-level file errors being handled separately.
    Confidence: 0.82
  • [P3] Add the required changelog entry — src/infra/backup-create.ts:399
    This is a user-facing backup fix, but the diff only changes source and test support. OpenClaw policy requires a CHANGELOG.md entry for fix changes so the shipped behavior is visible in release notes.
    Confidence: 0.9

Overall correctness: patch is incorrect
Overall confidence: 0.88

Acceptance criteria:

  • pnpm test src/commands/backup.test.ts src/commands/backup.atomic.test.ts src/commands/backup-verify.test.ts src/infra/backup-create.test.ts
  • pnpm exec oxfmt --check --threads=1 src/infra/backup-create.ts src/commands/backup.test-support.ts CHANGELOG.md

What I checked:

  • Current main still uses tar.c: createBackupArchive on current main writes manifest.json, loads tar, and calls tar.c(...) with the manifest and planned asset roots; there is no archive-time ENOENT/ENOTDIR skip path on main. (src/infra/backup-create.ts:371, 68c010906a94)
  • Patch swallows Pack missing-file errors: The PR branch records isNotFoundPathError(err) in skipped and returns from the Pack error handler, while waiting for the output stream close event to resolve. (src/infra/backup-create.ts:154, ca60d0025104)
  • node-tar Pack job completion contract: In tar 7.5.13, Pack stat/readdir/WriteEntry failures emit error; Pack jobs are removed only by JOBDONE on entry end, so handling the error externally does not by itself drain the Pack queue.
  • Manifest verification requires declared payloads: backup verify checks every manifest.assets entry for an exact or nested archive payload and throws when one is missing; the tests cover missing manifest payload as invalid. (src/commands/backup-verify.ts:235, 68c010906a94)
  • Related overlapping PR is still open: fix(backup): retry on tar EOF race and skip known volatile files #72251 is an open backup PR that addresses related live-backup races with volatile-path filtering and retry behavior, and its author explicitly noted overlap with this PR in the discussion.

Likely related people:

  • shichangs: Authored the merged local backup CLI feature that added openclaw backup create, backup verify, archive manifests, and the central backup files now being changed. (role: introduced behavior; confidence: high; commits: 0ecfd37b4465; files: src/infra/backup-create.ts, src/commands/backup-verify.ts, docs/cli/backup.md)
  • steipete: Recent backup-area history includes backup test helper sharing and backup verify refactors under Peter Steinberger's commits, making this a likely maintainer routing point for the current implementation and tests. (role: recent maintainer; confidence: high; commits: 93801281938b, f97b1fa0c3a3; files: src/commands/backup.test-support.ts, src/commands/backup.test.ts, src/commands/backup-verify.ts)
  • Gustavo Madeira Santana: Worked on backup verification/path safety and helper extraction shortly after the backup CLI landed, including the manifest verification area that this PR can make inconsistent. (role: adjacent owner; confidence: medium; commits: 09acbe65289d, 3ba64916599b; files: src/commands/backup-verify.ts, src/commands/backup-shared.ts, src/infra/backup-create.ts)
  • Vincent Koc: Recently shared backup command test fixtures, which overlaps the backup.test-support.ts mock changed by this PR. (role: test maintainer; confidence: medium; commits: c9e12cbd32ee; files: src/commands/backup.test-support.ts, src/commands/backup.test.ts)

Remaining risk / open question:

  • The final approach should be coordinated with fix(backup): retry on tar EOF race and skip known volatile files #72251 so OpenClaw does not land two divergent backup skip/retry models or duplicate summary fields.
  • The PR currently lacks a focused disappearing-file regression test and required changelog entry, so even a corrected implementation still needs merge-gate cleanup.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 68c010906a94.

@abnershang
Copy link
Copy Markdown
Contributor

Hi @zhouhe-xydt — author of #72251 here. Just spotted the overlap; we're solving adjacent problems in the same file.

#67472: ENOENT during tar archiving → swap tar.c for tar.Pack + inline skip. Clean, surgical.

#72251: Tar EOF race + proactive filter for known volatile paths. Belt-and-suspenders via pre-filter + post-retry.

I think these are complementary. Your tar.Pack approach probably obsoletes my EOF retry logic (since Pack handles the stream-level race without needing to restart tar). The volatile-file filter in #72251 remains useful as proactive exclusion — cheaper than letting tar hit the file then skip.

Proposal: once #67472 merges, I'll rebase #72251 on top, drop the retry scaffolding, keep only the volatile-filter layer. Or we can fold it all into #67472 if you prefer — happy either way. Which route works for you?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

commands Command implementations size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: openclaw backup create fails with ENOENT when session file is cleaned up during backup

2 participants