openclaw backup create fails with "did not encounter expected EOF" on live installations #72249

@abnershang

Bug Description

openclaw backup create aborts with Error: did not encounter expected EOF when the archiver reads a file that is appended to while tar.c() is streaming it. On a live OpenClaw installation this reproduces reliably against session transcript .jsonl files, cron run .log files, and logs/*.jsonl files under the state directory, all of which are append-only on a running gateway.

Exact Error

Error: did not encounter expected EOF
    at WriteEntry.<anonymous> (.../node_modules/tar/dist/...)
    ...
Process exited with code 1

err.path points at the file being appended to mid-stream (typically a live session transcript or gateway log).

Root Cause

node-tar's WriteEntry takes the file size from the initial lstat() and uses it as the size declared in the entry header. It then streams the file contents via fs.read(). If the file grows between the lstat() and the end of the read, more bytes are available than the header declared, the entry errors with "did not encounter expected EOF", and tar.c() rejects. Because the backup is a single transaction, the whole archive is aborted; partial writes are not salvaged.
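The race can be illustrated without node-tar at all. The sketch below is a simplified model, not node-tar's actual implementation: readWithDeclaredSize and its onAfterStat hook are hypothetical names, and the hook stands in for a concurrent appender that writes between the lstat() and the read.

```javascript
// Simplified model of the stat-then-read race; NOT node-tar's real code.
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Read `file` the way a tar writer does: trust the size seen at stat time,
// then fail if the file yields more bytes than that size.
function readWithDeclaredSize(file, onAfterStat = () => {}) {
  const declared = fs.lstatSync(file).size; // size that would go in the header
  onAfterStat(); // hypothetical hook: simulates an appender racing the reader
  const fd = fs.openSync(file, "r");
  try {
    // Request one byte more than declared so that growth is detectable.
    const buf = Buffer.alloc(declared + 1);
    const bytesRead = fs.readSync(fd, buf, 0, buf.length, 0);
    if (bytesRead > declared) {
      throw Object.assign(new Error("did not encounter expected EOF"), {
        path: file,
      });
    }
    return bytesRead;
  } finally {
    fs.closeSync(fd);
  }
}

// Runs the model twice: once on a quiet file, once with an append in the gap.
function demo() {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "eof-model-"));
  const file = path.join(dir, "live.log");
  fs.writeFileSync(file, "line 1\n");
  const stable = readWithDeclaredSize(file); // no writer: sizes agree
  let raced = null;
  try {
    readWithDeclaredSize(file, () => fs.appendFileSync(file, "line 2\n"));
  } catch (err) {
    raced = err; // the file grew after it was measured
  }
  fs.rmSync(dir, { recursive: true, force: true });
  return { stable, raced };
}
```

Calling demo() returns the byte count for the quiet read and the error raised by the raced read, whose path property names the file that grew.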

This is not an OpenClaw bug in the strict sense: it's a fundamental mismatch between how tar packs files and how any live service writes logs. But every user who runs openclaw backup create against a running install will eventually hit it, so the CLI needs to handle it.

Steps to Reproduce

Generic reproducer (no OpenClaw state needed):

mkdir -p /tmp/backup-eof-repro/src
cd /tmp/backup-eof-repro

# Generate a non-trivial file that will be appended to during archiving.
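# 100 MiB, with the byte count spelled out because some BSD head builds do not accept the M suffix.
yes "$(head -c 2000 /dev/urandom | base64)" | head -c 104857600 > src/live.log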

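# Keep appending in the background:
while true; do date >> src/live.log; sleep 0.05; done &
APPENDER=$!

# Meanwhile, archive it with node-tar (requires `npm install tar` in this directory).
# System tar hits the same race but reports it differently, e.g. GNU tar's "file changed as we read it".
node -e "require('tar').c({ file: 'out.tar.gz', gzip: true }, ['src']).catch((err) => { console.error(err); process.exit(1); })"
# Observe: intermittent "did not encounter expected EOF" error.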

kill $APPENDER

In a real OpenClaw install, just run openclaw backup create against a state directory that has any active session (i.e. a running gateway). With a state tree in the tens of GB and dozens of live sessions, failure is nearly deterministic.

Environment

  • OpenClaw: observed on the 2026.4.x line
  • Node: v22–v25
  • OS: macOS and Linux both reproduce
  • Triggering files: {stateDir}/sessions/**/*.{jsonl,log}, {stateDir}/cron/runs/**/*.log, {stateDir}/logs/**/*.{jsonl,log}

Impact

  • Severity: High for users with a running gateway and a non-trivial state directory — backups can fail repeatedly until the gateway is stopped.
  • Workaround today: stop the gateway before running backup create. Not viable for scheduled/automated backups.

Expected Behavior

openclaw backup create should complete on a running install. Files known to be volatile (live logs, sockets, pid/lock markers) are not meaningful to snapshot anyway and can safely be skipped; transient races on other files should be retried.
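The skip rule for volatile files can be sketched as a plain predicate over paths. This is a minimal sketch using the patterns this issue lists (the triggering files plus the sock/pid/tmp/lock suffixes); isVolatile, partitionBackupFiles, and the prefix-matching logic are hypothetical and not OpenClaw's actual code.

```javascript
const path = require("node:path");

// Patterns copied from this issue; the matching logic is an assumption.
// Each subtree rule models "{stateDir}/<prefix>/**/*.<ext>".
const VOLATILE_SUBTREES = [
  { prefix: ["sessions"], exts: [".jsonl", ".log"] },
  { prefix: ["cron", "runs"], exts: [".log"] },
  { prefix: ["logs"], exts: [".jsonl", ".log"] },
];
// Models "*.{sock,pid,tmp,lock} anywhere".
const VOLATILE_ANYWHERE = [".sock", ".pid", ".tmp", ".lock"];

function isVolatile(stateDir, file) {
  const ext = path.extname(file);
  if (VOLATILE_ANYWHERE.includes(ext)) return true;

  const rel = path.relative(stateDir, file);
  const outside =
    rel === "" || rel === ".." || rel.startsWith(".." + path.sep) ||
    path.isAbsolute(rel);
  if (outside) return false; // subtree rules only apply inside stateDir

  const parts = rel.split(path.sep);
  return VOLATILE_SUBTREES.some(
    ({ prefix, exts }) =>
      parts.length > prefix.length &&
      prefix.every((segment, i) => parts[i] === segment) &&
      exts.includes(ext),
  );
}

// Split candidate files into those to archive and a count of skipped ones,
// so the count can be reported to the user.
function partitionBackupFiles(stateDir, files) {
  const kept = [];
  let skippedVolatile = 0;
  for (const file of files) {
    if (isVolatile(stateDir, file)) skippedVolatile += 1;
    else kept.push(file);
  }
  return { kept, skippedVolatile };
}
```

node-tar's create options include a filter callback that receives each entry path, so a predicate along these lines could plausibly be plugged in there. How the real archiver wires it up is left open here.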

Proposed Fix

  1. Default-exclude known volatile paths in the backup archiver:
    • {stateDir}/sessions/**/*.{jsonl,log}
    • {stateDir}/cron/runs/**/*.log
    • {stateDir}/logs/**/*.{jsonl,log}
    • *.{sock,pid,tmp,lock} anywhere
  2. Retry tar.c() on EOF-class errors (up to 3 attempts, 10s/20s backoff) for residual races on other files. Clean the partial temp archive between attempts.
  3. On final failure, include err.path and attempt count in the thrown message so users get an actionable report.
  4. Surface the skipped-volatile count in stdout and in --json output for observability.
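Items 2 and 3 could look roughly like the following. This is a sketch under stated assumptions: withEofRetry, isEofClassError, and the option names are hypothetical, "EOF-class" is approximated by matching the one message quoted in this issue, and createArchive stands in for whatever wraps the tar.c() call.

```javascript
// Hypothetical helper names; only the message string comes from this issue.
const EOF_MESSAGE = "did not encounter expected EOF";

function isEofClassError(err) {
  return err instanceof Error && err.message.includes(EOF_MESSAGE);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs createArchive up to delaysMs.length + 1 times (3 with the defaults),
// waiting 10s and then 20s between attempts as proposed above.
async function withEofRetry(createArchive, options = {}) {
  const {
    delaysMs = [10_000, 20_000],
    cleanup = async () => {}, // should delete the partial temp archive
    wait = sleep, // injectable so tests do not really sleep
  } = options;
  const maxAttempts = delaysMs.length + 1;

  for (let attempt = 1; ; attempt++) {
    try {
      return await createArchive(attempt);
    } catch (err) {
      await cleanup(); // never leave a partial archive behind
      if (!isEofClassError(err)) throw err; // unrelated errors are not retried
      if (attempt >= maxAttempts) {
        const where = err.path ?? "unknown path";
        throw new Error(
          `backup failed after ${attempt} attempts: ${err.message} (${where})`,
          { cause: err },
        );
      }
      await wait(delaysMs[attempt - 1]);
    }
  }
}
```

Keeping the original error as cause preserves the node-tar stack for debugging, while the wrapped message carries the file path and attempt count that item 3 asks for.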

This is distinct from #67417 (ENOENT when a session file is deleted mid-backup — same race family, different failure mode and different fix). A broader exclude-rule system is proposed in #67990; this bug asks for the minimum viable built-in filter that makes backup create work out of the box on any live install, without requiring user configuration.

A PR implementing the above is on the way.
