fix(sessions): preserve corrupt-header transcripts by charles-openclaw · Pull Request #89050 · openclaw/openclaw

charles-openclaw · 2026-06-01T10:23:34Z

Summary

Recover corrupted first-line session headers in SessionManager.open() before the destructive rewrite path can run.
Preserve the original bytes in a same-mode .corrupt-* backup, then rewrite the active transcript as valid JSONL.
Keep recovered user-only transcripts through run preparation and first assistant persistence, with regression coverage for the production path.

Linked context

Real behavior proof (required for external PRs)

Behavior addressed: resuming a session whose first JSONL header line is corrupted no longer truncates persisted transcript rows; recovered user-only transcripts also survive the production SessionManager.open() -> prepareSessionManagerForRun() -> first assistant append path.
Real environment tested: local OpenClaw source checkout on the PR branch, using production SessionManager.open() and prepareSessionManagerForRun() against disposable JSONL session files under the OS temp directory.
Exact steps or command run after this patch: node --import tsx .artifacts/corrupt-session-proof.ts.
Evidence after fix: terminal output from the disposable OpenClaw runtime proof:

corrupt header + assistant resume
active lines: 3
contains user: true
contains assistant: true
backup files: 1
corrupt header + user-only production path
active lines: 3
header id: sess-runtime
contains user: true
assistant appended: true

Observed result after fix: the corrupted-header file is rewritten to a valid three-line transcript while preserving the user and assistant rows, one .corrupt-* backup is created, and the recovered user-only production path keeps the user row before appending the first assistant row.
What was not tested: no private user transcript was opened; the proof uses disposable local session files with the same corrupted-header JSONL shapes.
Before evidence: current main rewrites the corrupted-header transcript to a fresh one-line session header when SessionManager.open() receives a non-empty file whose first parseable entry is not a session header.

Tests and validation

pnpm test src/agents/sessions/session-manager.test.ts src/agents/embedded-agent-runner/session-manager-init.test.ts -- --reporter=dot passed 2 files / 6 tests.
pnpm tsgo:prod passed.
pnpm check:test-types passed.
pnpm lint --threads=8 passed.
git diff --check -- src/agents/sessions/session-manager.ts src/agents/sessions/session-manager.test.ts src/agents/embedded-agent-runner/session-manager-init.ts src/agents/embedded-agent-runner/session-manager-init.test.ts passed.
.agents/skills/autoreview/scripts/autoreview --mode local reported no accepted/actionable findings.

Risk checklist

Did user-visible behavior change? Yes, corrupt session-header resumes now fail before destructive reset instead of silently wiping the file.
Did config, environment, or migration behavior change? No.
Did security, auth, secrets, network, or tool execution behavior change? No.
Highest-risk area: preserving the existing first-run session-manager workaround. The guard only reads and validates the existing first nonblank line before the reset; valid pre-created session headers still follow the old reset path.

Current review state

Maintainer fix 30307017a8 pushed and ClawSweeper re-review requested.

clawsweeper · 2026-06-01T10:25:56Z

Codex review: needs changes before merge. Reviewed June 2, 2026, 6:57 AM ET / 10:57 UTC.

Summary
The PR recovers corrupt first-line session JSONL headers in SessionManager.open(), preserves the original bytes in a same-mode corrupt backup, and adds session-manager/run-preparation regression coverage for recovered transcripts.

PR surface: Source +103, Tests +224. Total +327 across 4 files.

Reproducibility: yes. The linked issue and PR comments provide a high-confidence source reproduction through SessionManager.open() with a corrupt first header line, and the current main code path shows that malformed first line is skipped before the empty/corrupt rewrite branch.

Review metrics: 1 noteworthy metric.

Persistent transcript artifact surface: 1 new .corrupt-*.jsonl sidecar. The added sidecar contains private transcript bytes and crosses existing transcript classifier/scanner behavior before merge.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🦞 diamond lobster
Patch quality: 🧂 unranked krab
Result: blocked by patch quality or review findings.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

Classify generated corrupt backups as recovery/archive artifacts, not primary or usage-counted transcripts.
[P2] Add focused src/config/sessions/artifacts.test.ts coverage and rerun the focused session-manager/session-manager-init/artifacts tests.

Risk before merge

[P1] Persistent .corrupt-*.jsonl backups contain private transcript bytes but currently match primary/usage-counted transcript helpers, so memory, cost, doctor, and disk-budget paths can process them as live sessions.
[P1] Because this is a P0 data-loss recovery path, merge should include artifact classifier proof in addition to the focused recovery tests.

Maintainer options:

Classify corrupt backups before merge (recommended)
Change the backup naming/classification so generated corrupt backups are non-primary and non-usage-counted recovery artifacts, then add focused artifact tests.
Accept the scanner exposure risk
Maintainers could intentionally accept that corrupt backups are scanned like live transcripts, but that should be an explicit data-handling decision before merge.
Pause for recovery-artifact policy
If the team is not ready to define persistent corrupt transcript artifacts, pause this PR and land only after the recovery artifact contract is settled.

Copy recommended automerge instruction

@clawsweeper automerge

Special instructions:
Change corrupt transcript backup naming/classification so generated backups are not treated as primary or usage-counted session transcripts; add focused artifacts tests for the generated corrupt backup name; rerun the focused session-manager, session-manager-init, and artifacts tests plus git diff --check.

Next step before merge

[P2] A narrow automated repair can address the remaining blocker by changing corrupt backup naming/classification and adding focused classifier coverage.

Security
Needs attention: The diff creates a persistent copy of private transcript bytes, and its .jsonl name currently falls through live transcript classifiers.

Review findings

[P1] Stop corrupt backups from matching live transcript artifacts — src/agents/sessions/session-manager.ts:431

Review details

Best possible solution:

Keep the recovery-in-SessionManager.open() approach, but land it only after corrupt backups use a recognized recovery/archive classification and focused scanner/classifier tests prove they are not treated as live transcripts.

Do we have a high-confidence way to reproduce the issue?

Yes. The linked issue and PR comments provide a high-confidence source reproduction through SessionManager.open() with a corrupt first header line, and the current main code path shows that malformed first line is skipped before the empty/corrupt rewrite branch.

Is this the best way to solve the issue?

No, not yet. Moving recovery into SessionManager.open() is the right layer, but the merge-safe solution also needs the generated corrupt backup to be classified as a recovery/archive artifact rather than a live transcript.

Full review comments:

[P1] Stop corrupt backups from matching live transcript artifacts — src/agents/sessions/session-manager.ts:431
buildCorruptSessionBackupPath() creates backups named <session>.jsonl.corrupt-... .jsonl, but the existing artifact helpers treat any non-archive .jsonl as a primary and usage-counted transcript. That means this new private backup can be scanned or surfaced like a live session instead of a recovery artifact. Please use an existing archive-style suffix or add a corrupt archive classification, with focused tests proving these backups are non-primary and non-usage-counted.
Confidence: 0.9

Overall correctness: patch is incorrect
Overall confidence: 0.88

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 1db2c2a3e058.

Label changes

Label justifications:

P0: The PR addresses a linked data-loss bug where corrupted session headers can wipe persisted transcripts.
merge-risk: 🚨 session-state: Merging as-is adds a persistent recovered-transcript sidecar that current session scanners can treat as a live transcript.
merge-risk: 🚨 security-boundary: The sidecar preserves private transcript bytes, and its current .jsonl shape can expose it to memory/cost/session tooling instead of recovery-only handling.
rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🦞 diamond lobster and patch quality is 🧂 unranked krab.
status: ⏳ waiting on author: ClawSweeper has contributor-facing work open and is waiting for author action. Sufficient (terminal): The PR body and follow-up comments include terminal output from disposable filesystem-backed runs through production SessionManager.open() and prepareSessionManagerForRun() showing recovered messages, assistant append, backup creation, and backup mode.
proof: sufficient: Contributor real behavior proof is sufficient. The PR body and follow-up comments include terminal output from disposable filesystem-backed runs through production SessionManager.open() and prepareSessionManagerForRun() showing recovered messages, assistant append, backup creation, and backup mode.

Evidence reviewed

PR surface:

Source +103, Tests +224. Total +327 across 4 files.

View PR surface stats

Area	Files	Added	Removed	Net
Source	2	112	9	+103
Tests	2	224	0	+224
Docs	0	0	0	0
Config	0	0	0	0
Generated	0	0	0	0
Other	0	0	0	0
Total	4	336	9	+327

Security concerns:

[medium] Corrupt transcript backup can be scanned as live session data — src/agents/sessions/session-manager.ts:431
The backup is same-mode, which is good, but it is still named like a normal .jsonl transcript while preserving private transcript bytes. Existing session memory, usage/cost, doctor, and disk-budget paths rely on filename classifiers, so the backup needs explicit recovery/archive classification before merge.
Confidence: 0.88

Acceptance criteria:

[P1] node scripts/run-vitest.mjs src/agents/sessions/session-manager.test.ts src/agents/embedded-agent-runner/session-manager-init.test.ts src/config/sessions/artifacts.test.ts.
[P1] git diff --check -- src/agents/sessions/session-manager.ts src/agents/sessions/session-manager.test.ts src/agents/embedded-agent-runner/session-manager-init.ts src/agents/embedded-agent-runner/session-manager-init.test.ts src/config/sessions/artifacts.ts src/config/sessions/artifacts.test.ts.

What I checked:

Recovery happens before destructive rewrite: At PR head, setSessionFile() attempts corrupt-header recovery before the old empty/corrupt rewrite branch, then rewrites recovered entries and marks the manager as recovered. (src/agents/sessions/session-manager.ts:775, 30307017a866)
Backup naming creates a live-looking JSONL: The new corrupt backup path is ${filePath}.corrupt-<timestamp>-<uuid>.jsonl, so a generated backup still ends in .jsonl and does not use an existing archive suffix. (src/agents/sessions/session-manager.ts:429, 30307017a866)
Current classifier treats unknown .jsonl as primary: isPrimarySessionTranscriptFileName() accepts any .jsonl that is not trajectory, checkpoint, or a recognized archive; recognized archive reasons are only deleted, reset, and bak. (src/config/sessions/artifacts.ts:88, 1db2c2a3e058)
Scanner callers depend on the classifier: Disk-budget cleanup includes primary transcript files as unreferenced artifacts, usage/cost scanning filters on usage-counted transcript names, and session memory hit parsing treats .jsonl names as transcript identities. (src/config/sessions/disk-budget.ts:287, 1db2c2a3e058)
Production entry path checked: The embedded runner opens the session file with SessionManager.open() before prepareSessionManagerForRun(), matching the linked data-loss reproduction path discussed on the PR. (src/agents/embedded-agent-runner/run/attempt.ts:1961, 1db2c2a3e058)
Scoped policy read: Root AGENTS.md and src/agents/AGENTS.md were read; the review applied the private session-state, scanner-proof, and whole-path review guidance. Maintainer notes were checked and only a Telegram note existed, so no matching maintainer note applied. (AGENTS.md:1, 1db2c2a3e058)

Likely related people:

steipete: Authored the latest PR-head maintainer fix commit and summarized the recovery behavior and local validation in the PR discussion. (role: recent fix author; confidence: high; commits: 30307017a866; files: src/agents/sessions/session-manager.ts, src/agents/embedded-agent-runner/session-manager-init.ts)
Abner Shang: Current checkout blame/log show the session manager, init helper, and artifact classifier files entering this checkout through commit 5be282e; history depth limits stronger ownership inference. (role: current-main area contributor; confidence: low; commits: 5be282e459ee; files: src/agents/sessions/session-manager.ts, src/agents/embedded-agent-runner/session-manager-init.ts, src/config/sessions/artifacts.ts)
yetval: Provided the linked data-loss issue and rechecked the PR against the real SessionManager.open() path, including the follow-up header-only edge case. (role: reporter and reproduction reviewer; confidence: medium; files: src/agents/sessions/session-manager.ts)

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

charles-openclaw · 2026-06-01T10:39:15Z

Thanks for the review. I cannot honestly provide live production resume proof against a real corrupted user transcript from this contributor environment without touching private/local session data.

What I can confirm publicly is the focused behavior proof in this PR: the regression uses the real filesystem-backed prepareSessionManagerForRun helper with a temp JSONL transcript whose first line is malformed and whose later line contains a persisted user message. After the guarded path rejects, the test asserts the transcript string remains byte-for-byte unchanged, which is the data-preservation behavior this patch is meant to protect.

Verification already run:

PNPM_CONFIG_OFFLINE=true corepack pnpm test src/agents/embedded-agent-runner/session-manager-init.test.ts -- --reporter=dot passed the focused shard / 3 tests.
git diff --check -- src/agents/embedded-agent-runner/session-manager-init.ts src/agents/embedded-agent-runner/session-manager-init.test.ts passed.

The remaining live proof item needs a maintainer or safe disposable session fixture that can reproduce the production resume path without exposing a real user's transcript.

yetval · 2026-06-01T10:44:14Z

Thanks for jumping on this @charles-openclaw — but I think this fix guards the wrong layer, and #89037's data loss is still reproducible on this branch (ffb09b97).

The real truncation happens before `prepareSessionManagerForRun` runs

The production resume path is:

attempt.ts:1961 sessionManager = guardSessionManager(SessionManager.open(params.sessionFile), …)
 └─ SessionManager.open setSessionFile() // src/agents/sessions/session-manager.ts
 └─ loadEntriesFromFile() returns [] //:384 — corrupt header line is skipped, entries[0]
 // is a message, header check fails []
 └─ "empty or corrupted" branch //:725
 └─ this.rewriteFile() //:731 — TRUNCATES the file to a fresh header
...
attempt.ts:2023 await prepareSessionManagerForRun({ … }) // runs LATER

By the time prepareSessionManagerForRun is reached, SessionManager.open() has already overwritten the corrupt-header file with a brand-new valid header. So assertExistingHeaderIsReadable() reads that fresh, valid header, never throws, and the original transcript is long gone. The guard added here can't fire for the case in the issue.

Proof on this PR branch (`ffb09b97`)

I ran the exact #89037 repro through the real resume entrypoint (SessionManager.open), not a hand-built object:

// file = [crash-truncated-header, user-msg, assistant-msg] (3 lines)
writeFileSync(file, `${JSON.stringify(header).slice(0, 30)}\n${JSON.stringify(u)}\n${JSON.stringify(a)}\n`);
SessionManager.open(file); // what attempt.ts:1961 does on resume
const text = readFileSync(file, "utf8");
expect(text).toContain("important question"); // FAILS

Output (PR branch):

before: 3 lines
after: 1 lines: [ '{"type":"session","version":3,"id":"019e82c7-…' ] // new id, both messages destroyed
AssertionError: expected '…' to contain 'important question'

Why the new regression test passes anyway

session-manager-init.test.ts builds a fake sessionManager object literal whose fileEntries already contain a header, so it never exercises SessionManager.open() / setSessionFile() / loadEntriesFromFile() — i.e. it skips the layer that actually does the destructive rewriteFile(). That's why it's green while the real bug remains. A faithful regression should drive SessionManager.open(corruptFile) (or go through the attempt.ts resume path) and assert the bytes survive.

Suggested fix location

Move the guard down into setSessionFile()'s empty/corrupt branch (session-manager.ts:725-735), where the data is still intact:

Don't rewriteFile() when the file is non-empty on disk but only fails header validation.
Before any rewrite, back up the original bytes (e.g. <file>.corrupt-<ts>.jsonl) so history is recoverable.
Prefer recovery: if a type:"session" entry exists anywhere in the parsed entries, use it; otherwise synthesize a header and prepend it, keeping the existing message lines.
Only take the "fresh start" path when the raw file is genuinely empty (0 bytes / no parseable lines), not merely "no valid header at line 0".

The guard in prepareSessionManagerForRun is still a reasonable belt-and-suspenders for that helper's own fs.writeFile(sessionFile, ""), so it's fine to keep — but it isn't sufficient to close #89037 on its own.

Happy to push the session-manager.ts fix + a real SessionManager.open() regression onto this branch (or a follow-up) if that helps. cc @charles-openclaw

charles-openclaw · 2026-06-01T11:01:56Z

Thanks, that reproduction is exactly right. I pushed follow-up commit 73f73255 moving the recovery into SessionManager.open() / setSessionFile(), before the destructive rewrite path can run.

Summary:

When an existing session file is non-empty but fails first-line header validation, setSessionFile() now backs up the original bytes to a .corrupt-*.jsonl sidecar before rewriting anything.
It recovers a valid session header found later in the parsed entries, or synthesizes a fresh header when only message entries are parseable.
It rewrites the active transcript as valid JSONL with the recovered/synthesized header plus the existing non-header entries, so the user/assistant messages survive SessionManager.open(corruptFile).
I kept the previous prepareSessionManagerForRun guard as the belt-and-suspenders check for that helper's reset path.

Verification:

Added a real SessionManager.open(sessionFile) regression for the corrupted first-line repro shape. The test asserts the user and assistant messages remain in the active transcript and the original bytes are present in a .corrupt-* backup.
Ran corepack pnpm exec oxfmt --write --threads=1 src/agents/sessions/session-manager.ts src/agents/sessions/session-manager.test.ts src/agents/embedded-agent-runner/session-manager-init.ts src/agents/embedded-agent-runner/session-manager-init.test.ts.
Ran PNPM_CONFIG_OFFLINE=true corepack pnpm test src/agents/sessions/session-manager.test.ts src/agents/embedded-agent-runner/session-manager-init.test.ts -- --reporter=dot -> 2 files / 4 tests passed.
Ran git diff --check -- src/agents/sessions/session-manager.ts src/agents/sessions/session-manager.test.ts src/agents/embedded-agent-runner/session-manager-init.ts src/agents/embedded-agent-runner/session-manager-init.test.ts -> passed.

@clawsweeper re-review

clawsweeper · 2026-06-01T11:02:00Z

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

State: Complete
Detail: The targeted re-review finished, the durable review comment was updated, and the synced verdict was routed.
Run: https://github.com/openclaw/clawsweeper/actions/runs/26750940321
Updated: 2026-06-01T11:08:16.343Z

charles-openclaw · 2026-06-01T11:21:28Z

Thanks, the backup-mode finding was right. I pushed follow-up commit 9d373724 to keep the persistent .corrupt-* sidecar under the same private-mode contract as the source transcript.

Summary:

recoverCorruptSessionEntries() now stats the original transcript, writes the backup with that mode, and explicitly chmods the sidecar so process umask does not leave a broader-readable copy.
The focused regression now chmods the disposable transcript to 0o600 and asserts the generated corrupt backup is also 0o600 on non-Windows platforms.

Verification:

corepack pnpm exec oxfmt --write --threads=1 src/agents/sessions/session-manager.ts src/agents/sessions/session-manager.test.ts
PNPM_CONFIG_OFFLINE=true corepack pnpm test src/agents/sessions/session-manager.test.ts src/agents/embedded-agent-runner/session-manager-init.test.ts -- --reporter=dot passed 2 files / 4 tests.
git diff --check -- src/agents/sessions/session-manager.ts src/agents/sessions/session-manager.test.ts src/agents/embedded-agent-runner/session-manager-init.ts src/agents/embedded-agent-runner/session-manager-init.test.ts

I still cannot honestly provide a real private production resume recording from this contributor environment without using local/private session state. The disposable filesystem-backed regression now covers the recovered transcript contents plus backup permissions, and the remaining live proof item is still best done by a maintainer or a safe disposable OpenClaw session setup.

@clawsweeper re-review

clawsweeper · 2026-06-01T11:21:31Z

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

yetval · 2026-06-01T11:37:54Z

Re-checked on 9d373724 — the recovery rework lands the fix where it matters. Verified against the real resume entrypoint (SessionManager.open(), i.e. attempt.ts:1961):

[corrupt-header, user-msg, assistant-msg] now survives the resume — both the user prompt and the assistant reply are preserved, and the original bytes are written to a private *.corrupt-<ts>.jsonl backup with the source mode.
corrupt-header + user-only assistant reply stays clean (1 header, 1 user).

So #89037's data loss is resolved. One small residual worth tightening before merge:

Header-only corrupt file duplicate header

When the corrupt file has no recoverable messages (e.g. a crash left only a half-written header line), recoverCorruptSessionEntries synthesizes [header], rewriteFile()s it, and sets flushed = true. The first userassistant turn then hits persist()'s no-assistant guard (flushed false) and re-appends the whole buffer, duplicating the header:

writeFileSync(file, `{"type":"session","version":3,"id":"sess`); // only a truncated header, no messages
const sm = SessionManager.open(file);
sm.appendMessage(msg("user", "hello"));
sm.appendMessage(msg("assistant", "hi"));
// final types: ['session','session','message','message'] -> headerCount: 2 (expected 1)

No data is lost here (there were no messages), so it's low severity — just a malformed two-header transcript. It's the same flushed/no-assistant fragility that createBranchedSession already guards against. Mirroring that here closes it:

// after building recoveredEntries
const hasAssistant = recoveredEntries.some(
 (e) => e.type === "message" && e.message.role === "assistant",
);
this.fileEntries = recoveredEntries;
const header = this.fileEntries.find((e) => e.type === "session");
this.sessionId = header?.id?? createSessionId();
this.buildIndex();
if (hasAssistant) {
 this.rewriteFile();
 this.flushed = true;
} else {
 this.flushed = false; // defer to persist(); avoids the duplicate-header re-flush
}
return;

(That also keeps the recovery branch consistent with the deferred-write contract documented in createBranchedSession.) Everything else looks good — nice turnaround. cc @charles-openclaw

charles-openclaw · 2026-06-01T11:40:19Z

I added a disposable real-path proof run for the current PR head after the latest review. This did not touch any real user/session state; it used a temp transcript under /tmp with synthetic markers only.

Summary:

Ran the production SessionManager.open(disposableCorruptSession, tempDir, "/tmp/disposable-openclaw-proof") path against a malformed first-line session transcript.
The active transcript retained both later persisted entries: proof-user-1 and proof-assistant-1.
The active transcript still contained both synthetic markers after recovery: DISPOSABLE_PROOF_USER_MARKER_89050 and DISPOSABLE_PROOF_ASSISTANT_MARKER_89050.
Exactly one same-directory .corrupt-* backup was created, and its bytes matched the original disposable corrupt transcript.
The source transcript, recovered active transcript, and backup all had mode 0600 on this Linux runner.

Command shape used locally:

node --import tsx <<'NODE'
# creates a temp corrupt JSONL transcript, chmods it 0600,
# calls SessionManager.open(...), checks recovered entries and backup mode,
# prints only synthetic/redacted proof data, then removes the temp dir
NODE

Observed proof output:

{
  "tempRoot": "/tmp/<redacted-temp-dir>",
  "openCall": "SessionManager.open(disposableCorruptSession, tempDir, /tmp/disposable-openclaw-proof)",
  "beforeMode": "0600",
  "afterMode": "0600",
  "backupMode": "0600",
  "recoveredEntryIds": ["proof-user-1", "proof-assistant-1"],
  "activeTranscriptHasUserMarker": true,
  "activeTranscriptHasAssistantMarker": true,
  "backupByteLength": 390,
  "backupMatchesOriginal": true,
  "backupFileNamePattern": "disposable-corrupt-session.jsonl.corrupt-<timestamp>.jsonl"
}

The remaining policy question is intentional: this PR preserves the original corrupt bytes in a same-directory, same-mode .corrupt-* sidecar so users have a recovery artifact instead of silent truncation. I think that is the right tradeoff for this data-loss case, but it does need maintainer acceptance because it creates a persistent transcript backup.

@clawsweeper re-review

clawsweeper · 2026-06-01T11:40:22Z

🦞👀
ClawSweeper picked this up.

Command router queued. I will update this comment with the next step.

Re-review progress:

State: Complete
Detail: The targeted re-review finished, the durable review comment was updated, and the synced verdict was routed.
Run: https://github.com/openclaw/clawsweeper/actions/runs/26752676003
Updated: 2026-06-01T11:46:34.676Z

charles-openclaw · 2026-06-01T17:35:02Z

@clawsweeper re-review

Thanks, the header-only recovery edge case was a good catch. I pushed follow-up commit 20ae0d331 to avoid duplicating recovered session headers.

Summary:

Corrupt-session recovery now only rewrites immediately when the recovered transcript already includes an assistant message.
Header-only or user-only recovered transcripts defer to the normal first-assistant persist path.
The deferred full-buffer persist now writes the session buffer instead of appending it, so an old corrupt prefix or already-written header cannot produce a malformed two-header transcript.
Added a regression for the exact header-only corrupt file shape: after appending a user and assistant message, the on-disk types are session, message, message with exactly one session header.

Verification:

corepack pnpm exec oxfmt --write --threads=1 src/agents/sessions/session-manager.ts src/agents/sessions/session-manager.test.ts
PNPM_CONFIG_OFFLINE=true corepack pnpm test src/agents/sessions/session-manager.test.ts src/agents/embedded-agent-runner/session-manager-init.test.ts -- --reporter=dot passed 2 files / 5 tests.
git diff --check -- src/agents/sessions/session-manager.ts src/agents/sessions/session-manager.test.ts

clawsweeper · 2026-06-01T17:35:05Z

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

State: Complete
Detail: The targeted re-review finished, the durable review comment was updated, and the synced verdict was routed.
Run: https://github.com/openclaw/clawsweeper/actions/runs/26771232465
Updated: 2026-06-01T17:42:37.410Z

steipete · 2026-06-02T10:40:50Z

Maintainer fix pushed: 30307017a8 (fix(sessions): preserve recovered corrupt transcripts).

What changed:

SessionManager.open() now rewrites every recovered corrupt-header transcript to valid JSONL immediately after creating the .corrupt-* backup, so the production initializer no longer sees unreadable disk bytes.
Recovered corrupt-header sessions now expose an internal recovery marker; prepareSessionManagerForRun() uses it to rewrite the header id/cwd while preserving recovered user-only rows instead of clearing them.
Added production-path regression coverage for corrupt header + user-only through SessionManager.open() -> prepareSessionManagerForRun() -> first assistant append.
Fixed the stale unused import and typed the new test messages so the prior CI failures are addressed.

Local proof on this branch after the fix:

pnpm test src/agents/sessions/session-manager.test.ts src/agents/embedded-agent-runner/session-manager-init.test.ts -- --reporter=dot -> 2 files / 6 tests passed.
pnpm tsgo:prod -> passed.
pnpm check:test-types -> passed.
pnpm lint --threads=8 -> passed.
git diff --check -- src/agents/sessions/session-manager.ts src/agents/sessions/session-manager.test.ts src/agents/embedded-agent-runner/session-manager-init.ts src/agents/embedded-agent-runner/session-manager-init.test.ts -> passed.
.agents/skills/autoreview/scripts/autoreview --mode local -> clean, no accepted/actionable findings.

@clawsweeper re-review

clawsweeper · 2026-06-02T10:40:53Z

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Re-review progress:

State: Complete
Detail: The targeted re-review finished, the durable review comment was updated, and the synced verdict was routed.
Run: https://github.com/openclaw/clawsweeper/actions/runs/26814984395
Updated: 2026-06-02T10:58:04.600Z

steipete · 2026-06-02T11:01:38Z

Land-ready maintainer proof for head 30307017a8668bc7dbeec7c5ee5d1bd47743ad1c.

Work done:

Preserves corrupt-header transcripts by backing up the original bytes and immediately rewriting recovered active JSONL in SessionManager.open().
Preserves recovered user-only rows through prepareSessionManagerForRun() before the first assistant append.
Adds focused regression coverage for both recovered assistant transcripts and recovered user-only production init.

Local proof run:

pnpm test src/agents/sessions/session-manager.test.ts src/agents/embedded-agent-runner/session-manager-init.test.ts -- --reporter=dot
pnpm tsgo:prod
pnpm check:test-types
pnpm lint --threads=8
git diff --check -- src/agents/sessions/session-manager.ts src/agents/sessions/session-manager.test.ts src/agents/embedded-agent-runner/session-manager-init.ts src/agents/embedded-agent-runner/session-manager-init.test.ts
.agents/skills/autoreview/scripts/autoreview --mode local reported no accepted/actionable findings.

Real behavior proof:

GitHub Real behavior proof passed on run https://github.com/openclaw/openclaw/actions/runs/26814603460.
Disposable production-path proof covered corrupt-header + assistant resume and corrupt-header + user-only run prep, with user rows preserved, assistant rows appendable, and one backup file created.

Known proof gap:

Scan changed paths (precise) failed on synthetic merge SHA 26a91536b9582b52072d01b952e1d81033f3822f; the failing findings are outside this PR's four changed files. The PR diff is limited to session manager and session-manager-init files/tests.

fix(sessions): preserve corrupt-header transcripts

ffb09b9

openclaw-barnacle Bot added agents Agent runtime and tooling size: S triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels Jun 1, 2026

yetval mentioned this pull request Jun 1, 2026

Bug: resuming a session with a corrupted header line silently wipes the entire transcript (data loss) #89037

Closed

fix(sessions): recover corrupt session headers on open

73f7325

openclaw-barnacle Bot added size: M and removed size: S labels Jun 1, 2026

clawsweeper Bot added the merge-risk: 🚨 security-boundary 🚨 May affect sandboxing, authorization, credentials, or sensitive data. label Jun 1, 2026

fix(sessions): keep corrupt backups private

9d37372

clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. labels Jun 1, 2026

clawsweeper Bot added the status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. label Jun 1, 2026

openclaw-barnacle Bot removed the triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. label Jun 1, 2026

fix(sessions): avoid duplicate recovered headers

20ae0d3

openclaw-barnacle Bot added triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. and removed proof: sufficient ClawSweeper judged the real behavior proof convincing. labels Jun 1, 2026

steipete self-assigned this Jun 2, 2026

fix(sessions): preserve recovered corrupt transcripts

3030701

openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels Jun 2, 2026

steipete merged commit 2c48dd2 into openclaw:main Jun 2, 2026
190 of 197 checks passed

Uh oh!

Conversation

charles-openclaw commented Jun 1, 2026 • edited by steipete Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Linked context

Real behavior proof (required for external PRs)

Tests and validation

Risk checklist

Current review state

Uh oh!

clawsweeper Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

charles-openclaw commented Jun 1, 2026

Uh oh!

yetval commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

The real truncation happens before prepareSessionManagerForRun runs

Proof on this PR branch (ffb09b97)

Why the new regression test passes anyway

Suggested fix location

Uh oh!

charles-openclaw commented Jun 1, 2026

Uh oh!

clawsweeper Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

charles-openclaw commented Jun 1, 2026

Uh oh!

clawsweeper Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yetval commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Header-only corrupt file duplicate header

Uh oh!

charles-openclaw commented Jun 1, 2026

Uh oh!

clawsweeper Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

charles-openclaw commented Jun 1, 2026

Uh oh!

clawsweeper Bot commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

steipete commented Jun 2, 2026

Uh oh!

clawsweeper Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

steipete commented Jun 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

charles-openclaw commented Jun 1, 2026 •

edited by steipete

Loading

clawsweeper Bot commented Jun 1, 2026 •

edited

Loading

yetval commented Jun 1, 2026 •

edited

Loading

The real truncation happens before `prepareSessionManagerForRun` runs

Proof on this PR branch (`ffb09b97`)

clawsweeper Bot commented Jun 1, 2026 •

edited

Loading

clawsweeper Bot commented Jun 1, 2026 •

edited

Loading

yetval commented Jun 1, 2026 •

edited

Loading

clawsweeper Bot commented Jun 1, 2026 •

edited

Loading

clawsweeper Bot commented Jun 1, 2026 •

edited

Loading

clawsweeper Bot commented Jun 2, 2026 •

edited

Loading