Skip to content

fix(agents): dedupe transcript rewrite suffix replay#89387

Open
dripsmvcp wants to merge 1 commit into
openclaw:mainfrom
dripsmvcp:fix/66443-transcript-rewrite-dedupe
Open

fix(agents): dedupe transcript rewrite suffix replay#89387
dripsmvcp wants to merge 1 commit into
openclaw:mainfrom
dripsmvcp:fix/66443-transcript-rewrite-dedupe

Conversation

@dripsmvcp

@dripsmvcp dripsmvcp commented Jun 2, 2026

Copy link
Copy Markdown

Summary

Context-overflow recovery re-runs transcript suffix replay without a replay-identity guard, so byte-identical role=user entries are re-appended on every recovery pass and the persisted session JSONL accumulates duplicate messages (issue #66443).

  • Collapses only the proven overflow-clone pattern during replay: byte-identical role=user messages and byte-identical compactions. Exact suffix history is preserved for assistant, tool-result, and custom entries.
  • A legitimately repeated user message carries a distinct timestamp, so the full-payload hash keeps it while collapsing the identical recovery clone; compaction identity folds in the remapped firstKeptEntryId so two distinct compactions are not merged.
  • Intended outcome: overflow recovery becomes idempotent for the clone pattern without violating the rewrite's "preserve exact suffix history" contract for legitimate entries.
  • Intentionally out of scope: the broader duplicate-transcript umbrella, and repeated-bootstrap-custom dedup (custom entries are deliberately left untouched to avoid dropping legitimate history).
  • Reviewers should focus on the user/compaction scoping and the timestamp-based clone-vs-repeat distinction.

Linked context

Closes #66443

Related #79150 — superseded prior attempt; this PR addresses its "include the kept boundary in compaction identity" review note.

Not maintainer-requested; selected from the clawsweeper:queueable-fix backlog.

Real behavior proof (required for external PRs)

  • Behavior or issue addressed: Overflow recovery duplicated role=user entries in the persisted session JSONL (issue Overflow recovery duplicates role=user messages in session JSONL, amplifying transcript growth #66443 captured one unique message stored as three byte-identical copies).
  • Real environment tested: Linux / Node 24, exercising the actual production functions rewriteTranscriptEntriesInSessionManager and rewriteTranscriptEntriesInState against a real in-memory SessionManager and persisted transcript-file-state — these functions are run for real, not mocked.
  • Exact steps or command run after this patch: node scripts/run-vitest.mjs src/agents/embedded-agent-runner/transcript-rewrite.test.ts
  • Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output): copied terminal output. The regression reproduces the bug on unfixed main and passes after the fix, and added tests prove legitimate repeats survive:
unfixed main: × dedupes byte-identical replayed messages ...
              AssertionError: expected 2 to be 1   (the clone is re-appended)

with the fix: Test Files  1 passed (1)
              Tests       10 passed (10)           (clone collapsed; legitimate repeats kept)
  • Observed result after fix: after the overflow-style suffix replay, a byte-identical role=user clone is collapsed to one entry, while (a) two user messages with distinct timestamps, (b) byte-identical repeated assistant/tool-result entries, and (c) two distinct compactions are all preserved.
  • What was not tested: a live gateway context-overflow on a running OpenClaw deployment (see limitations).
  • Proof limitations or environment constraints: this is unit/integration-level proof against the real production functions, which CONTRIBUTING.md treats as supplemental rather than a live-setup behavior capture. I could not drive a live gateway into context overflow in this environment (it needs a full running OpenClaw + provider + a forced overflow, and the sandbox network blocked both a clean install and the crabbox check:changed gate). A maintainer with a live setup can confirm end-to-end behavior, or apply proof: override for this logic-only dedup fix; I am happy to add a live capture if someone can reproduce an overflow.
  • Before evidence (optional but encouraged): the failing assertion above (expected 2 to be 1) is the before-state captured from unfixed main.

Tests and validation

  • node scripts/run-vitest.mjs src/agents/embedded-agent-runner/transcript-rewrite.test.ts -> 10 passed (4 new).
  • Sibling callers: tool-result-truncation, context-engine-maintenance, compact.hooks, cli-runner.context-engine -> 126 passed.
  • tsgo -p tsconfig.core.json and core test types: no errors in the changed files (two unrelated pre-existing errors in src/config/io.ts and src/secrets/config-io.ts are present on main).
  • Regression coverage added in transcript-rewrite.test.ts: clone collapse (fails first with expected 2 to be 1, passes after); distinct compactions preserved; legitimate repeated user messages (distinct timestamps) preserved; byte-identical repeated assistant/tool-result entries preserved.

Risk checklist

  • Did user-visible behavior change? No — internal transcript maintenance that removes only byte-identical recovery clones.
  • Did config, environment, or migration behavior change? No.
  • Did security, auth, secrets, network, or tool execution behavior change? No.
  • Highest-risk area: collapsing genuinely distinct entries during replay (the merge-risk: session-state concern).
  • How is that risk mitigated? Dedup is constrained to role=user messages and compactions only; the full-payload hash (including timestamp) distinguishes a recovery clone from a legitimate repeat; compaction identity includes the remapped firstKeptEntryId; assistant/tool-result/custom entries are never deduped. Covered by the four regression tests above.

Current review state

  • Next action: maintainer review / ClawSweeper re-review (auto-triggered by this push).
  • Addressed ClawSweeper P1 "Preserve distinct replayed transcript entries": constrained the replay dedup to the proven user-message + compaction clone pattern (per the recommended "Constrain Replay Dedupe" option), restored exact-history for assistant/tool/custom entries, and added regression coverage proving legitimate repeated messages/tool entries survive. The dedup helper is shared by both the SessionManager and TranscriptFileState rewrite paths.
  • Still open: real behavior proof is unit-level only; a live-overflow capture needs a real setup or a maintainer proof: override.

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels Jun 2, 2026
@clawsweeper

clawsweeper Bot commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge. Reviewed June 2, 2026, 8:57 PM ET / 00:57 UTC.

Summary
The branch adds replay-identity dedupe to transcript rewrites for byte-identical user-message and compaction suffix entries, plus regression coverage for preserving legitimate repeats.

PR surface: Source +81, Tests +173. Total +254 across 2 files.

Reproducibility: yes. at source level. Current main replays every suffix entry after a transcript rewrite, and the linked issue provides persisted JSONL evidence of duplicate user rows and compaction chains; I did not execute the failing scenario in this read-only review.

Review metrics: none identified.

Merge readiness
Overall: 🦪 silver shellfish
Proof: 🦪 silver shellfish
Patch quality: 🐚 platinum hermit
Result: blocked until real behavior proof from a real setup is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P1] Add redacted real behavior proof from a persisted-session or live overflow path that shows before/after transcript counts.
  • [P1] Ask for an explicit maintainer proof override if a live overflow capture is not practical for this logic-only fix.

Proof guidance:

  • [P1] Needs real behavior proof before merge: The PR body provides copied test output only; before merge, the contributor should add redacted real behavior proof such as terminal output/logs from a persisted-session script or live overflow run, then update the PR body to trigger re-review or ask a maintainer to comment @clawsweeper re-review.

Risk before merge

  • [P1] Real behavior proof is still only copied unit/integration test output; external PR policy needs redacted live or persisted-session behavior proof, or a maintainer proof override, before merge.
  • [P1] Replay dedupe changes session-state semantics: if the identity is too broad, a legitimate user or compaction suffix entry could be collapsed or remapped.

Maintainer options:

  1. Add persisted-session proof before merge (recommended)
    Ask for redacted terminal output, logs, or a recording from a real OpenClaw checkout that rewrites a persisted JSONL session and shows the clone collapsing while legitimate repeats remain.
  2. Accept a maintainer proof override
    A maintainer can explicitly accept the source-level and test proof as enough for this logic-only replay fix, while owning the remaining session-state risk.
  3. Pause for the broader transcript track
    If maintainers want this handled with the larger duplicate-transcript umbrella, pause this PR and keep the linked issue as the canonical remaining work.

Next step before merge

  • [P1] The remaining blocker is proof or maintainer override, not a narrow code defect that an automated repair PR should change.

Security
Cleared: The diff adds a Node built-in crypto hash helper and tests, with no dependency, workflow, secret, network, or package-resolution changes.

Review details

Best possible solution:

Merge a narrow shared transcript-rewrite dedupe only after real persisted-session/live overflow proof is added or maintainers explicitly accept a proof override; keep the broader duplicate-transcript umbrella separate.

Do we have a high-confidence way to reproduce the issue?

Yes, at source level. Current main replays every suffix entry after a transcript rewrite, and the linked issue provides persisted JSONL evidence of duplicate user rows and compaction chains; I did not execute the failing scenario in this read-only review.

Is this the best way to solve the issue?

Yes, this is the best layer for the scoped bug: the shared transcript rewrite helper is the common seam used by context-engine maintenance and tool-result truncation. The implementation is deliberately narrower than prior attempts, but it still needs real behavior proof before merge.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 8b546facaf49.

Label changes

Label changes:

  • add rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🐚 platinum hermit.
  • add status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body provides copied test output only; before merge, the contributor should add redacted real behavior proof such as terminal output/logs from a persisted-session script or live overflow run, then update the PR body to trigger re-review or ask a maintainer to comment @clawsweeper re-review.
  • remove rating: 🌊 off-meta tidepool: Current PR rating is rating: 🦪 silver shellfish, so this older rating label is no longer current.

Label justifications:

  • P1: The PR targets a high-impact overflow recovery bug that can duplicate persisted user/session transcript state and worsen context overflows.
  • merge-risk: 🚨 session-state: The diff intentionally drops/remaps replayed transcript suffix entries, so an overly broad identity would affect persisted session history.
  • rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🐚 platinum hermit.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body provides copied test output only; before merge, the contributor should add redacted real behavior proof such as terminal output/logs from a persisted-session script or live overflow run, then update the PR body to trigger re-review or ask a maintainer to comment @clawsweeper re-review.
Evidence reviewed

PR surface:

Source +81, Tests +173. Total +254 across 2 files.

View PR surface stats
Area Files Added Removed Net
Source 1 98 17 +81
Tests 1 173 0 +173
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 2 271 17 +254

What I checked:

  • Repository policy read: Root AGENTS.md and the scoped src/agents/AGENTS.md were read; the review applied the repo's high-confidence PR review, session-state, and real-behavior-proof requirements. (AGENTS.md:1, 8b546facaf49)
  • Current main replay behavior: Current main branches from the first rewritten entry and re-appends every suffix entry without replay identity dedupe, matching the bug mechanism described by the linked issue. (src/agents/embedded-agent-runner/transcript-rewrite.ts:230, 8b546facaf49)
  • PR replay identity scope: The PR computes replay identities only for user messages and compactions, then skips a suffix entry when the same identity has already been emitted in this rewrite pass. (src/agents/embedded-agent-runner/transcript-rewrite.ts:274, 2e18181289c6)
  • Regression coverage added: The added tests cover collapsing byte-identical user replay clones, preserving distinct timestamped user repeats, preserving assistant/tool-result repeats, and keeping compactions with different kept boundaries. (src/agents/embedded-agent-runner/transcript-rewrite.test.ts:299, 2e18181289c6)
  • Shared caller surface: Both context-engine maintenance and tool-result truncation call the shared transcript rewrite helpers, so the fix is at the common replay seam rather than a channel-specific workaround. (src/agents/embedded-agent-runner/tool-result-truncation.ts:702, 8b546facaf49)
  • Linked issue context: The linked issue is the canonical open core session/replay/idempotency track and includes concrete JSONL evidence of repeated user rows plus duplicated compaction/bootstrap chains.

Likely related people:

  • steipete: GitHub path history shows recent commits touching transcript rewrite, transcript file state, and agent-core session context, including the agent runtime internalization and core session goals work. (role: recent area contributor; confidence: high; commits: bb46b79d3c14, a509c48f0ea2, 85beee613c64; files: src/agents/embedded-agent-runner/transcript-rewrite.ts, src/agents/embedded-agent-runner/transcript-file-state.ts, packages/agent-core/src/harness/session/session.ts)
  • vincentkoc: Local blame marks the current checkout copy of the transcript rewrite and related file-state/truncation files as introduced in commit 2d11402. (role: current checkout provenance; confidence: medium; commits: 2d1140220869; files: src/agents/embedded-agent-runner/transcript-rewrite.ts, src/agents/embedded-agent-runner/transcript-file-state.ts, src/agents/embedded-agent-runner/tool-result-truncation.ts)
  • ooiuuii: Recent GitHub path history for tool-result truncation shows adjacent work on aggregate prompt tool-result bounding, which shares the recovery/truncation surface that calls transcript rewrite. (role: adjacent contributor; confidence: medium; commits: f49a3e4c266c; files: src/agents/embedded-agent-runner/tool-result-truncation.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. labels Jun 2, 2026
@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels Jun 2, 2026
@dripsmvcp

Copy link
Copy Markdown
Author

@clawsweeper

Context-overflow recovery re-runs transcript suffix replay
(rewriteTranscriptEntriesInSessionManager / rewriteTranscriptEntriesInState)
without a replay-identity guard, so byte-identical role=user entries are
re-appended on every recovery pass. The persisted session JSONL then
accumulates duplicate messages, amplifying transcript growth and cascading
into further overflows.

Collapse only the proven overflow-clone pattern during replay: byte-identical
role=user messages (whose timestamp distinguishes a recovery clone from a
legitimate repeat) and byte-identical compactions (identity folds in the
remapped firstKeptEntryId so distinct kept boundaries survive). Exact suffix
history is preserved for assistant, tool-result, and custom entries, whose
identical payloads can be legitimate history.

Closes openclaw#66443
@dripsmvcp dripsmvcp force-pushed the fix/66443-transcript-rewrite-dedupe branch from f0591e3 to 2e18181 Compare June 3, 2026 00:39
@clawsweeper clawsweeper Bot added rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. labels Jun 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. P1 High-priority user-facing bug, regression, or broken workflow. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. size: M status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Overflow recovery duplicates role=user messages in session JSONL, amplifying transcript growth

1 participant