fix(compaction): preserve assistant boundary replies by leno23 · Pull Request #90641 · openclaw/openclaw

leno23 · 2026-06-05T10:55:17Z

Summary

preserve the summarized assistant reply immediately before a surviving pre-compaction user message when writing successor transcripts
move the successor compaction replay boundary to that preserved assistant so agent context stays summary -> assistant -> user instead of hiding the reply
strip stale thinking signatures from preserved boundary assistants along with other pre-compaction kept messages

Tests

node scripts/run-vitest.mjs src/agents/embedded-agent-runner/compaction-successor-transcript.test.ts
git diff --check

Real behavior proof

Behavior addressed: successor transcripts now keep the assistant reply immediately before the first surviving pre-compaction user message so replay preserves the visible summary -> assistant -> user boundary.
Real environment tested: local OpenClaw checkout at PR head ba959f19332c40305f241e15ef483c4a8923cc7 on macOS 15.5 with Node v22.22.1 and pnpm 11.2.2.
Exact steps or command run after this patch: node scripts/run-vitest.mjs src/agents/embedded-agent-runner/compaction-successor-transcript.test.ts; git diff --check upstream/main...HEAD.
Evidence after fix: terminal output from the PR-head checkout:

$ node scripts/run-vitest.mjs src/agents/embedded-agent-runner/compaction-successor-transcript.test.ts
✓  agents  src/agents/embedded-agent-runner/compaction-successor-transcript.test.ts (12 tests) 132ms
Test Files  1 passed (1)
Tests  12 passed (12)
[test] passed 1 Vitest shard in 6.82s

$ git diff --check upstream/main...HEAD
# no output; exit code 0

Observed result after fix: the compaction successor transcript shard passed at PR head, including the preserved assistant-boundary replay coverage; the PR diff has no whitespace errors.
What was not tested: a live model session performing an end-to-end compaction run; this proof uses the local embedded-agent successor-transcript runtime harness.

clawsweeper · 2026-06-05T10:57:36Z

Codex review: needs real behavior proof before merge. Reviewed June 12, 2026, 9:23 AM ET / 13:23 UTC.

Summary
The PR preserves an immediately preceding summarized assistant message in successor transcripts, reanchors compaction replay to it, strips stale thinking signatures, and adds focused compaction-boundary tests.

PR surface: Source +69, Tests +84. Total +153 across 2 files.

Reproducibility: yes. at source level. Put a model_change, thinking_level_change, or session_info entry between a summarized assistant and the retained user; the PR scan stops at that marker although SessionManager does not replay it as a conversational message, so the assistant is removed.

Review metrics: none identified.

Merge readiness
Overall: 🦪 silver shellfish
Proof: 🦪 silver shellfish
Patch quality: 🦪 silver shellfish
Result: blocked until real behavior proof from a real setup is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

Skip non-context state entries while preserving hard boundaries for appendable custom content, with focused regression coverage.
[P1] Add redacted live compaction proof showing the reply remains visible and the following turn receives correct context.

Proof guidance:

[P1] Needs real behavior proof before merge: The PR supplies focused Vitest harness output only and explicitly omits a live end-to-end compaction run; add redacted terminal/log, recording, or linked-artifact proof from a real session, then update the PR body to trigger review or ask a maintainer to comment @clawsweeper re-review.

Mantis proof suggestion
A real Control UI or webchat compaction run would directly show whether the disappearing reply and adjacent-turn batching behavior is fixed. A maintainer can ask Mantis to capture proof by posting a new PR comment that starts with the OpenClaw Mantis account mention, followed by:

visual task: verify a Control UI or webchat session with transcript compaction keeps the assistant reply visible after rotation and does not batch the next user turn with the prior one.

Risk before merge

[P1] Merging this head can still drop the assistant boundary whenever a non-context state entry sits between the assistant and retained user, preserving the user-visible reply-loss and combined-turn failure mode.
[P1] The patch changes persisted transcript retention and replay anchors, so a subtle boundary mismatch can alter future model context even when focused serialization tests pass.
[P1] The external PR still lacks after-fix proof from a real model session performing transcript compaction.

Maintainer options:

Fix state-marker boundary handling (recommended)
Update boundary discovery to skip non-context state entries, preserve appendable custom-content boundaries, and add a regression test before merge.
Pause for contributor proof
Keep the draft open until a redacted real session demonstrates the assistant reply survives compaction and the next user turn receives the correct context.

Next step before merge

[P1] The author should repair the concrete boundary defect and supply real after-fix session proof; automation cannot satisfy the external contributor proof gate.

Security
Cleared: The source/test-only diff does not change dependencies, workflows, permissions, secrets, downloads, publishing, or other security-sensitive execution surfaces.

Review findings

[P1] Skip non-context state entries when finding the assistant boundary — src/agents/embedded-agent-runner/compaction-successor-transcript.ts:261-263

Review details

Best possible solution:

Classify intervening entries by canonical replay effect: skip non-context state and metadata entries while treating appendable custom_message and branch_summary content as hard boundaries, then add the focused state-marker regression and prove a real rotated session plus its next model turn.

Do we have a high-confidence way to reproduce the issue?

Yes at source level. Put a model_change, thinking_level_change, or session_info entry between a summarized assistant and the retained user; the PR scan stops at that marker although SessionManager does not replay it as a conversational message, so the assistant is removed.

Is this the best way to solve the issue?

No, not yet. Preserving the assistant boundary is the right fix direction, but the implementation must distinguish non-context state markers from appendable custom context rather than treating every non-message entry identically.

Full review comments:

[P1] Skip non-context state entries when finding the assistant boundary — src/agents/embedded-agent-runner/compaction-successor-transcript.ts:261-263
This loop breaks on every non-message entry, but model_change, thinking_level_change, and session_info do not append a conversational message in buildSessionContext. If one sits between the summarized assistant and the retained user, the assistant is still removed and the linked disappearing-reply bug remains. Skip non-context state/metadata entries while keeping appendable custom_message and branch_summary entries as hard boundaries, and add that regression case.
Confidence: 0.98

Overall correctness: patch is incorrect
Overall confidence: 0.98

AGENTS.md: found and applied where relevant.

Codex review notes: model internal, reasoning high; reviewed against 0fc5a57a3440.

Label changes

Label changes:

add rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🦪 silver shellfish.
remove rating: 🧂 unranked krab: Current PR rating is rating: 🦪 silver shellfish, so this older rating label is no longer current.

Label justifications:

P1: The remaining defect can still remove visible assistant replies and cause adjacent user turns to be interpreted together after compaction.
merge-risk: 🚨 session-state: The PR changes persisted transcript membership and replay anchors, and the remaining boundary mismatch can produce incorrect agent context.
rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🦪 silver shellfish.
status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR supplies focused Vitest harness output only and explicitly omits a live end-to-end compaction run; add redacted terminal/log, recording, or linked-artifact proof from a real session, then update the PR body to trigger review or ask a maintainer to comment @clawsweeper re-review.

Evidence reviewed

PR surface:

Source +69, Tests +84. Total +153 across 2 files.

View PR surface stats

Area	Files	Added	Removed	Net
Source	1	73	4	+69
Tests	1	86	2	+84
Docs	0	0	0	0
Config	0	0	0	0
Generated	0	0	0	0
Other	0	0	0	0
Total	2	159	6	+153

Acceptance criteria:

[P1] node scripts/run-vitest.mjs src/agents/embedded-agent-runner/compaction-successor-transcript.test.ts.
[P1] git diff --check.
[P1] Redacted real model-session compaction run showing successor transcript contents and the next turn's context.

What I checked:

Blocking boundary mismatch: The latest helper stops scanning whenever the immediately preceding entry is not a message, so model_change, thinking_level_change, or session_info entries suppress assistant preservation even though they do not produce replay messages. (src/agents/embedded-agent-runner/compaction-successor-transcript.ts:261, f29b52331539)
Canonical replay behavior: Current SessionManager appends only message, custom_message, and non-empty branch_summary entries into model context; model and thinking state changes are tracked separately and are not conversational boundaries. (packages/agent-core/src/harness/session/session.ts:45, 0fc5a57a3440)
Follow-up improvements: The four-commit branch adds tests and guards for appendable custom boundaries and inactive branch ordering, resolving the earlier broad reanchoring concern but not the non-context state-marker case. (src/agents/embedded-agent-runner/compaction-successor-transcript.test.ts:220, f29b52331539)
Current-main reproduction basis: Current main removes summarized message entries before firstKeptEntryId, while SessionManager replays only from that retained boundary, which source-reproduces the missing-assistant behavior described by the linked issue. (src/agents/embedded-agent-runner/compaction-successor-transcript.ts:145, 0fc5a57a3440)
Real proof gap: The PR body reports 12 focused Vitest tests and git diff --check, and explicitly states that no live model session performing end-to-end compaction was tested. (f29b52331539)
Feature provenance: Git history attributes the current compaction successor implementation and adjacent regression work primarily to Peter Steinberger; the current replay consumer was recently refactored by 兰之. (src/agents/embedded-agent-runner/compaction-successor-transcript.ts:111, 22e4289d3f05)

Likely related people:

Peter Steinberger: Current history and prior issue triage connect this contributor to the compaction successor implementation and its focused tests. (role: recent area contributor; confidence: high; commits: 22e4289d3f05, e93216080aa1, 00d8d7ead059; files: src/agents/embedded-agent-runner/compaction-successor-transcript.ts, src/agents/embedded-agent-runner/compaction-successor-transcript.test.ts)
兰之: Authored the current-main SessionManager replay implementation that defines which entry types become model context around a compaction boundary. (role: recent session replay contributor; confidence: high; commits: 9a6c71a47d; files: packages/agent-core/src/harness/session/session.ts)
njuboy11: Reported the disappearing-reply behavior with concrete transcript evidence and previously proposed an assistant-preservation fix for the same path. (role: reporter and prior proposed-fix contributor; confidence: medium; commits: 02e903152e0f; files: src/agents/pi-embedded-runner/compaction-successor-transcript.ts, src/auto-reply/reply/reply-run-registry.compaction-regression.test.ts)

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works

ClawSweeper keeps one durable marker-backed review comment per issue or PR.
Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
Maintainers can also comment @clawsweeper review to request a fresh review only.
Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ba959f1933

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-05T10:58:01Z

+      if (candidate?.type !== "message") {
+        continue;


Stop re-exposing summarized custom messages

When a surviving user message is preceded by a summarized assistant but has a custom_message or branch_summary between them, this loop skips the intervening non-message entry and still marks the assistant as the replay boundary. Since resolveSuccessorCompactionFirstKeptEntryId then moves firstKeptEntryId back to that assistant, buildSessionContext replays every appendable entry from there to the compaction (packages/agent-core/src/harness/session/session.ts:81-95), so summarized custom/branch content that previously stayed hidden behind the old first-kept user is sent to the model again after transcript rotation.

Useful? React with 👍 / 👎.

fix(compaction): preserve assistant boundary replies

ba959f1

openclaw-barnacle Bot added agents Agent runtime and tooling size: S triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels Jun 5, 2026

chatgpt-codex-connector Bot reviewed Jun 5, 2026

View reviewed changes

clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels Jun 5, 2026

openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels Jun 5, 2026

clawsweeper Bot added P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. labels Jun 5, 2026

leno23 added 3 commits June 8, 2026 11:19

fix(compaction): respect custom boundaries before kept users

9169c88

fix(compaction): respect custom boundaries before kept users

07e23e0

test(compaction): assert assistant boundary without summary text match

f29b523

github-actions Bot mentioned this pull request Jun 8, 2026

🦞 OpenClaw 生态日报 2026-06-08 litang9/big_model_radar#32

Open

leno23 marked this pull request as draft June 12, 2026 13:18

clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. labels Jun 12, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(compaction): preserve assistant boundary replies#90641

fix(compaction): preserve assistant boundary replies#90641
leno23 wants to merge 4 commits into
openclaw:mainfrom
leno23:fix/compaction-preserve-assistant-boundary-76729

leno23 commented Jun 5, 2026 •

edited

Loading

Uh oh!

clawsweeper Bot commented Jun 5, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

leno23 commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Tests

Real behavior proof

Uh oh!

clawsweeper Bot commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 5, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

leno23 commented Jun 5, 2026 •

edited

Loading

clawsweeper Bot commented Jun 5, 2026 •

edited

Loading