Skip to content

fix(compaction): preserve assistant boundary replies#90641

Draft
leno23 wants to merge 4 commits into
openclaw:mainfrom
leno23:fix/compaction-preserve-assistant-boundary-76729
Draft

fix(compaction): preserve assistant boundary replies#90641
leno23 wants to merge 4 commits into
openclaw:mainfrom
leno23:fix/compaction-preserve-assistant-boundary-76729

Conversation

@leno23

@leno23 leno23 commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Summary

  • preserve the summarized assistant reply immediately before a surviving pre-compaction user message when writing successor transcripts
  • move the successor compaction replay boundary to that preserved assistant so agent context stays summary -> assistant -> user instead of hiding the reply
  • strip stale thinking signatures from preserved boundary assistants along with other pre-compaction kept messages

Fixes #76729

Tests

  • node scripts/run-vitest.mjs src/agents/embedded-agent-runner/compaction-successor-transcript.test.ts
  • git diff --check

Real behavior proof

  • Behavior addressed: successor transcripts now keep the assistant reply immediately before the first surviving pre-compaction user message so replay preserves the visible summary -> assistant -> user boundary.
  • Real environment tested: local OpenClaw checkout at PR head ba959f19332c40305f241e15ef483c4a8923cc7 on macOS 15.5 with Node v22.22.1 and pnpm 11.2.2.
  • Exact steps or command run after this patch: node scripts/run-vitest.mjs src/agents/embedded-agent-runner/compaction-successor-transcript.test.ts; git diff --check upstream/main...HEAD.
  • Evidence after fix: terminal output from the PR-head checkout:
$ node scripts/run-vitest.mjs src/agents/embedded-agent-runner/compaction-successor-transcript.test.ts
✓  agents  src/agents/embedded-agent-runner/compaction-successor-transcript.test.ts (12 tests) 132ms
Test Files  1 passed (1)
Tests  12 passed (12)
[test] passed 1 Vitest shard in 6.82s

$ git diff --check upstream/main...HEAD
# no output; exit code 0
  • Observed result after fix: the compaction successor transcript shard passed at PR head, including the preserved assistant-boundary replay coverage; the PR diff has no whitespace errors.
  • What was not tested: a live model session performing an end-to-end compaction run; this proof uses the local embedded-agent successor-transcript runtime harness.

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: S triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels Jun 5, 2026
@clawsweeper

clawsweeper Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs real behavior proof before merge. Reviewed June 12, 2026, 9:23 AM ET / 13:23 UTC.

Summary
The PR preserves an immediately preceding summarized assistant message in successor transcripts, reanchors compaction replay to it, strips stale thinking signatures, and adds focused compaction-boundary tests.

PR surface: Source +69, Tests +84. Total +153 across 2 files.

Reproducibility: yes. at source level. Put a model_change, thinking_level_change, or session_info entry between a summarized assistant and the retained user; the PR scan stops at that marker although SessionManager does not replay it as a conversational message, so the assistant is removed.

Review metrics: none identified.

Merge readiness
Overall: 🦪 silver shellfish
Proof: 🦪 silver shellfish
Patch quality: 🦪 silver shellfish
Result: blocked until real behavior proof from a real setup is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Skip non-context state entries while preserving hard boundaries for appendable custom content, with focused regression coverage.
  • [P1] Add redacted live compaction proof showing the reply remains visible and the following turn receives correct context.

Proof guidance:

  • [P1] Needs real behavior proof before merge: The PR supplies focused Vitest harness output only and explicitly omits a live end-to-end compaction run; add redacted terminal/log, recording, or linked-artifact proof from a real session, then update the PR body to trigger review or ask a maintainer to comment @clawsweeper re-review.

Mantis proof suggestion
A real Control UI or webchat compaction run would directly show whether the disappearing reply and adjacent-turn batching behavior is fixed. A maintainer can ask Mantis to capture proof by posting a new PR comment that starts with the OpenClaw Mantis account mention, followed by:

visual task: verify a Control UI or webchat session with transcript compaction keeps the assistant reply visible after rotation and does not batch the next user turn with the prior one.

Risk before merge

  • [P1] Merging this head can still drop the assistant boundary whenever a non-context state entry sits between the assistant and retained user, preserving the user-visible reply-loss and combined-turn failure mode.
  • [P1] The patch changes persisted transcript retention and replay anchors, so a subtle boundary mismatch can alter future model context even when focused serialization tests pass.
  • [P1] The external PR still lacks after-fix proof from a real model session performing transcript compaction.

Maintainer options:

  1. Fix state-marker boundary handling (recommended)
    Update boundary discovery to skip non-context state entries, preserve appendable custom-content boundaries, and add a regression test before merge.
  2. Pause for contributor proof
    Keep the draft open until a redacted real session demonstrates the assistant reply survives compaction and the next user turn receives the correct context.

Next step before merge

  • [P1] The author should repair the concrete boundary defect and supply real after-fix session proof; automation cannot satisfy the external contributor proof gate.

Security
Cleared: The source/test-only diff does not change dependencies, workflows, permissions, secrets, downloads, publishing, or other security-sensitive execution surfaces.

Review findings

  • [P1] Skip non-context state entries when finding the assistant boundary — src/agents/embedded-agent-runner/compaction-successor-transcript.ts:261-263
Review details

Best possible solution:

Classify intervening entries by canonical replay effect: skip non-context state and metadata entries while treating appendable custom_message and branch_summary content as hard boundaries, then add the focused state-marker regression and prove a real rotated session plus its next model turn.

Do we have a high-confidence way to reproduce the issue?

Yes at source level. Put a model_change, thinking_level_change, or session_info entry between a summarized assistant and the retained user; the PR scan stops at that marker although SessionManager does not replay it as a conversational message, so the assistant is removed.

Is this the best way to solve the issue?

No, not yet. Preserving the assistant boundary is the right fix direction, but the implementation must distinguish non-context state markers from appendable custom context rather than treating every non-message entry identically.

Full review comments:

  • [P1] Skip non-context state entries when finding the assistant boundary — src/agents/embedded-agent-runner/compaction-successor-transcript.ts:261-263
    This loop breaks on every non-message entry, but model_change, thinking_level_change, and session_info do not append a conversational message in buildSessionContext. If one sits between the summarized assistant and the retained user, the assistant is still removed and the linked disappearing-reply bug remains. Skip non-context state/metadata entries while keeping appendable custom_message and branch_summary entries as hard boundaries, and add that regression case.
    Confidence: 0.98

Overall correctness: patch is incorrect
Overall confidence: 0.98

AGENTS.md: found and applied where relevant.

Codex review notes: model internal, reasoning high; reviewed against 0fc5a57a3440.

Label changes

Label changes:

  • add rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🦪 silver shellfish.
  • remove rating: 🧂 unranked krab: Current PR rating is rating: 🦪 silver shellfish, so this older rating label is no longer current.

Label justifications:

  • P1: The remaining defect can still remove visible assistant replies and cause adjacent user turns to be interpreted together after compaction.
  • merge-risk: 🚨 session-state: The PR changes persisted transcript membership and replay anchors, and the remaining boundary mismatch can produce incorrect agent context.
  • rating: 🦪 silver shellfish: Overall readiness is 🦪 silver shellfish; proof is 🦪 silver shellfish and patch quality is 🦪 silver shellfish.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR supplies focused Vitest harness output only and explicitly omits a live end-to-end compaction run; add redacted terminal/log, recording, or linked-artifact proof from a real session, then update the PR body to trigger review or ask a maintainer to comment @clawsweeper re-review.
Evidence reviewed

PR surface:

Source +69, Tests +84. Total +153 across 2 files.

View PR surface stats
Area Files Added Removed Net
Source 1 73 4 +69
Tests 1 86 2 +84
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 2 159 6 +153

Acceptance criteria:

  • [P1] node scripts/run-vitest.mjs src/agents/embedded-agent-runner/compaction-successor-transcript.test.ts.
  • [P1] git diff --check.
  • [P1] Redacted real model-session compaction run showing successor transcript contents and the next turn's context.

What I checked:

Likely related people:

  • Peter Steinberger: Current history and prior issue triage connect this contributor to the compaction successor implementation and its focused tests. (role: recent area contributor; confidence: high; commits: 22e4289d3f05, e93216080aa1, 00d8d7ead059; files: src/agents/embedded-agent-runner/compaction-successor-transcript.ts, src/agents/embedded-agent-runner/compaction-successor-transcript.test.ts)
  • 兰之: Authored the current-main SessionManager replay implementation that defines which entry types become model context around a compaction boundary. (role: recent session replay contributor; confidence: high; commits: 9a6c71a47d; files: packages/agent-core/src/harness/session/session.ts)
  • njuboy11: Reported the disappearing-reply behavior with concrete transcript evidence and previously proposed an assistant-preservation fix for the same path. (role: reporter and prior proposed-fix contributor; confidence: medium; commits: 02e903152e0f; files: src/agents/pi-embedded-runner/compaction-successor-transcript.ts, src/auto-reply/reply/reply-run-registry.compaction-regression.test.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ba959f1933

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +261 to +262
if (candidate?.type !== "message") {
continue;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Stop re-exposing summarized custom messages

When a surviving user message is preceded by a summarized assistant but has a custom_message or branch_summary between them, this loop skips the intervening non-message entry and still marks the assistant as the replay boundary. Since resolveSuccessorCompactionFirstKeptEntryId then moves firstKeptEntryId back to that assistant, buildSessionContext replays every appendable entry from there to the compaction (packages/agent-core/src/harness/session/session.ts:81-95), so summarized custom/branch content that previously stayed hidden behind the old first-kept user is sent to the model again after transcript rotation.

Useful? React with 👍 / 👎.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels Jun 5, 2026
@openclaw-barnacle openclaw-barnacle Bot added proof: supplied External PR includes structured after-fix real behavior proof. and removed triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. labels Jun 5, 2026
@clawsweeper clawsweeper Bot added P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. labels Jun 5, 2026
@leno23 leno23 marked this pull request as draft June 12, 2026 13:18
@clawsweeper clawsweeper Bot added rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. labels Jun 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. P1 High-priority user-facing bug, regression, or broken workflow. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. size: S status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feishu replies disappear from webchat after compaction rotation (buildSuccessorEntries drops assistant messages)

1 participant