fix(compaction): make compaction guard content-aware to prevent false cancellations in heartbeat sessions#42119
Conversation
Greptile SummaryThis PR fixes a false-negative in the compaction guard and embedded-runner compaction path by replacing a naive role-only predicate ( Key observations:
Confidence Score: 4/5
Last reviewed commit: a613363 |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a613363973
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
Addressed the review suggestions in 8a8c943:
Re-ran the targeted tests:
Result: 76 tests passed. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8a8c943163
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
8a8c943 to
62c013c
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 62c013c5bf
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| !preparation.messagesToSummarize.some((message, index, messages) => | ||
| isRealConversationMessage(message, messages, index), | ||
| ) |
There was a problem hiding this comment.
Include split-turn prefix in real-conversation gate
The early cancel check only scans preparation.messagesToSummarize, so split-turn runs can be cancelled even when the actual user request is present in preparation.turnPrefixMessages and would be summarized later in this same handler. Fresh evidence in this commit is that the new content-aware gate at messagesToSummarize.some((message, index, messages) => isRealConversationMessage(...)) was added here, while split-turn prefix handling still occurs later (preparation.isSplitTurn path), which can now incorrectly skip compaction and let history keep growing in heartbeat-heavy sessions.
Useful? React with 👍 / 👎.
245581a to
73877ef
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 73877ef9d6
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| !preparation.messagesToSummarize.some((message, index, messages) => | ||
| isRealConversationMessage(message, messages, index), | ||
| ) |
There was a problem hiding this comment.
Include split-turn prefix in the real-conversation gate
This cancellation check only scans preparation.messagesToSummarize, but split-turn compaction later summarizes preparation.turnPrefixMessages as well, so a real user request living in the prefix can be ignored here. In heartbeat-heavy windows where messagesToSummarize is boilerplate, this now returns cancel: true and skips compaction even though meaningful content exists in the same compaction operation, which lets session history keep growing instead of being summarized.
Useful? React with 👍 / 👎.
| if (type !== "text") { | ||
| if (typeof type === "string" && TOOL_ONLY_BLOCK_TYPES.has(type)) { | ||
| continue; | ||
| } | ||
| sawMeaningfulNonTextBlock = true; |
There was a problem hiding this comment.
Ignore reasoning-only blocks when classifying meaningful turns
This branch treats every non-text block except tool-call variants as meaningful, so assistant turns that only contain provider reasoning/thinking metadata are counted as “real conversation.” Because transcript sanitization preserves thinking blocks for most providers by default (dropThinkingBlocks is usually false), heartbeat/NO_REPLY sessions on reasoning-capable models can still bypass this safeguard and continue compacting on non-conversational traffic.
Useful? React with 👍 / 👎.
73877ef to
dd5de49
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dd5de4913f
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| !preparation.messagesToSummarize.some((message, index, messages) => | ||
| isRealConversationMessage(message, messages, index), | ||
| ) |
There was a problem hiding this comment.
Include turnPrefixMessages in real-conversation gate
The new cancellation guard only checks preparation.messagesToSummarize, so split-turn compactions can now be cancelled even when the actual user request lives in preparation.turnPrefixMessages. This is a regression because the same handler later summarizes prefix messages when preparation.isSplitTurn is true (see the turnPrefixMessages summarization path), so heartbeat-heavy windows in messagesToSummarize will incorrectly return { cancel: true } and skip compaction despite meaningful content being part of the current compaction run.
Useful? React with 👍 / 👎.
eafa81c to
fa82b87
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fa82b870c8
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (isSilentReplyText(trimmed)) { | ||
| return false; | ||
| } | ||
| const heartbeat = stripHeartbeatToken(trimmed, { mode: "message" }); |
There was a problem hiding this comment.
Strip markup before checking NO_REPLY sentinel
hasMeaningfulText rejects silent turns only through isSilentReplyText(trimmed), but that matcher is exact-token only and does not handle wrapped forms like **NO_REPLY** or <b>NO_REPLY</b>. In heartbeat sessions where models format the sentinel, these assistant turns are misclassified as meaningful conversation, so the new guard still allows compaction on boilerplate traffic instead of canceling. This path already normalizes markup for HEARTBEAT_OK, so silent-token detection needs the same normalization to keep behavior consistent.
Useful? React with 👍 / 👎.
fa82b87 to
99d33bc
Compare
|
@jalehman Are you fixing this PR? |
99d33bc to
4a2a27f
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4a2a27fcdb
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| !preparation.messagesToSummarize.some((message, index, messages) => | ||
| isRealConversationMessage(message, messages, index), | ||
| ) |
There was a problem hiding this comment.
Include split-turn prefix messages in real-conversation gate
This early-cancel check only scans preparation.messagesToSummarize, but the same handler later summarizes preparation.turnPrefixMessages when preparation.isSplitTurn is true. In split-turn sessions where the meaningful user request is in the prefix and the summarize window contains heartbeat boilerplate (HEARTBEAT_OK/NO_REPLY), this condition now returns { cancel: true } and skips compaction even though real conversation is present in the current compaction operation, allowing history growth instead of summarization.
Useful? React with 👍 / 👎.
jalehman
left a comment
There was a problem hiding this comment.
Changes Requested
Thank you for this contribution! We've reviewed this PR and have some feedback before it can move forward.
Markdown-wrapped heartbeat boilerplate still counts as real conversation
The new helper still treats markdown-wrapped sentinel content like **HEARTBEAT_OK** and **NO_REPLY** as meaningful history. That means heartbeat-only sessions can still reach the false-cancellation path this PR is trying to prevent.
Please update the real-conversation check so it normalizes message text before deciding whether a turn is meaningful, using the same heartbeat/silent-reply handling that the rest of the compaction path relies on. The important part is that lightweight formatting around boilerplate tokens should not make them count as real conversation.
Once this is addressed, we'll re-review. Feel free to ask questions if anything is unclear.
4a2a27f to
ee04354
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ee04354692
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if ( | ||
| !session.messages.some((message, index, messages) => | ||
| hasRealConversationContent(message, messages, index), | ||
| ) | ||
| ) { |
There was a problem hiding this comment.
Let safeguard mode write its fallback compaction boundary
When agents.defaults.compaction.mode is safeguard, buildEmbeddedExtensionFactories() always registers compactionSafeguardExtension, and that handler deliberately returns a minimal compaction result instead of cancelling when there is no real conversation because otherwise _checkCompaction re-triggers in a loop (src/agents/pi-extensions/compaction-safeguard.ts:711-723). This new early return skips session.compact() entirely for heartbeat-only histories, so the safeguard never gets a chance to write that boundary, and overflow recovery later treats the result as a failed compaction. In safeguard-mode heartbeat sessions, this reintroduces the stuck/retrigger behavior the extension was added to prevent.
Useful? React with 👍 / 👎.
| if (type !== "text") { | ||
| if (typeof type === "string" && TOOL_ONLY_BLOCK_TYPES.has(type)) { | ||
| continue; | ||
| } | ||
| sawMeaningfulNonTextBlock = true; |
There was a problem hiding this comment.
Ignore Anthropic toolResult blocks in meaningful-content checks
Anthropic transcript repair keeps tool results inside role: "user" messages as content[].type === "toolResult" (src/agents/pi-embedded-helpers/turns.ts:49-58), but this branch treats every non-text block other than tool-call variants as meaningful. That means Claude sessions whose "user" turns are only tool-result payloads still count as real conversation, so the new guard will keep compacting heartbeat/tool-only traffic instead of skipping it. toolResult blocks need to be excluded here the same way tool-call blocks are.
Useful? React with 👍 / 👎.
…conversation Fixes openclaw#40727 The compaction safeguard and the embedded-runner compaction path both used a naive role check (user | assistant | toolResult) to decide whether a session contains real conversation worth summarising. In long-running heartbeat sessions nearly every message satisfies that check, yet the *content* is boilerplate: heartbeat polls produce empty or HEARTBEAT_OK user turns and NO_REPLY assistant turns. The guard therefore never fires, the session grows past the 200K ceiling, and the agent goes silent. Fix: replace the role-only predicate with a content-aware one: isRealConversationMessage / hasRealConversationContent (two separate copies – safeguard.ts and compact.ts – each get the same treatment) Rules: • user / assistant messages are real only when their text content is non-empty AND not a pure boilerplate sentinel (HEARTBEAT_OK or NO_REPLY). Non-text blocks (images, attachments) are always real. • toolResult messages are real only when a meaningful user message appears in the previous 20 messages of the window, so tool output that was triggered by a real human ask still counts. Both predicates are exported via __testing / direct export so the existing and new unit tests can cover them without mocks. Tests added: compaction-safeguard.test.ts – two new cases: • tool result linked to HEARTBEAT_OK user turn → not real • tool result linked to meaningful user ask → real compact.hooks.test.ts – two new integration cases: • boilerplate-only session → skips compaction (reason: no real conversation messages) • meaningful user ask + tool result → compaction runs
…ll-only assistant blocks
ee04354 to
3429643
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3429643315
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const start = Math.max(0, index - TOOL_RESULT_REAL_CONVERSATION_LOOKBACK); | ||
| for (let i = index - 1; i >= start; i -= 1) { | ||
| const candidate = messages[i]; | ||
| if (!candidate || candidate.role !== "user") { | ||
| continue; |
There was a problem hiding this comment.
Link tool results to summarized asks, not only in-window users
After one compaction cycle, the original user ask is no longer present as a message entry (session-truncation.ts removes summarized messages before firstKeptEntryId). A later tool-only span from that same task therefore has no role === "user" inside this lookback, so every toolResult here is misclassified as non-conversational. In compact.ts that turns overflow recovery into compacted: false, and in safeguard mode it falls back to previousSummary, dropping the new tool output from future context. Any long-running task that compacts once before the assistant replies can now lose state or fail overflow recovery.
Useful? React with 👍 / 👎.
| const type = (block as { type?: unknown }).type; | ||
| if (type !== "text") { | ||
| // Tool-call metadata and internal reasoning blocks do not make a | ||
| // heartbeat-only transcript count as real conversation. | ||
| if (typeof type === "string" && NON_CONVERSATION_BLOCK_TYPES.has(type)) { |
There was a problem hiding this comment.
Treat input/output_text blocks as textual content
OpenAI session history already accepts input_text and output_text blocks as normal message text (openai-ws-stream.ts handles both in contentToText()/contentToOpenAIParts()), but this classifier only recognizes literal type === "text". A boilerplate message like [{type:"output_text", text:"NO_REPLY"}] or [{type:"input_text", text:"HEARTBEAT_OK"}] therefore falls into the non-text branch and counts as real conversation, so the new safeguard still compacts boilerplate-only OpenAI/Responses transcripts instead of skipping them.
Useful? React with 👍 / 👎.
… cancellations in heartbeat sessions (openclaw#42119) Merged via squash. Prepared head SHA: 3429643 Co-authored-by: samzong <13782141+samzong@users.noreply.github.com> Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com> Reviewed-by: @jalehman
… cancellations in heartbeat sessions (openclaw#42119) Merged via squash. Prepared head SHA: 3429643 Co-authored-by: samzong <13782141+samzong@users.noreply.github.com> Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com> Reviewed-by: @jalehman
… cancellations in heartbeat sessions (openclaw#42119) Merged via squash. Prepared head SHA: 3429643 Co-authored-by: samzong <13782141+samzong@users.noreply.github.com> Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com> Reviewed-by: @jalehman
… cancellations in heartbeat sessions (openclaw#42119) Merged via squash. Prepared head SHA: 3429643 Co-authored-by: samzong <13782141+samzong@users.noreply.github.com> Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com> Reviewed-by: @jalehman
… cancellations in heartbeat sessions (openclaw#42119) Merged via squash. Prepared head SHA: 3429643 Co-authored-by: samzong <13782141+samzong@users.noreply.github.com> Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com> Reviewed-by: @jalehman
… cancellations in heartbeat sessions (openclaw#42119) Merged via squash. Prepared head SHA: 3429643 Co-authored-by: samzong <13782141+samzong@users.noreply.github.com> Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com> Reviewed-by: @jalehman
… cancellations in heartbeat sessions (openclaw#42119) Merged via squash. Prepared head SHA: 3429643 Co-authored-by: samzong <13782141+samzong@users.noreply.github.com> Co-authored-by: jalehman <550978+jalehman@users.noreply.github.com> Reviewed-by: @jalehman
Summary
Fixes #40727
The compaction safeguard (
compaction-safeguard.ts) and the embedded-runner compaction path (compact.ts) both used a naive role check to decide whether a session contains real conversation worth summarising:In long-running heartbeat sessions, nearly every message satisfies this check by role alone, yet the content is boilerplate: heartbeat polls produce empty or
HEARTBEAT_OKuser turns andNO_REPLYassistant turns. The guard therefore never fires, the session grows past the 200 K ceiling, and the agent goes silent.Root cause
isRealConversationMessage/hasRealConversationContentchecked role only; it never inspected the text content of the message.Fix
Replace the role-only predicate with a content-aware one in both files:
HEARTBEAT_OKorNO_REPLY). Non-text blocks (images, attachments) always count as real.The change is additive: sessions with real human messages continue to compact exactly as before.
Tests
Added in
compaction-safeguard.test.ts:HEARTBEAT_OKuser turn → not real → compaction cancelledAdded in
compact.hooks.test.ts:HEARTBEAT_OK+ tool output) → skips compaction with reasonno real conversation messagesAll 74 affected tests pass.
Checklist