feat: register compaction retry hook to prevent cascade overflow #10220

Open
1kuna wants to merge 9 commits into openclaw:main from 1kuna:feat/compaction-retry-safeguard

@1kuna commented Feb 6, 2026

Summary

Fixes #10613. Implements a compaction retry hook that prevents cascade overflow loops.

When auto-compaction fires during context overflow, the retry can immediately overflow again. This hook:

  • Intercepts the retry before it fires
  • Calculates a safe token budget
  • Downgrades the prompt to a slim one-shot version if needed
  • Cancels the retry if even the slim prompt won't fit
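
The four steps above could be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `planCompactionRetry`, `RetryInputs`, `RetryPlan`, and every token figure are hypothetical names and values chosen for the example, and the real hook signature in pi-coding-agent may differ.

```typescript
// Hypothetical sketch of the retry decision; none of these names come from the PR or from Pi.
type RetryPlan =
  | { action: "retry"; prompt: "full" | "slim" }
  | { action: "cancel"; message: string };

interface RetryInputs {
  contextWindow: number; // model context size, in tokens
  compactedHistory: number; // tokens remaining after compaction
  fullPromptTokens: number; // estimated size of the normal system prompt
  slimPromptTokens: number; // estimated size of the slim one-shot prompt
  reserveTokens: number; // headroom kept for the model's reply
}

function planCompactionRetry(i: RetryInputs): RetryPlan {
  // Safe budget: what is left for the system prompt after history and reply headroom.
  const budget = i.contextWindow - i.compactedHistory - i.reserveTokens;

  if (i.fullPromptTokens <= budget) {
    return { action: "retry", prompt: "full" };
  }
  if (i.slimPromptTokens <= budget) {
    // Downgrade: the full prompt would overflow again, the slim one fits.
    return { action: "retry", prompt: "slim" };
  }
  // Even the slim prompt does not fit, so retrying would only overflow again.
  return {
    action: "cancel",
    message: "Retry canceled: the prompt does not fit in the remaining context.",
  };
}
```

With a 1000-token window, 600 tokens of compacted history, and 100 tokens of reserve, the budget is 300 tokens: a 250-token full prompt retries unchanged, a 400-token full prompt falls back to a 200-token slim prompt, and a 350-token slim prompt cancels the retry.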

Dependencies

Requires setAutoCompactionRetryHook from pi-coding-agent (PR badlogic/pi-mono#1318).

Verification

  • pnpm tsgo passes clean with local pi-mono build
  • Tests included

@openclaw-barnacle openclaw-barnacle bot added the agents Agent runtime and tooling label Feb 6, 2026

@greptile-apps greptile-apps bot left a comment

3 files reviewed, 3 comments

package.json Outdated
Comment on lines +111 to +114
"@mariozechner/pi-agent-core": "^0.52.6",
"@mariozechner/pi-ai": "^0.52.6",
"@mariozechner/pi-coding-agent": "^0.52.6",
"@mariozechner/pi-tui": "^0.52.6",

Dependency range breaks hook gating

This PR relies on a new Pi SDK API (setAutoCompactionRetryHook), but package.json loosens all Pi deps to ^0.52.6. If the hook lands in 0.52.7+ (as described), users can end up on a Pi version that still doesn’t have the hook (or has a different shape) while the code assumes “maybe supported”. This makes the safeguard unreliable in the exact scenario it’s meant to fix.

Recommend pinning the minimum Pi versions that actually include the hook (or bump the range to ^<first-version-with-hook>).
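
To make the range concern concrete: for a `0.y.z` version with `y > 0`, a caret range `^0.y.z` accepts any `0.y.*` patch at or above `z`. The sketch below is a deliberately tiny check for that one case (not a replacement for the `semver` package); `satisfiesCaretZeroMajor` and `parseVersion` are hypothetical helpers written for this illustration.

```typescript
// Minimal caret-range check for 0.y.z versions with y > 0 (illustrative only).
function parseVersion(v: string): [number, number, number] {
  const parts = v.split(".").map(Number);
  const [major = NaN, minor = NaN, patch = NaN] = parts;
  const ok =
    parts.length === 3 &&
    [major, minor, patch].every((n) => Number.isInteger(n) && n >= 0);
  if (!ok) throw new Error(`bad version: ${v}`);
  return [major, minor, patch];
}

// ^0.y.z means >=0.y.z and <0.(y+1).0.
function satisfiesCaretZeroMajor(version: string, base: string): boolean {
  const [vMajor, vMinor, vPatch] = parseVersion(version);
  const [bMajor, bMinor, bPatch] = parseVersion(base);
  if (bMajor !== 0 || bMinor === 0) {
    throw new Error("this sketch only handles 0.y.z bases with y > 0");
  }
  return vMajor === 0 && vMinor === bMinor && vPatch >= bPatch;
}

// ^0.52.6 still admits 0.52.6 itself, so a release that lacks the hook
// remains installable if the hook first ships in a later patch.
const admitsOldPatch = satisfiesCaretZeroMajor("0.52.6", "0.52.6");
const admitsNewPatch = satisfiesCaretZeroMajor("0.52.7", "0.52.6");
const admitsNextMinor = satisfiesCaretZeroMajor("0.53.0", "0.52.6");
```

Here `admitsOldPatch` and `admitsNewPatch` are both `true` and `admitsNextMinor` is `false`, which is why raising the lower bound to the first version that ships the hook closes the gap.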

Comment on lines +521 to +533
  const mutableSession = activeSession as unknown as {
    _baseSystemPrompt?: string;
    _rebuildSystemPrompt?: (toolNames: string[]) => string;
  };
  const previousBasePrompt = mutableSession._baseSystemPrompt;
  const previousRebuild = mutableSession._rebuildSystemPrompt;
  applySystemPromptOverrideToSession(activeSession, getRetrySystemPromptText());
  restoreOneShotRetryPromptOverride = () => {
    mutableSession._baseSystemPrompt = previousBasePrompt;
    mutableSession._rebuildSystemPrompt = previousRebuild;
    activeSession.agent.setSystemPrompt(previousBasePrompt ?? systemPromptText);
  };
};

Prompt restore uses undefined

restoreOneShotRetryPromptOverride calls activeSession.agent.setSystemPrompt(previousBasePrompt ?? systemPromptText), but previousBasePrompt can be undefined if Pi’s internals don’t populate _baseSystemPrompt (or rename it). In that case, it resets the agent prompt to systemPromptText, but it also writes back mutableSession._baseSystemPrompt = previousBasePrompt (i.e. undefined), potentially leaving the session’s internal “base prompt” unset for subsequent _rebuildSystemPrompt calls.

This can break later prompt rebuilds (e.g. tool list changes) after a compaction event. Consider restoring to the current session base prompt when available (or only writing _baseSystemPrompt back if it was originally defined).
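
One way to read that suggestion is a restore function that never writes `undefined` back into the base prompt. The sketch below assumes a stand-in `MutableSession` type and a hypothetical `overridePrompt` helper; it is not Pi's session class, and the fallback behavior is an assumption rather than something the PR specifies.

```typescript
// Hypothetical stand-in for the private session fields touched by the reviewed snippet.
interface MutableSession {
  _baseSystemPrompt?: string;
  _rebuildSystemPrompt?: (toolNames: string[]) => string;
}

// Applies a temporary prompt override and returns a restore function.
// On restore, the base prompt falls back to `fallbackPrompt` when it was
// originally unset, so later rebuilds never see an undefined base prompt.
function overridePrompt(
  session: MutableSession,
  retryPrompt: string,
  fallbackPrompt: string,
): () => void {
  const previousBase = session._baseSystemPrompt;
  const previousRebuild = session._rebuildSystemPrompt;

  session._baseSystemPrompt = retryPrompt;
  session._rebuildSystemPrompt = () => retryPrompt;

  return () => {
    session._baseSystemPrompt = previousBase ?? fallbackPrompt;
    if (previousRebuild) {
      session._rebuildSystemPrompt = previousRebuild;
    } else {
      // The field did not exist before the override, so remove it again.
      delete session._rebuildSystemPrompt;
    }
  };
}
```

The design choice here is to keep the agent prompt and the session's base prompt in agreement after restore, rather than resetting one to the fallback while leaving the other unset.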

Path: src/agents/pi-embedded-runner/run/attempt.ts

Comment on lines 54 to +67
  emitAgentEvent({
    runId: ctx.params.runId,
    stream: "compaction",
-   data: { phase: "end", willRetry },
+   data: { phase: "end", willRetry, retryCanceledMessage },
  });
  void ctx.params.onAgentEvent?.({
    stream: "compaction",
-   data: { phase: "end", willRetry },
+   data: { phase: "end", willRetry, retryCanceledMessage },
  });
+
+ if (!willRetry && retryCanceledMessage) {
+   // User-facing propagation: Pi succeeded compacting but refused to retry due to prompt sizing.
+   void ctx.params.onBlockReply?.({ text: retryCanceledMessage });
+ }

User message emitted mid-stream

handleAutoCompactionEnd calls onBlockReply immediately when retryCanceledMessage is present. This happens on the compaction event stream, not the normal assistant response lifecycle, so it can interleave with other block buffering/chunking state and produce out-of-order user-visible output.

If onBlockReply is used by external messaging channels (which must only receive final replies), this risks sending a standalone message during an in-flight run. Consider routing this through the same “final reply” path you use for other user-facing errors (or gate it to internal UIs only).
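
A rough sketch of the "route through the final reply" idea: collect the cancellation notice during the run and attach it once, when the final reply is assembled. `DeferredNotices` and its methods are hypothetical names for this illustration and do not correspond to any OpenClaw API.

```typescript
// Hypothetical collector for user-facing notices raised outside the normal
// assistant response lifecycle (for example, from a compaction event).
class DeferredNotices {
  private notices: string[] = [];

  // Called from the event handler instead of sending a message immediately.
  add(text: string): void {
    this.notices.push(text);
  }

  // Called once when the run finishes: appends pending notices after the
  // assistant's final text and clears the queue.
  flushInto(finalText: string): string {
    const combined = [finalText, ...this.notices]
      .filter((s) => s.length > 0)
      .join("\n\n");
    this.notices = [];
    return combined;
  }
}
```

Under this approach, channels that only accept final replies receive a single message, and the notice cannot interleave with block buffering in progress.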

Path: src/agents/pi-embedded-subscribe.handlers.lifecycle.ts

@Takhoffman

Fixed in #12988.

This will go out in the next OpenClaw release.

If you still see this after updating to the first release that includes #12988, please open a new issue with:

  • your OpenClaw version
  • channel (Telegram/Slack/etc)
  • the exact prompt/response that got rewritten
  • whether Web UI showed the full text vs the channel being rewritten
  • relevant logs around send/normalize (if available)

Link back here for context.

1kuna and others added 9 commits February 11, 2026 17:54
Co-authored-by: Alyx <kunaclawd@gmail.com>
Co-authored-by: Alyx <kunaclawd@gmail.com>
Co-authored-by: Alyx <kunaclawd@gmail.com>
tsgo cannot track mutations through closures called asynchronously,
so restoreOneShotRetryPromptOverride narrows to never at the finally
block. Snapshot to a local const before calling to satisfy strict
control-flow analysis.
tsgo narrows closure-mutated let variables to their init type (null),
making them uncallable. Wrap in a { current } container object which
tsgo does not narrow through property access, matching the React
useRef pattern. No runtime behavior change.
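
The two commit messages above describe a `{ current }` container combined with a local snapshot in the `finally` block. A minimal sketch of that shape, with hypothetical names (`runWithOverride`, `restoreRef`) and a log array standing in for the real prompt override:

```typescript
type Restore = () => void;

// Illustrative only: the closure assigns through a property, so the callable is
// read back via property access instead of via a closure-mutated `let` variable.
function runWithOverride(log: string[]): void {
  const restoreRef: { current: Restore | null } = { current: null };

  const install = (): void => {
    log.push("override installed");
    restoreRef.current = () => {
      log.push("override restored");
    };
  };

  try {
    install();
  } finally {
    // Snapshot to a local const before calling, then clear the container
    // so the restore cannot run twice.
    const restore = restoreRef.current;
    restoreRef.current = null;
    restore?.();
  }
}

const trace: string[] = [];
runWithOverride(trace);
```

After the call, `trace` holds `"override installed"` followed by `"override restored"`; as the commit message notes, the container changes only how the type checker sees the variable, not the runtime behavior.
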
@openclaw-barnacle

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

@openclaw-barnacle openclaw-barnacle bot added the stale Marked as stale due to inactivity label Mar 8, 2026

Development

Successfully merging this pull request may close these issues.

Compaction retry cascade causes context overflow loop
