feat: register compaction retry hook to prevent cascade overflow#10220
feat: register compaction retry hook to prevent cascade overflow#102201kuna wants to merge 9 commits intoopenclaw:mainfrom
Conversation
package.json
Outdated
| "@mariozechner/pi-agent-core": "^0.52.6", | ||
| "@mariozechner/pi-ai": "^0.52.6", | ||
| "@mariozechner/pi-coding-agent": "^0.52.6", | ||
| "@mariozechner/pi-tui": "^0.52.6", |
There was a problem hiding this comment.
Dependency range breaks hook gating
This PR relies on a new Pi SDK API (setAutoCompactionRetryHook), but package.json loosens all Pi deps to ^0.52.6. If the hook lands in 0.52.7+ (as described), users can end up on a Pi version that still doesn’t have the hook (or has a different shape) while the code assumes “maybe supported”. This makes the safeguard unreliable in the exact scenario it’s meant to fix.
Recommend pinning the minimum Pi versions that actually include the hook (or bump the range to ^<first-version-with-hook>).
Prompt To Fix With AI
This is a comment left during a code review.
Path: package.json
Line: 111:114
Comment:
**Dependency range breaks hook gating**
This PR relies on a *new* Pi SDK API (`setAutoCompactionRetryHook`), but `package.json` loosens all Pi deps to `^0.52.6`. If the hook lands in `0.52.7+` (as described), users can end up on a Pi version that still doesn’t have the hook (or has a different shape) while the code assumes “maybe supported”. This makes the safeguard unreliable in the exact scenario it’s meant to fix.
Recommend pinning the **minimum** Pi versions that actually include the hook (or bump the range to `^<first-version-with-hook>`).
How can I resolve this? If you propose a fix, please make it concise.| const mutableSession = activeSession as unknown as { | ||
| _baseSystemPrompt?: string; | ||
| _rebuildSystemPrompt?: (toolNames: string[]) => string; | ||
| }; | ||
| const previousBasePrompt = mutableSession._baseSystemPrompt; | ||
| const previousRebuild = mutableSession._rebuildSystemPrompt; | ||
| applySystemPromptOverrideToSession(activeSession, getRetrySystemPromptText()); | ||
| restoreOneShotRetryPromptOverride = () => { | ||
| mutableSession._baseSystemPrompt = previousBasePrompt; | ||
| mutableSession._rebuildSystemPrompt = previousRebuild; | ||
| activeSession.agent.setSystemPrompt(previousBasePrompt ?? systemPromptText); | ||
| }; | ||
| }; |
There was a problem hiding this comment.
Prompt restore uses undefined
restoreOneShotRetryPromptOverride calls activeSession.agent.setSystemPrompt(previousBasePrompt ?? systemPromptText), but previousBasePrompt can be undefined if Pi’s internals don’t populate _baseSystemPrompt (or rename it). In that case, it resets the agent prompt to systemPromptText, but it also writes back mutableSession._baseSystemPrompt = previousBasePrompt (i.e. undefined), potentially leaving the session’s internal “base prompt” unset for subsequent _rebuildSystemPrompt calls.
This can break later prompt rebuilds (e.g. tool list changes) after a compaction event. Consider restoring to the current session base prompt when available (or only writing _baseSystemPrompt back if it was originally defined).
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/pi-embedded-runner/run/attempt.ts
Line: 521:533
Comment:
**Prompt restore uses undefined**
`restoreOneShotRetryPromptOverride` calls `activeSession.agent.setSystemPrompt(previousBasePrompt ?? systemPromptText)`, but `previousBasePrompt` can be `undefined` if Pi’s internals don’t populate `_baseSystemPrompt` (or rename it). In that case, it resets the agent prompt to `systemPromptText`, but it also writes back `mutableSession._baseSystemPrompt = previousBasePrompt` (i.e. `undefined`), potentially leaving the session’s internal “base prompt” unset for subsequent `_rebuildSystemPrompt` calls.
This can break later prompt rebuilds (e.g. tool list changes) after a compaction event. Consider restoring to the **current** session base prompt when available (or only writing `_baseSystemPrompt` back if it was originally defined).
How can I resolve this? If you propose a fix, please make it concise.| emitAgentEvent({ | ||
| runId: ctx.params.runId, | ||
| stream: "compaction", | ||
| data: { phase: "end", willRetry }, | ||
| data: { phase: "end", willRetry, retryCanceledMessage }, | ||
| }); | ||
| void ctx.params.onAgentEvent?.({ | ||
| stream: "compaction", | ||
| data: { phase: "end", willRetry }, | ||
| data: { phase: "end", willRetry, retryCanceledMessage }, | ||
| }); | ||
|
|
||
| if (!willRetry && retryCanceledMessage) { | ||
| // User-facing propagation: Pi succeeded compacting but refused to retry due to prompt sizing. | ||
| void ctx.params.onBlockReply?.({ text: retryCanceledMessage }); | ||
| } |
There was a problem hiding this comment.
User message emitted mid-stream
handleAutoCompactionEnd calls onBlockReply immediately when retryCanceledMessage is present. This happens on the compaction event stream, not the normal assistant response lifecycle, so it can interleave with other block buffering/chunking state and produce out-of-order user-visible output.
If onBlockReply is used by external messaging channels (which must only receive final replies), this risks sending a standalone message during an in-flight run. Consider routing this through the same “final reply” path you use for other user-facing errors (or gate it to internal UIs only).
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/pi-embedded-subscribe.handlers.lifecycle.ts
Line: 54:67
Comment:
**User message emitted mid-stream**
`handleAutoCompactionEnd` calls `onBlockReply` immediately when `retryCanceledMessage` is present. This happens on the compaction event stream, not the normal assistant response lifecycle, so it can interleave with other block buffering/chunking state and produce out-of-order user-visible output.
If `onBlockReply` is used by external messaging channels (which must only receive final replies), this risks sending a standalone message during an in-flight run. Consider routing this through the same “final reply” path you use for other user-facing errors (or gate it to internal UIs only).
How can I resolve this? If you propose a fix, please make it concise.e046781 to
31ca12b
Compare
|
Fixed in #12988. This will go out in the next OpenClaw release. If you still see this after updating to the first release that includes #12988, please open a new issue with:
Link back here for context. |
Co-authored-by: Alyx <kunaclawd@gmail.com>
Co-authored-by: Alyx <kunaclawd@gmail.com>
Co-authored-by: Alyx <kunaclawd@gmail.com>
Co-authored-by: Alyx <kunaclawd@gmail.com>
Co-authored-by: Alyx <kunaclawd@gmail.com>
tsgo cannot track mutations through closures called asynchronously, so restoreOneShotRetryPromptOverride narrows to never at the finally block. Snapshot to a local const before calling to satisfy strict control-flow analysis.
tsgo narrows closure-mutated let variables to their init type (null),
making them uncallable. Wrap in a { current } container object which
tsgo does not narrow through property access, matching the React
useRef pattern. No runtime behavior change.
31ca12b to
2ade518
Compare
bfc1ccb to
f92900f
Compare
|
This pull request has been automatically marked as stale due to inactivity. |
Summary
Fixes #10613. Implements a compaction retry hook that prevents cascade overflow loops.
When auto-compaction fires during context overflow, the retry can immediately overflow again. This hook:
Dependencies
Requires
setAutoCompactionRetryHookfrom pi-coding-agent (PR badlogic/pi-mono#1318).Verification
pnpm tsgopasses clean with local pi-mono build