Agents: add compaction modes (warn, error, none) with proactive conte…#54585
Agents: add compaction modes (warn, error, none) with proactive conte…#54585fierai wants to merge 7 commits into
Conversation
Greptile SummaryThis PR adds three new compaction modes ( Key changes:
Confidence Score: 4/5
Prompt To Fix All With AIThis is a comment left during a code review.
Path: src/agents/pi-embedded-runner/run/attempt.ts
Line: 2917-2925
Comment:
**Warn mode silently becomes a no-op when estimation fails**
When `estimateTokens` throws in `warn` mode, the catch block logs and falls through — leaving `totalTokens` at `0`. Execution then continues to the threshold check outside the try/catch:
```typescript
if (totalTokens > threshold) { // 0 > threshold → always false
```
Since `threshold` is always `≥ 0`, this check can never fire and the user never sees the `"🧹 Context near limit, use /compact"` message. A user who configured `mode=warn` expecting a guardrail gets silent fail-open behavior instead.
In `error` mode the failure is correctly surfaced (re-thrown), but `warn` mode has no equivalent escalation. Consider returning early with the warning message (or a fallback warning) when estimation fails in `warn` mode, rather than silently proceeding.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: src/agents/pi-embedded-runner/run/attempt.ts
Line: 2904-2916
Comment:
**Double `as unknown as AgentMessage` casts suggest type mismatch**
The triple cast (`msg as unknown as AgentMessage`) applied to every element of `activeSession.messages` and to the synthesised system/user objects indicates the types don't align with what `estimateTokens` expects. If the actual runtime shape differs from `AgentMessage` in ways that matter to the estimator (e.g. array vs. string `content`, missing `role` discriminant), the returned count could be systematically wrong — leading to false-positive or false-negative guard triggers without any thrown exception.
It's worth confirming that `estimateTokens` handles the exact shapes produced by `activeSession.messages`, and if they differ, either adapt the objects before passing them in or add a lightweight adapter to avoid relying on the cast.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "Agents: add compaction modes (warn, erro..." | Re-trigger Greptile |
| } catch (err) { | ||
| log.warn(`[compaction-guard] token estimation failed: ${String(err)}`); | ||
| if (currentCompactionMode === "error") { | ||
| throw new Error( | ||
| `[compaction-guard] token estimation failed for mode=error. Error: ${String(err)}`, | ||
| { cause: err }, | ||
| ); | ||
| } | ||
| } |
There was a problem hiding this comment.
Warn mode silently becomes a no-op when estimation fails
When estimateTokens throws in warn mode, the catch block logs and falls through — leaving totalTokens at 0. Execution then continues to the threshold check outside the try/catch:
if (totalTokens > threshold) { // 0 > threshold → always falseSince threshold is always ≥ 0, this check can never fire and the user never sees the "🧹 Context near limit, use /compact" message. A user who configured mode=warn expecting a guardrail gets silent fail-open behavior instead.
In error mode the failure is correctly surfaced (re-thrown), but warn mode has no equivalent escalation. Consider returning early with the warning message (or a fallback warning) when estimation fails in warn mode, rather than silently proceeding.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/pi-embedded-runner/run/attempt.ts
Line: 2917-2925
Comment:
**Warn mode silently becomes a no-op when estimation fails**
When `estimateTokens` throws in `warn` mode, the catch block logs and falls through — leaving `totalTokens` at `0`. Execution then continues to the threshold check outside the try/catch:
```typescript
if (totalTokens > threshold) { // 0 > threshold → always false
```
Since `threshold` is always `≥ 0`, this check can never fire and the user never sees the `"🧹 Context near limit, use /compact"` message. A user who configured `mode=warn` expecting a guardrail gets silent fail-open behavior instead.
In `error` mode the failure is correctly surfaced (re-thrown), but `warn` mode has no equivalent escalation. Consider returning early with the warning message (or a fallback warning) when estimation fails in `warn` mode, rather than silently proceeding.
How can I resolve this? If you propose a fix, please make it concise.| totalTokens = activeSession.messages.reduce( | ||
| (sum, msg) => sum + estimateTokens(msg as unknown as AgentMessage), | ||
| 0, | ||
| ); | ||
| totalTokens += estimateTokens({ | ||
| role: "system", | ||
| content: systemPromptText, | ||
| } as unknown as AgentMessage); | ||
| totalTokens += estimateTokens({ | ||
| role: "user", | ||
| content: effectivePrompt, | ||
| images: imageResult.images, | ||
| } as unknown as AgentMessage); |
There was a problem hiding this comment.
Double
as unknown as AgentMessage casts suggest type mismatch
The triple cast (msg as unknown as AgentMessage) applied to every element of activeSession.messages and to the synthesised system/user objects indicates the types don't align with what estimateTokens expects. If the actual runtime shape differs from AgentMessage in ways that matter to the estimator (e.g. array vs. string content, missing role discriminant), the returned count could be systematically wrong — leading to false-positive or false-negative guard triggers without any thrown exception.
It's worth confirming that estimateTokens handles the exact shapes produced by activeSession.messages, and if they differ, either adapt the objects before passing them in or add a lightweight adapter to avoid relying on the cast.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/agents/pi-embedded-runner/run/attempt.ts
Line: 2904-2916
Comment:
**Double `as unknown as AgentMessage` casts suggest type mismatch**
The triple cast (`msg as unknown as AgentMessage`) applied to every element of `activeSession.messages` and to the synthesised system/user objects indicates the types don't align with what `estimateTokens` expects. If the actual runtime shape differs from `AgentMessage` in ways that matter to the estimator (e.g. array vs. string `content`, missing `role` discriminant), the returned count could be systematically wrong — leading to false-positive or false-negative guard triggers without any thrown exception.
It's worth confirming that `estimateTokens` handles the exact shapes produced by `activeSession.messages`, and if they differ, either adapt the objects before passing them in or add a lightweight adapter to avoid relying on the cast.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 74c4a3ab3e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const reserveTokens = settingsManager.getCompactionReserveTokens(); | ||
| const contextWindow = params.model.contextWindow ?? DEFAULT_CONTEXT_TOKENS; | ||
| const threshold = Math.max(0, contextWindow - reserveTokens); | ||
|
|
||
| if (totalTokens > threshold) { |
There was a problem hiding this comment.
Bound reserve headroom before warn/error threshold checks
The proactive compaction guard computes threshold as contextWindow - reserveTokens and only clamps the result to zero, but reserveTokens is globally floored to 20,000 by default in the settings layer. For any allowed context window/cap below that floor (for example 16k–19,999 via model metadata or agents.defaults.contextTokens), this makes threshold become 0, so warn/error mode will fire on nearly every non-empty prompt rather than only when context is actually near full. Please clamp the effective reserve against the active context window (or reuse the effective prompt budget) before applying this check.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 40ec5af640
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const mode = params.cfg?.agents?.defaults?.compaction?.mode; | ||
| return mode === "warn" || mode === "error" || mode === "none"; |
There was a problem hiding this comment.
Honor
none mode in overflow recovery
Treating mode === "none" as “disable Pi auto-compaction” here does not disable the separate overflow-recovery compaction loop in src/agents/pi-embedded-runner/run.ts (the branch that calls contextEngine.compact after context-overflow errors). In practice, a session configured with agents.defaults.compaction.mode=none can still auto-compact and retry once it hits model limits, which contradicts the new mode’s documented behavior (“letting it fail at the model limit”) and makes this config mode ineffective.
Useful? React with 👍 / 👎.
…penclaw into feat/compaction-modes # Conflicts: # CHANGELOG.md
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c2e4f10ebc
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| totalTokens = activeSession.messages.reduce( | ||
| (sum, msg) => sum + estimateTokens(msg as unknown as AgentMessage), | ||
| 0, |
There was a problem hiding this comment.
Exclude tool-result details from proactive token estimation
The new proactive guard sums estimateTokens over raw activeSession.messages, which can include toolResult.details payloads that are not part of the model-visible transcript and may be very large. That inflates totalTokens (or can trigger estimator failures), causing warn/error modes to block normal turns even when actual prompt context is within budget. The compaction path already strips these details before token accounting (src/agents/compaction.ts), so this guard should apply the same sanitization before checking thresholds.
Useful? React with 👍 / 👎.
| return { | ||
| aborted: false, | ||
| timedOut: false, | ||
| timedOutDuringCompaction: false, | ||
| promptError: null, | ||
| sessionIdUsed: activeSession.sessionId, | ||
| messagesSnapshot: activeSession.messages.slice(), | ||
| assistantTexts: ["🧹 Context near limit, use /compact"], |
There was a problem hiding this comment.
Keep llm_output hooks on warn-mode early exits
This early return in the warn guard exits before the shared post-processing path that emits llm_output hooks (later in this function). As a result, guard-generated assistant warnings are sent to users but never surfaced to hook subscribers/telemetry that rely on llm_output, creating inconsistent plugin behavior for any session using compaction.mode=warn. Returning through the normal result path (instead of returning inline here) would preserve hook semantics.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 29c1b73ee6
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| const threshold = Math.max(0, contextWindow - reserveTokens); | ||
|
|
||
| if (totalTokens > threshold) { |
There was a problem hiding this comment.
Apply token safety margin before warn/error threshold check
The proactive guard compares raw estimateTokens totals against the reserve threshold, but this estimator is already treated elsewhere in the codebase as an under-approximation that needs headroom. Without a safety margin here, warn/error can be skipped on large prompts (especially code- or multibyte-heavy transcripts), so the turn still reaches provider-side context overflow instead of stopping early as these modes promise.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6aa4730249
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| log.warn( | ||
| `[compaction-guard] context near limit (mode=warn): tokens=${totalTokens} limit=${contextWindow} threshold=${threshold} reserve=${reserveTokens}`, | ||
| ); | ||
| return { |
There was a problem hiding this comment.
Route warn-mode guard through shared attempt epilogue
Returning directly from this guard path exits runEmbeddedAttempt before the shared post-turn epilogue runs, so finalizeAttemptContextEngineTurn(...) and agent_end hooks are skipped for every compaction.mode=warn near-limit turn even though a user-visible assistant warning is produced. In deployments that rely on context-engine after-turn maintenance or agent_end telemetry/plugins, this creates inconsistent state and missing side effects; this branch should flow through the normal result path instead of returning inline.
Useful? React with 👍 / 👎.
| @@ -29,6 +31,8 @@ Docs: https://docs.openclaw.ai | |||
|
|
|||
| ### Changes | |||
|
|
|||
| - Agents: add compaction modes (`warn`, `error`, `none`) with a proactive context guard that stops execution before reaching the model's limit, preventing cryptic provider errors and preserving control during long-running sessions. | |||
There was a problem hiding this comment.
Keep changelog entry only in the active release block
This line was inserted into historical release sections (for example under ## 2026.3.24) and repeated many times across past versions, which rewrites old release notes instead of documenting the change once in the active block. That violates the repository rule in AGENTS.md to append entries only in the active version section, and it can misstate version history for users while creating unnecessary churn/conflicts in future changelog updates.
Useful? React with 👍 / 👎.
6aa4730 to
6176332
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 61763323cd
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| // Proactive context guard for warn/error modes | ||
| const currentCompactionMode = resolveCompactionMode(params.config); | ||
| let skipPromptForCompactionGuard = false; | ||
| if (currentCompactionMode === "warn" || currentCompactionMode === "error") { |
There was a problem hiding this comment.
Skip llm_input hook when guard blocks the model call
The proactive compaction guard is evaluated after runLlmInput(...), so in compaction.mode=warn|error near-limit turns (or warn-mode estimator failures) the code can short-circuit before activeSession.prompt(...) while still emitting an llm_input hook event. That makes hook subscribers observe phantom model requests, which can skew telemetry/billing pipelines or trigger side effects even though no provider call occurred. Gate llm_input emission behind the guard result (or run the guard before the hook) so hook semantics stay aligned with actual LLM invocations.
Useful? React with 👍 / 👎.
…lse warnings on models with small context windows
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c1a56f720d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (params.onBlockReply) { | ||
| void params.onBlockReply({ text: warnMsg }); | ||
| } | ||
| assistantTexts.push(warnMsg); | ||
| } else if (currentCompactionMode === "error") { |
There was a problem hiding this comment.
Prevent warn mode from sending prompts after guard warning
In runEmbeddedAttempt, the warn branch only emits a warning (onBlockReply + assistantTexts.push) but never flips skipPromptForCompactionGuard, so the later prompt path still executes and calls the model. With agents.defaults.compaction.mode="warn", near-limit turns therefore still hit the LLM instead of stopping as documented, and can still overflow since auto-compaction is disabled for this mode.
Useful? React with 👍 / 👎.
|
This pull request has been automatically marked as stale due to inactivity. |
|
Codex review: needs real behavior proof before merge. Reviewed June 10, 2026, 1:01 AM ET / 05:01 UTC. Summary PR surface: Source +105, Docs +6, Generated +12. Total +123 across 9 files. Reproducibility: unclear. The review failed before ClawSweeper could establish a reproduction path. Review metrics: none identified. Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Risk before merge
Maintainer options:
Next step before merge
Review detailsBest possible solution: Retry the Codex review after fixing the execution failure. Do we have a high-confidence way to reproduce the issue? Unclear. The review failed before ClawSweeper could establish a reproduction path. Is this the best way to solve the issue? Unclear. Retry the review first so ClawSweeper can evaluate the actual issue and fix direction. AGENTS.md: unclear because the file could not be read completely. Codex review notes: model gpt-5.5, reasoning high; reviewed against b4cdd9211957. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +105, Docs +6, Generated +12. Total +123 across 9 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
Codex review: found issues before merge. What this changes: This PR extends agent compaction config/schema/help with Maintainer follow-up before merge: This is an open implementation PR with useful source work, but the remaining action is maintainer/product review and author rework across runtime semantics, config contract, overflow recovery, hooks, tests, changelog, and unrelated skill scope rather than an autonomous replacement fix. Review findings:
Review detailsBest possible solution: A mergeable version should first settle the product semantics for each mode, then implement them consistently across exported config types, strict/generated schemas, docs/help, Pi mode resolution, auto-compaction disabling, pre-prompt guard behavior, overflow recovery, hooks/finalization, and focused tests. The compaction work should stay isolated from unrelated bundled skill additions. Full review comments:
Overall correctness: patch is incorrect Acceptance criteria:
What I checked:
Likely related people:
Remaining risk / open question:
Codex review notes: model gpt-5.5, reasoning high; reviewed against e46dccb35374. |
|
ClawSweeper PR egg 🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat. Where did the egg go?
|
|
This pull request has been automatically marked as stale due to inactivity. |
Summary
auto-compaction, or they encounter cryptic provider-specific "context
window exceeded" errors when the window is full.
management to prevent silent history loss and to receive clear, actionable
warnings before a session fails.
by a proactive context guard that estimates tokens before each LLM call.
Integrated Greptile feedback to ensure the guard is robust even if token
estimation fails.
safeguard modes remains untouched.
Change Type (select all)
Scope (select all touched areas)
Linked Issue/PR
Root Cause / Regression History (if applicable)
Regression Test Plan (if applicable)
message and error mode throws before hitting the LLM.
live environment to ensure the user-visible messages appear correctly in
the messaging channel.
User-visible / Behavior Changes
"error", and "none".
instead of the agent failing or compacting automatically.
Security Impact (required)
Repro + Verification
Environment
Steps
Expected
Actual
warning without calling the LLM.
Evidence
(mode=warn) logs in gateway.log.
Human Verification (required)
thrown), and none mode (auto-compaction disabled).
failures; error mode now re-throws with cause, and warn mode provides a
fallback warning instead of failing open.
histories, though estimateTokens is generally fast.
Review Conversations
this PR.
maintainer judgment.
Compatibility / Migration
Failure Recovery (if this breaks)
"default" or "safeguard".
Risks and Mitigations
falls back to a safe warning message; in error mode, it provides a
descriptive error to the user.