fix: auto-compaction fires on fresh cached token counts (#66520)#66716
fix: auto-compaction fires on fresh cached token counts (#66520)#66716KeWang0622 wants to merge 2 commits into
Conversation
…reshold When a provider like Anthropic has a high prompt cache hit rate, totalTokens on the session entry includes cached tokens (input + cacheRead + cacheWrite) and is marked as fresh. Previously, runPreflightCompactionIfNeeded returned early when totalTokensFresh was true WITHOUT checking the compaction threshold, so auto-compaction never triggered — even at 153% of the context window. This change restructures the token resolution so that fresh persisted totals are checked against the compaction threshold (contextWindow - reserveTokens - softThreshold) before deciding whether to compact. The stale/transcript fallback path is preserved unchanged. Fixes openclaw#66520 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Greptile SummaryThis PR fixes a real bug where Confidence Score: 5/5
Reviews (1): Last reviewed commit: "fix: preflight compaction now fires when..." | Re-trigger Greptile |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 86352d78cb
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if (!shouldUseTranscriptFallback) { | ||
| // Fresh persisted tokens available — project forward with prompt estimate | ||
| const projectedFreshTokens = resolveEffectivePromptTokens( | ||
| freshPersistedTokens, |
There was a problem hiding this comment.
Require explicit freshness before using persisted token totals
The new fresh-token branch now runs preflight compaction when shouldUseTranscriptFallback is false, but that predicate still treats totalTokensFresh: undefined as fresh. For legacy/unknown session entries (where freshness is not guaranteed), this branch uses freshPersistedTokens directly and can trigger compaction from stale totals, causing unnecessary summarization on the next turn. This behavior is introduced by the new branch because these sessions previously returned early without compacting; please gate this path on totalTokensFresh === true (or fall back to transcript estimation when freshness is unknown).
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Fixed in c149fe4. Three changes to runPreflightCompactionIfNeeded:
-
Replaced
shouldUseTranscriptFallbackwithhasFreshPersistedTokensthat requirestotalTokensFresh === trueexplicitly. The previous predicate (entry.totalTokensFresh === false || !hasPersistedTotalTokens) treatedundefinedas "not needing fallback" becauseundefined === falseisfalse. Now whentotalTokensFreshisundefined(legacy) orfalse(known-stale), the function falls through to transcript estimation. -
Always run transcript estimation in the fallback branch. The old guard (
typeof freshPersistedTokens === "number") skipped transcript estimation for legacy sessions becauseresolveFreshSessionTotalTokenstreatstotalTokensFresh: undefinedas fresh and returns a number. Now the else branch unconditionally estimates from the transcript. -
Bail out before
shouldRunPreflightCompactionwhen freshness is unconfirmed. Even with the above fixes, if transcript estimation returnsundefined(e.g. no session file),shouldRunPreflightCompactionwould fall back toresolveFreshSessionTotalTokensinternally, which again treatsundefinedas fresh. Added an explicit guard: iftokenCountForCompactionis not a number andtotalTokensFresh !== true, return early.
Added a test case for totalTokensFresh: undefined (legacy sessions) that verifies compaction is not triggered from stale persisted totals. All 9 tests in the test suite pass, along with the related test files (followup-runner, commands-compact, agent-runner-direct-runtime-config, reply-state).
The fresh-token branch in runPreflightCompactionIfNeeded ran when shouldUseTranscriptFallback was false, but that predicate treated totalTokensFresh: undefined as NOT needing fallback (undefined === false evaluates to false). For legacy sessions where totalTokensFresh was never set, this caused the fresh-token path to use potentially stale persisted totals, triggering unnecessary compaction. Three changes: 1. Replace shouldUseTranscriptFallback with hasFreshPersistedTokens that requires totalTokensFresh === true explicitly. When undefined (legacy) or false (known-stale), falls through to transcript estimation. 2. In the transcript fallback branch, always run transcript estimation instead of skipping when resolveFreshSessionTotalTokens returns a value (that function also treats undefined freshness as fresh). 3. Add bail-out before shouldRunPreflightCompaction when transcript estimation returns no count and freshness is unconfirmed, preventing the gate function from falling back to stale totals via resolveFreshSessionTotalTokens. Resolves Codex P2 review on PR openclaw#66716. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Related work from PRtags group Title: Preflight compaction skips fresh token totals
|
|
Codex review: needs real behavior proof before merge. Reviewed June 7, 2026, 1:05 AM ET / 05:05 UTC. Summary PR surface: Source +45, Tests +348. Total +393 across 3 files. Reproducibility: unclear. The review failed before ClawSweeper could establish a reproduction path. Review metrics: none identified. Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Risk before merge
Maintainer options:
Next step before merge
Review detailsBest possible solution: Retry the Codex review after fixing the execution failure. Do we have a high-confidence way to reproduce the issue? Unclear. The review failed before ClawSweeper could establish a reproduction path. Is this the best way to solve the issue? Unclear. Retry the review first so ClawSweeper can evaluate the actual issue and fix direction. AGENTS.md: unclear because the file could not be read completely. Codex review notes: model gpt-5.5, reasoning high; reviewed against 1d2bebbb41bf. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +45, Tests +348. Total +393 across 3 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
ClawSweeper PR egg 🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat. Where did the egg go?
|
|
This pull request has been automatically marked as stale due to inactivity. |
Summary
runPreflightCompactionIfNeededreturned early whentotalTokensFresh === truewithout checking the compaction threshold, so auto-compaction never triggered for sessions with fresh token counts — even at 153% of the context windowcacheReadfrom Anthropic prompt caching) are projected forward and checked againstcontextWindow - reserveTokens - softThresholdbefore deciding whether to compactDetails
When Anthropic's prompt cache absorbs nearly all tokens (100% hit rate),
derivePromptTokenscorrectly computesinput + cacheRead + cacheWrite(e.g. 99 + 305,000 + 0 = 305,099), and this is persisted astotalTokenswithtotalTokensFresh: true. However,runPreflightCompactionIfNeededhad this logic:The fix moves the threshold check into both branches (fresh and stale), using the fresh persisted value directly when available.
Edge cases verified
cacheRead=0, sototalTokensis justinput + cacheWrite, same as beforeTest plan
shouldRunPreflightCompactionunit tests: 100% cache hit triggers, below threshold skips, partial cache boundary correctrunPreflightCompactionIfNeededintegration tests: fresh tokens above threshold trigger compaction, below threshold skip, stale tokens use transcript fallback, heartbeat skipsFixes #66520
🤖 Generated with Claude Code