fix: auto-compaction fires on fresh cached token counts (#66520) by KeWang0622 · Pull Request #66716 · openclaw/openclaw

KeWang0622 · 2026-04-14T17:48:45Z

Summary

Bug: runPreflightCompactionIfNeeded returned early when totalTokensFresh === true without checking the compaction threshold, so auto-compaction never triggered for sessions with fresh token counts — even at 153% of the context window
Root cause: The early-return optimization (skip transcript estimation when fresh data is available) accidentally bypassed the threshold comparison entirely
Fix: Restructure token resolution so fresh persisted totals (which include cacheRead from Anthropic prompt caching) are projected forward and checked against contextWindow - reserveTokens - softThreshold before deciding whether to compact

Details

When Anthropic's prompt cache absorbs nearly all tokens (a near-100% hit rate), derivePromptTokens correctly computes input + cacheRead + cacheWrite (e.g. 99 + 305,000 + 0 = 305,099), and this is persisted as totalTokens with totalTokensFresh: true. However, runPreflightCompactionIfNeeded had this logic:

if (!shouldUseTranscriptFallback) {
    return entry;  // BUG: skips threshold check entirely
}

The fix moves the threshold check into both branches (fresh and stale), using the fresh persisted value directly when available.
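A minimal sketch of the fresh-token branch after the fix, under stated assumptions. The type shapes, the function name shouldCompactFromFreshTotals, the plain-sum projection, and the strict comparison are all hypothetical; only the threshold formula (contextWindow - reserveTokens - softThreshold) comes from the summary above. The reserveTokens and softThreshold values are made up so that the threshold lands on the 166k figure used in the edge cases.

```typescript
// Hypothetical shapes for illustration; the real types in the repo may differ.
interface SessionEntry {
  totalTokens?: number;
  totalTokensFresh?: boolean;
}

interface CompactionLimits {
  contextWindow: number;
  reserveTokens: number;
  softThreshold: number;
}

// Threshold as written in the PR summary.
function compactionThreshold(limits: CompactionLimits): number {
  return limits.contextWindow - limits.reserveTokens - limits.softThreshold;
}

// Fresh-token branch after the fix: project forward, then compare.
// Assumption: projection is a plain sum and the comparison is strict.
function shouldCompactFromFreshTotals(
  entry: SessionEntry,
  limits: CompactionLimits,
  promptEstimate: number,
): boolean {
  if (entry.totalTokensFresh !== true || typeof entry.totalTokens !== "number") {
    return false; // not a confirmed-fresh total; the transcript path would handle it
  }
  const projected = entry.totalTokens + promptEstimate;
  return projected > compactionThreshold(limits);
}

// Assumed split of the 34k headroom; only the resulting 166k threshold appears in the PR.
const exampleLimits: CompactionLimits = {
  contextWindow: 200_000,
  reserveTokens: 20_000,
  softThreshold: 14_000,
};

// 305,099 projected tokens exceed the 166,000 threshold, so this is true.
const firesAtFullCacheHit = shouldCompactFromFreshTotals(
  { totalTokens: 305_099, totalTokensFresh: true },
  exampleLimits,
  0,
);
```

The point of the sketch is that the fresh branch returns a comparison result rather than returning early, which is the behavior change the PR describes.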

Edge cases verified

100% cache hit (Anthropic): totalTokens=305k, contextWindow=200k — compaction now fires
0% cache hit: Stale tokens fall through to transcript estimation path — behavior unchanged
Partial cache: 180k total with 166k threshold — fires correctly
Below threshold: 50k total with 166k threshold — no compaction, as expected
Heartbeat/CLI: Still skipped regardless of token count
Non-Anthropic providers: No change — providers without caching have cacheRead=0, so totalTokens is just input + cacheWrite, same as before
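The arithmetic behind these edge cases can be checked with a short sketch. The names below (UsageSketch, derivePromptTokensSketch, exceedsThreshold) are hypothetical stand-ins, not the repository's functions, and the strict comparison is an assumption; the token figures are the ones quoted in this PR description.

```typescript
// Hypothetical usage shape; field names follow the wording of the PR description.
interface UsageSketch {
  input: number;
  cacheRead: number;
  cacheWrite: number;
}

// Stand-in for derivePromptTokens as described above: input + cacheRead + cacheWrite.
function derivePromptTokensSketch(usage: UsageSketch): number {
  return usage.input + usage.cacheRead + usage.cacheWrite;
}

// Assumption: compaction fires when the total is strictly above the threshold.
function exceedsThreshold(totalTokens: number, threshold: number): boolean {
  return totalTokens > threshold;
}

// Near-total cache hit from the Details section: 99 + 305,000 + 0 = 305,099.
const cachedTotal = derivePromptTokensSketch({ input: 99, cacheRead: 305_000, cacheWrite: 0 });

// 305,099 / 200,000 is about 1.525, which rounds to the 153% quoted in the summary.
const percentOfWindow = Math.round((cachedTotal / 200_000) * 100);

// The three totals listed in the edge cases, against the 166k threshold.
const thresholdFromEdgeCases = 166_000;
const edgeCaseResults = {
  fullCacheHit: exceedsThreshold(cachedTotal, thresholdFromEdgeCases), // true
  partialCache: exceedsThreshold(180_000, thresholdFromEdgeCases), // true
  belowThreshold: exceedsThreshold(50_000, thresholdFromEdgeCases), // false
};
```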

Test plan

shouldRunPreflightCompaction unit tests: 100% cache hit triggers, below threshold skips, partial cache boundary correct
runPreflightCompactionIfNeeded integration tests: fresh tokens above threshold trigger compaction, below threshold skip, stale tokens use transcript fallback, heartbeat skips
All existing compaction tests pass unchanged
Session usage persistence tests pass (65 tests)
Followup runner tests pass (19 tests)
Preemptive compaction tests pass (9 tests)

Fixes #66520

🤖 Generated with Claude Code

…reshold

When a provider like Anthropic has a high prompt cache hit rate, totalTokens on the session entry includes cached tokens (input + cacheRead + cacheWrite) and is marked as fresh. Previously, runPreflightCompactionIfNeeded returned early when totalTokensFresh was true WITHOUT checking the compaction threshold, so auto-compaction never triggered, even at 153% of the context window.

This change restructures the token resolution so that fresh persisted totals are checked against the compaction threshold (contextWindow - reserveTokens - softThreshold) before deciding whether to compact. The stale/transcript fallback path is preserved unchanged.

Fixes openclaw#66520

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

greptile-apps · 2026-04-14T17:51:44Z

Greptile Summary

This PR fixes a real bug where runPreflightCompactionIfNeeded returned early for sessions with fresh token counts (totalTokensFresh === true), skipping the threshold comparison entirely. The fix restructures the token-resolution logic so fresh persisted totals — which correctly include cacheRead from Anthropic's prompt cache — are projected forward and checked against the compaction threshold before deciding whether to compact, matching the behavior already in place for the stale/transcript-fallback path.

Confidence Score: 5/5

This PR is safe to merge — it fixes a well-scoped compaction bypass with no regressions on existing paths.
The fix is minimal and surgically correct: it replaces an unconditional early return with a threshold check in the fresh-token branch while leaving the stale/transcript-fallback path byte-identical. The new tests validate all documented edge cases (100% cache hit, below threshold, partial cache, heartbeat bypass, stale fallback), and existing tests are described as passing unchanged. No security, data-integrity, or API-contract concerns.
No files require special attention.

Reviews (1): Last reviewed commit: "fix: preflight compaction now fires when..."

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 86352d78cb

chatgpt-codex-connector · 2026-04-14T17:53:52Z

+  if (!shouldUseTranscriptFallback) {
+    // Fresh persisted tokens available — project forward with prompt estimate
+    const projectedFreshTokens = resolveEffectivePromptTokens(
+      freshPersistedTokens,


Require explicit freshness before using persisted token totals

The new fresh-token branch now runs preflight compaction when shouldUseTranscriptFallback is false, but that predicate still treats totalTokensFresh: undefined as fresh. For legacy/unknown session entries (where freshness is not guaranteed), this branch uses freshPersistedTokens directly and can trigger compaction from stale totals, causing unnecessary summarization on the next turn. This behavior is introduced by the new branch because these sessions previously returned early without compacting; please gate this path on totalTokensFresh === true (or fall back to transcript estimation when freshness is unknown).

Fixed in c149fe4. Three changes to runPreflightCompactionIfNeeded:

Replaced shouldUseTranscriptFallback with hasFreshPersistedTokens that requires totalTokensFresh === true explicitly. The previous predicate (entry.totalTokensFresh === false || !hasPersistedTotalTokens) treated undefined as "not needing fallback" because undefined === false is false. Now when totalTokensFresh is undefined (legacy) or false (known-stale), the function falls through to transcript estimation.

Always run transcript estimation in the fallback branch. The old guard (typeof freshPersistedTokens === "number") skipped transcript estimation for legacy sessions because resolveFreshSessionTotalTokens treats totalTokensFresh: undefined as fresh and returns a number. Now the else branch unconditionally estimates from the transcript.

Bail out before shouldRunPreflightCompaction when freshness is unconfirmed. Even with the above fixes, if transcript estimation returns undefined (e.g. no session file), shouldRunPreflightCompaction would fall back to resolveFreshSessionTotalTokens internally, which again treats undefined as fresh. Added an explicit guard: if tokenCountForCompaction is not a number and totalTokensFresh !== true, return early.
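The difference between the old and new predicates, and the added bail-out guard, can be sketched as follows. This is an illustration under assumptions, not the repository's code: the entry shape, the definition of hasPersistedTotalTokens, and the helper name shouldBailOutSketch are hypothetical, while the two predicate expressions follow the wording of the reply above.

```typescript
// Hypothetical entry shape; totalTokensFresh is undefined on legacy sessions.
interface LegacyAwareEntry {
  totalTokens?: number;
  totalTokensFresh?: boolean;
}

// Assumption: "has persisted totals" means totalTokens is a number.
function hasPersistedTotalTokensSketch(entry: LegacyAwareEntry): boolean {
  return typeof entry.totalTokens === "number";
}

// Old predicate as quoted in the reply: undefined === false is false,
// so a legacy entry with persisted totals does NOT take the fallback.
function shouldUseTranscriptFallbackOld(entry: LegacyAwareEntry): boolean {
  return entry.totalTokensFresh === false || !hasPersistedTotalTokensSketch(entry);
}

// New predicate: freshness must be explicitly true.
function hasFreshPersistedTokensSketch(entry: LegacyAwareEntry): boolean {
  return entry.totalTokensFresh === true && hasPersistedTotalTokensSketch(entry);
}

// Third change: bail out when no transcript estimate exists and freshness is unconfirmed.
function shouldBailOutSketch(
  tokenCountForCompaction: number | undefined,
  entry: LegacyAwareEntry,
): boolean {
  return typeof tokenCountForCompaction !== "number" && entry.totalTokensFresh !== true;
}

const legacyEntry: LegacyAwareEntry = { totalTokens: 180_000 }; // freshness never set
const freshEntry: LegacyAwareEntry = { totalTokens: 180_000, totalTokensFresh: true };

// Old behavior: the legacy entry skips the fallback, i.e. is treated as fresh.
const legacyTreatedAsFreshBefore = !shouldUseTranscriptFallbackOld(legacyEntry); // true
// New behavior: only the explicitly fresh entry qualifies.
const legacyTreatedAsFreshAfter = hasFreshPersistedTokensSketch(legacyEntry); // false
const freshStillFresh = hasFreshPersistedTokensSketch(freshEntry); // true
```

With the stricter predicate, a legacy entry falls through to transcript estimation, and the bail-out guard covers the remaining case where that estimation yields no count.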

Added a test case for totalTokensFresh: undefined (legacy sessions) that verifies compaction is not triggered from stale persisted totals. All 9 tests in the test suite pass, along with the related test files (followup-runner, commands-compact, agent-runner-direct-runtime-config, reply-state).

The fresh-token branch in runPreflightCompactionIfNeeded ran when shouldUseTranscriptFallback was false, but that predicate treated totalTokensFresh: undefined as NOT needing fallback (undefined === false evaluates to false). For legacy sessions where totalTokensFresh was never set, this caused the fresh-token path to use potentially stale persisted totals, triggering unnecessary compaction.

Three changes:

1. Replace shouldUseTranscriptFallback with hasFreshPersistedTokens that requires totalTokensFresh === true explicitly. When undefined (legacy) or false (known-stale), falls through to transcript estimation.
2. In the transcript fallback branch, always run transcript estimation instead of skipping when resolveFreshSessionTotalTokens returns a value (that function also treats undefined freshness as fresh).
3. Add bail-out before shouldRunPreflightCompaction when transcript estimation returns no count and freshness is unconfirmed, preventing the gate function from falling back to stale totals via resolveFreshSessionTotalTokens.

Resolves Codex P2 review on PR openclaw#66716.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

prtags · 2026-04-23T12:20:51Z

Related work from PRtags group infinite-dodo-7d3e

Title: Preflight compaction skips fresh token totals

Number	Title
#63892	Title unavailable
#64384	fix(reply): gate preflight compaction fast-path on token threshold (#63892)
#65600	Title unavailable
#65622	fix(agents): reevaluate preflight compaction on fresh totals
#66520	Title unavailable
#66716*	fix: auto-compaction fires on fresh cached token counts (#66520)

* This PR

clawsweeper · 2026-04-27T04:45:37Z

Codex review: needs real behavior proof before merge. Reviewed June 7, 2026, 1:05 AM ET / 05:05 UTC.

Summary
Review failed before ClawSweeper could summarize the requested change.

PR surface: Source +45, Tests +348. Total +393 across 3 files.

Reproducibility: unclear. The review failed before ClawSweeper could establish a reproduction path.

Review metrics: none identified.

Merge readiness
Overall: 🌊 off-meta tidepool
Proof: 🌊 off-meta tidepool
Patch quality: 🌊 off-meta tidepool
Result: rating does not apply to this item.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Risk before merge

[P1] No close action taken because the review did not complete.

Maintainer options:

Decide the mitigation before merge
Retry the Codex review after fixing the execution failure.
Pause or close
Do not merge this PR until maintainers decide whether the risk is worth taking.

Next step before merge

[P1] Review did not complete, so no work-lane recommendation was made.

Review details

Best possible solution:

Retry the Codex review after fixing the execution failure.

Do we have a high-confidence way to reproduce the issue?

Unclear. The review failed before ClawSweeper could establish a reproduction path.

Is this the best way to solve the issue?

Unclear. Retry the review first so ClawSweeper can evaluate the actual issue and fix direction.

AGENTS.md: unclear because the file could not be read completely.

Codex review notes: model gpt-5.5, reasoning high; reviewed against 1d2bebbb41bf.

Label changes

remove P1: Current review triage priority is none.
remove merge-risk: 🚨 session-state: Current PR review selected no merge-risk labels.

Label justifications:

rating: 🌊 off-meta tidepool: Overall readiness is 🌊 off-meta tidepool; proof is 🌊 off-meta tidepool and patch quality is 🌊 off-meta tidepool.

Evidence reviewed

PR surface:

Source +45, Tests +348. Total +393 across 3 files.

Area	Files	Added	Removed	Net
Source	1	67	22	+45
Tests	2	349	1	+348
Docs	0	0	0	0
Config	0	0	0	0
Generated	0	0	0	0
Other	0	0	0	0
Total	3	416	23	+393

What I checked:

failure reason: codex execution failed.
codex failure detail: Codex review failed for this PR with exit 1.
codex stdout: Per-item Codex failure; continuing with the rest of the shard.

Likely related people:

unknown: Codex failed before it could trace repository history. (role: review did not complete; confidence: low)

What the crustacean ranks mean

🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

clawsweeper · 2026-05-20T22:18:23Z

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?

The egg game starts only after the PR passes the real-behavior proof check.
Before that, no creature or rarity is rolled. The treat waits for real proof.
This is still just collectible flavor: proof affects review readiness, not creature quality.

openclaw-barnacle · 2026-06-06T04:47:09Z

This pull request has been automatically marked as stale due to inactivity.
Please add updates or it will be closed.

openclaw-barnacle Bot added the size: M label Apr 14, 2026

chatgpt-codex-connector Bot reviewed Apr 14, 2026

openclaw-clownfish Bot mentioned this pull request Apr 29, 2026

fix(compaction): respect effective reserve tokens in compaction gates #74010

Closed

clawsweeper Bot mentioned this pull request Apr 30, 2026

Compaction emits empty fallback summary; tokensBefore counts cacheRead, triggering premature compactions on Opus 1M #72964

Open

openclaw-barnacle Bot added the triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. label May 19, 2026

Adam-Researchh mentioned this pull request May 19, 2026

Codex runtime allows >2M-token turns with compactionCount=0, then contextEngine maintenance fails #84305

Closed

openclaw-barnacle Bot added the stale Marked as stale due to inactivity label Jun 6, 2026

openclaw-barnacle Bot removed the stale Marked as stale due to inactivity label Jun 7, 2026

fix: auto-compaction fires on fresh cached token counts (#66520) #66716
KeWang0622 wants to merge 2 commits into openclaw:main from KeWang0622:fix/compaction-cached-tokens


2 participants
