fix(reply): gate preflight compaction fast-path on token threshold (#63892)#64384
Conversation
Greptile SummaryThis PR fixes a bug where proactive preflight compaction stops firing after the first compaction checkpoint. Once Confidence Score: 5/5Safe to merge — targeted one-line logic fix with correct tests and no surface-area changes outside the preflight gate. The fix is logically sound: the guard is coherent with shouldUseTranscriptFallback (when totalTokensFresh !== false and tokens are valid, freshPersistedTokens is guaranteed to be a positive number, so the new predicate behaves correctly in all reachable states). Both directions of the condition are covered by regression tests. No config, schema, or public SDK surface was touched. No remaining P0/P1 findings. No files require special attention. Reviews (1): Last reviewed commit: "fix(reply): gate preflight compaction fa..." | Re-trigger Greptile |
|
Related work from PRtags group Title: Preflight compaction skips fresh token totals
|
|
Codex review: needs real behavior proof before merge. Reviewed June 6, 2026, 12:49 AM ET / 04:49 UTC. Summary PR surface: Source +8, Tests +89, Docs +1. Total +98 across 3 files. Reproducibility: unclear. The review failed before ClawSweeper could establish a reproduction path. Review metrics: none identified. Merge readiness Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch. Risk before merge
Maintainer options:
Next step before merge
Review detailsBest possible solution: Retry the Codex review after fixing the execution failure. Do we have a high-confidence way to reproduce the issue? Unclear. The review failed before ClawSweeper could establish a reproduction path. Is this the best way to solve the issue? Unclear. Retry the review first so ClawSweeper can evaluate the actual issue and fix direction. AGENTS.md: unclear because the file could not be read completely. Codex review notes: model gpt-5.5, reasoning high; reviewed against 9313471fa579. Label changesLabel changes:
Label justifications:
Evidence reviewedPR surface: Source +8, Tests +89, Docs +1. Total +98 across 3 files. View PR surface stats
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics. How this review workflow works
|
|
ClawSweeper PR egg 🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat. Where did the egg go?
|
|
This pull request has been automatically marked as stale due to inactivity. |
Summary
Fixes #63892. After the first compaction, `entry.totalTokensFresh` is set to `true` by `incrementRunCompactionCount`. On subsequent runs, `runPreflightCompactionIfNeeded` unconditionally short-circuits at `src/auto-reply/reply/agent-runner-memory.ts:379` because `shouldUseTranscriptFallback` is `false`. The stale-tokens path below (which is what runs the actual threshold check) is never reached, so proactive compaction never re-fires. Overflow-retry recovery still worked because it runs through a separate path in `src/auto-reply/reply/pi-embedded-runner/run.ts` that catches `isLikelyContextOverflowError` directly, bypassing the preflight gate entirely.
Change Type
Scope
Verification
I also verified the equivalent gate on the compiled dist of v2026.4.9 on a long-running gateway against GLM-5.1:cloud (`contextWindowTokens` ≈ 200k, `reserveTokensFloor` 40k, `softThresholdTokens` 25k). Before the patch: zero proactive checkpoints after the first, overflow-retry checkpoints only. After the patch: proactive checkpoints fire on every subsequent cycle as expected.
Credit
The root cause analysis, exact file/line pointers, and the fix direction are all @martingarramon's from the issue thread. My earlier hypothesis about `compactionCount` latching was wrong — thanks for the correction. @mjamiv independently confirmed the repro on v2026.4.9, which is what motivated digging past the first hypothesis.
AI-assisted disclosure
This PR was written with AI assistance (Claude). I (the submitter) read and understand the change, verified the fix against the live compiled runtime before writing the source patch, and wrote the regression tests to directly exercise the `totalTokensFresh === true` re-fire case.
Fixes #63892