fix: move compaction from afterTurn to pre-assembly with cache-TTL check#385
fix: move compaction from afterTurn to pre-assembly with cache-TTL check#385liu51115 wants to merge 2 commits intoMartian-Engineering:mainfrom
Conversation
f5565e9 to
56eecc9
Compare
56eecc9 to
7fb97bd
Compare
|
There is genuine value here, but I think it is solving a different axis than #407 rather than replacing it. I checked our live installed mitigation and the post-patch logs before commenting. The tactical local patch that stops leaf compaction and threshold sweep from both firing on the same turn is still active in our installed extension. In But it did not close the stall problem by itself. The same post-patch window still shows repeated What I do think is genuinely valuable in this PR:
So my read is: useful ideas, but not a drop-in replacement for #407/#408. If this lands, I’d strongly prefer it as a narrower follow-up or split-out PR instead of a competing end-to-end compaction strategy. |
…heck (closes Martian-Engineering#367) Move all proactive compaction out of afterTurn() into pre-assembly in assemble(). afterTurn becomes telemetry-only (records cache state + lastApiCallAt). Pre-assembly triggers: 1. Budget overflow — must compact to fit 2. Cache-TTL — compact if idle > cacheTTLSeconds (default 300s) AND leaf trigger fires Removes: evaluateIncrementalCompaction, logIncrementalCompactionDecision, shouldApplyHotCacheHysteresis, resolveCacheAwareState, isComfortablyUnderTokenBudget, IncrementalCompactionDecision type, HOT_CACHE_HYSTERESIS_TURNS constant. Tests: 476/476 pass.
7fb97bd to
debbbf8
Compare
|
You're right about the pinned-file creep — that was a mistake on our end, we accidentally force-pushed a branch that bundled pinnedFiles (which has its own PR at #214) into this one. Just fixed it — the PR now contains only the cache-TTL pre-assembly change. Agreed this is a different axis from #407/#408. The deferred debt + coalescing + visibility work in #408 addresses foreground blocking and starvation. This PR addresses cache-safety timing — making sure compaction doesn't destroy a hot provider cache by firing at the wrong moment. They're complementary, not competing. Happy to keep this scoped as a narrower follow-up. |
Reads agent's params.cacheRetention from OC config and maps to TTL: - "long" → 3600s (1 hour) - "short" → 300s (5 min) - unset → falls back to cacheTTLSeconds config (default 300s) Prevents compacting during a still-valid long cache for agents configured with cacheRetention: "long".
|
Closing in favor of #408 which now covers cache-TTL safety + deferred compaction + maintenance visibility. The per-agent cache TTL approach there (via runtime telemetry retention) supersedes our config-based approach. |
Summary
Moves all compaction decisions from
afterTurn()toassemble()pre-assembly, using a persistedlastApiCallAttimestamp to determine whether the prompt cache has expired.Closes #367. Supersedes #362, #363. Related: #102.
Problem
afterTurn()evaluates compaction using the cache status of the call that just completed. This is a fundamental timing inversion: after any API call, the provider has just written (or refreshed) the prompt cache — the cache is always hot in afterTurn, regardless of what the cache status reading says. Compacting at this point always destroys a live cache.The three removed branches all suffer from this:
On providers that load-balance across backend instances with separate prompt caches (e.g. OpenRouter), a secondary problem compounds this: random cold readings from routing misses create a cascade of aggressive compaction → prefix rewrite → guaranteed cold → repeat. But even on direct Anthropic, the core inversion applies — afterTurn is simply the wrong time to evaluate cache state.
Fix
afterTurn()becomes telemetry-only: recordslastApiCallAt = Date.now(), nothing else.assemble()checks at pre-assembly time:cacheAwareCompaction.enabledandnow - lastApiCallAt > cacheTTLSecondsand there's material to compact → compact before assemblingThis is the only window where compaction is both safe and beneficial: the cache has genuinely expired, no live cache to destroy, and compacting reduces input cost on the upcoming cold call.
This approach supersedes the routing-noise guard in #362 (sticky cold counter is no longer needed when afterTurn doesn't make compaction decisions) and the deferred threshold compaction in #363 (all compaction is now off the reply path by design). It also addresses the same use case as the idle compaction trigger in #102 but with a simpler timestamp-based mechanism integrated into the assembly path.
Changes
src/engine.ts: remove two try/catch compaction blocks fromafterTurn(), add pre-assembly block toassemble(), delete dead methods (evaluateIncrementalCompaction,logIncrementalCompactionDecision,shouldApplyHotCacheHysteresis,resolveCacheAwareState,isComfortablyUnderTokenBudget)src/db/config.ts: addcacheTTLSecondstoCacheAwareCompactionConfig(default 300, envLCM_CACHE_TTL_SECONDS)src/db/migration.ts: addlast_api_call_at INTEGERcolumnsrc/store/compaction-telemetry-store.ts: addlastApiCallAtfield throughoutopenclaw.plugin.json: addcacheTTLSecondsto schematest/engine.test.ts: remove 11 tests for deleted afterTurn compaction behaviortest/config.test.ts: update assertionsTesting
All tests pass. Soak-tested in production: zero compactions during active use (turns 30–200s apart), compaction fires correctly after genuine cache expiry (idle > 300s), per-conversation TTL tracking confirmed working across multiple concurrent sessions.