Cache-aware compaction makes compaction decisions at the wrong time. The current design evaluates compaction in afterTurn() using the cache status of the call that just completed. This creates a fundamental timing inversion: the cache state after a call is the inverse of what the code assumes.
The Timing Problem
When a call returns cacheRead=0 (cold):
- LCM records cacheState=cold → triggers aggressive cold-cache-catchup (double pass)
- But the provider just wrote the prefix to cache on that call
- The cache is now HOT for the next ~5 minutes
- Compacting now destroys a freshly written cache — the worst possible time
When a call returns cacheRead=119K (hot):
- LCM records cacheState=hot → defers compaction
- Correct in the moment, but the cache has a ~5 minute TTL
- If the session goes idle for >5 min, the cache expires naturally
- But afterTurn() never fires during idle periods — no compaction happens when it would be free
In both cases, the decision is backwards:
- Cold reading → the cache is actually hot now → should NOT compact
- Hot reading during idle → the cache will expire soon → COULD compact for free, but doesn't
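The inversion can be made concrete with a small sketch (the names here are illustrative, not the codebase's actual API): immediately after any call completes, the prefix cache is hot, regardless of what that call's cacheRead reported.

```typescript
// Hypothetical sketch: what the cache actually is right after a call,
// versus what the current code infers from that call's cacheRead.
type CacheReading = 'cold' | 'hot';

// Current model: a cold reading is taken to mean "cache is cold now,
// compact aggressively".
function assumedStateAfterCall(reading: CacheReading): CacheReading {
  return reading;
}

// Reality: a cold reading means the provider just wrote the prefix on
// that call; a hot reading means the read refreshed the TTL. Either way
// the cache is hot for roughly the next TTL window.
function actualStateAfterCall(_reading: CacheReading): CacheReading {
  return 'hot';
}
```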
Current Code Path
afterTurn() fires immediately after the API response
  → updateCompactionTelemetry(cacheState from THIS call)
  → evaluateIncrementalCompaction()
      if cold → aggressive catchup (2 passes, condensed)  ← WRONG: the cache was just written
      if hot  → defer                                     ← right now, but misses the idle window
The evaluation is synchronous with API calls. But the optimal compaction window is asynchronous — during idle periods when the cache has already expired.
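As a rough sketch, the current synchronous path behaves like the function below (a simplification with assumed names approximating afterTurn()/evaluateIncrementalCompaction(), not the real implementation):

```typescript
type Decision = 'cold-cache-catchup' | 'defer';

// Assumed simplification of the current per-call evaluation.
function evaluateAfterTurn(cacheReadTokens: number): Decision {
  const cacheState = cacheReadTokens === 0 ? 'cold' : 'hot';
  if (cacheState === 'cold') {
    // Aggressive double-pass catchup. But the provider wrote the prefix
    // to cache on the very call that reported cold, so this rewrites a
    // freshly hot cache.
    return 'cold-cache-catchup';
  }
  // Defer. Correct at this instant, but since this only runs right after
  // an API response, the free window after the ~5 min TTL expires during
  // idle is never observed.
  return 'defer';
}
```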
Proposed Architecture
Instead of deciding compaction based on last-call cache status:
- After any call: schedule a deferred compaction for now + cacheTTL (~5 min)
- If another call happens before the timer fires: cancel and reschedule (the cache was just refreshed)
- On timer expiry with no intervening call: compact freely — the cache has expired naturally, so rewriting the prefix costs nothing
- Budget trigger: still fires immediately regardless (hard safety limit)
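The schedule/cancel/reschedule cycle above can be sketched as a small scheduler. Everything here is an illustrative assumption (class and method names, the tick-based design); a production version would more likely use a resettable setTimeout than polling.

```typescript
const CACHE_TTL_MS = 5 * 60 * 1000; // assumed ~5-minute provider cache TTL

class DeferredCompactionScheduler {
  private lastCallAt: number | null = null;
  private compactedThisWindow = false;

  // Call after every API response: the call either read the cache
  // (refreshing its TTL) or wrote it, so the idle window starts over.
  recordCall(nowMs: number): void {
    this.lastCallAt = nowMs;
    this.compactedThisWindow = false;
  }

  // Call on a periodic tick (or from the timer callback). Returns true
  // exactly once per idle window, once the cache has expired naturally
  // and rewriting the prefix costs nothing.
  shouldCompact(nowMs: number): boolean {
    if (this.lastCallAt === null || this.compactedThisWindow) return false;
    if (nowMs - this.lastCallAt >= CACHE_TTL_MS) {
      this.compactedThisWindow = true;
      return true;
    }
    return false;
  }
}
```

The budget trigger stays outside this path: it fires immediately when the hard limit is hit, and only the cache-aware deferral goes through the scheduler.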
This inverts the current model:
- Current: "The last call was cold, so compact aggressively" (wrong — the cache is now hot)
- Proposed: "No call for 5 minutes, so the cache has expired — compact now for free" (correct)
Benefits
- Compaction always happens when the cache is genuinely cold (expired), not right after it was written
- No sensitivity to routing noise — OR load balancing becomes irrelevant, since decisions no longer react to per-call cache status
Relationship to Existing Work
Complexity
This is a bigger change than #362. It requires: