Skip to content

cache-aware compaction: timing inversion — decisions based on last-call status instead of cache expiry #367

@liu51115

Description

@liu51115

Summary

Cache-aware compaction makes compaction decisions at the wrong time. The current design evaluates compaction in afterTurn() using the cache status of the call that just completed. This creates a fundamental timing inversion: the cache state after a call is the inverse of what the code assumes.

The Timing Problem

When a call returns cacheRead=0 (cold):

  • LCM records cacheState=cold → triggers aggressive cold-cache-catchup (double pass)
  • But the provider just wrote the prefix to cache on that call
  • The cache is now HOT for the next ~5 minutes
  • Compacting now destroys a freshly-written cache — the worst possible time

When a call returns cacheRead=119K (hot):

  • LCM records cacheState=hot → defers compaction
  • Correct in the moment, but the cache has a ~5 minute TTL
  • If the session goes idle for >5 min, the cache expires naturally
  • But afterTurn() never fires during idle periods — no compaction happens when it would be free

In both cases, the decision is backwards:

  • Cold reading → cache is actually hot now → should NOT compact
  • Hot reading during idle → cache will expire soon → COULD compact for free, but doesn't

Current Code Path

afterTurn() fires immediately after API response
  → updateCompactionTelemetry(cacheState from THIS call)
  → evaluateIncrementalCompaction()
      if cold → aggressive catchup (2 passes, condensed)  ← WRONG: cache just got written
      if hot  → defer                                      ← RIGHT now, but misses idle window

The evaluation is synchronous with API calls. But the optimal compaction window is asynchronous — during idle periods when the cache has already expired.

Proposed Architecture

Instead of deciding compaction based on last-call cache status:

  1. After any call: schedule a deferred compaction for now + cacheTTL (~5 min)
  2. If another call happens before the timer: cancel and reschedule (cache got refreshed)
  3. On timer expiry with no intervening call: compact freely — the cache has expired naturally, so rewriting the prefix costs nothing
  4. Budget trigger: still fires immediately regardless (hard safety limit)

This inverts the current model:

  • Current: "The last call was cold, so compact aggressively" (wrong — cache is now hot)
  • Proposed: "No call for 5 minutes, so cache expired — compact now for free" (correct)

Benefits

  • Compaction always happens when the cache is genuinely cold (expired), not when it was just written
  • No routing noise sensitivity — OR load balancing becomes irrelevant since we're not reacting to per-call status
  • No need for sticky counters or hysteresis heuristics (fix: make cache-aware compaction resilient to routing noise #362) — the timer handles it cleanly
  • Idle sessions get compacted (currently they never do, since afterTurn never fires)

Relationship to Existing Work

Complexity

This is a bigger change than #362. It requires:

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions