cache-aware compaction: timing inversion — decisions based on last-call status instead of cache expiry (#367)

## Summary

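Cache-aware compaction makes compaction decisions at the wrong time. The current design evaluates compaction in `afterTurn()` using the cache status reported by the call that just completed. This creates a fundamental timing inversion: `cacheRead` describes the cache as it was *before* that call, but the compaction decision affects the calls that come *after* it. In the cold case, the cache state after the call is the inverse of what the code assumes.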

## The Timing Problem

When a call returns `cacheRead=0` (cold):
- LCM records `cacheState=cold` → triggers aggressive `cold-cache-catchup` (double pass)
- But the provider just *wrote* the prefix to cache on that call
- **The cache is now HOT for the next ~5 minutes**
- Compacting now invalidates the freshly written cache, which is the worst possible time to do it

When a call returns `cacheRead=119K` (hot):
- LCM records `cacheState=hot` → defers compaction
- Correct in the moment, but the cache has a ~5-minute TTL
- If the session goes idle for >5 min, the cache expires naturally
- But `afterTurn()` never fires during idle periods — no compaction happens when it would be free

**In both cases, the decision is backwards:**
- Cold reading → cache is actually hot now → should NOT compact
- Hot reading during idle → cache will expire soon → COULD compact for free, but doesn't
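To make the inversion concrete, here is a minimal sketch of deciding cache state from elapsed time instead of from the last call's `cacheRead`. TypeScript is an assumption about the codebase, and `cacheStateForNextCall`, `LastCall`, and the 5-minute default are hypothetical names and values for illustration. It relies on the fact that Anthropic refreshes the cache lifetime whenever the cached prefix is used, so a cold call (which writes the prefix) and a hot call (which reads and refreshes it) both leave the prefix cached until roughly `completedAt + ttlMs`.

```typescript
type CacheState = "hot" | "cold";

// Assumed default TTL (~5 minutes, per the issue).
const DEFAULT_CACHE_TTL_MS = 5 * 60 * 1000;

interface LastCall {
  completedAt: number; // epoch ms when the last API call returned (approximates provider-side refresh)
  cacheRead: number;   // tokens read from cache on that call (0 = cold read)
}

/**
 * Hypothetical helper: the cache state the NEXT call will see.
 * cacheRead is intentionally ignored: a cold call writes the prefix and a
 * hot call refreshes it, so either way the prefix stays cached until the TTL lapses.
 */
function cacheStateForNextCall(
  last: LastCall | undefined,
  now: number,
  ttlMs: number = DEFAULT_CACHE_TTL_MS,
): CacheState {
  if (last === undefined) return "cold"; // nothing has been cached yet
  return now - last.completedAt < ttlMs ? "hot" : "cold";
}

const t0 = 1_000_000;
// A cold read one second ago means the cache is hot now.
console.log(cacheStateForNextCall({ completedAt: t0, cacheRead: 0 }, t0 + 1_000)); // → hot
// A hot read followed by six idle minutes means the cache has expired.
console.log(cacheStateForNextCall({ completedAt: t0, cacheRead: 119_000 }, t0 + 6 * 60 * 1000)); // → cold
```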

## Current Code Path

```
afterTurn() fires immediately after API response
  → updateCompactionTelemetry(cacheState from THIS call)
  → evaluateIncrementalCompaction()
      if cold → aggressive catchup (2 passes, condensed)  ← WRONG: cache just got written
      if hot  → defer                                      ← RIGHT now, but misses idle window
```

The evaluation runs synchronously with API calls, but the optimal compaction window is asynchronous: it falls in idle periods, after the cache has already expired.

## Proposed Architecture

Instead of deciding compaction based on last-call cache status:

1. **After any call:** schedule a deferred compaction for `now + cacheTTL` (~5 min)
2. **If another call happens before the timer:** cancel and reschedule (cache got refreshed)
3. **On timer expiry with no intervening call:** compact freely — the cache has expired naturally, so rewriting the prefix costs nothing
4. **Budget trigger:** still fires immediately regardless (hard safety limit)
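The four steps above amount to a per-session debounce timer. The sketch below is one way it could look, assuming a TypeScript codebase; `DeferredCompactionScheduler`, `onCallCompleted`, `TimerApi`, and the `compact` callback are hypothetical names, not an existing LCM API. Timers are injected so the logic can be exercised without waiting out a real TTL.

```typescript
type CompactReason = "cache-expired" | "budget";

// Injectable timer functions (hypothetical) so tests can fire timers manually.
interface TimerApi {
  set(fn: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const realTimers: TimerApi = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

interface SchedulerOptions {
  cacheTtlMs: number;   // provider cache TTL, e.g. 5 * 60 * 1000
  budgetTokens: number; // hard limit: compact immediately at or above this
  compact: (sessionId: string, reason: CompactReason) => void;
  timers?: TimerApi;
}

class DeferredCompactionScheduler {
  private readonly pending = new Map<string, unknown>();
  private readonly opts: SchedulerOptions;
  private readonly timers: TimerApi;

  constructor(opts: SchedulerOptions) {
    this.opts = opts;
    this.timers = opts.timers ?? realTimers;
  }

  /** Call after every API call, regardless of its cacheRead value. */
  onCallCompleted(sessionId: string, contextTokens: number): void {
    // Step 2: this call refreshed the cache, so any pending timer is stale.
    this.cancel(sessionId);

    // Step 4: the budget trigger still fires immediately.
    if (contextTokens >= this.opts.budgetTokens) {
      this.opts.compact(sessionId, "budget");
      return;
    }

    // Step 1: schedule compaction for the moment the cache expires.
    const handle = this.timers.set(() => {
      // Step 3: no call arrived within the TTL, so the cache is cold.
      this.pending.delete(sessionId);
      this.opts.compact(sessionId, "cache-expired");
    }, this.opts.cacheTtlMs);
    this.pending.set(sessionId, handle);
  }

  hasPending(sessionId: string): boolean {
    return this.pending.has(sessionId);
  }

  cancel(sessionId: string): void {
    if (this.pending.has(sessionId)) {
      this.timers.clear(this.pending.get(sessionId));
      this.pending.delete(sessionId);
    }
  }
}
```

In this sketch a budget-triggered compaction does not re-arm the idle timer, on the assumption that the context was just compacted; whether to re-arm is a design choice. A real integration would presumably call `onCallCompleted` from `afterTurn()` and run `compact` on the background path from #363.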

This inverts the current model:
- **Current:** "The last call was cold, so compact aggressively" (wrong — cache is now hot)
- **Proposed:** "No call for 5 minutes, so cache expired — compact now for free" (correct)

## Benefits

- Compaction always happens when the cache is genuinely cold (expired), not when it was just written
- No routing noise sensitivity — OR load balancing becomes irrelevant since we're not reacting to per-call status
- No need for sticky counters or hysteresis heuristics (#362) — the timer handles it cleanly
- Idle sessions get compacted (currently they never are, since `afterTurn()` never fires while idle)
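Because the timer model only looks at call timestamps, per-call noise in `cacheRead` (for example from OR load balancing) cannot change when compaction fires. The hypothetical pure function below (`idleCompactionTimes` and `CallRecord` are illustrative names) computes the idle-compaction instants for a call history; two histories with identical timestamps but different `cacheRead` values yield the same schedule.

```typescript
interface CallRecord {
  at: number;        // epoch ms when the call completed
  cacheRead: number; // tokens read from cache; noisy under provider routing
}

/** Returns the times at which idle (cache-expired) compaction would fire. */
function idleCompactionTimes(calls: CallRecord[], ttlMs: number): number[] {
  const sorted = [...calls].sort((a, b) => a.at - b.at);
  const fires: number[] = [];
  for (let i = 0; i < sorted.length; i++) {
    const expiry = sorted[i].at + ttlMs;
    const next = sorted[i + 1];
    // The timer survives only if no later call arrives before it expires.
    // Note that cacheRead is never consulted.
    if (next === undefined || next.at >= expiry) fires.push(expiry);
  }
  return fires;
}

const ttl = 5 * 60 * 1000;
const times = [0, 60_000, 600_000];
const noisy = times.map((at, i) => ({ at, cacheRead: i % 2 === 0 ? 0 : 119_000 }));
const steady = times.map((at) => ({ at, cacheRead: 119_000 }));
console.log(idleCompactionTimes(noisy, ttl));  // → [ 360000, 900000 ]
console.log(idleCompactionTimes(steady, ttl)); // → [ 360000, 900000 ]
```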

## Relationship to Existing Work

- **#289, #306, #329** — original cache-aware compaction implementation. This issue proposes a fundamental redesign of the timing model.
- **#358** — OR routing noise cascade. A symptom of the timing inversion described here.
- **#362** — sticky cold counter. A mitigation that reduces the symptom but doesn't address the root cause.
- **#363** — moves proactive compaction off the reply path. Complementary: it is still triggered by `afterTurn()`, but no longer blocks it.

## Complexity

This is a bigger change than #362. It requires:
- A per-session timer/scheduler (could piggyback on existing background compaction from #363)
- Cancellation logic when new calls arrive
- Cache TTL awareness (Anthropic: 5 min default, 1 hour extended)
- Integration with the existing budget-trigger hard limit
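For the TTL-awareness item, one option is a small lookup from cache tier to timer delay. This is a sketch under assumptions: `CacheTtlTier`, `CACHE_TTL_MS`, and `compactionDelayMs` are hypothetical names, the two values are the Anthropic TTLs listed above, and the safety margin is an added assumption meant to absorb clock skew so the timer never fires before the provider-side expiry.

```typescript
type CacheTtlTier = "5m" | "1h";

// Anthropic prompt-cache TTLs: 5 min default, 1 hour extended.
const CACHE_TTL_MS: Record<CacheTtlTier, number> = {
  "5m": 5 * 60 * 1000,
  "1h": 60 * 60 * 1000,
};

/** Delay before idle compaction; the margin is a hypothetical default. */
function compactionDelayMs(tier: CacheTtlTier, safetyMarginMs = 15_000): number {
  return CACHE_TTL_MS[tier] + safetyMarginMs;
}

console.log(compactionDelayMs("5m"));    // → 315000
console.log(compactionDelayMs("1h", 0)); // → 3600000
```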

