fix: make cache-aware compaction resilient to routing noise #362
Conversation
Pull request overview
This PR hardens cache-aware incremental compaction against transient prompt-cache “cold” readings (e.g., OpenRouter routing noise) by making cache state decisions sticky and requiring multiple cold observations before treating cache as truly cold.
Changes:
- Adds `cacheAwareCompaction.coldCacheObservationThreshold` config (default `3`) and env support (`LCM_COLD_CACHE_OBSERVATION_THRESHOLD`).
- Persists a `consecutiveColdObservations` counter in compaction telemetry (DB + store + migration) and incorporates it into cache-state resolution.
- Extends docs/plugin schema and adds/updates targeted tests for the new sticky behavior.
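Env resolution for the new threshold could look roughly like the sketch below; the helper name, validation rules, and precedence are assumptions, and the real logic lives in `src/db/config.ts`:

```typescript
// Hedged sketch of env-driven resolution for the new option; helper name,
// validation, and precedence are illustrative, not the actual src/db/config.ts code.
function resolveColdCacheObservationThreshold(
  configValue: number | undefined,
  env: Record<string, string | undefined>,
): number {
  const raw = env["LCM_COLD_CACHE_OBSERVATION_THRESHOLD"];
  if (raw !== undefined) {
    const parsed = Number.parseInt(raw, 10);
    // Accept only sane positive integers from the environment.
    if (Number.isInteger(parsed) && parsed >= 1) return parsed;
  }
  return configValue ?? 3; // documented default
}
```

Under this assumed precedence, a valid env var overrides the config file value, and both fall back to the documented default of `3`.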
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/engine.ts` | Implements sticky cache-state resolution and updates telemetry with a cold-observation counter. |
| `src/store/compaction-telemetry-store.ts` | Extends telemetry record + upsert/select to persist `consecutive_cold_observations`. |
| `src/db/migration.ts` | Adds DB column migration and updates initial schema for the new telemetry field. |
| `src/db/config.ts` | Adds config type + resolution for `coldCacheObservationThreshold` and env var parsing. |
| `docs/configuration.md` | Documents the new cache-aware compaction parameter and env var. |
| `openclaw.plugin.json` | Exposes the new config option in UI hints and JSON schema. |
| `skills/lossless-claw/references/config.md` | Adds reference documentation for the new cache threshold setting. |
| `test/config.test.ts` | Updates config resolution/schema expectations to include the new parameter. |
| `test/engine.test.ts` | Adds focused tests validating sticky hot behavior and cold-threshold behavior. |
| `test/expansion.test.ts` | Updates shared test config to include the new cache-aware parameter. |
| `test/circuit-breaker.test.ts` | Updates test config to include the new cache-aware parameter. |
| `test/session-operation-queues.test.ts` | Updates test config to include the new cache-aware parameter. |
```ts
: snapshot?.cacheState === "hot"
  ? 0
  : snapshot?.cacheState === "cold"
    ? (existing?.consecutiveColdObservations ?? 0) + 1
    : existing?.consecutiveColdObservations ?? 0;
```
`consecutiveColdObservations` is described/treated as a "consecutive" streak, but this update logic does not reset the counter on `snapshot.cacheState === "unknown"` (it preserves the prior value). That means a pattern like `cold → unknown → cold` will still advance toward the threshold even though the cold observations were not consecutive. Consider either resetting the counter on `unknown` (to enforce true consecutiveness) or renaming/adjusting the config + docs to reflect that `unknown` observations do not break the streak.
Suggested change:

```diff
 : snapshot?.cacheState === "hot"
   ? 0
   : snapshot?.cacheState === "cold"
     ? (existing?.consecutiveColdObservations ?? 0) + 1
-    : existing?.consecutiveColdObservations ?? 0;
+    : 0;
```
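The reviewer's scenario can be reproduced with a minimal model of the update rule above (function and variable names are illustrative):

```typescript
type CacheState = "hot" | "cold" | "unknown";

// Mirrors the ternary under review: hot resets the counter, cold increments it,
// and unknown preserves the prior value -- the behavior being questioned.
function nextColdCount(state: CacheState, prev: number): number {
  return state === "hot" ? 0 : state === "cold" ? prev + 1 : prev;
}

// cold -> unknown -> cold still reaches 2 toward the threshold,
// even though the cold observations were not strictly consecutive.
const states: CacheState[] = ["cold", "unknown", "cold"];
const streak = states.reduce((n, s) => nextColdCount(s, n), 0);
```

With the suggested change applied (`: 0` on the unknown branch), the same sequence would end at `1` instead, enforcing strict consecutiveness.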
| Option | Type | Default | Env var | Description |
|---|---|---|---|---|
| `cacheAwareCompaction.maxColdCacheCatchupPasses` | `integer` | `2` | `LCM_MAX_COLD_CACHE_CATCHUP_PASSES` | Maximum bounded catch-up passes allowed in one maintenance cycle when cache telemetry is cold. |
| `cacheAwareCompaction.hotCachePressureFactor` | `number` | `4` | `LCM_HOT_CACHE_PRESSURE_FACTOR` | Multiplier applied to the hot-cache leaf trigger before raw-history pressure overrides cache preservation. |
| `cacheAwareCompaction.hotCacheBudgetHeadroomRatio` | `number` | `0.2` | `LCM_HOT_CACHE_BUDGET_HEADROOM_RATIO` | Minimum fraction of the real token budget that must remain free before hot-cache incremental compaction is skipped entirely. |
| `cacheAwareCompaction.coldCacheObservationThreshold` | `integer` | `3` | `LCM_COLD_CACHE_OBSERVATION_THRESHOLD` | Consecutive cold observations required before non-explicit cache misses are treated as truly cold. This dampens one-off routing noise and provider failover blips. |
This description says "Consecutive cold observations". In `src/engine.ts`, `consecutiveColdObservations` currently carries across `unknown` observations (`unknown` does not reset the streak), so the behavior is closer to "cold observations since last hot" than strict consecutiveness. Please clarify the wording here (and in the other user-facing descriptions) or adjust the implementation to match the documented semantics.
Suggested change:

```diff
-| `cacheAwareCompaction.coldCacheObservationThreshold` | `integer` | `3` | `LCM_COLD_CACHE_OBSERVATION_THRESHOLD` | Consecutive cold observations required before non-explicit cache misses are treated as truly cold. This dampens one-off routing noise and provider failover blips. |
+| `cacheAwareCompaction.coldCacheObservationThreshold` | `integer` | `3` | `LCM_COLD_CACHE_OBSERVATION_THRESHOLD` | Cold observations required since the last hot observation before non-explicit cache misses are treated as truly cold. Unknown observations do not reset the count, which dampens one-off routing noise and provider failover blips. |
```
… branching

Remove hot-cache-budget-headroom, hot-cache-defer, and cold-cache-catchup branches from `evaluateIncrementalCompaction()`. These cache-state-dependent decisions in afterTurn cause compaction cascades on load-balanced providers where random cold readings trigger aggressive multi-pass compaction.

After this change, afterTurn compaction uses:

- budget-trigger: safety valve, always fires (unchanged)
- leaf-trigger: single pass, no condensed passes, no cache-state branching

Closes Martian-Engineering#367. Related: Martian-Engineering#358, Martian-Engineering#362.
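A rough sketch of the simplified afterTurn decision described above; `evaluateIncrementalCompaction` is the name from the commit, but the input types, thresholds, and envelope fields are illustrative assumptions:

```typescript
interface TurnStats {
  totalTokens: number; // hypothetical inputs; the real signature differs
  leafTokens: number;
}

interface CompactionPlan {
  trigger: "budget" | "leaf" | null;
  maxPasses: number;
  allowCondensedPasses: boolean;
}

// After removing the cache-state branches, only two triggers remain.
function evaluateIncrementalCompaction(
  stats: TurnStats,
  budgetLimit: number,
  leafLimit: number,
): CompactionPlan {
  if (stats.totalTokens >= budgetLimit) {
    // budget-trigger: safety valve, always fires (envelope values illustrative)
    return { trigger: "budget", maxPasses: 1, allowCondensedPasses: true };
  }
  if (stats.leafTokens >= leafLimit) {
    // leaf-trigger: single pass, no condensed passes, no cache-state branching
    return { trigger: "leaf", maxPasses: 1, allowCondensedPasses: false };
  }
  return { trigger: null, maxPasses: 0, allowCondensedPasses: false };
}
```

The point of the sketch is structural: no branch consults cache state, so a random cold reading cannot change the plan.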
Add the missing `coldCacheObservationThreshold` entry to the bundled lossless-claw config reference so the summary defaults match the runtime defaults and detailed option docs already added on this branch.

Regeneration-Prompt: |
  Address the accepted PR review finding for PR Martian-Engineering#362 about the bundled lossless-claw config reference being internally inconsistent. Keep the change additive and minimal: do not alter runtime behavior or broader docs, just update the `cacheAwareCompaction` "Good defaults" summary in `skills/lossless-claw/references/config.md` so it includes the new `coldCacheObservationThreshold` default of 3, matching the detailed reference section and existing runtime config defaults.
Force-pushed from f8efd9a to 7b27367.
I traced the interaction between this PR and #408 on current `main`.

**Background: the afterTurn timing problem**

This was the central insight behind #385 (later adopted by #408). #408 solved this correctly: `afterTurn` only records debt; actual consumption is gated by wall-clock TTL.

**Why #362 is unnecessary under #408**

#362's stated problem is that "a single `cacheRead=0` can drop a previously hot session into cold-cache catch-up." Under #408's model, this problem doesn't exist:

The TTL gate is inherently blip-proof. It doesn't matter if you see 1 cold observation or 10: the gate checks the clock, not observation counts. A noisy cold read never reaches execution because the cache hasn't actually expired. So the "routing noise" failure mode #362 addresses no longer exists after #408. There is nothing left for observation-count smoothing to fix.

**Why #362 is actively harmful with #408**

#362 modifies …

#362 pre-empts #408's time-aware mechanism with a time-unaware one. It suppresses the signal that #408 would have handled correctly.

**The deeper issue: wrong primitive**

Cache liveness is a function of wall-clock time, not observation count:

#408 uses the right primitive (wall-clock TTL). #362 uses the wrong one (consecutive observation counts) and disrupts #408's pipeline in the process.

**Suggestion**

Consider reverting #362's …
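The wall-clock argument in this comment can be modeled minimally; the types, field names, and the TTL value below are assumptions for illustration, not #408's actual code:

```typescript
interface MaintenanceDebt {
  recordedAtMs: number; // when afterTurn recorded the debt
}

// The gate consults the clock only: debt becomes consumable once the
// prompt-cache TTL has elapsed, regardless of how many cold reads occurred.
function canConsumeDebt(
  debt: MaintenanceDebt,
  nowMs: number,
  cacheTtlMs: number,
): boolean {
  return nowMs - debt.recordedAtMs >= cacheTtlMs;
}

const ttlMs = 5 * 60_000; // assumed 5-minute prompt-cache TTL
const debt: MaintenanceDebt = { recordedAtMs: 0 };
const blockedDuringTtl = canConsumeDebt(debt, 10_000, ttlMs); // noisy cold read at t=10s changes nothing
const allowedAfterTtl = canConsumeDebt(debt, 6 * 60_000, ttlMs);
```

No observation count appears anywhere in the gate, which is the sense in which it is "inherently blip-proof."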
…che smoothing

Narrow fix for the interaction between #362 (cache-aware routing noise protection) and #408 (deferred proactive compaction).

The issue: when incremental evaluation returns 'no compaction needed' due to hot-cache budget-headroom or hot-cache-defer (routing-noise suppression), deferred Anthropic sessions were incorrectly terminating their maintenance debt instead of continuing to execute after the prompt-cache TTL had expired.

The fix: check whether deferred Anthropic sessions should override the cache-aware 'no compaction' decision once the TTL-safe hold has elapsed. If so, allow leaf compaction to execute despite the routing-noise hysteresis. Inline and non-deferred paths keep existing behavior unchanged.

- Adds shouldForceDeferredAnthropicLeafCompaction() to gate override
- Routes deferred Anthropic leaf compaction through executeLeafCompactionCore when TTL is stale
- Adds regression test: 'assemble() still executes deferred Anthropic leaf debt after TTL expiry when cache smoothing remains effectively hot'
- No changes to routing-noise protection or incremental evaluation logic

Fixes: #408
Related: #362
…che smoothing (#434)

* fix: execute deferred Anthropic leaf debt after TTL expiry despite cache smoothing

  Narrow fix for the interaction between #362 (cache-aware routing noise protection) and #408 (deferred proactive compaction).

  The issue: when incremental evaluation returns 'no compaction needed' due to hot-cache budget-headroom or hot-cache-defer (routing-noise suppression), deferred Anthropic sessions were incorrectly terminating their maintenance debt instead of continuing to execute after the prompt-cache TTL had expired.

  The fix: check whether deferred Anthropic sessions should override the cache-aware 'no compaction' decision once the TTL-safe hold has elapsed. If so, allow leaf compaction to execute despite the routing-noise hysteresis. Inline and non-deferred paths keep existing behavior unchanged.

  - Adds shouldForceDeferredAnthropicLeafCompaction() to gate override
  - Routes deferred Anthropic leaf compaction through executeLeafCompactionCore when TTL is stale
  - Adds regression test: 'assemble() still executes deferred Anthropic leaf debt after TTL expiry when cache smoothing remains effectively hot'
  - No changes to routing-noise protection or incremental evaluation logic

  Fixes: #408
  Related: #362

* fix: use catch-up settings for stale TTL deferred Anthropic debt

  When deferred Anthropic leaf debt overrides hot-cache smoothing after the prompt-cache TTL expires, run the leaf compaction with the cold-cache catch-up envelope instead of the original hot-cache single-pass settings. This preserves the intended deferred recovery behavior and prevents the maintenance record from being cleared after a single underpowered pass.

  Add a regression that proves the stale-TTL override enables catch-up execution rather than reusing the hot-cache defer envelope, and update the existing stale-TTL test to assert the new execution parameters.

  Regeneration-Prompt: |
    Address the review finding on PR #434 in lossless-claw. The stale-TTL deferred Anthropic compaction fix should not merely force one hot-cache-sized leaf compaction pass and then clear the maintenance debt. Keep the change narrow and additive inside the deferred-compaction path. When the Anthropic prompt-cache TTL has expired and deferred debt must override cache smoothing, execute with the same catch-up envelope that cold-cache recovery uses, especially the extra pass allowance and condensed-pass setting. Add regression coverage that would fail if the forced path still used maxPasses=1 or allowCondensedPasses=false, and update any stale expectations in the existing TTL-expiry test.
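`shouldForceDeferredAnthropicLeafCompaction()` is named in the commit message; the sketch below shows one plausible shape for such a gate, with every type, field, and value an assumption rather than the actual implementation:

```typescript
interface DeferredDebt {
  provider: string;
  recordedAtMs: number;
}

// Hypothetical gate: only deferred Anthropic debt whose prompt-cache TTL
// has gone stale may override a cache-smoothing "no compaction" decision.
function shouldForceDeferredAnthropicLeafCompaction(
  debt: DeferredDebt | undefined,
  nowMs: number,
  cacheTtlMs: number,
): boolean {
  if (!debt || debt.provider !== "anthropic") return false;
  return nowMs - debt.recordedAtMs >= cacheTtlMs;
}
```

When such a gate returns true, the deferred path would run with the cold-cache catch-up envelope rather than clearing the debt after one hot-cache-sized pass, per the second commit above.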
Summary
This fixes the cache-noise failure mode behind #358.
Current `main` treats a single cold observation as authoritative unless a very short hot-cache hysteresis window happens to save the session. That is too aggressive for provider failover and OpenRouter routing noise: a single `cacheRead=0` can drop a previously hot session into cold-cache catch-up and trigger destructive compaction that was not justified by real memory pressure.

This PR makes cache-aware compaction state sticky instead of single-sample authoritative.
What changed
- New config: `cacheAwareCompaction.coldCacheObservationThreshold` (default `3`)
- New `consecutive_cold_observations` counter persisted in compaction telemetry
- Hot cache state now absorbs `cold` and `unknown` blips until the cold-observation threshold is reached
The conservative failure mode for cache-aware compaction should be “preserve context until we are confident the cache is really cold,” not “compact aggressively because one read looked cold.”
This is especially important on OpenRouter, where identical prefixes can still produce transient cold reads because requests land on different backend instances.
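A small simulation of the sticky behavior under a single routing-noise blip (an illustrative model, not the engine's actual resolution code):

```typescript
type Obs = "hot" | "cold" | "unknown";

// Resolve an effective cache state per observation: hot resets the cold
// counter, cold increments it, unknown preserves it; the state flips to
// cold only once the counter reaches the threshold.
function resolveStates(obs: Obs[], threshold: number): ("hot" | "cold")[] {
  let coldCount = 0;
  return obs.map((o) => {
    coldCount = o === "hot" ? 0 : o === "cold" ? coldCount + 1 : coldCount;
    return coldCount >= threshold ? "cold" : "hot";
  });
}

// One transient cold read in an otherwise hot OpenRouter session:
const noisy: Obs[] = ["hot", "hot", "cold", "hot", "hot"];
const sticky = resolveStates(noisy, 3); // threshold 3: never flips to cold
const naive = resolveStates(noisy, 1);  // single-sample: flips on the blip
```

With the default threshold of 3, the blip never reaches the cold state, so no cold-cache catch-up is triggered; single-sample resolution flips immediately.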
Tests
```sh
pnpm exec vitest run test/config.test.ts test/engine.test.ts test/session-operation-queues.test.ts test/circuit-breaker.test.ts test/expansion.test.ts
```

Adds focused coverage for the new sticky hot behavior and cold-threshold behavior.
Closes #358.