
fix: make cache-aware compaction resilient to routing noise #362

Merged
jalehman merged 2 commits into Martian-Engineering:main from electricsheephq:codex/cache-noise-guard
Apr 13, 2026

Conversation

@100yenadmin
Contributor

Summary

This fixes the cache-noise failure mode behind #358.

Current main treats a single cold observation as authoritative unless a very short hot-cache hysteresis window happens to save the session. That is too aggressive for provider failover and OpenRouter routing noise: a single cacheRead=0 can drop a previously hot session into cold-cache catch-up and trigger destructive compaction that was not justified by real memory pressure.

This PR makes cache-aware compaction state sticky instead of single-sample authoritative.

What changed

  • add cacheAwareCompaction.coldCacheObservationThreshold (default 3)
  • persist a consecutive_cold_observations counter in compaction telemetry
  • keep explicit cache breaks authoritative immediately
  • keep prior hot sessions effectively hot through one-off cold and unknown blips until the cold-observation threshold is reached (see the sketch after this list)
  • preserve existing budget-pressure behavior once a session is truly under pressure
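
A minimal sketch of the sticky resolution, assuming a simplified telemetry shape; the helper and type names here are illustrative, not the exact engine code:

```ts
type CacheObservation = "hot" | "cold" | "unknown" | "explicit-break";

interface TelemetrySnapshot {
  consecutiveColdObservations: number;
}

// Explicit breaks are authoritative immediately; one-off cold or unknown blips
// only move a counter, and the session flips to cold at the threshold.
function resolveStickyCacheState(
  observation: CacheObservation,
  prior: TelemetrySnapshot,
  coldCacheObservationThreshold = 3,
): { state: "hot" | "cold"; next: TelemetrySnapshot } {
  if (observation === "explicit-break") {
    return { state: "cold", next: { consecutiveColdObservations: 0 } };
  }
  if (observation === "hot") {
    return { state: "hot", next: { consecutiveColdObservations: 0 } };
  }
  // "cold" advances the counter; "unknown" preserves it.
  const count =
    observation === "cold"
      ? prior.consecutiveColdObservations + 1
      : prior.consecutiveColdObservations;
  return {
    state: count >= coldCacheObservationThreshold ? "cold" : "hot",
    next: { consecutiveColdObservations: count },
  };
}
```

Under this sketch, with the default threshold of 3, a previously hot session stays effectively hot through two isolated cold readings and any number of unknown ones.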

Why this helps

The conservative failure mode for cache-aware compaction should be “preserve context until we are confident the cache is really cold,” not “compact aggressively because one read looked cold.”

This is especially important on OpenRouter, where identical prefixes can still produce transient cold reads because requests land on different backend instances.

Tests

  • pnpm exec vitest run test/config.test.ts test/engine.test.ts test/session-operation-queues.test.ts test/circuit-breaker.test.ts test/expansion.test.ts

Adds focused coverage for:

  • single cold reading after a prior hot streak stays non-authoritative (sketched after this list)
  • repeated cold readings eventually become authoritative
  • unknown/no-usage observations do not collapse hot-cache protection
  • explicit breaks still force cold behavior immediately
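
As a sketch of the first case, assuming the illustrative resolveStickyCacheState helper from the summary above (the module path and test names are hypothetical):

```ts
import { describe, expect, it } from "vitest";
// Hypothetical module path; assumes the resolveStickyCacheState sketch above is exported.
import { resolveStickyCacheState } from "./resolve-sticky-cache-state";

describe("sticky cache-aware state", () => {
  it("keeps a prior hot session hot after a single cold reading", () => {
    // A prior hot streak leaves the cold counter at zero.
    const prior = { consecutiveColdObservations: 0 };

    // One cold blip with the default threshold of 3 must not flip the state.
    const { state, next } = resolveStickyCacheState("cold", prior, 3);
    expect(state).toBe("hot");
    expect(next.consecutiveColdObservations).toBe(1);
  });
});
```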

Closes #358.


Copilot AI left a comment


Pull request overview

This PR hardens cache-aware incremental compaction against transient prompt-cache “cold” readings (e.g., OpenRouter routing noise) by making cache state decisions sticky and requiring multiple cold observations before treating cache as truly cold.

Changes:

  • Adds cacheAwareCompaction.coldCacheObservationThreshold config (default 3) and env support (LCM_COLD_CACHE_OBSERVATION_THRESHOLD); a resolution sketch follows this list.
  • Persists a consecutiveColdObservations counter in compaction telemetry (DB + store + migration) and incorporates it into cache-state resolution.
  • Extends docs/plugin schema and adds/updates targeted tests for the new sticky behavior.
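
A minimal sketch of how that config/env resolution could look; the helper name and the precedence order are assumptions for illustration, not the actual code in src/db/config.ts:

```ts
// Resolve the threshold from explicit config, then the env var, then the default of 3.
// The precedence shown here is an assumption for illustration.
function resolveColdCacheObservationThreshold(
  configured?: number,
  env: NodeJS.ProcessEnv = process.env,
): number {
  if (typeof configured === "number" && Number.isInteger(configured) && configured > 0) {
    return configured;
  }
  const raw = env.LCM_COLD_CACHE_OBSERVATION_THRESHOLD;
  const parsed = raw === undefined ? Number.NaN : Number.parseInt(raw, 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : 3;
}
```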

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

Summary per file

| File | Description |
| --- | --- |
| src/engine.ts | Implements sticky cache-state resolution and updates telemetry with a cold-observation counter. |
| src/store/compaction-telemetry-store.ts | Extends telemetry record + upsert/select to persist consecutive_cold_observations. |
| src/db/migration.ts | Adds DB column migration and updates initial schema for the new telemetry field. |
| src/db/config.ts | Adds config type + resolution for coldCacheObservationThreshold and env var parsing. |
| docs/configuration.md | Documents the new cache-aware compaction parameter and env var. |
| openclaw.plugin.json | Exposes the new config option in UI hints and JSON schema. |
| skills/lossless-claw/references/config.md | Adds reference documentation for the new cache threshold setting. |
| test/config.test.ts | Updates config resolution/schema expectations to include the new parameter. |
| test/engine.test.ts | Adds focused tests validating sticky hot behavior and cold-threshold behavior. |
| test/expansion.test.ts | Updates shared test config to include the new cache-aware parameter. |
| test/circuit-breaker.test.ts | Updates test config to include the new cache-aware parameter. |
| test/session-operation-queues.test.ts | Updates test config to include the new cache-aware parameter. |


Comment thread src/engine.ts
Comment on lines +1804 to +1808
: snapshot?.cacheState === "hot"
? 0
: snapshot?.cacheState === "cold"
? (existing?.consecutiveColdObservations ?? 0) + 1
: existing?.consecutiveColdObservations ?? 0;

Copilot AI Apr 10, 2026


consecutiveColdObservations is described/treated as a "consecutive" streak, but this update logic does not reset the counter on snapshot.cacheState === "unknown" (it preserves the prior value). That means a pattern like cold → unknown → cold will still advance toward the threshold even though the cold observations were not consecutive. Consider either resetting the counter on unknown (to enforce true consecutiveness) or renaming/adjusting the config + docs to reflect that unknown observations do not break the streak.

Suggested change
: snapshot?.cacheState === "hot"
? 0
: snapshot?.cacheState === "cold"
? (existing?.consecutiveColdObservations ?? 0) + 1
: existing?.consecutiveColdObservations ?? 0;
: snapshot?.cacheState === "cold"
? (existing?.consecutiveColdObservations ?? 0) + 1
: 0;

Comment thread docs/configuration.md
| `cacheAwareCompaction.maxColdCacheCatchupPasses` | `integer` | `2` | `LCM_MAX_COLD_CACHE_CATCHUP_PASSES` | Maximum bounded catch-up passes allowed in one maintenance cycle when cache telemetry is cold. |
| `cacheAwareCompaction.hotCachePressureFactor` | `number` | `4` | `LCM_HOT_CACHE_PRESSURE_FACTOR` | Multiplier applied to the hot-cache leaf trigger before raw-history pressure overrides cache preservation. |
| `cacheAwareCompaction.hotCacheBudgetHeadroomRatio` | `number` | `0.2` | `LCM_HOT_CACHE_BUDGET_HEADROOM_RATIO` | Minimum fraction of the real token budget that must remain free before hot-cache incremental compaction is skipped entirely. |
| `cacheAwareCompaction.coldCacheObservationThreshold` | `integer` | `3` | `LCM_COLD_CACHE_OBSERVATION_THRESHOLD` | Consecutive cold observations required before non-explicit cache misses are treated as truly cold. This dampens one-off routing noise and provider failover blips. |

Copilot AI Apr 10, 2026


This description says "Consecutive cold observations". In src/engine.ts, consecutiveColdObservations currently carries across unknown observations (unknown does not reset the streak), so the behavior is closer to "cold observations since last hot" than strict consecutiveness. Please clarify the wording here (and in the other user-facing descriptions) or adjust the implementation to match the documented semantics.

Suggested change
| `cacheAwareCompaction.coldCacheObservationThreshold` | `integer` | `3` | `LCM_COLD_CACHE_OBSERVATION_THRESHOLD` | Consecutive cold observations required before non-explicit cache misses are treated as truly cold. This dampens one-off routing noise and provider failover blips. |
| `cacheAwareCompaction.coldCacheObservationThreshold` | `integer` | `3` | `LCM_COLD_CACHE_OBSERVATION_THRESHOLD` | Cold observations required since the last hot observation before non-explicit cache misses are treated as truly cold. Unknown observations do not reset the count, which dampens one-off routing noise and provider failover blips. |

liu51115 pushed a commit to liu51115/lossless-claw that referenced this pull request Apr 10, 2026
… branching

Remove hot-cache-budget-headroom, hot-cache-defer, and cold-cache-catchup
branches from evaluateIncrementalCompaction(). These cache-state-dependent
decisions in afterTurn cause compaction cascades on load-balanced providers
where random cold readings trigger aggressive multi-pass compaction.

After this change, afterTurn compaction uses:
- budget-trigger: safety valve, always fires (unchanged)
- leaf-trigger: single pass, no condensed passes, no cache-state branching

Closes Martian-Engineering#367. Related: Martian-Engineering#358, Martian-Engineering#362.
@jalehman jalehman self-assigned this Apr 11, 2026
Eva and others added 2 commits April 13, 2026 07:01
Add the missing coldCacheObservationThreshold entry to the bundled
lossless-claw config reference so the summary defaults match the
runtime defaults and detailed option docs already added on this branch.

Regeneration-Prompt: |
  Address the accepted PR review finding for PR Martian-Engineering#362 about the bundled
  lossless-claw config reference being internally inconsistent. Keep the
  change additive and minimal: do not alter runtime behavior or broader
  docs, just update the `cacheAwareCompaction` "Good defaults" summary in
  `skills/lossless-claw/references/config.md` so it includes the new
  `coldCacheObservationThreshold` default of 3, matching the detailed
  reference section and existing runtime config defaults.
@jalehman jalehman force-pushed the codex/cache-noise-guard branch from f8efd9a to 7b27367 on April 13, 2026 at 14:02
@jalehman jalehman merged commit e8c13c5 into Martian-Engineering:main Apr 13, 2026
1 check passed
@liu51115
Contributor

cc @100yenadmin @jalehman

I traced the interaction between this PR and #408 on current main, informed by earlier work on #385 (which #408 superseded). I don't think #362 is needed after #408, and it actively interferes with #408's design.

Background: the afterTurn timing problem

This was the central insight behind #385 (later adopted by #408): afterTurn() fires right after the API call, when the cache was just written. You can't know from afterTurn whether the cache will still be valid at the next assembly — that's a wall-clock question. afterTurn is structurally the wrong place to make compaction decisions.

#408 solved this correctly: afterTurn only records debt; actual consumption is gated by wall-clock TTL (shouldDelayPromptMutatingDeferredCompaction / isPromptCacheStillHot).
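
A minimal sketch of that gating shape; isPromptCacheStillHot is named above, but its body, the TTL constant, and shouldDelayDeferredCompaction are assumptions for illustration:

```ts
// Debt consumption is gated by wall-clock time since the last cache write,
// not by how many cold observations happened to arrive in the meantime.
const ASSUMED_PROMPT_CACHE_TTL_MS = 5 * 60 * 1000; // illustrative TTL, not the configured value

function isPromptCacheStillHot(lastCacheWriteAt: number, now = Date.now()): boolean {
  return now - lastCacheWriteAt < ASSUMED_PROMPT_CACHE_TTL_MS;
}

function shouldDelayDeferredCompaction(lastCacheWriteAt: number): boolean {
  // Hold recorded debt while the cache is provably warm; once the TTL elapses,
  // the next assembly consumes it regardless of routing noise.
  return isPromptCacheStillHot(lastCacheWriteAt);
}
```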

Why #362 is unnecessary under #408

#362's stated problem: "a single cacheRead=0 can trigger destructive compaction."

Under #408's model, this problem doesn't exist:

  1. Cold blip arrives → afterTurn records debt (cheap, no harm)
  2. Next assembly → TTL gate checks wall-clock time → cache still hot → delays consumption
  3. Cache eventually expires → next assembly consumes the debt → compaction runs when it's actually free

The TTL gate is inherently blip-proof. It doesn't matter if you see 1 cold observation or 10 — the gate checks the clock, not observation counts. A noisy cold read never reaches execution because the cache hasn't actually expired.

So the "routing noise" failure mode #362 addresses no longer exists after #408. There is nothing left for observation-count smoothing to fix.

Why #362 is actively harmful with #408

#362 modifies resolveCacheAwareState() to smooth cold observations → keeps state "hot" after blips. This feeds evaluateIncrementalCompaction() in afterTurn, which can then return shouldCompact=false via hot-cache-defer or hot-cache-budget-headroom — preventing debt from being recorded at all.

#362 pre-empts #408's time-aware mechanism with a time-unaware one. It suppresses the signal that #408 would have handled correctly.

The deeper issue: wrong primitive

Cache liveness is a function of wall-clock time, not observation count.

#408 uses the right primitive (wall-clock TTL). #362 uses the wrong one (consecutive observation counts) and disrupts #408's pipeline in the process.

Suggestion

Consider reverting #362's resolveCacheAwareState() changes. #408's TTL-based consumption gating already solves the noise problem, and #362 interferes with it by preventing debt from being recorded.

jalehman added a commit that referenced this pull request Apr 14, 2026
…che smoothing

Narrow fix for the interaction between #362 (cache-aware routing noise
protection) and #408 (deferred proactive compaction). The issue: when
incremental evaluation returns 'no compaction needed' due to hot-cache
budget-headroom or hot-cache-defer (routing-noise suppression), deferred
Anthropic sessions were incorrectly terminating their maintenance debt
instead of continuing to execute after the prompt-cache TTL had expired.

The fix: check whether deferred Anthropic sessions should override the
cache-aware 'no compaction' decision once the TTL-safe hold has elapsed.
If so, allow leaf compaction to execute despite the routing-noise
hysteresis. Inline and non-deferred paths keep existing behavior unchanged.

- Adds shouldForceDeferredAnthropicLeafCompaction() to gate override
- Routes deferred Anthropic leaf compaction through executeLeafCompactionCore
  when TTL is stale
- Adds regression test: 'assemble() still executes deferred Anthropic
  leaf debt after TTL expiry when cache smoothing remains effectively hot'
- No changes to routing-noise protection or incremental evaluation logic

Fixes: #408
Related: #362
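
A minimal sketch of the override gate described in the commit above; only shouldForceDeferredAnthropicLeafCompaction is named there, and the input shape here is an assumption for illustration:

```ts
// Once the prompt-cache TTL is stale, deferred Anthropic leaf debt may override
// a cache-aware "no compaction" decision instead of being dropped.
function shouldForceDeferredAnthropicLeafCompaction(input: {
  provider: string;
  hasDeferredLeafDebt: boolean;
  incrementalDecisionSaysCompact: boolean;
  promptCacheTtlIsStale: boolean;
}): boolean {
  if (input.incrementalDecisionSaysCompact) {
    return false; // the normal path already compacts; no override needed
  }
  return (
    input.provider === "anthropic" &&
    input.hasDeferredLeafDebt &&
    input.promptCacheTtlIsStale
  );
}
```
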
jalehman added a commit that referenced this pull request Apr 14, 2026
…che smoothing (#434)

* fix: execute deferred Anthropic leaf debt after TTL expiry despite cache smoothing

Narrow fix for the interaction between #362 (cache-aware routing noise
protection) and #408 (deferred proactive compaction). The issue: when
incremental evaluation returns 'no compaction needed' due to hot-cache
budget-headroom or hot-cache-defer (routing-noise suppression), deferred
Anthropic sessions were incorrectly terminating their maintenance debt
instead of continuing to execute after the prompt-cache TTL had expired.

The fix: check whether deferred Anthropic sessions should override the
cache-aware 'no compaction' decision once the TTL-safe hold has elapsed.
If so, allow leaf compaction to execute despite the routing-noise
hysteresis. Inline and non-deferred paths keep existing behavior unchanged.

- Adds shouldForceDeferredAnthropicLeafCompaction() to gate override
- Routes deferred Anthropic leaf compaction through executeLeafCompactionCore
  when TTL is stale
- Adds regression test: 'assemble() still executes deferred Anthropic
  leaf debt after TTL expiry when cache smoothing remains effectively hot'
- No changes to routing-noise protection or incremental evaluation logic

Fixes: #408
Related: #362

* fix: use catch-up settings for stale TTL deferred Anthropic debt

When deferred Anthropic leaf debt overrides hot-cache smoothing after the prompt-cache TTL expires, run the leaf compaction with the cold-cache catch-up envelope instead of the original hot-cache single-pass settings. This preserves the intended deferred recovery behavior and prevents the maintenance record from being cleared after a single underpowered pass.

Add a regression that proves the stale-TTL override enables catch-up execution rather than reusing the hot-cache defer envelope, and update the existing stale-TTL test to assert the new execution parameters.

Regeneration-Prompt: |
  Address the review finding on PR #434 in lossless-claw. The stale-TTL deferred Anthropic compaction fix should not merely force one hot-cache-sized leaf compaction pass and then clear the maintenance debt. Keep the change narrow and additive inside the deferred-compaction path. When the Anthropic prompt-cache TTL has expired and deferred debt must override cache smoothing, execute with the same catch-up envelope that cold-cache recovery uses, especially the extra pass allowance and condensed-pass setting. Add regression coverage that would fail if the forced path still used maxPasses=1 or allowCondensedPasses=false, and update any stale expectations in the existing TTL-expiry test.
@jalehman
Contributor

Thanks for the analysis @liu51115 — addressed in #434.



Development

Successfully merging this pull request may close these issues.

cacheAwareCompaction: OR routing noise triggers destructive compaction cascade
