fix: make cache-aware compaction resilient to routing noise #362
Conversation
Pull request overview
This PR hardens cache-aware incremental compaction against transient prompt-cache “cold” readings (e.g., OpenRouter routing noise) by making cache state decisions sticky and requiring multiple cold observations before treating cache as truly cold.
Changes:
- Adds `cacheAwareCompaction.coldCacheObservationThreshold` config (default `3`) and env support (`LCM_COLD_CACHE_OBSERVATION_THRESHOLD`).
- Persists a `consecutiveColdObservations` counter in compaction telemetry (DB + store + migration) and incorporates it into cache-state resolution.
- Extends docs/plugin schema and adds/updates targeted tests for the new sticky behavior.
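Env resolution for the new threshold could look roughly like the sketch below; the helper name, validation rules, and precedence are assumptions, and the real logic lives in `src/db/config.ts`:

```typescript
// Hedged sketch of env-driven resolution for the new option; helper name,
// validation, and precedence are illustrative, not the actual src/db/config.ts code.
function resolveColdCacheObservationThreshold(
  configValue: number | undefined,
  env: Record<string, string | undefined>,
): number {
  const raw = env["LCM_COLD_CACHE_OBSERVATION_THRESHOLD"];
  if (raw !== undefined) {
    const parsed = Number.parseInt(raw, 10);
    // Accept only sane positive integers from the environment.
    if (Number.isInteger(parsed) && parsed >= 1) return parsed;
  }
  return configValue ?? 3; // documented default
}
```

Under this assumed precedence, a valid env var overrides the config file value, and both fall back to the documented default of `3`.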
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/engine.ts` | Implements sticky cache-state resolution and updates telemetry with a cold-observation counter. |
| `src/store/compaction-telemetry-store.ts` | Extends telemetry record + upsert/select to persist `consecutive_cold_observations`. |
| `src/db/migration.ts` | Adds DB column migration and updates initial schema for the new telemetry field. |
| `src/db/config.ts` | Adds config type + resolution for `coldCacheObservationThreshold` and env var parsing. |
| `docs/configuration.md` | Documents the new cache-aware compaction parameter and env var. |
| `openclaw.plugin.json` | Exposes the new config option in UI hints and JSON schema. |
| `skills/lossless-claw/references/config.md` | Adds reference documentation for the new cache threshold setting. |
| `test/config.test.ts` | Updates config resolution/schema expectations to include the new parameter. |
| `test/engine.test.ts` | Adds focused tests validating sticky hot behavior and cold-threshold behavior. |
| `test/expansion.test.ts` | Updates shared test config to include the new cache-aware parameter. |
| `test/circuit-breaker.test.ts` | Updates test config to include the new cache-aware parameter. |
| `test/session-operation-queues.test.ts` | Updates test config to include the new cache-aware parameter. |
```ts
: snapshot?.cacheState === "hot"
  ? 0
  : snapshot?.cacheState === "cold"
    ? (existing?.consecutiveColdObservations ?? 0) + 1
    : existing?.consecutiveColdObservations ?? 0;
```
`consecutiveColdObservations` is described/treated as a "consecutive" streak, but this update logic does not reset the counter on `snapshot.cacheState === "unknown"` (it preserves the prior value). That means a pattern like `cold → unknown → cold` will still advance toward the threshold even though the cold observations were not consecutive. Consider either resetting the counter on `unknown` (to enforce true consecutiveness) or renaming/adjusting the config + docs to reflect that `unknown` observations do not break the streak.
Suggested change:

```diff
 : snapshot?.cacheState === "hot"
   ? 0
   : snapshot?.cacheState === "cold"
     ? (existing?.consecutiveColdObservations ?? 0) + 1
-    : existing?.consecutiveColdObservations ?? 0;
+    : 0;
```
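The reviewer's scenario can be reproduced with a minimal model of the update rule above (function and variable names are illustrative):

```typescript
type CacheState = "hot" | "cold" | "unknown";

// Mirrors the ternary under review: hot resets the counter, cold increments it,
// and unknown preserves the prior value -- the behavior being questioned.
function nextColdCount(state: CacheState, prev: number): number {
  return state === "hot" ? 0 : state === "cold" ? prev + 1 : prev;
}

// cold -> unknown -> cold still reaches 2 toward the threshold,
// even though the cold observations were not strictly consecutive.
const states: CacheState[] = ["cold", "unknown", "cold"];
const streak = states.reduce((n, s) => nextColdCount(s, n), 0);
```

With the suggested change applied (`: 0` on the unknown branch), the same sequence would end at `1` instead, enforcing strict consecutiveness.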
| Option | Type | Default | Env var | Description |
|---|---|---|---|---|
| `cacheAwareCompaction.maxColdCacheCatchupPasses` | `integer` | `2` | `LCM_MAX_COLD_CACHE_CATCHUP_PASSES` | Maximum bounded catch-up passes allowed in one maintenance cycle when cache telemetry is cold. |
| `cacheAwareCompaction.hotCachePressureFactor` | `number` | `4` | `LCM_HOT_CACHE_PRESSURE_FACTOR` | Multiplier applied to the hot-cache leaf trigger before raw-history pressure overrides cache preservation. |
| `cacheAwareCompaction.hotCacheBudgetHeadroomRatio` | `number` | `0.2` | `LCM_HOT_CACHE_BUDGET_HEADROOM_RATIO` | Minimum fraction of the real token budget that must remain free before hot-cache incremental compaction is skipped entirely. |
| `cacheAwareCompaction.coldCacheObservationThreshold` | `integer` | `3` | `LCM_COLD_CACHE_OBSERVATION_THRESHOLD` | Consecutive cold observations required before non-explicit cache misses are treated as truly cold. This dampens one-off routing noise and provider failover blips. |
This description says "Consecutive cold observations". In `src/engine.ts`, `consecutiveColdObservations` currently carries across `unknown` observations (`unknown` does not reset the streak), so the behavior is closer to "cold observations since last hot" than strict consecutiveness. Please clarify the wording here (and in the other user-facing descriptions) or adjust the implementation to match the documented semantics.
Suggested change:

```diff
-| `cacheAwareCompaction.coldCacheObservationThreshold` | `integer` | `3` | `LCM_COLD_CACHE_OBSERVATION_THRESHOLD` | Consecutive cold observations required before non-explicit cache misses are treated as truly cold. This dampens one-off routing noise and provider failover blips. |
+| `cacheAwareCompaction.coldCacheObservationThreshold` | `integer` | `3` | `LCM_COLD_CACHE_OBSERVATION_THRESHOLD` | Cold observations required since the last hot observation before non-explicit cache misses are treated as truly cold. Unknown observations do not reset the count, which dampens one-off routing noise and provider failover blips. |
```
… branching

Remove hot-cache-budget-headroom, hot-cache-defer, and cold-cache-catchup branches from `evaluateIncrementalCompaction()`. These cache-state-dependent decisions in afterTurn cause compaction cascades on load-balanced providers where random cold readings trigger aggressive multi-pass compaction.

After this change, afterTurn compaction uses:

- budget-trigger: safety valve, always fires (unchanged)
- leaf-trigger: single pass, no condensed passes, no cache-state branching

Closes Martian-Engineering#367. Related: Martian-Engineering#358, Martian-Engineering#362.
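A rough sketch of the simplified afterTurn decision described above; `evaluateIncrementalCompaction` is the name from the commit, but the input types, thresholds, and envelope fields are illustrative assumptions:

```typescript
interface TurnStats {
  totalTokens: number; // hypothetical inputs; the real signature differs
  leafTokens: number;
}

interface CompactionPlan {
  trigger: "budget" | "leaf" | null;
  maxPasses: number;
  allowCondensedPasses: boolean;
}

// After removing the cache-state branches, only two triggers remain.
function evaluateIncrementalCompaction(
  stats: TurnStats,
  budgetLimit: number,
  leafLimit: number,
): CompactionPlan {
  if (stats.totalTokens >= budgetLimit) {
    // budget-trigger: safety valve, always fires (envelope values illustrative)
    return { trigger: "budget", maxPasses: 1, allowCondensedPasses: true };
  }
  if (stats.leafTokens >= leafLimit) {
    // leaf-trigger: single pass, no condensed passes, no cache-state branching
    return { trigger: "leaf", maxPasses: 1, allowCondensedPasses: false };
  }
  return { trigger: null, maxPasses: 0, allowCondensedPasses: false };
}
```

The point of the sketch is structural: no branch consults cache state, so a random cold reading cannot change the plan.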
Add the missing `coldCacheObservationThreshold` entry to the bundled lossless-claw config reference so the summary defaults match the runtime defaults and detailed option docs already added on this branch.

Regeneration-Prompt: |
  Address the accepted PR review finding for PR Martian-Engineering#362 about the bundled lossless-claw config reference being internally inconsistent. Keep the change additive and minimal: do not alter runtime behavior or broader docs, just update the `cacheAwareCompaction` "Good defaults" summary in `skills/lossless-claw/references/config.md` so it includes the new `coldCacheObservationThreshold` default of 3, matching the detailed reference section and existing runtime config defaults.
Force-pushed from f8efd9a to 7b27367.
I traced the interaction between this PR and #408 on current `main`.

**Background: the afterTurn timing problem**

This was the central insight behind #385 (later adopted by #408). #408 solved this correctly: `afterTurn` only records debt; actual consumption is gated by wall-clock TTL.

**Why #362 is unnecessary under #408**

#362's stated problem is that "a single `cacheRead=0` can drop a previously hot session into cold-cache catch-up." Under #408's model, this problem doesn't exist:

The TTL gate is inherently blip-proof. It doesn't matter if you see 1 cold observation or 10: the gate checks the clock, not observation counts. A noisy cold read never reaches execution because the cache hasn't actually expired. So the "routing noise" failure mode #362 addresses no longer exists after #408. There is nothing left for observation-count smoothing to fix.

**Why #362 is actively harmful with #408**

#362 modifies …

#362 pre-empts #408's time-aware mechanism with a time-unaware one. It suppresses the signal that #408 would have handled correctly.

**The deeper issue: wrong primitive**

Cache liveness is a function of wall-clock time, not observation count:

#408 uses the right primitive (wall-clock TTL). #362 uses the wrong one (consecutive observation counts) and disrupts #408's pipeline in the process.

**Suggestion**

Consider reverting #362's …
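The wall-clock argument in this comment can be modeled minimally; the types, field names, and the TTL value below are assumptions for illustration, not #408's actual code:

```typescript
interface MaintenanceDebt {
  recordedAtMs: number; // when afterTurn recorded the debt
}

// The gate consults the clock only: debt becomes consumable once the
// prompt-cache TTL has elapsed, regardless of how many cold reads occurred.
function canConsumeDebt(
  debt: MaintenanceDebt,
  nowMs: number,
  cacheTtlMs: number,
): boolean {
  return nowMs - debt.recordedAtMs >= cacheTtlMs;
}

const ttlMs = 5 * 60_000; // assumed 5-minute prompt-cache TTL
const debt: MaintenanceDebt = { recordedAtMs: 0 };
const blockedDuringTtl = canConsumeDebt(debt, 10_000, ttlMs); // noisy cold read at t=10s changes nothing
const allowedAfterTtl = canConsumeDebt(debt, 6 * 60_000, ttlMs);
```

No observation count appears anywhere in the gate, which is the sense in which it is "inherently blip-proof."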
…che smoothing

Narrow fix for the interaction between #362 (cache-aware routing noise protection) and #408 (deferred proactive compaction).

The issue: when incremental evaluation returns 'no compaction needed' due to hot-cache budget-headroom or hot-cache-defer (routing-noise suppression), deferred Anthropic sessions were incorrectly terminating their maintenance debt instead of continuing to execute after the prompt-cache TTL had expired.

The fix: check whether deferred Anthropic sessions should override the cache-aware 'no compaction' decision once the TTL-safe hold has elapsed. If so, allow leaf compaction to execute despite the routing-noise hysteresis. Inline and non-deferred paths keep existing behavior unchanged.

- Adds shouldForceDeferredAnthropicLeafCompaction() to gate override
- Routes deferred Anthropic leaf compaction through executeLeafCompactionCore when TTL is stale
- Adds regression test: 'assemble() still executes deferred Anthropic leaf debt after TTL expiry when cache smoothing remains effectively hot'
- No changes to routing-noise protection or incremental evaluation logic

Fixes: #408
Related: #362
…che smoothing (#434)

* fix: execute deferred Anthropic leaf debt after TTL expiry despite cache smoothing

  Narrow fix for the interaction between #362 (cache-aware routing noise protection) and #408 (deferred proactive compaction).

  The issue: when incremental evaluation returns 'no compaction needed' due to hot-cache budget-headroom or hot-cache-defer (routing-noise suppression), deferred Anthropic sessions were incorrectly terminating their maintenance debt instead of continuing to execute after the prompt-cache TTL had expired.

  The fix: check whether deferred Anthropic sessions should override the cache-aware 'no compaction' decision once the TTL-safe hold has elapsed. If so, allow leaf compaction to execute despite the routing-noise hysteresis. Inline and non-deferred paths keep existing behavior unchanged.

  - Adds shouldForceDeferredAnthropicLeafCompaction() to gate override
  - Routes deferred Anthropic leaf compaction through executeLeafCompactionCore when TTL is stale
  - Adds regression test: 'assemble() still executes deferred Anthropic leaf debt after TTL expiry when cache smoothing remains effectively hot'
  - No changes to routing-noise protection or incremental evaluation logic

  Fixes: #408
  Related: #362

* fix: use catch-up settings for stale TTL deferred Anthropic debt

  When deferred Anthropic leaf debt overrides hot-cache smoothing after the prompt-cache TTL expires, run the leaf compaction with the cold-cache catch-up envelope instead of the original hot-cache single-pass settings. This preserves the intended deferred recovery behavior and prevents the maintenance record from being cleared after a single underpowered pass.

  Add a regression that proves the stale-TTL override enables catch-up execution rather than reusing the hot-cache defer envelope, and update the existing stale-TTL test to assert the new execution parameters.

  Regeneration-Prompt: |
    Address the review finding on PR #434 in lossless-claw. The stale-TTL deferred Anthropic compaction fix should not merely force one hot-cache-sized leaf compaction pass and then clear the maintenance debt. Keep the change narrow and additive inside the deferred-compaction path. When the Anthropic prompt-cache TTL has expired and deferred debt must override cache smoothing, execute with the same catch-up envelope that cold-cache recovery uses, especially the extra pass allowance and condensed-pass setting. Add regression coverage that would fail if the forced path still used maxPasses=1 or allowCondensedPasses=false, and update any stale expectations in the existing TTL-expiry test.
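`shouldForceDeferredAnthropicLeafCompaction()` is named in the commit message; the sketch below shows one plausible shape for such a gate, with every type, field, and value an assumption rather than the actual implementation:

```typescript
interface DeferredDebt {
  provider: string;
  recordedAtMs: number;
}

// Hypothetical gate: only deferred Anthropic debt whose prompt-cache TTL
// has gone stale may override a cache-smoothing "no compaction" decision.
function shouldForceDeferredAnthropicLeafCompaction(
  debt: DeferredDebt | undefined,
  nowMs: number,
  cacheTtlMs: number,
): boolean {
  if (!debt || debt.provider !== "anthropic") return false;
  return nowMs - debt.recordedAtMs >= cacheTtlMs;
}
```

When such a gate returns true, the deferred path would run with the cold-cache catch-up envelope rather than clearing the debt after one hot-cache-sized pass, per the second commit above.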
Summary
This fixes the cache-noise failure mode behind #358.
Current `main` treats a single cold observation as authoritative unless a very short hot-cache hysteresis window happens to save the session. That is too aggressive for provider failover and OpenRouter routing noise: a single `cacheRead=0` can drop a previously hot session into cold-cache catch-up and trigger destructive compaction that was not justified by real memory pressure.

This PR makes cache-aware compaction state sticky instead of single-sample authoritative.
What changed
- New config: `cacheAwareCompaction.coldCacheObservationThreshold` (default `3`)
- New `consecutive_cold_observations` counter persisted in compaction telemetry
- Hot cache state now absorbs `cold` and `unknown` blips until the cold-observation threshold is reached
The conservative failure mode for cache-aware compaction should be “preserve context until we are confident the cache is really cold,” not “compact aggressively because one read looked cold.”
This is especially important on OpenRouter, where identical prefixes can still produce transient cold reads because requests land on different backend instances.
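A small simulation of the sticky behavior under a single routing-noise blip (an illustrative model, not the engine's actual resolution code):

```typescript
type Obs = "hot" | "cold" | "unknown";

// Resolve an effective cache state per observation: hot resets the cold
// counter, cold increments it, unknown preserves it; the state flips to
// cold only once the counter reaches the threshold.
function resolveStates(obs: Obs[], threshold: number): ("hot" | "cold")[] {
  let coldCount = 0;
  return obs.map((o) => {
    coldCount = o === "hot" ? 0 : o === "cold" ? coldCount + 1 : coldCount;
    return coldCount >= threshold ? "cold" : "hot";
  });
}

// One transient cold read in an otherwise hot OpenRouter session:
const noisy: Obs[] = ["hot", "hot", "cold", "hot", "hot"];
const sticky = resolveStates(noisy, 3); // threshold 3: never flips to cold
const naive = resolveStates(noisy, 1);  // single-sample: flips on the blip
```

With the default threshold of 3, the blip never reaches the cold state, so no cold-cache catch-up is triggered; single-sample resolution flips immediately.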
Tests
```sh
pnpm exec vitest run test/config.test.ts test/engine.test.ts test/session-operation-queues.test.ts test/circuit-breaker.test.ts test/expansion.test.ts
```

Adds focused coverage for the new sticky hot behavior and cold-threshold behavior.
Closes #358.