
feat: cache-aware leaf compaction guards with budget-pressure override#306

Closed
100yenadmin wants to merge 5 commits into Martian-Engineering:main from electricsheephq:feat/cache-aware-compaction-guards

Conversation

Contributor

@100yenadmin 100yenadmin commented Apr 7, 2026

Summary

Adds cache-aware skip guards to evaluateLeafTrigger() that prevent unnecessary prompt-cache invalidation during leaf compaction. On models with prompt caching, a compaction pass that removes only 3% of tokens costs more in cache-miss penalties than it saves in token reduction.

Part 1 of 3, split from #289. Merge order: #306 → #307 → #308.


The Problem: Compaction That Costs More Than It Saves

How compaction invalidates the prompt cache

Every leaf compaction pass:

  1. Replaces raw messages (positions 0–9) with a single summary (position 0)
  2. Resequences all remaining ordinals to stay contiguous
  3. The assembled prompt structure changes → the API prompt cache prefix no longer matches

The cost of a single unnecessary cache miss

| Model | Input $/MTok | Cached $/MTok | Cache miss penalty | Miss on 150K cached prefix |
| --- | --- | --- | --- | --- |
| Opus 4.6 | $5.00 | $0.50 | $4.50/MTok | $0.68 per miss |
| Sonnet 4.6 | $3.00 | $0.30 | $2.70/MTok | $0.41 per miss |
| Haiku 4.5 | $1.00 | $0.10 | $0.90/MTok | $0.14 per miss |

Cached input is always 1/10 of the base input price across all Anthropic models. Cache TTL is 5 minutes (refreshed on each hit).

Break-even formula

A compaction saving X tokens/turn that invalidates Y cached tokens takes:

turns_to_payback = (Y × miss_penalty) / (X × input_price)

For typical values (150K cached, 10K saved): ~13.5 turns to break even regardless of model tier. If the reduction is only 3% of context (~5K tokens), break-even extends to 27+ turns — most sessions never recoup the cost.
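The formula can be sanity-checked in a few lines (a hypothetical helper, not code from the PR):

```typescript
// Hypothetical helper illustrating the break-even formula above; prices
// are in $/MTok and the miss penalty is input price minus cached price.
function turnsToPayback(
  cachedTokens: number,       // Y: cached prefix a compaction invalidates
  savedTokensPerTurn: number, // X: tokens removed from each later turn
  inputPerMTok: number,
  cachedPerMTok: number,
): number {
  const missPenaltyPerMTok = inputPerMTok - cachedPerMTok;
  const missCost = (cachedTokens / 1e6) * missPenaltyPerMTok;
  const perTurnSaving = (savedTokensPerTurn / 1e6) * inputPerMTok;
  return missCost / perTurnSaving;
}

// Sonnet 4.6 with 150K cached and 10K saved per turn: ~13.5 turns to break
// even. Because cached input is 1/10 of input price, the miss penalty is
// always 0.9x the input price, so the ratio is identical on every tier:
// turnsToPayback(150_000, 10_000, 5.0, 0.5) is also ~13.5.
const sonnetPayback = turnsToPayback(150_000, 10_000, 3.0, 0.3);
```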

The current trigger fires blindly

The existing evaluateLeafTrigger() fires whenever rawTokensOutsideTail >= leafChunkTokens, regardless of:

  • How much compaction would actually remove (could be negligible)
  • Whether the context has plenty of budget headroom remaining
  • Whether the cache invalidation cost exceeds the token savings

The Solution: Three Guard Checks

Guard evaluation flow

evaluateLeafTrigger(conversationId, tokenBudget?, liveContextTokens?, precomputed?):

  ① Raw threshold gate (existing)
     if rawTokensOutsideTail < leafChunkTokens → SKIP (no change)

  ② Budget headroom gate (NEW)
     ceiling = headroomFactor × contextThreshold × tokenBudget
     if assembledTokens < ceiling → SKIP "budget-headroom"
     (plenty of room — no reason to compact yet)

  ③ Cache-aware reduction gate (NEW)
     estimatedReduction = min(rawTokens, chunkSize) - targetTokens
     if estimatedReduction < reductionThreshold × assembledTokens
        AND no budget pressure → SKIP "cache-aware"
     (reduction too small to justify cache invalidation)

  ④ Budget pressure override (NEW)
     if headroom enabled AND assembledTokens >= ceiling → COMPACT
     (context is critically full — force compaction regardless of cache cost)
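The decision flow above can be sketched as a pure function. The real evaluateLeafTrigger() takes (conversationId, tokenBudget?, liveContextTokens?, precomputed?) and reads from stores, so this input shape, the function name, and the reason strings are illustrative assumptions, not the PR's actual API:

```typescript
type LeafTriggerDecision =
  | { shouldCompact: true; reason: "threshold" | "budget-pressure" }
  | { shouldCompact: false; reason: "raw-below-threshold" | "budget-headroom" | "cache-aware" };

interface LeafTriggerInput {
  rawTokensOutsideTail: number;
  assembledTokens: number;
  leafChunkTokens: number;    // per-pass chunk size
  targetTokens: number;       // size of the summary a pass leaves behind
  tokenBudget?: number;       // absent: headroom and pressure checks are disabled
  contextThreshold: number;   // e.g. 0.75
  headroomFactor: number;     // leafBudgetHeadroomFactor; 0 disables
  reductionThreshold: number; // leafSkipReductionThreshold; 0 disables
}

function evaluateLeafTriggerSketch(i: LeafTriggerInput): LeafTriggerDecision {
  // Gate 1: raw threshold (existing behavior)
  if (i.rawTokensOutsideTail < i.leafChunkTokens) {
    return { shouldCompact: false, reason: "raw-below-threshold" };
  }
  const headroomEnabled = i.headroomFactor > 0 && i.tokenBudget !== undefined;
  const ceiling = headroomEnabled
    ? i.headroomFactor * i.contextThreshold * i.tokenBudget!
    : undefined;
  // Gate 2: budget headroom; plenty of room left, so defer compaction
  if (ceiling !== undefined && i.assembledTokens < ceiling) {
    return { shouldCompact: false, reason: "budget-headroom" };
  }
  // Gate 4 (computed early): at or over the ceiling means budget pressure
  const budgetPressure = ceiling !== undefined && i.assembledTokens >= ceiling;
  // Gate 3: cache-aware reduction; too small a win to justify a cache miss
  const estimatedReduction =
    Math.min(i.rawTokensOutsideTail, i.leafChunkTokens) - i.targetTokens;
  if (
    i.reductionThreshold > 0 &&
    estimatedReduction < i.reductionThreshold * i.assembledTokens &&
    !budgetPressure
  ) {
    return { shouldCompact: false, reason: "cache-aware" };
  }
  return { shouldCompact: true, reason: budgetPressure ? "budget-pressure" : "threshold" };
}
```

Running the three scenarios below through this sketch reproduces their decisions.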

Scenario A: Headroom skip (saves cache, defers compaction)

Setup: Orchestrator with 200K token budget, 40K assembled tokens, 18K raw tokens outside tail.

Before (no guards):
  18K raw >= 15K threshold → COMPACT
  Cost: 1 Haiku call ($0.03) + cache miss on 40K prefix ($0.18)
  Saved: ~12K tokens → saves $0.04/turn on Sonnet input
  Net: -$0.17 this turn, breaks even after 5+ more turns

After (with guards):
  ceiling = 0.8 × 0.75 × 200K = 120K
  40K assembled < 120K ceiling → SKIP "budget-headroom"
  Cost: $0.00 — cache preserved
  Compaction deferred until context actually needs it

Scenario B: Cache-aware skip (tiny reduction not worth cache bust)

Setup: Large context with 500K summary + 24K raw messages, no token budget provided.

Before:
  24K raw >= 20K threshold → COMPACT
  estimatedReduction = min(24K, 20K) - 2.4K = 17.6K
  But 17.6K is only 3.2% of 548K total assembled
  Cache miss on 500K prefix with Opus: $2.25
  Savings: 17.6K × $5/MTok = $0.088/turn
  Payback: 26 turns — session probably ends first

After:
  17.6K < 5% of 548K (27.4K threshold) → SKIP "cache-aware"
  Cache preserved, $2.25 miss avoided

Scenario C: Budget pressure override (prevents starvation)

Setup: Same large context but now 750K token budget provided.

ceiling = 0.8 × 0.75 × 750K = 450K
548K assembled > 450K ceiling → BUDGET PRESSURE
Cache-aware skip bypassed → COMPACT unconditionally
(Context is genuinely full — must compress to stay within budget)

Orchestrator vs sub-agent scenario

The same engine instance handles different budgets:

  • Orchestrator (200K budget): 40K assembled < 120K ceiling → SKIP
  • Sub-agent (16K budget): 40K assembled > 9.6K ceiling → COMPACT

Same context, different budget pressure. The guards adapt automatically.
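Plugging the two budgets into the ceiling formula shows the adaptation (an illustrative helper; the defaults match headroomFactor = 0.8 and contextThreshold = 0.75 from this PR):

```typescript
// Same assembled context, two budgets; only the ceiling changes.
const ceiling = (tokenBudget: number, headroomFactor = 0.8, contextThreshold = 0.75): number =>
  headroomFactor * contextThreshold * tokenBudget;

const assembled = 40_000;
const orchestratorSkips = assembled < ceiling(200_000); // 40K < 120K, so SKIP
const subAgentCompacts = assembled >= ceiling(16_000);  // 40K >= 9.6K, so COMPACT
```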


Config Fields

| Field | Default | Env Var | Effect |
| --- | --- | --- | --- |
| leafSkipReductionThreshold | 0.05 | LCM_LEAF_SKIP_REDUCTION_THRESHOLD | Min per-pass reduction as fraction of total assembled tokens. Set to 0 to disable cache-aware skip. |
| leafBudgetHeadroomFactor | 0.8 | LCM_LEAF_BUDGET_HEADROOM_FACTOR | Skip when assembled < factor × contextThreshold × tokenBudget. Set to 0 to disable headroom check and budget pressure detection. |

Escape hatches: Both set to 0 = fully original behavior. No guards, no skips.
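A sketch of how that resolution and clamping might look (clamp01's exact signature and the env-over-plugin precedence are assumptions based on the description above, not the PR's actual src/db/config.ts):

```typescript
interface PluginConfig {
  leafSkipReductionThreshold?: number;
  leafBudgetHeadroomFactor?: number;
}

// Clamp a candidate value into [0, 1]; fall back to the default when the
// input is missing or not a finite number.
function clamp01(value: unknown, fallback: number): number {
  const n = typeof value === "string" ? Number(value) : value;
  if (typeof n !== "number" || !Number.isFinite(n)) return fallback;
  return Math.min(1, Math.max(0, n));
}

function resolveGuardConfig(env: Record<string, string | undefined>, plugin?: PluginConfig) {
  return {
    // Env var wins over plugin config; invalid values fall back to defaults.
    leafSkipReductionThreshold: clamp01(
      env.LCM_LEAF_SKIP_REDUCTION_THRESHOLD ?? plugin?.leafSkipReductionThreshold,
      0.05,
    ),
    leafBudgetHeadroomFactor: clamp01(
      env.LCM_LEAF_BUDGET_HEADROOM_FACTOR ?? plugin?.leafBudgetHeadroomFactor,
      0.8,
    ),
  };
}
```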


Changes by File

| File | Lines | Change |
| --- | --- | --- |
| src/compaction.ts | +114 | LeafTriggerResult type with structured diagnostics. Rewrite evaluateLeafTrigger() with 3 guard checks. Normalize liveContextTokens with Number.isFinite guard. Add currentTokenCount to compactLeaf/compactFullSweep/compact signatures. |
| src/db/config.ts | +18 | leafSkipReductionThreshold and leafBudgetHeadroomFactor fields with clamp01() validation. Env var + plugin config resolution. |
| openclaw.plugin.json | +20 | Schema entries (type, range 0–1, descriptions) and UI hints for both fields. |
| test/lcm-integration.test.ts | +331 | 18 tests: basic threshold, headroom skip, cache-aware skip, budget pressure override, edge cases (empty conversation, negative reduction, custom thresholds, factor=0/1.5 clamping), orchestration scenario, per-pass chunk estimate, live token awareness. |
| test/config.test.ts | +42 | 4 tests: defaults (0.05/0.8), plugin config, env var override, manifest schema presence. |
| .changeset/ | +5 | Minor version bump. |

Test Plan

  • 121 tests passing (82 integration + 39 config)
  • All guard paths tested: skip on headroom, skip on cache-aware, compact on budget pressure
  • Edge cases: empty conversation, negative reduction, factor=0 disables without false pressure, factor>1 clamped
  • Orchestrator scenario: same context skips for large budget, compacts for small budget
  • NaN/Infinity guard on liveContextTokens input
  • Config tests: defaults, plugin config override, env var override, schema validation
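The liveContextTokens normalization exercised by the NaN/Infinity tests can be as small as this (a sketch; the positive-value check is an assumption layered on the Number.isFinite/Math.floor pattern the PR describes):

```typescript
// Treat non-finite or non-positive live counts as "not provided" so they
// cannot corrupt the headroom arithmetic downstream.
function normalizeLiveContextTokens(value: number | undefined): number | undefined {
  if (value === undefined || !Number.isFinite(value)) return undefined;
  const floored = Math.floor(value);
  return floored > 0 ? floored : undefined;
}
```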

On models with prompt caching (Claude, GPT-4), compaction that removes
3% of tokens costs more in cache-miss penalties than it saves. The
current trigger fires whenever assembledTokens > threshold × budget
regardless of how much compaction would actually remove.

Add three guard checks to evaluateLeafTrigger():

1. Budget headroom gate — skip when assembled < 80% of budget ceiling
   (leafBudgetHeadroomFactor, default 0.8, set 0 to disable)
2. Cache-aware reduction gate — skip when estimated reduction < 5% of
   total assembled tokens (leafSkipReductionThreshold, default 0.05)
3. Budget pressure override — force compaction when context reaches or
   exceeds the ceiling, preventing starvation in large contexts

Also passes currentTokenCount through compactLeaf/compactFullSweep so
headroom decisions use live observed counts when stored counts are stale.

Split from Martian-Engineering#289 for reviewability.
Copilot AI review requested due to automatic review settings April 7, 2026 06:12

Copilot AI left a comment


Pull request overview

Adds cache-aware skip guards to leaf compaction triggering to reduce prompt-cache invalidation and unnecessary compaction work, with new configurable thresholds and expanded test coverage.

Changes:

  • Reworked evaluateLeafTrigger() to add budget-headroom gating, cache-aware reduction gating, and a budget-pressure override (with structured diagnostics).
  • Added config resolution + manifest schema/UI hints for leafSkipReductionThreshold and leafBudgetHeadroomFactor.
  • Added integration/config tests covering the new guard logic and configuration defaults/overrides.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/compaction.ts | Implements the new leaf-trigger decision logic and plumbs currentTokenCount into leaf/full-sweep compaction paths. |
| src/db/config.ts | Introduces new resolved config fields (with clamping) for leaf-compaction guards. |
| openclaw.plugin.json | Exposes the new config fields via schema + UI hints. |
| test/lcm-integration.test.ts | Adds integration coverage for the skip-guard decision tree and stale-token scenarios. |
| test/config.test.ts | Adds tests for defaults, plugin config, env var overrides, and manifest schema presence. |
| .changeset/cache-aware-compaction-guards.md | Declares a release bump entry for the feature. |


100yenadmin pushed a commit to electricsheephq/lossless-claw-test that referenced this pull request Apr 7, 2026
Pass tokenBudget and liveContextTokens from the engine's afterTurn
and compact paths into evaluateLeafTrigger and compactLeaf/compactFullSweep
so cache-aware headroom decisions use fresh observed counts instead of
potentially stale stored values.

- evaluateLeafTrigger now receives tokenBudget + liveContextTokens
  from engine call sites
- compactLeaf/compactFullSweep receive currentTokenCount (observedTokens)
- afterTurn logs trigger context (assembled, pressure) on compaction
- afterTurn logs skip reason when guards prevent compaction
- CompactionConfig passes leafSkipReductionThreshold and
  leafBudgetHeadroomFactor from LcmConfig

Split from Martian-Engineering#289 (Part 2 of 3). Depends on Martian-Engineering#306.
100yenadmin pushed a commit to electricsheephq/lossless-claw-test that referenced this pull request Apr 7, 2026
New comprehensive guide for operators tuning LCM compaction behavior:

- docs/compaction-tuning.md (356 lines): TLDR, per-tier model presets
  (Opus, Sonnet, Haiku, GPT-4o-mini, Gemini Flash), cache economics
  break-even formula, debugging checklist, orchestration scenarios
- docs/architecture.md: cache-aware guards section with Mermaid flowchart
- docs/configuration.md: new settings reference, model comparison table
- skills references: config field updates

Split from Martian-Engineering#289 (Part 3 of 3). Independent of Martian-Engineering#306 and Martian-Engineering#307.
Restores two load-bearing inline comments from the original PR Martian-Engineering#289
that were lost during the split:

- 3-line headroomEnabled rationale: explains why the guard uses three
  conditions and that factor=0 disables without creating false pressure
- 8-line budget-pressure explanation: documents when pressure is true,
  when the cache-aware skip can fire, and the starvation prevention
  guarantee
100yenadmin pushed a commit to electricsheephq/lossless-claw-test that referenced this pull request Apr 7, 2026
- Fix changeset file to use standard frontmatter delimiters
- Normalize liveContextTokens with Number.isFinite/Math.floor guard
  to prevent NaN/Infinity from corrupting headroom calculations
  (mirrors the pattern used in evaluate())
@100yenadmin 100yenadmin requested a review from Copilot April 7, 2026 07:02

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (2)

src/compaction.ts:605

  • currentTokenCount is passed into evaluateLeafTrigger, but the early-return guard still uses only tokensBefore (stored DB count). If stored token counts are stale low while the live assembled prompt is over threshold, compactLeaf can incorrectly skip compaction (because tokensBefore <= threshold stays true). Consider normalizing input.currentTokenCount and using an effectiveTokensBefore = max(tokensBefore, currentTokenCount) for this guard (and any threshold comparisons).
    const tokensBefore = await this.summaryStore.getContextTokenCount(conversationId);
    const threshold = Math.floor(this.config.contextThreshold * tokenBudget);
    const leafTrigger = await this.evaluateLeafTrigger(
      conversationId,
      tokenBudget,
      input.currentTokenCount,
      tokensBefore,
    );

    if (!force && tokensBefore <= threshold && !leafTrigger.shouldCompact) {
      return {
        actionTaken: false,

src/compaction.ts:742

  • Same issue in compactFullSweep: currentTokenCount influences evaluateLeafTrigger, but the sweep can still return early (and all sweep/stop conditions start from tokensBefore) even when the live context is over the compaction threshold. Using a normalized effectiveTokensBefore = max(tokensBefore, currentTokenCount) for the early-return guard and initializing runningTokens from it would make full sweeps behave correctly when stored counts lag behind reality.
    const tokensBefore = await this.summaryStore.getContextTokenCount(conversationId);
    const threshold = Math.floor(this.config.contextThreshold * tokenBudget);
    const leafTrigger = await this.evaluateLeafTrigger(
      conversationId,
      tokenBudget,
      input.currentTokenCount,
      tokensBefore,
    );

    if (!force && tokensBefore <= threshold && !leafTrigger.shouldCompact) {
      return {
        actionTaken: false,
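The reviewer's suggested effectiveTokensBefore amounts to something like this (a sketch of the suggestion, not merged code):

```typescript
// Prefer the larger of the stored DB count and a normalized live count, so a
// stale-low stored value cannot make the early-return guard skip compaction
// when the live assembled prompt is actually over the threshold.
function effectiveTokensBefore(
  storedTokens: number,
  currentTokenCount?: number,
): number {
  if (currentTokenCount === undefined || !Number.isFinite(currentTokenCount)) {
    return storedTokens;
  }
  return Math.max(storedTokens, Math.floor(currentTokenCount));
}
```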


@100yenadmin 100yenadmin requested a review from Copilot April 7, 2026 07:30

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/compaction.ts:742

  • Same issue as in compactLeaf: the early-return guard uses tokensBefore (stored token count) rather than incorporating the new currentTokenCount live estimate. If stored tokens are stale low but the live context is actually above the threshold, compactFullSweep can return actionTaken=false and skip both leaf and condensed passes even under real budget pressure. Use an effective token count (max of stored + normalized live) for this <= threshold comparison.
    const tokensBefore = await this.summaryStore.getContextTokenCount(conversationId);
    const threshold = Math.floor(this.config.contextThreshold * tokenBudget);
    const leafTrigger = await this.evaluateLeafTrigger(
      conversationId,
      tokenBudget,
      input.currentTokenCount,
      tokensBefore,
    );

    if (!force && tokensBefore <= threshold && !leafTrigger.shouldCompact) {
      return {
        actionTaken: false,


Copy link
Copy Markdown

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.



100yenadmin pushed a commit to electricsheephq/lossless-claw-test that referenced this pull request Apr 7, 2026
Users have no visibility into whether LCM compaction is saving or
wasting money. This adds persistent event tracking, cost estimation,
and efficiency reporting.

Changes:
- New compaction_events table (SQLite migration) records each
  compaction pass with token counts and model name
- Static pricing table (pricing.ts) for cost estimation with fuzzy
  model prefix matching (11 models covered)
- /lossless status gains an efficiency section showing passes, tokens
  saved, compaction cost, net efficiency, and recommendations
- New /lossless efficiency subcommand with per-model breakdown and
  actionable recommendations (e.g., "Switch from Opus to Haiku")
- persistCompactionEvent() now inserts DB row alongside console log
- Best-effort recording — doesn't fail compaction if table is missing

Closes Martian-Engineering#309. Depends on Martian-Engineering#306 and Martian-Engineering#307.
@100yenadmin 100yenadmin requested a review from Copilot April 7, 2026 09:32

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.



… shrink test strings

- Remove duplicate summaryModel and summaryProvider properties in
  LcmConfig type (lines 35-38 were copies of lines 28-30 with
  wrong comments)
- Replace 12KB test string literals with short descriptive text
  since tokenCountFn overrides the count

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.



liu51115 pushed a commit to liu51115/lossless-claw that referenced this pull request Apr 7, 2026
liu51115 pushed a commit to liu51115/lossless-claw that referenced this pull request Apr 7, 2026
Contributor

jalehman commented Apr 7, 2026

@100yenadmin This work (actually the previous work on #289) inspired me to take this a step further and make compaction directly cache-aware. LMK what you think.

@100yenadmin
Contributor Author

I'm all for it! Let me know if you want to build on or edit any of these (I've given you access to do so) @jalehman

This work is just Part 1 of a 3-way split from #289. Merge order: #306 → #307 → #308.

These three are the same as the originals but ended up growing because of code-reviewer tests and feedback (mostly nits). The feature works well on our internal Hippo LCM, so we figured we'd share it open source.

Let me know how else I can help 🖤

@jalehman
Contributor

jalehman commented Apr 8, 2026

Closing this in favor of the cache-aware compaction paths I recently introduced in #329 and #315. I think there are some things worth incorporating in the PRs built off of this, will attend to those soon.
