
feat: cache-aware leaf compaction guards with budget-pressure override#306

Closed
100yenadmin wants to merge 5 commits into Martian-Engineering:main from electricsheephq:feat/cache-aware-compaction-guards

Conversation

Contributor

@100yenadmin 100yenadmin commented Apr 7, 2026

Summary

Adds cache-aware skip guards to evaluateLeafTrigger() that prevent unnecessary prompt-cache invalidation during leaf compaction. On models with prompt caching, a compaction pass that removes only 3% of tokens costs more in cache-miss penalties than it saves in token reduction.

Part 1 of 3, split from #289. Merge order: #306 → #307 → #308.


The Problem: Compaction That Costs More Than It Saves

How compaction invalidates the prompt cache

Every leaf compaction pass:

  1. Replaces raw messages (positions 0–9) with a single summary (position 0)
  2. Resequences all remaining ordinals to stay contiguous
  3. The assembled prompt structure changes → the API prompt cache prefix no longer matches

The cost of a single unnecessary cache miss

| Model | Input $/MTok | Cached $/MTok | Cache miss penalty | Miss on 150K cached prefix |
| --- | --- | --- | --- | --- |
| Opus 4.6 | $5.00 | $0.50 | $4.50/MTok | $0.68 per miss |
| Sonnet 4.6 | $3.00 | $0.30 | $2.70/MTok | $0.41 per miss |
| Haiku 4.5 | $1.00 | $0.10 | $0.90/MTok | $0.14 per miss |

Cached input is always 1/10 of the base input price across all Anthropic models. Cache TTL is 5 minutes (refreshed on each hit).

Break-even formula

A compaction saving X tokens/turn that invalidates Y cached tokens takes:

turns_to_payback = (Y × miss_penalty) / (X × input_price)

For typical values (150K cached, 10K saved): ~13.5 turns to break even regardless of model tier. If the reduction is only 3% of context (~5K tokens), break-even extends to 27+ turns — most sessions never recoup the cost.
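The formula can be sanity-checked in a few lines (a hypothetical helper, not code from the PR):

```typescript
// Hypothetical helper illustrating the break-even formula above; prices
// are in $/MTok and the miss penalty is input price minus cached price.
function turnsToPayback(
  cachedTokens: number,       // Y: cached prefix a compaction invalidates
  savedTokensPerTurn: number, // X: tokens removed from each later turn
  inputPerMTok: number,
  cachedPerMTok: number,
): number {
  const missPenaltyPerMTok = inputPerMTok - cachedPerMTok;
  const missCost = (cachedTokens / 1e6) * missPenaltyPerMTok;
  const perTurnSaving = (savedTokensPerTurn / 1e6) * inputPerMTok;
  return missCost / perTurnSaving;
}

// Sonnet 4.6 with 150K cached and 10K saved per turn: ~13.5 turns to break
// even. Because cached input is 1/10 of input price, the miss penalty is
// always 0.9x the input price, so the ratio is identical on every tier:
// turnsToPayback(150_000, 10_000, 5.0, 0.5) is also ~13.5.
const sonnetPayback = turnsToPayback(150_000, 10_000, 3.0, 0.3);
```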

The current trigger fires blindly

The existing evaluateLeafTrigger() fires whenever rawTokensOutsideTail >= leafChunkTokens, regardless of:

  • How much compaction would actually remove (could be negligible)
  • Whether the context has plenty of budget headroom remaining
  • Whether the cache invalidation cost exceeds the token savings

The Solution: Three Guard Checks

Guard evaluation flow

evaluateLeafTrigger(conversationId, tokenBudget?, liveContextTokens?, precomputed?):

  ① Raw threshold gate (existing)
     if rawTokensOutsideTail < leafChunkTokens → SKIP (no change)

  ② Budget headroom gate (NEW)
     ceiling = headroomFactor × contextThreshold × tokenBudget
     if assembledTokens < ceiling → SKIP "budget-headroom"
     (plenty of room — no reason to compact yet)

  ③ Cache-aware reduction gate (NEW)
     estimatedReduction = min(rawTokens, chunkSize) - targetTokens
     if estimatedReduction < reductionThreshold × assembledTokens
        AND no budget pressure → SKIP "cache-aware"
     (reduction too small to justify cache invalidation)

  ④ Budget pressure override (NEW)
     if headroom enabled AND assembledTokens >= ceiling → COMPACT
     (context is critically full — force compaction regardless of cache cost)
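The decision flow above can be sketched as a pure function. The real evaluateLeafTrigger() takes (conversationId, tokenBudget?, liveContextTokens?, precomputed?) and reads from stores, so this input shape, the function name, and the reason strings are illustrative assumptions, not the PR's actual API:

```typescript
type LeafTriggerDecision =
  | { shouldCompact: true; reason: "threshold" | "budget-pressure" }
  | { shouldCompact: false; reason: "raw-below-threshold" | "budget-headroom" | "cache-aware" };

interface LeafTriggerInput {
  rawTokensOutsideTail: number;
  assembledTokens: number;
  leafChunkTokens: number;    // per-pass chunk size
  targetTokens: number;       // size of the summary a pass leaves behind
  tokenBudget?: number;       // absent: headroom and pressure checks are disabled
  contextThreshold: number;   // e.g. 0.75
  headroomFactor: number;     // leafBudgetHeadroomFactor; 0 disables
  reductionThreshold: number; // leafSkipReductionThreshold; 0 disables
}

function evaluateLeafTriggerSketch(i: LeafTriggerInput): LeafTriggerDecision {
  // Gate 1: raw threshold (existing behavior)
  if (i.rawTokensOutsideTail < i.leafChunkTokens) {
    return { shouldCompact: false, reason: "raw-below-threshold" };
  }
  const headroomEnabled = i.headroomFactor > 0 && i.tokenBudget !== undefined;
  const ceiling = headroomEnabled
    ? i.headroomFactor * i.contextThreshold * i.tokenBudget!
    : undefined;
  // Gate 2: budget headroom; plenty of room left, so defer compaction
  if (ceiling !== undefined && i.assembledTokens < ceiling) {
    return { shouldCompact: false, reason: "budget-headroom" };
  }
  // Gate 4 (computed early): at or over the ceiling means budget pressure
  const budgetPressure = ceiling !== undefined && i.assembledTokens >= ceiling;
  // Gate 3: cache-aware reduction; too small a win to justify a cache miss
  const estimatedReduction =
    Math.min(i.rawTokensOutsideTail, i.leafChunkTokens) - i.targetTokens;
  if (
    i.reductionThreshold > 0 &&
    estimatedReduction < i.reductionThreshold * i.assembledTokens &&
    !budgetPressure
  ) {
    return { shouldCompact: false, reason: "cache-aware" };
  }
  return { shouldCompact: true, reason: budgetPressure ? "budget-pressure" : "threshold" };
}
```

Running the three scenarios below through this sketch reproduces their decisions.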

Scenario A: Headroom skip (saves cache, defers compaction)

Setup: Orchestrator with 200K token budget, 40K assembled tokens, 18K raw tokens outside tail.

Before (no guards):
  18K raw >= 15K threshold → COMPACT
  Cost: 1 Haiku call ($0.03) + cache miss on 40K prefix ($0.18)
  Saved: ~12K tokens → saves $0.04/turn on Sonnet input
  Net: -$0.17 this turn, breaks even after 5+ more turns

After (with guards):
  ceiling = 0.8 × 0.75 × 200K = 120K
  40K assembled < 120K ceiling → SKIP "budget-headroom"
  Cost: $0.00 — cache preserved
  Compaction deferred until context actually needs it

Scenario B: Cache-aware skip (tiny reduction not worth cache bust)

Setup: Large context with 500K summary + 24K raw messages, no token budget provided.

Before:
  24K raw >= 20K threshold → COMPACT
  estimatedReduction = min(24K, 20K) - 2.4K = 17.6K
  But 17.6K is only 3.2% of 548K total assembled
  Cache miss on 500K prefix with Opus: $2.25
  Savings: 17.6K × $5/MTok = $0.088/turn
  Payback: 26 turns — session probably ends first

After:
  17.6K < 5% of 548K (27.4K threshold) → SKIP "cache-aware"
  Cache preserved, $2.25 miss avoided

Scenario C: Budget pressure override (prevents starvation)

Setup: Same large context but now 750K token budget provided.

ceiling = 0.8 × 0.75 × 750K = 450K
548K assembled > 450K ceiling → BUDGET PRESSURE
Cache-aware skip bypassed → COMPACT unconditionally
(Context is genuinely full — must compress to stay within budget)

Orchestrator vs sub-agent scenario

The same engine instance handles different budgets:

  • Orchestrator (200K budget): 40K assembled < 120K ceiling → SKIP
  • Sub-agent (16K budget): 40K assembled > 9.6K ceiling → COMPACT

Same context, different budget pressure. The guards adapt automatically.
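Plugging the two budgets into the ceiling formula shows the adaptation (an illustrative helper; the defaults match headroomFactor = 0.8 and contextThreshold = 0.75 from this PR):

```typescript
// Same assembled context, two budgets; only the ceiling changes.
const ceiling = (tokenBudget: number, headroomFactor = 0.8, contextThreshold = 0.75): number =>
  headroomFactor * contextThreshold * tokenBudget;

const assembled = 40_000;
const orchestratorSkips = assembled < ceiling(200_000); // 40K < 120K, so SKIP
const subAgentCompacts = assembled >= ceiling(16_000);  // 40K >= 9.6K, so COMPACT
```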


Config Fields

| Field | Default | Env Var | Effect |
| --- | --- | --- | --- |
| leafSkipReductionThreshold | 0.05 | LCM_LEAF_SKIP_REDUCTION_THRESHOLD | Min per-pass reduction as fraction of total assembled tokens. Set to 0 to disable cache-aware skip. |
| leafBudgetHeadroomFactor | 0.8 | LCM_LEAF_BUDGET_HEADROOM_FACTOR | Skip when assembled < factor × contextThreshold × tokenBudget. Set to 0 to disable headroom check and budget pressure detection. |

Escape hatches: Both set to 0 = fully original behavior. No guards, no skips.
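A sketch of how that resolution and clamping might look (clamp01's exact signature and the env-over-plugin precedence are assumptions based on the description above, not the PR's actual src/db/config.ts):

```typescript
interface PluginConfig {
  leafSkipReductionThreshold?: number;
  leafBudgetHeadroomFactor?: number;
}

// Clamp a candidate value into [0, 1]; fall back to the default when the
// input is missing or not a finite number.
function clamp01(value: unknown, fallback: number): number {
  const n = typeof value === "string" ? Number(value) : value;
  if (typeof n !== "number" || !Number.isFinite(n)) return fallback;
  return Math.min(1, Math.max(0, n));
}

function resolveGuardConfig(env: Record<string, string | undefined>, plugin?: PluginConfig) {
  return {
    // Env var wins over plugin config; invalid values fall back to defaults.
    leafSkipReductionThreshold: clamp01(
      env.LCM_LEAF_SKIP_REDUCTION_THRESHOLD ?? plugin?.leafSkipReductionThreshold,
      0.05,
    ),
    leafBudgetHeadroomFactor: clamp01(
      env.LCM_LEAF_BUDGET_HEADROOM_FACTOR ?? plugin?.leafBudgetHeadroomFactor,
      0.8,
    ),
  };
}
```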


Changes by File

| File | Lines | Change |
| --- | --- | --- |
| src/compaction.ts | +114 | LeafTriggerResult type with structured diagnostics. Rewrite evaluateLeafTrigger() with 3 guard checks. Normalize liveContextTokens with Number.isFinite guard. Add currentTokenCount to compactLeaf/compactFullSweep/compact signatures. |
| src/db/config.ts | +18 | leafSkipReductionThreshold and leafBudgetHeadroomFactor fields with clamp01() validation. Env var + plugin config resolution. |
| openclaw.plugin.json | +20 | Schema entries (type, range 0–1, descriptions) and UI hints for both fields. |
| test/lcm-integration.test.ts | +331 | 18 tests: basic threshold, headroom skip, cache-aware skip, budget pressure override, edge cases (empty conversation, negative reduction, custom thresholds, factor=0/1.5 clamping), orchestration scenario, per-pass chunk estimate, live token awareness. |
| test/config.test.ts | +42 | 4 tests: defaults (0.05/0.8), plugin config, env var override, manifest schema presence. |
| .changeset/ | +5 | Minor version bump. |

Test Plan

  • 121 tests passing (82 integration + 39 config)
  • All guard paths tested: skip on headroom, skip on cache-aware, compact on budget pressure
  • Edge cases: empty conversation, negative reduction, factor=0 disables without false pressure, factor>1 clamped
  • Orchestrator scenario: same context skips for large budget, compacts for small budget
  • NaN/Infinity guard on liveContextTokens input
  • Config tests: defaults, plugin config override, env var override, schema validation
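The liveContextTokens normalization exercised by the NaN/Infinity tests can be as small as this (a sketch; the positive-value check is an assumption layered on the Number.isFinite/Math.floor pattern the PR describes):

```typescript
// Treat non-finite or non-positive live counts as "not provided" so they
// cannot corrupt the headroom arithmetic downstream.
function normalizeLiveContextTokens(value: number | undefined): number | undefined {
  if (value === undefined || !Number.isFinite(value)) return undefined;
  const floored = Math.floor(value);
  return floored > 0 ? floored : undefined;
}
```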

On models with prompt caching (Claude, GPT-4), compaction that removes
3% of tokens costs more in cache-miss penalties than it saves. The
current trigger fires whenever assembledTokens > threshold × budget
regardless of how much compaction would actually remove.

Add three guard checks to evaluateLeafTrigger():

1. Budget headroom gate — skip when assembled < 80% of budget ceiling
   (leafBudgetHeadroomFactor, default 0.8, set 0 to disable)
2. Cache-aware reduction gate — skip when estimated reduction < 5% of
   total assembled tokens (leafSkipReductionThreshold, default 0.05)
3. Budget pressure override — force compaction when context reaches or
   exceeds the ceiling, preventing starvation in large contexts

Also passes currentTokenCount through compactLeaf/compactFullSweep so
headroom decisions use live observed counts when stored counts are stale.

Split from Martian-Engineering#289 for reviewability.
Copilot AI review requested due to automatic review settings April 7, 2026 06:12

Copilot AI left a comment


Pull request overview

Adds cache-aware skip guards to leaf compaction triggering to reduce prompt-cache invalidation and unnecessary compaction work, with new configurable thresholds and expanded test coverage.

Changes:

  • Reworked evaluateLeafTrigger() to add budget-headroom gating, cache-aware reduction gating, and a budget-pressure override (with structured diagnostics).
  • Added config resolution + manifest schema/UI hints for leafSkipReductionThreshold and leafBudgetHeadroomFactor.
  • Added integration/config tests covering the new guard logic and configuration defaults/overrides.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/compaction.ts | Implements the new leaf-trigger decision logic and plumbs currentTokenCount into leaf/full-sweep compaction paths. |
| src/db/config.ts | Introduces new resolved config fields (with clamping) for leaf-compaction guards. |
| openclaw.plugin.json | Exposes the new config fields via schema + UI hints. |
| test/lcm-integration.test.ts | Adds integration coverage for the skip-guard decision tree and stale-token scenarios. |
| test/config.test.ts | Adds tests for defaults, plugin config, env var overrides, and manifest schema presence. |
| .changeset/cache-aware-compaction-guards.md | Declares a release bump entry for the feature. |


100yenadmin pushed a commit to electricsheephq/lossless-claw-test that referenced this pull request Apr 7, 2026
Pass tokenBudget and liveContextTokens from the engine's afterTurn
and compact paths into evaluateLeafTrigger and compactLeaf/compactFullSweep
so cache-aware headroom decisions use fresh observed counts instead of
potentially stale stored values.

- evaluateLeafTrigger now receives tokenBudget + liveContextTokens
  from engine call sites
- compactLeaf/compactFullSweep receive currentTokenCount (observedTokens)
- afterTurn logs trigger context (assembled, pressure) on compaction
- afterTurn logs skip reason when guards prevent compaction
- CompactionConfig passes leafSkipReductionThreshold and
  leafBudgetHeadroomFactor from LcmConfig

Split from Martian-Engineering#289 (Part 2 of 3). Depends on Martian-Engineering#306.
100yenadmin pushed a commit to electricsheephq/lossless-claw-test that referenced this pull request Apr 7, 2026
New comprehensive guide for operators tuning LCM compaction behavior:

- docs/compaction-tuning.md (356 lines): TLDR, per-tier model presets
  (Opus, Sonnet, Haiku, GPT-4o-mini, Gemini Flash), cache economics
  break-even formula, debugging checklist, orchestration scenarios
- docs/architecture.md: cache-aware guards section with Mermaid flowchart
- docs/configuration.md: new settings reference, model comparison table
- skills references: config field updates

Split from Martian-Engineering#289 (Part 3 of 3). Independent of Martian-Engineering#306 and Martian-Engineering#307.
Restores two load-bearing inline comments from the original PR Martian-Engineering#289
that were lost during the split:

- 3-line headroomEnabled rationale: explains why the guard uses three
  conditions and that factor=0 disables without creating false pressure
- 8-line budget-pressure explanation: documents when pressure is true,
  when the cache-aware skip can fire, and the starvation prevention
  guarantee
100yenadmin pushed a commit to electricsheephq/lossless-claw-test that referenced this pull request Apr 7, 2026
- Fix changeset file to use standard frontmatter delimiters
- Normalize liveContextTokens with Number.isFinite/Math.floor guard
  to prevent NaN/Infinity from corrupting headroom calculations
  (mirrors the pattern used in evaluate())
@100yenadmin 100yenadmin requested a review from Copilot April 7, 2026 07:02

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (2)

src/compaction.ts:605

  • currentTokenCount is passed into evaluateLeafTrigger, but the early-return guard still uses only tokensBefore (stored DB count). If stored token counts are stale low while the live assembled prompt is over threshold, compactLeaf can incorrectly skip compaction (because tokensBefore <= threshold stays true). Consider normalizing input.currentTokenCount and using an effectiveTokensBefore = max(tokensBefore, currentTokenCount) for this guard (and any threshold comparisons).
    const tokensBefore = await this.summaryStore.getContextTokenCount(conversationId);
    const threshold = Math.floor(this.config.contextThreshold * tokenBudget);
    const leafTrigger = await this.evaluateLeafTrigger(
      conversationId,
      tokenBudget,
      input.currentTokenCount,
      tokensBefore,
    );

    if (!force && tokensBefore <= threshold && !leafTrigger.shouldCompact) {
      return {
        actionTaken: false,

src/compaction.ts:742

  • Same issue in compactFullSweep: currentTokenCount influences evaluateLeafTrigger, but the sweep can still return early (and all sweep/stop conditions start from tokensBefore) even when the live context is over the compaction threshold. Using a normalized effectiveTokensBefore = max(tokensBefore, currentTokenCount) for the early-return guard and initializing runningTokens from it would make full sweeps behave correctly when stored counts lag behind reality.
    const tokensBefore = await this.summaryStore.getContextTokenCount(conversationId);
    const threshold = Math.floor(this.config.contextThreshold * tokenBudget);
    const leafTrigger = await this.evaluateLeafTrigger(
      conversationId,
      tokenBudget,
      input.currentTokenCount,
      tokensBefore,
    );

    if (!force && tokensBefore <= threshold && !leafTrigger.shouldCompact) {
      return {
        actionTaken: false,
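The reviewer's suggested effectiveTokensBefore amounts to something like this (a sketch of the suggestion, not merged code):

```typescript
// Prefer the larger of the stored DB count and a normalized live count, so a
// stale-low stored value cannot make the early-return guard skip compaction
// when the live assembled prompt is actually over the threshold.
function effectiveTokensBefore(
  storedTokens: number,
  currentTokenCount?: number,
): number {
  if (currentTokenCount === undefined || !Number.isFinite(currentTokenCount)) {
    return storedTokens;
  }
  return Math.max(storedTokens, Math.floor(currentTokenCount));
}
```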


@100yenadmin 100yenadmin requested a review from Copilot April 7, 2026 07:30

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/compaction.ts:742

  • Same issue as in compactLeaf: the early-return guard uses tokensBefore (stored token count) rather than incorporating the new currentTokenCount live estimate. If stored tokens are stale low but the live context is actually above the threshold, compactFullSweep can return actionTaken=false and skip both leaf and condensed passes even under real budget pressure. Use an effective token count (max of stored + normalized live) for this <= threshold comparison.
    const tokensBefore = await this.summaryStore.getContextTokenCount(conversationId);
    const threshold = Math.floor(this.config.contextThreshold * tokenBudget);
    const leafTrigger = await this.evaluateLeafTrigger(
      conversationId,
      tokenBudget,
      input.currentTokenCount,
      tokensBefore,
    );

    if (!force && tokensBefore <= threshold && !leafTrigger.shouldCompact) {
      return {
        actionTaken: false,


Copy link
Copy Markdown

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.



100yenadmin pushed a commit to electricsheephq/lossless-claw-test that referenced this pull request Apr 7, 2026
Users have no visibility into whether LCM compaction is saving or
wasting money. This adds persistent event tracking, cost estimation,
and efficiency reporting.

Changes:
- New compaction_events table (SQLite migration) records each
  compaction pass with token counts and model name
- Static pricing table (pricing.ts) for cost estimation with fuzzy
  model prefix matching (11 models covered)
- /lossless status gains an efficiency section showing passes, tokens
  saved, compaction cost, net efficiency, and recommendations
- New /lossless efficiency subcommand with per-model breakdown and
  actionable recommendations (e.g., "Switch from Opus to Haiku")
- persistCompactionEvent() now inserts DB row alongside console log
- Best-effort recording — doesn't fail compaction if table is missing

Closes Martian-Engineering#309. Depends on Martian-Engineering#306 and Martian-Engineering#307.
@100yenadmin 100yenadmin requested a review from Copilot April 7, 2026 09:32

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.



… shrink test strings

- Remove duplicate summaryModel and summaryProvider properties in
  LcmConfig type (lines 35-38 were copies of lines 28-30 with
  wrong comments)
- Replace 12KB test string literals with short descriptive text
  since tokenCountFn overrides the count

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.



liu51115 pushed a commit to liu51115/lossless-claw that referenced this pull request Apr 7, 2026
liu51115 pushed a commit to liu51115/lossless-claw that referenced this pull request Apr 7, 2026
Contributor

jalehman commented Apr 7, 2026

@100yenadmin This work (actually the previous work on #289) inspired me to take this a step further and make compaction directly cache-aware. LMK what you think.

@100yenadmin
Contributor Author

I'm all for it! Let me know if you want to build on or edit any of these (I've given you access to do so) @jalehman

This work is just Part 1 of a 3-way split from #289. Merge order: #306 → #307 → #308.

These three are the same as the originals but ended up growing because of code-reviewer tests and feedback (mostly nits). The feature works well on our internal Hippo LCM, so we figured we'd share it open source.

Let me know how else I can help 🖤

@jalehman
Contributor

jalehman commented Apr 8, 2026

Closing this in favor of the cache-aware compaction paths I recently introduced in #329 and #315. I think there are some things worth incorporating in the PRs built off of this, will attend to those soon.
