feat: cache-aware leaf compaction guards with budget-pressure override#306
100yenadmin wants to merge 5 commits into Martian-Engineering:main
Conversation
On models with prompt caching (Claude, GPT-4), a compaction pass that removes only 3% of tokens costs more in cache-miss penalties than it saves. The current trigger fires whenever `assembledTokens > threshold × budget`, regardless of how much compaction would actually remove.

Add three guard checks to `evaluateLeafTrigger()`:

1. **Budget headroom gate** — skip when assembled < 80% of the budget ceiling (`leafBudgetHeadroomFactor`, default 0.8; set to 0 to disable)
2. **Cache-aware reduction gate** — skip when the estimated reduction < 5% of total assembled tokens (`leafSkipReductionThreshold`, default 0.05)
3. **Budget pressure override** — force compaction when the context reaches or exceeds the ceiling, preventing starvation in large contexts

Also passes `currentTokenCount` through `compactLeaf`/`compactFullSweep` so headroom decisions use live observed counts when stored counts are stale.

Split from Martian-Engineering#289 for reviewability.
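The decision tree described above can be sketched as follows. This is a minimal illustration, not the PR's actual code — the input/output shapes and the `evaluateLeafTriggerSketch` name are assumptions based on the description:

```typescript
// Hypothetical sketch of the three guard checks; field names are assumptions.
interface LeafTriggerInput {
  assembledTokens: number;        // live observed context size
  estimatedReduction: number;     // tokens a leaf pass would remove
  tokenBudget: number;            // model context budget ceiling
  headroomFactor: number;         // leafBudgetHeadroomFactor (default 0.8, 0 disables)
  skipReductionThreshold: number; // leafSkipReductionThreshold (default 0.05, 0 disables)
}

function evaluateLeafTriggerSketch(
  i: LeafTriggerInput,
): { shouldCompact: boolean; reason: string } {
  // Guard 3: budget pressure override — always compact at/over the ceiling.
  if (i.headroomFactor > 0 && i.assembledTokens >= i.tokenBudget) {
    return { shouldCompact: true, reason: "budget-pressure" };
  }
  // Guard 1: headroom gate — skip while comfortably under the ceiling.
  if (i.headroomFactor > 0 && i.assembledTokens < i.headroomFactor * i.tokenBudget) {
    return { shouldCompact: false, reason: "headroom" };
  }
  // Guard 2: cache-aware reduction gate — skip when the win is too small
  // to justify invalidating the prompt cache.
  if (
    i.skipReductionThreshold > 0 &&
    i.estimatedReduction < i.skipReductionThreshold * i.assembledTokens
  ) {
    return { shouldCompact: false, reason: "small-reduction" };
  }
  return { shouldCompact: true, reason: "triggered" };
}
```

Setting both factors to 0 short-circuits every guard, which matches the "original behavior" escape hatch described below the config fields.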
Pull request overview
Adds cache-aware skip guards to leaf compaction triggering to reduce prompt-cache invalidation and unnecessary compaction work, with new configurable thresholds and expanded test coverage.
Changes:
- Reworked `evaluateLeafTrigger()` to add budget-headroom gating, cache-aware reduction gating, and a budget-pressure override (with structured diagnostics).
- Added config resolution plus manifest schema/UI hints for `leafSkipReductionThreshold` and `leafBudgetHeadroomFactor`.
- Added integration/config tests covering the new guard logic and configuration defaults/overrides.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/compaction.ts` | Implements the new leaf-trigger decision logic and plumbs `currentTokenCount` into the leaf/full-sweep compaction paths. |
| `src/db/config.ts` | Introduces new resolved config fields (with clamping) for the leaf-compaction guards. |
| `openclaw.plugin.json` | Exposes the new config fields via schema + UI hints. |
| `test/lcm-integration.test.ts` | Adds integration coverage for the skip-guard decision tree and stale-token scenarios. |
| `test/config.test.ts` | Adds tests for defaults, plugin config, env var overrides, and manifest schema presence. |
| `.changeset/cache-aware-compaction-guards.md` | Declares a release bump entry for the feature. |
Pass `tokenBudget` and `liveContextTokens` from the engine's `afterTurn` and compact paths into `evaluateLeafTrigger` and `compactLeaf`/`compactFullSweep` so cache-aware headroom decisions use fresh observed counts instead of potentially stale stored values.

- `evaluateLeafTrigger` now receives `tokenBudget` + `liveContextTokens` from engine call sites
- `compactLeaf`/`compactFullSweep` receive `currentTokenCount` (`observedTokens`)
- `afterTurn` logs trigger context (assembled, pressure) on compaction
- `afterTurn` logs the skip reason when guards prevent compaction
- `CompactionConfig` passes `leafSkipReductionThreshold` and `leafBudgetHeadroomFactor` from `LcmConfig`

Split from Martian-Engineering#289 (Part 2 of 3). Depends on Martian-Engineering#306.
New comprehensive guide for operators tuning LCM compaction behavior:

- `docs/compaction-tuning.md` (356 lines): TLDR, per-tier model presets (Opus, Sonnet, Haiku, GPT-4o-mini, Gemini Flash), cache-economics break-even formula, debugging checklist, orchestration scenarios
- `docs/architecture.md`: cache-aware guards section with Mermaid flowchart
- `docs/configuration.md`: new settings reference, model comparison table
- skills references: config field updates

Split from Martian-Engineering#289 (Part 3 of 3). Independent of Martian-Engineering#306 and Martian-Engineering#307.
Restores two load-bearing inline comments from the original PR Martian-Engineering#289 that were lost during the split:

- 3-line `headroomEnabled` rationale: explains why the guard uses three conditions and that `factor=0` disables it without creating false pressure
- 8-line budget-pressure explanation: documents when pressure is true, when the cache-aware skip can fire, and the starvation-prevention guarantee
- Fix changeset file to use standard frontmatter delimiters
- Normalize `liveContextTokens` with a `Number.isFinite`/`Math.floor` guard to prevent `NaN`/`Infinity` from corrupting headroom calculations (mirrors the pattern used in `evaluate()`)
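The normalization guard described in the commit could look like this — a sketch under the stated assumption that invalid live counts should be dropped rather than clamped (`normalizeTokenCount` is an illustrative name, not the PR's):

```typescript
// Hypothetical sketch of the Number.isFinite/Math.floor normalization:
// a live token count that is NaN, Infinity, negative, or not a number at
// all is rejected entirely so it cannot corrupt headroom calculations.
function normalizeTokenCount(value: unknown): number | undefined {
  return typeof value === "number" && Number.isFinite(value) && value >= 0
    ? Math.floor(value)
    : undefined;
}
```

Returning `undefined` (rather than `0`) lets callers distinguish "no live count available" from "context is empty", so a missing observation falls back to the stored count instead of masquerading as an empty context.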
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (2)
src/compaction.ts:605
`currentTokenCount` is passed into `evaluateLeafTrigger`, but the early-return guard still uses only `tokensBefore` (the stored DB count). If stored token counts are stale low while the live assembled prompt is over the threshold, `compactLeaf` can incorrectly skip compaction (because `tokensBefore <= threshold` stays true). Consider normalizing `input.currentTokenCount` and using an `effectiveTokensBefore = max(tokensBefore, currentTokenCount)` for this guard (and any threshold comparisons).
const tokensBefore = await this.summaryStore.getContextTokenCount(conversationId);
const threshold = Math.floor(this.config.contextThreshold * tokenBudget);
const leafTrigger = await this.evaluateLeafTrigger(
conversationId,
tokenBudget,
input.currentTokenCount,
tokensBefore,
);
if (!force && tokensBefore <= threshold && !leafTrigger.shouldCompact) {
return {
actionTaken: false,
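The reviewer's suggested fix could be sketched as below. `effectiveTokensBefore` is the reviewer's proposed name, not code that exists in the PR:

```typescript
// Sketch of the reviewer's suggestion: fold the normalized live count into
// the early-return comparison so a stale-low stored count cannot suppress
// compaction when the real context is over the threshold.
function effectiveTokensBefore(
  tokensBefore: number,
  currentTokenCount?: number,
): number {
  const live =
    typeof currentTokenCount === "number" && Number.isFinite(currentTokenCount)
      ? Math.floor(currentTokenCount)
      : 0;
  return Math.max(tokensBefore, live);
}

// The guard would then read (illustrative):
//   if (!force &&
//       effectiveTokensBefore(tokensBefore, input.currentTokenCount) <= threshold &&
//       !leafTrigger.shouldCompact) { ... }
```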
src/compaction.ts:742
Same issue in `compactFullSweep`: `currentTokenCount` influences `evaluateLeafTrigger`, but the sweep can still return early (and all sweep/stop conditions start from `tokensBefore`) even when the live context is over the compaction threshold. Using a normalized `effectiveTokensBefore = max(tokensBefore, currentTokenCount)` for the early-return guard and initializing `runningTokens` from it would make full sweeps behave correctly when stored counts lag behind reality.
const tokensBefore = await this.summaryStore.getContextTokenCount(conversationId);
const threshold = Math.floor(this.config.contextThreshold * tokenBudget);
const leafTrigger = await this.evaluateLeafTrigger(
conversationId,
tokenBudget,
input.currentTokenCount,
tokensBefore,
);
if (!force && tokensBefore <= threshold && !leafTrigger.shouldCompact) {
return {
actionTaken: false,
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
src/compaction.ts:742
Same issue as in `compactLeaf`: the early-return guard uses `tokensBefore` (the stored token count) rather than incorporating the new `currentTokenCount` live estimate. If stored tokens are stale low but the live context is actually above the threshold, `compactFullSweep` can return `actionTaken=false` and skip both leaf and condensed passes even under real budget pressure. Use an effective token count (the max of stored and normalized live) for this `<= threshold` comparison.
const tokensBefore = await this.summaryStore.getContextTokenCount(conversationId);
const threshold = Math.floor(this.config.contextThreshold * tokenBudget);
const leafTrigger = await this.evaluateLeafTrigger(
conversationId,
tokenBudget,
input.currentTokenCount,
tokensBefore,
);
if (!force && tokensBefore <= threshold && !leafTrigger.shouldCompact) {
return {
actionTaken: false,
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Users have no visibility into whether LCM compaction is saving or wasting money. This adds persistent event tracking, cost estimation, and efficiency reporting.

Changes:
- New `compaction_events` table (SQLite migration) records each compaction pass with token counts and model name
- Static pricing table (`pricing.ts`) for cost estimation with fuzzy model-prefix matching (11 models covered)
- `/lossless status` gains an efficiency section showing passes, tokens saved, compaction cost, net efficiency, and recommendations
- New `/lossless efficiency` subcommand with per-model breakdown and actionable recommendations (e.g., "Switch from Opus to Haiku")
- `persistCompactionEvent()` now inserts a DB row alongside the console log
- Best-effort recording — doesn't fail compaction if the table is missing

Closes Martian-Engineering#309. Depends on Martian-Engineering#306 and Martian-Engineering#307.
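Fuzzy model-prefix matching as described here could work roughly like the sketch below. The table contents and the `lookupPricing` name are illustrative assumptions, not the PR's actual `pricing.ts` values:

```typescript
// Hypothetical sketch of fuzzy model-prefix matching against a static
// pricing table. Prices are illustrative placeholders (USD per MTok).
const PRICING_PER_MTOK: Record<string, { input: number; output: number }> = {
  "claude-3-5-haiku": { input: 0.8, output: 4 },
  "claude-sonnet": { input: 3, output: 15 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
};

function lookupPricing(
  model: string,
): { input: number; output: number } | undefined {
  // Longest-prefix match, so a dated variant like "claude-sonnet-4-20250514"
  // resolves to the "claude-sonnet" entry.
  const match = Object.keys(PRICING_PER_MTOK)
    .filter((key) => model.startsWith(key))
    .sort((a, b) => b.length - a.length)[0];
  return match ? PRICING_PER_MTOK[match] : undefined;
}
```

Returning `undefined` for unknown models fits the "best-effort" posture above: an unpriced model simply yields no cost estimate rather than a wrong one.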
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
… shrink test strings

- Remove duplicate `summaryModel` and `summaryProvider` properties in the `LcmConfig` type (lines 35-38 were copies of lines 28-30 with wrong comments)
- Replace 12KB test string literals with short descriptive text, since `tokenCountFn` overrides the count
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.
@100yenadmin This work (actually the previous work on #289) inspired me to take this a step further and make compaction directly cache-aware. LMK what you think.
I'm all for it! Let me know if you want to build on or edit any of these (gave you access to do so). @jalehman This work is just part 1 of a 3-way split from #289. Merge order: #306 → #307 → #308. These three started out the same as the original but grew because of code-reviewer tests and feedback (mostly nits). The feature works well on our internal Hippo LCM, so we figured we'd share it OS. Let me know how else I can help 🖤
Summary
Adds cache-aware skip guards to `evaluateLeafTrigger()` that prevent unnecessary prompt-cache invalidation during leaf compaction. On models with prompt caching, a compaction pass that removes only 3% of tokens costs more in cache-miss penalties than it saves in token reduction.

Part 1 of 3 split from #289. Merge order: #306 → #307 → #308.
The Problem: Compaction That Costs More Than It Saves
How compaction invalidates the prompt cache
Every leaf compaction pass:
The cost of a single unnecessary cache miss
Cached input is always 1/10 of the base input price across all Anthropic models. Cache TTL is 5 minutes (refreshed on each hit).
Break-even formula
A compaction saving X tokens/turn that invalidates Y cached tokens takes roughly (0.9 × Y) / X turns to break even: the miss re-pays the 90% cache discount on the Y cached tokens, and each subsequent turn saves X input tokens.

For typical values (150K cached, 10K saved): ~13.5 turns to break even, regardless of model tier. If the reduction is only 3% of context (~5K tokens), break-even extends to 27+ turns — most sessions never recoup the cost.
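The arithmetic behind these figures can be checked directly, assuming the 0.1× cache-read pricing stated above (so a miss re-pays 0.9× of the cached tokens):

```typescript
// Break-even turn count for a compaction pass that invalidates
// `cachedTokens` of prompt cache while saving `savedTokensPerTurn`
// on each subsequent turn. Assumes cache reads cost 0.1x the base
// input price, so a single miss re-pays 0.9x of the cached tokens.
function breakEvenTurns(
  cachedTokens: number,
  savedTokensPerTurn: number,
): number {
  return (0.9 * cachedTokens) / savedTokensPerTurn;
}

breakEvenTurns(150_000, 10_000); // 13.5 — the "typical values" case
breakEvenTurns(150_000, 5_000);  // 27   — the 3%-reduction case
```

Because both the cache penalty and the savings scale with the same per-token price, the model's price tier cancels out — which is why the break-even turn count is tier-independent, as the text notes.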
The current trigger fires blindly
The existing `evaluateLeafTrigger()` fires whenever `rawTokensOutsideTail >= leafChunkTokens`, regardless of:

- how close the assembled context is to the budget ceiling
- how much the pass would actually reduce the context

The Solution: Three Guard Checks
Guard evaluation flow
Scenario A: Headroom skip (saves cache, defers compaction)
Setup: Orchestrator with 200K token budget, 40K assembled tokens, 18K raw tokens outside tail.
Scenario B: Cache-aware skip (tiny reduction not worth cache bust)
Setup: Large context with 500K summary + 24K raw messages, no token budget provided.
Scenario C: Budget pressure override (prevents starvation)
Setup: Same large context but now 750K token budget provided.
Orchestrator vs sub-agent scenario
The same engine instance handles different budgets:
Same context, different budget pressure. The guards adapt automatically.
Config Fields
| Field | Default | Env var | Notes |
|---|---|---|---|
| `leafSkipReductionThreshold` | 0.05 | `LCM_LEAF_SKIP_REDUCTION_THRESHOLD` | Set to 0 to disable the cache-aware skip. |
| `leafBudgetHeadroomFactor` | 0.8 | `LCM_LEAF_BUDGET_HEADROOM_FACTOR` | Set to 0 to disable the headroom check and budget-pressure detection. |
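Resolution of these fields with `clamp01()` validation (mentioned for `src/db/config.ts`) could be sketched like this — the `resolveGuardConfig` helper and its exact parsing behavior are assumptions, not the PR's code:

```typescript
// Hypothetical sketch of env-var resolution with clamping to [0, 1],
// mirroring the clamp01() validation described for src/db/config.ts.
const clamp01 = (n: number): number => Math.min(1, Math.max(0, n));

function resolveGuardConfig(env: Record<string, string | undefined>) {
  const parse = (raw: string | undefined, fallback: number): number => {
    if (raw === undefined) return fallback;
    const n = Number(raw);
    // Non-numeric input falls back to the default; numeric input is clamped.
    return Number.isFinite(n) ? clamp01(n) : fallback;
  };
  return {
    leafSkipReductionThreshold: parse(env.LCM_LEAF_SKIP_REDUCTION_THRESHOLD, 0.05),
    leafBudgetHeadroomFactor: parse(env.LCM_LEAF_BUDGET_HEADROOM_FACTOR, 0.8),
  };
}
```

Clamping means a misconfigured value like `5` degrades to `1` rather than producing a trigger that never fires, while an explicit `0` passes through as the documented escape hatch.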
Escape hatches: with both set to `0`, behavior is fully original. No guards, no skips.

Changes by File
- `src/compaction.ts`: `LeafTriggerResult` type with structured diagnostics. Rewrite `evaluateLeafTrigger()` with 3 guard checks. Normalize `liveContextTokens` with a `Number.isFinite` guard. Add `currentTokenCount` to the `compactLeaf`/`compactFullSweep`/`compact` signatures.
- `src/db/config.ts`: `leafSkipReductionThreshold` and `leafBudgetHeadroomFactor` fields with `clamp01()` validation. Env var + plugin config resolution.
- `openclaw.plugin.json`
- `test/lcm-integration.test.ts`
- `test/config.test.ts`
- `.changeset/`

Test Plan