feat: cache-aware compaction guards with budget-pressure priority and per-tier tuning #289
Conversation
On high-traffic conversations, `evaluateLeafTrigger()` fires every turn because raw tokens outside the fresh tail constantly exceed `leafChunkTokens`. Each leaf pass creates a depth-0 summary that resequences all ordinals, invalidating the Anthropic prompt cache prefix. Cache hit dropped from 90%+ to 22% on large conversations.

Add two skip guards to `evaluateLeafTrigger()`, evaluated only when the basic threshold IS exceeded:

1. Cache-aware skip: if estimated reduction is <5% of total assembled tokens, the cache invalidation cost exceeds the compression gain.
2. Budget headroom skip: if assembled tokens are below 80% of `contextThreshold` × `tokenBudget`, there is no budget pressure.

Both are configurable: `leafSkipReductionThreshold` (default 0.05) and `leafBudgetHeadroomFactor` (default 0.8).

Fixes Martian-Engineering#282

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
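The two skip guards described above can be sketched roughly as follows. This is a minimal illustration, not the actual source: every identifier except the config knob names (`leafSkipReductionThreshold`, `leafBudgetHeadroomFactor`, `leafChunkTokens`, `contextThreshold`, `tokenBudget`) is an assumption.

```typescript
// Hypothetical sketch of the two skip guards, evaluated only after the
// basic leaf threshold is exceeded.
interface LeafGuardInputs {
  rawTokensOutsideTail: number;
  estimatedReduction: number;       // tokens a leaf pass is expected to remove
  totalAssembledTokens: number;
  leafChunkTokens: number;          // basic per-chunk threshold
  contextThreshold: number;         // fraction of the budget that triggers compaction
  tokenBudget: number;
  leafSkipReductionThreshold: number; // default 0.05
  leafBudgetHeadroomFactor: number;   // default 0.8
}

// Returns a skip reason, or null if compaction should fire.
function evaluateLeafGuards(i: LeafGuardInputs): string | null {
  // Guards apply only when the basic threshold IS exceeded.
  if (i.rawTokensOutsideTail <= i.leafChunkTokens) return "below-threshold";

  // 1. Cache-aware skip: a tiny reduction does not justify invalidating
  //    the prompt-cache prefix.
  if (i.estimatedReduction < i.leafSkipReductionThreshold * i.totalAssembledTokens) {
    return "cache-aware-skip";
  }

  // 2. Budget headroom skip: well under the ceiling, there is no pressure.
  const ceiling = i.leafBudgetHeadroomFactor * i.contextThreshold * i.tokenBudget;
  if (i.totalAssembledTokens < ceiling) return "budget-headroom-skip";

  return null; // compact
}
```

With `contextThreshold = 0.7` and `tokenBudget = 100000` (both assumed), the headroom ceiling is 56000 tokens, so a 60000-token context with a 4000-token expected reduction would compact, while a 40000-token context would be skipped for headroom.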
Pull request overview
Updates leaf compaction triggering to avoid unnecessary incremental leaf passes that destabilize prompt-cache prefixes and waste work on conversations with ample token budget headroom.
Changes:
- Adds cache-aware and budget-headroom skip guards to `CompactionEngine.evaluateLeafTrigger()`.
- Wires `tokenBudget` through the engine's leaf-trigger evaluation and adds debug logging for skip reasons.
- Introduces two new config knobs (`leafSkipReductionThreshold`, `leafBudgetHeadroomFactor`) and updates the impacted engine test.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/db/config.ts` | Adds two new config values and resolves them from env/plugin config with defaults. |
| `src/compaction.ts` | Extends compaction config and rewrites `evaluateLeafTrigger()` to include skip guards and skip-reason reporting. |
| `src/engine.ts` | Passes `tokenBudget` into leaf-trigger evaluation and logs skip reasons; wires new config fields into `CompactionConfig`. |
| `test/engine.test.ts` | Updates spy assertion for the new `evaluateLeafTrigger(..., tokenBudget)` signature. |
…er cache skip

Adversarial review found that the cache-aware skip could permanently suppress leaf compaction in large contexts (e.g., 700K of 750K ceiling) because the 5% relative threshold scales with total assembled tokens.

Fix: evaluate budget headroom FIRST. When over the headroom ceiling (budget pressure), bypass the cache-aware skip entirely — compaction fires regardless of cache impact. The cache-aware skip only applies when there is genuine headroom (no budget pressure).

Also clamp `leafBudgetHeadroomFactor` to max 1.0 to prevent misconfiguration from silently disabling compaction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
**Adversarial Verification Report (5 agents)**

Ran 5 parallel adversarial review agents. Found and fixed one CRITICAL issue.

**CRITICAL — Compaction starvation at scale (FIXED in d56eab1)**

The cache-aware skip used a 5% relative threshold that scaled linearly with total assembled tokens.

Root cause: The cache skip short-circuited before the budget headroom check could override it.

Fix: Evaluate budget headroom FIRST. When over the headroom ceiling (budget pressure), bypass the cache-aware skip entirely. Also clamp `leafBudgetHeadroomFactor` to a maximum of 1.0.

Scenario verification after fix:
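The starvation scenario from the report can be checked numerically. The 700K/750K figures and the 5% threshold come from the report; the 0.8 headroom factor and the 4000-token per-pass reduction are assumed for illustration.

```typescript
// Numeric illustration of the starvation scenario and its fix.
const totalAssembled = 700_000;               // tokens in context (from the report)
const budgetCeiling = 750_000 * 0.8;          // headroom ceiling (factor 0.8 assumed)
const perPassReduction = 4_000;               // one leaf chunk's worth (assumed)

// Pre-fix: the cache-aware skip ran first. A single pass removes far less
// than 5% of 700K (35K), so the skip fires forever and compaction starves.
const starved = perPassReduction < 0.05 * totalAssembled;

// Post-fix: headroom is evaluated first. 700K >= 600K means budget pressure,
// which bypasses the cache-aware skip entirely, so compaction fires.
const pressure = totalAssembled >= budgetCeiling;
const compacts = pressure || !starved;
```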
Other findings (non-blocking):
Addresses Copilot review round 2:

1. `estimatedReduction` was using `rawTokensOutsideTail` (all raw tokens), but a leaf pass only compacts one chunk capped at `leafChunkTokens`. Now uses `Math.min(rawTokensOutsideTail, threshold)` so the estimate reflects actual per-pass reduction.
2. Added `leafSkipReductionThreshold` and `leafBudgetHeadroomFactor` to `openclaw.plugin.json` `configSchema` (which has `additionalProperties: false`) so users can set them via plugin config, not just env vars.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
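The corrected per-pass estimate from point 1 amounts to a one-line cap (function name assumed for illustration):

```typescript
// A single leaf pass compacts at most one chunk of `leafChunkTokens`,
// so the expected reduction is capped rather than counting every raw
// token outside the fresh tail.
function estimateLeafReduction(
  rawTokensOutsideTail: number,
  leafChunkTokens: number,
): number {
  return Math.min(rawTokensOutsideTail, leafChunkTokens);
}
```

Without the cap, a backlog of 50K raw tokens would make a single 4K-token pass look like a 50K-token reduction, passing the 5% gate it should fail.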
Adds 16 new unit tests covering all paths through the cache-aware and budget-headroom skip logic, plus 5 config resolution tests:

Skip logic tests (lcm-integration.test.ts):
- Basic threshold: below/above `leafChunkTokens`
- Budget headroom: skip when under ceiling, compact when over
- Budget headroom: bypassed when `tokenBudget` undefined
- Cache-aware: skip when reduction tiny relative to total context
- Cache-aware: compact when reduction large enough
- Budget pressure overrides cache skip (anti-starvation)
- Edge cases: empty conversation, negative reduction
- Config escape hatches: threshold=0 and factor=0 disable skips
- Factor clamped to 1.0 (misconfiguration protection)
- Orchestrator vs sub-agent: different budgets, different decisions
- Per-pass chunk size estimate uses min(raw, threshold)

Config tests (config.test.ts):
- Default values: 0.05 and 0.8
- Plugin config override
- Env var override
- Schema entries in manifest

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
Copilot reviewed 7 out of 8 changed files in this pull request and generated 5 comments.
…st fixes

Addresses 5 Copilot review comments:

1. `leafBudgetHeadroomFactor=0` now correctly disables the headroom check (`headroomEnabled=false`) instead of creating false budget pressure that bypassed the cache-aware skip.
2. Config values clamped to [0,1] in `resolveLcmConfig` via `clamp01()`.
3. Removed wasteful `"x".repeat(summaryTokens*4)` in test — mock store uses `tokenCount` directly, not content length.
4. Fixed `leafChunkTokens=0` test — `resolveLeafChunkTokens()` normalizes non-positive to default. Use default threshold instead.
5. Updated factor=0 test comment to match corrected semantics.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
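Point 2's `clamp01()` helper is likely just a bounded-range clamp; a minimal sketch (the surrounding `resolveLcmConfig` wiring is assumed, not shown):

```typescript
// Clamp a config value into [0, 1] so misconfigured fractions
// (e.g. 1.5 or -0.2) cannot distort the guard math.
function clamp01(value: number): number {
  return Math.min(1, Math.max(0, value));
}
```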
Pull request overview
Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.
…nfo logging

Three hardening improvements from adversarial review:

1. Token accuracy: `evaluateLeafTrigger` now accepts an optional `liveContextTokens` param and uses max(stored, live) for headroom decisions. Stored token counts can lag after rapid ingestion; the live estimate from `afterTurn` provides a more accurate floor.
2. Structured telemetry: `LeafTriggerResult` now includes a `context` field with all decision inputs (`totalAssembledTokens`, `budgetCeiling`, `budgetPressure`, `estimatedReduction`, `reductionThreshold`, `headroomFactor`). Enables machine-parseable diagnostics and config tuning.
3. Observability: Skip and fire decisions logged at info level (not debug). Compaction fires include assembled/pressure context. Volume is at most 1 log per turn — negligible.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
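Point 1's max(stored, live) rule can be sketched as a small helper (function and parameter names other than `liveContextTokens` are assumptions):

```typescript
// Use the larger of the persisted count and the live estimate, so a
// lagging stored count can never understate the true context size.
// When no live estimate is available, fall back to the stored count.
function effectiveContextTokens(
  storedTokens: number,
  liveContextTokens?: number,
): number {
  return liveContextTokens === undefined
    ? storedTokens
    : Math.max(storedTokens, liveContextTokens);
}
```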
Pull request overview
Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.
The comment described when the cache-aware skip is applied but did not precisely reflect the budgetPressure gate semantics after the headroom refactor. Updated to accurately describe: budget pressure is only true when headroom is enabled AND ceiling is breached; otherwise cache-aware skip can fire. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
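The corrected semantics can be stated as a single predicate (identifier names taken from the telemetry fields mentioned earlier in the thread; the standalone function is an illustrative assumption):

```typescript
// Budget pressure is true only when the headroom check is enabled AND the
// assembled total reaches or exceeds the ceiling. Otherwise pressure is
// false and the cache-aware skip may still fire.
function isBudgetPressure(
  totalAssembledTokens: number,
  budgetCeiling: number | undefined,
  headroomEnabled: boolean,
): boolean {
  return (
    headroomEnabled &&
    budgetCeiling !== undefined &&
    totalAssembledTokens >= budgetCeiling
  );
}
```

Note the `>=`: headroom uses strict less-than, so pressure fires exactly at the ceiling, not one token past it.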
Skip decisions fire every turn in high-traffic sessions — too noisy for info level. Compaction triggers are infrequent (~every 7-10 turns) and worth info level as meaningful state changes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.
Adds documentation for the cache-aware compaction feature across all doc layers:

New: `docs/compaction-tuning.md` — standalone deep-dive covering:
- TLDR quick-setup with copy-paste configs per model tier
- Compaction model selection guide (why fast models matter)
- Full lifecycle diagrams (Mermaid)
- Cache-aware decision flowchart
- Economics tables (cache miss penalty, break-even formula)
- Gateway stall timing per model
- Debugging guide for common issues

Updated: `docs/architecture.md`
- Cache-aware skip guards section with Mermaid diagram
- Budget pressure priority explanation
- Prompt cache impact description

Updated: `docs/configuration.md`
- `leafSkipReductionThreshold` and `leafBudgetHeadroomFactor` reference
- Compaction model selection table
- Per-tier preset summary with link to tuning guide

Updated: `skills/lossless-claw/references/config.md`
- Added both new config fields to skill reference

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
Copilot reviewed 11 out of 12 changed files in this pull request and generated 5 comments.
The code uses totalAssembledTokens < budgetCeiling for headroom (strict less-than), so budget pressure fires at >= budgetCeiling. Docs said 'exceed' which implies strict greater-than. Fixed to 'reach or exceed' across all 5 files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
Copilot reviewed 11 out of 12 changed files in this pull request and generated no new comments.
Thread observed token counts through leaf and threshold compaction workers so stale persisted counts do not suppress needed compaction after the outer trigger has already detected budget pressure. Add regression coverage for both the engine plumbing and the compaction engine stale-count path, correct the Sonnet 4.6 tuning guide to reflect its 1M context tier, and add the missing patch changeset.

Regeneration-Prompt: |
  Address review findings on PR 289 in the PR worktree without changing unrelated behavior. The bug is that afterTurn/evaluation can use a live current token estimate, but the actual compactLeaf and compactFullSweep worker paths re-check leaf trigger conditions using only stored DB token counts, which can lag ingestion and incorrectly skip compaction under real budget pressure. Thread the observed token count through those worker calls and add tests that prove stale stored counts no longer suppress leaf or threshold sweeps. Also fix the compaction tuning docs so Sonnet 4.6 is described consistently with the documented 1M context window, and add a patch changeset because the PR changes user-facing runtime behavior and docs.
Given the size here, splitting it up into multiple PRs @jalehman
**Splitting this PR for easier review**

This PR is +1047/-18 across 13 files — significantly larger than the others that merged quickly. To make review tractable, we're proposing to split it into 3 focused PRs:

**PR A: Cache-aware leaf compaction guards (~400 lines, 21 tests)**

The core feature. Rewrites `evaluateLeafTrigger()`.

**PR B: Live token awareness (~60 lines, 5 tests)**

Depends on PR A. Passes `tokenBudget` and `liveContextTokens` through the engine's leaf-trigger and compaction paths.

**PR C: Documentation & tuning guide (~430 lines, 0 code)**

Independent — can land anytime. New `docs/compaction-tuning.md` plus updates to existing docs.

Note: The branch is 8 commits behind main (main now has #288, #294, #295, #296, #298, #302). Rebase has extensive conflicts. We'll create fresh branches from current main and reapply the changes for each split PR.

Should we proceed with this split, or would you prefer a different grouping? Happy to adjust the boundaries.
On models with prompt caching (Claude, GPT-4), compaction that removes 3% of tokens costs more in cache-miss penalties than it saves. The current trigger fires whenever `assembledTokens > threshold × budget`, regardless of how much compaction would actually remove.

Add three guard checks to `evaluateLeafTrigger()`:

1. Budget headroom gate — skip when assembled < 80% of budget ceiling (`leafBudgetHeadroomFactor`, default 0.8, set 0 to disable)
2. Cache-aware reduction gate — skip when estimated reduction < 5% of total assembled tokens (`leafSkipReductionThreshold`, default 0.05)
3. Budget pressure override — force compaction when context reaches or exceeds the ceiling, preventing starvation in large contexts

Also passes `currentTokenCount` through `compactLeaf`/`compactFullSweep` so headroom decisions use live observed counts when stored counts are stale.

Split from Martian-Engineering#289 for reviewability.
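The three gates above compose into one decision. The sketch below is an assumed illustration of that ordering — pressure bypasses the cache-aware skip, headroom defers compaction, and the reduction gate only applies when there is no budget signal; none of these names come from the actual source.

```typescript
type LeafDecision =
  | { compact: true; reason: "budget-pressure" | "reduction-worthwhile" }
  | { compact: false; reason: "headroom" | "cache-aware" };

function decideLeafCompaction(params: {
  totalAssembledTokens: number;
  estimatedReduction: number;
  budgetCeiling?: number;      // headroomFactor * contextThreshold * tokenBudget
  reductionThreshold: number;  // e.g. 0.05
  headroomEnabled: boolean;    // false when the factor is configured to 0
}): LeafDecision {
  const { totalAssembledTokens, estimatedReduction, budgetCeiling,
          reductionThreshold, headroomEnabled } = params;

  // Gate 3 takes effect first: at or over the ceiling, compaction fires
  // regardless of cache impact, so large contexts can never starve.
  const pressure =
    headroomEnabled &&
    budgetCeiling !== undefined &&
    totalAssembledTokens >= budgetCeiling;
  if (pressure) return { compact: true, reason: "budget-pressure" };

  // Gate 1: headroom enabled and under the ceiling — no pressure, wait.
  if (headroomEnabled && budgetCeiling !== undefined) {
    return { compact: false, reason: "headroom" };
  }

  // Gate 2: no budget signal — compact only if the reduction is worthwhile.
  if (estimatedReduction < reductionThreshold * totalAssembledTokens) {
    return { compact: false, reason: "cache-aware" };
  }
  return { compact: true, reason: "reduction-worthwhile" };
}
```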
Pass `tokenBudget` and `liveContextTokens` from the engine's `afterTurn` and compact paths into `evaluateLeafTrigger` and `compactLeaf`/`compactFullSweep` so cache-aware headroom decisions use fresh observed counts instead of potentially stale stored values.

- `evaluateLeafTrigger` now receives `tokenBudget` + `liveContextTokens` from engine call sites
- `compactLeaf`/`compactFullSweep` receive `currentTokenCount` (`observedTokens`)
- `afterTurn` logs trigger context (assembled, pressure) on compaction
- `afterTurn` logs skip reason when guards prevent compaction
- `CompactionConfig` passes `leafSkipReductionThreshold` and `leafBudgetHeadroomFactor` from `LcmConfig`

Split from Martian-Engineering#289 (Part 2 of 3). Depends on Martian-Engineering#306.
New comprehensive guide for operators tuning LCM compaction behavior:

- `docs/compaction-tuning.md` (356 lines): TLDR, per-tier model presets (Opus, Sonnet, Haiku, GPT-4o-mini, Gemini Flash), cache economics break-even formula, debugging checklist, orchestration scenarios
- `docs/architecture.md`: cache-aware guards section with Mermaid flowchart
- `docs/configuration.md`: new settings reference, model comparison table
- skills references: config field updates

Split from Martian-Engineering#289 (Part 3 of 3). Independent of Martian-Engineering#306 and Martian-Engineering#307.
Split complete. This PR is now covered by three focused PRs rebased on current main:
Closing this PR in favor of the split. All 1047 lines of additions are preserved across the three PRs.
Restores two load-bearing inline comments from the original PR Martian-Engineering#289 that were lost during the split:

- 3-line `headroomEnabled` rationale: explains why the guard uses three conditions and that factor=0 disables without creating false pressure
- 8-line budget-pressure explanation: documents when pressure is true, when the cache-aware skip can fire, and the starvation prevention guarantee
Summary
Adds cache-aware compaction guards to the leaf compaction trigger, comprehensive documentation, per-model-tier tuning recommendations, and 21 new tests. Prevents unnecessary prompt-cache invalidation on high-traffic conversations while ensuring compaction fires under genuine budget pressure.
Fixes #282
Problem
On high-traffic conversations (18K+ messages), `evaluateLeafTrigger()` fires every turn because raw tokens constantly exceed `leafChunkTokens`. Each leaf pass resequences all ordinals, invalidating the prompt cache prefix. Cache hit dropped from 90%+ to 22%.

Solution
Three-tier decision logic
Budget pressure always overrides cache concerns — prevents starvation.
Configurable thresholds
- `leafSkipReductionThreshold` (default 0.05)
- `leafBudgetHeadroomFactor` (default 0.8)

Per-model-tier recommendations
Documentation
New comprehensive Compaction Tuning Guide covering:
Updated existing docs:
- `docs/architecture.md` — Cache-aware guards section with Mermaid diagram
- `docs/configuration.md` — New settings reference, model selection table
- `skills/lossless-claw/references/config.md` — Skill reference for new fields

Test Coverage (21 new tests)
Skip logic (16 tests): Basic threshold, budget headroom (skip/compact/bypass), cache-aware (skip/compact), budget pressure override, edge cases (empty conv, negative reduction, per-pass capping), config escape hatches (0=disable), factor clamping, orchestrator vs sub-agent scenario
Config resolution (5 tests): Defaults, plugin config override, env var override, schema entries
Files Changed (8)
- `src/compaction.ts` — `LeafTriggerResult` type, rewritten `evaluateLeafTrigger()`, `liveContextTokens` param
- `src/db/config.ts` — `clamp01` validation
- `src/engine.ts` — `tokenBudget` + `liveContextTokens`, structured telemetry, logging
- `openclaw.plugin.json`
- `docs/compaction-tuning.md`
- `docs/architecture.md`
- `docs/configuration.md`
- `skills/lossless-claw/references/config.md`
- `test/lcm-integration.test.ts`
- `test/config.test.ts`
- `test/engine.test.ts`

Adversarial Review Summary
5 agents reviewed across 4 rounds. All CRITICAL/HIGH findings resolved:
- Compaction starvation at scale — budget pressure now evaluated first; factor clamped via `min(factor, 1.0)`
- Per-pass reduction estimate capped via `min(raw, threshold)`
- New config fields added to the `openclaw.plugin.json` schema
- factor=0 false pressure — `headroomEnabled` gate prevents it