
feat: cache-aware compaction guards with budget-pressure priority and per-tier tuning#289

Closed
100yenadmin wants to merge 22 commits into Martian-Engineering:main from electricsheephq:fix/leaf-compaction-cache-aware-skip

Conversation

@100yenadmin (Contributor) commented Apr 6, 2026

Summary

Adds cache-aware compaction guards to the leaf compaction trigger, comprehensive documentation, per-model-tier tuning recommendations, and 21 new tests. Prevents unnecessary prompt-cache invalidation on high-traffic conversations while ensuring compaction fires under genuine budget pressure.

Fixes #282


Problem

On high-traffic conversations (18K+ messages), evaluateLeafTrigger() fires every turn because raw tokens constantly exceed leafChunkTokens. Each leaf pass resequences all ordinals, invalidating the prompt-cache prefix. The cache hit rate dropped from 90%+ to 22%.

Solution

Three-tier decision logic

1. Raw tokens >= leafChunkTokens?  NO → skip
2. Assembled < headroom ceiling?   YES → skip (no pressure)
3. Budget pressure detected?       YES → COMPACT (budget wins)
                                   NO  → check cache-aware skip
4. Reduction < 5% of total?        YES → skip (cache cost > gain)
                                   NO  → COMPACT

Budget pressure always overrides cache concerns — prevents starvation.
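The decision logic above can be sketched in TypeScript. This is an illustrative reconstruction from the PR description, not the PR's actual code: the names, the `contextThreshold` term in the ceiling, and the return shape are assumptions; the real `evaluateLeafTrigger()` lives in `src/compaction.ts`.

```typescript
// Illustrative sketch of the guard ordering described above.
// All names and the exact semantics are assumptions based on the PR text.
interface TriggerInputs {
  rawTokensOutsideTail: number; // raw tokens eligible for a leaf pass
  leafChunkTokens: number;      // per-pass chunk-size threshold
  totalAssembledTokens: number; // assembled context size
  tokenBudget?: number;         // undefined disables the headroom gate
  contextThreshold: number;     // assumed pre-existing fraction-of-budget setting
  headroomFactor: number;       // leafBudgetHeadroomFactor, default 0.8
  reductionThreshold: number;   // leafSkipReductionThreshold, default 0.05
}

function evaluateLeafTriggerSketch(
  i: TriggerInputs
): { compact: boolean; reason: string } {
  // 1. Basic threshold: nothing worth compacting yet.
  if (i.rawTokensOutsideTail < i.leafChunkTokens) {
    return { compact: false, reason: "below-threshold" };
  }
  // The headroom gate is active only when a budget is known and factor > 0
  // (factor = 0 is the escape hatch and must not create false pressure).
  const headroomEnabled = i.tokenBudget !== undefined && i.headroomFactor > 0;
  if (headroomEnabled) {
    // Clamp the factor so a misconfigured value >= 1.0 cannot disable compaction.
    const ceiling =
      Math.min(i.headroomFactor, 1.0) * i.contextThreshold * i.tokenBudget!;
    if (i.totalAssembledTokens < ceiling) {
      return { compact: false, reason: "headroom-skip" }; // 2. no pressure
    }
    return { compact: true, reason: "budget-pressure" }; // 3. budget wins over cache
  }
  // 4. Cache-aware skip: a single leaf pass removes at most one chunk.
  const perPassReduction = Math.min(i.rawTokensOutsideTail, i.leafChunkTokens);
  if (
    i.reductionThreshold > 0 &&
    perPassReduction < i.reductionThreshold * i.totalAssembledTokens
  ) {
    return { compact: false, reason: "cache-aware-skip" };
  }
  return { compact: true, reason: "threshold-exceeded" };
}
```

Note how the ordering encodes the anti-starvation guarantee: once the ceiling is breached, the function returns before the cache-aware check can run.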

Configurable thresholds

| Setting | Default | Effect |
| --- | --- | --- |
| leafSkipReductionThreshold | 0.05 | Min reduction fraction to justify compaction |
| leafBudgetHeadroomFactor | 0.8 | Skip when under this fraction of budget ceiling |

Per-model-tier recommendations

| Tier | skipThreshold | headroomFactor | summaryModel |
| --- | --- | --- | --- |
| Opus 1M | 0.02 | 0.45 | Sonnet/Haiku |
| Sonnet 200K | 0.05 (default) | 0.80 (default) | Haiku |
| Haiku quick | 0.10 | 0.90 | Haiku |
| Orchestration | 0.02 | 0.60 | Sonnet |

Documentation

New comprehensive Compaction Tuning Guide covering:

  • TLDR — Copy-paste configs per model tier, cost savings example, verification step, glossary
  • How It Works — Lifecycle diagram, summary hierarchy, cache impact explanation, timing
  • Configuration Reference — All settings with defaults, economics tables, escape hatches
  • Advanced — Model latency comparison, skip guard flowchart, sub-agent isolation, debugging

Updated existing docs:

  • docs/architecture.md — Cache-aware guards section with Mermaid diagram
  • docs/configuration.md — New settings reference, model selection table
  • skills/lossless-claw/references/config.md — Skill reference for new fields

Test Coverage (21 new tests)

Skip logic (16 tests): Basic threshold, budget headroom (skip/compact/bypass), cache-aware (skip/compact), budget pressure override, edge cases (empty conv, negative reduction, per-pass capping), config escape hatches (0=disable), factor clamping, orchestrator vs sub-agent scenario

Config resolution (5 tests): Defaults, plugin config override, env var override, schema entries

Files Changed (11)

| File | Change |
| --- | --- |
| src/compaction.ts | LeafTriggerResult type, rewritten evaluateLeafTrigger(), liveContextTokens param |
| src/db/config.ts | New config fields + clamp01 validation |
| src/engine.ts | Wire tokenBudget + liveContextTokens, structured telemetry, logging |
| openclaw.plugin.json | Schema entries for new fields |
| docs/compaction-tuning.md | NEW — Comprehensive tuning guide |
| docs/architecture.md | Cache-aware guards section |
| docs/configuration.md | New settings + model selection |
| skills/lossless-claw/references/config.md | Skill reference update |
| test/lcm-integration.test.ts | 16 skip logic tests |
| test/config.test.ts | 5 config resolution tests |
| test/engine.test.ts | Updated spy assertion |

Adversarial Review Summary

5 agents reviewed across 4 rounds. All CRITICAL/HIGH findings resolved:

| Finding | Resolution |
| --- | --- |
| Compaction starvation at scale | Budget pressure overrides cache skip |
| Factor >= 1.0 disables compaction | Clamped to min(factor, 1.0) |
| Per-pass estimate inflated | Uses min(raw, threshold) |
| Schema missing for new fields | Added to openclaw.plugin.json |
| Factor=0 creates false budget pressure | headroomEnabled gate prevents it |
| Test coverage zero for skip logic | 21 new tests |
| Doc accuracy errors | Fixed defaults, added glossary |

On high-traffic conversations, evaluateLeafTrigger() fires every turn
because raw tokens outside the fresh tail constantly exceed
leafChunkTokens. Each leaf pass creates a depth-0 summary that
resequences all ordinals, invalidating the Anthropic prompt cache
prefix. Cache hit dropped from 90%+ to 22% on large conversations.

Add two skip guards to evaluateLeafTrigger(), evaluated only when the
basic threshold IS exceeded:

1. Cache-aware skip: if estimated reduction is <5% of total assembled
   tokens, the cache invalidation cost exceeds the compression gain.

2. Budget headroom skip: if assembled tokens are below 80% of
   contextThreshold × tokenBudget, there is no budget pressure.

Both are configurable: leafSkipReductionThreshold (default 0.05) and
leafBudgetHeadroomFactor (default 0.8).
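As a worked numeric example of the two guards: the 0.8 factor, the 5% threshold, and a 2K leafChunkTokens come from this change, while the 0.85 contextThreshold and the concrete token counts are illustrative assumptions.

```typescript
// Worked example of the two skip guards under assumed numbers.
// contextThreshold = 0.85 is illustrative; 0.8 and 0.05 are the new defaults.
const tokenBudget = 200_000;
const contextThreshold = 0.85;           // assumed pre-existing setting
const leafBudgetHeadroomFactor = 0.8;    // new, default
const leafSkipReductionThreshold = 0.05; // new, default
const leafChunkTokens = 2_000;

const assembled = 90_000;
const rawOutsideTail = 6_000;

// Headroom guard: ceiling = 0.8 * 0.85 * 200,000 ≈ 136,000 tokens.
const ceiling = leafBudgetHeadroomFactor * contextThreshold * tokenBudget;
const underCeiling = assembled < ceiling; // 90,000 < 136,000 → skip, no pressure

// Cache-aware guard: one leaf pass removes at most one chunk.
const estimatedReduction = Math.min(rawOutsideTail, leafChunkTokens); // 2,000
const tooSmallToJustify =
  estimatedReduction < leafSkipReductionThreshold * assembled; // 2,000 < 4,500 → skip
```

In this scenario both guards agree: the conversation has ample headroom and a leaf pass would shrink the context by only ~2%, so compaction is skipped and the cache prefix stays intact.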

Fixes Martian-Engineering#282

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 6, 2026 11:06

Copilot AI left a comment


Pull request overview

Updates leaf compaction triggering to avoid unnecessary incremental leaf passes that destabilize prompt-cache prefixes and waste work on conversations with ample token budget headroom.

Changes:

  • Adds cache-aware and budget-headroom skip guards to CompactionEngine.evaluateLeafTrigger().
  • Wires tokenBudget through the engine’s leaf-trigger evaluation and adds debug logging for skip reasons.
  • Introduces two new config knobs (leafSkipReductionThreshold, leafBudgetHeadroomFactor) and updates the impacted engine test.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/db/config.ts | Adds two new config values and resolves them from env/plugin config with defaults. |
| src/compaction.ts | Extends compaction config and rewrites evaluateLeafTrigger() to include skip guards and skip-reason reporting. |
| src/engine.ts | Passes tokenBudget into leaf-trigger evaluation and logs skip reasons; wires new config fields into CompactionConfig. |
| test/engine.test.ts | Updates spy assertion for the new evaluateLeafTrigger(..., tokenBudget) signature. |


Comment thread src/compaction.ts Outdated
Comment thread src/db/config.ts
…er cache skip

Adversarial review found that the cache-aware skip could permanently
suppress leaf compaction in large contexts (e.g., 700K of 750K ceiling)
because the 5% relative threshold scales with total assembled tokens.

Fix: evaluate budget headroom FIRST. When over the headroom ceiling
(budget pressure), bypass the cache-aware skip entirely — compaction
fires regardless of cache impact. The cache-aware skip only applies
when there is genuine headroom (no budget pressure).

Also clamp leafBudgetHeadroomFactor to max 1.0 to prevent
misconfiguration from silently disabling compaction.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@100yenadmin
Contributor Author

Adversarial Verification Report (5 agents)

Ran 5 parallel adversarial review agents. Found and fixed one CRITICAL issue.

CRITICAL — Compaction starvation at scale (FIXED in d56eab1)

The cache-aware skip used a 5% relative threshold that scaled linearly with totalAssembledTokens. In large contexts (e.g., Opus 1M at 700K assembled), the bar for "worth compacting" grew so high that leaf compaction was permanently suppressed — even at 93% of budget capacity.

Root cause: The cache skip short-circuited before the budget headroom check could override it.

Fix: Evaluate budget headroom FIRST. When over the headroom ceiling (budget pressure), bypass the cache-aware skip entirely. Also clamp leafBudgetHeadroomFactor to max 1.0 to prevent misconfiguration.

Scenario verification after fix:

| Scenario | Assembled | Budget | Result |
| --- | --- | --- | --- |
| A: Original issue (2K leafChunkTokens) | 150K | 200K | COMPACT (budget pressure) |
| B: Small context, genuine pressure | 90K | 128K | COMPACT |
| C: Opus 1M at 93% capacity | 700K | 1M | COMPACT (budget wins) ✅ Fixed! |
| D: Orchestrator with headroom | 40K | 200K | SKIP (headroom) |
| E: Rapid tool use at ceiling | 150K | 200K | COMPACT |

Other findings (non-blocking):

  • Test coverage: INSUFFICIENT — All existing tests mock evaluateLeafTrigger, so zero coverage of the actual skip logic. Recommend adding unit tests in a follow-up.
  • Config UX — Setting to 0 correctly disables each skip (escape hatch works). Neither setting appears in /lossless status diagnostic output.
  • Economics — Default 5%/0.8 is well-calibrated for Sonnet 200K. For Opus 1M sessions, recommend leafSkipReductionThreshold=0.03, leafBudgetHeadroomFactor=0.65. For Haiku quick tasks: 0.10, 0.90.

Addresses Copilot review round 2:

1. estimatedReduction was using rawTokensOutsideTail (all raw tokens)
   but a leaf pass only compacts one chunk capped at leafChunkTokens.
   Now uses Math.min(rawTokensOutsideTail, threshold) so the estimate
   reflects actual per-pass reduction.

2. Added leafSkipReductionThreshold and leafBudgetHeadroomFactor to
   openclaw.plugin.json configSchema (which has additionalProperties:
   false) so users can set them via plugin config, not just env vars.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
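The per-pass capping fix in point 1 is small enough to show directly. A hedged sketch (the function name is illustrative; the PR applies this inside evaluateLeafTrigger()):

```typescript
// After the fix: a single leaf pass compacts one chunk capped at
// leafChunkTokens, so the reduction estimate is min(raw, threshold)
// rather than all raw tokens outside the fresh tail.
function perPassReductionEstimate(
  rawTokensOutsideTail: number,
  leafChunkTokens: number
): number {
  return Math.min(rawTokensOutsideTail, leafChunkTokens);
}
```

Without the cap, a backlog of 50K raw tokens would make a 2K-chunk pass look like a 50K reduction and wrongly clear the 5% bar.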
Adds 16 new unit tests covering all paths through the cache-aware
and budget-headroom skip logic, plus 5 config resolution tests:

Skip logic tests (lcm-integration.test.ts):
- Basic threshold: below/above leafChunkTokens
- Budget headroom: skip when under ceiling, compact when over
- Budget headroom: bypassed when tokenBudget undefined
- Cache-aware: skip when reduction tiny relative to total context
- Cache-aware: compact when reduction large enough
- Budget pressure overrides cache skip (anti-starvation)
- Edge cases: empty conversation, negative reduction
- Config escape hatches: threshold=0 and factor=0 disable skips
- Factor clamped to 1.0 (misconfiguration protection)
- Orchestrator vs sub-agent: different budgets, different decisions
- Per-pass chunk size estimate uses min(raw, threshold)

Config tests (config.test.ts):
- Default values: 0.05 and 0.8
- Plugin config override
- Env var override
- Schema entries in manifest

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@100yenadmin 100yenadmin changed the title fix: cache-aware skip + budget headroom guard for leaf compaction trigger feat: cache-aware compaction guards with budget-pressure priority and per-tier tuning Apr 6, 2026
@100yenadmin 100yenadmin requested a review from Copilot April 6, 2026 13:04

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 5 comments.



Comment thread test/lcm-integration.test.ts Outdated
Comment thread src/compaction.ts Outdated
Comment thread src/db/config.ts Outdated
Comment thread test/lcm-integration.test.ts Outdated
Comment thread test/lcm-integration.test.ts Outdated
…st fixes

Addresses 5 Copilot review comments:

1. leafBudgetHeadroomFactor=0 now correctly disables the headroom
   check (headroomEnabled=false) instead of creating false budget
   pressure that bypassed the cache-aware skip.

2. Config values clamped to [0,1] in resolveLcmConfig via clamp01().

3. Removed wasteful "x".repeat(summaryTokens*4) in test — mock store
   uses tokenCount directly, not content length.

4. Fixed leafChunkTokens=0 test — resolveLeafChunkTokens() normalizes
   non-positive to default. Use default threshold instead.

5. Updated factor=0 test comment to match corrected semantics.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
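Point 2's clamp01() might look like the following. This is a hypothetical shape; the real signature in src/db/config.ts may differ (e.g., in how it handles unparseable env-var values).

```typescript
// Hypothetical clamp01 helper: config values are clamped to [0, 1],
// falling back to the default when the input is not a finite number.
function clamp01(value: number, fallback: number): number {
  if (!Number.isFinite(value)) return fallback; // reject NaN/Infinity from env vars
  return Math.min(1, Math.max(0, value));
}
```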

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.



Comment thread src/compaction.ts Outdated
…nfo logging

Three hardening improvements from adversarial review:

1. Token accuracy: evaluateLeafTrigger now accepts optional
   liveContextTokens param and uses max(stored, live) for headroom
   decisions. Stored token counts can lag after rapid ingestion;
   the live estimate from afterTurn provides a more accurate floor.

2. Structured telemetry: LeafTriggerResult now includes a `context`
   field with all decision inputs (totalAssembledTokens, budgetCeiling,
   budgetPressure, estimatedReduction, reductionThreshold, headroomFactor).
   Enables machine-parseable diagnostics and config tuning.

3. Observability: Skip and fire decisions logged at info level (not
   debug). Compaction fires include assembled/pressure context.
   Volume is at most 1 log per turn — negligible.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
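Point 1's max(stored, live) rule is a one-liner in spirit. A sketch with assumed names:

```typescript
// Sketch of the max(stored, live) floor: stored token counts can lag after
// rapid ingestion, so the live estimate from afterTurn acts as a floor for
// headroom decisions. liveContextTokens is optional and may be absent.
function headroomTokenCount(
  storedTokens: number,
  liveContextTokens?: number
): number {
  return liveContextTokens === undefined
    ? storedTokens
    : Math.max(storedTokens, liveContextTokens);
}
```

Taking the max only ever makes the headroom check more conservative: a stale low stored count can no longer hide genuine budget pressure.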

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.



Comment thread src/engine.ts
Eva and others added 2 commits April 6, 2026 20:52
The comment described when the cache-aware skip is applied but
did not precisely reflect the budgetPressure gate semantics after
the headroom refactor. Updated to accurately describe: budget
pressure is only true when headroom is enabled AND ceiling is
breached; otherwise cache-aware skip can fire.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Skip decisions fire every turn in high-traffic sessions — too noisy
for info level. Compaction triggers are infrequent (~every 7-10 turns)
and worth info level as meaningful state changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.



Comment thread test/lcm-integration.test.ts Outdated
Eva and others added 2 commits April 6, 2026 21:13
Adds documentation for the cache-aware compaction feature across all
doc layers:

New: docs/compaction-tuning.md — standalone deep-dive covering:
  - TLDR quick-setup with copy-paste configs per model tier
  - Compaction model selection guide (why fast models matter)
  - Full lifecycle diagrams (Mermaid)
  - Cache-aware decision flowchart
  - Economics tables (cache miss penalty, break-even formula)
  - Gateway stall timing per model
  - Debugging guide for common issues

Updated: docs/architecture.md
  - Cache-aware skip guards section with Mermaid diagram
  - Budget pressure priority explanation
  - Prompt cache impact description

Updated: docs/configuration.md
  - leafSkipReductionThreshold and leafBudgetHeadroomFactor reference
  - Compaction model selection table
  - Per-tier preset summary with link to tuning guide

Updated: skills/lossless-claw/references/config.md
  - Added both new config fields to skill reference

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated 5 comments.



Comment thread docs/architecture.md Outdated
Comment thread docs/configuration.md Outdated
Comment thread docs/compaction-tuning.md Outdated
Comment thread src/compaction.ts Outdated
Comment thread skills/lossless-claw/references/config.md Outdated
@100yenadmin
Contributor Author

Merge order: 3rd — Merge after #288 and #294. Largest PR — cache-aware compaction + docs.

See #297 for the full sprint tracking issue with all 5 PRs.

Recommended merge sequence: #288 → #294 → #289 → #295 → #296

The code uses totalAssembledTokens < budgetCeiling for headroom
(strict less-than), so budget pressure fires at >= budgetCeiling.
Docs said 'exceed' which implies strict greater-than. Fixed to
'reach or exceed' across all 5 files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
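The boundary semantics fixed here reduce to one comparison. A minimal illustration (names assumed):

```typescript
// Headroom uses strict less-than, so budget pressure fires exactly when
// the assembled context reaches or exceeds the ceiling.
const budgetPressure = (assembled: number, ceiling: number): boolean =>
  assembled >= ceiling;
```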

Copilot AI left a comment


Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated no new comments.



Thread observed token counts through leaf and threshold compaction workers so stale persisted counts do not suppress needed compaction after the outer trigger has already detected budget pressure. Add regression coverage for both the engine plumbing and the compaction engine stale-count path, correct the Sonnet 4.6 tuning guide to reflect its 1M context tier, and add the missing patch changeset.

Regeneration-Prompt: |
  Address review findings on PR 289 in the PR worktree without changing unrelated behavior. The bug is that afterTurn/evaluation can use a live current token estimate, but the actual compactLeaf and compactFullSweep worker paths re-check leaf trigger conditions using only stored DB token counts, which can lag ingestion and incorrectly skip compaction under real budget pressure. Thread the observed token count through those worker calls and add tests that prove stale stored counts no longer suppress leaf or threshold sweeps. Also fix the compaction tuning docs so Sonnet 4.6 is described consistently with the documented 1M context window, and add a patch changeset because the PR changes user-facing runtime behavior and docs.
@100yenadmin
Contributor Author

Given the size here, splitting it up into multiple PRs @jalehman

@100yenadmin
Contributor Author

Splitting this PR for easier review

This PR is +1047/-18 across 13 files — significantly larger than the others that merged quickly. To make review tractable, we're proposing to split it into 3 focused PRs:

PR A: Cache-aware leaf compaction guards (~400 lines, 21 tests)

The core feature. Rewrites evaluateLeafTrigger() with three skip guards that prevent unnecessary compaction when it would waste more in prompt-cache invalidation than it saves in tokens:

  1. Raw threshold gate — skip when assembled tokens haven't crossed the threshold (existing behavior)
  2. Budget headroom gate — skip when assembled < 80% of budget ceiling (leafBudgetHeadroomFactor)
  3. Reduction threshold gate — skip when estimated reduction < 5% of total tokens (leafSkipReductionThreshold)
  4. Budget pressure override — force compaction when context is >95% full regardless of other gates

Files: src/compaction.ts, src/db/config.ts, openclaw.plugin.json, test/lcm-integration.test.ts (16 tests), test/config.test.ts (5 tests)

PR B: Live token awareness (~60 lines, 5 tests)

Depends on PR A. Passes currentTokenCount (live observed counts) through to evaluateLeafTrigger() and compaction methods so headroom decisions use fresh data instead of stale stored counts.

Files: src/compaction.ts (4 signatures), src/engine.ts (2 call sites), test/lcm-integration.test.ts (2 tests), test/engine.test.ts (3 updates)

PR C: Documentation & tuning guide (~430 lines, 0 code)

Independent — can land anytime. New docs/compaction-tuning.md (356 lines) with model-tier presets, cache economics analysis, break-even formulas, and debugging checklist. Updates to architecture.md, configuration.md, skill references.


Note: The branch is 8 commits behind main (main now has #288, #294, #295, #296, #298, #302). Rebase has extensive conflicts. We'll create fresh branches from current main and reapply the changes for each split PR.

Should we proceed with this split, or would you prefer a different grouping? Happy to adjust the boundaries.

100yenadmin pushed a commit to electricsheephq/lossless-claw-test that referenced this pull request Apr 7, 2026
On models with prompt caching (Claude, GPT-4), compaction that removes
3% of tokens costs more in cache-miss penalties than it saves. The
current trigger fires whenever assembledTokens > threshold × budget
regardless of how much compaction would actually remove.

Add three guard checks to evaluateLeafTrigger():

1. Budget headroom gate — skip when assembled < 80% of budget ceiling
   (leafBudgetHeadroomFactor, default 0.8, set 0 to disable)
2. Cache-aware reduction gate — skip when estimated reduction < 5% of
   total assembled tokens (leafSkipReductionThreshold, default 0.05)
3. Budget pressure override — force compaction when context reaches or
   exceeds the ceiling, preventing starvation in large contexts

Also passes currentTokenCount through compactLeaf/compactFullSweep so
headroom decisions use live observed counts when stored counts are stale.

Split from Martian-Engineering#289 for reviewability.
100yenadmin pushed a commit to electricsheephq/lossless-claw-test that referenced this pull request Apr 7, 2026
Pass tokenBudget and liveContextTokens from the engine's afterTurn
and compact paths into evaluateLeafTrigger and compactLeaf/compactFullSweep
so cache-aware headroom decisions use fresh observed counts instead of
potentially stale stored values.

- evaluateLeafTrigger now receives tokenBudget + liveContextTokens
  from engine call sites
- compactLeaf/compactFullSweep receive currentTokenCount (observedTokens)
- afterTurn logs trigger context (assembled, pressure) on compaction
- afterTurn logs skip reason when guards prevent compaction
- CompactionConfig passes leafSkipReductionThreshold and
  leafBudgetHeadroomFactor from LcmConfig

Split from Martian-Engineering#289 (Part 2 of 3). Depends on Martian-Engineering#306.
100yenadmin pushed a commit to electricsheephq/lossless-claw-test that referenced this pull request Apr 7, 2026
New comprehensive guide for operators tuning LCM compaction behavior:

- docs/compaction-tuning.md (356 lines): TLDR, per-tier model presets
  (Opus, Sonnet, Haiku, GPT-4o-mini, Gemini Flash), cache economics
  break-even formula, debugging checklist, orchestration scenarios
- docs/architecture.md: cache-aware guards section with Mermaid flowchart
- docs/configuration.md: new settings reference, model comparison table
- skills references: config field updates

Split from Martian-Engineering#289 (Part 3 of 3). Independent of Martian-Engineering#306 and Martian-Engineering#307.
@100yenadmin
Contributor Author

Split complete. This PR is now covered by three focused PRs rebased on current main.

Closing this PR in favor of the split. All 1047 lines of additions are preserved across the three PRs.

@100yenadmin 100yenadmin closed this Apr 7, 2026
100yenadmin pushed a commit to electricsheephq/lossless-claw-test that referenced this pull request Apr 7, 2026
Restores two load-bearing inline comments from the original PR Martian-Engineering#289
that were lost during the split:

- 3-line headroomEnabled rationale: explains why the guard uses three
  conditions and that factor=0 disables without creating false pressure
- 8-line budget-pressure explanation: documents when pressure is true,
  when the cache-aware skip can fire, and the starvation prevention
  guarantee


Development

Successfully merging this pull request may close these issues.

Leaf compaction trigger destabilizes Anthropic prompt cache on high-traffic conversations

3 participants