feat: wire live context token counts through engine to compaction guards (#307)

100yenadmin wants to merge 10 commits into Martian-Engineering:main
Conversation
On models with prompt caching (Claude, GPT-4), compaction that removes 3% of tokens costs more in cache-miss penalties than it saves. The current trigger fires whenever assembledTokens > threshold × budget, regardless of how much compaction would actually remove.

This PR adds three guard checks to `evaluateLeafTrigger()`:

1. Budget headroom gate — skip when assembled < 80% of the budget ceiling (`leafBudgetHeadroomFactor`, default 0.8; set 0 to disable)
2. Cache-aware reduction gate — skip when the estimated reduction is < 5% of total assembled tokens (`leafSkipReductionThreshold`, default 0.05)
3. Budget pressure override — force compaction when the context reaches or exceeds the ceiling, preventing starvation in large contexts

Also passes `currentTokenCount` through `compactLeaf`/`compactFullSweep` so headroom decisions use live observed counts when stored counts are stale.

Split from Martian-Engineering#289 for reviewability.
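In TypeScript, the three guards can be sketched roughly like this. The types, function name, and return shape are illustrative only; the real `evaluateLeafTrigger` in `src/compaction.ts` has a different signature and returns richer diagnostics.

```typescript
// Illustrative input for the guard decision; not the real trigger input type.
interface GuardInput {
  assembledTokens: number;            // live/stored context size
  estimatedReduction: number;         // tokens compaction would remove
  tokenBudget: number;                // context ceiling
  leafBudgetHeadroomFactor: number;   // default 0.8; 0 disables the gate
  leafSkipReductionThreshold: number; // default 0.05
}

type GuardDecision =
  | { compact: true; reason: "budget-pressure" | "guards-passed" }
  | { compact: false; reason: "budget-headroom" | "reduction-too-small" };

function evaluateGuards(input: GuardInput): GuardDecision {
  const { assembledTokens, estimatedReduction, tokenBudget } = input;

  // Guard 3: budget pressure override — at/over the ceiling, always compact.
  if (assembledTokens >= tokenBudget) {
    return { compact: true, reason: "budget-pressure" };
  }

  // Guard 1: budget headroom gate — skip while comfortably under the ceiling.
  const headroomEnabled = input.leafBudgetHeadroomFactor > 0;
  if (
    headroomEnabled &&
    assembledTokens < input.leafBudgetHeadroomFactor * tokenBudget
  ) {
    return { compact: false, reason: "budget-headroom" };
  }

  // Guard 2: cache-aware reduction gate — skip when the win is too small
  // to outweigh cache-miss penalties.
  if (estimatedReduction < input.leafSkipReductionThreshold * assembledTokens) {
    return { compact: false, reason: "reduction-too-small" };
  }

  return { compact: true, reason: "guards-passed" };
}
```

The pressure check runs first so the override beats the headroom and reduction gates, which is what prevents starvation in large contexts.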
New comprehensive guide for operators tuning LCM compaction behavior:

- docs/compaction-tuning.md (356 lines): TLDR, per-tier model presets (Opus, Sonnet, Haiku, GPT-4o-mini, Gemini Flash), cache economics break-even formula, debugging checklist, orchestration scenarios
- docs/architecture.md: cache-aware guards section with Mermaid flowchart
- docs/configuration.md: new settings reference, model comparison table
- Skills references: config field updates

Split from Martian-Engineering#289 (Part 3 of 3). Independent of Martian-Engineering#306 and Martian-Engineering#307.
Pull request overview
Wires live/observed context token counts from the engine layer into compaction trigger guards so headroom/skip decisions use up-to-date context sizing rather than potentially stale stored counts.
Changes:
- Pass `tokenBudget` + live token estimates into `evaluateLeafTrigger` from `afterTurn`, and pass `currentTokenCount` into leaf/full-sweep compaction paths.
- Extend leaf-trigger evaluation to return structured skip diagnostics and log trigger/skip context from the engine.
- Update/add tests to assert the new parameter plumbing and skip-guard behavior; add config defaults/schema coverage for the new guard knobs.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/engine.ts | Plumbs live token counts into trigger + compaction calls; adds trigger/skip logging; passes guard config into compaction config. |
| src/compaction.ts | Extends evaluateLeafTrigger with headroom/cache-aware guards using live vs stored counts; threads currentTokenCount into leaf/sweep trigger evaluation. |
| src/db/config.ts | Adds leafSkipReductionThreshold and leafBudgetHeadroomFactor to resolved config (env/plugin/default). |
| openclaw.plugin.json | Exposes the two new config options via schema + UI hints. |
| test/engine.test.ts | Updates expectations for the new evaluateLeafTrigger signature and asserts currentTokenCount plumbing (incl. async worker). |
| test/config.test.ts | Adds tests for defaults, plugin config, env overrides, and manifest schema for the new config fields. |
| test/lcm-integration.test.ts | Adds integration coverage for “stale stored vs live tokens” and an additional skip-guard-focused suite. |
| .changeset/cache-aware-compaction-guards.md | Adds a changeset entry for the feature. |
Restores two load-bearing inline comments from the original PR Martian-Engineering#289 that were lost during the split:

- 3-line headroomEnabled rationale: explains why the guard uses three conditions and that factor=0 disables the gate without creating false pressure
- 8-line budget-pressure explanation: documents when pressure is true, when the cache-aware skip can fire, and the starvation prevention guarantee
Force-pushed f511eea to 3a48d12
- Fix changeset file to use standard frontmatter delimiters
- Normalize liveContextTokens with a Number.isFinite/Math.floor guard to prevent NaN/Infinity from corrupting headroom calculations (mirrors the pattern used in evaluate())
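The normalization pattern described in this commit can be sketched as follows. The function name is a hypothetical stand-in; the PR's actual helper is `normalizeObservedTokenCount` and its exact behavior may differ.

```typescript
// Reject anything that is not a finite, non-negative number, then floor it,
// so NaN/Infinity/fractional values never reach headroom calculations.
function normalizeLiveTokens(value: unknown): number | undefined {
  if (typeof value !== "number" || !Number.isFinite(value) || value < 0) {
    return undefined; // caller falls back to the stored count
  }
  return Math.floor(value); // token counts are whole numbers
}
```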
Force-pushed 3a48d12 to 60b4a31
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Pass tokenBudget and liveContextTokens from the engine's afterTurn and compact paths into evaluateLeafTrigger and compactLeaf/compactFullSweep, so cache-aware headroom decisions use fresh observed counts instead of potentially stale stored values.

- evaluateLeafTrigger now receives tokenBudget + liveContextTokens from engine call sites
- compactLeaf/compactFullSweep receive currentTokenCount (observedTokens)
- afterTurn logs trigger context (assembled, pressure) on compaction
- afterTurn logs the skip reason when guards prevent compaction
- CompactionConfig passes leafSkipReductionThreshold and leafBudgetHeadroomFactor from LcmConfig

Split from Martian-Engineering#289 (Part 2 of 3). Depends on Martian-Engineering#306.
Adds a negative test ensuring compactLeafAsync does not pass currentTokenCount to compaction.compactLeaf when the caller omits it, preventing undefined from leaking into headroom math.
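A framework-free sketch of that negative test follows. The real test in test/engine.test.ts uses the project's test runner and mocks; the wrapper below is a hypothetical stand-in for the compactLeafAsync plumbing.

```typescript
type CompactLeafInput = { sessionId: string; currentTokenCount?: number };

// Recording stub standing in for compaction.compactLeaf.
const calls: CompactLeafInput[] = [];
const compactLeaf = (input: CompactLeafInput): void => {
  calls.push(input);
};

// Spread the field only when the caller provides it, so the key (and an
// `undefined` value) never leaks into downstream headroom math.
const compactLeafAsync = (sessionId: string, currentTokenCount?: number): void =>
  compactLeaf({
    sessionId,
    ...(currentTokenCount !== undefined ? { currentTokenCount } : {}),
  });

compactLeafAsync("session-1");
console.assert(
  !("currentTokenCount" in calls[0]),
  "key must be absent, not present-with-undefined"
);

compactLeafAsync("session-2", 500);
console.assert(calls[1].currentTokenCount === 500);
```

The key point is asserting the property is absent with the `in` operator, since `calls[0].currentTokenCount === undefined` would also pass for a key that is present but undefined.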
…mport

- compact() wrapper now includes currentTokenCount in its input type so TS excess-property checks pass and live counts flow through to compactFullSweep
- engine.ts evaluateLeafTrigger uses the imported LeafTriggerResult type instead of duplicating the shape inline, preventing type drift
- Document LCM_LEAF_SKIP_REDUCTION_THRESHOLD, LCM_LEAF_BUDGET_HEADROOM_FACTOR, and LCM_FALLBACK_PROVIDERS in the README environment variable table
- Replace 12KB string literals in stale-token tests with short strings, since tokenCountFn overrides the count anyway
Force-pushed 60b4a31 to 8ea347f
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
Users have no visibility into whether LCM compaction is saving or wasting money. This adds persistent event tracking, cost estimation, and efficiency reporting.

Changes:
- New compaction_events table (SQLite migration) records each compaction pass with token counts and model name
- Static pricing table (pricing.ts) for cost estimation with fuzzy model prefix matching (11 models covered)
- /lossless status gains an efficiency section showing passes, tokens saved, compaction cost, net efficiency, and recommendations
- New /lossless efficiency subcommand with per-model breakdown and actionable recommendations (e.g., "Switch from Opus to Haiku")
- persistCompactionEvent() now inserts a DB row alongside the console log
- Best-effort recording — doesn't fail compaction if the table is missing

Closes Martian-Engineering#309. Depends on Martian-Engineering#306 and Martian-Engineering#307.
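Fuzzy model prefix matching for pricing lookups can be sketched like this. The model names, per-token prices, and function names here are placeholders, not the contents of the actual pricing.ts table.

```typescript
interface ModelPricing {
  inputPerMTok: number;  // USD per million input tokens (placeholder values)
  outputPerMTok: number; // USD per million output tokens (placeholder values)
}

const PRICING: Record<string, ModelPricing> = {
  "claude-3-5-haiku": { inputPerMTok: 0.8, outputPerMTok: 4 },
  "claude-3-5-sonnet": { inputPerMTok: 3, outputPerMTok: 15 },
  "gpt-4o": { inputPerMTok: 2.5, outputPerMTok: 10 },
  "gpt-4o-mini": { inputPerMTok: 0.15, outputPerMTok: 0.6 },
};

// Longest matching prefix wins, so a dated snapshot like
// "gpt-4o-mini-2024-07-18" resolves to "gpt-4o-mini", not "gpt-4o".
function lookupPricing(model: string): ModelPricing | undefined {
  const matches = Object.keys(PRICING)
    .filter((prefix) => model.startsWith(prefix))
    .sort((a, b) => b.length - a.length);
  return matches.length > 0 ? PRICING[matches[0]] : undefined;
}
```

Returning `undefined` for unknown models fits the PR's best-effort stance: a missing price skips cost estimation rather than failing the pass.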
Skip the DB read for storedTokens when leafBudgetHeadroomFactor=0 AND leafSkipReductionThreshold=0, since neither guard will run. Also add boundary-value tests for clamp01 with out-of-range inputs.
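A clamp01 helper consistent with those boundary-value tests might look like the sketch below; the actual implementation in the codebase may differ, and the NaN fallback here is an assumption.

```typescript
// Clamp a config factor into [0, 1]; non-finite inputs (NaN/Infinity
// from bad env vars) fall back to 0, which disables the guard.
function clamp01(value: number): number {
  if (!Number.isFinite(value)) return 0;
  return Math.min(1, Math.max(0, value));
}
```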
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.
This branch looks functionally superseded now. The runtime behavior it was pushing toward landed through #318 and #329, and the branch itself now conflicts with current `main`. I'm treating this as replaceable history rather than something worth reviving directly. The live follow-up work from the cost/compaction sweep will build on current `main`.
Summary
Wires live observed token counts from the engine layer into the compaction guards added in #306, so headroom and cache-aware skip decisions use fresh data instead of potentially stale stored counts. Also propagates the new config fields (`leafSkipReductionThreshold`, `leafBudgetHeadroomFactor`) from `LcmConfig` into `CompactionConfig`.

Part 2 of 3 from the #289 split. Depends on #306. Merge order: #306 → #307 → #308.
The Problem: Stale Token Counts Cause Wrong Guard Decisions
The compaction guards in #306 decide whether to skip or compact based on `totalAssembledTokens` — a value derived from the stored token count in the database. But stored counts can lag behind reality:

Stale-low scenario (most dangerous)
After rapid message ingestion (e.g., a tool that emits 15 messages in 1 second), the DB count hasn't caught up: the stored count shows 30K tokens while the live context is actually 75K, against a 60K headroom ceiling.
Without live counts, the headroom guard sees 30K < 60K → SKIP. But the context is actually 75K — well over the ceiling. The guard should detect budget pressure and COMPACT.
The fix: max(stored, live)

By passing `liveContextTokens` (from `estimateSessionTokenCountForAfterTurn`) through to `evaluateLeafTrigger`, the guard uses whichever count is higher.

This is the safe/conservative choice — it errs on the side of compacting when counts disagree, which is the correct bias for preventing context overflow.
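The max(stored, live) reconciliation reduces to a few lines. The function and parameter names below are illustrative, not the exact evaluateLeafTrigger internals.

```typescript
// Use the larger of the stored and live counts; when no valid live
// observation exists, fall back to the stored count alone.
function effectiveAssembledTokens(
  storedTokens: number,
  liveContextTokens?: number
): number {
  if (liveContextTokens === undefined || !Number.isFinite(liveContextTokens)) {
    return storedTokens;
  }
  // Err toward compacting when the counts disagree.
  return Math.max(storedTokens, liveContextTokens);
}
```

In the stale-low scenario above, this yields max(30K, 75K) = 75K, so the budget-pressure override fires instead of the headroom skip.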
What This PR Wires
Engine → CompactionEngine config
`LcmConfig` now flows `leafSkipReductionThreshold` and `leafBudgetHeadroomFactor` into the `CompactionConfig` object, so plugin/env var overrides actually take effect at runtime (without this, the guards always use their internal defaults).

afterTurn → evaluateLeafTrigger
The engine's `afterTurn()` already computes `liveContextTokens` — this PR passes it (along with `tokenBudget`) through to `evaluateLeafTrigger()` so the guards can make informed decisions.

afterTurn logging
New structured logging for compaction decisions:
    [lcm] afterTurn: leaf compaction triggered (raw=24000, threshold=20000, assembled=548000, pressure=true)
    [lcm] afterTurn: leaf compaction skipped (budget-headroom: 30000 assembled < 120000 ceiling)

These match the log lines documented in the tuning guide (#308).
compactLeaf/compactFullSweep → currentTokenCount
Both paths now pass `observedTokens` (from `normalizeObservedTokenCount`) as `currentTokenCount`, which `evaluateLeafTrigger` uses as `precomputedTokenCount` to avoid a duplicate DB read.

compact() type fix
The `compact()` wrapper now includes `currentTokenCount` in its input type so TypeScript excess-property checks pass and the value flows through to `compactFullSweep`.

LeafTriggerResult import
The engine's `evaluateLeafTrigger` now imports `LeafTriggerResult` from `compaction.ts` instead of re-declaring the shape inline, preventing type drift.

README env var table
Added `LCM_LEAF_SKIP_REDUCTION_THRESHOLD`, `LCM_LEAF_BUDGET_HEADROOM_FACTOR`, and `LCM_FALLBACK_PROVIDERS` to the README environment variable reference table.

Changes by File
- src/engine.ts: Pass `tokenBudget` + `liveContextTokens` to `evaluateLeafTrigger` from afterTurn. Pass `currentTokenCount` to `compactLeaf`/`compact`. Import `LeafTriggerResult`. Add trigger/skip logging.
- src/compaction.ts: Add `currentTokenCount` to the `compact()` input type.
- test/engine.test.ts: Update the `evaluateLeafTrigger` assertion for the 4-arg signature. Add `currentTokenCount` to compact plumbing assertions. New test: `compactLeafAsync` passes `currentTokenCount`. New test: omission when not provided.
- test/lcm-integration.test.ts
- README.md

Test Plan
- `evaluateLeafTrigger` called with `(sessionId, sessionKey, tokenBudget, liveContextTokens)`
- `currentTokenCount: 500` flows through compact plumbing to `compactFullSweep`
- `currentTokenCount` omitted from the `compactLeaf` call when not provided (no undefined leakage)