feat: wire live context token counts through engine to compaction guards (#307)

100yenadmin wants to merge 10 commits into Martian-Engineering:main
Conversation
On models with prompt caching (Claude, GPT-4), compaction that removes 3% of tokens costs more in cache-miss penalties than it saves. The current trigger fires whenever assembledTokens > threshold × budget, regardless of how much compaction would actually remove.

This PR adds three guard checks to `evaluateLeafTrigger()`:

1. Budget headroom gate — skip when assembled < 80% of the budget ceiling (`leafBudgetHeadroomFactor`, default 0.8; set 0 to disable)
2. Cache-aware reduction gate — skip when the estimated reduction is < 5% of total assembled tokens (`leafSkipReductionThreshold`, default 0.05)
3. Budget pressure override — force compaction when the context reaches or exceeds the ceiling, preventing starvation in large contexts

Also passes `currentTokenCount` through `compactLeaf`/`compactFullSweep` so headroom decisions use live observed counts when stored counts are stale.

Split from Martian-Engineering#289 for reviewability.
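In TypeScript, the three guards can be sketched roughly like this. The types, function name, and return shape are illustrative only; the real `evaluateLeafTrigger` in `src/compaction.ts` has a different signature and returns richer diagnostics.

```typescript
// Illustrative input for the guard decision; not the real trigger input type.
interface GuardInput {
  assembledTokens: number;            // live/stored context size
  estimatedReduction: number;         // tokens compaction would remove
  tokenBudget: number;                // context ceiling
  leafBudgetHeadroomFactor: number;   // default 0.8; 0 disables the gate
  leafSkipReductionThreshold: number; // default 0.05
}

type GuardDecision =
  | { compact: true; reason: "budget-pressure" | "guards-passed" }
  | { compact: false; reason: "budget-headroom" | "reduction-too-small" };

function evaluateGuards(input: GuardInput): GuardDecision {
  const { assembledTokens, estimatedReduction, tokenBudget } = input;

  // Guard 3: budget pressure override — at/over the ceiling, always compact.
  if (assembledTokens >= tokenBudget) {
    return { compact: true, reason: "budget-pressure" };
  }

  // Guard 1: budget headroom gate — skip while comfortably under the ceiling.
  const headroomEnabled = input.leafBudgetHeadroomFactor > 0;
  if (
    headroomEnabled &&
    assembledTokens < input.leafBudgetHeadroomFactor * tokenBudget
  ) {
    return { compact: false, reason: "budget-headroom" };
  }

  // Guard 2: cache-aware reduction gate — skip when the win is too small
  // to outweigh cache-miss penalties.
  if (estimatedReduction < input.leafSkipReductionThreshold * assembledTokens) {
    return { compact: false, reason: "reduction-too-small" };
  }

  return { compact: true, reason: "guards-passed" };
}
```

The pressure check runs first so the override beats the headroom and reduction gates, which is what prevents starvation in large contexts.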
New comprehensive guide for operators tuning LCM compaction behavior:

- docs/compaction-tuning.md (356 lines): TLDR, per-tier model presets (Opus, Sonnet, Haiku, GPT-4o-mini, Gemini Flash), cache economics break-even formula, debugging checklist, orchestration scenarios
- docs/architecture.md: cache-aware guards section with Mermaid flowchart
- docs/configuration.md: new settings reference, model comparison table
- Skills references: config field updates

Split from Martian-Engineering#289 (Part 3 of 3). Independent of Martian-Engineering#306 and Martian-Engineering#307.
Pull request overview
Wires live/observed context token counts from the engine layer into compaction trigger guards so headroom/skip decisions use up-to-date context sizing rather than potentially stale stored counts.
Changes:
- Pass `tokenBudget` + live token estimates into `evaluateLeafTrigger` from `afterTurn`, and pass `currentTokenCount` into leaf/full-sweep compaction paths.
- Extend leaf-trigger evaluation to return structured skip diagnostics and log trigger/skip context from the engine.
- Update/add tests to assert the new parameter plumbing and skip-guard behavior; add config defaults/schema coverage for the new guard knobs.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/engine.ts | Plumbs live token counts into trigger + compaction calls; adds trigger/skip logging; passes guard config into compaction config. |
| src/compaction.ts | Extends evaluateLeafTrigger with headroom/cache-aware guards using live vs stored counts; threads currentTokenCount into leaf/sweep trigger evaluation. |
| src/db/config.ts | Adds leafSkipReductionThreshold and leafBudgetHeadroomFactor to resolved config (env/plugin/default). |
| openclaw.plugin.json | Exposes the two new config options via schema + UI hints. |
| test/engine.test.ts | Updates expectations for the new evaluateLeafTrigger signature and asserts currentTokenCount plumbing (incl. async worker). |
| test/config.test.ts | Adds tests for defaults, plugin config, env overrides, and manifest schema for the new config fields. |
| test/lcm-integration.test.ts | Adds integration coverage for “stale stored vs live tokens” and an additional skip-guard-focused suite. |
| .changeset/cache-aware-compaction-guards.md | Adds a changeset entry for the feature. |
Restores two load-bearing inline comments from the original PR Martian-Engineering#289 that were lost during the split:

- 3-line headroomEnabled rationale: explains why the guard uses three conditions and that factor=0 disables the gate without creating false pressure
- 8-line budget-pressure explanation: documents when pressure is true, when the cache-aware skip can fire, and the starvation prevention guarantee
Force-pushed f511eea to 3a48d12
- Fix changeset file to use standard frontmatter delimiters
- Normalize liveContextTokens with a Number.isFinite/Math.floor guard to prevent NaN/Infinity from corrupting headroom calculations (mirrors the pattern used in evaluate())
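The normalization pattern described in this commit can be sketched as follows. The function name is a hypothetical stand-in; the PR's actual helper is `normalizeObservedTokenCount` and its exact behavior may differ.

```typescript
// Reject anything that is not a finite, non-negative number, then floor it,
// so NaN/Infinity/fractional values never reach headroom calculations.
function normalizeLiveTokens(value: unknown): number | undefined {
  if (typeof value !== "number" || !Number.isFinite(value) || value < 0) {
    return undefined; // caller falls back to the stored count
  }
  return Math.floor(value); // token counts are whole numbers
}
```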
Force-pushed 3a48d12 to 60b4a31
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Pass tokenBudget and liveContextTokens from the engine's afterTurn and compact paths into evaluateLeafTrigger and compactLeaf/compactFullSweep, so cache-aware headroom decisions use fresh observed counts instead of potentially stale stored values.

- evaluateLeafTrigger now receives tokenBudget + liveContextTokens from engine call sites
- compactLeaf/compactFullSweep receive currentTokenCount (observedTokens)
- afterTurn logs trigger context (assembled, pressure) on compaction
- afterTurn logs the skip reason when guards prevent compaction
- CompactionConfig passes leafSkipReductionThreshold and leafBudgetHeadroomFactor from LcmConfig

Split from Martian-Engineering#289 (Part 2 of 3). Depends on Martian-Engineering#306.
Adds a negative test ensuring compactLeafAsync does not pass currentTokenCount to compaction.compactLeaf when the caller omits it, preventing undefined from leaking into headroom math.
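A framework-free sketch of that negative test follows. The real test in test/engine.test.ts uses the project's test runner and mocks; the wrapper below is a hypothetical stand-in for the compactLeafAsync plumbing.

```typescript
type CompactLeafInput = { sessionId: string; currentTokenCount?: number };

// Recording stub standing in for compaction.compactLeaf.
const calls: CompactLeafInput[] = [];
const compactLeaf = (input: CompactLeafInput): void => {
  calls.push(input);
};

// Spread the field only when the caller provides it, so the key (and an
// `undefined` value) never leaks into downstream headroom math.
const compactLeafAsync = (sessionId: string, currentTokenCount?: number): void =>
  compactLeaf({
    sessionId,
    ...(currentTokenCount !== undefined ? { currentTokenCount } : {}),
  });

compactLeafAsync("session-1");
console.assert(
  !("currentTokenCount" in calls[0]),
  "key must be absent, not present-with-undefined"
);

compactLeafAsync("session-2", 500);
console.assert(calls[1].currentTokenCount === 500);
```

The key point is asserting the property is absent with the `in` operator, since `calls[0].currentTokenCount === undefined` would also pass for a key that is present but undefined.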
…mport

- compact() wrapper now includes currentTokenCount in its input type so TS excess-property checks pass and live counts flow through to compactFullSweep
- engine.ts evaluateLeafTrigger uses the imported LeafTriggerResult type instead of duplicating the shape inline, preventing type drift
- Document LCM_LEAF_SKIP_REDUCTION_THRESHOLD, LCM_LEAF_BUDGET_HEADROOM_FACTOR, and LCM_FALLBACK_PROVIDERS in the README environment variable table
- Replace 12KB string literals in stale-token tests with short strings, since tokenCountFn overrides the count anyway
Force-pushed 60b4a31 to 8ea347f
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
Users have no visibility into whether LCM compaction is saving or wasting money. This adds persistent event tracking, cost estimation, and efficiency reporting.

Changes:
- New compaction_events table (SQLite migration) records each compaction pass with token counts and model name
- Static pricing table (pricing.ts) for cost estimation with fuzzy model prefix matching (11 models covered)
- /lossless status gains an efficiency section showing passes, tokens saved, compaction cost, net efficiency, and recommendations
- New /lossless efficiency subcommand with per-model breakdown and actionable recommendations (e.g., "Switch from Opus to Haiku")
- persistCompactionEvent() now inserts a DB row alongside the console log
- Best-effort recording — doesn't fail compaction if the table is missing

Closes Martian-Engineering#309. Depends on Martian-Engineering#306 and Martian-Engineering#307.
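Fuzzy model prefix matching for pricing lookups can be sketched like this. The model names, per-token prices, and function names here are placeholders, not the contents of the actual pricing.ts table.

```typescript
interface ModelPricing {
  inputPerMTok: number;  // USD per million input tokens (placeholder values)
  outputPerMTok: number; // USD per million output tokens (placeholder values)
}

const PRICING: Record<string, ModelPricing> = {
  "claude-3-5-haiku": { inputPerMTok: 0.8, outputPerMTok: 4 },
  "claude-3-5-sonnet": { inputPerMTok: 3, outputPerMTok: 15 },
  "gpt-4o": { inputPerMTok: 2.5, outputPerMTok: 10 },
  "gpt-4o-mini": { inputPerMTok: 0.15, outputPerMTok: 0.6 },
};

// Longest matching prefix wins, so a dated snapshot like
// "gpt-4o-mini-2024-07-18" resolves to "gpt-4o-mini", not "gpt-4o".
function lookupPricing(model: string): ModelPricing | undefined {
  const matches = Object.keys(PRICING)
    .filter((prefix) => model.startsWith(prefix))
    .sort((a, b) => b.length - a.length);
  return matches.length > 0 ? PRICING[matches[0]] : undefined;
}
```

Returning `undefined` for unknown models fits the PR's best-effort stance: a missing price skips cost estimation rather than failing the pass.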
Skip the DB read for storedTokens when leafBudgetHeadroomFactor=0 AND leafSkipReductionThreshold=0, since neither guard will run. Also add boundary-value tests for clamp01 with out-of-range inputs.
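A clamp01 helper consistent with those boundary-value tests might look like the sketch below; the actual implementation in the codebase may differ, and the NaN fallback here is an assumption.

```typescript
// Clamp a config factor into [0, 1]; non-finite inputs (NaN/Infinity
// from bad env vars) fall back to 0, which disables the guard.
function clamp01(value: number): number {
  if (!Number.isFinite(value)) return 0;
  return Math.min(1, Math.max(0, value));
}
```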
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.
This branch looks functionally superseded now. The runtime behavior it was pushing toward landed through #318 and #329, and the branch itself now conflicts with current `main`. I'm treating this as replaceable history rather than something worth reviving directly. The live follow-up work from the cost/compaction sweep will build on current `main`.
Summary
Wires live observed token counts from the engine layer into the compaction guards added in #306, so headroom and cache-aware skip decisions use fresh data instead of potentially stale stored counts. Also propagates the new config fields (`leafSkipReductionThreshold`, `leafBudgetHeadroomFactor`) from `LcmConfig` into `CompactionConfig`.

Part 2 of 3 from the #289 split. Depends on #306. Merge order: #306 → #307 → #308.
The Problem: Stale Token Counts Cause Wrong Guard Decisions
The compaction guards in #306 decide whether to skip or compact based on `totalAssembledTokens` — a value derived from the stored token count in the database. But stored counts can lag behind reality:

Stale-low scenario (most dangerous)
After rapid message ingestion (e.g., a tool that emits 15 messages in 1 second), the DB count hasn't caught up: the stored count shows 30K tokens while the live context is actually 75K, against a 60K headroom ceiling.
Without live counts, the headroom guard sees 30K < 60K → SKIP. But the context is actually 75K — well over the ceiling. The guard should detect budget pressure and COMPACT.
The fix: max(stored, live)

By passing `liveContextTokens` (from `estimateSessionTokenCountForAfterTurn`) through to `evaluateLeafTrigger`, the guard uses whichever count is higher.

This is the safe/conservative choice — it errs on the side of compacting when counts disagree, which is the correct bias for preventing context overflow.
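The max(stored, live) reconciliation reduces to a few lines. The function and parameter names below are illustrative, not the exact evaluateLeafTrigger internals.

```typescript
// Use the larger of the stored and live counts; when no valid live
// observation exists, fall back to the stored count alone.
function effectiveAssembledTokens(
  storedTokens: number,
  liveContextTokens?: number
): number {
  if (liveContextTokens === undefined || !Number.isFinite(liveContextTokens)) {
    return storedTokens;
  }
  // Err toward compacting when the counts disagree.
  return Math.max(storedTokens, liveContextTokens);
}
```

In the stale-low scenario above, this yields max(30K, 75K) = 75K, so the budget-pressure override fires instead of the headroom skip.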
What This PR Wires
Engine → CompactionEngine config
`LcmConfig` now flows `leafSkipReductionThreshold` and `leafBudgetHeadroomFactor` into the `CompactionConfig` object, so plugin/env var overrides actually take effect at runtime (without this, the guards always use their internal defaults).

afterTurn → evaluateLeafTrigger
The engine's `afterTurn()` already computes `liveContextTokens` — this PR passes it (along with `tokenBudget`) through to `evaluateLeafTrigger()` so the guards can make informed decisions.

afterTurn logging
New structured logging for compaction decisions:
    [lcm] afterTurn: leaf compaction triggered (raw=24000, threshold=20000, assembled=548000, pressure=true)
    [lcm] afterTurn: leaf compaction skipped (budget-headroom: 30000 assembled < 120000 ceiling)

These match the log lines documented in the tuning guide (#308).
compactLeaf/compactFullSweep → currentTokenCount
Both paths now pass `observedTokens` (from `normalizeObservedTokenCount`) as `currentTokenCount`, which `evaluateLeafTrigger` uses as `precomputedTokenCount` to avoid a duplicate DB read.

compact() type fix
The `compact()` wrapper now includes `currentTokenCount` in its input type so TypeScript excess-property checks pass and the value flows through to `compactFullSweep`.

LeafTriggerResult import
The engine's `evaluateLeafTrigger` now imports `LeafTriggerResult` from `compaction.ts` instead of re-declaring the shape inline, preventing type drift.

README env var table
Added `LCM_LEAF_SKIP_REDUCTION_THRESHOLD`, `LCM_LEAF_BUDGET_HEADROOM_FACTOR`, and `LCM_FALLBACK_PROVIDERS` to the README environment variable reference table.

Changes by File
- src/engine.ts: Pass `tokenBudget` + `liveContextTokens` to `evaluateLeafTrigger` from afterTurn. Pass `currentTokenCount` to `compactLeaf`/`compact`. Import `LeafTriggerResult`. Add trigger/skip logging.
- src/compaction.ts: Add `currentTokenCount` to the `compact()` input type.
- test/engine.test.ts: Update the `evaluateLeafTrigger` assertion for the 4-arg signature. Add `currentTokenCount` to compact plumbing assertions. New test: `compactLeafAsync` passes `currentTokenCount`. New test: omission when not provided.
- test/lcm-integration.test.ts
- README.md

Test Plan
- `evaluateLeafTrigger` called with `(sessionId, sessionKey, tokenBudget, liveContextTokens)`
- `currentTokenCount: 500` flows through compact plumbing to `compactFullSweep`
- `currentTokenCount` omitted from the `compactLeaf` call when not provided (no undefined leakage)