
feat: wire live context token counts through engine to compaction guards#307

Closed
100yenadmin wants to merge 10 commits into Martian-Engineering:main from electricsheephq:feat/live-token-compaction-awareness

Conversation

100yenadmin (Contributor) commented Apr 7, 2026

Summary

Wires live observed token counts from the engine layer into the compaction guards added in #306, so headroom and cache-aware skip decisions use fresh data instead of potentially stale stored counts. Also propagates the new config fields (leafSkipReductionThreshold, leafBudgetHeadroomFactor) from LcmConfig into CompactionConfig.

Part 2 of 3 from #289 split. Depends on #306. Merge order: #306 → #307 → #308.


The Problem: Stale Token Counts Cause Wrong Guard Decisions

The compaction guards in #306 decide whether to skip or compact based on totalAssembledTokens — a value derived from the stored token count in the database. But stored counts can lag behind reality:

Stale-low scenario (most dangerous)

After rapid message ingestion (e.g., a tool that emits 15 messages in 1 second), the DB count hasn't caught up:

  • Stored count: 30K (stale from 2 seconds ago)
  • Live count: 75K (actual prompt tokens the API will see)
  • Budget ceiling: 60K

Without live counts, the headroom guard sees 30K < 60K → SKIP. But the context is actually 75K — well over the ceiling. The guard should detect budget pressure and COMPACT.

The fix: max(stored, live)

By passing liveContextTokens (from estimateSessionTokenCountForAfterTurn) through to evaluateLeafTrigger, the guard uses whichever count is higher:

  • Stale-low stored → uses live (prevents missed compaction)
  • Stale-high stored → uses stored (conservative, prevents premature skip)

This is the safe/conservative choice — it errs on the side of compacting when counts disagree, which is the correct bias for preventing context overflow.
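As a minimal sketch of that rule (the helper name is hypothetical; in the PR the comparison happens inside evaluateLeafTrigger):

```typescript
// Hypothetical helper illustrating the max(stored, live) bias.
// When no finite live estimate is available, fall back to the stored count.
function effectiveTokenCount(stored: number, live?: number): number {
  if (live === undefined || !Number.isFinite(live)) return stored;
  return Math.max(stored, live);
}

// Stale-low: stored lags reality, so the live count wins and the guard
// sees the real budget pressure.
effectiveTokenCount(30_000, 75_000); // 75000
// Stale-high: stored overstates, so stored wins — still errs toward
// compacting rather than skipping prematurely.
effectiveTokenCount(90_000, 75_000); // 90000
```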


What This PR Wires

Engine → CompactionEngine config

LcmConfig now flows leafSkipReductionThreshold and leafBudgetHeadroomFactor into the CompactionConfig object, so plugin/env var overrides actually take effect at runtime (without this, the guards always use their internal defaults).
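Illustratively, the propagation amounts to copying the two fields when the engine builds its compaction config (the field names are from the PR; the surrounding shapes are a sketch, not the real interfaces):

```typescript
// Sketch: only the two new guard knobs are shown; both real config
// types carry many more fields.
interface LcmConfig {
  leafSkipReductionThreshold: number;
  leafBudgetHeadroomFactor: number;
}

interface CompactionConfig {
  leafSkipReductionThreshold: number;
  leafBudgetHeadroomFactor: number;
}

// Without this pass-through, the guards fall back to their internal
// defaults and env/plugin overrides are silently ignored.
function toCompactionConfig(lcm: LcmConfig): CompactionConfig {
  return {
    leafSkipReductionThreshold: lcm.leafSkipReductionThreshold,
    leafBudgetHeadroomFactor: lcm.leafBudgetHeadroomFactor,
  };
}
```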

afterTurn → evaluateLeafTrigger

The engine's afterTurn() already computes liveContextTokens — this PR passes it (along with tokenBudget) through to evaluateLeafTrigger() so the guards can make informed decisions.

afterTurn logging

New structured logging for compaction decisions:

  • Trigger: [lcm] afterTurn: leaf compaction triggered (raw=24000, threshold=20000, assembled=548000, pressure=true)
  • Skip: [lcm] afterTurn: leaf compaction skipped (budget-headroom: 30000 assembled < 120000 ceiling)

These match the log lines documented in the tuning guide (#308).

compactLeaf/compactFullSweep → currentTokenCount

Both paths now pass observedTokens (from normalizeObservedTokenCount) as currentTokenCount, which evaluateLeafTrigger uses as precomputedTokenCount to avoid a duplicate DB read.

compact() type fix

compact() wrapper now includes currentTokenCount in its input type so TypeScript excess-property checks pass and the value flows through to compactFullSweep.

LeafTriggerResult import

Engine's evaluateLeafTrigger now imports LeafTriggerResult from compaction.ts instead of re-declaring the shape inline, preventing type drift.

README env var table

Added LCM_LEAF_SKIP_REDUCTION_THRESHOLD, LCM_LEAF_BUDGET_HEADROOM_FACTOR, and LCM_FALLBACK_PROVIDERS to the README environment variable reference table.


Changes by File

| File | Lines | Change |
| --- | --- | --- |
| src/engine.ts | +26/-3 | Pass config fields to CompactionConfig. Pass tokenBudget + liveContextTokens to evaluateLeafTrigger from afterTurn. Pass currentTokenCount to compactLeaf/compact. Import LeafTriggerResult. Add trigger/skip logging. |
| src/compaction.ts | +1 | Add currentTokenCount to compact() input type. |
| test/engine.test.ts | +48/-1 | Update evaluateLeafTrigger assertion for 4-arg signature. Add currentTokenCount to compact plumbing assertions. New test: compactLeafAsync passes currentTokenCount. New test: omission when not provided. |
| test/lcm-integration.test.ts | +2 | Shrink test message bodies (was 12KB strings, now short descriptive text). |
| README.md | +3 | Add 3 env vars to reference table. |

Test Plan

  • 204 tests passing (122 engine + 82 integration)
  • evaluateLeafTrigger called with (sessionId, sessionKey, tokenBudget, liveContextTokens)
  • currentTokenCount: 500 flows through compact plumbing to compactFullSweep
  • currentTokenCount omitted from compactLeaf call when not provided (no undefined leakage)
  • Stale-token integration tests: compactLeaf and compactFullSweep trigger with live counts
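The omission case boils down to conditionally spreading the optional field into the call payload (the builder name here is hypothetical; the real tests assert against the mocked compaction module):

```typescript
// Sketch: include currentTokenCount in the payload only when the caller
// actually provided it, so `undefined` never reaches headroom arithmetic.
function buildCompactLeafInput(
  sessionId: string,
  currentTokenCount?: number,
): { sessionId: string; currentTokenCount?: number } {
  return {
    sessionId,
    ...(currentTokenCount !== undefined ? { currentTokenCount } : {}),
  };
}
```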

On models with prompt caching (Claude, GPT-4), compaction that removes
3% of tokens costs more in cache-miss penalties than it saves. The
current trigger fires whenever assembledTokens > threshold × budget
regardless of how much compaction would actually remove.

Add three guard checks to evaluateLeafTrigger():

1. Budget headroom gate — skip when assembled < 80% of budget ceiling
   (leafBudgetHeadroomFactor, default 0.8, set 0 to disable)
2. Cache-aware reduction gate — skip when estimated reduction < 5% of
   total assembled tokens (leafSkipReductionThreshold, default 0.05)
3. Budget pressure override — force compaction when context reaches or
   exceeds the ceiling, preventing starvation in large contexts

Also passes currentTokenCount through compactLeaf/compactFullSweep so
headroom decisions use live observed counts when stored counts are stale.

Split from Martian-Engineering#289 for reviewability.
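The three guard checks compose roughly as follows (a sketch under assumed names — `decide` and `GuardConfig` are illustrative, not the real evaluateLeafTrigger signature; thresholds and ordering follow the description above):

```typescript
interface GuardConfig {
  leafBudgetHeadroomFactor: number;   // default 0.8; 0 disables the headroom gate
  leafSkipReductionThreshold: number; // default 0.05
}

type GuardDecision = "compact" | "skip-headroom" | "skip-low-reduction";

function decide(
  assembledTokens: number,
  estimatedReduction: number,
  budget: number,
  cfg: GuardConfig,
): GuardDecision {
  // 3. Budget pressure override: at or over the ceiling, always compact,
  //    so large contexts can never be starved by the skip gates below.
  if (assembledTokens >= budget) return "compact";
  // 1. Headroom gate: plenty of room left under the ceiling -> skip.
  //    factor = 0 disables the gate without creating false pressure.
  const headroomEnabled = cfg.leafBudgetHeadroomFactor > 0;
  if (headroomEnabled && assembledTokens < cfg.leafBudgetHeadroomFactor * budget) {
    return "skip-headroom";
  }
  // 2. Cache-aware gate: estimated reduction too small to outweigh
  //    cache-miss penalties -> skip.
  if (estimatedReduction < cfg.leafSkipReductionThreshold * assembledTokens) {
    return "skip-low-reduction";
  }
  return "compact";
}
```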
Copilot AI review requested due to automatic review settings April 7, 2026 06:19
100yenadmin pushed a commit to electricsheephq/lossless-claw-test that referenced this pull request Apr 7, 2026
New comprehensive guide for operators tuning LCM compaction behavior:

- docs/compaction-tuning.md (356 lines): TLDR, per-tier model presets
  (Opus, Sonnet, Haiku, GPT-4o-mini, Gemini Flash), cache economics
  break-even formula, debugging checklist, orchestration scenarios
- docs/architecture.md: cache-aware guards section with Mermaid flowchart
- docs/configuration.md: new settings reference, model comparison table
- skills references: config field updates

Split from Martian-Engineering#289 (Part 3 of 3). Independent of Martian-Engineering#306 and Martian-Engineering#307.

Copilot AI left a comment


Pull request overview

Wires live/observed context token counts from the engine layer into compaction trigger guards so headroom/skip decisions use up-to-date context sizing rather than potentially stale stored counts.

Changes:

  • Pass tokenBudget + live token estimates into evaluateLeafTrigger from afterTurn, and pass currentTokenCount into leaf/full-sweep compaction paths.
  • Extend leaf-trigger evaluation to return structured skip diagnostics and log trigger/skip context from the engine.
  • Update/add tests to assert the new parameter plumbing and skip-guard behavior; add config defaults/schema coverage for the new guard knobs.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/engine.ts | Plumbs live token counts into trigger + compaction calls; adds trigger/skip logging; passes guard config into compaction config. |
| src/compaction.ts | Extends evaluateLeafTrigger with headroom/cache-aware guards using live vs stored counts; threads currentTokenCount into leaf/sweep trigger evaluation. |
| src/db/config.ts | Adds leafSkipReductionThreshold and leafBudgetHeadroomFactor to resolved config (env/plugin/default). |
| openclaw.plugin.json | Exposes the two new config options via schema + UI hints. |
| test/engine.test.ts | Updates expectations for new evaluateLeafTrigger signature and asserts currentTokenCount plumbing (incl. async worker). |
| test/config.test.ts | Adds tests for defaults, plugin config, env overrides, and manifest schema for the new config fields. |
| test/lcm-integration.test.ts | Adds integration coverage for "stale stored vs live tokens" and an additional skip-guard-focused suite. |
| .changeset/cache-aware-compaction-guards.md | Adds a changeset entry for the feature. |


Comment thread src/engine.ts
Comment thread src/engine.ts Outdated
Comment thread .changeset/cache-aware-compaction-guards.md Outdated
Restores two load-bearing inline comments from the original PR Martian-Engineering#289
that were lost during the split:

- 3-line headroomEnabled rationale: explains why the guard uses three
  conditions and that factor=0 disables without creating false pressure
- 8-line budget-pressure explanation: documents when pressure is true,
  when the cache-aware skip can fire, and the starvation prevention
  guarantee
@100yenadmin 100yenadmin force-pushed the feat/live-token-compaction-awareness branch from f511eea to 3a48d12 Compare April 7, 2026 06:49
- Fix changeset file to use standard frontmatter delimiters
- Normalize liveContextTokens with Number.isFinite/Math.floor guard
  to prevent NaN/Infinity from corrupting headroom calculations
  (mirrors the pattern used in evaluate())
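That normalization might look like the following (a sketch mirroring the described Number.isFinite/Math.floor pattern; the function name is hypothetical):

```typescript
// Normalize an externally supplied token count: reject NaN/Infinity and
// negatives, and floor fractional estimates, so headroom math stays sane.
function normalizeLiveTokens(value: unknown): number | undefined {
  if (typeof value !== "number" || !Number.isFinite(value) || value < 0) {
    return undefined; // treat as "no live count available"
  }
  return Math.floor(value);
}
```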

Copilot AI left a comment


Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.



Comment thread src/compaction.ts
Comment thread src/db/config.ts
Comment thread test/lcm-integration.test.ts
Eva added 5 commits April 7, 2026 14:19
Pass tokenBudget and liveContextTokens from the engine's afterTurn
and compact paths into evaluateLeafTrigger and compactLeaf/compactFullSweep
so cache-aware headroom decisions use fresh observed counts instead of
potentially stale stored values.

- evaluateLeafTrigger now receives tokenBudget + liveContextTokens
  from engine call sites
- compactLeaf/compactFullSweep receive currentTokenCount (observedTokens)
- afterTurn logs trigger context (assembled, pressure) on compaction
- afterTurn logs skip reason when guards prevent compaction
- CompactionConfig passes leafSkipReductionThreshold and
  leafBudgetHeadroomFactor from LcmConfig

Split from Martian-Engineering#289 (Part 2 of 3). Depends on Martian-Engineering#306.
Adds negative test ensuring compactLeafAsync does not pass
currentTokenCount to compaction.compactLeaf when the caller
omits it, preventing undefined from leaking into headroom math.
…mport

- compact() wrapper now includes currentTokenCount in its input type
  so TS excess-property checks pass and live counts flow through to
  compactFullSweep
- engine.ts evaluateLeafTrigger uses imported LeafTriggerResult type
  instead of duplicating the shape inline, preventing type drift
- Document LCM_LEAF_SKIP_REDUCTION_THRESHOLD,
  LCM_LEAF_BUDGET_HEADROOM_FACTOR, and LCM_FALLBACK_PROVIDERS in
  the README environment variable table
- Replace 12KB string literals in stale-token tests with short strings
  since tokenCountFn overrides the count anyway
@100yenadmin 100yenadmin force-pushed the feat/live-token-compaction-awareness branch from 60b4a31 to 8ea347f Compare April 7, 2026 07:21
@100yenadmin 100yenadmin requested a review from Copilot April 7, 2026 07:30

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.



Comment thread README.md Outdated

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.



Comment thread src/compaction.ts
Comment thread src/db/config.ts
100yenadmin pushed a commit to electricsheephq/lossless-claw-test that referenced this pull request Apr 7, 2026
Users have no visibility into whether LCM compaction is saving or
wasting money. This adds persistent event tracking, cost estimation,
and efficiency reporting.

Changes:
- New compaction_events table (SQLite migration) records each
  compaction pass with token counts and model name
- Static pricing table (pricing.ts) for cost estimation with fuzzy
  model prefix matching (11 models covered)
- /lossless status gains an efficiency section showing passes, tokens
  saved, compaction cost, net efficiency, and recommendations
- New /lossless efficiency subcommand with per-model breakdown and
  actionable recommendations (e.g., "Switch from Opus to Haiku")
- persistCompactionEvent() now inserts DB row alongside console log
- Best-effort recording — doesn't fail compaction if table is missing

Closes Martian-Engineering#309. Depends on Martian-Engineering#306 and Martian-Engineering#307.
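The fuzzy model-prefix matching could be sketched as a longest-prefix lookup against a static table (table entries and rates here are hypothetical, for illustration only; the real pricing.ts covers 11 models):

```typescript
// Longest-prefix match, so e.g. "claude-3-opus-20240229" resolves via
// the "claude-3-opus" entry. Rates are illustrative placeholders.
const PRICE_PER_MTOK: Record<string, number> = {
  "claude-3-opus": 15.0,
  "claude-3-haiku": 0.25,
  "gpt-4o-mini": 0.15,
};

function lookupPrice(model: string): number | undefined {
  let best: string | undefined;
  for (const prefix of Object.keys(PRICE_PER_MTOK)) {
    if (model.startsWith(prefix) && (!best || prefix.length > best.length)) {
      best = prefix;
    }
  }
  return best !== undefined ? PRICE_PER_MTOK[best] : undefined;
}
```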
Skip the DB read for storedTokens when leafBudgetHeadroomFactor=0
AND leafSkipReductionThreshold=0, since neither guard will run.
Also add boundary-value tests for clamp01 with out-of-range inputs.
@100yenadmin 100yenadmin requested a review from Copilot April 7, 2026 09:32

Copilot AI left a comment


Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.



liu51115 pushed a commit to liu51115/lossless-claw that referenced this pull request Apr 7, 2026
100yenadmin (Contributor, Author) commented:

This branch looks functionally superseded now. The runtime behavior it was pushing toward landed through #318 and #329, and the branch itself is now conflicting with current main.

I'm treating this as replaceable history rather than something worth reviving directly. The live follow-up work from the cost/compaction sweep will build on current main, not on this branch.

100yenadmin (Contributor, Author) commented:

Closing as superseded by merged runtime work on current main, primarily #318 and #329. Any remaining cost/compaction follow-up will be rebased onto current main rather than carried on this conflicting branch.
