
fix: move compaction from afterTurn to pre-assembly with cache-TTL check #385

Closed
liu51115 wants to merge 2 commits into Martian-Engineering:main from liu51115:feat/upstream-cache-ttl-367

Conversation

Contributor

@liu51115 liu51115 commented Apr 11, 2026

Summary

Moves all compaction decisions from afterTurn() to assemble() pre-assembly, using a persisted lastApiCallAt timestamp to determine whether the prompt cache has expired.

Closes #367. Supersedes #362, #363. Related: #102.

Problem

afterTurn() evaluates compaction using the cache status of the call that just completed. This is a fundamental timing inversion: after any API call, the provider has just written (or refreshed) the prompt cache — the cache is always hot in afterTurn, regardless of what the cache status reading says. Compacting at this point always destroys a live cache.

The three removed branches all suffer from this:

  • hot-cache-budget-headroom: defers when hot and under budget — but "hot" at afterTurn time is meaningless since the cache is always hot after a call
  • hot-cache-defer: defers when pressure is low and cache is hot — same issue
  • cold-cache-catchup: triggers aggressive multi-pass on cold readings — on any provider, a cold reading at afterTurn means the provider just wrote the cache, so it's now hot

On providers that load-balance across backend instances with separate prompt caches (e.g. OpenRouter), a secondary problem compounds this: random cold readings from routing misses create a cascade of aggressive compaction → prefix rewrite → guaranteed cold → repeat. But even on direct Anthropic, the core inversion applies — afterTurn is simply the wrong time to evaluate cache state.

Fix

afterTurn() becomes telemetry-only: records lastApiCallAt = Date.now(), nothing else.

assemble() checks at pre-assembly time:

  1. Budget overflow — must compact to fit, fires unconditionally
  2. Cache-TTL — if cacheAwareCompaction.enabled and now - lastApiCallAt > cacheTTLSeconds and there's material to compact → compact before assembling

This is the only window where compaction is both safe and beneficial: the cache has genuinely expired, no live cache to destroy, and compacting reduces input cost on the upcoming cold call.
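The two pre-assembly triggers can be sketched as a single decision function. This is a minimal illustration of the logic described above, not the actual `assemble()` code; the function name, parameter shapes, and `CompactionState`/`CompactionConfig` types are all assumptions for the sketch.

```typescript
// Illustrative sketch of the pre-assembly compaction decision.
// Names and shapes are hypothetical, not the real engine.ts API.
interface CompactionState {
  lastApiCallAt: number | null; // epoch ms, persisted by afterTurn()
}

interface CompactionConfig {
  enabled: boolean;        // cacheAwareCompaction.enabled
  cacheTTLSeconds: number; // default 300
}

function shouldCompactPreAssembly(
  state: CompactionState,
  config: CompactionConfig,
  estimatedTokens: number,
  tokenBudget: number,
  hasCompactableMaterial: boolean,
  now: number = Date.now(),
): "budget-overflow" | "cache-ttl" | null {
  // 1. Budget overflow: must compact to fit, fires unconditionally.
  if (estimatedTokens > tokenBudget) return "budget-overflow";

  // 2. Cache-TTL: only compact once the provider's prompt cache has
  //    plausibly expired, so there is no live cache to destroy.
  if (
    config.enabled &&
    state.lastApiCallAt !== null &&
    now - state.lastApiCallAt > config.cacheTTLSeconds * 1000 &&
    hasCompactableMaterial
  ) {
    return "cache-ttl";
  }
  return null;
}
```

Note the ordering: the budget check runs first because it is a hard constraint, while the TTL check is purely an optimization that only makes sense when the upcoming call would be cold anyway.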

This approach supersedes the routing-noise guard in #362 (sticky cold counter is no longer needed when afterTurn doesn't make compaction decisions) and the deferred threshold compaction in #363 (all compaction is now off the reply path by design). It also addresses the same use case as the idle compaction trigger in #102 but with a simpler timestamp-based mechanism integrated into the assembly path.

Changes

  • src/engine.ts: remove two try/catch compaction blocks from afterTurn(), add pre-assembly block to assemble(), delete dead methods (evaluateIncrementalCompaction, logIncrementalCompactionDecision, shouldApplyHotCacheHysteresis, resolveCacheAwareState, isComfortablyUnderTokenBudget)
  • src/db/config.ts: add cacheTTLSeconds to CacheAwareCompactionConfig (default 300, env LCM_CACHE_TTL_SECONDS)
  • src/db/migration.ts: add last_api_call_at INTEGER column
  • src/store/compaction-telemetry-store.ts: add lastApiCallAt field throughout
  • openclaw.plugin.json: add cacheTTLSeconds to schema
  • test/engine.test.ts: remove 11 tests for deleted afterTurn compaction behavior
  • test/config.test.ts: update assertions

Testing

All tests pass. Soak-tested in production: zero compactions during active use (turns 30–200s apart), compaction fires correctly after genuine cache expiry (idle > 300s), per-conversation TTL tracking confirmed working across multiple concurrent sessions.

@liu51115 liu51115 force-pushed the feat/upstream-cache-ttl-367 branch 3 times, most recently from f5565e9 to 56eecc9 Compare April 11, 2026 04:42
@liu51115 liu51115 changed the title feat: move compaction from afterTurn to pre-assembly with cache-TTL check fix: move compaction from afterTurn to pre-assembly with cache-TTL check Apr 11, 2026
@liu51115 liu51115 force-pushed the feat/upstream-cache-ttl-367 branch from 56eecc9 to 7fb97bd Compare April 12, 2026 07:40
@100yenadmin
Contributor

There is genuine value here, but I think it is solving a different axis than #407 rather than replacing it.

I checked our live installed mitigation and the post-patch logs before commenting. The tactical local patch that stops leaf compaction and threshold sweep from both firing on the same turn is still active in our installed extension. In /tmp/openclaw/openclaw-2026-04-12.log, main-session afterTurn before that patch (11 samples before 2026-04-12 02:18:07 +07) averaged about 11.8s with an 83.0s max; after the patch (58 samples after that timestamp), it dropped to about 456ms average, 42.5ms median, and 10.6s max. So the hot-path split absolutely helped.

But it did not close the stall problem by itself. The same post-patch window still shows repeated lane wait exceeded events on agent:main:main and repeated gateway timeout after 90000ms subagent-announce retries. That is why I still think the deferred/background direction in #408 + the host companion PR is the right fit for #407: it addresses coalescing, visibility, and keeping proactive work off the foreground lane entirely.

What I do think is genuinely valuable in this PR:

  • the cache-TTL / pre-assembly idea is a real cache-safety argument and worth considering on its own merits
  • pinned-file injection is useful, but it feels orthogonal to the compaction-lane problem

So my read is: useful ideas, but not a drop-in replacement for #407/#408. If this lands, I’d strongly prefer it as a narrower follow-up or split-out PR instead of a competing end-to-end compaction strategy.

…heck (closes Martian-Engineering#367)

Move all proactive compaction out of afterTurn() into pre-assembly in assemble().
afterTurn becomes telemetry-only (records cache state + lastApiCallAt).

Pre-assembly triggers:
1. Budget overflow — must compact to fit
2. Cache-TTL — compact if idle > cacheTTLSeconds (default 300s) AND leaf trigger fires

Removes: evaluateIncrementalCompaction, logIncrementalCompactionDecision,
shouldApplyHotCacheHysteresis, resolveCacheAwareState, isComfortablyUnderTokenBudget,
IncrementalCompactionDecision type, HOT_CACHE_HYSTERESIS_TURNS constant.

Tests: 476/476 pass.
@liu51115 liu51115 force-pushed the feat/upstream-cache-ttl-367 branch from 7fb97bd to debbbf8 Compare April 12, 2026 08:31
@liu51115
Contributor Author

You're right about the pinned-file creep — that was a mistake on our end: we accidentally force-pushed a branch that bundled pinnedFiles (which has its own PR at #214) into this one. Just fixed it; the PR now contains only the cache-TTL pre-assembly change.

Agreed this is a different axis from #407/#408. The deferred debt + coalescing + visibility work in #408 addresses foreground blocking and starvation. This PR addresses cache-safety timing — making sure compaction doesn't destroy a hot provider cache by firing at the wrong moment.

They're complementary, not competing. Happy to keep this scoped as a narrower follow-up.

Reads agent's params.cacheRetention from OC config and maps to TTL:
- "long" → 3600s (1 hour)
- "short" → 300s (5 min)
- unset → falls back to cacheTTLSeconds config (default 300s)

Prevents compacting during a still-valid long cache for agents
configured with cacheRetention: "long".
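The retention-to-TTL mapping in that commit message is a small lookup. A minimal sketch, assuming a string-literal retention type and a hypothetical function name (the real code reads `params.cacheRetention` from OC config):

```typescript
// Sketch of the cacheRetention → TTL mapping described above.
// Type and function names are illustrative, not the actual API.
type CacheRetention = "long" | "short" | undefined;

function ttlForRetention(
  retention: CacheRetention,
  fallbackTTLSeconds: number = 300, // cacheTTLSeconds config default
): number {
  switch (retention) {
    case "long":
      return 3600; // 1 hour provider cache
    case "short":
      return 300; // 5 minute provider cache
    default:
      return fallbackTTLSeconds; // retention unset in agent config
  }
}
```

This keeps the TTL check honest for agents on long-retention caches: with a 3600s TTL, pre-assembly compaction will not fire while a one-hour cache is still plausibly valid.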
@liu51115
Contributor Author

Closing in favor of #408 which now covers cache-TTL safety + deferred compaction + maintenance visibility. The per-agent cache TTL approach there (via runtime telemetry retention) supersedes our config-based approach.



Development

Successfully merging this pull request may close these issues.

cache-aware compaction: timing inversion — decisions based on last-call status instead of cache expiry
