
feat(agent): deterministic tool-result pruning + cache-TTL-aware cold-resume maintenance #3968

Merged
esengine merged 5 commits into main-v2 from feat/context-maintenance-prune
Jun 11, 2026

@esengine

Owner

Why

Multiple users report runaway token consumption on 1.x (#3098 closed: ~$20/hour; one commenter measured 18M input tokens on a single task at 99% cache hit; #3615 / #3007 / #3329 circle the same pain). Root cause analysis:

  • With the 1M default context_window and the 0.8 trigger, a session effectively never folds — the prompt grows unbounded and every tool round resends all of it.
  • Stale tool results dominate that growth, yet they are fully re-derivable (files can be re-read, commands re-run). Today the only reducers are the 32 KB per-result cap and the full LLM-summarizing fold.
  • Because cache hits are 50–120× cheaper than misses, carrying dead weight per-turn is nearly free. The real damage concentrates in cache-miss events — every fold, and above all every cold resume, where a session reopened after the provider cache expired pays full price on the entire prompt.

So this does not fold earlier (compression-eval already showed early folding costs more). It shrinks the prompt deterministically, and schedules rewrites into windows where the cache is already cold so they cost nothing.
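
The cost asymmetry behind this reasoning can be sketched with a small model. This is an illustration only: the function names and the numbers in the usage note are hypothetical, and costs are expressed in "miss-token equivalents" (one token billed at the cache-miss price).

```go
package main

// carryCost returns the cost, in miss-token equivalents, of resending `dead`
// stale tokens on each of `turns` warm (cache-hit) turns, when a cache hit is
// hitDiscount times cheaper than a miss. Hypothetical helper for illustration.
func carryCost(dead, turns int, hitDiscount float64) float64 {
	return float64(dead) * float64(turns) / hitDiscount
}

// missCost returns the cost of one cache-miss event, which pays full price
// on every token in the prompt.
func missCost(promptTokens int) float64 {
	return float64(promptTokens)
}
```

With 60,000 stale tokens and a 50× discount (illustrative values, using the low end of the 50–120× range above), `carryCost(60000, 10, 50)` is 12,000 miss-token equivalents for ten warm turns, while a single cold resume pays `missCost(60000)` = 60,000 for the same dead weight. That is why the rewrite is scheduled at the miss event rather than per turn.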

What

Prune primitive (internal/agent/prune.go): replace the Content of tool results older than the protected recent tail with a one-line placeholder. Deterministic, zero LLM calls, originals archived first. No message is ever removed, so tool_calls/tool_call_id pairing cannot break and assistant messages (including signed reasoning) are never touched.

Two trigger points, both at moments where a cache reset is already being paid:

  1. Before the fold (maybeCompact): prune first; skip the paid summarize call entirely when eliding alone clears the trigger.
  2. Cold resume (Controller.Resume): when idle time (measured from the branch-meta UpdatedAt) exceeds cacheColdAfter, the provider cache has already expired, so pruning costs no extra misses and directly shrinks the full-price first request. The pruned transcript is snapshotted atomically; a crash before the snapshot just replays the prune on the next resume (idempotent).
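
The first trigger point (prune before the fold) can be sketched as follows. This is a hedged illustration, not the actual maybeCompact: the signature, the `compactPlan` type, and the callback parameters are hypothetical, chosen so the decision logic is visible in isolation.

```go
package main

// compactPlan reports what one maintenance pass did (hypothetical type).
type compactPlan struct {
	Pruned bool
	Folded bool
}

// maybeCompact sketches the prune-first decision. tokens estimates the
// current prompt size, prune elides stale tool results in place and returns
// how many it elided, and fold is the paid LLM summarize call.
func maybeCompact(window int, trigger float64, tokens func() int, prune func() int, fold func()) compactPlan {
	if window <= 0 {
		return compactPlan{} // no window configured: no-op
	}
	limit := int(float64(window) * trigger)
	if tokens() < limit {
		return compactPlan{} // below the trigger: history stays append-only
	}
	var plan compactPlan
	plan.Pruned = prune() > 0
	if tokens() < limit {
		return plan // eliding alone cleared the trigger: skip the paid fold
	}
	fold()
	plan.Folded = true
	return plan
}
```

The prompt size is re-measured after pruning, so the summarize call is only paid for when deterministic eliding was not enough.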

Between maintenance points the history stays strictly append-only; warm sessions are never rewritten.

Real-API validation (deepseek-v4-flash)

  • Placeholder comprehension 5/5: an agent asked for a constant that only exists behind a pruned placeholder re-read the file via read_file and answered with the exact value in every trial — no hallucination (benchmarks/context-maintenance-e2e comprehension).
  • Prompt reduction: an 88.7k-token session pruned 15 stale results down to a 23.6k-token prompt (−73%).
  • Warm-prune penalty (why the TTL gate exists): pruning a still-cached session cost 23,598 miss tokens vs 5,900 unpruned — ~4×. The gate keeps this from ever happening: cacheColdAfter defaults to a risk-asymmetric 24h (too small burns a live cache; too large only forgoes a free prune).
  • Cache retention probe (benchmarks/cache-ttl-probe): unique ~18k-token prefixes re-probed after idle intervals; full hits through 30 min so far, longer intervals still collecting. Follow-up will tighten cacheColdAfter from the measured retention and add the cold-resume A/B numbers (two 88k sessions are already seeded and going cold).
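
The two headline figures above follow directly from the reported token counts. A small check, with hypothetical helper names:

```go
package main

// reductionPercent returns how much smaller `after` is than `before`,
// as a percentage of `before`.
func reductionPercent(before, after float64) float64 {
	return (1 - after/before) * 100
}

// penaltyFactor returns how many times more miss tokens the pruned warm
// session paid compared with the unpruned one.
func penaltyFactor(prunedMiss, unprunedMiss float64) float64 {
	return prunedMiss / unprunedMiss
}
```

`reductionPercent(88.7, 23.6)` is about 73.4, matching the reported −73%, and `penaltyFactor(23598, 5900)` is just under 4.0, matching the reported ~4×.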

Tests

  • Unit: tail protection, pairing intact, idempotency, no-op without a window, small results kept, prune-skips-fold, force-ratio still folds, cold-resume prunes-and-persists, warm-resume untouched.
  • Closed-loop e2e (compact_loop_e2e_test.go): a 20-turn tool-heavy session now stays bounded by pruning alone — zero paid folds, stuck-guard never trips; a new assistant-text-heavy variant keeps the fold path under regression.
  • go test ./... green except TestModelSwitchRefreshesCustomStatusline, which fails identically on a pristine origin/main-v2 checkout in this environment (upstream, unrelated).

reasonix added 3 commits June 10, 2026 22:43
Stale tool results are re-derivable (files re-read, commands re-run), so
eliding them is a free, lossless alternative to the paid summarize fold.
Prune runs only where a cache reset is already being paid: at the compact
trigger, where it skips the fold entirely when eliding alone clears the
threshold, and on resume after the provider prefix cache has expired
(cacheColdAfter), where rewriting history costs no extra misses and
directly shrinks the full-price first request. No message is ever removed,
so tool_call/result pairing and signed reasoning stay intact by
construction; originals are archived like fold drops.
…r comprehension

Three real-API scenarios for the prune work: seed two identical-shape fat
sessions, A/B the cold-restart miss tokens with and without pruning after
the provider cache has expired, and verify the model re-reads a file behind
a prune placeholder instead of answering from nothing.
…default

Pruning a still-cached session costs ~4x the miss tokens of leaving it
alone (measured warm-cache A/B), while a threshold that is too large only
forgoes a free prune. Default to 24h until the cache-ttl probe pins the
real retention; never set below it.
@esengine esengine requested a review from SivanCola as a code owner June 11, 2026 06:14
@github-actions (bot) added the labels v2 (Go rewrite (1.x), main-v2 branch, active development) and agent (Core agent loop: internal/agent, internal/control) Jun 11, 2026
Comment thread internal/control/controller.go Fixed
reasonix added 2 commits June 10, 2026 23:18
…o/path-injection)

The os.Stat mtime fallback fed a user-influenced path straight into a
filesystem call. Branch meta is guaranteed for every session the controller
has snapshotted, so the fallback only ever covered never-saved imports —
those now skip one prune until their first snapshot creates the meta.
@esengine esengine merged commit 3025f5a into main-v2 Jun 11, 2026
13 checks passed
@esengine esengine deleted the feat/context-maintenance-prune branch June 11, 2026 06:38