feat(agent): deterministic tool-result pruning + cache-TTL-aware cold-resume maintenance by esengine · Pull Request #3968 · esengine/DeepSeek-Reasonix

esengine · 2026-06-11T06:14:26Z

Why

Multiple users report runaway token consumption on 1.x (#3098 closed: ~$20/hour; one commenter measured 18M input tokens on a single task at 99% cache hit; #3615 / #3007 / #3329 circle the same pain). Root cause analysis:

With the 1M default context_window and the 0.8 trigger, a session effectively never folds — the prompt grows unbounded and every tool round resends all of it.
Stale tool results dominate that growth, yet they are fully re-derivable (files can be re-read, commands re-run). Today the only reducers are the 32 KB per-result cap and the full LLM-summarizing fold.
Because cache hits are 50–120× cheaper than misses, carrying dead weight per-turn is nearly free. The real damage concentrates in cache-miss events — every fold, and above all every cold resume, where a session reopened after the provider cache expired pays full price on the entire prompt.
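The cost asymmetry in the last point can be illustrated with a back-of-envelope model. This is only a sketch: the function name and the pricing model (a miss counts as one full-price token, a hit as 1/discount of one) are assumptions for illustration, applied to the 18M-token, 99%-hit figure and the 50–120× range quoted above.

```go
package main

import "fmt"

// missEquivalentTokens splits a prompt total into "full-price" token
// equivalents under a simplified, hypothetical model: cache misses count 1x,
// cache hits count 1/hitDiscount.
func missEquivalentTokens(inputTokens, hitRate, hitDiscount float64) (missPart, hitPart float64) {
	missPart = inputTokens * (1 - hitRate)
	hitPart = inputTokens * hitRate / hitDiscount
	return missPart, hitPart
}

func main() {
	// 18M input tokens at a 99% hit rate, at both ends of the 50-120x range.
	for _, discount := range []float64{50, 120} {
		miss, hit := missEquivalentTokens(18e6, 0.99, discount)
		fmt.Printf("discount %3.0fx: misses %.0f + hits %.0f full-price token equivalents\n",
			discount, miss, hit)
	}
}
```

Under this model the 1% of tokens that miss the cache account for roughly a third of the cost at 50× and more than half at 120×, which is consistent with the claim that the damage concentrates in cache-miss events.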

So this does not fold earlier (compression-eval already showed early folding costs more). It shrinks the prompt deterministically, and schedules rewrites into windows where the cache is already cold so they cost nothing.

What

Prune primitive (internal/agent/prune.go): replace the Content of tool results older than the protected recent tail with a one-line placeholder. Deterministic, zero LLM calls, originals archived first. No message is ever removed, so tool_calls/tool_call_id pairing cannot break and assistant messages (including signed reasoning) are never touched.

Two trigger points, both at moments where a cache reset is already being paid:

Before the fold (maybeCompact): prune first; skip the paid summarize call entirely when eliding alone clears the trigger.
Cold resume (Controller.Resume): when idle time (branch-meta UpdatedAt) exceeds cacheColdAfter, the cache is gone anyway — pruning is free in cache terms and directly shrinks the full-price first request. Pruned transcript is snapshotted atomically; a crash before the snapshot just replays the prune on next resume (idempotent).

Between maintenance points the history stays strictly append-only; warm sessions are never rewritten.

Real-API validation (deepseek-v4-flash)

Placeholder comprehension 5/5: an agent asked for a constant that only exists behind a pruned placeholder re-read the file via read_file and answered with the exact value in every trial — no hallucination (benchmarks/context-maintenance-e2e comprehension).
Prompt reduction: an 88.7k-token session pruned 15 stale results down to a 23.6k-token prompt (−73%).
Warm-prune penalty (why the TTL gate exists): pruning a still-cached session cost 23,598 miss tokens vs 5,900 unpruned — ~4×. The gate keeps this from ever happening: cacheColdAfter defaults to a risk-asymmetric 24h (too small burns a live cache; too large only forgoes a free prune).
Cache retention probe (benchmarks/cache-ttl-probe): unique ~18k-token prefixes re-probed after idle intervals; full hits through 30 min so far, longer intervals still collecting. Follow-up will tighten cacheColdAfter from the measured retention and add the cold-resume A/B numbers (two 88k sessions are already seeded and going cold).
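The headline figures above can be re-derived from the raw numbers. The sketch below only redoes that arithmetic; the helper names are made up for illustration.

```go
package main

import "fmt"

// reductionPct returns the percentage by which a prompt shrank.
func reductionPct(before, after float64) float64 {
	return (before - after) / before * 100
}

// penaltyRatio returns how many times more miss tokens the pruned warm
// session paid compared with the unpruned one.
func penaltyRatio(prunedMiss, unprunedMiss float64) float64 {
	return prunedMiss / unprunedMiss
}

func main() {
	// 88.7k-token session pruned to a 23.6k-token prompt.
	fmt.Printf("prompt reduction: %.1f%%\n", reductionPct(88.7, 23.6))
	// 23,598 miss tokens when pruned warm vs 5,900 unpruned.
	fmt.Printf("warm-prune penalty: %.2fx\n", penaltyRatio(23598, 5900))
}
```

The reduction comes out at about 73% and the penalty at about 4×, matching the reported values. The pruned warm session's miss count (23,598) is essentially its whole 23.6k-token prompt, which fits the explanation that rewriting history invalidates the cached prefix.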

Tests

Unit: tail protection, pairing intact, idempotency, no-op without a window, small results kept, prune-skips-fold, force-ratio still folds, cold-resume prunes-and-persists, warm-resume untouched.
Closed-loop e2e (compact_loop_e2e_test.go): a 20-turn tool-heavy session now stays bounded by pruning alone — zero paid folds, stuck-guard never trips; a new assistant-text-heavy variant keeps the fold path under regression.
go test ./... green except TestModelSwitchRefreshesCustomStatusline, which fails identically on a pristine origin/main-v2 checkout in this environment (upstream, unrelated).

Stale tool results are re-derivable (files re-read, commands re-run), so eliding them is a free, lossless alternative to the paid summarize fold. Prune runs only where a cache reset is already being paid: at the compact trigger, where it skips the fold entirely when eliding alone clears the threshold, and on resume after the provider prefix cache has expired (cacheColdAfter), where rewriting history costs no extra misses and directly shrinks the full-price first request. No message is ever removed, so tool_call/result pairing and signed reasoning stay intact by construction; originals are archived like fold drops.

…r comprehension Three real-API scenarios for the prune work: seed two identical-shape fat sessions, A/B the cold-restart miss tokens with and without pruning after the provider cache has expired, and verify the model re-reads a file behind a prune placeholder instead of answering from nothing.

…default Pruning a still-cached session costs ~4x the miss tokens of leaving it alone (measured warm-cache A/B), while a threshold that is too large only forgoes a free prune. Default to 24h until the cache-ttl probe pins the real retention; never set below it.

…rrcheck)

…o/path-injection) The os.Stat mtime fallback fed a user-influenced path straight into a filesystem call. Branch meta is guaranteed for every session the controller has snapshotted, so the fallback only ever covered never-saved imports — those now skip one prune until their first snapshot creates the meta.

reasonix added 3 commits June 10, 2026 22:43

esengine requested a review from SivanCola as a code owner June 11, 2026 06:14

github-actions bot added labels Jun 11, 2026: v2 (Go rewrite (1.x), main-v2 branch, active development) and agent (Core agent loop: internal/agent, internal/control)

github-advanced-security AI found potential problems Jun 11, 2026

Comment thread internal/control/controller.go Fixed

reasonix added 2 commits June 10, 2026 23:18

bench(e2e): check write/unmarshal errors in the maintenance driver (e…

f7ffb16

…rrcheck)

esengine merged commit 3025f5a into main-v2 Jun 11, 2026
13 checks passed

esengine deleted the feat/context-maintenance-prune branch June 11, 2026 06:38

esengine merged 5 commits into main-v2 from feat/context-maintenance-prune
