fix(context): align fold summary prefix with main agent for cache reuse by esengine · Pull Request #1565 · esengine/DeepSeek-Reasonix

esengine · 2026-05-22T16:14:43Z

Summary

The fold summary call was being sent with a bespoke "You compress conversation history…" system prompt and no tools, which produced a 0% prefix-cache hit against the main agent's just-cached request. Every fold paid full input price on tens of thousands of tokens.

This PR reshapes the summary request so its prefix mirrors the live agent's last call:

Same system prompt (was: bespoke "compress conversation" string)
Same tool list (was: omitted)
Same head conversation bytes (was: skill bodies stubbed mid-head, breaking the cache from the first skill onward)
Only the trailing user "summarize" instruction is new — that's the sole cache-miss boundary

Skill-pin handling was split: collectPinnedSkills is read-only and leaves head bytes intact. The summarize instruction now names the pinned skills so the model doesn't paraphrase their bodies (we still append them verbatim regardless).
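
As a rough illustration of the request shape described above, the following is a minimal sketch, not the code in src/context-manager.ts. The types, the `buildFoldSummaryRequest` name, the `headLength` parameter, and the instruction wording are all hypothetical. It assumes the system prompt is the first message and that the head being folded is a leading slice of the agent's last request.

```typescript
// Hypothetical shapes for illustration; the project's real types may differ.
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

interface ToolSpec {
  name: string;
  description: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[]; // assumption: messages[0] carries the system prompt
  tools: ToolSpec[];
}

// Build a summary request whose prefix (system prompt, tools, head messages)
// is copied unchanged from the agent's last request. Only the final user
// message is new, so it is the only part that should miss the prefix cache.
function buildFoldSummaryRequest(
  lastAgentRequest: ChatRequest,
  headLength: number, // hypothetical: number of leading messages being folded
  pinnedSkillNames: string[],
): ChatRequest {
  const pinnedNote =
    pinnedSkillNames.length > 0
      ? ` Do not paraphrase these pinned skills; their bodies are appended verbatim: ${pinnedSkillNames.join(", ")}.`
      : "";
  const instruction: ChatMessage = {
    role: "user",
    content: `Summarize the conversation above.${pinnedNote}`,
  };
  return {
    model: lastAgentRequest.model, // same model as the live agent
    tools: lastAgentRequest.tools, // same tool list, not omitted
    messages: [...lastAgentRequest.messages.slice(0, headLength), instruction],
  };
}
```

The head messages are passed through untouched rather than rewritten, since any change to an earlier message would alter the prefix from that point onward.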

Numbers

Measured on a real 60-message session at 48.7K prompt tokens (bench-fold-cache-live.mjs, real DeepSeek API):

shape	cache hit	input cost
before	0.0%	$0.145 per fold
after	99.6%	$0.015 per fold

~89.6% saving per fold. Across 6 real session replays the saving holds steady at 88-90%. The remaining ~0.4% miss is just the trailing summarize instruction (~185 tokens).
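
The hit rate above can be sanity-checked from the token counts alone. The sketch below assumes the 48.7K figure is the full prompt, including the ~185-token instruction, and that the API's usage object reports a `prompt_cache_miss_tokens` field alongside the `prompt_cache_hit_tokens` field the benchmark reads. The `PromptUsage` and `cacheHitRate` names are hypothetical.

```typescript
// Assumed subset of the usage object returned with a chat completion.
interface PromptUsage {
  prompt_cache_hit_tokens: number;
  prompt_cache_miss_tokens: number;
}

// Fraction of prompt tokens served from the prefix cache.
function cacheHitRate(usage: PromptUsage): number {
  const total = usage.prompt_cache_hit_tokens + usage.prompt_cache_miss_tokens;
  return total === 0 ? 0 : usage.prompt_cache_hit_tokens / total;
}

// Approximate figures from the measurement above: 48.7K prompt tokens,
// of which ~185 (the trailing instruction) are new.
const rate = cacheHitRate({
  prompt_cache_hit_tokens: 48_700 - 185,
  prompt_cache_miss_tokens: 185,
});
const percent = (rate * 100).toFixed(1); // "99.6"
```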

Test plan

7 new request-shape assertions in tests/context-manager-cache-aligned-fold.test.ts (system / tools / head bytes / trailing instruction / model pin / skill-pin verbatim)
All 10 pre-existing fold tests still pass (context-manager-skill-pin, context-manager-thinking-mode, context-manager-fold-timeout)
Full npm run verify: build + lint + typecheck + 3561 tests green
Live API check via tools/bench-fold-cache-live.mjs confirms 99.6% cache hit on real DeepSeek

…ult, lifecycle plans

Headline themes:
- TUI: Static-history renderer is the only path; virtual-viewport layers removed (#1529 stages 1-4)
- Chat: queued mid-turn steer handling so input mid-render doesn't drop or fight the live frame (#1501)
- Web search: default switches to Bing; dashboard engine switcher; Mojeek dropped (#1558)
- Plans: lifecycle evidence summaries surface why a plan is ready to accept (#1500)
- Desktop: native OS notifications for approvals + completion (#1519)
- i18n: CLI command output (/mcp /sessions /prune /theme) + approval-prompt labels translated (#1524, #1560)
- Security: SSRF block in web_fetch (#1544), edit-snapshot path containment (#1454), shell redirect sandbox (#1457), Task integrity guardrail (#1516)
- Tools: per-turn dispatch-rate limit (#1356); run_command discourages shell-based edits (#1514)
- Client: DeepSeek 429 → concurrency-limit hint (#1526); timeoutMs honored with AbortSignal (#1535); --no-proxy opt-out for direct route (#1507)
- Files: read/edit/restore preserves source encoding (GB18030 / UTF-8 BOM) (#1518)
- Context: pinned constraints survive folds + full tail capture (#1515, #1552)
- Refactor: lifecycle risk policy extracted into its own module (#1557)

See CHANGELOG for the full list.

The summarizer call was sending a bespoke "You compress conversation history" system prompt and no tools, guaranteeing a 0% cache hit against the main agent's just-cached prefix. Reshape the request so system + tools + head bytes mirror the live agent's last call; the only novel bytes are the trailing summarize instruction.

Skill-pin handling now collects bodies read-only instead of stubbing mid-head, so the cache prefix stays unbroken. The summarize instruction names pinned skills so the model knows not to paraphrase their bodies (which we append verbatim regardless).

Measured on a real session at 48.7K prompt tokens:
OLD shape: 0.0% cache hit → $0.145 per fold
NEW shape: 99.6% cache hit → $0.015 per fold
saving: 89.6% per fold

bench-fold-cache-shape.mjs replays real session jsonls, simulates OLD vs NEW summary-call shapes at the fold point, and reports byte-level shared-prefix with the main agent's preceding request. Purely local; no API required.

bench-fold-cache-live.mjs sends one priming + two summary calls to DeepSeek and reports prompt_cache_hit_tokens / cost for each shape. Used to confirm the shape change actually translates to API-side cache hits.
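
A byte-level shared-prefix measurement of the kind the shape benchmark reports could look roughly like the sketch below. It is not the benchmark's code: `sharedPrefixBytes` and `sharedPrefixRatio` are hypothetical names, and it assumes both requests have already been serialized to strings in the same way.

```typescript
// Count how many leading UTF-8 bytes two serialized requests have in common.
function sharedPrefixBytes(a: string, b: string): number {
  const encoder = new TextEncoder();
  const x = encoder.encode(a);
  const y = encoder.encode(b);
  const limit = Math.min(x.length, y.length);
  let i = 0;
  while (i < limit && x[i] === y[i]) i++;
  return i;
}

// Share of the new request's bytes that match the preceding request's prefix.
function sharedPrefixRatio(previous: string, next: string): number {
  const nextLength = new TextEncoder().encode(next).length;
  return nextLength === 0 ? 0 : sharedPrefixBytes(previous, next) / nextLength;
}
```

Comparing bytes rather than characters means a difference inside a multi-byte character still ends the shared prefix at the exact byte where the two requests diverge.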

read-before-edit gate (#1563) and cache-aligned fold summary (#1565) landed after the release commit was written. Document them in the 0.49.0 entry before tagging so the published CHANGELOG matches what ships. Co-authored-by: reasonix <reasonix@deepseek.com>

reasonix added 4 commits May 22, 2026 08:45

Merge remote-tracking branch 'origin/main' into fix/fold-cache-aligne…

f204f37

…d-summary # Conflicts: # src/context-manager.ts # src/loop.ts

esengine merged commit 544714b into main May 22, 2026
3 checks passed

esengine deleted the fix/fold-cache-aligned-summary branch May 22, 2026 16:20

github-advanced-security AI found potential problems May 22, 2026

Comment thread tools/bench-fold-cache-live.mjs

const resp = await fetch("https://api.deepseek.com/chat/completions", {
  method: "POST",
  headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
  body: JSON.stringify(body),

esengine mentioned this pull request May 22, 2026

chore(changelog): document #1563 + #1565 under 0.49.0 #1569

Merged

1 task

esengine merged 4 commits into main from fix/fold-cache-aligned-summary
