refactor(context-manager): preflight folds first, drop obsolete byte ceiling by esengine · Pull Request #1642 · esengine/DeepSeek-Reasonix

esengine · 2026-05-24T02:40:15Z

Summary

Live probe (tools/probe-deepseek-body-limit.mjs) confirms DeepSeek's gateway happily accepts 8MB request bodies — the empirical ~880KB ceiling MAX_BODY_BYTES was guarding against no longer exists. Remove the entire byte-trigger path from decidePreflight + mechanicalTruncate.
Converge on a single fold-first pipeline (Claude-Code-style): preflight tries semantic fold first, falls back to mechanicalTruncate only when fold cannot summarize (empty head, savings under 30%, active tool turn would be wiped, or summarizer call failed/timed out).
Fix two latent bugs in fold(): the per-message token estimate now counts tool_calls JSON (it was content-only, which let heavy args slip past the tail-budget check and slide the boundary past an active tool turn), and the new requireTailBoundary option (preflight-only) makes fold() refuse when no user message lands in the tail.
Preflight sets _foldedThisTurn after compacting so decideAfterUsage does not re-fold on the now-empty head.
Drop bodyKB placeholder + bytes-trigger wording from i18n; rename "truncate" to "compact" in preflight messages.
E2E validated against live DeepSeek (tools/e2e-context-compression.mts): baseline, high-ratio terminal turn, and preflight emergency all pass.
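
The fold-first ordering described in the summary can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `foldSketch`, `mechanicalTruncateSketch`, `preflightSketch`, the string-based message log, and the character-length savings measure are all hypothetical stand-ins. Only the 30% savings floor and the refusal reasons come from the summary, and the active-tool-turn refusal is left out for brevity.

```typescript
// Hypothetical shapes: the real fold()/mechanicalTruncate signatures are not shown in this PR text.
type FoldRefusal = "empty-head" | "savings-too-small" | "summarizer-failed";

type FoldResult =
  | { ok: true; messages: string[] }
  | { ok: false; reason: FoldRefusal };

interface PreflightOutcome {
  messages: string[];
  path: "fold" | "mechanical";
  reason?: FoldRefusal;
}

// "savings under 30%" from the summary; measured here in characters (an assumption).
const MIN_SAVINGS_RATIO = 0.3;

// Stand-in for semantic fold: summarize the head, keep the tail verbatim.
function foldSketch(
  log: string[],
  tailSize: number,
  summarize: (head: string[]) => string | null,
): FoldResult {
  const head = log.slice(0, Math.max(0, log.length - tailSize));
  if (head.length === 0) return { ok: false, reason: "empty-head" };

  const summary = summarize(head);
  if (summary === null) return { ok: false, reason: "summarizer-failed" };

  const before = head.join("").length;
  const saved = before === 0 ? 0 : 1 - summary.length / before;
  if (saved < MIN_SAVINGS_RATIO) return { ok: false, reason: "savings-too-small" };

  return { ok: true, messages: [summary, ...log.slice(head.length)] };
}

// Stand-in for mechanical truncation: keep only the most recent messages.
function mechanicalTruncateSketch(log: string[], keep: number): string[] {
  return log.slice(Math.max(0, log.length - keep));
}

// Fold first; fall back to mechanical truncation only when fold refuses.
function preflightSketch(
  log: string[],
  tailSize: number,
  summarize: (head: string[]) => string | null,
): PreflightOutcome {
  const folded = foldSketch(log, tailSize, summarize);
  if (folded.ok) return { messages: folded.messages, path: "fold" };
  return {
    messages: mechanicalTruncateSketch(log, tailSize),
    path: "mechanical",
    reason: folded.reason,
  };
}
```

Returning the refusal reason alongside the fallback result keeps the two paths distinguishable to the caller, which is one way a "compact" message could report why the semantic path was skipped.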

Test plan

npm run verify — 258 files / 3586 tests pass (comment policy + biome + full vitest)
tests/preflight.test.ts — 6 tests pass (2 obsolete byte tests deleted, 4 rewritten for fold-first flow)
tests/loop.test.ts — auto-fold + aggressive fold tests pass against new boundary checks
tests/context-manager-cache-aligned-fold.test.ts + context-manager-skill-pin.test.ts — both still green after fold token-estimation change
E2E against live DeepSeek API (tools/e2e-context-compression.mts) — baseline, high-ratio, preflight emergency all green; preflight successfully calls the live summarizer, compacts 201 → 12 messages, and the main request goes through

…ceiling

Live probe (tools/probe-deepseek-body-limit.mjs) shows DeepSeek's gateway accepts at least 8MB request bodies; the empirical ~880KB ceiling that MAX_BODY_BYTES guarded against no longer exists. Remove the byte path from decidePreflight + mechanicalTruncate and converge on a single fold-first pipeline:

- Preflight tries semantic fold first; mechanical truncate is a last-resort fallback only when fold can't summarize (empty head, savings too small, active tool turn would be wiped, summarizer failed).
- fold() per-message token estimate now includes tool_calls JSON so heavy tool-call args can't slip through the tail-budget check and slide the boundary past an active tool turn.
- Preflight sets _foldedThisTurn so decideAfterUsage doesn't re-fold on the already-compacted log.
- New `requireTailBoundary` option on fold() lets preflight refuse when no user lands in tail (would wipe an active tool turn).
- Drop bodyKB placeholder + bytes trigger from i18n.

End-to-end validated against live DeepSeek (tools/e2e-context-compression.mts): baseline, high-ratio terminal turn, and preflight emergency all pass.
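
To illustrate why counting tool_calls JSON matters for the tail-budget check, here is a small sketch. Everything in it is an assumption made for illustration: the `ChatMessage` shape loosely follows OpenAI-style chat messages, `approxTokens` is a crude 4-characters-per-token stand-in for the real tokenizer, and `tailStart` / `tailHasUser` are hypothetical helpers, not the functions in context-manager.

```typescript
// Hypothetical message shape, loosely following OpenAI-style chat messages.
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string | null;
  tool_calls?: ToolCall[];
}

// Crude stand-in for a tokenizer: about 4 characters per token (illustration only).
const approxTokens = (text: string): number => Math.ceil(text.length / 4);

// Content-only estimate, as the commit message describes the old behaviour.
function estimateContentOnly(msg: ChatMessage): number {
  return approxTokens(msg.content ?? "");
}

// Content plus serialized tool_calls, as the commit message describes the new behaviour.
function estimateWithToolCalls(msg: ChatMessage): number {
  const calls =
    msg.tool_calls && msg.tool_calls.length > 0 ? JSON.stringify(msg.tool_calls) : "";
  return approxTokens(msg.content ?? "") + approxTokens(calls);
}

// Walk backwards from the end, keeping messages while they fit the tail budget.
// Returns the index of the first message kept in the tail.
function tailStart(
  log: ChatMessage[],
  budget: number,
  estimate: (m: ChatMessage) => number,
): number {
  let used = 0;
  let start = log.length;
  for (let i = log.length - 1; i >= 0; i--) {
    const cost = estimate(log[i]);
    if (used + cost > budget) break;
    used += cost;
    start = i;
  }
  return start;
}

// Sketch of the requireTailBoundary idea: report whether any user message lands in the tail.
function tailHasUser(log: ChatMessage[], start: number): boolean {
  return log.slice(start).some((m) => m.role === "user");
}
```

With an assistant message whose content is null but whose tool-call arguments are several kilobytes, the content-only estimator prices that message at zero tokens, so the two estimators can place the tail boundary at different indexes for the same budget.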

…1646)

Follow-up to #1642. After landing the fold-first preflight, the next question was whether preflight needs to exist at all, and on a 1M-context provider it doesn't: post-response decideAfterUsage already folds at 75%, upstream tool-result caps prevent single-message blowups, and the byte ceiling the preflight was originally guarding against is gone. Converge on the Claude-Code-style single compaction path: one fold check per turn, at turn start.

- Delete decidePreflight, mechanicalTruncate, related constants, the PreflightDecision interface, and the per-iter preflight block in step().
- Add a single turn-start check after the user message is appended: if local request estimate > TURN_START_FOLD_THRESHOLD (90%), fold once before the iter loop.
- No mechanical fallback. If fold can't shrink the log, the request goes out and DeepSeek's error surfaces to the user: honest beats silent re-compaction with worse semantics.
- Drop preflight i18n keys + delete tests/preflight.test.ts.

Net -393 lines. E2E (live DeepSeek) green across baseline, high-ratio terminal turn, and turn-start fold scenarios.
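
The single turn-start check from this follow-up might look roughly like the sketch below. Only the name TURN_START_FOLD_THRESHOLD and its 90% value come from the commit message; `TurnStartDeps`, `shouldFoldAtTurnStart`, `startTurn`, and the string-based log are hypothetical, and the threshold is assumed to be a fraction of the provider's context window.

```typescript
// 90% per the commit message; assumed here to be a fraction of the context window.
const TURN_START_FOLD_THRESHOLD = 0.9;

// Hypothetical dependencies injected for illustration.
interface TurnStartDeps {
  contextTokens: number; // provider context window, in tokens
  estimate: (log: string[]) => number; // local request estimate
  fold: (log: string[]) => string[] | null; // null means fold could not shrink the log
}

function shouldFoldAtTurnStart(estimated: number, contextTokens: number): boolean {
  return estimated > contextTokens * TURN_START_FOLD_THRESHOLD;
}

// Append the user message, then run at most one fold before the iteration loop.
function startTurn(
  log: string[],
  userMessage: string,
  deps: TurnStartDeps,
): { log: string[]; folded: boolean } {
  const next = [...log, userMessage];
  if (!shouldFoldAtTurnStart(deps.estimate(next), deps.contextTokens)) {
    return { log: next, folded: false };
  }
  const folded = deps.fold(next);
  // No mechanical fallback: if fold fails, the log goes out unchanged and
  // any provider error is left to surface to the caller.
  return folded === null ? { log: next, folded: false } : { log: folded, folded: true };
}
```

Because the check runs once per turn rather than once per iteration, the cost of `estimate` is paid on every turn, which is worth keeping in mind when the estimator itself is expensive.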

…moved, persisted usage stats, plan dispatch gate

Headline themes:

- Desktop: bundle the CLI-hosted React dashboard, retire Tauri+Preact duplicate (#1418)
- Config: drop preset abstraction; flash/pro are direct model selections (#1657, #1630)
- Stats: persist cumulative usage to session meta + auto-restore on startup (#1667, #1680, #1643, #1628)
- Plans: editMode="plan" enforced at the ToolRegistry dispatch gate (#1681); step advance fix (#1629)
- Context: fold once at turn start, drop pre-flight + byte-ceiling (#1642, #1646); collapsible compacted card (#1649)
- Subagents: per-skill flash/pro override + Settings UI (#1632)
- Desktop polish: sidebar drag-resize (#1688), responsive collapse (#1585), copy/edit overlay + msg-history nav (#1645), Esc closes modal not turn (#1685), QQ tab isolation (#1672), DiffCard for edits (#1662), theme-aware highlighting (#1655), system events toggle (#1654/#1650), macOS TCC inheritance (#1614), dashboard.enabled (#1612)
- Dashboard polish: persistent session URL (#1586, #1589, #1599), theme-aware highlighting (#1664), IME confirm-enter guard (#1689), code-fence lang fix (#1677), vendor chunk split (#1587), markdown table h-scroll (#1562)
- TUI: Alt+S input stash/recall; static history isolated from input rerenders (#1635); legacy mouse drop (#1637, #1648); multi-edit gated in review (#1647)
- Diff: SplitDiff column border holds under CJK (#1686)
- MCP: workspace roots passed to servers (#1625); codeCommand honors mcpServers (#1603)
- Config plumbing: (baseUrl, apiKey) resolved as a tuple (#1658); stale model id self-heal (#1663)

See CHANGELOG for the full list.

…uncate (#1741)

Three changes that together cut per-turn CPU ~57% and steady-state RSS ~22% in the 200-turn fakeFetch probe (rss=256MB→181MB at log.len=800).

- bpeEncode: in-place splice instead of slice/spread rebuild on every merge, plus 8K-entry LRU cache. Repetitive tool output (padded payloads, identifiers in code) re-encodes the same byte-level chunks thousands of times per session; the cache caps that at ~400KB.
- estimateConversationTokens: drop the full formatDeepSeekPrompt rebuild + single bounded tokenize. Sum per-message bounded counts with a fixed template overhead, gated by a content-string-keyed 4K-entry LRU. Same entry tokenizes once over its lifetime instead of once per turn. The estimate drives fold thresholds (50%/75% of ctx) where ±5% slop is harmless.
- truncateForModelByTokens: sample-based fast path. For inputs in the [maxTokens, maxTokens*4] range the old code unconditionally tokenized the full string (37% total CPU on the 200-turn probe). Now we use a 2KB-sample estimate with a 1.15x safety margin; only borderline cases fall through to a precise tokenize.

Regression origin: #1642/#1646 collapsed the conditional preflight into an unconditional estimateTurnStart that runs every turn, surfacing the underlying tokenizer cost. The tokenizer itself has always been a pure-TS BPE port without caching: fine when called rarely, expensive when called on every turn against growing logs.

Also adds three probes that reproduce + measure:

- scripts/probe-mem-leak.mts: drives CacheFirstLoop through N turns with fakeFetch, samples RSS/heap/log
- scripts/probe-jobs-leak.mts: confirms JobRegistry's MAX_COMPLETED_JOBS cap actually evicts
- scripts/analyze-cpuprofile.mjs: flat self/total roll-up for any .cpuprofile produced by --cpu-prof or `reasonix code --profile`

Co-authored-by: reasonix <reasonix@deepseek.com>
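
The content-keyed LRU idea behind the estimateConversationTokens change can be sketched as below. This is an assumption-laden illustration rather than the code from #1741: `LruCache`, `makeEstimator`, and `TEMPLATE_OVERHEAD_TOKENS` are hypothetical names, the overhead value is made up, and "4K-entry" is read as 4096 entries. The cache relies on the fact that a JavaScript `Map` iterates keys in insertion order.

```typescript
// Minimal LRU built on Map, which iterates keys in insertion order.
class LruCache<V> {
  private readonly map = new Map<string, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: string): V | undefined {
    const value = this.map.get(key);
    if (value === undefined) return undefined;
    // Re-insert so this key becomes the most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.map.keys().next().value;
      if (oldest !== undefined) this.map.delete(oldest);
    }
  }

  get size(): number {
    return this.map.size;
  }
}

// Hypothetical fixed per-message template overhead, in tokens.
const TEMPLATE_OVERHEAD_TOKENS = 4;

// Build an estimator that tokenizes each distinct content string at most once
// while it stays in the cache, then sums per-message counts plus overhead.
function makeEstimator(
  tokenize: (text: string) => number,
  capacity: number = 4096,
): (contents: string[]) => number {
  const cache = new LruCache<number>(capacity);
  return (contents: string[]): number => {
    let total = 0;
    for (const content of contents) {
      let count = cache.get(content);
      if (count === undefined) {
        count = tokenize(content);
        cache.set(content, count);
      }
      total += count + TEMPLATE_OVERHEAD_TOKENS;
    }
    return total;
  };
}
```

Keying on the content string means an append-only log only pays tokenizer cost for the messages added since the previous turn, at the price of holding the strings as map keys.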

esengine merged commit 3c8cc8c into main May 24, 2026
4 checks passed

esengine deleted the refactor/context-compression-fold-first branch May 24, 2026 02:49

esengine mentioned this pull request May 24, 2026

refactor(context-manager): drop preflight, fold once at turn start #1646

Merged

3 tasks

esengine mentioned this pull request May 25, 2026

perf(tokenizer): cache BPE + bounded counts, fast-path truncate (-57% CPU, -22% RSS) #1741

Merged

4 tasks

esengine merged 1 commit into main from refactor/context-compression-fold-first

esengine commented May 24, 2026
