refactor(context-manager): drop preflight, fold once at turn start by esengine · Pull Request #1646 · esengine/DeepSeek-Reasonix

esengine · 2026-05-24T03:03:52Z

Summary

Follow-up to #1642. After landing the fold-first preflight, the next question was whether preflight needs to exist at all — and on a 1M-context provider, it doesn't:

Post-response decideAfterUsage already folds at 75% (covers routine growth).
Upstream tool-result caps prevent single-message blowups.
The byte ceiling that preflight was originally guarding against has been disproved (see #1642, "refactor(context-manager): preflight folds first, drop obsolete byte ceiling").

So preflight on a per-iter basis was redundant. Converge on the Claude-Code-style single compaction path: one fold check per turn, at turn start.

What changed

Delete decidePreflight, mechanicalTruncate, related constants, the PreflightDecision interface, and the per-iter preflight block in step().
Add a single turn-start check after the user message is appended: if the local request estimate exceeds TURN_START_FOLD_THRESHOLD (90% of ctxMax), fold once before the iter loop.
No mechanical fallback. If fold can't shrink the log, the request goes out and DeepSeek's error surfaces to the user — honest beats silent re-compaction with worse semantics.
Drop preflight i18n keys + delete tests/preflight.test.ts.
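
The turn-start check described above could look roughly like the sketch below. This is an illustration under assumptions, not the code from this PR: `Message`, `TurnContext`, `estimateTokens`, `fold`, and `runTurnStart` are hypothetical names, and `fold` is shown as synchronous for brevity even though a summarizer-backed fold would likely be async. Only `TURN_START_FOLD_THRESHOLD` (90%) and the fold-once, no-fallback behavior come from the description.

```typescript
// Hypothetical sketch of a single turn-start fold check.
// Taken from the PR description: the 90% threshold, "fold once", and
// "no mechanical fallback". Everything else is an illustrative assumption.
const TURN_START_FOLD_THRESHOLD = 0.9;

interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

interface TurnContext {
  log: Message[];
  ctxMax: number; // provider context window, in tokens
  estimateTokens: (log: Message[]) => number; // local request estimate (assumed)
  fold: (log: Message[]) => Message[]; // summarizing compaction (assumed)
}

function runTurnStart(
  ctx: TurnContext,
  userMessage: Message,
): { log: Message[]; folded: boolean } {
  // The check runs after the user message is appended, so a huge paste counts.
  const log = [...ctx.log, userMessage];
  const ratio = ctx.estimateTokens(log) / ctx.ctxMax;

  if (ratio > TURN_START_FOLD_THRESHOLD) {
    // Fold exactly once. If the result is still too large, the request is
    // sent anyway and the provider's error reaches the user.
    return { log: ctx.fold(log), folded: true };
  }
  return { log, folded: false };
}
```

The iter loop would then run on the returned log with no further pre-request checks; the post-response fold remains the only other compaction trigger.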

Why 90% at turn-start (not 75%)

Post-response fold at 75% already handles routine in-conversation growth. Turn-start only needs to catch what post-response can't see:

A terminal prior turn (plain-text response → decideAfterUsage skipped)
Session restore from disk with already-bloated log
User pastes a massive prompt that flips the ratio in one move

For those, 90% is the right "almost over" emergency line; 75% would fire on most long sessions and add summarizer-call latency to every turn.
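
To make the threshold choice concrete, here is a small worked comparison. It assumes a context window of exactly 1,000,000 tokens, and `POST_RESPONSE_FOLD_THRESHOLD`, `CTX_MAX`, and `foldsAtTurnStart` are hypothetical names; the 75% and 90% values are the ones stated above.

```typescript
// Illustrative arithmetic only. The 75% / 90% values come from the PR
// description; the names (except TURN_START_FOLD_THRESHOLD) and the exact
// window size are assumptions.
const POST_RESPONSE_FOLD_THRESHOLD = 0.75; // hypothetical name
const TURN_START_FOLD_THRESHOLD = 0.9;
const CTX_MAX = 1_000_000; // assumed 1M-token window

function foldsAtTurnStart(estimatedTokens: number, threshold: number): boolean {
  return estimatedTokens / CTX_MAX > threshold;
}

// A session restored from disk at 820,000 estimated tokens (82%):
// a 90% line lets the turn proceed, a 75% line would pay for a summarizer call.
const restored = 820_000;
console.log(foldsAtTurnStart(restored, TURN_START_FOLD_THRESHOLD)); // false
console.log(foldsAtTurnStart(restored, POST_RESPONSE_FOLD_THRESHOLD)); // true

// A massive paste that pushes the estimate to 930,000 tokens (93%):
// the 90% emergency line catches it before the request goes out.
const afterPaste = 930_000;
console.log(foldsAtTurnStart(afterPaste, TURN_START_FOLD_THRESHOLD)); // true
```

Under these assumptions the turn-start line sits at 900,000 tokens and the post-response line at 750,000, leaving a 150,000-token band in which a turn starts without an extra summarizer round trip.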

Diff stats

7 files changed, 71 insertions(+), 464 deletions(-) — net -393 lines.

Test plan

npm run verify — 258 files / 3584 tests pass
Live e2e (tools/e2e-context-compression.mts against real DeepSeek): baseline / high-ratio terminal turn / turn-start fold all green
Existing context-manager + skill-pin + cache-aligned-fold tests pass unmodified

The preflight path that #1642 reshaped was redundant on a 1M-context provider: post-response decideAfterUsage already folds at 75%, upstream tool-result caps prevent single-message blowups, and the byte ceiling preflight was guarding against doesn't exist. Converge on the Claude-Code-style single compaction path.

- Delete decidePreflight + mechanicalTruncate, related constants and PreflightDecision interface, and the per-iter preflight block in step().
- Add a single turn-start check that runs once after the user message is appended: if local estimate > 90% of ctxMax, fold first. Threshold is conservative on purpose: post-response 75% handles routine growth; turn-start covers what it can't see (terminal prior turn, fresh resume, huge user paste).
- No mechanical fallback. If fold can't shrink the log, the request goes out and DeepSeek's error surfaces to the user; honest beats silent re-compaction with worse semantics.
- Drop preflight i18n keys + preflight.test.ts.

E2E (live DeepSeek) still green across baseline, high-ratio terminal turn, and turn-start fold.

…moved, persisted usage stats, plan dispatch gate

Headline themes:

- Desktop: bundle the CLI-hosted React dashboard, retire Tauri+Preact duplicate (#1418)
- Config: drop preset abstraction; flash/pro are direct model selections (#1657, #1630)
- Stats: persist cumulative usage to session meta + auto-restore on startup (#1667, #1680, #1643, #1628)
- Plans: editMode="plan" enforced at the ToolRegistry dispatch gate (#1681); step advance fix (#1629)
- Context: fold once at turn start, drop pre-flight + byte-ceiling (#1642, #1646); collapsible compacted card (#1649)
- Subagents: per-skill flash/pro override + Settings UI (#1632)
- Desktop polish: sidebar drag-resize (#1688), responsive collapse (#1585), copy/edit overlay + msg-history nav (#1645), Esc closes modal not turn (#1685), QQ tab isolation (#1672), DiffCard for edits (#1662), theme-aware highlighting (#1655), system events toggle (#1654/#1650), macOS TCC inheritance (#1614), dashboard.enabled (#1612)
- Dashboard polish: persistent session URL (#1586, #1589, #1599), theme-aware highlighting (#1664), IME confirm-enter guard (#1689), code-fence lang fix (#1677), vendor chunk split (#1587), markdown table h-scroll (#1562)
- TUI: Alt+S input stash/recall; static history isolated from input rerenders (#1635); legacy mouse drop (#1637, #1648); multi-edit gated in review (#1647)
- Diff: SplitDiff column border holds under CJK (#1686)
- MCP: workspace roots passed to servers (#1625); codeCommand honors mcpServers (#1603)
- Config plumbing: (baseUrl, apiKey) resolved as a tuple (#1658); stale model id self-heal (#1663)

See CHANGELOG for the full list.

…uncate (#1741)

Three changes that together cut per-turn CPU ~57% and steady-state RSS ~22% in the 200-turn fakeFetch probe (rss=256MB→181MB at log.len=800).

- bpeEncode: in-place splice instead of slice/spread rebuild on every merge, plus 8K-entry LRU cache. Repetitive tool output (padded payloads, identifiers in code) re-encodes the same byte-level chunks thousands of times per session; the cache caps that at ~400KB.
- estimateConversationTokens: drop the full formatDeepSeekPrompt rebuild + single bounded tokenize. Sum per-message bounded counts with a fixed template overhead, gated by a content-string-keyed 4K-entry LRU. Same entry tokenizes once over its lifetime instead of once per turn. The estimate drives fold thresholds (50%/75% of ctx) where ±5% slop is harmless.
- truncateForModelByTokens: sample-based fast path. For inputs in the [maxTokens, maxTokens*4] range the old code unconditionally tokenized the full string (37% total CPU on the 200-turn probe). Now we use a 2KB-sample estimate with a 1.15x safety margin; only borderline cases fall through to a precise tokenize.

Regression origin: #1642/#1646 collapsed the conditional preflight into an unconditional estimateTurnStart that runs every turn, surfacing the underlying tokenizer cost. The tokenizer itself has always been a pure-TS BPE port without caching: fine when called rarely, expensive when called on every turn against growing logs.

Also adds three probes that reproduce + measure:

- scripts/probe-mem-leak.mts: drives CacheFirstLoop through N turns with fakeFetch, samples RSS/heap/log
- scripts/probe-jobs-leak.mts: confirms JobRegistry's MAX_COMPLETED_JOBS cap actually evicts
- scripts/analyze-cpuprofile.mjs: flat self/total roll-up for any .cpuprofile produced by --cpu-prof or `reasonix code --profile`

Co-authored-by: reasonix <reasonix@deepseek.com>
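
The per-message cached estimate mentioned in that commit message can be pictured with a sketch like the one below. It is not the implementation from #1741: `LruCache`, `makeEstimator`, `PER_MESSAGE_OVERHEAD`, and the pluggable `countTokens` callback are hypothetical, and the default capacity of 4096 is one reading of "4K-entry". The idea shown is only that each distinct content string is tokenized once and then served from a bounded cache.

```typescript
// Hypothetical sketch: content-keyed LRU in front of a per-message token counter.
class LruCache<V> {
  private readonly map = new Map<string, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: string): V | undefined {
    const value = this.map.get(key);
    if (value === undefined) return undefined;
    // Re-insert to mark as most recently used (Map keeps insertion order).
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.map.keys().next().value as string;
      this.map.delete(oldest);
    }
  }
}

const PER_MESSAGE_OVERHEAD = 4; // assumed fixed template overhead, in tokens

function makeEstimator(countTokens: (text: string) => number, capacity = 4096) {
  const cache = new LruCache<number>(capacity);
  return (messages: { content: string }[]): number => {
    let total = 0;
    for (const m of messages) {
      let n = cache.get(m.content);
      if (n === undefined) {
        n = countTokens(m.content); // expensive path, taken once per distinct content
        cache.set(m.content, n);
      }
      total += n + PER_MESSAGE_OVERHEAD;
    }
    return total;
  };
}
```

Because the log mostly grows by appending, a turn-start estimate built this way only pays the tokenizer cost for messages it has not seen before.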

esengine merged commit 421e412 into main May 24, 2026
4 checks passed

esengine deleted the refactor/remove-preflight-unify-fold branch May 24, 2026 03:23

esengine mentioned this pull request May 25, 2026

perf(tokenizer): cache BPE + bounded counts, fast-path truncate (-57% CPU, -22% RSS) #1741

Merged

4 tasks
