refactor(context-manager): drop preflight, fold once at turn start #1646
Merged
The preflight path that #1642 reshaped was redundant on a 1M-context provider: post-response decideAfterUsage already folds at 75%, upstream tool-result caps prevent single-message blowups, and the byte ceiling preflight was guarding against doesn't exist. Converge on the Claude-Code-style single compaction path.

- Delete decidePreflight + mechanicalTruncate, related constants and PreflightDecision interface, and the per-iter preflight block in step().
- Add a single turn-start check that runs once after the user message is appended: if local estimate > 90% of ctxMax, fold first. Threshold is conservative on purpose: post-response 75% handles routine growth; turn-start covers what it can't see (terminal prior turn, fresh resume, huge user paste).
- No mechanical fallback. If fold can't shrink the log, the request goes out and DeepSeek's error surfaces to the user. Honest beats silent re-compaction with worse semantics.
- Drop preflight i18n keys + preflight.test.ts.

E2E (live DeepSeek) still green across baseline, high-ratio terminal turn, and turn-start fold.
esengine pushed a commit that referenced this pull request May 24, 2026
…moved, persisted usage stats, plan dispatch gate

Headline themes:
- Desktop: bundle the CLI-hosted React dashboard, retire Tauri+Preact duplicate (#1418)
- Config: drop preset abstraction; flash/pro are direct model selections (#1657, #1630)
- Stats: persist cumulative usage to session meta + auto-restore on startup (#1667, #1680, #1643, #1628)
- Plans: editMode="plan" enforced at the ToolRegistry dispatch gate (#1681); step advance fix (#1629)
- Context: fold once at turn start, drop pre-flight + byte-ceiling (#1642, #1646); collapsible compacted card (#1649)
- Subagents: per-skill flash/pro override + Settings UI (#1632)
- Desktop polish: sidebar drag-resize (#1688), responsive collapse (#1585), copy/edit overlay + msg-history nav (#1645), Esc closes modal not turn (#1685), QQ tab isolation (#1672), DiffCard for edits (#1662), theme-aware highlighting (#1655), system events toggle (#1654/#1650), macOS TCC inheritance (#1614), dashboard.enabled (#1612)
- Dashboard polish: persistent session URL (#1586, #1589, #1599), theme-aware highlighting (#1664), IME confirm-enter guard (#1689), code-fence lang fix (#1677), vendor chunk split (#1587), markdown table h-scroll (#1562)
- TUI: Alt+S input stash/recall; static history isolated from input rerenders (#1635); legacy mouse drop (#1637, #1648); multi-edit gated in review (#1647)
- Diff: SplitDiff column border holds under CJK (#1686)
- MCP: workspace roots passed to servers (#1625); codeCommand honors mcpServers (#1603)
- Config plumbing: (baseUrl, apiKey) resolved as a tuple (#1658); stale model id self-heal (#1663)

See CHANGELOG for the full list.
esengine added a commit that referenced this pull request May 25, 2026
…uncate (#1741)

Three changes that together cut per-turn CPU ~57% and steady-state RSS ~22% in the 200-turn fakeFetch probe (rss=256MB→181MB at log.len=800).

- bpeEncode: in-place splice instead of slice/spread rebuild on every merge, plus 8K-entry LRU cache. Repetitive tool output (padded payloads, identifiers in code) re-encodes the same byte-level chunks thousands of times per session; the cache caps that at ~400KB.
- estimateConversationTokens: drop the full formatDeepSeekPrompt rebuild + single bounded tokenize. Sum per-message bounded counts with a fixed template overhead, gated by a content-string-keyed 4K-entry LRU. Same entry tokenizes once over its lifetime instead of once per turn. The estimate drives fold thresholds (50%/75% of ctx) where ±5% slop is harmless.
- truncateForModelByTokens: sample-based fast path. For inputs in the [maxTokens, maxTokens*4] range the old code unconditionally tokenized the full string (37% total CPU on the 200-turn probe). Now we use a 2KB-sample estimate with a 1.15x safety margin; only borderline cases fall through to a precise tokenize.

Regression origin: #1642/#1646 collapsed the conditional preflight into an unconditional estimateTurnStart that runs every turn, surfacing the underlying tokenizer cost. The tokenizer itself has always been a pure-TS BPE port without caching: fine when called rarely, expensive when called on every turn against growing logs.

Also adds three probes that reproduce + measure:
- scripts/probe-mem-leak.mts: drives CacheFirstLoop through N turns with fakeFetch, samples RSS/heap/log
- scripts/probe-jobs-leak.mts: confirms JobRegistry's MAX_COMPLETED_JOBS cap actually evicts
- scripts/analyze-cpuprofile.mjs: flat self/total roll-up for any .cpuprofile produced by --cpu-prof or `reasonix code --profile`

Co-authored-by: reasonix <reasonix@deepseek.com>
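The per-message estimate with a content-keyed LRU described in that commit could look roughly like the sketch below. This is not the project's code: `LruCache`, `makeCachedEstimator`, `PER_MESSAGE_OVERHEAD`, and the injected `countTokens` callback are hypothetical names chosen for illustration, and the overhead value is an arbitrary placeholder. Only the idea (sum per-message counts plus a fixed overhead, with each distinct content string tokenized once) comes from the commit message.

```typescript
// Minimal LRU relying on Map's insertion order (oldest key comes first).
class LruCache<K, V> {
  private readonly map = new Map<K, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key) as V;
    // Re-insert so this key becomes the most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      // Evict the least recently used entry (first key in iteration order).
      const oldest = this.map.keys().next().value as K;
      this.map.delete(oldest);
    }
  }
}

// Hypothetical fixed per-message template overhead, in tokens.
const PER_MESSAGE_OVERHEAD = 4;

// Builds an estimator that tokenizes each distinct content string at most
// once while it stays in the cache. `countTokens` stands in for the real
// (expensive) tokenizer and is an assumption of this sketch.
function makeCachedEstimator(
  countTokens: (text: string) => number,
  cacheSize: number = 4096,
): (contents: readonly string[]) => number {
  const cache = new LruCache<string, number>(cacheSize);
  return (contents) => {
    let total = 0;
    for (const content of contents) {
      let count = cache.get(content);
      if (count === undefined) {
        count = countTokens(content);
        cache.set(content, count);
      }
      total += count + PER_MESSAGE_OVERHEAD;
    }
    return total;
  };
}
```

With an estimator built this way, calling it again on a log that has grown by one message only tokenizes the new message; earlier entries are served from the cache.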
Summary
Follow-up to #1642. After landing the fold-first preflight, the next question was whether preflight needs to exist at all — and on a 1M-context provider, it doesn't:
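- `decideAfterUsage` already folds at 75% (covers routine growth).
- Upstream tool-result caps prevent single-message blowups.
- The byte ceiling preflight was guarding against doesn't exist.

So preflight on a per-iter basis was redundant. Converge on the Claude-Code-style single compaction path: one fold check per turn, at turn start.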
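A minimal sketch of what "one fold check per turn, at turn start" could look like follows. It is an illustration under stated assumptions, not the PR's implementation: the `Message` shape, `TurnStartDeps`, and `turnStart` are hypothetical, and the fold is modeled as a synchronous callback for brevity even though a real summarizer call would be asynchronous. Only the `TURN_START_FOLD_THRESHOLD` name, the 90% value, and the no-fallback behavior come from the PR text.

```typescript
// Hypothetical message shape for this sketch.
interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

// Name and 90% value taken from the PR description.
const TURN_START_FOLD_THRESHOLD = 0.9;

// Injected dependencies (all hypothetical): context size, a local token
// estimate, and a fold (summarize) operation.
interface TurnStartDeps {
  ctxMax: number;
  estimateTokens: (log: readonly Message[]) => number;
  fold: (log: readonly Message[]) => Message[];
}

interface TurnStartResult {
  log: Message[];
  folded: boolean;
}

// Runs once per turn, after the user message is appended and before the
// iteration loop starts.
function turnStart(
  log: readonly Message[],
  userMessage: Message,
  deps: TurnStartDeps,
): TurnStartResult {
  const next = [...log, userMessage];
  const estimate = deps.estimateTokens(next);
  if (estimate <= deps.ctxMax * TURN_START_FOLD_THRESHOLD) {
    return { log: next, folded: false };
  }
  // Fold exactly once. There is no mechanical truncation fallback: whatever
  // the fold returns is what gets sent, and a provider error would surface
  // to the user.
  return { log: deps.fold(next), folded: true };
}
```

The caller would send `result.log` to the provider either way; the sketch deliberately does not re-check the size after folding.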
What changed
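- Deleted `decidePreflight`, `mechanicalTruncate`, related constants, the `PreflightDecision` interface, and the per-iter preflight block in `step()`.
- Added a single turn-start check: if the local estimate exceeds `TURN_START_FOLD_THRESHOLD` (90%), fold once before the iter loop.
- Dropped the preflight i18n keys and `tests/preflight.test.ts`.

Why 90% at turn-start (not 75%)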
Post-response fold at 75% already handles routine in-conversation growth. Turn-start only needs to catch what post-response can't see:
- A terminal prior turn (`decideAfterUsage` skipped)
- A fresh resume
- A huge user paste

For those, 90% is the right "almost over" emergency line; 75% would fire on most long sessions and add summarizer-call latency to every turn.
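To make the threshold argument concrete, the sketch below compares the two candidate lines on a 1M-token context. The helper `wouldFold` and the sample estimates are hypothetical; only the 75% and 90% figures and the 1M context size come from the PR text.

```typescript
// Returns true when an estimate is above the given fraction of the context.
function wouldFold(estimateTokens: number, ctxMax: number, threshold: number): boolean {
  return estimateTokens > ctxMax * threshold;
}

const ctxMax = 1_000_000; // 1M-context provider, as in the PR description

// Hypothetical turn-start estimates: a mid-size session, a long session
// that the post-response fold has not yet compacted, and a near-full log.
const samples = [500_000, 800_000, 950_000];

const comparison = samples.map((estimate) => ({
  estimate,
  foldsAt75: wouldFold(estimate, ctxMax, 0.75),
  foldsAt90: wouldFold(estimate, ctxMax, 0.9),
}));
```

In this illustration the 800,000-token log would trigger a turn-start fold at 75% but not at 90%, which is the extra summarizer latency the higher threshold is meant to avoid; the 950,000-token log triggers a fold under both.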
Diff stats
7 files changed, 71 insertions(+), 464 deletions(-) — net -393 lines.
Test plan
- `npm run verify`: 258 files / 3584 tests pass
- E2E (`tools/e2e-context-compression.mts` against real DeepSeek): baseline / high-ratio terminal turn / turn-start fold all green