refactor(context-manager): preflight folds first, drop obsolete byte ceiling #1642
Merged
…ceiling

Live probe (tools/probe-deepseek-body-limit.mjs) shows DeepSeek's gateway accepts at least 8MB request bodies; the empirical ~880KB ceiling that MAX_BODY_BYTES guarded against no longer exists. Remove the byte path from decidePreflight + mechanicalTruncate and converge on a single fold-first pipeline:

- Preflight tries semantic fold first; mechanical truncate is a last-resort fallback only when fold can't summarize (empty head, savings too small, active tool turn would be wiped, summarizer failed).
- fold() per-message token estimate now includes tool_calls JSON so heavy tool-call args can't slip through the tail-budget check and slide the boundary past an active tool turn.
- Preflight sets _foldedThisTurn so decideAfterUsage doesn't re-fold on the already-compacted log.
- New `requireTailBoundary` option on fold() lets preflight refuse when no user message lands in the tail (would wipe an active tool turn).
- Drop bodyKB placeholder + bytes trigger from i18n.

End-to-end validated against live DeepSeek (tools/e2e-context-compression.mts): baseline, high-ratio terminal turn, and preflight emergency all pass.
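The fold-first ordering in this commit message can be sketched as follows. This is a minimal illustration, not the repository's code: `compactFoldFirst`, `FoldOutcome`, `FoldRefusal`, and `PreflightResult` are hypothetical names, and the callbacks are synchronous for brevity even though a real summarizer call would be asynchronous.

```typescript
// Reasons fold may decline, taken from the commit message above.
type FoldRefusal =
  | "empty-head"
  | "savings-too-small"
  | "active-tool-turn"
  | "summarizer-failed";

// Hypothetical result shape; the real fold() return type is not shown in this PR.
type FoldOutcome<M> =
  | { ok: true; messages: M[] }
  | { ok: false; reason: FoldRefusal };

interface PreflightResult<M> {
  messages: M[];
  path: "fold" | "truncate";
  refusal?: FoldRefusal;
}

// Try the semantic fold first; use mechanical truncation only when fold
// declines or throws. Both callbacks are assumptions supplied by the caller.
function compactFoldFirst<M>(
  log: M[],
  fold: (log: M[]) => FoldOutcome<M>,
  truncate: (log: M[]) => M[],
): PreflightResult<M> {
  let outcome: FoldOutcome<M>;
  try {
    outcome = fold(log);
  } catch {
    // A thrown summarizer error is treated like any other refusal.
    outcome = { ok: false, reason: "summarizer-failed" };
  }
  if (outcome.ok) {
    return { messages: outcome.messages, path: "fold" };
  }
  return { messages: truncate(log), path: "truncate", refusal: outcome.reason };
}
```

Returning the refusal reason alongside the truncated log keeps the fallback observable, which is one way a caller could report why the semantic path was skipped.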
esengine added a commit that referenced this pull request on May 24, 2026
…1646)

Follow-up to #1642. After landing the fold-first preflight, the next question was whether preflight needs to exist at all, and on a 1M-context provider it doesn't: post-response decideAfterUsage already folds at 75%, upstream tool-result caps prevent single-message blowups, and the byte ceiling the preflight was originally guarding against is gone. Converge on the Claude-Code-style single compaction path: one fold check per turn, at turn start.

- Delete decidePreflight, mechanicalTruncate, related constants, the PreflightDecision interface, and the per-iter preflight block in step().
- Add a single turn-start check after the user message is appended: if local request estimate > TURN_START_FOLD_THRESHOLD (90%), fold once before the iter loop.
- No mechanical fallback. If fold can't shrink the log, the request goes out and DeepSeek's error surfaces to the user; honest beats silent re-compaction with worse semantics.
- Drop preflight i18n keys + delete tests/preflight.test.ts.

Net -393 lines. E2E (live DeepSeek) green across baseline, high-ratio terminal turn, and turn-start fold scenarios.
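A minimal sketch of the single turn-start check described in this commit message. Only the constant name `TURN_START_FOLD_THRESHOLD` and the 90% figure come from the message; `shouldFoldAtTurnStart`, `prepareTurn`, and the callback signatures are hypothetical.

```typescript
// 90% of the context window, per the commit message.
const TURN_START_FOLD_THRESHOLD = 0.9;

// True when the local request estimate exceeds the turn-start threshold.
function shouldFoldAtTurnStart(
  estimatedTokens: number,
  contextWindow: number,
): boolean {
  return estimatedTokens > contextWindow * TURN_START_FOLD_THRESHOLD;
}

// Append the user message, then fold at most once before the iteration loop.
// There is deliberately no mechanical fallback: whatever fold returns is sent.
function prepareTurn<M>(
  log: M[],
  userMessage: M,
  estimate: (log: M[]) => number, // assumed local token estimator
  contextWindow: number,
  fold: (log: M[]) => M[], // assumed semantic fold
): M[] {
  const withUser = [...log, userMessage];
  if (!shouldFoldAtTurnStart(estimate(withUser), contextWindow)) {
    return withUser;
  }
  return fold(withUser);
}
```

Running the check after the user message is appended means the estimate covers the request that will actually be sent.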
esengine pushed a commit that referenced this pull request on May 24, 2026
…moved, persisted usage stats, plan dispatch gate

Headline themes:

- Desktop: bundle the CLI-hosted React dashboard, retire Tauri+Preact duplicate (#1418)
- Config: drop preset abstraction; flash/pro are direct model selections (#1657, #1630)
- Stats: persist cumulative usage to session meta + auto-restore on startup (#1667, #1680, #1643, #1628)
- Plans: editMode="plan" enforced at the ToolRegistry dispatch gate (#1681); step advance fix (#1629)
- Context: fold once at turn start, drop pre-flight + byte-ceiling (#1642, #1646); collapsible compacted card (#1649)
- Subagents: per-skill flash/pro override + Settings UI (#1632)
- Desktop polish: sidebar drag-resize (#1688), responsive collapse (#1585), copy/edit overlay + msg-history nav (#1645), Esc closes modal not turn (#1685), QQ tab isolation (#1672), DiffCard for edits (#1662), theme-aware highlighting (#1655), system events toggle (#1654/#1650), macOS TCC inheritance (#1614), dashboard.enabled (#1612)
- Dashboard polish: persistent session URL (#1586, #1589, #1599), theme-aware highlighting (#1664), IME confirm-enter guard (#1689), code-fence lang fix (#1677), vendor chunk split (#1587), markdown table h-scroll (#1562)
- TUI: Alt+S input stash/recall; static history isolated from input rerenders (#1635); legacy mouse drop (#1637, #1648); multi-edit gated in review (#1647)
- Diff: SplitDiff column border holds under CJK (#1686)
- MCP: workspace roots passed to servers (#1625); codeCommand honors mcpServers (#1603)
- Config plumbing: (baseUrl, apiKey) resolved as a tuple (#1658); stale model id self-heal (#1663)

See CHANGELOG for the full list.
esengine added a commit that referenced this pull request on May 25, 2026
…uncate (#1741)

Three changes that together cut per-turn CPU ~57% and steady-state RSS ~22% in the 200-turn fakeFetch probe (rss=256MB→181MB at log.len=800).

- bpeEncode: in-place splice instead of slice/spread rebuild on every merge, plus 8K-entry LRU cache. Repetitive tool output (padded payloads, identifiers in code) re-encodes the same byte-level chunks thousands of times per session; the cache caps that at ~400KB.
- estimateConversationTokens: drop the full formatDeepSeekPrompt rebuild + single bounded tokenize. Sum per-message bounded counts with a fixed template overhead, gated by a content-string-keyed 4K-entry LRU. Same entry tokenizes once over its lifetime instead of once per turn. The estimate drives fold thresholds (50%/75% of ctx) where ±5% slop is harmless.
- truncateForModelByTokens: sample-based fast path. For inputs in the [maxTokens, maxTokens*4] range the old code unconditionally tokenized the full string (37% total CPU on the 200-turn probe). Now we use a 2KB-sample estimate with a 1.15x safety margin; only borderline cases fall through to a precise tokenize.

Regression origin: #1642/#1646 collapsed the conditional preflight into an unconditional estimateTurnStart that runs every turn, surfacing the underlying tokenizer cost. The tokenizer itself has always been a pure-TS BPE port without caching: fine when called rarely, expensive when called on every turn against growing logs.

Also adds three probes that reproduce + measure:

- scripts/probe-mem-leak.mts: drives CacheFirstLoop through N turns with fakeFetch, samples RSS/heap/log
- scripts/probe-jobs-leak.mts: confirms JobRegistry's MAX_COMPLETED_JOBS cap actually evicts
- scripts/analyze-cpuprofile.mjs: flat self/total roll-up for any .cpuprofile produced by --cpu-prof or `reasonix code --profile`

Co-authored-by: reasonix <reasonix@deepseek.com>
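The content-keyed LRU idea behind the estimateConversationTokens change can be sketched like this. It assumes nothing about the repository's actual cache: `LruCache`, `makeConversationEstimator`, and `PER_MESSAGE_OVERHEAD` are hypothetical, and the tokenizer is passed in as a plain callback.

```typescript
// Small LRU built on Map, which iterates keys in insertion order.
class LruCache<V> {
  private readonly map = new Map<string, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: string): V | undefined {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key) as V;
    // Re-insert so this key becomes the most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.map.keys().next().value;
      if (oldest !== undefined) this.map.delete(oldest);
    }
  }

  get size(): number {
    return this.map.size;
  }
}

// Hypothetical fixed template overhead per message, in tokens.
const PER_MESSAGE_OVERHEAD = 4;

// Sums cached per-message counts, so an unchanged message is tokenized
// once while it stays in the cache rather than once per turn.
function makeConversationEstimator(
  countTokens: (text: string) => number,
  capacity: number,
): (contents: string[]) => number {
  const cache = new LruCache<number>(capacity);
  return (contents: string[]): number => {
    let total = 0;
    for (const content of contents) {
      let count = cache.get(content);
      if (count === undefined) {
        count = countTokens(content);
        cache.set(content, count);
      }
      total += count + PER_MESSAGE_OVERHEAD;
    }
    return total;
  };
}
```

Keying on the content string trades memory for CPU: long tool outputs become long keys, which is presumably why the commit bounds the cache by entry count.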
Summary
- Live probe (`tools/probe-deepseek-body-limit.mjs`) confirms DeepSeek's gateway happily accepts 8MB request bodies; the empirical ~880KB ceiling `MAX_BODY_BYTES` was guarding against no longer exists. Remove the entire byte-trigger path from `decidePreflight` + `mechanicalTruncate`.
- Preflight tries `fold` first and falls back to `mechanicalTruncate` only when fold cannot summarize (empty head, savings under 30%, active tool turn would be wiped, or summarizer call failed/timed out).
- `fold()`: per-message token estimate now counts `tool_calls` JSON (was content-only, letting heavy args slip past the tail-budget check and slide the boundary past an active tool turn), and the new `requireTailBoundary` option (preflight-only) refuses when no user message lands in the tail.
- Preflight sets `_foldedThisTurn` after compacting so `decideAfterUsage` does not re-fold on the now-empty head.
- Drop `bodyKB` placeholder + bytes-trigger wording from i18n; rename "truncate" to "compact" in preflight messages.
- End-to-end validated against live DeepSeek (`tools/e2e-context-compression.mts`): baseline, high-ratio terminal turn, and preflight emergency all pass.

Test plan

- `npm run verify`: 258 files / 3586 tests pass (comment policy + biome + full vitest)
- `tests/preflight.test.ts`: 6 tests pass (2 obsolete byte tests deleted, 4 rewritten for fold-first flow)
- `tests/loop.test.ts`: auto-fold + aggressive fold tests pass against new boundary checks
- `tests/context-manager-cache-aligned-fold.test.ts` + `context-manager-skill-pin.test.ts`: both still green after fold token-estimation change
- E2E against live DeepSeek (`tools/e2e-context-compression.mts`): baseline, high-ratio, preflight emergency all green; preflight successfully calls the live summarizer, compacts 201 → 12 messages, and the main request goes through
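For readers unfamiliar with the `fold()` change in the Summary, here is a rough sketch of a tail-boundary pick that counts `tool_calls` JSON and honours a `requireTailBoundary` flag. The message shape, the 4-characters-per-token heuristic, and the names `roughTokens`, `estimateMessageTokens`, and `pickTailStart` are all assumptions; the real `fold()` internals are not shown in this PR.

```typescript
// Hypothetical OpenAI-style message shape; the repo's own types may differ.
interface ToolCall {
  id: string;
  function: { name: string; arguments: string };
}

interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string | null;
  tool_calls?: ToolCall[];
}

// Crude stand-in for a tokenizer: about 4 characters per token (assumption).
function roughTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Counts both the content and the serialized tool_calls, so an assistant
// message with null content but large call arguments is not costed at zero.
function estimateMessageTokens(msg: ChatMessage): number {
  const callsJson = msg.tool_calls ? JSON.stringify(msg.tool_calls) : "";
  return roughTokens(msg.content ?? "") + roughTokens(callsJson);
}

// Walks backwards from the newest message while the tail budget allows.
// Returns the index where the kept tail starts, or null when
// requireTailBoundary is set and the tail contains no user message.
function pickTailStart(
  log: ChatMessage[],
  tailBudget: number,
  requireTailBoundary: boolean,
): number | null {
  let used = 0;
  let start = log.length;
  for (let i = log.length - 1; i >= 0; i--) {
    const msg = log[i];
    if (msg === undefined) break;
    const cost = estimateMessageTokens(msg);
    if (used + cost > tailBudget) break;
    used += cost;
    start = i;
  }
  const tailHasUser = log.slice(start).some((m) => m.role === "user");
  if (requireTailBoundary && !tailHasUser) return null;
  return start;
}
```

Returning `null` rather than a best-effort index lets the caller decide what to do when folding would cut into an active tool turn.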