Skip to content

refactor(context-manager): drop preflight, fold once at turn start#1646

Merged
esengine merged 1 commit into
mainfrom
refactor/remove-preflight-unify-fold
May 24, 2026
Merged

refactor(context-manager): drop preflight, fold once at turn start#1646
esengine merged 1 commit into
mainfrom
refactor/remove-preflight-unify-fold

Conversation

@esengine

Copy link
Copy Markdown
Owner

Summary

Follow-up to #1642. After landing the fold-first preflight, the next question was whether preflight needs to exist at all — and on a 1M-context provider, it doesn't:

So preflight on a per-iter basis was redundant. Converge on the Claude-Code-style single compaction path: one fold check per turn, at turn start.

What changed

  • Delete decidePreflight, mechanicalTruncate, related constants, the PreflightDecision interface, and the per-iter preflight block in step().
  • Add a single turn-start check after the user message is appended: if local request estimate > TURN_START_FOLD_THRESHOLD (90%), fold once before the iter loop.
  • No mechanical fallback. If fold can't shrink the log, the request goes out and DeepSeek's error surfaces to the user — honest beats silent re-compaction with worse semantics.
  • Drop preflight i18n keys + delete tests/preflight.test.ts.

Why 90% at turn-start (not 75%)

Post-response fold at 75% already handles routine in-conversation growth. Turn-start only needs to catch what post-response can't see:

  • A terminal prior turn (plain-text response → decideAfterUsage skipped)
  • Session restore from disk with already-bloated log
  • User pastes a massive prompt that flips the ratio in one move

For those, 90% is the right "almost over" emergency line; 75% would fire on most long sessions and add summarizer-call latency to every turn.

Diff stats

7 files changed, 71 insertions(+), 464 deletions(-) — net -393 lines.

Test plan

  • npm run verify — 258 files / 3584 tests pass
  • Live e2e (tools/e2e-context-compression.mts against real DeepSeek): baseline / high-ratio terminal turn / turn-start fold all green
  • Existing context-manager + skill-pin + cache-aligned-fold tests pass unmodified

The preflight path that #1642 reshaped was redundant on a 1M-context
provider: post-response decideAfterUsage already folds at 75%, upstream
tool-result caps prevent single-message blowups, and the byte ceiling
preflight was guarding against doesn't exist. Converge on the
Claude-Code-style single compaction path.

- Delete decidePreflight + mechanicalTruncate, related constants and
  PreflightDecision interface, and the per-iter preflight block in step().
- Add a single turn-start check that runs once after the user message is
  appended: if local estimate > 90% of ctxMax, fold first. Threshold is
  conservative on purpose — post-response 75% handles routine growth;
  turn-start covers what it can't see (terminal prior turn, fresh resume,
  huge user paste).
- No mechanical fallback. If fold can't shrink the log, the request goes
  out and DeepSeek's error surfaces to the user — honest beats silent
  re-compaction with worse semantics.
- Drop preflight i18n keys + preflight.test.ts. E2E (live DeepSeek) still
  green across baseline, high-ratio terminal turn, and turn-start fold.
@esengine esengine merged commit 421e412 into main May 24, 2026
4 checks passed
@esengine esengine deleted the refactor/remove-preflight-unify-fold branch May 24, 2026 03:23
esengine pushed a commit that referenced this pull request May 24, 2026
…moved, persisted usage stats, plan dispatch gate

Headline themes:
- Desktop: bundle the CLI-hosted React dashboard, retire Tauri+Preact duplicate (#1418)
- Config: drop preset abstraction; flash/pro are direct model selections (#1657, #1630)
- Stats: persist cumulative usage to session meta + auto-restore on startup (#1667, #1680, #1643, #1628)
- Plans: editMode="plan" enforced at the ToolRegistry dispatch gate (#1681); step advance fix (#1629)
- Context: fold once at turn start, drop pre-flight + byte-ceiling (#1642, #1646); collapsible compacted card (#1649)
- Subagents: per-skill flash/pro override + Settings UI (#1632)
- Desktop polish: sidebar drag-resize (#1688), responsive collapse (#1585), copy/edit overlay + msg-history nav (#1645), Esc closes modal not turn (#1685), QQ tab isolation (#1672), DiffCard for edits (#1662), theme-aware highlighting (#1655), system events toggle (#1654/#1650), macOS TCC inheritance (#1614), dashboard.enabled (#1612)
- Dashboard polish: persistent session URL (#1586, #1589, #1599), theme-aware highlighting (#1664), IME confirm-enter guard (#1689), code-fence lang fix (#1677), vendor chunk split (#1587), markdown table h-scroll (#1562)
- TUI: Alt+S input stash/recall; static history isolated from input rerenders (#1635); legacy mouse drop (#1637, #1648); multi-edit gated in review (#1647)
- Diff: SplitDiff column border holds under CJK (#1686)
- MCP: workspace roots passed to servers (#1625); codeCommand honors mcpServers (#1603)
- Config plumbing: (baseUrl, apiKey) resolved as a tuple (#1658); stale model id self-heal (#1663)

See CHANGELOG for the full list.
esengine added a commit that referenced this pull request May 25, 2026
…uncate (#1741)

Three changes that together cut per-turn CPU ~57% and steady-state RSS
~22% in the 200-turn fakeFetch probe (rss=256MB→181MB at log.len=800).

- bpeEncode: in-place splice instead of slice/spread rebuild on every
  merge, plus 8K-entry LRU cache. Repetitive tool output (padded
  payloads, identifiers in code) re-encodes the same byte-level chunks
  thousands of times per session; the cache caps that at ~400KB.

- estimateConversationTokens: drop the full formatDeepSeekPrompt
  rebuild + single bounded tokenize. Sum per-message bounded counts
  with a fixed template overhead, gated by a content-string-keyed
  4K-entry LRU. Same entry tokenizes once over its lifetime instead of
  once per turn. The estimate drives fold thresholds (50%/75% of ctx)
  where ±5% slop is harmless.

- truncateForModelByTokens: sample-based fast path. For inputs in the
  [maxTokens, maxTokens*4] range the old code unconditionally tokenized
  the full string (37% total CPU on the 200-turn probe). Now we use a
  2KB-sample estimate with a 1.15x safety margin; only borderline cases
  fall through to a precise tokenize.

Regression origin: #1642/#1646 collapsed the conditional preflight into
an unconditional estimateTurnStart that runs every turn, surfacing the
underlying tokenizer cost. The tokenizer itself has always been a
pure-TS BPE port without caching — fine when called rarely, expensive
when called on every turn against growing logs.

Also adds three probes that reproduce + measure:
- scripts/probe-mem-leak.mts — drives CacheFirstLoop through N turns
  with fakeFetch, samples RSS/heap/log
- scripts/probe-jobs-leak.mts — confirms JobRegistry's MAX_COMPLETED_JOBS
  cap actually evicts
- scripts/analyze-cpuprofile.mjs — flat self/total roll-up for any
  .cpuprofile produced by --cpu-prof or `reasonix code --profile`

Co-authored-by: reasonix <reasonix@deepseek.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant