refactor(context-manager): preflight folds first, drop obsolete byte ceiling #1642

Merged
esengine merged 1 commit into main from refactor/context-compression-fold-first
May 24, 2026

Conversation

@esengine

Owner

Summary

  • Live probe (tools/probe-deepseek-body-limit.mjs) confirms DeepSeek's gateway happily accepts 8MB request bodies — the empirical ~880KB ceiling MAX_BODY_BYTES was guarding against no longer exists. Remove the entire byte-trigger path from decidePreflight + mechanicalTruncate.
  • Converge on a single fold-first pipeline (Claude-Code-style): preflight tries semantic fold first, falls back to mechanicalTruncate only when fold cannot summarize (empty head, savings under 30%, active tool turn would be wiped, or summarizer call failed/timed out).
  • Fix two latent bugs in fold(): the per-message token estimate now counts tool_calls JSON (it was content-only, which let heavy args slip past the tail-budget check and slide the boundary past an active tool turn), and the new requireTailBoundary option (preflight-only) makes fold() refuse when no user message lands in the tail.
  • Preflight sets _foldedThisTurn after compacting so decideAfterUsage does not re-fold on the now-empty head.
  • Drop bodyKB placeholder + bytes-trigger wording from i18n; rename "truncate" to "compact" in preflight messages.
  • E2E validated against live DeepSeek (tools/e2e-context-compression.mts): baseline, high-ratio terminal turn, and preflight emergency all pass.
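
The fold-first order described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `FoldResult`, `CompactResult`, `compactFoldFirst`, and `savingsTooSmall` are hypothetical names, and the real `fold()` and `mechanicalTruncate()` signatures are not shown in this conversation. Only the four fallback reasons and the 30% savings floor come from the summary.

```typescript
// Hypothetical result shape: fold either succeeds or reports why it refused.
type FoldRefusal = "empty-head" | "low-savings" | "active-tool-turn" | "summarizer-failed";
type FoldResult =
  | { ok: true; messages: string[] }
  | { ok: false; reason: FoldRefusal };

type CompactResult = { messages: string[]; via: "fold" | "truncate"; reason?: FoldRefusal };

// The summary names "savings under 30%" as a refusal reason.
const MIN_FOLD_SAVINGS = 0.3;

// True when folding would shrink the token count by less than the floor.
function savingsTooSmall(tokensBefore: number, tokensAfter: number): boolean {
  if (tokensBefore <= 0) return true;
  return (tokensBefore - tokensAfter) / tokensBefore < MIN_FOLD_SAVINGS;
}

// Try the semantic fold first; use mechanical truncation only when fold refuses.
function compactFoldFirst(
  log: string[],
  fold: (log: string[]) => FoldResult,
  truncate: (log: string[]) => string[],
): CompactResult {
  const folded = fold(log);
  if (folded.ok) {
    return { messages: folded.messages, via: "fold" };
  }
  return { messages: truncate(log), via: "truncate", reason: folded.reason };
}
```

Keeping the refusal reason on the result lets the caller report why the fallback ran, which is presumably what the renamed "compact" preflight messages surface.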

Test plan

  • npm run verify — 258 files / 3586 tests pass (comment policy + biome + full vitest)
  • tests/preflight.test.ts — 6 tests pass (2 obsolete byte tests deleted, 4 rewritten for fold-first flow)
  • tests/loop.test.ts — auto-fold + aggressive fold tests pass against new boundary checks
  • tests/context-manager-cache-aligned-fold.test.ts + context-manager-skill-pin.test.ts — both still green after fold token-estimation change
  • E2E against live DeepSeek API (tools/e2e-context-compression.mts) — baseline, high-ratio, preflight emergency all green; preflight successfully calls the live summarizer, compacts 201 → 12 messages, and the main request goes through

…ceiling

Live probe (tools/probe-deepseek-body-limit.mjs) shows DeepSeek's gateway
accepts at least 8MB request bodies — the empirical ~880KB ceiling that
MAX_BODY_BYTES guarded against no longer exists. Remove the byte path
from decidePreflight + mechanicalTruncate and converge on a single
fold-first pipeline:

- Preflight tries semantic fold first; mechanical truncate is a last-resort
  fallback only when fold can't summarize (empty head, savings too small,
  active tool turn would be wiped, summarizer failed).
- fold() per-message token estimate now includes tool_calls JSON so heavy
  tool-call args can't slip through the tail-budget check and slide the
  boundary past an active tool turn.
- Preflight sets _foldedThisTurn so decideAfterUsage doesn't re-fold on
  the already-compacted log.
- New `requireTailBoundary` option on fold() lets preflight refuse when
  no user message lands in the tail (folding would wipe an active tool turn).
- Drop bodyKB placeholder + bytes trigger from i18n.
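
The two fold() fixes in the list above might look something like this in isolation. This is a sketch under stated assumptions: `ChatMessage`, `ToolCall`, `roughTokens`, `estimateMessageTokens`, and `tailHasUserMessage` are hypothetical names, and the 4-characters-per-token estimator is a stand-in for the project's real tokenizer.

```typescript
// Hypothetical message shapes, loosely modeled on OpenAI-style chat messages.
interface ToolCall {
  function: { name: string; arguments: string };
}
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string | null;
  tool_calls?: ToolCall[];
}

// Stand-in estimator (assumption): roughly 4 characters per token.
const roughTokens = (text: string): number => Math.ceil(text.length / 4);

// Count content AND serialized tool_calls, so a message with empty content
// but heavy tool-call arguments is not estimated at near zero.
function estimateMessageTokens(msg: ChatMessage): number {
  const contentTokens = roughTokens(msg.content ?? "");
  const toolCallTokens = msg.tool_calls ? roughTokens(JSON.stringify(msg.tool_calls)) : 0;
  return contentTokens + toolCallTokens;
}

// Tail-boundary guard: the tail is everything from `boundary` onward.
// A fold is only safe here if at least one user message survives in the tail.
function tailHasUserMessage(messages: ChatMessage[], boundary: number): boolean {
  return messages.slice(boundary).some((m) => m.role === "user");
}
```

With a content-only estimate, an assistant message whose `content` is `null` and whose arguments run to several kilobytes would count as zero tokens, which is the failure mode the commit message describes.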

End-to-end validated against live DeepSeek (tools/e2e-context-compression.mts):
baseline, high-ratio terminal turn, and preflight emergency all pass.
@esengine esengine merged commit 3c8cc8c into main May 24, 2026
4 checks passed
@esengine esengine deleted the refactor/context-compression-fold-first branch May 24, 2026 02:49
esengine added a commit that referenced this pull request May 24, 2026
…1646)

Follow-up to #1642. After landing the fold-first preflight, the next
question was whether preflight needs to exist at all — and on a 1M-context
provider it doesn't: post-response decideAfterUsage already folds at
75%, upstream tool-result caps prevent single-message blowups, and the
byte ceiling the preflight was originally guarding against is gone.
Converge on the Claude-Code-style single compaction path: one fold check
per turn, at turn start.

- Delete decidePreflight, mechanicalTruncate, related constants, the
  PreflightDecision interface, and the per-iter preflight block in step().
- Add a single turn-start check after the user message is appended:
  if local request estimate > TURN_START_FOLD_THRESHOLD (90%), fold once
  before the iter loop.
- No mechanical fallback. If fold can't shrink the log, the request goes
  out and DeepSeek's error surfaces to the user — honest beats silent
  re-compaction with worse semantics.
- Drop preflight i18n keys + delete tests/preflight.test.ts.

Net -393 lines. E2E (live DeepSeek) green across baseline, high-ratio
terminal turn, and turn-start fold scenarios.
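
A rough sketch of the single turn-start check described in this commit message. The constant name and the 90% value come from the message; treating the threshold as a fraction of the context window is an assumption, and `shouldFoldAtTurnStart` and `foldOnceAtTurnStart` are hypothetical helpers rather than the project's functions.

```typescript
// 90% per the commit message; assumed here to be a fraction of the context window.
const TURN_START_FOLD_THRESHOLD = 0.9;

function shouldFoldAtTurnStart(estimatedTokens: number, contextWindow: number): boolean {
  return estimatedTokens > contextWindow * TURN_START_FOLD_THRESHOLD;
}

// Fold at most once, after the user message is appended and before the
// iteration loop. There is no mechanical fallback: if fold cannot shrink
// the log, the caller sends the request as-is and the provider error surfaces.
function foldOnceAtTurnStart<T>(
  log: T[],
  estimate: (log: T[]) => number,
  contextWindow: number,
  fold: (log: T[]) => T[],
): T[] {
  return shouldFoldAtTurnStart(estimate(log), contextWindow) ? fold(log) : log;
}
```
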
esengine pushed a commit that referenced this pull request May 24, 2026
…moved, persisted usage stats, plan dispatch gate

Headline themes:
- Desktop: bundle the CLI-hosted React dashboard, retire Tauri+Preact duplicate (#1418)
- Config: drop preset abstraction; flash/pro are direct model selections (#1657, #1630)
- Stats: persist cumulative usage to session meta + auto-restore on startup (#1667, #1680, #1643, #1628)
- Plans: editMode="plan" enforced at the ToolRegistry dispatch gate (#1681); step advance fix (#1629)
- Context: fold once at turn start, drop preflight + byte-ceiling (#1642, #1646); collapsible compacted card (#1649)
- Subagents: per-skill flash/pro override + Settings UI (#1632)
- Desktop polish: sidebar drag-resize (#1688), responsive collapse (#1585), copy/edit overlay + msg-history nav (#1645), Esc closes modal not turn (#1685), QQ tab isolation (#1672), DiffCard for edits (#1662), theme-aware highlighting (#1655), system events toggle (#1654/#1650), macOS TCC inheritance (#1614), dashboard.enabled (#1612)
- Dashboard polish: persistent session URL (#1586, #1589, #1599), theme-aware highlighting (#1664), IME confirm-enter guard (#1689), code-fence lang fix (#1677), vendor chunk split (#1587), markdown table h-scroll (#1562)
- TUI: Alt+S input stash/recall; static history isolated from input rerenders (#1635); legacy mouse drop (#1637, #1648); multi-edit gated in review (#1647)
- Diff: SplitDiff column border holds under CJK (#1686)
- MCP: workspace roots passed to servers (#1625); codeCommand honors mcpServers (#1603)
- Config plumbing: (baseUrl, apiKey) resolved as a tuple (#1658); stale model id self-heal (#1663)

See CHANGELOG for the full list.
esengine added a commit that referenced this pull request May 25, 2026
…uncate (#1741)

Three changes that together cut per-turn CPU ~57% and steady-state RSS
~22% in the 200-turn fakeFetch probe (rss=256MB→181MB at log.len=800).

- bpeEncode: in-place splice instead of slice/spread rebuild on every
  merge, plus 8K-entry LRU cache. Repetitive tool output (padded
  payloads, identifiers in code) re-encodes the same byte-level chunks
  thousands of times per session; the cache caps that at ~400KB.

- estimateConversationTokens: drop the full formatDeepSeekPrompt
  rebuild + single bounded tokenize. Sum per-message bounded counts
  with a fixed template overhead, gated by a content-string-keyed
  4K-entry LRU. Same entry tokenizes once over its lifetime instead of
  once per turn. The estimate drives fold thresholds (50%/75% of ctx)
  where ±5% slop is harmless.

- truncateForModelByTokens: sample-based fast path. For inputs in the
  [maxTokens, maxTokens*4] range the old code unconditionally tokenized
  the full string (37% total CPU on the 200-turn probe). Now we use a
  2KB-sample estimate with a 1.15x safety margin; only borderline cases
  fall through to a precise tokenize.
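
The sample-based fast path in the last item could be approximated as below. The 2KB sample size and the 1.15x margin come from the commit message; sampling from the head of the string, measuring the sample in characters, and the names `sampledTokenEstimate` and `countTokens` are assumptions made for illustration.

```typescript
const SAMPLE_CHARS = 2048; // "2KB sample"; measured in characters here (assumption)
const SAFETY_MARGIN = 1.15; // over-estimate so near-limit inputs are not under-counted

// Estimate the token count of `text` by tokenizing only a leading sample and
// extrapolating by length. Short inputs are tokenized exactly.
function sampledTokenEstimate(text: string, countTokens: (s: string) => number): number {
  if (text.length <= SAMPLE_CHARS) {
    return countTokens(text);
  }
  const sample = text.slice(0, SAMPLE_CHARS);
  const tokensPerChar = countTokens(sample) / sample.length;
  return Math.ceil(tokensPerChar * text.length * SAFETY_MARGIN);
}
```

A caller would compare this estimate with `maxTokens` and fall back to a full tokenize only when the two are close, which matches the "borderline cases" wording above.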

Regression origin: #1642/#1646 collapsed the conditional preflight into
an unconditional estimateTurnStart that runs every turn, surfacing the
underlying tokenizer cost. The tokenizer itself has always been a
pure-TS BPE port without caching — fine when called rarely, expensive
when called on every turn against growing logs.
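
For readers unfamiliar with the caching pattern mentioned above, here is a generic bounded LRU memoizer. It is not the project's cache: `LruCache` and `memoizeTokenCount` are hypothetical names, and the sketch relies only on the fact that a JavaScript `Map` iterates in insertion order.

```typescript
// Minimal LRU cache: the first key in the Map is always the least recently used.
class LruCache<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    // Re-insert to mark this key as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    if (this.entries.has(key)) {
      this.entries.delete(key);
    } else if (this.entries.size >= this.capacity) {
      // Evict the least recently used entry.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, value);
  }

  has(key: K): boolean {
    return this.entries.has(key);
  }
}

// Wrap an expensive token counter so identical strings are counted once
// while they remain in the cache.
function memoizeTokenCount(count: (s: string) => number, capacity: number): (s: string) => number {
  const cache = new LruCache<string, number>(capacity);
  return (text: string): number => {
    const hit = cache.get(text);
    if (hit !== undefined) return hit;
    const tokens = count(text);
    cache.set(text, tokens);
    return tokens;
  };
}
```

Keying on the content string means a message that stays in the log is tokenized once rather than once per turn, which is the cost pattern the regression note describes.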

Also adds three probes that reproduce + measure:
- scripts/probe-mem-leak.mts — drives CacheFirstLoop through N turns
  with fakeFetch, samples RSS/heap/log
- scripts/probe-jobs-leak.mts — confirms JobRegistry's MAX_COMPLETED_JOBS
  cap actually evicts
- scripts/analyze-cpuprofile.mjs — flat self/total roll-up for any
  .cpuprofile produced by --cpu-prof or `reasonix code --profile`

Co-authored-by: reasonix <reasonix@deepseek.com>