You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
DeepSeek V4-Pro requires 27% of V3.2's per-token FLOPs and 10% of the KV cache at 1M context due to Hybrid Attention (CSA + HCA). The architecture only pays off in the long-context regime; aggressive routine compaction at 50K tokens defeats it. Make cache-maximal context the V4 default and treat compaction as an emergency mechanism, not routine hygiene.
Promotes #528 (currently drafted as opt-in) to default-on for V4 specifically. Legacy / V3 / custom-provider paths keep existing thresholds — raising compaction limits there could blow user bills.
crates/tui/src/cycle_manager.rs:65-68 — already V4-aware: hard cycle threshold bumped to 768K (~75% of 1M window). Compaction thresholds did NOT get the same treatment.
The mismatch: cycle_manager rotates at 768K, but compaction.rs summarizes at 50K. We're paying for V4's cheap long context and then immediately throwing it away.
Proposed change
Make CompactionConfig model-aware. When the active model is V4 family:
Raise token_threshold to ~500K (leaving ~500K headroom for: conversation, system prompt + tools, thinking-mode reserve, tool result accumulation).
Raise message_threshold proportionally or remove it.
Treat compaction as a near-cycle-boundary emergency, not routine.
V3 / legacy / custom-provider paths unchanged. Detect by model name prefix.
Open questions / risks
Custom base URL serving a non-V4 model under a V4-prefix name → silently raised thresholds → blown bill. Need robust detection, opt-out config, or per-model override ([cycle.per_model] already exists in config.toml — extend the same pattern to compaction).
500K is a starting estimate; needs measurement against real long-running V4 sessions before being declared the default.
Interaction with cycle_manager's 768K hard cycle: compaction at 500K and cycle at 768K leaves 268K of untouched-but-still-V4-cheap context. Confirm that's intentional.
Acceptance signals
V4 sessions routinely hit 200K+ tokens without plan_compaction firing.
working_set files stay resident across long turn loops.
V3 sessions show no behavior change.
A measurable cache hit rate improvement on long V4 sessions (cache stays warm because the prefix doesn't get rewritten).
Thesis
DeepSeek V4-Pro requires 27% of V3.2's per-token FLOPs and 10% of the KV cache at 1M context due to Hybrid Attention (CSA + HCA). The architecture only pays off in the long-context regime; aggressive routine compaction at 50K tokens defeats it. Make cache-maximal context the V4 default and treat compaction as an emergency mechanism, not routine hygiene.
Promotes #528 (currently drafted as opt-in) to default-on for V4 specifically. Legacy / V3 / custom-provider paths keep existing thresholds — raising compaction limits there could blow user bills.
Current behavior
crates/tui/src/compaction.rs:38-39—enabled: true,token_threshold: 50000,message_threshold: 50regardless of model.crates/tui/src/cycle_manager.rs:65-68— already V4-aware: hard cycle threshold bumped to 768K (~75% of 1M window). Compaction thresholds did NOT get the same treatment.Proposed change
CompactionConfigmodel-aware. When the active model is V4 family:token_thresholdto ~500K (leaving ~500K headroom for: conversation, system prompt + tools, thinking-mode reserve, tool result accumulation).message_thresholdproportionally or remove it.Open questions / risks
[cycle.per_model]already exists in config.toml — extend the same pattern to compaction).cycle_manager's 768K hard cycle: compaction at 500K and cycle at 768K leaves 268K of untouched-but-still-V4-cheap context. Confirm that's intentional.Acceptance signals
plan_compactionfiring.working_setfiles stay resident across long turn loops.