V4: cache-maximal context as default, demote routine compaction #541

@Hmbown

Thesis

DeepSeek V4-Pro requires 27% of V3.2's per-token FLOPs and 10% of the KV cache at 1M context due to Hybrid Attention (CSA + HCA). The architecture only pays off in the long-context regime; aggressive routine compaction at 50K tokens defeats it. Make cache-maximal context the V4 default and treat compaction as an emergency mechanism, not routine hygiene.

Promotes #528 (currently drafted as opt-in) to default-on for V4 specifically. Legacy / V3 / custom-provider paths keep existing thresholds — raising compaction limits there could blow user bills.

Current behavior

  • crates/tui/src/compaction.rs:38-39 sets enabled: true, token_threshold: 50000, message_threshold: 50 regardless of model.
  • crates/tui/src/cycle_manager.rs:65-68 — already V4-aware: hard cycle threshold bumped to 768K (~75% of 1M window). Compaction thresholds did NOT get the same treatment.
  • The mismatch: cycle_manager rotates at 768K, but compaction.rs summarizes at 50K. We're paying for V4's cheap long context and then immediately throwing it away.

Proposed change

  • Make CompactionConfig model-aware. When the active model is V4 family:
    • Raise token_threshold to ~500K (leaving ~500K headroom for: conversation, system prompt + tools, thinking-mode reserve, tool result accumulation).
    • Raise message_threshold proportionally or remove it.
    • Treat compaction as a near-cycle-boundary emergency, not routine.
  • Promote #528 ("Cache-maximal context mode: re-read active files instead of summarizing") to default-on for V4. Working set stays resident; CSA/HCA handles the distant tail.
  • V3 / legacy / custom-provider paths unchanged. Detect by model name prefix.
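
A minimal sketch of what model-aware selection could look like, not the actual contents of compaction.rs. The field names and the legacy values (50000 tokens, 50 messages) come from the lines quoted above. Everything else is an assumption made for illustration: the `for_model`, `legacy`, `v4` and `is_v4_family` names, the `"deepseek-v4"` prefix, and modelling "remove it" as `Option<usize>`.

```rust
/// Compaction thresholds. Field names mirror the ones quoted from compaction.rs;
/// making `message_threshold` optional is an assumption of this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct CompactionConfig {
    pub enabled: bool,
    pub token_threshold: usize,
    /// `None` means no message-count trigger at all.
    pub message_threshold: Option<usize>,
}

/// Hypothetical prefix check. The real model identifiers may differ.
pub fn is_v4_family(model: &str) -> bool {
    model.to_ascii_lowercase().starts_with("deepseek-v4")
}

impl CompactionConfig {
    /// Existing defaults, kept for V3 / legacy / custom-provider paths.
    pub fn legacy() -> Self {
        Self {
            enabled: true,
            token_threshold: 50_000,
            message_threshold: Some(50),
        }
    }

    /// Proposed V4 defaults: ~500K tokens is the starting estimate from this
    /// issue, and the message-count trigger is dropped.
    pub fn v4() -> Self {
        Self {
            enabled: true,
            token_threshold: 500_000,
            message_threshold: None,
        }
    }

    /// Pick thresholds from the active model name.
    pub fn for_model(model: &str) -> Self {
        if is_v4_family(model) {
            Self::v4()
        } else {
            Self::legacy()
        }
    }
}
```

Any name that does not match the prefix falls through to `legacy()`, so V3 and unknown models keep the behavior they have today.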

Open questions / risks

  • Custom base URL serving a non-V4 model under a V4-prefix name → silently raised thresholds → blown bill. Need robust detection, opt-out config, or per-model override ([cycle.per_model] already exists in config.toml — extend the same pattern to compaction).
  • 500K is a starting estimate; needs measurement against real long-running V4 sessions before being declared the default.
  • Interaction with cycle_manager's 768K hard cycle: compaction at 500K and cycle at 768K leaves 268K of untouched-but-still-V4-cheap context. Confirm that's intentional.
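
One possible way to address the first and third risks, sketched under stated assumptions. An explicit per-model override wins, a custom base URL falls back to the legacy threshold unless overridden, and only then does prefix detection apply. `CompactionOverride`, `resolve_token_threshold`, `gap_before_cycle` and the `custom_base_url` flag are hypothetical names; the schema of the existing [cycle.per_model] table is not shown in this issue and is not reproduced here. "768K" is read as 768,000 tokens, which matches the 268K figure above.

```rust
use std::collections::HashMap;

/// Hypothetical per-model compaction override, in the spirit of [cycle.per_model].
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct CompactionOverride {
    pub token_threshold: usize,
}

pub const LEGACY_TOKEN_THRESHOLD: usize = 50_000;
pub const V4_TOKEN_THRESHOLD: usize = 500_000; // starting estimate, unmeasured
pub const V4_CYCLE_THRESHOLD: usize = 768_000; // assumes "768K" means 768,000

/// Resolve the compaction token threshold for a session.
/// Order: explicit override, then conservative custom-endpoint fallback,
/// then model-name prefix detection.
pub fn resolve_token_threshold(
    model: &str,
    custom_base_url: bool,
    overrides: &HashMap<String, CompactionOverride>,
) -> usize {
    if let Some(o) = overrides.get(model) {
        return o.token_threshold;
    }
    if custom_base_url {
        // The endpoint may serve a non-V4 model under a V4-looking name,
        // so do not raise thresholds without an explicit opt-in.
        return LEGACY_TOKEN_THRESHOLD;
    }
    if model.starts_with("deepseek-v4") {
        V4_TOKEN_THRESHOLD
    } else {
        LEGACY_TOKEN_THRESHOLD
    }
}

/// Tokens between the compaction trigger and the hard cycle.
/// Returns `None` if compaction would fire at or after a cycle that comes first.
pub fn gap_before_cycle(compaction: usize, cycle: usize) -> Option<usize> {
    cycle.checked_sub(compaction)
}
```

With the numbers in this issue, `gap_before_cycle(V4_TOKEN_THRESHOLD, V4_CYCLE_THRESHOLD)` gives the 268K window mentioned above. A config-load check that this value is `Some` would catch overrides that push compaction past the cycle threshold.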

Acceptance signals

  • V4 sessions routinely hit 200K+ tokens without plan_compaction firing.
  • working_set files stay resident across long turn loops.
  • V3 sessions show no behavior change.
  • A measurable cache hit rate improvement on long V4 sessions (cache stays warm because the prefix doesn't get rewritten).
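
The last signal needs a concrete metric. A rough sketch, assuming the provider reports cached and uncached prompt token counts per request: `PromptUsage`, its field names and `cache_hit_rate` are hypothetical and are not taken from this codebase.

```rust
/// Per-request prompt accounting. Field names are assumptions of this sketch.
#[derive(Debug, Clone, Copy, Default)]
pub struct PromptUsage {
    pub cache_hit_tokens: u64,
    pub cache_miss_tokens: u64,
}

/// Token-weighted cache hit rate over a session.
/// Returns `None` when no prompt tokens were recorded.
pub fn cache_hit_rate(turns: &[PromptUsage]) -> Option<f64> {
    let hit: u64 = turns.iter().map(|t| t.cache_hit_tokens).sum();
    let miss: u64 = turns.iter().map(|t| t.cache_miss_tokens).sum();
    let total = hit + miss;
    if total == 0 {
        None
    } else {
        Some(hit as f64 / total as f64)
    }
}
```

Comparing this value for long V4 sessions before and after the change would show whether leaving the prefix unrewritten keeps the cache warm.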

Metadata

Labels: enhancement (New feature or request), v0.9.0 (Targeting v0.9.0)

Status: In progress
