Cache-maximal context mode: re-read active files instead of summarizing #528

@Hmbown

Thesis

DeepSeek V4 makes cached input cheap enough that the active working set should be treated as resident source, not summarized memory. Instead of compacting away old file/tool context early, the engine can re-read and re-pass full contents for active files each turn, preserving exact source truth while relying on stable prompt prefixes and cache-hit telemetry to keep cost acceptable.

Current behavior

  • crates/tui/src/compaction.rs:30 enables auto-compaction by default at token_threshold: 50000 and message_threshold: 50.
  • crates/tui/src/compaction.rs:742 replaces unpinned history with an LLM-generated summary plus pinned messages.
  • crates/tui/src/working_set.rs:391 only renders a compact active-path summary in the system prompt, not file contents.
  • crates/tui/src/core/engine.rs:958 uses working-set pins to preserve selected messages during manual compaction.

Proposed change

Add an opt-in context.cache_maximal = true policy that, before each model request, materializes the top N working-set files into stable synthetic context blocks or messages. In this mode, defer ordinary summarization compaction until a hard request-budget threshold, and prefer re-reading current file contents over preserving stale summarized references. Keep file block order deterministic and size-bounded per file and per turn.
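A minimal sketch of the materialization step, assuming the working set can be handed over as a list of paths ranked by activity. `ResidentLimits`, `FileBlock`, `materialize`, `render`, and the `<file …>` wrapper format are all hypothetical names chosen for illustration; none of them come from the existing crates. The budget is spent in rank order so the most active files win, and the blocks are then sorted by path so the emitted order does not depend on how ranking fluctuates between turns.

```rust
/// Hypothetical limits for resident file blocks (illustrative names only).
pub struct ResidentLimits {
    pub max_files: usize,
    pub max_bytes_per_file: usize,
    pub max_bytes_per_turn: usize,
}

/// One synthetic context block holding the current contents of a file.
#[derive(Debug, PartialEq)]
pub struct FileBlock {
    pub path: String,
    pub body: String,
    pub truncated: bool,
}

/// Cut `s` to at most `max` bytes without splitting a UTF-8 character.
fn clip(s: &str, max: usize) -> (&str, bool) {
    if s.len() <= max {
        return (s, false);
    }
    let mut end = max;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    (&s[..end], true)
}

/// Re-read the top-ranked files and return size-bounded blocks in path order.
/// `ranked` is assumed to list unique paths, most active first.
/// `read` is injected so the sketch can be exercised without a filesystem.
pub fn materialize<F>(ranked: &[&str], limits: &ResidentLimits, read: F) -> Vec<FileBlock>
where
    F: Fn(&str) -> Option<String>,
{
    let mut remaining = limits.max_bytes_per_turn;
    let mut blocks = Vec::new();
    for &path in ranked.iter().take(limits.max_files) {
        if remaining == 0 {
            break;
        }
        // Files that can no longer be read are skipped rather than summarized.
        let Some(contents) = read(path) else { continue };
        let cap = limits.max_bytes_per_file.min(remaining);
        let (body, truncated) = clip(&contents, cap);
        remaining -= body.len();
        blocks.push(FileBlock {
            path: path.to_string(),
            body: body.to_string(),
            truncated,
        });
    }
    // Deterministic emission order: sort by path, not by activity rank.
    blocks.sort_by(|a, b| a.path.cmp(&b.path));
    blocks
}

/// Render blocks with no timestamps or counters, so unchanged files
/// produce byte-identical text from one turn to the next.
pub fn render(blocks: &[FileBlock]) -> String {
    let mut out = String::new();
    for b in blocks {
        out.push_str(&format!(
            "<file path=\"{}\" truncated=\"{}\">\n{}\n</file>\n",
            b.path, b.truncated, b.body
        ));
    }
    out
}
```

In real use the reader could be `|p| std::fs::read_to_string(p).ok()`. One trade-off of path ordering: a newly active file that sorts early shifts every block after it, so an append-only order might cache better for some providers; that choice would need measuring.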

Open questions / risks

  • DeepSeek's cache TTL may expire during long idle periods, so a resumed session pays the full cache-miss cost again.
  • Very large files still affect wall-clock latency even when cache hits are cheap.
  • Prompt-prefix stability is mandatory: unstable ordering or metadata would defeat the whole design.
  • Must not hide context-window hard walls; emergency compaction/trim still needs to work.
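The hard-wall concern above could be handled by a small gate that runs before each request. This is only a sketch under assumptions: `Budget`, `BudgetAction`, `decide`, and the percentage-based hard limit are hypothetical, and token counts are taken as already estimated by the caller. The idea is to send unchanged while under the hard limit, shed resident file blocks first when that alone would fit, and fall back to compaction only when history by itself is over budget.

```rust
/// What the engine should do before sending a request (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum BudgetAction {
    /// Request fits under the hard limit; send as-is.
    Send,
    /// History fits, resident file blocks do not; drop or shrink blocks.
    ShrinkResident,
    /// History alone exceeds the limit; summarization or trimming is required.
    Compact,
}

/// Hypothetical request budget, all values in tokens except the percentage.
pub struct Budget {
    pub context_window: usize,
    pub reserved_output: usize,
    pub hard_limit_percent: usize,
}

pub fn decide(history_tokens: usize, resident_tokens: usize, budget: &Budget) -> BudgetAction {
    // Tokens available for input once room for the reply is set aside.
    let usable = budget.context_window.saturating_sub(budget.reserved_output);
    let hard_limit = usable * budget.hard_limit_percent / 100;
    let total = history_tokens + resident_tokens;

    if total <= hard_limit {
        BudgetAction::Send
    } else if history_tokens <= hard_limit {
        BudgetAction::ShrinkResident
    } else {
        BudgetAction::Compact
    }
}
```

Shedding resident blocks before compacting keeps exact history for as long as possible, at the price of re-reading fewer files on that turn.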

Acceptance signals

  • A fixture-repo benchmark that repeats turns over the same active files shows a high prompt_cache_hit_tokens / input_tokens ratio on every turn after the first.
  • Editing a file causes only that file block to cache-miss while unchanged resident files remain cache-hit-heavy.
  • A regression test proves deterministic file block ordering for the same working set.
  • Manual and emergency compaction still recover from synthetic over-budget requests.
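The first signal could be checked with a helper along these lines. The field names follow the ratio named above; the `Usage` struct, the function names, and any threshold value are assumptions for illustration, not existing telemetry types.

```rust
/// Per-turn token usage as reported by the provider (hypothetical struct;
/// field names mirror the ratio used in the acceptance signal).
pub struct Usage {
    pub prompt_cache_hit_tokens: u64,
    pub input_tokens: u64,
}

/// Fraction of input tokens served from cache, or None for an empty request.
pub fn cache_hit_ratio(usage: &Usage) -> Option<f64> {
    if usage.input_tokens == 0 {
        None
    } else {
        Some(usage.prompt_cache_hit_tokens as f64 / usage.input_tokens as f64)
    }
}

/// True when every turn after the first (the cold turn) reaches `min_ratio`.
/// Requires at least one warm turn so an empty run cannot pass vacuously.
pub fn warm_turns_meet(turns: &[Usage], min_ratio: f64) -> bool {
    turns.len() > 1
        && turns
            .iter()
            .skip(1)
            .all(|u| cache_hit_ratio(u).map_or(false, |r| r >= min_ratio))
}
```

A benchmark could collect one `Usage` per turn and assert `warm_turns_meet(&turns, threshold)`, with the threshold chosen from observed baselines rather than fixed in advance.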

Metadata

Assignees

No one assigned

    Labels

    cache-maximalism (DeepSeek V4 cache-maximal context and agent architecture), enhancement (New feature or request), v0.9.0 (Targeting v0.9.0)

    Projects

    Status
    In progress
