Thesis
DeepSeek V4 makes cached input cheap enough that the active working set should be treated as resident source, not summarized memory. Instead of compacting away old file/tool context early, the engine can re-read and re-pass full contents for active files each turn, preserving exact source truth while relying on stable prompt prefixes and cache-hit telemetry to keep cost acceptable.
Current behavior
crates/tui/src/compaction.rs:30 enables auto-compaction by default at token_threshold: 50000 and message_threshold: 50.
crates/tui/src/compaction.rs:742 replaces unpinned history with an LLM-generated summary plus pinned messages.
crates/tui/src/working_set.rs:391 only renders a compact active-path summary in the system prompt, not file contents.
crates/tui/src/core/engine.rs:958 uses working-set pins to preserve selected messages during manual compaction.
Proposed change
Add an opt-in context.cache_maximal = true policy that, before each model request, materializes the top N working-set files into stable synthetic context blocks or messages. In this mode, defer ordinary summarization compaction until a hard request-budget threshold, and prefer re-reading current file contents over preserving stale summarized references. Keep file block order deterministic and size-bounded per file and per turn.
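A minimal sketch of the materialization step described above, assuming the working set is available as a ranked list of paths plus a map of current file contents. `ResidentLimits`, `FileBlock`, and `materialize` are hypothetical names for illustration; none of them are taken from the existing crates.

```rust
use std::collections::BTreeMap;

/// Hypothetical limits for the cache-maximal policy (names are illustrative).
#[derive(Debug, Clone, Copy)]
pub struct ResidentLimits {
    pub max_files: usize,
    pub max_bytes_per_file: usize,
    pub max_bytes_per_turn: usize,
}

/// One synthetic context block holding a file's current contents.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FileBlock {
    pub path: String,
    pub body: String,
    pub truncated: bool,
}

/// Cut `s` to at most `max` bytes without splitting a UTF-8 character.
fn truncate_at_char_boundary(s: &str, max: usize) -> &str {
    if s.len() <= max {
        return s;
    }
    let mut end = max;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

/// Select the top-N ranked paths, then emit blocks in path order so the
/// output does not depend on how the ranking shuffles between turns.
pub fn materialize(
    ranked_paths: &[&str],
    contents: &BTreeMap<String, String>,
    limits: ResidentLimits,
) -> Vec<FileBlock> {
    let mut selected: Vec<&str> = ranked_paths
        .iter()
        .copied()
        .take(limits.max_files)
        .collect();
    selected.sort_unstable();
    selected.dedup();

    let mut remaining = limits.max_bytes_per_turn;
    let mut blocks = Vec::new();
    for path in selected {
        // Paths that can no longer be read are skipped rather than summarized.
        let Some(text) = contents.get(path) else { continue };
        let cap = limits.max_bytes_per_file.min(remaining);
        if cap == 0 {
            break; // per-turn budget exhausted
        }
        let body = truncate_at_char_boundary(text, cap);
        remaining -= body.len();
        blocks.push(FileBlock {
            path: path.to_string(),
            body: body.to_string(),
            truncated: body.len() < text.len(),
        });
    }
    blocks
}
```

Sorting by path is only one way to get a deterministic order. If the provider's cache matches on prompt prefixes, an ordering that places rarely edited files first may keep more of the prefix warm after an edit; that trade-off is left open here.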
Open questions / risks
- DeepSeek cache TTL may make long idle sessions pay cache-miss cost again.
- Very large files still affect wall-clock latency even when cache hits are cheap.
- Prompt-prefix stability is mandatory: unstable ordering or metadata would defeat the whole design.
- Must not hide context-window hard walls; emergency compaction/trim still needs to work.
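One way to keep the hard wall visible is to make the compaction trigger an explicit decision with three outcomes. The sketch below assumes token counts are already estimated per request; `Budget`, `CompactionAction`, and `decide` are hypothetical names, and the hard budget and context window values are placeholders, not measured limits.

```rust
/// What the engine should do before sending the next request.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CompactionAction {
    /// Send the request as-is.
    Keep,
    /// Run ordinary summarization compaction.
    Summarize,
    /// The request cannot fit at all; trim or compact immediately.
    EmergencyTrim,
}

/// Hypothetical budget description (field names are illustrative).
#[derive(Debug, Clone, Copy)]
pub struct Budget {
    /// Ordinary auto-compaction trigger, e.g. the existing token_threshold.
    pub soft_token_threshold: usize,
    /// Deferred trigger used only when cache_maximal is enabled.
    pub hard_request_budget: usize,
    /// Model context window; never exceeded in either mode.
    pub context_window: usize,
}

pub fn decide(estimated_tokens: usize, cache_maximal: bool, b: &Budget) -> CompactionAction {
    // The context-window wall applies regardless of policy.
    if estimated_tokens >= b.context_window {
        return CompactionAction::EmergencyTrim;
    }
    // cache_maximal only moves the summarization trigger; it does not remove it.
    let trigger = if cache_maximal {
        b.hard_request_budget
    } else {
        b.soft_token_threshold
    };
    if estimated_tokens >= trigger {
        CompactionAction::Summarize
    } else {
        CompactionAction::Keep
    }
}
```

With the default 50000-token threshold and an assumed hard budget well above it, a 60000-token request would be summarized in the default mode but sent unchanged under cache_maximal, while a request at or above the context window is trimmed in both modes.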
Acceptance signals
- A fixture repo benchmark shows repeated turns over the same active files with a high prompt_cache_hit_tokens / input_tokens ratio after the first turn.
- Editing a file causes only that file block to cache-miss while unchanged resident files remain cache-hit-heavy.
- A regression test proves deterministic file block ordering for the same working set.
- Manual and emergency compaction still recover from synthetic over-budget requests.
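The first acceptance signal could be checked with a small helper like the one below. This is a sketch only: `TurnUsage` is a hypothetical struct whose field names follow the text above, and the pass threshold is an assumption the benchmark would need to choose.

```rust
/// Per-turn token usage as the benchmark would record it (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct TurnUsage {
    pub input_tokens: u64,
    pub prompt_cache_hit_tokens: u64,
}

/// Fraction of input tokens served from cache, or None for an empty request.
pub fn cache_hit_ratio(u: TurnUsage) -> Option<f64> {
    if u.input_tokens == 0 {
        return None;
    }
    Some(u.prompt_cache_hit_tokens as f64 / u.input_tokens as f64)
}

/// True when every turn after the first (cold) turn meets `min_ratio`.
/// Returns false if there are no warm turns to judge.
pub fn warm_turns_meet_threshold(turns: &[TurnUsage], min_ratio: f64) -> bool {
    let mut warm = turns.iter().skip(1).peekable();
    if warm.peek().is_none() {
        return false;
    }
    warm.all(|t| cache_hit_ratio(*t).map_or(false, |r| r >= min_ratio))
}
```

The first turn is skipped because it is expected to be a cache miss; only the warm turns say anything about whether the prefix stayed stable.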