Cache-maximal context mode: re-read active files instead of summarizing #528

## Thesis
DeepSeek V4 makes cached input cheap enough that the active working set should be treated as resident source, not summarized memory. Instead of compacting away old file/tool context early, the engine can re-read and re-pass full contents for active files each turn, preserving exact source truth while relying on stable prompt prefixes and cache-hit telemetry to keep cost acceptable.

## Current behavior
- `crates/tui/src/compaction.rs:30` enables auto-compaction by default at `token_threshold: 50000` and `message_threshold: 50`.
- `crates/tui/src/compaction.rs:742` replaces unpinned history with an LLM-generated summary plus pinned messages.
- `crates/tui/src/working_set.rs:391` only renders a compact active-path summary in the system prompt, not file contents.
- `crates/tui/src/core/engine.rs:958` uses working-set pins to preserve selected messages during manual compaction.

## Proposed change
Add an opt-in `context.cache_maximal = true` policy that, before each model request, materializes the top N working-set files into stable synthetic context blocks or messages. In this mode, defer ordinary summarization compaction until a hard request-budget threshold, and prefer re-reading current file contents over preserving stale summarized references. Keep file block order deterministic and size-bounded per file and per turn.
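The materialization step could be sketched as follows. This is a minimal illustration under stated assumptions, not the engine's actual code: `ActiveFile`, `ResidentLimits`, `materialize_resident_blocks`, the `score` ranking, and the `<file path="...">` block wrapper are all hypothetical names introduced here. The sketch selects the top N files by score, emits them in path order so the prompt prefix does not reorder when scores shift, and applies both a per-file and a per-turn byte cap.

```rust
/// Hypothetical view of one working-set entry; not the real type in `working_set.rs`.
#[derive(Debug, Clone)]
pub struct ActiveFile {
    pub path: String,
    /// Assumed ranking signal: higher means more recently or frequently touched.
    pub score: u32,
    pub contents: String,
}

/// Hypothetical size bounds for resident file blocks.
pub struct ResidentLimits {
    pub max_files: usize,
    pub max_bytes_per_file: usize,
    pub max_bytes_per_turn: usize,
}

/// Truncate to at most `max` bytes without splitting a UTF-8 character.
fn truncate_on_char_boundary(s: &str, max: usize) -> &str {
    if s.len() <= max {
        return s;
    }
    let mut end = max;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

/// Build deterministic, size-bounded context blocks for the top N active files.
pub fn materialize_resident_blocks(files: &[ActiveFile], limits: &ResidentLimits) -> Vec<String> {
    // Select the top N by score; the path tiebreak keeps selection deterministic.
    let mut selected: Vec<&ActiveFile> = files.iter().collect();
    selected.sort_by(|a, b| b.score.cmp(&a.score).then_with(|| a.path.cmp(&b.path)));
    selected.truncate(limits.max_files);

    // Emit in path order so score changes among selected files do not reorder the prefix.
    selected.sort_by(|a, b| a.path.cmp(&b.path));

    let mut remaining = limits.max_bytes_per_turn;
    let mut blocks = Vec::new();
    for file in selected {
        let cap = limits.max_bytes_per_file.min(remaining);
        if cap == 0 {
            break; // per-turn budget exhausted
        }
        let body = truncate_on_char_boundary(&file.contents, cap);
        remaining -= body.len();
        // No timestamps or counters in the wrapper: volatile metadata would change the prefix.
        blocks.push(format!("<file path=\"{}\">\n{}\n</file>", file.path, body));
    }
    blocks
}
```

Given the same working set in any input order, this returns identical blocks, which is the property the cache depends on. Sorting the output by path rather than by score is a design choice in this sketch: a score-ordered layout would shift every following block whenever the ranking changes.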

## Open questions / risks
- The DeepSeek cache TTL may expire during long idle sessions, so the first request after an idle period pays the full cache-miss cost again.
- Very large files still affect wall-clock latency even when cache hits are cheap.
- Prompt-prefix stability is mandatory: unstable ordering or metadata would defeat the whole design.
- Must not hide context-window hard walls; emergency compaction/trim still needs to work.
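One way to keep the hard wall visible while deferring ordinary summarization is a single decision function that checks the context window first. The sketch below is an assumption-laden illustration: `CompactionPolicy`, `CompactionAction`, `decide`, `hard_request_budget`, and `context_window` are hypothetical names, and combining the two existing thresholds with a logical OR is a guess about the current behavior rather than something confirmed from `compaction.rs`.

```rust
/// Hypothetical outcome of the pre-request compaction check.
#[derive(Debug, PartialEq, Eq)]
pub enum CompactionAction {
    /// Send the request as-is.
    Proceed,
    /// Run ordinary LLM summarization compaction.
    Summarize,
    /// The request would not fit at all; trim or compact immediately.
    EmergencyTrim,
}

/// Hypothetical policy knobs; only the two thresholds come from the current defaults.
pub struct CompactionPolicy {
    pub cache_maximal: bool,
    pub token_threshold: usize,     // current default: 50000
    pub message_threshold: usize,   // current default: 50
    pub hard_request_budget: usize, // assumed new setting for cache-maximal mode
    pub context_window: usize,      // assumed model limit, supplied by the caller
}

pub fn decide(
    policy: &CompactionPolicy,
    request_tokens: usize,
    message_count: usize,
) -> CompactionAction {
    // The hard wall is checked first in every mode, so cache-maximal cannot hide it.
    if request_tokens >= policy.context_window {
        return CompactionAction::EmergencyTrim;
    }
    if policy.cache_maximal {
        // Defer summarization until the hard request budget is reached.
        if request_tokens >= policy.hard_request_budget {
            CompactionAction::Summarize
        } else {
            CompactionAction::Proceed
        }
    } else if request_tokens >= policy.token_threshold
        || message_count >= policy.message_threshold
    {
        // Assumption: either threshold alone triggers auto-compaction.
        CompactionAction::Summarize
    } else {
        CompactionAction::Proceed
    }
}
```

With placeholder limits, a 60,000-token request would be summarized in the default mode but sent unchanged in cache-maximal mode, while a request at or above the context window triggers the emergency path in both.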

## Acceptance signals
- A fixture repo benchmark shows repeated turns over the same active files with high `prompt_cache_hit_tokens / input_tokens` after the first turn.
- Editing a file causes only that file block to cache-miss while unchanged resident files remain cache-hit-heavy.
- A regression test proves deterministic file block ordering for the same working set.
- Manual and emergency compaction still recover from synthetic over-budget requests.
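The first acceptance signal could be checked with a small helper like the one below. It is a sketch only: `TurnUsage`, `cache_hit_ratio`, and `warm_turns_meet` are hypothetical names, the two fields mirror the ratio named above rather than any confirmed telemetry struct, and the minimum ratio is left to the caller because the issue does not fix a number for "high".

```rust
/// Hypothetical per-turn usage record; field names follow the ratio in the acceptance signal.
#[derive(Debug, Clone, Copy)]
pub struct TurnUsage {
    pub prompt_cache_hit_tokens: u64,
    pub input_tokens: u64,
}

/// Fraction of input tokens served from cache, or `None` for an empty request.
pub fn cache_hit_ratio(usage: &TurnUsage) -> Option<f64> {
    if usage.input_tokens == 0 {
        None
    } else {
        Some(usage.prompt_cache_hit_tokens as f64 / usage.input_tokens as f64)
    }
}

/// True when every turn after the first (cold) turn reaches `min_ratio`.
/// Returns false when there is no warm turn to measure.
pub fn warm_turns_meet(turns: &[TurnUsage], min_ratio: f64) -> bool {
    turns.len() > 1
        && turns
            .iter()
            .skip(1) // the first turn is expected to miss the cache
            .all(|t| cache_hit_ratio(t).map_or(false, |r| r >= min_ratio))
}
```

A fixture benchmark could collect one `TurnUsage` per request and assert `warm_turns_meet(&turns, threshold)` with whatever threshold the project settles on.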
