V4: cache-maximal context as default, demote routine compaction

## Thesis
DeepSeek V4-Pro needs 27% of V3.2's per-token FLOPs and 10% of its KV cache *at 1M context*, thanks to Hybrid Attention (CSA + HCA). The architecture only pays off in the long-context regime, so aggressive routine compaction at 50K tokens defeats it. Make cache-maximal context the V4 default and treat compaction as an emergency mechanism, not routine hygiene.

Promotes #528 (currently drafted as opt-in) to default-on for V4 specifically. Legacy, V3, and custom-provider paths keep their existing thresholds, since raising compaction limits there could inflate users' bills.

## Current behavior
- `crates/tui/src/compaction.rs:38-39` — `enabled: true`, `token_threshold: 50000`, `message_threshold: 50` regardless of model.
- `crates/tui/src/cycle_manager.rs:65-68` — already V4-aware: hard cycle threshold bumped to 768K (~75% of 1M window). Compaction thresholds did NOT get the same treatment.
- The mismatch: `cycle_manager` rotates at 768K, but `compaction.rs` summarizes at 50K. We pay for V4's cheap long context and then immediately throw it away.

## Proposed change
- Make `CompactionConfig` model-aware. When the active model is V4 family:
  - Raise `token_threshold` to ~500K (leaving ~500K headroom for: conversation, system prompt + tools, thinking-mode reserve, tool result accumulation).
  - Raise `message_threshold` proportionally or remove it.
  - Treat compaction as a near-cycle-boundary emergency, not routine.
- Promote #528 ("re-read active files instead of summarizing") to default-on for V4. Working set stays resident; CSA/HCA handles the distant tail.
- V3 / legacy / custom-provider paths unchanged. Detect the V4 family by model name prefix.
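
A minimal sketch of what model-aware selection could look like. This is not the existing `CompactionConfig`: the field types, the `deepseek-v4` prefix, the constructor names, the `>=` comparison, and the choice to drop `message_threshold` for V4 (rather than scale it) are all assumptions made for illustration. Only the numbers 50000, 50, and ~500K come from this issue.

```rust
/// Hypothetical mirror of the compaction settings named in this issue.
/// The real struct in `crates/tui/src/compaction.rs` may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct CompactionConfig {
    pub enabled: bool,
    pub token_threshold: usize,
    /// `None` means "no message-count trigger" (one reading of "or remove it").
    pub message_threshold: Option<usize>,
}

/// Assumed prefix list; the actual V4 model identifiers need confirming.
const V4_PREFIXES: &[&str] = &["deepseek-v4"];

/// Case-insensitive prefix match on the model name.
pub fn is_v4_family(model: &str) -> bool {
    let name = model.trim().to_ascii_lowercase();
    V4_PREFIXES.iter().any(|p| name.starts_with(*p))
}

impl CompactionConfig {
    /// Today's defaults, kept for V3 / legacy / custom-provider paths.
    pub fn legacy() -> Self {
        Self { enabled: true, token_threshold: 50_000, message_threshold: Some(50) }
    }

    /// Proposed V4 defaults: compaction only as a late emergency.
    pub fn v4() -> Self {
        Self { enabled: true, token_threshold: 500_000, message_threshold: None }
    }

    /// Pick defaults from the active model name.
    pub fn for_model(model: &str) -> Self {
        if is_v4_family(model) { Self::v4() } else { Self::legacy() }
    }

    /// True when either configured trigger has been reached.
    pub fn should_compact(&self, tokens: usize, messages: usize) -> bool {
        let over_tokens = tokens >= self.token_threshold;
        let over_messages = self.message_threshold.map_or(false, |m| messages >= m);
        self.enabled && (over_tokens || over_messages)
    }
}
```

Under these assumptions, `CompactionConfig::for_model("deepseek-v4-pro").should_compact(200_000, 400)` is `false`, while the same call on a non-V4 name still triggers at the existing 50K / 50-message limits.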

## Open questions / risks
- Custom base URL serving a non-V4 model under a V4-prefix name → silently raised thresholds → blown bill. Need robust detection, opt-out config, or per-model override (`[cycle.per_model]` already exists in config.toml — extend the same pattern to compaction).
- 500K is a starting estimate; needs measurement against real long-running V4 sessions before being declared the default.
- Interaction with `cycle_manager`'s 768K hard cycle: compaction at 500K and cycle at 768K leaves 268K of untouched-but-still-V4-cheap context. Confirm that's intentional.
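
One conservative way to address the first and third risks, sketched under assumptions: an explicit per-model override wins, the raised V4 threshold applies only when the name looks like V4 *and* the base URL is not custom *and* the user has not opted out, and everything else falls back to the legacy value. `ResolveInput`, `CompactionOverride`, the flag names, and the `deepseek-v4` prefix are hypothetical. The 768K constant is written as 768,000 to match the 268K figure above; the real `cycle_manager` value may be expressed differently.

```rust
use std::collections::HashMap;

/// Hypothetical per-model override, loosely following the `[cycle.per_model]` pattern.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct CompactionOverride {
    pub token_threshold: usize,
}

/// Hypothetical inputs to threshold resolution.
pub struct ResolveInput<'a> {
    pub model: &'a str,
    /// True when the user points at a non-default base URL.
    pub custom_base_url: bool,
    /// True when the user opted out of the raised V4 defaults.
    pub v4_defaults_opt_out: bool,
    pub per_model: &'a HashMap<String, CompactionOverride>,
}

pub const LEGACY_TOKEN_THRESHOLD: usize = 50_000;
pub const V4_TOKEN_THRESHOLD: usize = 500_000; // starting estimate from this issue
pub const V4_HARD_CYCLE: usize = 768_000; // assumed decimal reading of "768K"

/// Resolution order: explicit override, then guarded V4 default, then legacy.
pub fn resolve_token_threshold(input: &ResolveInput<'_>) -> usize {
    if let Some(o) = input.per_model.get(input.model) {
        return o.token_threshold;
    }
    let looks_v4 = input.model.to_ascii_lowercase().starts_with("deepseek-v4");
    if looks_v4 && !input.custom_base_url && !input.v4_defaults_opt_out {
        V4_TOKEN_THRESHOLD
    } else {
        LEGACY_TOKEN_THRESHOLD
    }
}

/// Tokens between the compaction trigger and the hard cycle.
/// `None` means compaction would never fire before the cycle does.
pub fn gap_before_cycle(token_threshold: usize) -> Option<usize> {
    V4_HARD_CYCLE.checked_sub(token_threshold)
}
```

With these values `gap_before_cycle(500_000)` returns `Some(268_000)`, the gap called out above. Falling back to legacy thresholds on any custom base URL trades some V4 benefit for bill safety; users who really do proxy V4 could restore the raised limit through a per-model override.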

## Acceptance signals
- V4 sessions routinely hit 200K+ tokens without `plan_compaction` firing.
- `working_set` files stay resident across long turn loops.
- V3 sessions show no behavior change.
- A measurable cache hit rate improvement on long V4 sessions (cache stays warm because the prefix doesn't get rewritten).
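
To make the cache hit rate signal measurable, one option is a token-weighted ratio over a session. The sketch below assumes each turn exposes cached and uncached prompt token counts; `TurnUsage` and its field names are hypothetical and would need mapping to whatever the provider's usage payload actually reports.

```rust
/// Hypothetical per-turn prompt usage, split by cache outcome.
#[derive(Debug, Clone, Copy)]
pub struct TurnUsage {
    pub cache_hit_tokens: u64,
    pub cache_miss_tokens: u64,
}

/// Token-weighted cache hit rate across a session.
/// Returns `None` when no prompt tokens were recorded.
pub fn cache_hit_rate(turns: &[TurnUsage]) -> Option<f64> {
    let hit: u64 = turns.iter().map(|t| t.cache_hit_tokens).sum();
    let miss: u64 = turns.iter().map(|t| t.cache_miss_tokens).sum();
    let total = hit + miss;
    if total == 0 {
        None
    } else {
        Some(hit as f64 / total as f64)
    }
}
```

Comparing this number for long V4 sessions before and after the change would show whether leaving the prefix unrewritten keeps the cache warm. Weighting by tokens rather than by turn keeps short turns from dominating the average.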

V4: cache-maximal context as default, demote routine compaction #541
