P0: make long-running sessions survivable by default (Codex-style compaction + bounded transcript state) #402

## Emergency

Long-running DeepSeek TUI sessions still degrade and crash during realistic multi-hour agent work. The screenshot/repro case is a coordinator turn with several `agent_spawn` children, checklist updates, `agent_wait`, and large tool/agent result traffic. The visible symptom is exactly what `AGENTS.md` already warns about: the session keeps growing until the TUI becomes fragile and dies.

This is not just an operator habit problem. We currently tell the model to delegate/batch/compact, but the runtime still lets the parent session accumulate unbounded model-visible and UI-visible state.

## What we are doing wrong vs Codex

DeepSeek TUI current behavior/evidence:

- `AGENTS.md:51-63` says long sessions will degrade/crash because `api_messages` and `history` accumulate with no automatic pruning, and session saves serialize the bloated array.
- `docs/CONFIGURATION.md:130-135` and `docs/CONFIGURATION.md:187-191` make replacement compaction opt-in (`auto_compact = false`) and keep the capacity controller disabled unless configured.
- `crates/tui/src/tui/app.rs:430-447` keeps unbounded visible `history` and model `api_messages` vectors in the TUI app state.
- `crates/tui/src/session_manager.rs:91-107` defines saved sessions as full `messages: Vec<Message>`, and `crates/tui/src/session_manager.rs:148-156` pretty-serializes the whole session on save.
- `crates/tui/tests/integration_mock_llm.rs:529-575` has the important end-to-end tests for compaction/resume, sub-agent round trip, parallel tool execution, and capacity-controller-forced compaction, but they are all ignored because the engine still takes a concrete `DeepSeekClient` instead of `Arc<dyn LlmClient>`.
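The ignored tests depend on the engine using a trait object instead of the concrete client. Below is a minimal sketch of that seam. It assumes a simplified synchronous API. `LlmClient::complete`, `Engine`, `MockClient`, and this `Message` shape are hypothetical stand-ins. The real `DeepSeekClient` is async and streaming, so the production trait will look different.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical, simplified message; the real type also carries tool calls
/// and DeepSeek V4 `reasoning_content`.
#[derive(Clone, Debug, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Hypothetical object-safe client trait (a real one would be async/streaming).
pub trait LlmClient: Send + Sync {
    fn complete(&self, messages: &[Message]) -> Result<Message, String>;
}

/// The engine holds `Arc<dyn LlmClient>` rather than a concrete `DeepSeekClient`,
/// so tests can inject a scripted mock.
pub struct Engine {
    client: Arc<dyn LlmClient>,
    pub api_messages: Vec<Message>,
}

impl Engine {
    pub fn new(client: Arc<dyn LlmClient>) -> Self {
        Self { client, api_messages: Vec::new() }
    }

    /// Append a user message, call the model, and append its reply.
    pub fn send_user(&mut self, text: &str) -> Result<(), String> {
        self.api_messages.push(Message { role: "user".into(), content: text.into() });
        let reply = self.client.complete(&self.api_messages)?;
        self.api_messages.push(reply);
        Ok(())
    }
}

/// Scripted mock: replays canned replies in order and records how many
/// messages each request carried.
pub struct MockClient {
    replies: Mutex<Vec<String>>,
    pub seen_lengths: Mutex<Vec<usize>>,
}

impl MockClient {
    pub fn new(mut replies: Vec<String>) -> Self {
        replies.reverse(); // pop() from the end now yields replies in original order
        Self { replies: Mutex::new(replies), seen_lengths: Mutex::new(Vec::new()) }
    }
}

impl LlmClient for MockClient {
    fn complete(&self, messages: &[Message]) -> Result<Message, String> {
        self.seen_lengths.lock().unwrap().push(messages.len());
        let content = self.replies.lock().unwrap().pop().ok_or("mock exhausted")?;
        Ok(Message { role: "assistant".into(), content })
    }
}
```

The mock records the request size for each call. A test can therefore assert that compaction shrank what was actually sent to the model, not only what the UI shows.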

Codex reference behavior in `/Volumes/VIXinSSD/codex-main`:

- `codex-rs/core/src/session/turn.rs:148-155` computes an auto-compaction limit before sampling, and `turn.rs:710-738` runs pre-sampling compaction when token usage crosses that limit.
- `turn.rs:467-492` can run mid-turn compaction when follow-up/tool continuation would continue past the limit.
- `turn.rs:788-807` routes auto-compaction through local or remote compaction depending on provider support.
- `codex-rs/core/src/compact.rs:246-265` builds replacement compacted history, records `replacement_history`, and installs it via `replace_compacted_history`.
- `codex-rs/core/src/session/mod.rs:2477-2493` replaces live history and persists a compacted rollout item, instead of merely appending more transcript.
- `codex-rs/core/src/agent/control.rs:383-387` supports forking only the last N turns, and `codex-rs/core/src/agent/control_tests.rs:706-724` / `873-925` verify sanitized/last-N child history.
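The Codex flow above comes down to two pieces. The first is a limit checked before each sampling request. The second is a compaction step that replaces live history instead of appending to it. The sketch below rests on three assumptions: turns start at user messages, tokens are estimated at roughly four bytes each, and the summarizer is a closure standing in for a model call. `Role`, `Msg`, `estimate_tokens`, `auto_compact_limit`, `last_n_turns`, and `compact_in_place` are hypothetical names, not the Codex or DeepSeek TUI API.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Role { System, User, Assistant, Tool }

/// Simplified message. A real implementation must also keep DeepSeek V4
/// `reasoning_content` on retained assistant turns; this sketch omits it.
#[derive(Clone, Debug, PartialEq)]
pub struct Msg { pub role: Role, pub content: String }

/// Crude estimate (~4 bytes/token); real code should prefer provider-reported usage.
pub fn estimate_tokens(history: &[Msg]) -> usize {
    history.iter().map(|m| m.content.len() / 4 + 1).sum()
}

/// Auto-compaction limit as a percentage of the context window (percentage is an assumption).
pub fn auto_compact_limit(context_window: usize, percent: usize) -> usize {
    context_window * percent / 100
}

/// Suffix of `history` starting at the n-th most recent user message, so retained
/// turns stay whole and tool calls remain paired with their results.
pub fn last_n_turns(history: &[Msg], n: usize) -> &[Msg] {
    if n == 0 {
        return &history[history.len()..];
    }
    let mut seen = 0;
    for (i, m) in history.iter().enumerate().rev() {
        if m.role == Role::User {
            seen += 1;
            if seen == n {
                return &history[i..];
            }
        }
    }
    history // fewer than n turns: everything counts as recent
}

/// Pre-sampling check: once the estimate reaches `limit`, replace (not append to)
/// the live history with system messages + one summary + the last `keep_turns`
/// turns verbatim. Returns true if compaction ran.
pub fn compact_in_place(
    history: &mut Vec<Msg>,
    limit: usize,
    keep_turns: usize,
    summarize: impl Fn(&[Msg]) -> String,
) -> bool {
    if estimate_tokens(history.as_slice()) < limit {
        return false;
    }
    let tail_start = history.len() - last_n_turns(history.as_slice(), keep_turns).len();
    let (head, tail) = history.split_at(tail_start);
    let dropped: Vec<Msg> = head.iter().filter(|m| m.role != Role::System).cloned().collect();
    if dropped.is_empty() {
        return false; // nothing older than the retained turns to summarize
    }
    let mut replacement: Vec<Msg> =
        head.iter().filter(|m| m.role == Role::System).cloned().collect();
    replacement.push(Msg {
        role: Role::User, // summary role is an assumption
        content: format!("[compacted summary]\n{}", summarize(&dropped)),
    });
    replacement.extend_from_slice(tail);
    *history = replacement;
    true
}
```

The key design choice is cutting at turn boundaries. This keeps each retained assistant tool call next to its tool result. The same `last_n_turns` helper could also seed a sanitized last-N fork for a child agent, so the fork need not copy the whole parent history.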

## Repro shape

1. Launch `deepseek` in this repo.
2. Start a broad issue/work sprint with 4-6 sub-agents.
3. Let the parent coordinate with checklist updates, repeated `agent_wait`, and sub-agent result ingestion.
4. Observe context and transcript growth; the parent gets slower and eventually dies or becomes unusable unless the user manually compacts/restarts.

This should be reproducible even without provider flakiness by using a mock/integration harness that appends large tool outputs and sub-agent completions over many turns.
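A harness for this repro does not need a provider at all. The sketch below uses a hypothetical `BoundedTranscript`. It is a `VecDeque` ring buffer that keeps the newest `cap` cells and counts evictions; a real app would page evicted cells back in from the session log. The harness drives it with 50 simulated turns, each adding one large tool output and one sub-agent completion. The payload size and cap are arbitrary.

```rust
use std::collections::VecDeque;

/// Hypothetical bounded UI transcript: only the newest `cap` cells stay resident.
pub struct BoundedTranscript {
    cells: VecDeque<String>,
    cap: usize,
    pub evicted: usize,
}

impl BoundedTranscript {
    pub fn new(cap: usize) -> Self {
        assert!(cap > 0, "cap must be non-zero");
        Self { cells: VecDeque::with_capacity(cap), cap, evicted: 0 }
    }

    /// Append a cell, evicting the oldest one once the cap is reached.
    pub fn push(&mut self, cell: String) {
        if self.cells.len() == self.cap {
            self.cells.pop_front();
            self.evicted += 1;
        }
        self.cells.push_back(cell);
    }

    pub fn len(&self) -> usize {
        self.cells.len()
    }

    /// Bytes currently held in memory, i.e. what a render or save pass would touch.
    pub fn resident_bytes(&self) -> usize {
        self.cells.iter().map(String::len).sum()
    }
}

/// Simulate `turns` parent turns, each ingesting one large tool output and one
/// sub-agent completion of `payload_bytes` each.
pub fn simulate(turns: usize, payload_bytes: usize, cap: usize) -> BoundedTranscript {
    let mut t = BoundedTranscript::new(cap);
    for turn in 0..turns {
        t.push(format!("tool#{turn}: {}", "x".repeat(payload_bytes)));
        t.push(format!("agent#{turn} done: {}", "y".repeat(payload_bytes)));
    }
    t
}
```

`simulate(50, 64 * 1024, 20)` leaves 20 resident cells and 80 evictions. Resident size depends only on the cap, not on the number of turns. The same kind of assertion can be applied to `api_messages` length and saved-session size once the real engine accepts a mock client.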

## Required fix

Make long-running sessions survivable by default, not only by prompt discipline.

Acceptance criteria:

- Auto-compaction/cycling is enabled for the default V4 long-running path before the parent session reaches the danger zone. It must run as a runtime guardrail, not only as a model suggestion.
- Compaction replaces live model history with a bounded compacted transcript and preserves enough exact state to continue tool calls safely, including DeepSeek V4 `reasoning_content` replay requirements.
- TUI visible transcript state is bounded or virtualized so rendering/saves do not scale linearly forever with every tool card and sub-agent result.
- Session persistence no longer pretty-serializes arbitrarily huge full `messages` arrays on every save/checkpoint. Store an event log plus compacted/current snapshot, or otherwise cap/write incrementally.
- Sub-agent result ingestion into the parent is summarized/bounded by default; full child transcript/details stay in the child artifact/session and are fetched on demand.
- Parent-to-child fork defaults should avoid copying the entire parent history. Support and test `last N turns`/sanitized fork semantics like Codex.
- Unignore or replace the ignored integration tests in `crates/tui/tests/integration_mock_llm.rs:529-575` so CI covers compaction/resume, sub-agent round trip, parallel tool execution, and forced compaction before send.
- Add a stress test that simulates at least 50 parent turns with repeated large tool/sub-agent outputs and asserts bounded `api_messages`, bounded save size, and acceptable transcript render/update time.
- Add a user-visible emergency state before overflow: when context/session size crosses a hard threshold, the app must compact/cycle or block the next model call with a recoverable prompt, not continue until crash.
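The last criterion amounts to a gate evaluated before every model call. This minimal sketch assumes hypothetical soft and hard thresholds, each a percentage of the context window. The names `Guard`, `pre_send`, and `PreSendAction` and the 80/95 values are illustrative, not existing config keys. Below the soft limit, the call proceeds. Above it, the runtime compacts first. If usage is still at or over the hard limit after compaction, the call is blocked with a recoverable prompt instead of being sent.

```rust
/// Outcome of the pre-send gate (names are hypothetical).
#[derive(Debug, PartialEq)]
pub enum PreSendAction {
    /// Under the soft limit: send as-is.
    Proceed,
    /// Soft limit reached; compaction brought usage back under the hard limit.
    Compacted { before: usize, after: usize },
    /// Still at or over the hard limit after compaction: do not send, surface a
    /// recoverable prompt to the user instead.
    Block { used: usize, hard_limit: usize },
}

/// Thresholds as percentages of the context window (values are illustrative).
pub struct Guard {
    pub context_window: usize,
    pub soft_pct: usize,
    pub hard_pct: usize,
}

impl Guard {
    fn limit(&self, pct: usize) -> usize {
        self.context_window * pct / 100
    }

    /// Run before every model call. `compact` performs compaction and returns the
    /// new token estimate; it is only invoked once the soft limit is reached.
    pub fn pre_send(&self, used: usize, compact: impl FnOnce() -> usize) -> PreSendAction {
        let soft = self.limit(self.soft_pct);
        let hard = self.limit(self.hard_pct);
        if used < soft {
            return PreSendAction::Proceed;
        }
        let after = compact();
        if after < hard {
            PreSendAction::Compacted { before: used, after }
        } else {
            PreSendAction::Block { used: after, hard_limit: hard }
        }
    }
}
```

Because the gate sits in the runtime and not in the prompt, an overflow cannot slip through just because the model ignored an instruction to compact.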

## Non-goals

- Do not solve this only by changing `AGENTS.md` or the system prompt.
- Do not make users manually remember `/compact` every few turns.
- Do not rely on "spawn more sub-agents" while the parent still accumulates unbounded child notifications/results.
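One way to stop child notifications from piling up in the parent is to ingest only a capped preview plus a pointer to the child's own artifact. This sketch uses plain truncation as a simpler stand-in for model-generated summaries. `ChildResult`, `artifact_path`, and the byte budget are hypothetical. The cut point is moved back to a UTF-8 character boundary, so slicing multi-byte output cannot panic.

```rust
/// Hypothetical sub-agent completion as delivered to the parent.
pub struct ChildResult {
    pub agent_id: String,
    pub artifact_path: String,
    pub full_output: String,
}

/// Largest index <= `max` that lies on a UTF-8 char boundary of `s`.
fn char_floor(s: &str, max: usize) -> usize {
    if max >= s.len() {
        return s.len();
    }
    let mut i = max;
    while !s.is_char_boundary(i) {
        i -= 1; // index 0 is always a boundary, so this terminates
    }
    i
}

/// Build the bounded text the parent keeps in its model history; the full
/// output stays in the child's artifact and can be fetched on demand.
pub fn ingest_for_parent(r: &ChildResult, budget: usize) -> String {
    let cut = char_floor(&r.full_output, budget);
    let omitted = r.full_output.len() - cut;
    if omitted == 0 {
        format!("[agent {}] {}", r.agent_id, r.full_output)
    } else {
        format!(
            "[agent {}] {}\n[... {} bytes omitted; full transcript at {}]",
            r.agent_id,
            &r.full_output[..cut],
            omitted,
            r.artifact_path
        )
    }
}
```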

## Priority

P0 for v0.8.6 stabilization. This is blocking long-running agent work, which is the core use case for the current branch.

