Emergency
Long-running DeepSeek TUI sessions still degrade and crash during realistic multi-hour agent work. The screenshot/repro case is a coordinator turn with several agent_spawn children, checklist updates, agent_wait, and large tool/agent result traffic. The visible symptom is exactly what AGENTS.md already warns about: the session keeps growing until the TUI becomes fragile and dies.
This is not just an operator habit problem. We currently tell the model to delegate/batch/compact, but the runtime still lets the parent session accumulate unbounded model-visible and UI-visible state.
What we are doing wrong vs Codex
DeepSeek TUI current behavior/evidence:
- AGENTS.md:51-63 says long sessions will degrade/crash because api_messages and history accumulate with no automatic pruning, and session saves serialize the bloated array.
- docs/CONFIGURATION.md:130-135 and docs/CONFIGURATION.md:187-191 make replacement compaction opt-in (auto_compact = false) and keep the capacity controller disabled unless configured.
- crates/tui/src/tui/app.rs:430-447 keeps unbounded visible history and model api_messages vectors in the TUI app state.
- crates/tui/src/session_manager.rs:91-107 defines saved sessions as full messages: Vec<Message>, and crates/tui/src/session_manager.rs:148-156 pretty-serializes the whole session on save.
- crates/tui/tests/integration_mock_llm.rs:529-575 has the important end-to-end tests for compaction/resume, sub-agent round trip, parallel tool execution, and capacity-controller-forced compaction, but they are all ignored because the engine still takes a concrete DeepSeekClient instead of Arc<dyn LlmClient>.
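To make the last point concrete, here is a minimal sketch of why a trait object unblocks those tests. It is not the repo's actual trait: the complete method, MockLlmClient, and the Engine shape are hypothetical, and the real LlmClient is presumably async and streaming. The only claim is that once the engine holds Arc<dyn LlmClient>, a scripted client can stand in for the network.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical, simplified stand-in for the LlmClient trait named above.
/// Synchronous for brevity; the real trait is presumably async.
pub trait LlmClient: Send + Sync {
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Scripted client for tests: returns canned replies in order and
/// records every prompt it was asked to complete.
pub struct MockLlmClient {
    replies: Mutex<Vec<String>>,
    seen: Mutex<Vec<String>>,
}

impl MockLlmClient {
    pub fn new(replies: &[&str]) -> Self {
        // Stored reversed so that pop() yields replies in the original order.
        let replies = replies.iter().rev().map(|s| s.to_string()).collect();
        Self {
            replies: Mutex::new(replies),
            seen: Mutex::new(Vec::new()),
        }
    }

    pub fn prompts_seen(&self) -> Vec<String> {
        self.seen.lock().unwrap().clone()
    }
}

impl LlmClient for MockLlmClient {
    fn complete(&self, prompt: &str) -> Result<String, String> {
        self.seen.lock().unwrap().push(prompt.to_string());
        self.replies
            .lock()
            .unwrap()
            .pop()
            .ok_or_else(|| "mock script exhausted".to_string())
    }
}

/// Hypothetical engine that depends on the trait object, not a concrete client.
pub struct Engine {
    client: Arc<dyn LlmClient>,
}

impl Engine {
    pub fn new(client: Arc<dyn LlmClient>) -> Self {
        Self { client }
    }

    pub fn run_turn(&self, user_input: &str) -> Result<String, String> {
        self.client.complete(user_input)
    }
}
```

A test would build Arc::new(MockLlmClient::new(&["first", "second"])), pass a clone into Engine::new, drive a few turns, and then inspect prompts_seen() to check what was actually sent.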
Codex reference behavior in /Volumes/VIXinSSD/codex-main:
- codex-rs/core/src/session/turn.rs:148-155 computes an auto-compaction limit before sampling, and turn.rs:710-738 runs pre-sampling compaction when token usage crosses that limit.
- turn.rs:467-492 can run mid-turn compaction when follow-up/tool continuation would continue past the limit.
- turn.rs:788-807 routes auto-compaction through local or remote compaction depending on provider support.
- codex-rs/core/src/compact.rs:246-265 builds replacement compacted history, records replacement_history, and installs it via replace_compacted_history.
- codex-rs/core/src/session/mod.rs:2477-2493 replaces live history and persists a compacted rollout item, instead of merely appending more transcript.
- codex-rs/core/src/agent/control.rs:383-387 supports forking only the last N turns, and codex-rs/core/src/agent/control_tests.rs:706-724 / 873-925 verify sanitized/last-N child history.
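The two behaviors we most need to borrow, a pre-sampling limit check and a last-N-turns fork, can be sketched roughly as below. This is not Codex's code. The function names, the percentage-based limit, and the rule that a turn starts at a user message are all assumptions made for illustration, and the sketch does not attempt the sanitization step.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Role {
    User,
    Assistant,
    Tool,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: Role,
    pub text: String,
}

/// Assumed policy: compact once usage reaches `percent` of the context window.
pub fn auto_compact_limit(context_window: usize, percent: usize) -> usize {
    context_window.saturating_mul(percent) / 100
}

/// Checked before every sampling request, so the limit is enforced by the
/// runtime rather than suggested to the model.
pub fn should_compact_before_sampling(tokens_used: usize, limit: usize) -> bool {
    tokens_used >= limit
}

/// Copy only the last `n` turns into a child, where a turn is assumed to
/// begin at a user message. Anything before the first kept turn is dropped.
pub fn fork_last_n_turns(history: &[Message], n: usize) -> Vec<Message> {
    if n == 0 {
        return Vec::new();
    }
    // Indices at which each turn begins.
    let starts: Vec<usize> = history
        .iter()
        .enumerate()
        .filter(|(_, m)| m.role == Role::User)
        .map(|(i, _)| i)
        .collect();
    if starts.is_empty() {
        return Vec::new();
    }
    // If there are fewer than n turns, keep everything from the first turn.
    let first = starts[starts.len().saturating_sub(n)];
    history[first..].to_vec()
}
```

With a history of three turns, fork_last_n_turns(&history, 2) returns the messages starting at the second user message, so the child never sees the first turn.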
Repro shape
- Launch deepseek in this repo.
- Start a broad issue/work sprint with 4-6 sub-agents.
- Let the parent coordinate with checklist updates, repeated agent_wait, and sub-agent result ingestion.
- Observe context and transcript growth; the parent gets slower and eventually dies or becomes unusable unless the user manually compacts/restarts.
This should be reproducible even without provider flakiness by using a mock/integration harness that appends large tool outputs and sub-agent completions over many turns.
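A rough sketch of such a harness, with no provider involved, is below. ParentSession, run_turn, and simulate are hypothetical names, and the message strings are placeholders rather than the real tool payloads. It only models the append-only behavior described above so the growth can be measured.

```rust
/// Hypothetical model of the parent session: an append-only message list,
/// mirroring how api_messages grows today.
#[derive(Default)]
pub struct ParentSession {
    pub api_messages: Vec<String>,
}

impl ParentSession {
    pub fn total_bytes(&self) -> usize {
        self.api_messages.iter().map(|m| m.len()).sum()
    }

    /// One coordinator turn: a checklist update, then for each child an
    /// agent_wait call followed by a large result appended verbatim.
    pub fn run_turn(&mut self, children: usize, output_bytes: usize) {
        self.api_messages.push("checklist update".to_string());
        for child in 0..children {
            self.api_messages
                .push(format!("agent_wait child={}", child));
            self.api_messages.push("x".repeat(output_bytes));
        }
    }
}

/// Run `turns` coordinator turns and return the session size after each one.
pub fn simulate(turns: usize, children: usize, output_bytes: usize) -> Vec<usize> {
    let mut session = ParentSession::default();
    let mut sizes = Vec::with_capacity(turns);
    for _ in 0..turns {
        session.run_turn(children, output_bytes);
        sizes.push(session.total_bytes());
    }
    sizes
}
```

Calling simulate(50, 4, 10_000) shows the size rising by the same amount every turn with no plateau, which is the shape the real session exhibits. The same loop can later be pointed at the fixed runtime to assert that the curve flattens.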
Required fix
Make long-running sessions survivable by default, not only by prompt discipline.
Acceptance criteria:
- Auto-compaction/cycling is enabled for the default V4 long-running path before the parent session reaches the danger zone. It must run as a runtime guardrail, not only as a model suggestion.
- Compaction replaces live model history with a bounded compacted transcript and preserves enough exact state to continue tool calls safely, including DeepSeek V4 reasoning_content replay requirements.
- TUI visible transcript state is bounded or virtualized so rendering/saves do not scale linearly forever with every tool card and sub-agent result.
- Session persistence no longer pretty-serializes arbitrarily huge full messages arrays on every save/checkpoint. Store an event log plus compacted/current snapshot, or otherwise cap/write incrementally.
- Sub-agent result ingestion into the parent is summarized/bounded by default; full child transcript/details stay in the child artifact/session and are fetched on demand.
- Parent-to-child fork defaults should avoid copying the entire parent history. Support and test last N turns/sanitized fork semantics like Codex.
- Unignore or replace the ignored integration tests in crates/tui/tests/integration_mock_llm.rs:529-575 so CI covers compaction/resume, sub-agent round trip, parallel tool execution, and forced compaction before send.
- Add a stress test that simulates at least 50 parent turns with repeated large tool/sub-agent outputs and asserts bounded api_messages, bounded save size, and acceptable transcript render/update time.
- Add a user-visible emergency state before overflow: when context/session size crosses a hard threshold, the app must compact/cycle or block the next model call with a recoverable prompt, not continue until crash.
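As a starting point for discussion, the guardrail and emergency-state criteria could look roughly like this. Everything here is an assumption: the byte-based thresholds, the Limits and Guard types, and the placeholder summary string are illustrative only. A real implementation would count tokens, generate an actual summary, and keep tool-call pairs and reasoning_content intact, none of which this sketch handles.

```rust
/// Outcome of the check that runs before every model call.
#[derive(Debug, PartialEq)]
pub enum Guard {
    /// Under the soft threshold; nothing was changed.
    Proceed,
    /// Compaction ran and the call may proceed.
    Compacted,
    /// Still over the hard threshold; surface a recoverable prompt instead of sending.
    Blocked,
}

/// Hypothetical thresholds, measured in bytes for simplicity.
pub struct Limits {
    pub soft_bytes: usize,
    pub hard_bytes: usize,
    pub keep_recent: usize,
}

pub struct Session {
    pub api_messages: Vec<String>,
}

impl Session {
    pub fn total_bytes(&self) -> usize {
        self.api_messages.iter().map(|m| m.len()).sum()
    }

    /// Replace everything except the most recent `keep_recent` messages
    /// with a single placeholder standing in for a real summary.
    pub fn compact(&mut self, limits: &Limits) {
        let len = self.api_messages.len();
        if len <= limits.keep_recent {
            return; // Nothing old enough to drop.
        }
        let cut = len - limits.keep_recent;
        let dropped: usize = self.api_messages[..cut].iter().map(|m| m.len()).sum();
        let recent = self.api_messages.split_off(cut);
        self.api_messages = vec![format!(
            "[summary of {} earlier messages, {} bytes]",
            cut, dropped
        )];
        self.api_messages.extend(recent);
    }

    /// Runtime guardrail: compacts at the soft threshold and refuses to
    /// send when the session is still at or over the hard threshold.
    pub fn before_model_call(&mut self, limits: &Limits) -> Guard {
        if self.total_bytes() < limits.soft_bytes {
            return Guard::Proceed;
        }
        self.compact(limits);
        if self.total_bytes() >= limits.hard_bytes {
            Guard::Blocked
        } else {
            Guard::Compacted
        }
    }
}
```

The stress test from the criteria then reduces to a loop: push a large message, call before_model_call, and assert on every one of 50 turns that the result is not Blocked and that total_bytes() stays under the hard threshold. The Blocked case is what the TUI would turn into the user-visible emergency prompt.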
Non-goals
- Do not solve this only by changing AGENTS.md or the system prompt.
- Do not make users manually remember /compact every few turns.
- Do not rely on "spawn more sub-agents" while the parent still accumulates unbounded child notifications/results.
Priority
P0 for v0.8.6 stabilization. This is blocking long-running agent work, which is the core use case for the current branch.