P0: make long-running sessions survivable by default (Codex-style compaction + bounded transcript state) #402

@Hmbown

Emergency

Long-running DeepSeek TUI sessions still degrade and crash during realistic multi-hour agent work. The screenshot/repro case is a coordinator turn with several agent_spawn children, checklist updates, agent_wait, and large tool/agent result traffic. The visible symptom is exactly what AGENTS.md already warns about: the session keeps growing until the TUI becomes fragile and dies.

This is not just an operator habit problem. We currently tell the model to delegate/batch/compact, but the runtime still lets the parent session accumulate unbounded model-visible and UI-visible state.

What we are doing wrong vs Codex

DeepSeek TUI current behavior/evidence:

  • AGENTS.md:51-63 says long sessions will degrade/crash because api_messages and history accumulate with no automatic pruning, and session saves serialize the bloated array.
  • docs/CONFIGURATION.md:130-135 and docs/CONFIGURATION.md:187-191 make replacement compaction opt-in (auto_compact = false) and keep the capacity controller disabled unless configured.
  • crates/tui/src/tui/app.rs:430-447 keeps unbounded visible history and model api_messages vectors in the TUI app state.
  • crates/tui/src/session_manager.rs:91-107 defines saved sessions as full messages: Vec<Message>, and crates/tui/src/session_manager.rs:148-156 pretty-serializes the whole session on save.
  • crates/tui/tests/integration_mock_llm.rs:529-575 has the important end-to-end tests for compaction/resume, sub-agent round trip, parallel tool execution, and capacity-controller-forced compaction, but they are all ignored because the engine still takes a concrete DeepSeekClient instead of Arc<dyn LlmClient>.
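The last point is the one that keeps the end-to-end tests ignored, so here is a minimal sketch of what depending on a trait object could look like. Everything in it is an assumption for illustration: the real `LlmClient` trait is probably async with a richer signature, and `MockClient`, `Engine`, `complete`, and `run_turn` are hypothetical names, not the current API.

```rust
use std::sync::Arc;

/// Hypothetical, simplified client trait (assumption: the real trait is
/// likely async and streams responses).
pub trait LlmClient: Send + Sync {
    fn complete(&self, prompt: &str) -> Result<String, String>;
}

/// Test double that returns a canned reply without any network access.
pub struct MockClient {
    pub reply: String,
}

impl LlmClient for MockClient {
    fn complete(&self, _prompt: &str) -> Result<String, String> {
        Ok(self.reply.clone())
    }
}

/// Hypothetical engine that holds a trait object instead of a concrete
/// client, so tests can inject `MockClient`.
pub struct Engine {
    client: Arc<dyn LlmClient>,
}

impl Engine {
    pub fn new(client: Arc<dyn LlmClient>) -> Self {
        Self { client }
    }

    /// Forward one user turn to whichever client was injected.
    pub fn run_turn(&self, user_input: &str) -> Result<String, String> {
        self.client.complete(user_input)
    }
}
```

With this shape, a test can build `Engine::new(Arc::new(MockClient { reply: "ok".to_string() }))` and drive compaction/resume scenarios deterministically.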

Codex reference behavior in /Volumes/VIXinSSD/codex-main:

  • codex-rs/core/src/session/turn.rs:148-155 computes an auto-compaction limit before sampling, and turn.rs:710-738 runs pre-sampling compaction when token usage crosses that limit.
  • turn.rs:467-492 can run mid-turn compaction when follow-up/tool continuation would continue past the limit.
  • turn.rs:788-807 routes auto-compaction through local or remote compaction depending on provider support.
  • codex-rs/core/src/compact.rs:246-265 builds replacement compacted history, records replacement_history, and installs it via replace_compacted_history.
  • codex-rs/core/src/session/mod.rs:2477-2493 replaces live history and persists a compacted rollout item, instead of merely appending more transcript.
  • codex-rs/core/src/agent/control.rs:383-387 supports forking only the last N turns, and codex-rs/core/src/agent/control_tests.rs:706-724 / 873-925 verify sanitized/last-N child history.
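To make the Codex behavior above concrete, here is a rough sketch of the two ideas: replacing live history when a pre-sampling limit is crossed, and forking only the last N turns into a child. It is not Codex's code. `Message`, `Role`, `estimate_tokens`, `compact`, `maybe_compact_before_sampling`, and `fork_last_n_turns` are hypothetical names, the 4-bytes-per-token estimate is an assumption, and the placeholder summary stands in for a model-generated one. A real implementation would also have to keep tool-call/result pairs together when choosing the tail.

```rust
#[derive(Clone, Debug, PartialEq)]
pub enum Role {
    User,
    Assistant,
    Tool,
    Summary,
}

#[derive(Clone, Debug, PartialEq)]
pub struct Message {
    pub role: Role,
    pub text: String,
}

/// Crude token estimate (assumption: roughly 4 bytes per token).
pub fn estimate_tokens(history: &[Message]) -> usize {
    history.iter().map(|m| m.text.len() / 4 + 1).sum()
}

/// Replace everything except the last `keep_tail` messages with a single
/// summary message, so live history shrinks instead of only growing.
pub fn compact(history: &mut Vec<Message>, keep_tail: usize) {
    if history.len() <= keep_tail {
        return;
    }
    let split = history.len() - keep_tail;
    let tail = history.split_off(split);
    let dropped = history.len();
    history.clear();
    history.push(Message {
        role: Role::Summary,
        text: format!("[compacted {} earlier messages]", dropped),
    });
    history.extend(tail);
}

/// Pre-sampling guard: compact when the estimate reaches the limit.
/// Returns true if compaction ran.
pub fn maybe_compact_before_sampling(
    history: &mut Vec<Message>,
    limit_tokens: usize,
    keep_tail: usize,
) -> bool {
    if estimate_tokens(history) < limit_tokens {
        return false;
    }
    compact(history, keep_tail);
    true
}

/// Copy only the last `n` turns for a child agent. A turn is assumed to
/// start at each `User` message.
pub fn fork_last_n_turns(history: &[Message], n: usize) -> Vec<Message> {
    if n == 0 {
        return Vec::new();
    }
    let starts: Vec<usize> = history
        .iter()
        .enumerate()
        .filter(|(_, m)| m.role == Role::User)
        .map(|(i, _)| i)
        .collect();
    if starts.len() <= n {
        return history.to_vec();
    }
    history[starts[starts.len() - n]..].to_vec()
}
```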

Repro shape

  1. Launch deepseek in this repo.
  2. Start a broad issue/work sprint with 4-6 sub-agents.
  3. Let the parent coordinate with checklist updates, repeated agent_wait, and sub-agent result ingestion.
  4. Observe context and transcript growth; the parent gets slower and eventually dies or becomes unusable unless the user manually compacts/restarts.

This should be reproducible even without provider flakiness by using a mock/integration harness that appends large tool outputs and sub-agent completions over many turns.
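A provider-free repro could be as small as the sketch below. It only models the growth pattern, not the real engine: `ParentSession` and `simulate_unbounded` are hypothetical names, and each sub-agent result is represented by a filler string of `result_bytes` bytes.

```rust
/// Hypothetical stand-in for the parent session's model-visible state.
#[derive(Default)]
pub struct ParentSession {
    pub api_messages: Vec<String>,
}

impl ParentSession {
    /// Total payload size of everything the parent currently holds.
    pub fn total_bytes(&self) -> usize {
        self.api_messages.iter().map(|m| m.len()).sum()
    }
}

/// Simulate `turns` coordinator turns. Each turn appends one checklist
/// update plus `children` sub-agent results of `result_bytes` bytes, with
/// no pruning. Returns the session size after every turn.
pub fn simulate_unbounded(turns: usize, children: usize, result_bytes: usize) -> Vec<usize> {
    let mut session = ParentSession::default();
    let mut sizes = Vec::with_capacity(turns);
    for turn in 0..turns {
        session
            .api_messages
            .push(format!("checklist update for turn {}", turn));
        for _ in 0..children {
            session.api_messages.push("x".repeat(result_bytes));
        }
        sizes.push(session.total_bytes());
    }
    sizes
}
```

Calling `simulate_unbounded(50, 4, 10_000)` yields a strictly increasing series that passes 2 MB of model-visible state by the last turn, which is the shape the real parent session follows today.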

Required fix

Make long-running sessions survivable by default, not only by prompt discipline.

Acceptance criteria:

  • Auto-compaction/cycling is enabled for the default V4 long-running path before the parent session reaches the danger zone. It must run as a runtime guardrail, not only as a model suggestion.
  • Compaction replaces live model history with a bounded compacted transcript and preserves enough exact state to continue tool calls safely, including DeepSeek V4 reasoning_content replay requirements.
  • TUI visible transcript state is bounded or virtualized so rendering/saves do not scale linearly forever with every tool card and sub-agent result.
  • Session persistence no longer pretty-serializes arbitrarily huge full messages arrays on every save/checkpoint. Store an event log plus compacted/current snapshot, or otherwise cap/write incrementally.
  • Sub-agent result ingestion into the parent is summarized/bounded by default; full child transcript/details stay in the child artifact/session and are fetched on demand.
  • Parent-to-child fork defaults should avoid copying the entire parent history. Support and test last N turns/sanitized fork semantics like Codex.
  • Unignore or replace the ignored integration tests in crates/tui/tests/integration_mock_llm.rs:529-575 so CI covers compaction/resume, sub-agent round trip, parallel tool execution, and forced compaction before send.
  • Add a stress test that simulates at least 50 parent turns with repeated large tool/sub-agent outputs and asserts bounded api_messages, bounded save size, and acceptable transcript render/update time.
  • Add a user-visible emergency state before overflow: when context/session size crosses a hard threshold, the app must either compact/cycle or block the next model call with a recoverable prompt, rather than continuing until it crashes.
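As a starting point for the guardrail, bounded-ingestion, and stress-test criteria above, here is a small sketch. All names (`Guard`, `Limits`, `check`, `bounded_result`, `compact`, `run_stress`) and all thresholds are hypothetical, and byte counts stand in for token counts. On `Guard::Block` the real app would show a recoverable prompt; the stress loop below just compacts so it can keep running.

```rust
#[derive(Debug, PartialEq)]
pub enum Guard {
    Proceed,
    Compact,
    Block,
}

pub struct Limits {
    pub soft_bytes: usize,
    pub hard_bytes: usize,
}

/// Runtime decision made before every model call.
pub fn check(size: usize, limits: &Limits) -> Guard {
    if size >= limits.hard_bytes {
        Guard::Block
    } else if size >= limits.soft_bytes {
        Guard::Compact
    } else {
        Guard::Proceed
    }
}

/// Keep only a bounded preview of a sub-agent result in the parent; the
/// full text is assumed to stay in the child session.
pub fn bounded_result(full: &str, max_bytes: usize) -> String {
    if full.len() <= max_bytes {
        return full.to_string();
    }
    let mut end = max_bytes;
    while !full.is_char_boundary(end) {
        end -= 1; // back up so the slice ends on a UTF-8 boundary
    }
    format!("{} [truncated; {} bytes total]", &full[..end], full.len())
}

pub fn total_bytes(messages: &[String]) -> usize {
    messages.iter().map(String::len).sum()
}

/// Replace all but the last `keep_tail` messages with a placeholder summary.
pub fn compact(messages: &mut Vec<String>, keep_tail: usize) {
    if messages.len() <= keep_tail {
        return;
    }
    let split = messages.len() - keep_tail;
    let tail = messages.split_off(split);
    let dropped = messages.len();
    messages.clear();
    messages.push(format!("[summary of {} earlier messages]", dropped));
    messages.extend(tail);
}

/// Simulate `turns` parent turns with 10 kB sub-agent outputs and return
/// the largest session size seen after the guard has run.
pub fn run_stress(turns: usize, limits: &Limits) -> usize {
    let mut messages: Vec<String> = Vec::new();
    let mut peak = 0;
    for _ in 0..turns {
        let raw = "x".repeat(10_000);
        messages.push(bounded_result(&raw, 512));
        match check(total_bytes(&messages), limits) {
            Guard::Proceed => {}
            // A real app would prompt the user on Block instead.
            Guard::Compact | Guard::Block => compact(&mut messages, 4),
        }
        peak = peak.max(total_bytes(&messages));
    }
    peak
}
```

A stress test in this style would assert that `run_stress(50, &limits)` stays below `limits.hard_bytes`, and the real version would add the save-size and render-time assertions.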

Non-goals

  • Do not solve this only by changing AGENTS.md or the system prompt.
  • Do not make users manually remember /compact every few turns.
  • Do not rely on "spawn more sub-agents" while the parent still accumulates unbounded child notifications/results.

Priority

P0 for v0.8.6 stabilization. This is blocking long-running agent work, which is the core use case for the current branch.

    Labels

    bug (Something isn't working), context (Context management / context), v0.8.6 (Targeting v0.8.6)
