Cap message render tree for long-running agent sessions

### What would you like to be added?

A safety cap on how many conversation messages the CLI keeps in its active rendering tree, so that long-running agent sessions stop degrading in performance and memory use.

When the conversation grows past a threshold (e.g. 200 messages), older messages would be dropped from the live render tree. They would still be visible to the user through the terminal's native scrollback — mouse wheel, trackpad, ⌘↑, Shift+PgUp — only the CLI itself stops tracking them for redraw.

To make this stable in practice, the cap should:

1. Apply only when the message count substantially exceeds the threshold, not after every single new message. Without this hysteresis, the boundary moves on every turn, which causes visible screen jumps.
2. Anchor itself to a specific message at the top of the visible window, not to a count. That way, transient changes that shrink or reshape the conversation list — auto-compaction, merging adjacent tool calls, regrouping — do not cause the boundary to drift. A count-based scheme would shift the visible window every time the list shortens for any reason, even when no new message was added.
3. Treat finalized messages as committed to scrollback. Once a message has finished streaming and its tools have resolved, it should be printed once and never redrawn. Dropping it from the live tree is then invisible to the terminal because the bytes already live in the terminal's history.
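
The first two requirements can be sketched as a small pure function. This is a minimal illustration, not the CLI's actual code: the `Message` shape, `CapState`, `applyCap`, and the fallback used when the anchored message itself disappears are all assumptions made for the example. The 200/50 values are the thresholds suggested in this issue.

```typescript
// Hypothetical message shape; the CLI's real types are not shown in this issue.
interface Message {
  id: string;
}

// Identity of the first message still in the live tree; null means "keep everything".
interface CapState {
  anchorId: string | null;
}

const CAP = 200; // target size of the live window (suggested in this issue)
const HYSTERESIS = 50; // slack tolerated before the boundary advances

function applyCap(
  messages: readonly Message[],
  state: CapState,
): { live: Message[]; state: CapState } {
  let anchorId = state.anchorId;
  let start = anchorId === null ? 0 : messages.findIndex((m) => m.id === anchorId);

  if (start === -1) {
    // Assumption: if the anchored message itself disappeared, re-anchor by count.
    start = Math.max(0, messages.length - CAP);
    anchorId = messages[start]?.id ?? null;
  }

  // Hysteresis: advance only once the live window exceeds CAP + HYSTERESIS.
  if (messages.length - start > CAP + HYSTERESIS) {
    start = messages.length - CAP;
    anchorId = messages[start]?.id ?? null;
  }

  return { live: messages.slice(start), state: { anchorId } };
}

// Usage: ids are "m0", "m1", ... purely for illustration.
const makeMessages = (n: number): Message[] =>
  Array.from({ length: n }, (_, i) => ({ id: `m${i}` }));

// 250 messages: within cap + hysteresis, nothing is dropped.
const atLimit = applyCap(makeMessages(250), { anchorId: null });

// 251 messages: the boundary advances and 200 messages stay live.
const advanced = applyCap(makeMessages(251), atLimit.state);

// Compaction removes 10 messages; the window still starts at the same anchor.
const compacted = makeMessages(251).filter((_, i) => i < 100 || i >= 110);
const afterCompaction = applyCap(compacted, advanced.state);
```

Because the boundary is stored as a message id, the shortened list in the last step does not pull older messages back into the live tree. A count-based boundary (`length - CAP`) would have moved the window start by ten messages there.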

Out of scope for this issue: adding an in-app scrollback UI, virtualizing the conversation list, or building a separate transcript view. Those are richer follow-ups that solve different problems and can be layered on top later.

### Why is this needed?

Today, the CLI keeps every message of the conversation in its live rendering tree for the entire session. Each message carries layout state, and the underlying UI framework recomputes layout for the whole tree on every update — every keystroke, every streaming chunk, every tool result.

For a short conversation this is fine. For a long-running agent session — the kind a user starts in the morning and feeds tasks to all day — it isn't. As the session grows:

- **Memory grows linearly.** Each message takes on the order of hundreds of KB once layout state is included. A few thousand messages add up to multi-GB resident memory.
- **Per-frame render cost grows.** Layout is recomputed for the entire tree even when only the bottom row changed. At a few hundred messages, frames take long enough to feel laggy. At a few thousand, the process can no longer keep up with streaming output.
- **Garbage collection becomes pathological.** Frequent allocation in a large heap leads to long stop-the-world pauses. Pauses of 100 ms per frame are perceptible as stuttering; pauses of seconds make the CLI feel hung.
- **The screen buffer grows unbounded.** The buffer is sized to fit every line in the tree. Past a certain point this also pushes the terminal emulator itself into degraded performance.
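
As a rough back-of-the-envelope check on the memory point, the sketch below compares an uncapped tree with a capped one. The 500 KB per-message figure is an assumed value inside the "hundreds of KB" range, not a measurement, and `estimateLiveTreeMB` is a name made up for this example.

```typescript
// Assumed for illustration only; the issue states "hundreds of KB" per message.
const ASSUMED_KB_PER_MESSAGE = 500;

// Estimated resident memory of the live tree, in MB (1 MB = 1000 KB).
function estimateLiveTreeMB(
  liveMessages: number,
  kbPerMessage: number = ASSUMED_KB_PER_MESSAGE,
): number {
  return (liveMessages * kbPerMessage) / 1000;
}

// Uncapped: every message of a 5,000-message session stays live.
const uncappedMB = estimateLiveTreeMB(5000); // 2500 MB

// Capped: at most cap + hysteresis = 200 + 50 messages stay live.
const cappedMB = estimateLiveTreeMB(250); // 125 MB
```

Under these assumptions the capped figure stays constant no matter how long the session runs, which is the point of the proposal.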

The symptom users will report is: "after a long session, the CLI gets slow, hangs, or spikes CPU." It is not a bug in any single feature — it is the cumulative cost of treating the whole conversation as a live, redrawable UI region.

A cap is the simplest fix that preserves the existing user experience. Old content stays visible through terminal scrollback, which is the same way users already look at output from earlier commands. The CLI just stops re-rendering it.
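
One way to picture "printed once and never redrawn" is a committer that writes finalized messages to the terminal exactly once and hands back only the unfinished tail for live rendering. The sketch below is an assumption-laden illustration: `ChatMessage`, its `streaming` and `pendingTools` fields, and `createScrollbackCommitter` are hypothetical names, and the `write` callback stands in for whatever the CLI uses to emit output.

```typescript
// Hypothetical message shape for this sketch.
interface ChatMessage {
  id: string;
  text: string;
  streaming: boolean; // still receiving chunks
  pendingTools: number; // tool calls that have not resolved yet
}

function isFinalized(m: ChatMessage): boolean {
  return !m.streaming && m.pendingTools === 0;
}

// Returns a function that commits the finalized prefix and returns the live tail.
function createScrollbackCommitter(write: (text: string) => void) {
  const committed = new Set<string>();

  return function commit(messages: readonly ChatMessage[]): ChatMessage[] {
    const live: ChatMessage[] = [];
    let inFinalizedPrefix = true;

    for (const m of messages) {
      // Only a contiguous finalized prefix is committed, so scrollback order
      // always matches conversation order.
      if (inFinalizedPrefix && isFinalized(m)) {
        if (!committed.has(m.id)) {
          write(m.text);
          committed.add(m.id);
        }
      } else {
        inFinalizedPrefix = false;
        live.push(m);
      }
    }
    return live;
  };
}

// Usage: collect output in an array instead of writing to a real terminal.
const printed: string[] = [];
const commit = createScrollbackCommitter((text) => {
  printed.push(text);
});

const firstPass = commit([
  { id: "a", text: "first", streaming: false, pendingTools: 0 },
  { id: "b", text: "second", streaming: true, pendingTools: 0 },
  { id: "c", text: "third", streaming: false, pendingTools: 0 },
]);

const secondPass = commit([
  { id: "a", text: "first", streaming: false, pendingTools: 0 },
  { id: "b", text: "second", streaming: false, pendingTools: 0 },
  { id: "c", text: "third", streaming: false, pendingTools: 0 },
]);
```

In the first pass only `a` is written, because `b` is still streaming and `c` must not jump ahead of it. In the second pass `b` and `c` are written, `a` is not repeated, and nothing remains live.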

### Additional context

**Suggested thresholds.** A cap of around 200 messages with a 50-message hysteresis step before the boundary advances is a reasonable starting point — large enough that the visible window comfortably covers the recent active context, small enough to keep memory and per-frame render cost bounded. Tracking the cap as a pointer to a specific message rather than as a count avoids screen-reset bugs whenever the conversation list changes length without new content arriving (compaction, regrouping). The failure modes seen at large session sizes — multi-GB resident memory, thousands of memory map operations per second, GC death spirals — match the trajectory the qwen CLI is on.

**Trade-offs.**

| Concern | Detail |
|---|---|
| Old messages become non-interactive | Once a message falls out of the cap, the user can still see it in scrollback as rendered text, but cannot click to expand a collapsed tool result, cannot search it via any in-app search, and cannot copy a structured form of it. Acceptable for the inline UI; motivates richer follow-ups for users who need to navigate long history. |
| Depends on terminal scrollback size | Most terminals default to 1,000–10,000 lines. We should document a recommended minimum — e.g. 100,000 lines or "unlimited" — for users running long agent sessions, alongside this feature. |
| Headless renders should opt out | One-shot renders such as exporting a transcript to a file have no scrollback to fall back on, and the memory concern doesn't apply to a single pass. They should bypass the cap. |
| Alt-screen / fullscreen modes | If the CLI ever runs in alternate screen mode (vim/less-style), the terminal has no scrollback for that mode, and a different mechanism is required. The cap as proposed only applies to the default inline mode. |
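
The last two rows amount to a mode check before the cap is applied. A minimal sketch, assuming the CLI can classify a render into one of three modes; `RenderMode`, `CapSettings`, and `capSettingsFor` are hypothetical names, not existing CLI APIs.

```typescript
// Hypothetical classification of how the CLI is rendering.
type RenderMode = "inline" | "headless" | "alt-screen";

interface CapSettings {
  cap: number;
  hysteresis: number;
}

// Starting values suggested in this issue.
const INLINE_CAP: CapSettings = { cap: 200, hysteresis: 50 };

// Returns the cap settings for a mode, or null when the cap must be bypassed.
function capSettingsFor(mode: RenderMode): CapSettings | null {
  // Headless renders have no scrollback to fall back on, and alt-screen mode
  // needs a different mechanism, so only inline mode gets a cap.
  return mode === "inline" ? INLINE_CAP : null;
}
```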

**Suggested rollout.** Ship behind a feature flag for the first release, enabled by default for new sessions. The change is low-risk because the dropped content is already on the user's screen and only leaves the CLI's tree, but a flag lets early users surface any rendering glitches at the boundary without affecting everyone.

**Recommended companion documentation.** A short note in the README or a troubleshooting guide pointing users to their terminal's scrollback setting, with concrete instructions for the common terminals (iTerm2, Windows Terminal, GNOME Terminal, kitty, tmux). Without this, users on default terminal configurations will hit the scrollback limit before they hit the message cap, and the experience will feel worse than it should.
