OOM in long sessions: V8 heap pressure can exceed limit before token-based compaction runs

## What happened?

Several long-session scenarios have recently crashed with a Node/V8 out-of-memory (OOM) error, for example:

```text
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
```

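Based on existing reports and local investigation, these crashes typically occur in long sessions, with large-context models, with large volumes of tool output / diffs / file reads, or around `/compress`. A typical GC trace shows the heap already close to the 4GB limit, with Mark-Compact reclaiming very little memory, which indicates that a large number of JS objects are still reachable: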

```text
Scavenge ... 4085.8 -> 4082.3 MB allocation failure
Mark-Compact (reduce) ... 4088.3 -> 4079.8 MB
FATAL ERROR: Ineffective mark-compacts near heap limit
```

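Compaction is currently triggered mainly by a token/context threshold. For models with a 1M context window, the 70% token threshold may fire very late, while the V8 heap fills up first with conversation history, tool results, UI history, temporary copies made during compression, stream/logging accumulators, and similar objects.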
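
To make the mismatch concrete, the arithmetic below computes the break-even point: the average retained heap per token at which the heap limit and the token threshold are reached at the same moment. The 1M window, 70% threshold, and 2/4/8 GB limits are the figures quoted in this issue; treating "GB" as GiB is an assumption, and the result is plain division, not a measurement.

```typescript
// Figures taken from this issue; GiB interpretation of "GB" is an assumption.
const GiB = 1024 ** 3;
const contextWindowTokens = 1_000_000;
const compactionThreshold = 0.7;

// Token count at which token-based compaction would trigger.
const thresholdTokens = Math.round(contextWindowTokens * compactionThreshold);

// Average retained heap bytes per token at which the heap limit is hit
// exactly when the token threshold is hit. Above this, the heap fills first.
function breakEvenBytesPerToken(heapLimitBytes: number): number {
  return heapLimitBytes / thresholdTokens;
}

for (const limitGiB of [2, 4, 8]) {
  const bytes = breakEvenBytesPerToken(limitGiB * GiB);
  console.log(`${limitGiB} GiB limit: ~${(bytes / 1024).toFixed(1)} KiB per token`);
}
```

With a 4 GiB limit the break-even is roughly 6 KiB of retained heap per token. If the session retains more than that on average, counting all clones and buffers, the heap limit is reached before token-based compaction runs.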

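## What did you expect to happen?

Long sessions should have a lightweight safety net that acts before the V8 heap approaches its limit:

- Compaction should not depend solely on the token/context threshold.
- A heap-pressure guard should be based on `heapUsed / heap_size_limit`.
- Under high heap pressure, additional full-history `structuredClone()` calls should be avoided where possible.
- Large tool results and large command output should be subject to a budget or offload strategy so they are not retained long-term in the hot JS heap.
- `/doctor memory` should help determine whether the cause is V8 heap pressure, large tool result retention, or a compression peak.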
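
As one possible shape for the tool-result budget mentioned above, here is a minimal sketch that keeps the head and tail of an oversized output and replaces the middle with a marker. `budgetToolOutput` and its `maxChars` parameter are hypothetical names for illustration, not existing APIs in this codebase.

```typescript
// Hypothetical helper, not part of the existing codebase.
// Keeps at most `maxChars` characters of the original output (split between
// head and tail) and inserts a short marker describing what was omitted.
function budgetToolOutput(output: string, maxChars: number): string {
  if (output.length <= maxChars) {
    return output; // Within budget: return unchanged.
  }
  const headLen = Math.ceil(maxChars / 2);
  const tailLen = maxChars - headLen;
  const omitted = output.length - maxChars;
  const marker = `\n[... ${omitted} characters omitted ...]\n`;
  return output.slice(0, headLen) + marker + output.slice(output.length - tailLen);
}

// Usage: a 100-character output squeezed into a 20-character budget.
const trimmed = budgetToolOutput("a".repeat(50) + "b".repeat(50), 20);
console.log(trimmed);
```

This only illustrates the budgeting policy. Where the omitted portion goes (dropped, or offloaded to disk) and whether this actually lowers retained heap in practice would need to be measured.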

## Analysis

This does not look like a single bug, but rather several OOM scenarios compounding:

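1. **Token threshold is decoupled from V8 heap pressure**

   For example, with a 1M context window, a 70% compaction threshold corresponds to roughly 700K tokens. The in-memory JS object representation amplifies this considerably, including nested `Content` / `Part` objects, React/Ink UI history, tool results, and logging buffers, so the V8 heap may hit its 2GB/4GB/8GB limit before the token threshold is reached.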

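2. **Full-history clone in `getHistory()` amplifies the peak**

   `GeminiChat.getHistory()` calls `structuredClone()` on the entire history. Some code paths only need to read the last message, or only need read-only access to the history, yet they still trigger a full deep clone. In long sessions this significantly raises the transient heap peak.

   One clear example is `nextSpeakerChecker`, which calls both: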

   ```ts
   chat.getHistory(true)
   chat.getHistory()
   ```

   The second call, a clone of the comprehensive history, is used mainly to inspect the last message and could in theory be replaced by an O(1) accessor.

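3. **Compression itself can create a memory peak**

   During `/compress` or auto-compaction, the flow reads the curated history, slices arrays, builds the compression request, and receives the summary response. If the heap is already high, the compression process itself can become the OOM trigger.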

4. **Large tool output and stream/logging retention**

   In long sessions, large diffs, file reads, shell output, stream chunks, and logging responses may be retained for a long time in the live history or in logging buffers. This direction is also related to #4184.
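
The clone amplification in item 2 can be sketched as follows. `ChatHistorySketch`, `Content`, and `Part` here are simplified stand-ins written for this illustration, not the real `GeminiChat` types; only the full-history `structuredClone()` behavior is taken from the description above, and `peekLastHistoryEntry()` is a hypothetical accessor.

```typescript
// Simplified, hypothetical stand-ins for the real Content / GeminiChat types.
interface Part { text?: string }
interface Content { role: string; parts: Part[] }

class ChatHistorySketch {
  private history: Content[] = [];

  addHistory(entry: Content): void {
    this.history.push(entry);
  }

  // Mirrors the behavior described above: deep-clones every entry, so each
  // call allocates a second copy of the whole history.
  getHistory(): Content[] {
    return structuredClone(this.history);
  }

  // Hypothetical accessor: clones only the final entry, so the cost depends
  // on the size of one message rather than the length of the session.
  peekLastHistoryEntry(): Content | undefined {
    const last = this.history[this.history.length - 1];
    return last === undefined ? undefined : structuredClone(last);
  }
}

const chat = new ChatHistorySketch();
chat.addHistory({ role: "user", parts: [{ text: "hi" }] });
chat.addHistory({ role: "model", parts: [{ text: "hello" }] });
console.log(chat.peekLastHistoryEntry()?.role);
```

Cloning the single entry keeps mutation safety for callers while avoiding the full-history copy. If callers are known to be read-only, even that clone could be dropped.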

## Related issues

- #4116 — long-session OOM; analysis points to `structuredClone()` / large context window pressure.
- #4167 — crash during or near compression; useful for compression peak memory analysis.
- #4149 — recent `Ineffective mark-compacts near heap limit` report.
- #4134 — recent OOM report.
- #2868 — fast heap growth OOM.
- #2945 — OOM around `/resume`.
- #2036 — reduce memory usage of long-running tasks.
- #2562 — structuredClone OOM in long sessions.
- #3000 — memory diagnostics parent issue.
- #4179 / #4181 / #4182 — `/doctor memory` diagnostics slices.
- #4184 — large tool-result retention diagnostics / offload direction.

## Related PRs

- #4127 — adds memory-based compression trigger using fixed heap thresholds. Similar goal, but may need ratio-based threshold using `v8.getHeapStatistics().heap_size_limit`.
- #4168 — redesigns auto-compaction token thresholds with a three-tier ladder. Useful but larger in scope and mostly token/context oriented.
- #4180 / #3785 — `/doctor memory` diagnostics.
- #4097 / #4126 — telemetry/logging span changes; relevant because logging paths may affect retention if enabled.
- #3989 / #4159 / #4174 — resume/session-related changes; relevant to `/resume` memory behavior.

## Suggested minimal mitigation

A minimal, lower-conflict fix could be:

1. Add a heap-pressure guard in `GeminiChat.tryCompress()` based on:

   ```ts
   process.memoryUsage().heapUsed / v8.getHeapStatistics().heap_size_limit
   ```

   This avoids fixed `2GB` thresholds and adapts to 2GB/4GB/8GB heap limits.

2. Avoid full-history clone in `nextSpeakerChecker` by adding a small accessor such as `getLastHistoryEntry()` / `peekLastHistoryEntry()` and cloning only the last entry if mutation safety is required.

3. Treat `advanced.autoConfigureMemory` / `--max-old-space-size` as mitigation, not root fix. It gives headroom, but does not address retained history/tool-output/compression peak.

Larger follow-ups can separately handle large tool-result offload, stream/logging accumulator reduction, and post-compaction release of old history references.
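
A minimal sketch of the ratio-based guard from step 1, assuming it would be combined with the existing token check by a simple OR. `HEAP_PRESSURE_THRESHOLD`, `heapPressureRatio`, `currentHeapPressure`, and `shouldCompress` are hypothetical names, and the 0.8 threshold is an illustrative value, not one proposed or measured here. Only `process.memoryUsage()` and `v8.getHeapStatistics()` are real Node.js APIs.

```typescript
import * as v8 from "node:v8";

// Assumed value for illustration only; a real threshold would need tuning.
const HEAP_PRESSURE_THRESHOLD = 0.8;

// Pure helper: fraction of the V8 heap limit currently in use.
function heapPressureRatio(heapUsed: number, heapSizeLimit: number): number {
  return heapSizeLimit > 0 ? heapUsed / heapSizeLimit : 0;
}

// Reads live values from Node. Because heap_size_limit reflects the effective
// limit (including --max-old-space-size), the ratio adapts to 2GB/4GB/8GB heaps.
function currentHeapPressure(): number {
  return heapPressureRatio(
    process.memoryUsage().heapUsed,
    v8.getHeapStatistics().heap_size_limit,
  );
}

// Hypothetical decision function: compress when either the token threshold
// or the heap-pressure threshold is crossed.
function shouldCompress(
  tokenCount: number,
  tokenThreshold: number,
  heapPressure: number = currentHeapPressure(),
): boolean {
  return tokenCount >= tokenThreshold || heapPressure >= HEAP_PRESSURE_THRESHOLD;
}

console.log(`heap pressure: ${(currentHeapPressure() * 100).toFixed(1)}%`);
```

Keeping the ratio computation pure makes the guard easy to unit-test without allocating gigabytes. Since compression itself allocates, a real guard would likely need to trigger early enough to leave headroom for the compression pass.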

## Client information

<details>
<summary>Client Information</summary>

```console
Qwen Code v0.15.11
Model: qwen3.6-plus
Fast Model: qwen3-coder-flash
Auth: openai
Platform: darwin arm64 (24.1.0)
Node.js: v22.22.0
Git commit: 782403d71
```

</details>

