OOM in long sessions: V8 heap pressure can exceed limit before token-based compaction runs #4185

@yiliang114

What happened?

Several recent long-session scenarios have crashed with a Node/V8 OOM, for example:

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

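Based on existing reports and local investigation, these crashes typically occur in long sessions, with large-context models, with heavy tool output / diff / file read traffic, or around /compress. A typical GC trace shows the heap already close to the 4GB limit, with Mark-Compact reclaiming very little memory, which indicates that a large number of JS objects are still reachable: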

Scavenge ... 4085.8 -> 4082.3 MB allocation failure
Mark-Compact (reduce) ... 4088.3 -> 4079.8 MB
FATAL ERROR: Ineffective mark-compacts near heap limit

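Compaction is currently triggered mainly by a token/context threshold. For a model with a 1M context window, the 70% token threshold may fire very late, while the V8 heap fills up first with conversation history, tool results, UI history, temporary copies made during compression, stream/logging accumulators, and similar objects.

What did you expect to happen?

Long sessions should have a lightweight safety net that kicks in before the V8 heap approaches its limit:

  • Compaction should not depend solely on the token/context threshold.
  • There should be a heap-pressure guard based on heapUsed / heap_size_limit.
  • Under high heap pressure, additional full-history structuredClone() calls should be avoided wherever possible.
  • Large tool results / large command output should have a budget or an offload strategy, so they are not retained long-term in the hot JS heap.
  • /doctor memory should help determine whether the cause is V8 heap pressure, large tool-result retention, or a compression peak.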
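To make the last bullet concrete, here is a minimal sketch of the kind of numbers a memory diagnostic could report. It uses only `process.memoryUsage()` and `v8.getHeapStatistics()`, which are standard Node.js APIs. The `HeapReport` shape and the `collectHeapReport` name are hypothetical and are not the existing /doctor implementation.

```typescript
import { getHeapStatistics } from "node:v8";

const MB = 1024 * 1024;

// Hypothetical report shape; field names are illustrative only.
interface HeapReport {
  heapUsedMB: number;
  heapLimitMB: number;
  heapPressure: number; // heapUsed / heap_size_limit, normally between 0 and 1
  externalMB: number; // memory held by C++ objects bound to JS objects
  arrayBuffersMB: number; // ArrayBuffer / Buffer memory (included in external)
  rssMB: number;
}

function collectHeapReport(): HeapReport {
  const usage = process.memoryUsage();
  const heapLimit = getHeapStatistics().heap_size_limit;
  return {
    heapUsedMB: usage.heapUsed / MB,
    heapLimitMB: heapLimit / MB,
    heapPressure: usage.heapUsed / heapLimit,
    externalMB: usage.external / MB,
    arrayBuffersMB: usage.arrayBuffers / MB,
    rssMB: usage.rss / MB,
  };
}

console.log(collectHeapReport());
```

A high `heapPressure` points at reachable JS objects (history, tool results, clones), while a large `externalMB` or `arrayBuffersMB` with low heap pressure would suggest off-heap buffers instead. Sampling the report before and after /compress is one way to see whether compression itself produces a peak.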

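Analysis

This does not look like a single bug, but rather several OOM scenarios compounding each other:

  1. The token threshold is decoupled from V8 heap pressure

    For example, with a 1M context window, the 70% compaction threshold is roughly 700K tokens. The in-memory JS object representation is heavily amplified relative to the raw token count, through nested Content / Part objects, React/Ink UI history, tool results, logging buffers, and so on, so the V8 heap can hit its 2GB/4GB/8GB limit before the token threshold is reached.

  2. The full-history clone in getHistory() amplifies peak usage

    GeminiChat.getHistory() calls structuredClone() on the entire history. Some code paths only need to read the last message, or only need read-only access to the history, yet they still trigger a full deep clone. In long sessions this significantly raises the transient heap peak.

    One clear example is nextSpeakerChecker, which calls both: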

    chat.getHistory(true)
    chat.getHistory()

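    The second call, which clones the comprehensive history, is mainly used to inspect the last message and could in principle be replaced with an O(1) accessor.

  3. Compression itself can create a memory peak

    During /compress or auto-compaction, the flow reads the curated history, slices arrays, builds the compression request, and receives the summary response. If the heap is already very high, the compression process itself can become the OOM trigger.

  4. Large tool output and stream/logging retention

    In long sessions, large diffs, file reads, shell output, stream chunks, and logged responses can be retained for a long time in live history or logging buffers. This direction is also related to "Diagnose and mitigate large tool-result retention in long sessions" #4184.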
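For item 2 above, the difference between cloning the whole history and cloning only the last entry can be sketched as follows. `HistoryStore` is a hypothetical stand-in, not the real GeminiChat class, and the `Content` / `Part` types are simplified assumptions. `peekLastHistoryEntry` is one of the accessor names proposed later in this issue.

```typescript
// Simplified stand-ins for the real Content / Part types (assumption).
interface Part {
  text?: string;
}
interface Content {
  role: "user" | "model";
  parts: Part[];
}

// Hypothetical history holder used only for illustration.
class HistoryStore {
  private history: Content[] = [];

  push(entry: Content): void {
    this.history.push(entry);
  }

  // Deep-copies every entry: cost grows with the total size of the session.
  getHistory(): Content[] {
    return structuredClone(this.history);
  }

  // Deep-copies only the last entry: cost depends on that entry alone.
  peekLastHistoryEntry(): Content | undefined {
    const last = this.history[this.history.length - 1];
    return last === undefined ? undefined : structuredClone(last);
  }
}

const store = new HistoryStore();
store.push({ role: "user", parts: [{ text: "hello" }] });
store.push({ role: "model", parts: [{ text: "hi there" }] });

const last = store.peekLastHistoryEntry();
console.log(last?.role);
```

Because the returned entry is still a copy, a caller that mutates it cannot corrupt the stored history, which preserves the safety property of the full clone at a fraction of the allocation.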

Related issues

Related PRs

Suggested minimal mitigation

A minimal, lower-conflict fix could be:

  1. Add a heap-pressure guard in GeminiChat.tryCompress() based on:

    process.memoryUsage().heapUsed / v8.getHeapStatistics().heap_size_limit

    This avoids fixed 2GB thresholds and adapts to 2GB/4GB/8GB heap limits.

  2. Avoid full-history clone in nextSpeakerChecker by adding a small accessor such as getLastHistoryEntry() / peekLastHistoryEntry() and cloning only the last entry if mutation safety is required.

  3. Treat advanced.autoConfigureMemory / --max-old-space-size as mitigation, not root fix. It gives headroom, but does not address retained history/tool-output/compression peak.
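As a rough sketch of how mitigation 1 could combine both triggers, the decision can be written as a pure function over the measured values. The 0.7 token threshold comes from this issue. The 0.75 heap-pressure threshold, the `CompactionInputs` shape, and the function names are assumptions for illustration, not the actual tryCompress() logic.

```typescript
import { getHeapStatistics } from "node:v8";

// Hypothetical input shape; the real tryCompress() signature may differ.
interface CompactionInputs {
  tokenCount: number;
  tokenLimit: number;
  heapUsedBytes: number;
  heapLimitBytes: number;
}

type CompactionReason = "heap-pressure" | "token-threshold" | null;

const TOKEN_THRESHOLD = 0.7; // 70% threshold described in this issue
const HEAP_PRESSURE_THRESHOLD = 0.75; // assumed value, would need tuning

// Returns why compaction should run, or null if neither trigger fires.
function compactionReason(inputs: CompactionInputs): CompactionReason {
  const { tokenCount, tokenLimit, heapUsedBytes, heapLimitBytes } = inputs;
  if (heapLimitBytes > 0 && heapUsedBytes / heapLimitBytes >= HEAP_PRESSURE_THRESHOLD) {
    return "heap-pressure";
  }
  if (tokenLimit > 0 && tokenCount / tokenLimit >= TOKEN_THRESHOLD) {
    return "token-threshold";
  }
  return null;
}

// Reads the live heap values from the ratio proposed above.
function currentCompactionReason(tokenCount: number, tokenLimit: number): CompactionReason {
  return compactionReason({
    tokenCount,
    tokenLimit,
    heapUsedBytes: process.memoryUsage().heapUsed,
    heapLimitBytes: getHeapStatistics().heap_size_limit,
  });
}

console.log(currentCompactionReason(200_000, 1_000_000));
```

Since the heap check is a ratio against `heap_size_limit`, the same threshold applies whether the process runs with a 2GB, 4GB, or 8GB limit. Keeping the decision pure also makes it easy to unit test without allocating gigabytes.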

Larger follow-ups can separately handle large tool-result offload, stream/logging accumulator reduction, and post-compaction release of old history references.
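One possible shape for the tool-result follow-up is a character budget that keeps the head and tail of a large output and records how much was dropped. This is only a sketch: `capToolOutput` and its budget parameter are hypothetical, and a real fix might offload the full output to disk instead of discarding it.

```typescript
// Hypothetical helper: keeps at most maxChars characters of the original
// output (split between head and tail) plus a short omission marker.
function capToolOutput(output: string, maxChars: number): string {
  if (output.length <= maxChars) {
    return output;
  }
  const headLength = Math.ceil(maxChars / 2);
  const tailLength = Math.floor(maxChars / 2);
  const omitted = output.length - headLength - tailLength;
  const head = output.slice(0, headLength);
  const tail = output.slice(output.length - tailLength);
  return `${head}\n[... ${omitted} characters omitted ...]\n${tail}`;
}

console.log(capToolOutput("abcdefghij", 4));
```

One caveat worth verifying with a heap snapshot: V8 may represent substrings as slices that keep the original string alive, so capping the stored text does not by itself guarantee the large output is released.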

Client information
Qwen Code v0.15.11
Model: qwen3.6-plus
Fast Model: qwen3-coder-flash
Auth: openai
Platform: darwin arm64 (24.1.0)
Node.js: v22.22.0
Git commit: 782403d71
