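What happened?
Several long-session scenarios have recently crashed with Node/V8 OOM errors, for example:
Based on existing reports and local investigation, these crashes typically occur in long sessions, with large-context models, with large amounts of tool output / diff / file read, or around `/compress`. A typical GC trace shows the heap already close to the 4GB limit, with Mark-Compact reclaiming very little memory, which means a large number of JS objects are still reachable:
Compaction is currently triggered mainly by a token/context threshold. For a model with a 1M context window, the 70% token threshold may fire very late, while the V8 heap fills up first with conversation history, tool results, UI history, temporary copies made during compression, stream/logging accumulators, and similar objects.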
What did you expect to happen?
Long sessions should have a lightweight safety net before the V8 heap approaches its limit:
- Use `heapUsed / heap_size_limit` as a heap-pressure guard.
- `structuredClone()`.
- `/doctor memory` should help determine whether the cause is V8 heap pressure, large tool result retention, or a compression peak.
Analysis
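This does not look like a single bug but rather several OOM scenarios stacking up:
Token threshold is decoupled from V8 heap pressure
For example, with a 1M context window, the 70% compaction threshold is roughly 700K tokens. The in-memory JS representation is heavily amplified relative to the token count, including nested `Content`/`Part` objects, React/Ink UI history, tool results, logging buffers, and so on, so the V8 heap may hit its 2GB/4GB/8GB limit before the token threshold is reached.
Full-history clone in `getHistory()` amplifies the peak
`GeminiChat.getHistory()` runs `structuredClone()` over the entire history. Some code paths only need to read the last message, or only read the history without mutating it, yet they still trigger a full deep clone. In long sessions this significantly raises the transient heap peak. One clear example is `nextSpeakerChecker`, which calls both:
The second, comprehensive history clone is used mainly to inspect the last message and could in principle be replaced by an O(1) accessor.
Compression itself can create a memory peak
During `/compress` or auto-compaction, the flow reads the curated history, slices arrays, builds the compression request, and receives the summary response. If the heap is already high, the compression process itself can become the OOM trigger.
Large tool output / stream/logging retention
In long sessions, large diffs, file reads, shell output, stream chunks, and logging responses can stay retained in live history or logging buffers for a long time. This direction is also related to Diagnose and mitigate large tool-result retention in long sessions #4184.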
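To illustrate the accessor idea from the `getHistory()` item above, here is a minimal sketch. The `ChatHistory` class and the `Content`/`Part` interfaces are simplified, hypothetical stand-ins rather than the project's real types; only the name `getLastHistoryEntry()` comes from the suggestion in this issue. It assumes a runtime where `structuredClone()` is a global (Node 17+).

```typescript
// Hypothetical, simplified stand-ins for the real chat history types.
interface Part {
  text?: string;
}
interface Content {
  role: 'user' | 'model';
  parts: Part[];
}

class ChatHistory {
  private history: Content[] = [];

  add(entry: Content): void {
    this.history.push(entry);
  }

  /** Behavior described in this issue: deep-clones every entry on each call. */
  getHistory(): Content[] {
    return structuredClone(this.history);
  }

  /**
   * Proposed accessor: looks up the last entry in O(1) and clones only that
   * entry, so the cost depends on the last entry's size, not on history length.
   */
  getLastHistoryEntry(): Content | undefined {
    const last = this.history[this.history.length - 1];
    return last === undefined ? undefined : structuredClone(last);
  }
}

const chat = new ChatHistory();
chat.add({ role: 'user', parts: [{ text: 'hello' }] });
chat.add({ role: 'model', parts: [{ text: 'hi there' }] });

const last = chat.getLastHistoryEntry();
console.log(last?.role); // model

// Mutating the returned copy does not affect the stored history.
if (last) last.parts[0].text = 'changed';
console.log(chat.getLastHistoryEntry()?.parts[0].text); // hi there
```

If callers never mutate the returned entry, the clone could arguably be dropped entirely; cloning only the last entry is the conservative variant that keeps mutation safety.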
Related issues
- `structuredClone()` / large context window pressure.
- `Ineffective mark-compacts near heap limit` report.
- `/resume`.
- `/doctor memory` diagnostics slices.
Related PRs
- `v8.getHeapStatistics().heap_size_limit`.
- `/doctor memory` diagnostics.
- `/resume` memory behavior.
Suggested minimal mitigation
A minimal, lower-conflict fix could be:
- Add a heap-pressure guard in `GeminiChat.tryCompress()` based on:
  This avoids fixed `2GB` thresholds and adapts to 2GB/4GB/8GB heap limits.
- Avoid full-history clone in `nextSpeakerChecker` by adding a small accessor such as `getLastHistoryEntry()` / `peekLastHistoryEntry()` and cloning only the last entry if mutation safety is required.
- Treat `advanced.autoConfigureMemory` / `--max-old-space-size` as mitigation, not root fix. It gives headroom, but does not address retained history/tool-output/compression peak.
Larger follow-ups can separately handle large tool-result offload, stream/logging accumulator reduction, and post-compaction release of old history references.
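As a rough sketch of the heap-pressure guard idea, the ratio `heapUsed / heap_size_limit` can be computed from `process.memoryUsage()` and `v8.getHeapStatistics()`, both standard Node.js APIs. The function names and the 0.7 threshold below are assumptions chosen for illustration, not values from the codebase, and how such a check would be wired into `GeminiChat.tryCompress()` is left open.

```typescript
import { getHeapStatistics } from 'node:v8';

// Hypothetical threshold: fraction of the V8 heap limit at which to compact early.
const HEAP_PRESSURE_THRESHOLD = 0.7;

/** Returns heapUsed / heap_size_limit; defaults read the live process values. */
function heapPressureRatio(
  heapUsed: number = process.memoryUsage().heapUsed,
  heapSizeLimit: number = getHeapStatistics().heap_size_limit,
): number {
  // Guard against a zero or invalid limit so the ratio is never NaN or Infinity.
  if (!(heapSizeLimit > 0)) return 0;
  return heapUsed / heapSizeLimit;
}

/** True when compaction should be considered regardless of the token threshold. */
function isUnderHeapPressure(
  ratio: number = heapPressureRatio(),
  threshold: number = HEAP_PRESSURE_THRESHOLD,
): boolean {
  return ratio >= threshold;
}

// The same relative threshold adapts to different heap limits.
const GB = 1024 ** 3;
console.log(isUnderHeapPressure(heapPressureRatio(1.5 * GB, 2 * GB))); // true (ratio 0.75)
console.log(isUnderHeapPressure(heapPressureRatio(1.5 * GB, 4 * GB))); // false (ratio 0.375)
```

Because the check is relative to `heap_size_limit`, raising the limit with `--max-old-space-size` moves the trigger point with it instead of leaving a fixed byte threshold behind.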
Client information