fix(core): replace structuredClone with shallow copy to prevent OOM in long sessions #4286
📋 Review Summary
This PR adds two documentation files: a memory benchmark report and an investigation plan for Qwen Code's runtime memory usage. The benchmark report presents well-structured evidence showing Qwen Code uses 2.3x-3.6x more memory than Claude Code across multiple workloads. The investigation plan appropriately defers root-cause claims and proposes a diagnostics-first approach. Overall, this is a solid evidence-gathering PR that sets up future optimization work without making premature conclusions.
🔍 General Feedback
🎯 Specific Feedback
🟢 Medium
🔵 Low
✅ Highlights
Code Coverage Summary
CLI Package - Full Text Report
Core Package - Full Text Report
For detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run.
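Maintainer summary: runtime memory investigation progress
This is a sync of where the runtime memory investigation stands: progress so far, the evidence collected, preliminary inferences, and the next analysis steps, so that everyone can review and align on the optimization direction.
Detailed docs:
Memory, token, and tool call distribution across tasks
The results below are averaged over the two test models. Across all task types, Qwen Code's peak process-tree RSS is clearly higher than Claude Code's.
Results aggregated by model:
Preliminary inference
Based on the current data, this does not look like a problem caused by a single large PR, a single model, or tool call count alone. The more likely directions are:
Related existing work
There are already some related PRs / issues, but they do not cover exactly the same directions:
Next analysis steps
Next, I will open a new local branch and either pull in the metrics capabilities this analysis needs from the related PRs / issues or add the minimum necessary instrumentation locally, then rerun the same benchmark matrix. The priority is not to optimize first, but first to …
The retest will keep covering the same task types:
This will let us further determine:
With these internal metrics in hand, we can then decide which area the first targeted memory optimization PR should tackle.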
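OOM root cause summary
The three-layer OOM mechanism
Each layer is a necessary condition; OOM is triggered only when all three stack up:
Exact crash path (code level)
Version attribution
Crash flow diagram (worst path: 4 clones per send)
v0.15.6: at most 1 clone per send → v0.15.11: worst case 4 clones per send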
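| Time (UTC) | Event | Heap usage | Interpretation |
|---|---|---|---|
| 13:29:43 | auto-compaction attempt #1 | 74.9% | Above the 70% threshold; compaction starts |
| 13:30:06 | compaction #1 succeeds | ~70% | structuredClone completes; old history is replaced |
| 13:30:13 | auto-compaction attempt #2 | 70.7% | Still >70% after compaction; retries immediately |
| 13:30:52 | skipped (in cooldown) | 86.0% | 30s cooldown guard, but heap has already spiked |
| 13:30:56 | auto-compaction attempt #3 | 85.3% | Cooldown expired; compaction forced again |
| 13:31:21 | compaction #3 succeeds | ~85% | Clone peak pushes heap even higher |
| 13:31:37 | auto-compaction attempt #4 | 88.8% | Heap is higher after compaction than before |
| 13:32:09 | skipped (in cooldown) | 90.2% | Heap at 90%; no operation can run |
| 13:32:10 | process crash | >95% | Next structuredClone exceeds the limit; V8 OOM |
In the roughly 2.5 minutes shown (13:29:43 to 13:32:10), 4 auto-compaction attempts ran while heap climbed from 74.9% to the crash. After each "successful" compaction, heap ended up higher rather than lower.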
2 GiB / 4 GiB synthetic reproduction
| Heap limit | Clone pressure | Result | GC stack |
|---|---|---|---|
| 2 GiB | 8 retained clones | No crash (RSS 2.42 GiB) | Near limit |
| 2 GiB | 10 retained clones | OOM | StructuredClone in stack |
| 4 GiB | 20 retained clones | OOM | StructuredClone in stack |
This directly shows that at the scale of real user OOMs (2-4 GiB), the structuredClone path is just as fatal.
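To make the "retained clones" idea concrete, here is a minimal sketch that contrasts a deep clone with a shallow copy of a synthetic history. The `Part` / `Content` shapes, the helper names, and the sizes are all assumptions for illustration; this is not the reproduction script used for the table above.

```typescript
// Hypothetical, simplified shapes; the real history entries carry more fields.
interface Part {
  text?: string;
}
interface Content {
  role: string;
  parts: Part[];
}

// Build a synthetic history with one large text part per entry.
function makeHistory(entries: number, charsPerEntry: number): Content[] {
  return Array.from({ length: entries }, (_, i) => ({
    role: i % 2 === 0 ? 'user' : 'model',
    parts: [{ text: 'x'.repeat(charsPerEntry) }],
  }));
}

// Deep clone: every Content and Part object is duplicated.
function deepCopy(source: Content[]): Content[] {
  return structuredClone(source);
}

// Shallow copy: new outer array and containers, but Part objects are shared.
function shallowCopy(source: Content[]): Content[] {
  return source.map((c) => ({ ...c, parts: [...c.parts] }));
}

// Count Part objects that a copy shares (by identity) with the original.
function countSharedParts(original: Content[], copied: Content[]): number {
  let shared = 0;
  original.forEach((content, i) => {
    content.parts.forEach((part, j) => {
      if (copied[i]?.parts[j] === part) shared++;
    });
  });
  return shared;
}

const sessionHistory = makeHistory(100, 10_000);
const deepSnapshot = deepCopy(sessionHistory);
const shallowSnapshot = shallowCopy(sessionHistory);

console.log('parts shared with deep clone:', countSharedParts(sessionHistory, deepSnapshot));
console.log('parts shared with shallow copy:', countSharedParts(sessionHistory, shallowSnapshot));
```

Every deep snapshot that stays reachable holds its own copy of each part, so retained clones scale memory with history size; the shallow snapshot only adds small container objects.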
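Why models with a 128K context window trigger this more easily
| Context window | 70% trigger threshold | Compaction frequency | OOM risk |
|---|---|---|---|
| 128K (default, DeepSeek, qwen3.6-plus) | ~90K tokens | Frequent (triggers after 10-20 minutes of normal conversation) | High |
| 200K (claude-sonnet) | ~140K tokens | Moderate | Medium |
| 1M (qwen-latest-series-invite-beta) | ~700K tokens | Rarely triggers | Low |
Third-party models such as DeepSeek do not configure contextWindowSize and therefore default to 128K, so compaction triggers very frequently, which is why they account for more OOM reports.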
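The threshold column is plain arithmetic. A small sketch, assuming "128K" means 128,000 tokens and a flat 70% trigger (the function and constant names are hypothetical, not taken from the codebase):

```typescript
// Assumption: the auto-compaction trigger is a flat percentage of the context window.
const COMPACTION_THRESHOLD_PERCENT = 70;

// Integer arithmetic avoids floating-point rounding surprises.
function compactionTriggerTokens(
  contextWindowTokens: number,
  thresholdPercent: number = COMPACTION_THRESHOLD_PERCENT,
): number {
  return Math.floor((contextWindowTokens * thresholdPercent) / 100);
}

// Assumption: "128K" is read as 128_000 tokens, not 131_072.
const contextWindows: Record<string, number> = {
  '128K': 128_000,
  '200K': 200_000,
  '1M': 1_000_000,
};

for (const [label, size] of Object.entries(contextWindows)) {
  console.log(`${label}: compaction triggers at about ${compactionTriggerTokens(size)} tokens`);
}
```

A smaller window reaches its trigger sooner for the same workload, which matches the frequency column in the table.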
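Share of memory by location (estimated from the crash session)
| Memory location | Share | Growth pattern |
|---|---|---|
| `this._history[]` (accumulated tool results) | 40-50% | Linear growth, +30 to 100 MB per turn |
| `structuredClone()` transient copies | 30-40% | Transient peak, appears during compaction |
| V8 runtime (GC metadata, compiled code) | ~15% | Roughly constant |
| UI / logging / stream buffers | ~5% | Slow growth |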
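Conclusion
#3735 (v0.15.7) is the root cause of the surge in OOM reports: it turned structuredClone from "called occasionally" into "called on every send", creating a positive-feedback loop once history is large. #3879 (v0.15.10) made it worse.
Fix direction: avoid a full clone in the compaction check. First use getHistoryLength() to decide whether compaction is needed and skip getHistory(true) if it is not; when compacting, use slice instead of a full deep clone.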
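The fix direction can be sketched roughly as follows. Only the names `getHistoryLength()` and `getHistory()` come from the comment; the `HistoryStore` class, `sliceHistory()`, the entry-count threshold, and `selectEntriesToCompact()` are hypothetical stand-ins, and the real check may well be token-based rather than length-based.

```typescript
interface Content {
  role: string;
  parts: Array<{ text?: string }>;
}

// Hypothetical stand-in for the chat object.
class HistoryStore {
  private history: Content[] = [];
  deepCloneCount = 0;

  push(entry: Content): void {
    this.history.push(entry);
  }

  // Cheap: no copying at all.
  getHistoryLength(): number {
    return this.history.length;
  }

  // Expensive: full deep clone of every entry.
  getHistory(): Content[] {
    this.deepCloneCount++;
    return structuredClone(this.history);
  }

  // Cheap: copies only the array of references for the requested range.
  sliceHistory(start: number, end?: number): Content[] {
    return this.history.slice(start, end);
  }
}

// Hypothetical threshold, expressed as an entry count for simplicity.
const MIN_ENTRIES_FOR_COMPACTION = 4;

// Returns the older entries to summarize, or null when compaction is skipped.
function selectEntriesToCompact(store: HistoryStore, keepRecent: number): Content[] | null {
  const length = store.getHistoryLength();
  if (length < MIN_ENTRIES_FOR_COMPACTION) {
    return null; // Not worth compacting; no clone was made to find that out.
  }
  const cutoff = Math.max(0, length - keepRecent);
  return store.sliceHistory(0, cutoff);
}

const store = new HistoryStore();
for (let i = 0; i < 6; i++) {
  store.push({ role: i % 2 === 0 ? 'user' : 'model', parts: [{ text: `turn ${i}` }] });
}
const toCompact = selectEntriesToCompact(store, 2);
console.log(toCompact?.length, 'entries selected;', store.deepCloneCount, 'deep clones made');
```

The point of the ordering is that the "do we need to compact?" decision never touches the history contents, so the common no-op path costs nothing.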
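Detailed reports
- OOM reproduction report: full reproduction steps, crash logs, version attribution, fix verification
- Runtime Diagnostics Benchmark: process-tree RSS comparison under the default heap
- Auto-Compaction threshold redesign proposal: RSS-aware tiered compaction strategy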
61843ea to 94873d8
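🧪 Shallow Copy Fix: multi-model PR review memory benchmark (2026-05-20)
This run was on …
Test conditions
Results summary
Key conclusions
Comparison: before vs. after the fix
DeepSeek RSS time series (5s sampling)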
wenshao
left a comment
[Critical] [build] Build break: packages/cli/src/ui/commands/doctorCommand.test.ts mocks (lines 145, 841, 948, 974) are not updated for the new MemoryResourceUsage fields (maxRSSRaw: number, maxRSSUnit: 'KiB') and MemoryDiagnostics field (processTree: ProcessTreeMemoryUsage | null) added in memoryDiagnostics.ts. This causes 4 TypeScript errors and breaks CI on all 3 platforms.
Fix: add the missing fields to each mock, e.g.:
    resourceUsage: {
      maxRSS: 4_000,
      maxRSSRaw: 4_000,
      maxRSSUnit: 'KiB',
      userCPUTime: 10,
      systemCPUTime: 20,
    },
    processTree: null,
wenshao
left a comment
Test coverage gaps (aggregated): Several new code paths lack dedicated test coverage — parsePsRows / BFS traversal in collectProcessTreeMemoryUsage, runtimeDiagnostics disabled-state early returns and reset(), copyContentForApiHistory functionCall branch mutation isolation, agent truncation helper edge cases (result/error fields, non-string output), and the five new GeminiClient wrapper methods. Consider adding focused unit tests for these paths.
— qwen-latest-series-invite-beta-v34 via Qwen Code /review
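As a starting point for the suggested `parsePsRows` / BFS coverage, the logic under test might look roughly like the sketch below. The row format (pid, ppid, and rss in KiB per line), the `PsRow` shape, and both function names here are assumptions; the actual `collectProcessTreeMemoryUsage` implementation may differ.

```typescript
// Hypothetical row shape; assumes `ps` was asked for pid, ppid and rss (KiB) columns.
interface PsRow {
  pid: number;
  ppid: number;
  rssKiB: number;
}

// Parse whitespace-separated rows, skipping blank lines, headers and malformed input.
function parseRows(output: string): PsRow[] {
  const rows: PsRow[] = [];
  for (const line of output.split('\n')) {
    const fields = line.trim().split(/\s+/);
    if (fields.length < 3) continue;
    const pid = Number(fields[0]);
    const ppid = Number(fields[1]);
    const rssKiB = Number(fields[2]);
    if (Number.isInteger(pid) && Number.isInteger(ppid) && Number.isFinite(rssKiB)) {
      rows.push({ pid, ppid, rssKiB });
    }
  }
  return rows;
}

// Breadth-first walk from rootPid over parent -> children edges, summing RSS.
function sumProcessTreeRss(
  rows: PsRow[],
  rootPid: number,
): { totalRssKiB: number; processCount: number } {
  const childrenByParent = new Map<number, number[]>();
  const rssByPid = new Map<number, number>();
  for (const row of rows) {
    rssByPid.set(row.pid, row.rssKiB);
    const siblings = childrenByParent.get(row.ppid) ?? [];
    siblings.push(row.pid);
    childrenByParent.set(row.ppid, siblings);
  }

  const seen = new Set<number>([rootPid]);
  const queue: number[] = [rootPid];
  let totalRssKiB = 0;
  let processCount = 0;
  while (queue.length > 0) {
    const pid = queue.shift();
    if (pid === undefined) break;
    const rss = rssByPid.get(pid);
    if (rss !== undefined) {
      totalRssKiB += rss;
      processCount++;
    }
    for (const child of childrenByParent.get(pid) ?? []) {
      if (!seen.has(child)) {
        seen.add(child); // Guards against cycles in malformed input.
        queue.push(child);
      }
    }
  }
  return { totalRssKiB, processCount };
}

const sampleOutput = `
  PID  PPID   RSS
  100     1 50000
  101   100 20000
  102   100 10000
  103   101  5000
  200     1 99999
`;
console.log(sumProcessTreeRss(parseRows(sampleOutput), 100));
```

Tests along these lines would exercise header skipping, unrelated processes, grandchildren, and a root pid that is absent from the rows.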
Pull request overview
This PR targets long-session OOM risk in packages/core by eliminating repeated full-history structuredClone() calls on hot paths, introducing shallow history read APIs, and adding opt-in runtime/request-size diagnostics to support memory attribution.
Changes:
- Replace full-history deep clones in request/compression/read paths with shallow container copies and new history “tail/peek” helpers.
- Add runtime diagnostics collectors (request/tool size summaries) and extend `/doctor memory` data with process-tree RSS probing.
- Reduce live agent UI retention by storing bounded tool-result display strings instead of full `responseParts`.
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| packages/vscode-ide-companion/src/utils/editorGroupUtils.ts | Brace-style tweak (lint compliance). |
| packages/vscode-ide-companion/eslint.config.mjs | Allow listed deep-import into core internals. |
| packages/core/src/utils/runtimeDiagnostics.ts | New opt-in runtime/request/tool sizing diagnostics. |
| packages/core/src/utils/runtimeDiagnostics.test.ts | Unit tests for diagnostics privacy + aggregation. |
| packages/core/src/utils/nextSpeakerChecker.ts | Use last-history access/tail instead of cloning full history. |
| packages/core/src/utils/nextSpeakerChecker.test.ts | Tests ensuring only last curated message is sent. |
| packages/core/src/utils/memoryDiagnostics.ts | Add process-tree RSS probe; normalize maxRSS units. |
| packages/core/src/utils/memoryDiagnostics.test.ts | Update tests for maxRSS normalization + processTree probe. |
| packages/core/src/tools/agent/agent.ts | Bound tool-result display; use shallow history API when available. |
| packages/core/src/tools/agent/agent.test.ts | Tests that live display doesn’t retain full responseParts. |
| packages/core/src/services/sessionService.ts | Replace structuredClone with targeted shallow copies for resume history rebuild. |
| packages/core/src/services/sessionService.test.ts | Ensure no structuredClone used; validate shallow-copy behavior. |
| packages/core/src/services/chatCompressionService.ts | Use shallow curated history to avoid deep-clone peak during compression. |
| packages/core/src/services/chatCompressionService.test.ts | Add coverage for “no deep clone during compression”. |
| packages/core/src/index.ts | Export runtimeDiagnostics utilities. |
| packages/core/src/core/openaiContentGenerator/pipeline.ts | Record OpenAI wire request summaries via runtimeDiagnostics. |
| packages/core/src/core/loggingContentGenerator/loggingContentGenerator.ts | Record genai request summaries via runtimeDiagnostics. |
| packages/core/src/core/geminiChat.ts | Add shallow history APIs + request-history builder; reduce deep-clone usage on send path. |
| packages/core/src/core/geminiChat.test.ts | Add coverage for request-history avoiding structuredClone; shallow helper tests. |
| packages/core/src/core/client.ts | Add shallow history accessors + last-message helpers to reduce clone pressure. |
| packages/core/src/core/anthropicContentGenerator/anthropicContentGenerator.ts | Record Anthropic wire request summaries via runtimeDiagnostics. |
| packages/cli/src/ui/hooks/useAtCompletion.test.ts | Relax ordering assertions; expand fixture coverage. |
| packages/cli/src/ui/commands/doctorCommand.test.ts | Update expected memory diagnostics shape (maxRSS + processTree). |
| eslint.config.js | Add import/no-internal-modules allowlist for vscode companion. |
| docs/plans/2026-05-18-qwen-runtime-memory-investigation.md | Add investigation plan doc. |
| docs/e2e-tests/2026-05-19-qwen-runtime-diagnostics-benchmark-report.md | Add diagnostics benchmark report doc. |
| docs/e2e-tests/2026-05-19-oom-reproduction-report.md | Add OOM reproduction/report doc. |
| docs/e2e-tests/2026-05-18-qwen-memory-benchmark-report.md | Add memory benchmark report doc. |
| docs/design/auto-compaction-threshold-redesign.md | Add design doc (context for related compaction work). |
…osal
- Runtime memory investigation plan
- Non-interactive memory benchmark report
- OOM reproduction report with 2GiB/4GiB synthetic tests
- Runtime diagnostics benchmark report
- Auto-compaction threshold redesign proposal
Replace `structuredClone(this.history)` (called up to 4x per turn on the
send path) with a lightweight shallow copy via `copyContentContainer()`.
This eliminates the OOM root cause in long tool-heavy sessions where the
full deep clone exceeded remaining V8 heap headroom.
Key changes:
- Add `copyContentContainer()` helper ({...content, parts: [...parts]})
- Add `getRequestHistory()` private method for the send path
- Add `getHistoryShallow()`, `getHistoryTailShallow()`,
`peekLastHistoryEntry()`, `getLastModelMessageText()`,
`getHistoryLength()` for read-only callers
- Remove HEAP_PRESSURE_COMPRESSION_RATIO safety net (no longer needed
now that the underlying OOM cause is fixed)
- Update chatCompressionService to use getHistoryShallow(true)
- Update nextSpeakerChecker to send only lastMessage (not full history)
- Update memoryDiagnostics with process-tree RSS measurement
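For readers unfamiliar with the pattern, here is a minimal sketch of a helper with the shape quoted above, `{...content, parts: [...parts]}`. The `Part` / `Content` types are simplified assumptions and the body is an illustration, not the code merged in `geminiChat.ts`.

```typescript
// Simplified, hypothetical versions of the Content / Part shapes.
interface Part {
  text?: string;
  functionCall?: { name: string; args: Record<string, unknown> };
}
interface Content {
  role?: string;
  parts?: Part[];
}

// Copy the container and its parts array, but not the Part objects themselves.
function copyContentContainer(content: Content): Content {
  return content.parts ? { ...content, parts: [...content.parts] } : { ...content };
}

const original: Content = { role: 'user', parts: [{ text: 'hello' }] };
const copied = copyContentContainer(original);

// Container-level changes on the copy stay local to the copy.
copied.parts?.push({ text: 'extra' });
copied.role = 'model';

// Part objects are still shared, so in-place edits are visible through both.
const firstPart = copied.parts?.[0];
if (firstPart) firstPart.text = 'mutated';

console.log(original.role, original.parts?.length, original.parts?.[0]?.text);
```

The trade-off is explicit: callers that append, remove, or reorder parts cannot corrupt history, but callers that mutate a part in place still can, so read-only callers are the intended audience.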
…ation
Required by content generators (anthropic, openai, logging), which import runtimeDiagnostics for optional heap-pressure telemetry during streaming. Gated by the QWEN_CODE_PROFILE_RUNTIME=1 environment variable.
…nterface
Add missing maxRSSRaw, maxRSSUnit, and processTree fields to test fixtures to match the updated MemoryResourceUsage and MemoryDiagnostics interfaces.
5f5c79f to 25712fd
…ccuracy
Code:
- Fix unsound type guard: `'text' in part` → `typeof part.text === 'string'`
in geminiChat.ts and client.ts (Copilot + wenshao feedback)
- Remove unnecessary optional chaining and dead fallback chains in client.ts
(getHistoryShallow, peekLastHistoryEntry, getHistoryLength, etc. now call
GeminiChat methods directly)
- Add 5s timeout to `execFileAsync('ps', ...)` in memoryDiagnostics.ts
Docs:
- Fix GiB conversion accuracy and add single-run caveat to summary
- Add Node.js version to test environment table
- Fix auto-compaction attempt count (5→4) in OOM report
- Soften root-cause attribution certainty
- Add MCP child process context to investigation plan
- Clarify "Codex" reference (→ OpenAI Codex)
- Fix truncated MCP server name (chrome → chrome-devtools)
- Remove duplicate verification commands in benchmark table
- Clarify thread exhaustion vs V8 heap OOM distinction
- Add workload confound caveat to before/after comparison
- Fix SUMMARY_RESERVE "hard relationship" vs thinking budget contradiction
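The type guard item under "Code" is worth a short illustration. With an optional `text` field, `'text' in part` is true whenever the key exists, even when its value is `undefined`, whereas `typeof part.text === 'string'` checks the value itself. The `Part` shape and helper names below are assumptions for the sketch.

```typescript
// Simplified, hypothetical Part shape with an optional text field.
interface Part {
  text?: string;
  thought?: boolean;
}

// Loose check: true whenever the key exists, even if the value is undefined.
function hasTextKey(part: Part): boolean {
  return 'text' in part;
}

// Strict guard: narrows to a part whose text is definitely a string.
function isTextPart(part: Part): part is Part & { text: string } {
  return typeof part.text === 'string';
}

const sampleParts: Part[] = [
  { text: 'Hello, ' },
  { text: undefined }, // key present, value missing
  { thought: true },
  { text: 'world' },
];

const looseCount = sampleParts.filter(hasTextKey).length;
const joinedText = sampleParts
  .filter(isTextPart)
  .map((part) => part.text)
  .join('');

console.log(looseCount, JSON.stringify(joinedText));
```

With the loose check, the `{ text: undefined }` entry would be treated as text and could surface as the literal string "undefined" in a concatenation; the strict guard filters it out.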
The previous commit removed optional chaining from client.ts wrapper methods, but client.test.ts mocks getChat() with partial objects that lack the new shallow methods. Restore ?. fallback chains so both production (GeminiChat) and test (mock) paths work correctly.
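The fallback described here can be sketched as below. `ChatLike` and `readHistoryShallow` are hypothetical names; only `getHistory` and `getHistoryShallow` come from the PR. The optional call lets a partial test mock that lacks the shallow method still work.

```typescript
interface Content {
  role: string;
  parts: Array<{ text?: string }>;
}

// Hypothetical minimal chat surface: the shallow method is optional so that
// partial test mocks still satisfy the type.
interface ChatLike {
  getHistory(curated?: boolean): Content[];
  getHistoryShallow?(curated?: boolean): Content[];
}

// Prefer the shallow reader; fall back to getHistory() when it is missing.
function readHistoryShallow(chat: ChatLike, curated = false): Content[] {
  return chat.getHistoryShallow?.(curated) ?? chat.getHistory(curated);
}

const entries: Content[] = [{ role: 'user', parts: [{ text: 'hi' }] }];
const calls: string[] = [];

// Production-like object exposing both methods.
const fullChat: ChatLike = {
  getHistory: () => {
    calls.push('full:getHistory');
    return structuredClone(entries);
  },
  getHistoryShallow: () => {
    calls.push('full:getHistoryShallow');
    return [...entries];
  },
};

// Partial mock, as a test might provide.
const partialMock: ChatLike = {
  getHistory: () => {
    calls.push('mock:getHistory');
    return entries;
  },
};

readHistoryShallow(fullChat);
readHistoryShallow(partialMock);
console.log(calls.join(', '));
```

One design caveat: the fallback silently reintroduces a deep clone if a production object ever lacks the shallow method, which is why the earlier commit tried to call the methods directly.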
Local verification report (maintainer)
Reviewed PR head
Automated checks
Note on flaky suite-mode failures. The timed-out files are
Confirmed via
Code review notes
Walked the hot-path changes; nothing alarming.
Memory smoke test (tmux
…istoryShallow)
Main landed #4286 (replace structuredClone with shallow copy), which:
- Reverted #4186's heap-pressure auto-compaction safety net (#4286 removed HEAP_PRESSURE_COMPRESSION_RATIO because the underlying OOM cause was fixed by the shallow-copy refactor)
- Reverted #4168's consecutiveFailures ladder back to single-shot hasFailedCompressionAttempt
- Introduced getHistoryShallow() / peekLastHistoryEntry() to replace structuredClone-based history access
- Added a Chinese-language design doc draft for this exact redesign
Resolution strategy:
- Take OUR redesign everywhere it conflicts: three-tier threshold ladder, consecutiveFailures circuit breaker, hard-rescue, token estimator, hard-rescue debug log, CompressOptions plumbing for pendingUserMessage / precomputedEffectiveTokens / trigger.
- DROP all bypassTokenThreshold / heapPressureCompressionCooldownUntil / HEAP_PRESSURE_* / mockGetHeapStatistics / mockHeapPressure code (the heap-pressure mechanism is gone on main; we're not reviving it).
- Use main's new getHistoryShallow(true) in chatCompressionService and in the hard-tier rescue estimator path (was getHistory(true) before main's refactor; the shallow path is what other compaction call sites now use).
- For chatCompressionService.test.ts inline mockChat objects, alias getHistoryShallow to the same vi.fn() as getHistory so existing .mockReturnValue() calls drive both methods.
- For the design doc, keep our resolved Open Question 2 closure rationale and prepend the round-2 blockquote clarifying that the Background section describes pre-redesign behavior; take main's slightly more thorough SUMMARY_RESERVE paragraph where it explains both the with-thinking and without-thinking cases.
- Replace the round-2 test that asserted "hard-rescue forwards consecutiveFailures=3" with a test compatible with the post-merge history-access shape (now using getHistoryShallow).
346 core tests passing; CLI typecheck clean for affected files.
Pre-existing provider-config typecheck errors from main's #4287 refactor are unrelated to this PR and not touched here.
OOM / memory issue cross-reference after #4286
This PR is the central fix for the long-session V8 heap OOM path caused by repeated full-history
Primary symptoms expected to improve:
Related reports to retest on
Mixed or possibly different memory paths:
Separate follow-ups:
Please do not treat this comment as a blanket closure signal. It is intended as a central cross-reference for triage and future duplicate detection: if a new report matches the first symptom group, ask the reporter to upgrade to
Summary
What changed:
- … `copyContentContainer` / `getRequestHistory`) replaces hot-path `structuredClone(this.history)` calls, eliminating clone-induced memory peaks in long sessions
- `getHistoryShallow()`, `getHistoryTailShallow()`, `peekLastHistoryEntry()`, and `getLastModelMessageText()` methods for internal read paths
- `runtimeDiagnostics` utility for heap/memory instrumentation
Why it changed:
- … `structuredClone(this.history)` up to 4 times. When the session context is ≥70% full, the transient clones exceed the V8 heap headroom, causing OOM crashes in long interactive sessions
Reviewer focus:
- geminiChat.ts: `copyContentContainer()` only does an object spread plus a parts array spread; is that enough to keep caller mutation from affecting history?
- client.ts: is the new shallow API fallback chain correct?
Validation
Local interactive TUI benchmark across 3 models × 3 PR sizes (MCP enabled, heap-pressure bypass/cooldown removed):
All 9/9 runs passed with peak RSS ≤743 MB, far below the 2 GB limit, and no OOM.
Test plan
- npm run build
- npm run typecheck
- npm run lint
- cd packages/core && npx vitest run src/core/geminiChat.test.ts src/services/chatCompressionService.test.ts src/services/sessionService.test.ts src/utils/memoryDiagnostics.test.ts src/utils/nextSpeakerChecker.test.ts src/utils/runtimeDiagnostics.test.ts src/utils/forkedAgent.cache.test.ts
- cd packages/cli && npx vitest run src/ui/commands/doctorCommand.test.ts