fix(core): add heap-pressure auto-compaction safety net#4186
Conversation
📋 Review SummaryThis PR introduces a temporary heap-pressure safety net for chat compression, allowing auto-compaction to trigger based on V8 heap usage (≥70% of 🔍 General Feedback
🎯 Specific Feedback🟡 High
🟢 Medium
🔵 Low
✅ Highlights
|
|
Additional context on why this PR is intentionally small: Memory/OOM has become an active area of investigation in this repo this week. Several contributors are already working on related but different slices, and I want to make the relationship between those efforts and this PR explicit. Related work already in progressMy own related PRs / prior work:
Other contributors' related work:
Why this PR exists separatelyThe larger memory-related efforts above are valuable, but they are either diagnostic-only, broader in scope, or touch many files and therefore need more review time. Meanwhile, recent reports show users are already hitting V8 heap OOMs in long sessions. Based on this week's investigation and my local reproduction of the heap OOM pattern, the immediate failure mode is that token/context-based compaction can be too late. On large context-window models, especially 1M-context models, the 70% token threshold may not be reached before V8 heap is already near its limit. At that point, extra allocations from history cloning, compression setup, large tool outputs, UI/logging state, stream processing, or error handling can push the process into: This PR is therefore intentionally a temporary mitigation rather than the final memory architecture. What this PR doesThe narrow fix here is:
What this PR does not try to solveThis PR does not implement:
Those should continue in the existing follow-up PRs/issues. The intent of this PR is to provide a quick, low-conflict safety net that reduces current long-session OOM risk while the more complete memory work continues. |
doudouOUC
left a comment
There was a problem hiding this comment.
Code Review
整体评价:PR 的设计思路是正确的——bypassTokenThreshold 与 force=true 的语义区分很清晰,nextSpeakerChecker 的优化也干净无回归。以下是几个需要关注的问题:
🟠 高优先级:hasFailedCompressionAttempt 闩锁使安全网变成一次性机制
位置: chatCompressionService.ts 第 216 行附近的前置门控与新增的 bypassTokenThreshold 的交互
当堆压力触发压缩(bypassTokenThreshold=true)但压缩失败后,hasFailedCompressionAttempt 被置为 true。后续消息中即使堆压力持续升高,服务在前置门控就提前退出(NOOP),永远到不了 token 阈值门控:
// 第 216 行 — hasFailedCompressionAttempt 门控,不受 bypassTokenThreshold 影响
if (threshold <= 0 || (hasFailedCompressionAttempt && !force)) {
return { newHistory: null, info: { ... NOOP } };
}
// 第 234 行 — bypassTokenThreshold 只绕过了这里
if (!force && !bypassTokenThreshold) { ... }影响: 安全网实质上是一次性的——一次压缩失败后就永久失效。如果堆继续增长,OOM 仍会发生。
建议修复:
- 将
bypassTokenThreshold也纳入第 216 行门控:(hasFailedCompressionAttempt && !force && !bypassTokenThreshold) - 或者在堆压力触发时不标记
hasFailedCompressionAttempt = true
🟡 中优先级:isHeapPressureHigh() 做了两次 native 调用
getHeapStatistics() 已经提供了 used_heap_size(与 process.memoryUsage().heapUsed 是同一个 V8 指标)。一次调用即可获取分子分母,避免热路径上多余的同步 native 调用和微小的 TOCTOU 不一致:
private isHeapPressureHigh(): boolean {
const stats = getHeapStatistics();
const heapLimit = stats.heap_size_limit;
if (!Number.isFinite(heapLimit) || heapLimit <= 0) return false;
return stats.used_heap_size / heapLimit >= HEAP_PRESSURE_COMPRESSION_RATIO;
}🟡 中优先级:缺失关键交互场景的测试
当前测试覆盖了 bypassTokenThreshold=true 绕过 token 门控的正常路径,但没有覆盖 hasFailedCompressionAttempt=true && bypassTokenThreshold=true 的场景。建议增加此边界 case 的测试,确认行为是否符合预期。
🔵 低优先级:getLastHistoryEntry() 的 structuredClone
这是对之前全量 clone 的显著改进。但在 nextSpeakerChecker 中实际只检查 role 和是否是 function response(纯只读操作)。如果将来性能分析发现瓶颈,可考虑提供无拷贝的只读版本。当前实现正确且安全,不急需处理。
🤖 Generated with Qwen Code
Let heap-pressure bypass also bypass the failed-attempt latch and cover the interaction with a regression test. Link: QwenLM#4185
Verify next-speaker checks use the raw last history entry for function responses while the LLM prompt still uses curated history. Link: QwenLM#4185
|
Follow-up pushed after the review pass:
I intentionally did not add a Local verification after the latest push:
|
wenshao
left a comment
There was a problem hiding this comment.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
[Critical] Heap-pressure compaction failure permanently disables all subsequent auto-compaction via the shared hasFailedCompressionAttempt latch.
File: packages/core/src/core/geminiChat.ts, line 558 (existing code, not in diff)
When bypassTokenThreshold=true and compression fails (e.g., INFLATED_TOKEN_COUNT), line 558 sets hasFailedCompressionAttempt = true unconditionally (only checking !force). On the next turn, if heap pressure has subsided, bypassTokenThreshold becomes false, and the service-level latch at chatCompressionService.ts:224 blocks normal token-threshold auto-compaction with (hasFailedCompressionAttempt && !force && !bypassTokenThreshold).
This means a failed heap-pressure compaction poisons the primary auto-compaction path for the rest of the chat session. The only recovery is a successful manual /compress.
Suggested fix:
// line 558: change from
if (!force) {
// to
if (!force && !bypassTokenThreshold) {— DeepSeek/deepseek-v4-pro via Qwen Code /review
# Conflicts: # packages/core/src/core/geminiChat.test.ts # packages/core/src/core/geminiChat.ts
wenshao
left a comment
There was a problem hiding this comment.
Reviewed the combined changes (40f9167 + c1c6f70). The latch fix is correct and the new 30-second heap-pressure cooldown addresses the primary concern from the prior review pass. Build passes, all 153 targeted tests pass.
One observation on the reactive overflow handler (geminiChat.ts:892): it still sets hasFailedCompressionAttempt unconditionally, while the heap-pressure path now uses a brief cooldown. Both are automatic safety nets but the latch policies differ. This is currently outside the diff scope and likely intentional (overflow failure is a stronger signal), but a brief comment would help future maintainers distinguish the two paths.
— claude-opus-4-7 via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
No new issues found. All prior Critical/Suggestion concerns from the previous review round are addressed. LGTM! ✅ — DeepSeek/deepseek-v4-pro via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
本地拉 b20d99130 验证:
- Critical(latch 污染)已修:
geminiChat.ts:578-583heap-pressure 失败/NOOP 走 30s cooldown 分支,不再设hasFailedCompressionAttempt;只有 L584-591 普通 token-threshold failure 才 latch。chatCompressionService.ts:229多加了!bypassTokenThreshold防御性短路。成功路径 reset 两个状态(L576-577)。 - 测试覆盖:8 条 heap-pressure / cooldown 专项断言(geminiChat.test.ts L3701-L3823),含"失败后不 latch"、"30s 后恢复"、"failed-inflated / noop 两种 cooldown 触发"、"heap stats 不可用回落 token-threshold"。
- 本地结果:tsc 0 错;156/156 单测过(geminiChat 88 + chatCompressionService 56 + nextSpeakerChecker 12);CLI smoke 跑通无回归。
L892 reactive overflow handler 仍然无条件设 latch、跟 heap-pressure 路径不一致——这是 diff 外的既有行为,做个 follow-up 注释即可,不阻塞。
LGTM ✅
tmux 真实交互测试报告(补 11:54Z approve 后追加验证)Head: 驱动方式
实测流程观察
70% heap pressure 真实触发的局限V8 PR 的 heap-pressure 触发分支只在 unit tests 里通过
综合判定
可以直接合入。 |
|
выпускайте в прод на сборку плиз - невозможно работать когда по 5 раз за день вылетает редактор))) |
代码审查(补)PR 已合并且经过了充分的 review,追加几点观察: ✅ 确认项所有前序 review 中发现的问题均已正确修复:
🟢 低优先级观察
📝 注意当前工作树中 总体: PR 设计正确、测试充分(8 条专项断言 + 148 测试过),review 流程完整。无阻塞性问题。 |
The heap-pressure-based auto-compaction mechanism (added in PR #4186) was a temporary mitigation for OOM caused by structuredClone on the hot path. With the shallow copy fix (copyContentContainer) eliminating clone peaks, this safety net is no longer necessary: - 9/9 benchmark cases pass with safety net disabled (ratio=99.0) - Peak RSS stays ≤743 MB, well below 2GB default heap limit - Claude Code uses purely token-based autocompact with no heap-pressure logic Removed: - HEAP_PRESSURE_COMPRESSION_RATIO constant and cooldown - getHeapPressureRatio() method and node:v8 import - bypassTokenThreshold parameter in ChatCompressionService - All related tests
…istoryShallow) Main landed #4286 (replace structuredClone with shallow copy) which: - Reverted #4186's heap-pressure auto-compaction safety net (#4286 removed HEAP_PRESSURE_COMPRESSION_RATIO because the underlying OOM cause was fixed by the shallow-copy refactor) - Reverted #4168's consecutiveFailures ladder back to single-shot hasFailedCompressionAttempt - Introduced getHistoryShallow() / peekLastHistoryEntry() to replace structuredClone-based history access - Added a Chinese-language design doc draft for this exact redesign Resolution strategy: - Take OUR redesign everywhere it conflicts: three-tier threshold ladder, consecutiveFailures circuit breaker, hard-rescue, token estimator, hard-rescue debug log, CompressOptions plumbing for pendingUserMessage / precomputedEffectiveTokens / trigger. - DROP all bypassTokenThreshold / heapPressureCompressionCooldownUntil / HEAP_PRESSURE_* / mockGetHeapStatistics / mockHeapPressure code (heap-pressure mechanism is gone on main; we're not reviving it). - Use main's new getHistoryShallow(true) in chatCompressionService and in the hard-tier rescue estimator path (was getHistory(true) before main's refactor; the shallow path is what other compaction call sites now use). - For chatCompressionService.test.ts inline mockChat objects, alias getHistoryShallow to the same vi.fn() as getHistory so existing .mockReturnValue() calls drive both methods. - For the design doc, keep our resolved Open Question 2 closure rationale and prepend the round-2 blockquote clarifying the Background section describes pre-redesign behavior; take main's slightly more thorough SUMMARY_RESERVE paragraph where it explains both with/without-thinking cases. - Replace the round-2 test that asserted "hard-rescue forwards consecutiveFailures=3" with a test compatible with the post-merge history-access shape (now using getHistoryShallow). 346 core tests passing; CLI typecheck clean for affected files. Pre-existing provider-config typecheck errors from main's #4287 refactor are unrelated to this PR and not touched here.
Summary
GeminiChat.tryCompress(): when V8heapUsed / heap_size_limit >= 70%, the chat attempts auto-compaction even if the token/context threshold has not been reached yet.bypassTokenThresholdtoChatCompressionServiceso heap-pressure compaction can skip only the token gate while preserving automatic compaction semantics. It deliberately does not useforce=true, becauseforce=truecarries manual/compressbehavior.GeminiChat.getLastHistoryEntry()and use it innextSpeakerCheckerto avoid cloning the full comprehensive history just to inspect the last turn.packages/core/src/core/geminiChat.ts: heap-pressure guard and the decision to keepforce=false.packages/core/src/services/chatCompressionService.ts:bypassTokenThresholdskips only the token threshold gate and still respects the existing failed-compression latch.packages/core/src/utils/nextSpeakerChecker.ts: O(1) last-history inspection instead of a full comprehensive-history clone.Validation
force=true.bypassTokenThresholdis set.nextSpeakerCheckercallsgetHistory(true)only once and usesgetLastHistoryEntry()for comprehensive-history last-turn checks.cd packages/core && npm run typecheckpassed.npm run build && npm run typecheckpassed. The build emitted existing non-blocking warnings for stale Browserslist data andpackages/vscode-ide-companion/src/utils/editorGroupUtils.tscurly lint warning, with 0 errors.bypassTokenThreshold, service-level threshold bypass, defensive last-history clone behavior, and avoiding the second full-history clone innextSpeakerChecker.Scope / Risk
force=truespecifically to avoid manual/compresssemantics and to keep the change narrow./doctor memorydiagnostics.bypassTokenThresholdis an internal optional service option.Coordination with related memory work
This PR is intentionally small and transitional because several broader memory efforts are already open or being discussed:
/doctor memorydiagnostics PRs./resumememory behavior.The goal here is to reduce default long-session OOM risk without taking ownership of the full memory-diagnostics/tool-offload redesign.
Testing Matrix
Testing matrix notes:
Linked Issues / Bugs