Skip to content

fix(core): add heap-pressure auto-compaction safety net#4186

Merged
wenshao merged 7 commits into
QwenLM:mainfrom
yiliang114:codex/4185-heap-pressure-safety
May 16, 2026
Merged

fix(core): add heap-pressure auto-compaction safety net#4186
wenshao merged 7 commits into
QwenLM:mainfrom
yiliang114:codex/4185-heap-pressure-safety

Conversation

@yiliang114

Copy link
Copy Markdown
Collaborator

Summary

  • What changed:
    • Add a temporary heap-pressure safety net in GeminiChat.tryCompress(): when V8 heapUsed / heap_size_limit >= 70%, the chat attempts auto-compaction even if the token/context threshold has not been reached yet.
    • Add bypassTokenThreshold to ChatCompressionService so heap-pressure compaction can skip only the token gate while preserving automatic compaction semantics. It deliberately does not use force=true, because force=true carries manual /compress behavior.
    • Add GeminiChat.getLastHistoryEntry() and use it in nextSpeakerChecker to avoid cloning the full comprehensive history just to inspect the last turn.
  • Why it changed:
    • Recent OOM reports show V8 heap can reach its limit before token-based compaction runs, especially in long sessions, large-context models, and large tool-output/diff workflows.
    • This is intended as a small transitional mitigation while broader memory work is in progress.
  • Reviewer focus:
    • packages/core/src/core/geminiChat.ts: heap-pressure guard and the decision to keep force=false.
    • packages/core/src/services/chatCompressionService.ts: bypassTokenThreshold skips only the token threshold gate and still respects the existing failed-compression latch.
    • packages/core/src/utils/nextSpeakerChecker.ts: O(1) last-history inspection instead of a full comprehensive-history clone.

Validation

  • Commands run:
    cd packages/core && npx vitest run src/core/geminiChat.test.ts src/services/chatCompressionService.test.ts src/utils/nextSpeakerChecker.test.ts
    cd packages/core && npm run typecheck
    npm run build && npm run typecheck
  • Prompts / inputs used:
    • No live prompt was required. Unit tests mock high V8 heap pressure and compression-service behavior.
  • Expected result:
    • Heap pressure bypasses only the token gate and does not set force=true.
    • Under-threshold compression remains a NOOP unless bypassTokenThreshold is set.
    • nextSpeakerChecker calls getHistory(true) only once and uses getLastHistoryEntry() for comprehensive-history last-turn checks.
    • Workspace build and typecheck complete successfully.
  • Observed result:
    ✓ src/services/chatCompressionService.test.ts (54 tests)
    ✓ src/utils/nextSpeakerChecker.test.ts (11 tests)
    ✓ src/core/geminiChat.test.ts (81 tests)
    
    Test Files  3 passed (3)
         Tests  146 passed (146)
    
    cd packages/core && npm run typecheck passed.
    npm run build && npm run typecheck passed. The build emitted existing non-blocking warnings for stale Browserslist data and packages/vscode-ide-companion/src/utils/editorGroupUtils.ts curly lint warning, with 0 errors.
  • Quickest reviewer verification path:
    cd packages/core && npx vitest run src/core/geminiChat.test.ts src/services/chatCompressionService.test.ts src/utils/nextSpeakerChecker.test.ts
  • Evidence:
    • Added tests cover heap-pressure bypassTokenThreshold, service-level threshold bypass, defensive last-history clone behavior, and avoiding the second full-history clone in nextSpeakerChecker.

Scope / Risk

  • Main risk or tradeoff:
    • This can start auto-compaction earlier when heap pressure is high. That is intentional, but compression itself still has peak-memory cost. This PR avoids force=true specifically to avoid manual /compress semantics and to keep the change narrow.
  • Not covered / not validated:
    • This does not claim to fix all OOM classes.
    • It does not implement large tool-result offload, stream/logging retention reduction, heap snapshots, or /doctor memory diagnostics.
    • It does not reproduce a multi-hour live OOM session locally; validation is targeted unit coverage plus build/typecheck.
  • Breaking changes / migration notes:
    • None. bypassTokenThreshold is an internal optional service option.

Coordination with related memory work

This PR is intentionally small and transitional because several broader memory efforts are already open or being discussed:

The goal here is to reduce default long-session OOM risk without taking ownership of the full memory-diagnostics/tool-offload redesign.

Testing Matrix

🍏 🪟 🐧
npm run ⚠️ ⚠️
npx ⚠️ ⚠️
Docker ⚠️ ⚠️ ⚠️
Podman ⚠️ N/A N/A
Seatbelt ⚠️ N/A N/A

Testing matrix notes:

  • Tested locally on macOS arm64 with focused core vitest, core typecheck, and root build/typecheck.
  • Windows/Linux/container execution was not run locally; the changed logic is Node/V8 process-memory logic plus TypeScript unit coverage.

Linked Issues / Bugs

@github-actions

Copy link
Copy Markdown
Contributor

📋 Review Summary

This PR introduces a temporary heap-pressure safety net for chat compression, allowing auto-compaction to trigger based on V8 heap usage (≥70% of heap_size_limit) even when the token threshold hasn't been reached. It also adds a getLastHistoryEntry() method for O(1) last-message inspection, which is used in nextSpeakerChecker to avoid cloning the full comprehensive history. The implementation is well-tested, narrowly scoped, and addresses a real OOM mitigation need while broader memory work is in progress.

🔍 General Feedback

  • Positive aspects:

    • The heap-pressure bypass is narrowly scoped and intentionally avoids force=true semantics, keeping manual /compress behavior distinct from automatic compaction.
    • Good test coverage across all three modified areas (heap-pressure bypass, bypassTokenThreshold service option, and getLastHistoryEntry() optimization).
    • The O(1) last-history inspection is a sensible performance optimization that avoids unnecessary structuredClone costs.
    • Code follows existing patterns and conventions; the changes integrate naturally with the existing compression flow.
  • Architectural notes:

    • The separation between force (manual semantics) and bypassTokenThreshold (automatic heap-pressure relief) is well-justified and correctly implemented.
    • The failed-compression latch (hasFailedCompressionAttempt) is correctly respected even when bypassing the token gate.

🎯 Specific Feedback

🟡 High

  • File: packages/core/src/core/geminiChat.ts:507-518 — The isHeapPressureHigh() check uses process.memoryUsage().heapUsed / heapLimit >= 0.7. This is a reasonable default, but the 0.7 threshold is hardcoded and may not suit all environments (e.g., containers with tight memory limits). Consider making this configurable via ChatCompression settings or an environment variable, especially since the PR description acknowledges this is a "transitional mitigation."

  • File: packages/core/src/core/geminiChat.ts:507 — The bypassTokenThreshold flag is set based on heap pressure at the start of tryCompress(), but there's no logging of the actual heap percentage that triggered it. Adding the actual ratio to the debug log (e.g., 'Heap pressure is high (X%); attempting auto-compaction...') would aid debugging and incident investigation.

🟢 Medium

  • File: packages/core/src/services/chatCompressionService.ts:237 — The condition if (!force && !bypassTokenThreshold) correctly gates the token-threshold check, but the comment above it ("steady-state path on every send") is now slightly outdated since bypassTokenThreshold introduces a non-steady-state path. Consider updating the comment to reflect that this gate is skipped when heap pressure is high.

  • File: packages/core/src/utils/nextSpeakerChecker.ts:63-73 — The variable naming is slightly confusing: comprehensiveHistory is assigned but then the comment says "If comprehensive history is empty" (lowercase). Also, the early-return check at lines 66-70 is now redundant with the getLastHistoryEntry() check at line 71. Consider consolidating to just use getLastHistoryEntry() and remove the explicit length check.

  • File: packages/core/src/core/geminiChat.ts:1188-1191 — The getLastHistoryEntry() method returns a defensive clone of only the last entry, which is correct. However, the JSDoc comment says "O(1) inspections" but doesn't explicitly note that this is still O(n) in the size of the last Content.parts array due to structuredClone(). This is a minor documentation clarification.

🔵 Low

  • File: packages/core/src/core/geminiChat.ts:58 — The constant HEAP_PRESSURE_COMPRESSION_RATIO = 0.7 is well-named, but consider adding a brief comment explaining why 70% was chosen (e.g., "Leaves 30% headroom for compression overhead and subsequent operations").

  • File: packages/core/src/services/chatCompressionService.ts:186-190 — The bypassTokenThreshold JSDoc is clear, but it might benefit from a cross-reference to the heap-pressure check in geminiChat.ts that sets this flag, helping future maintainers trace the flow.

  • File: packages/core/src/utils/nextSpeakerChecker.ts — The test file mocks both getHistory and getLastHistoryEntry, which is correct for testing. However, the test at line 278-295 ("should avoid cloning comprehensive history...") asserts getHistory is called once and getLastHistoryEntry once, but the mock setup at line 94-101 mocks both methods. Consider adding a test that verifies getHistory(/* curated */ true) is still called (for the curated history used in the LLM request) while getLastHistoryEntry() is used for the function-response check.

✅ Highlights

  • Excellent test coverage: All three test files have been updated with targeted tests for the new heap-pressure bypass, bypassTokenThreshold service option, and getLastHistoryEntry() optimization. The 146 passing tests provide strong confidence in the changes.

  • Thoughtful design: The decision to use bypassTokenThreshold instead of force=true demonstrates careful consideration of the different semantics between manual and automatic compaction. This keeps the change narrowly scoped as intended.

  • Performance-conscious: The getLastHistoryEntry() optimization avoids cloning the entire comprehensive history just to inspect the last message, which is a meaningful improvement for long conversation histories.

  • Build and typecheck pass: The PR author confirmed that npm run build && npm run typecheck completed successfully with zero errors, and all relevant test suites pass.

@yiliang114

Copy link
Copy Markdown
Collaborator Author

Additional context on why this PR is intentionally small:

Memory/OOM has become an active area of investigation in this repo this week. Several contributors are already working on related but different slices, and I want to make the relationship between those efforts and this PR explicit.

Related work already in progress

My own related PRs / prior work:

Other contributors' related work:

Why this PR exists separately

The larger memory-related efforts above are valuable, but they are either diagnostic-only, broader in scope, or touch many files and therefore need more review time. Meanwhile, recent reports show users are already hitting V8 heap OOMs in long sessions.

Based on this week's investigation and my local reproduction of the heap OOM pattern, the immediate failure mode is that token/context-based compaction can be too late. On large context-window models, especially 1M-context models, the 70% token threshold may not be reached before V8 heap is already near its limit. At that point, extra allocations from history cloning, compression setup, large tool outputs, UI/logging state, stream processing, or error handling can push the process into:

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

This PR is therefore intentionally a temporary mitigation rather than the final memory architecture.

What this PR does

The narrow fix here is:

  1. Add a V8 heap-pressure safety net based on heapUsed / heap_size_limit.
  2. When heap pressure is high, bypass only the token threshold gate for auto-compaction.
  3. Keep force=false, so this does not take the broader manual /compress path.
  4. Avoid one known full-history clone in nextSpeakerChecker by adding an O(1) last-history accessor.

What this PR does not try to solve

This PR does not implement:

  • /doctor memory diagnostics
  • heap snapshots
  • large tool-result offload
  • stream/logging retention reduction
  • a full auto-compaction threshold redesign
  • session resume storage redesign

Those should continue in the existing follow-up PRs/issues.

The intent of this PR is to provide a quick, low-conflict safety net that reduces current long-session OOM risk while the more complete memory work continues.

Comment thread packages/core/src/services/chatCompressionService.ts
Comment thread packages/core/src/core/geminiChat.ts
Comment thread packages/core/src/core/geminiChat.ts
Comment thread packages/core/src/utils/nextSpeakerChecker.ts Outdated
Comment thread packages/core/src/services/chatCompressionService.ts

@doudouOUC doudouOUC left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

整体评价:PR 的设计思路是正确的——bypassTokenThresholdforce=true 的语义区分很清晰,nextSpeakerChecker 的优化也干净无回归。以下是几个需要关注的问题:


🟠 高优先级:hasFailedCompressionAttempt 闩锁使安全网变成一次性机制

位置: chatCompressionService.ts 第 216 行附近的前置门控与新增的 bypassTokenThreshold 的交互

当堆压力触发压缩(bypassTokenThreshold=true)但压缩失败后,hasFailedCompressionAttempt 被置为 true。后续消息中即使堆压力持续升高,服务在前置门控就提前退出(NOOP),永远到不了 token 阈值门控:

// 第 216 行 — hasFailedCompressionAttempt 门控,不受 bypassTokenThreshold 影响
if (threshold <= 0 || (hasFailedCompressionAttempt && !force)) {
  return { newHistory: null, info: { ... NOOP } };
}

// 第 234 行 — bypassTokenThreshold 只绕过了这里
if (!force && !bypassTokenThreshold) { ... }

影响: 安全网实质上是一次性的——一次压缩失败后就永久失效。如果堆继续增长,OOM 仍会发生。

建议修复:

  • bypassTokenThreshold 也纳入第 216 行门控: (hasFailedCompressionAttempt && !force && !bypassTokenThreshold)
  • 或者在堆压力触发时不标记 hasFailedCompressionAttempt = true

🟡 中优先级:isHeapPressureHigh() 做了两次 native 调用

getHeapStatistics() 已经提供了 used_heap_size(与 process.memoryUsage().heapUsed 是同一个 V8 指标)。一次调用即可获取分子分母,避免热路径上多余的同步 native 调用和微小的 TOCTOU 不一致:

private isHeapPressureHigh(): boolean {
  const stats = getHeapStatistics();
  const heapLimit = stats.heap_size_limit;
  if (!Number.isFinite(heapLimit) || heapLimit <= 0) return false;
  return stats.used_heap_size / heapLimit >= HEAP_PRESSURE_COMPRESSION_RATIO;
}

🟡 中优先级:缺失关键交互场景的测试

当前测试覆盖了 bypassTokenThreshold=true 绕过 token 门控的正常路径,但没有覆盖 hasFailedCompressionAttempt=true && bypassTokenThreshold=true 的场景。建议增加此边界 case 的测试,确认行为是否符合预期。


🔵 低优先级:getLastHistoryEntry()structuredClone

这是对之前全量 clone 的显著改进。但在 nextSpeakerChecker 中实际只检查 role 和是否是 function response(纯只读操作)。如果将来性能分析发现瓶颈,可考虑提供无拷贝的只读版本。当前实现正确且安全,不急需处理。


🤖 Generated with Qwen Code

Let heap-pressure bypass also bypass the failed-attempt latch and cover the interaction with a regression test.

Link: QwenLM#4185
Verify next-speaker checks use the raw last history entry for function responses while the LLM prompt still uses curated history.

Link: QwenLM#4185
@yiliang114

Copy link
Copy Markdown
Collaborator Author

Follow-up pushed after the review pass:

  • aa9bbb838 fixes the failed-attempt latch interaction so heap-pressure auto-compaction is not disabled after one failed attempt.
  • It also logs the actual heap pressure ratio, uses a single getHeapStatistics() read, and documents the raw-history lookup in nextSpeakerChecker.
  • 7b1730a5 adds a regression test proving nextSpeakerChecker still uses the raw last history entry for functionResponse detection while the LLM prompt path uses curated history.

I intentionally did not add a getHistoryLength() small-history gate. A chat can have only a few entries but still contain one very large tool/function-response payload, which is one of the cases this safety net is meant to compact. Entry count would be a weak proxy and could skip the mitigation in exactly that case.

Local verification after the latest push:

  • cd packages/core && npx vitest run src/core/geminiChat.test.ts src/services/chatCompressionService.test.ts src/utils/nextSpeakerChecker.test.ts -> 148 tests passed
  • cd packages/core && npm run typecheck -> passed
  • npm run build && npm run typecheck -> passed, with only the existing vscode companion curly warning / Browserslist notices

Comment thread packages/core/src/core/geminiChat.ts
Comment thread packages/core/src/services/chatCompressionService.ts

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ CI has 3 failing checks: Post Coverage Comment, Test (macos-latest, Node 22.x), Test (windows-latest, Node 22.x).

— DeepSeek/deepseek-v4-pro via Qwen Code /review

Comment thread packages/core/src/core/geminiChat.ts
Comment thread packages/core/src/core/geminiChat.ts
Comment thread packages/core/src/core/geminiChat.ts
Comment thread packages/core/src/services/chatCompressionService.ts Outdated
Comment thread packages/core/src/core/geminiChat.test.ts
Comment thread packages/core/src/core/geminiChat.test.ts Outdated

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] Heap-pressure compaction failure permanently disables all subsequent auto-compaction via the shared hasFailedCompressionAttempt latch.

File: packages/core/src/core/geminiChat.ts, line 558 (existing code, not in diff)

When bypassTokenThreshold=true and compression fails (e.g., INFLATED_TOKEN_COUNT), line 558 sets hasFailedCompressionAttempt = true unconditionally (only checking !force). On the next turn, if heap pressure has subsided, bypassTokenThreshold becomes false, and the service-level latch at chatCompressionService.ts:224 blocks normal token-threshold auto-compaction with (hasFailedCompressionAttempt && !force && !bypassTokenThreshold).

This means a failed heap-pressure compaction poisons the primary auto-compaction path for the rest of the chat session. The only recovery is a successful manual /compress.

Suggested fix:

// line 558: change from
if (!force) {
// to
if (!force && !bypassTokenThreshold) {

— DeepSeek/deepseek-v4-pro via Qwen Code /review

Comment thread packages/core/src/core/geminiChat.ts Outdated
Comment thread packages/core/src/core/geminiChat.ts

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the combined changes (40f9167 + c1c6f70). The latch fix is correct and the new 30-second heap-pressure cooldown addresses the primary concern from the prior review pass. Build passes, all 153 targeted tests pass.

One observation on the reactive overflow handler (geminiChat.ts:892): it still sets hasFailedCompressionAttempt unconditionally, while the heap-pressure path now uses a brief cooldown. Both are automatic safety nets but the latch policies differ. This is currently outside the diff scope and likely intentional (overflow failure is a stronger signal), but a brief comment would help future maintainers distinguish the two paths.

— claude-opus-4-7 via Qwen Code /review

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No new issues found. All prior Critical/Suggestion concerns from the previous review round are addressed. LGTM! ✅ — DeepSeek/deepseek-v4-pro via Qwen Code /review

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

本地拉 b20d99130 验证:

  • Critical(latch 污染)已修geminiChat.ts:578-583 heap-pressure 失败/NOOP 走 30s cooldown 分支,不再设 hasFailedCompressionAttempt;只有 L584-591 普通 token-threshold failure 才 latch。chatCompressionService.ts:229 多加了 !bypassTokenThreshold 防御性短路。成功路径 reset 两个状态(L576-577)。
  • 测试覆盖:8 条 heap-pressure / cooldown 专项断言(geminiChat.test.ts L3701-L3823),含"失败后不 latch"、"30s 后恢复"、"failed-inflated / noop 两种 cooldown 触发"、"heap stats 不可用回落 token-threshold"。
  • 本地结果:tsc 0 错;156/156 单测过(geminiChat 88 + chatCompressionService 56 + nextSpeakerChecker 12);CLI smoke 跑通无回归。

L892 reactive overflow handler 仍然无条件设 latch、跟 heap-pressure 路径不一致——这是 diff 外的既有行为,做个 follow-up 注释即可,不阻塞。

LGTM ✅

@wenshao

wenshao commented May 16, 2026

Copy link
Copy Markdown
Collaborator

tmux 真实交互测试报告(补 11:54Z approve 后追加验证)

Head: b20d9913 — 与 11:54Z approve 时一致,CI 仍全绿,mergeStateStatus CLEAN

驱动方式

tmux 起 220×50 pane 跑真实 qwen TUI(DeepSeek 后端,--max-old-space-size=128 限 V8 old-space heap 到 128MB,DEBUG=1 捕获完整 debug log),通过 tmux send-keys 注入真键盘。

实测流程

[boot]               CLI 启动(128MB heap 限制下)
[turn 1] X reply     "Reply with single character X" → ✦ X
[turn 2] Y reply     "Reply with single character Y" → ✦ Y
[turn 3] Z reply     "Reply with single character Z" → ✦ Z
[turn 4] @ ref       "@big.txt count the chars"      → 模型调 shell `wc -c`,回 `28,329,950 字符`
[/about]             查内存与 context

观察

结果
CLI 在 --max-old-space-size=128 上下文里启动 ✅ 无 OOM、无 startup error
4 轮 LLM 对话(含 @big.txt 引用 27MB 随机 base64 文件) ✅ 全部回复正常,无崩溃
nextSpeakerCheckergetLastHistoryEntry() O(1) 优化 ✅ 每轮模型回复后正常推进,无回归
内存使用(/about 报告 Memory Usage) RSS 266.9 MB(--max-old-space-size 只限 V8 old-space heap,进程总 RSS 还有 native + V8 metadata)
Context 占用 1.9% — 远低于 token 阈值
debug log 含 tryCompress / Heap pressure / bypassToken 关键字 0 行(163 行 debug log 全 grep)
无意外触发的 compaction ✅ 在 context 1.9% + heap 未到 70% 时,safety net 正确 no-op

70% heap pressure 真实触发的局限

V8 heap_size_limit--max-old-space-size=128 限到 128MB,要让 used_heap_size / heap_size_limit >= 0.7 得让 V8 old-space 实际用到 ~90MB。我尝试用 @big.txt 加载 27MB 文件让 context 膨胀,但模型聪明地调了 shell wc -c 而不是把文件读进 context——这正是这个 OOM 报告的痛点:heap pressure 是长 session、大 context model、大 tool output 累积出来的低概率事件,手动很难在短时间内复现

PR 的 heap-pressure 触发分支在 unit tests 里通过 mockGetHeapStatistics(750) 模拟 75% 来真正验证(已 156/156 pass,含失败后不 latch + 30s cooldown + heap stats 不可用回落 token-threshold 等 8 条专项断言)。tmux 实测能确认的两点:

  1. ✅ Safety net 代码不会让 CLI 启动/运行失败
  2. ✅ 不会在 heap 未到阈值时假阳性触发 compaction(这本身是回归测试,因为新增分支引入了额外的状态读取与判断)

综合判定

  • 11:54Z 之后无新 commit、无新 review,CI 持续全绿
  • tmux 跑了 4 轮真 LLM 对话 + memory 巡查 + debug log 全文 grep,没有任何回归
  • Unit tests 覆盖了 70% pressure 触发路径;tmux 验证了 no-pressure 路径不退化
  • 11:29Z 那条 L892 reactive overflow handler 跟 heap-pressure 政策不一致的注释建议属于 follow-up,不阻塞

可以直接合入。

@wenshao wenshao merged commit 2c200a3 into QwenLM:main May 16, 2026
11 checks passed
@supercargotim-rgb

Copy link
Copy Markdown

выпускайте в прод на сборку плиз - невозможно работать когда по 5 раз за день вылетает редактор)))

@yiliang114

Copy link
Copy Markdown
Collaborator Author

代码审查(补)

PR 已合并且经过了充分的 review,追加几点观察:

✅ 确认项

所有前序 review 中发现的问题均已正确修复:

  • latch 污染chatCompressionService.ts:229 三重门控 + geminiChat.ts:578 heap-pressure 失败走 30s cooldown 而非设 latch
  • 双重 getHeapStatistics() 调用 — 已改为单次调用 getHeapPressureRatio()
  • heap ratio 日志debugLogger.warn 输出精度百分比

🟢 低优先级观察

  1. getLastHistoryEntry() JSDoc: 注释说避免 O(history) clone,但 structuredClone 本身开销跟最后一条 entry 大小成正比。可加上 "cloning cost is proportional to the last entry's size" 澄清。

  2. Reactive overflow handler (geminiChat.ts:892) 与 heap-pressure 路径策略不一致: overflow handler 仍无条件设 hasFailedCompressionAttempt latch,而 heap-pressure 路径用 30s cooldown。wenshao 在 final review 中也指出了这点,建议 follow-up 加注释区分两条路径。

  3. HEAP_PRESSURE_COMPRESSION_RATIO = 0.7 硬编码: PR 描述已标记为过渡方案,可接受。

📝 注意

当前工作树中 HEAP_PRESSURE_COMPRESSION_RATIO 已改为 99.0(带注释 "Disable"),这说明后续有其他 PR 在处理这个问题。


总体: PR 设计正确、测试充分(8 条专项断言 + 148 测试过),review 流程完整。无阻塞性问题。

yiliang114 added a commit that referenced this pull request May 20, 2026
The heap-pressure-based auto-compaction mechanism (added in PR #4186) was
a temporary mitigation for OOM caused by structuredClone on the hot path.
With the shallow copy fix (copyContentContainer) eliminating clone peaks,
this safety net is no longer necessary:

- 9/9 benchmark cases pass with safety net disabled (ratio=99.0)
- Peak RSS stays ≤743 MB, well below 2GB default heap limit
- Claude Code uses purely token-based autocompact with no heap-pressure logic

Removed:
- HEAP_PRESSURE_COMPRESSION_RATIO constant and cooldown
- getHeapPressureRatio() method and node:v8 import
- bypassTokenThreshold parameter in ChatCompressionService
- All related tests
LaZzyMan added a commit that referenced this pull request May 21, 2026
…istoryShallow)

Main landed #4286 (replace structuredClone with shallow copy) which:
  - Reverted #4186's heap-pressure auto-compaction safety net (#4286
    removed HEAP_PRESSURE_COMPRESSION_RATIO because the underlying OOM
    cause was fixed by the shallow-copy refactor)
  - Reverted #4168's consecutiveFailures ladder back to single-shot
    hasFailedCompressionAttempt
  - Introduced getHistoryShallow() / peekLastHistoryEntry() to replace
    structuredClone-based history access
  - Added a Chinese-language design doc draft for this exact redesign

Resolution strategy:
  - Take OUR redesign everywhere it conflicts: three-tier threshold
    ladder, consecutiveFailures circuit breaker, hard-rescue, token
    estimator, hard-rescue debug log, CompressOptions plumbing for
    pendingUserMessage / precomputedEffectiveTokens / trigger.
  - DROP all bypassTokenThreshold / heapPressureCompressionCooldownUntil /
    HEAP_PRESSURE_* / mockGetHeapStatistics / mockHeapPressure code
    (heap-pressure mechanism is gone on main; we're not reviving it).
  - Use main's new getHistoryShallow(true) in chatCompressionService and
    in the hard-tier rescue estimator path (was getHistory(true) before
    main's refactor; the shallow path is what other compaction call
    sites now use).
  - For chatCompressionService.test.ts inline mockChat objects, alias
    getHistoryShallow to the same vi.fn() as getHistory so existing
    .mockReturnValue() calls drive both methods.
  - For the design doc, keep our resolved Open Question 2 closure
    rationale and prepend the round-2 blockquote clarifying the
    Background section describes pre-redesign behavior; take main's
    slightly more thorough SUMMARY_RESERVE paragraph where it explains
    both with/without-thinking cases.
  - Replace the round-2 test that asserted "hard-rescue forwards
    consecutiveFailures=3" with a test compatible with the post-merge
    history-access shape (now using getHistoryShallow).

346 core tests passing; CLI typecheck clean for affected files.
Pre-existing provider-config typecheck errors from main's #4287
refactor are unrelated to this PR and not touched here.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants