Skip to content

fix(core): prevent OOM by compacting API history, UI history, and triggering under memory pressure#4824

Merged
wenshao merged 31 commits into
QwenLM:mainfrom
zzhenyao:fix/microcompact-oom-V2
Jun 8, 2026
Merged

fix(core): prevent OOM by compacting API history, UI history, and triggering under memory pressure#4824
wenshao merged 31 commits into
QwenLM:mainfrom
zzhenyao:fix/microcompact-oom-V2

Conversation

@zzhenyao

@zzhenyao zzhenyao commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

What this PR does

Fixed: #4815

Three targeted fixes to prevent old-space exhaustion during long-running sessions, plus hardening from review feedback:

  1. Run microcompaction on Hook messages (goal-mode continuation)microcompactHistory was gated behind UserQuery || Cron message types. Goal-mode loops use SendMessageType.Hook, so tool outputs accumulated indefinitely without ever being truncated. Moving the call outside the type guard lets it run for all non-ToolResult message types (avoids O(history) overhead per tool call).

  2. Add compact_history step to moderate memory-pressure cleanup — When V8 heap usage reaches the hard threshold (65 % of heap limit), the MemoryPressureMonitor now calls microcompactHistory with a forced zero-minute idle threshold, replacing old tool-result content with [Old tool result content cleared]. This frees heap memory directly, unlike the existing evict_cold_cache step which only touches the FileReadCache. Compaction errors are caught and logged without rethrowing, so subsequent cleanup steps (e.g. trigger_gc) still execute.

  3. Compact UI history at dynamic V8 heap threshold — The previous two fixes only address the API-side history (GeminiChat.history). The CLI-side UI history (HistoryItem[]) — which stores thinking content (gemini_thought_content, 74 % of items in production) and tool output (tool_group.resultDisplay) — was never cleaned up. A new compactOldItems() method in useHistoryManager removes old thinking items (keeps last 20) and replaces old tool resultDisplay with [Old tool result content cleared]. Thresholds (MEMORY_UI_COMPACT_THRESHOLD, MEMORY_PHYSICAL_DELETE_THRESHOLD) are computed dynamically from V8's heap_size_limit on each check instead of at module load, since V8 grows the limit at runtime.

Hardening (from review feedback)

  • Debug logging: debugLogger.debug() arguments are guarded with isEnabled() hot-path checks — string concatenation, process.memoryUsage(), and .toFixed() no longer execute when no debug session is active.
  • Memory thresholds: MEMORY_WARNING_THRESHOLD uses min(7 GB, 85% of system RAM) to prevent OOM on low-RAM machines. UI compact and physical-delete thresholds are functions (not stale module-load constants) so they track V8's dynamically growing heap_size_limit.
  • Error resilience: compact_history cleanup step catches exceptions and logs them without rethrowing, allowing subsequent steps (clear_file_cache, trigger_gc) to still run.
  • O(1) history length: resetChat uses getHistoryLength() instead of getHistory().length to avoid a structuredClone allocation spike under heap pressure.
  • Test coverage: Added tests for SendMessageType.Cron (triggers microcompaction) and SendMessageType.Retry (skips), compact_history step paths (success, exception, client-not-initialized), and UI compaction with mixed-type history.

Why it's needed

Long-running sessions (goal mode or sustained interactive use) accumulate large amounts of data in both API history and UI history. The API side had partial coverage (microcompaction for tool outputs, tryCompress for full compaction), but the UI side had zero cleanup.

Production evidence: A goal-mode session accumulated 2,778 UI history items in ~1 hour, with 74 % being gemini_thought_content (thinking chunks). Despite each item being only ~235 bytes of text, V8 heap grew from 106 MB to 8 GB due to GC inefficiency at scale — late-session items cost ~2.8 MB each vs ~0.8 MB early on. The session OOM'd at the 8 GB V8 heap limit.

The root cause: gemini_thought_content items are synthetic (designed to be disposable) but were never disposed. Tool output in tool_group.resultDisplay (file diffs, shell output, grep results) also accumulated indefinitely on the UI side.

Reviewer Test Plan

Evidence (Before & After)

Before:

  • microcompactHistory only runs for UserQuery and Cron messages — goal-mode Hook messages are excluded
  • MemoryPressureMonitor moderate/aggressive cleanup only clears FileReadCache — history is never compacted under memory pressure
  • UI history (HistoryItem[]) has no compaction — thinking items and tool outputs accumulate for the entire session lifetime
  • Thresholds computed once at module load — stale on machines with different V8 heap limits
  • Debug log arguments eagerly evaluated even when no session is active

After:

  • microcompactHistory runs for all message types except ToolResult
  • At 65 % V8 heap usage, old API-side tool results are replaced with placeholders, freeing heap memory
  • At dynamic threshold (65 % of V8 heap_size_limit), UI history is compacted: old thinking items removed (last 20 kept), old tool resultDisplay cleared
  • Thresholds recomputed per check — adapt to V8's growing heap_size_limit
  • Debug log arguments guarded with isEnabled() — zero cost when inactive
  • Recent tool results (configurable via toolResultsNumToKeep, default 5) are preserved on the API side; recent 20 thinking items preserved on the UI side

How to Verify

  1. Build and run

    npm run build && npm run bundle
    npm start
  2. Start a goal-mode session and let it run for 30+ minutes with tool-heavy operations (file reads, shell commands). Monitor process.memoryUsage().heapUsed — it should plateau or decrease after microcompaction kicks in, instead of growing unbounded.

  3. In an interactive session, trigger moderate memory pressure (e.g., read many large files). Verify the debug log contains:

    • [COMPACT_HISTORY] cleared N tool result(s) — API-side compaction
    • [UI_COMPACT] heapUsed=...MB exceeds ...MB threshold, compacting UI history — UI-side compaction
    • [COMPACT_UI_HISTORY] removed N thought item(s), compacted M tool group(s) — UI compaction results
  4. Regression: scroll back through conversation history (Ctrl+O detailed mode) — recent tool outputs and thinking should be intact; older tool outputs will show [Old tool result content cleared] placeholders, and old thinking items will be removed. Verify the model can still re-read files and execute commands normally.

中文

这个 PR 做了什么

长时间运行的会话(goal mode 或持续交互)会不断积累 API 历史和 UI 历史,最终导致 V8 堆内存耗尽(OOM)。这个 PR 做了三件事来防住这个问题:

  1. Hook 消息也跑 microcompaction — 之前只有 UserQueryCron 类型的消息会触发历史压缩,goal mode 用的 Hook 消息被漏掉了,工具输出会无限堆积。现在除了 ToolResult(避免每次工具调用都遍历整个历史),其他消息类型都会触发。

  2. 内存压力时主动压缩 API 历史 — V8 堆使用达到 65% 阈值时,MemoryPressureMonitor 会调用 microcompactHistory,把旧的工具调用结果替换成占位符,直接释放堆内存。压缩过程中的异常会被 catch 并记录日志,不会中断后续的清理步骤(比如 trigger_gc)。

  3. UI 历史也做压缩useHistoryManager 新增 compactOldItems(),清理旧的 thinking 条目(保留最近 20 条)和旧的工具输出。阈值改为每次检测时动态计算(V8 的 heap_size_limit 会随运行时增长),不再是模块加载时算一次的死值。

其他改进

  • debug 日志参数加了 isEnabled() 守卫,没有活跃 session 时不会做字符串拼接和 process.memoryUsage() 调用
  • MEMORY_WARNING_THRESHOLD 改为 min(7GB, 85% 系统内存),低内存机器上不会设得太高
  • resetChat 用 O(1) 的 getHistoryLength() 代替 getHistory().length,避免堆压力大时的 structuredClone 开销
  • 补了 Cron/Retry 测试、compact_history 各路径的测试、混合类型 UI 压缩测试

为什么需要这个

线上实测:一个 goal mode 跑了约 1 小时,积累 2778 条 UI 历史,其中 74% 是 gemini_thought_content。虽然每条只有 ~235 字节文本,但 V8 堆从 106 MB 涨到 8 GB 后 OOM。后期每条条目实际占用 ~2.8 MB(前期 ~0.8 MB),因为 V8 GC 在大量小对象场景下效率急剧下降。

根本原因:thinking 条目本身就是设计来用完即弃的,但从来没被弃过。工具输出(文件 diff、shell 输出、grep 结果)在 UI 侧也是只增不减。

怎么验证

  1. npm run build && npm run bundle && npm start
  2. 跑一个 goal mode 会话,30 分钟以上,多做文件读写和 shell 操作。观察 process.memoryUsage().heapUsed — 压缩生效后应该趋于缓慢增长。
  3. 交互模式下触发中等内存压力(读大量文件),看 debug 日志里有没有 [COMPACT_HISTORY][UI_COMPACT] 输出。
  4. 回归检查:翻看历史记录,最近的工具输出和 thinking 应该还在,旧的会被清理或替换成占位符。模型重新读文件、执行命令不受影响。

zzhenyao added 5 commits June 6, 2026 08:30
…l mode

microcompactHistory was gated behind UserQuery || Cron message types. Goal-mode loops use SendMessageType.Hook, so tool outputs accumulated indefinitely without ever being truncated, causing old_space OOM.

Moving the call outside the type guard lets it run for all message types including Hook, ensuring tool results are compacted based on the idle time threshold regardless of message type.
When V8 heap usage reaches the hard threshold (65% of heap limit), the MemoryPressureMonitor now calls microcompactHistory with a forced zero-minute idle threshold, replacing old tool-result content with [Old tool result content cleared].

This frees heap memory directly, unlike the existing evict_cold_cache step which only touches the FileReadCache. The compact_history step is added to the hard threshold cleanup pipeline, preserving recent tool results (configurable via toolResultsNumToKeep, default 5).
@zzhenyao

zzhenyao commented Jun 6, 2026

Copy link
Copy Markdown
Contributor Author

@yiliang114 If I can get through a 10-hour goal session without V8 heap OOM, then we can revisit the threshold. The 5GB heapUsed threshold was chosen for my 8GB heap setup, for the default V8 heap limit, I'm not sure what that value is. Once we confirm the fix works, the threshold can be tuned accordingly.

Comment thread packages/cli/src/ui/hooks/useHistoryManager.ts
Comment thread packages/core/src/services/memoryPressureMonitor.ts
Comment thread packages/cli/src/ui/hooks/useHistoryManager.ts
Comment thread packages/cli/src/ui/hooks/useMemoryMonitor.ts
Comment thread packages/cli/src/ui/hooks/useGeminiStream.ts Outdated
zzhenyao and others added 5 commits June 7, 2026 06:32
…adCache after compact_history

Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com>
…recency guard for tool_group

Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com>
…nders and allow repeat triggers

Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com>
…educe V8 heap pressure

Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com>
@zzhenyao

zzhenyao commented Jun 6, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the review! @wenshao All R1 comments addressed:

  • Fixed compactOldItems kept oldest 20 thoughts instead of newest: count total thoughts first, drop from the front, keep the tail — old code incremented thoughtKept from index 0 so the first 20 (oldest) passed the filter
  • Added keep-recent-20 guard for tool_group compaction: same pattern as thoughts, count total tool_groups then skip the last N from being compacted
  • Fixed fileReadCache not disarmed after compact_history: added clear_file_cache after compact_history in hard pressure steps so stale FileReadEntry can't return file_unchanged for blanked content (Assistant is forced to re-read files it already read, after the session has been idle #4239)
  • Fixed uiCompacted one-shot flag: let uiCompacted = falseuseRef(0) with 5-minute cooldown — the old let resets on re-render and never re-fires, so UI compaction was one-shot per session
  • Removed process.memoryUsage() from THOUGHT_MERGE, THOUGHT_BUFFER, ADD_ITEM hot paths: each call does a V8 heap walk + string alloc, firing on every streamed chunk — kept the 30s periodic monitor
  • Fixed TS errors in test: HistoryItemWithoutId import moved to ../types.js, added type assertion for resultDisplay object access
  • Added 6 unit tests for compactOldItems: >20 thoughts → last 20 kept, ≤20 untouched, string resultDisplay cleared, fileDiff blanked, empty history same ref, >20 tool_groups → last 20 untouched

The original version hit OOM after 2h20m, UI compaction only fired once due to the one-shot uiCompacted flag, then memory kept climbing to 8GB. I've removed that restriction and applied the R1 fixes above; long-running validation is in progress.

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test coverage gaps — the three new OOM-prevention mechanisms lack dedicated tests:

  1. useMemoryMonitor: no test that compactOldItems is called when heapUsed exceeds 5 GB, no test for the 5-minute cooldown, no test for the interaction between the warning interval and the compaction check (see finding below about clearInterval killing both).
  2. memoryPressureMonitor.compact_history: no test for the new cleanup step (client-not-initialized, successful compaction, nothing-to-compact, exception paths).
  3. client.ts: no test that SendMessageType.Hook triggers microcompactHistory — the primary fix for goal-mode OOM.

— qwen3.7-max via Qwen Code /review

Comment thread packages/cli/src/ui/hooks/useMemoryMonitor.ts
Comment thread packages/core/src/core/client.ts
Comment thread packages/cli/src/ui/hooks/useHistoryManager.ts
Comment thread packages/core/src/services/memoryPressureMonitor.ts
zzhenyao and others added 7 commits June 7, 2026 12:00
…terval

Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com>
…ToolResult to avoid O(history) per tool call

Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com>
…instead of blanking fields

Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com>
…ure steps, fix async test drain

Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com>
… compact_history step, Hook/ToolResult gating)

Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com>
@zzhenyao

zzhenyao commented Jun 7, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the review! @wenshao All R2 comments addressed:

  1. clearInterval(warningIntervalId) was also killing the UI compaction check since they shared the same interval. Moved compaction to the debug interval (30s, never cleared).

  2. microcompactHistory was running on every message type. Added the guard back, only UserQuery/Cron/Hook. Running it on ToolResult is pointless and adds O(history) per tool call.

  3. fileDiff compaction was zeroing individual fields (fileDiff: '', originalContent: null) which renders an empty diff panel. Now replaces the whole resultDisplay with [Old tool result content cleared].

  4. Critical pressure tier was missing compact_history. If heap jumps past 80% in one interval we skip straight to critical and miss the most useful cleanup step. Added evict_cold_cache + compact_history to critical.

  5. Added 11 tests: useMemoryMonitor (compaction threshold, cooldown, warning self-destruct), memoryPressureMonitor compact_history step (null client, empty history, exception), client.ts (Hook triggers microcompact, ToolResult doesn't).

One thing from a 2h test run under the R1 version: compaction was working, UI fired 4 times, API fired 90 times, objects were being freed. But V8 GC wasn't reclaiming the space fast enough, heap kept climbing from 5.4GB to 8GB. Turns out we were only cleaning up but not forcing GC. Set QWEN_MEMORY_ENABLE_GC=1 to trigger global.gc() under critical pressure, testing again now with the R2 fixes.

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All R1/R2 issues from prior rounds have been addressed. Build passes, 226 tests pass, tsc/eslint clean.

Needs Human Review (low-confidence findings — terminal only):

  1. memoryPressureMonitor.ts:519compact_history calls microcompactHistory + setHistory but relies on the subsequent clear_file_cache step to blanket-wipe the fileReadCache. Works today due to step ordering, but the coupling is implicit and fragile.
  2. useMemoryMonitor.ts:15MEMORY_UI_COMPACT_THRESHOLD is a fixed 5 GB absolute value. The default V8 heap limit is ~4 GB; on machines where the CLI relaunches with less than 10 GB heap, this threshold is unreachable. The server-side monitor uses ratio-based thresholds (0.5/0.65/0.8) — consider consistency.
  3. Test coverage gaps: compactOldItems lacks tests for mixed-type history (interleaved thoughts + tool_groups), non-compactable resultDisplay types (TodoResultDisplay, AnsiOutputDisplay), and gemini_thought type (only gemini_thought_content tested). shouldCompact gate lacks Retry/Notification exclusion tests. compact_history step never exercises the meta-present path (setHistory call unverified).

— qwen3.7-max via Qwen Code /review

Comment thread packages/cli/src/ui/hooks/useHistoryManager.ts
zzhenyao and others added 2 commits June 7, 2026 18:27
… microcompact stats bug, and decouple fileReadCache clearing

- Broaden hasOldOutput check to `!= null` instead of only string/fileDiff
- Fix toolsKept/mediaKept counting already-cleared items as kept
- Add explicit fileReadCache.clear() in compact_history step
- Optimize useMemoryMonitor to reuse Date.now() call
- Add test for mixed-type history (interleaved thoughts + tool_groups)
- Add test for gemini_thought type (not just gemini_thought_content)
- Add test for non-string resultDisplay types (TodoResultDisplay, AnsiOutputDisplay, AgentResultDisplay)
- Add test for null resultDisplay (should not compact)
- Add test for non-compactable types (Retry, Notification, user, gemini)

Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com>
@zzhenyao

zzhenyao commented Jun 7, 2026

Copy link
Copy Markdown
Contributor Author

@wenshao Appreciate the follow-up review! R3 feedback addressed:

  1. memoryPressureMonitor.ts:519 Added explicit fileReadCache.clear() in the compact_history step so it no longer depends on the ordering of the subsequent clear_file_cache step.

  2. useMemoryMonitor.ts:15 Changed MEMORY_UI_COMPACT_THRESHOLD from a hardcoded 5GB to dynamic calculation: Math.floor(v8.getHeapStatistics().heap_size_limit * 0.65).

  3. useHistoryManager.ts Simplified the hasOldOutput check from only matching string and fileDiff to != null, covering all resultDisplay variants (AnsiOutputDisplay, AgentResultDisplay, TodoResultDisplay, etc.).

  4. microcompact.ts Fixed toolsKept/mediaKept stats bug that was counting already-cleared items as kept.

  5. Added 5 new tests for compactOldItems:

    • Mixed-type history (interleaved thoughts + tool_groups)
    • gemini_thought type (not just gemini_thought_content)
    • Non-string resultDisplay types (TodoResultDisplay, AnsiOutputDisplay, AgentResultDisplay)
    • Null resultDisplay (should not compact)
    • Non-compactable types (Retry, Notification, user, gemini)

All of the above strategies help slow down heap growth. From our test runs, the compaction triggered multiple times and reduced heap by 40 to 150MB each time. RSS stays high because V8 doesn't immediately return memory to the OS, but the heap pressure is clearly relieved.


…ING_THRESHOLD

Hardcoded "10.50 GB" failed on macOS CI where system RAM differs, causing MEMORY_WARNING_THRESHOLD to be 85% of RAM instead of 7 GB. Compute the expected text from the actual threshold value.
@zzhenyao

zzhenyao commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the review! @wenshao R6/R7 feedback addressed:

  1. useMemoryMonitor.ts MEMORY_UI_COMPACT_THRESHOLD from stale constant → function, recomputed per check against V8's dynamic heap_size_limit.
  2. memoryPressureMonitor.ts Added debug log when compact_history skips (client not initialized). Caught compaction errors without rethrow — subsequent cleanup steps like trigger_gc still run.
  3. client.test.ts Added Cron (triggers microcompaction) and Retry (skips) tests.
  4. memoryPressureMonitor.test.ts Added compact_history tests — successful compaction with fileReadCache clearing, exception handling paths.

@zzhenyao zzhenyao requested a review from wenshao June 8, 2026 04:24
@zzhenyao zzhenyao marked this pull request as ready for review June 8, 2026 04:39
Comment thread packages/cli/src/ui/hooks/useMemoryMonitor.ts Outdated
Comment thread packages/core/src/core/client.ts Outdated
Comment thread packages/cli/src/ui/hooks/useHistoryManager.ts Outdated

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R8 review at 3a98d88 (qwen3.7-max --comment). tsc/eslint clean, 423 tests pass, CI all_pass. 0 Critical, 2 Suggestion inline, 2 low-confidence terminal.

Additional finding (overlap with existing R7 comment at memoryPressureMonitor.ts:519):
The toolResultsThresholdMinutes override logic (positive→0, negative→pass-through, undefined→0) has no dedicated test verifying each branch. The existing compact_history happy-path test exercises the default value (5→0), but doesn't test the explicit -1 pass-through or a positive-value override. Consider adding test cases for: (a) toolResultsThresholdMinutes: 60 → overridden to 0, (b) toolResultsThresholdMinutes: -1 → preserved as -1 (compaction skipped), (c) toolResultsThresholdMinutes: undefined → defaults to 0.

— qwen3.7-max via Qwen Code /review

Comment thread packages/core/src/services/memoryPressureMonitor.ts
Comment thread packages/cli/src/ui/hooks/useGeminiStream.ts Outdated
zzhenyao and others added 3 commits June 8, 2026 13:26
- Wrap compactOldItems() in try/catch inside setInterval callback to prevent uncaughtException from crashing the CLI.
- Wrap microcompactHistory() in try/catch inside sendMessageStream so a compaction failure degrades gracefully instead of aborting the agent loop (critical for goal-mode Hook messages).
- Add UI_COMPACT_CLEARED_MESSAGE guard to hasOldOutput check in compactOldItems, preventing spurious re-renders when re-compacting already-cleared tool groups.

Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com>
Thought subjects can contain sensitive inferences about the user's
codebase. Log only the length to avoid persisting verbatim content
to debug log files that may be attached to bug reports.

Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com>
Add clearContextOnIdle.toolResultsThresholdMinutes to createMockConfig
type definition to fix TS2353, and add 3 tests covering the override
logic (positive→0, negative preserved, undefined→0).
@wenshao

wenshao commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

Local Verification Report

PR: #4824 fix/microcompact-oom-V2main
Title: fix(core): prevent OOM by compacting API history, UI history, and triggering under memory pressure
Scope: 15 files, +1437/−76 (core: client, memoryPressureMonitor, microcompact, debugLogger; CLI: useHistoryManager, useMemoryMonitor, useGeminiStream, clearCommand, layout tests)

Test Results

Check Result Details
Core tests (full) ✅ 10062/10067 passed 1 pre-existing failure, 4 skipped
CLI PR-specific tests ✅ 35/35 passed useHistoryManager (19), useMemoryMonitor (7), DefaultAppLayout (5), ScreenReaderAppLayout (4)
Core typecheck ✅ 0 errors Clean
CLI typecheck ✅ 0 new errors 113 total — all pre-existing (main drift)
ESLint ✅ Clean --max-warnings 0 on all 8 changed source files
Whitespace ✅ Clean git diff --check passed

Pre-existing Failure (not from this PR)

  • anthropicContentGenerator.test.ts:184 — User-Agent header mismatch (QwenCode vs claude-cli). Confirmed pre-existing on main.

Verdict

✅ PASS — All PR-specific tests pass. No new test failures, no new typecheck errors, no lint issues.


Verified locally by wenshao

Comment thread packages/core/src/core/client.ts
Comment thread packages/cli/src/ui/hooks/useMemoryMonitor.ts
zzhenyao and others added 2 commits June 8, 2026 14:59
…mpaction try/catch

verify that microcompactHistory() and compactOldItems() exceptions are caught and logged without crashing the host loop.

- client.test.ts: mock microcompactHistory to throw, verify debugLogger.error is called and sendMessage completes normally.
- useMemoryMonitor.test.ts: mock compactOldItems to throw on first call, verify error is logged and subsequent interval ticks still trigger compaction after cooldown.

Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com>
@zzhenyao

zzhenyao commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the review and approval! @wenshao

Summary of commits since last round:

  • a54fa93 (R8): try/catch guards for compactOldItems() and microcompactHistory(), idempotent UI compaction, toolResultsThresholdMinutes tests, thought-subject log sanitization.
  • 628d229 (R9): error-path test coverage for the two try/catch blocks above.
  • 7f081bc: merge latest main.

@tanzhenxin tanzhenxin added the type/bug Something isn't working as expected label Jun 8, 2026
Comment thread packages/cli/src/ui/hooks/useHistoryManager.ts
Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com>
@zzhenyao

zzhenyao commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

@wenshao Appreciate the thorough review! R10 feedback addressed:

useHistoryManager.ts Fixed totalToolGroups to count only tool groups with real output (excluding those already compacted with UI_COMPACT_CLEARED_MESSAGE).

// Should not compact non-compactable types
expect(result.current.history).toBe(before);
});
});

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The compactOldItems counting fix (commit 595701096) correctly aligns totalToolGroupsWithOutput with the map-phase predicate, but no regression test covers the specific scenario it fixes — a history containing already-compacted tool groups (where resultDisplay === UI_COMPACT_CLEARED_MESSAGE).

Without this test, a future refactor that drops the !== UI_COMPACT_CLEARED_MESSAGE guard would silently re-introduce the over-compaction bug from R10.

Suggested test:

it('should not re-compact already-compacted tool groups (idempotent)', () => {
  // 15 already-compacted + 15 fresh tool_groups = 30 total
  // → totalToolGroupsWithOutput = 15, toolGroupsToCompact = 0
  // → no additional compaction occurs
  // Call compactOldItems() a second time → same reference returned (no-op)
});

Two additional untested scenarios: (1) all tool groups already compacted, (2) tool group with mixed output (some tools real, some cleared).

— qwen3.7-max via Qwen Code /review

@DragonnZhang DragonnZhang left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No new issues found. This PR has been through 10+ thorough review rounds with 31 inline comments, all of which have been addressed in the current code. The three core changes (Hook-message microcompaction, compact_history cleanup step, and UI history compaction) are correctly implemented with proper error handling, edge case coverage, and comprehensive tests. LGTM. -- Qwen Code /review

@zzhenyao

zzhenyao commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

Thanks @DragonnZhang for the approve!

@zzhenyao

zzhenyao commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

@wenshao Thanks for the detailed review! Would it be okay if I address your R11 test suggestion in a separate follow-up PR? Would you mind reviewing this one again?

@wenshao wenshao merged commit d8464af into QwenLM:main Jun 8, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type/bug Something isn't working as expected

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: Severe OOM with qwen --resume and Escape key broken

4 participants