Skip to content

fix(core): eliminate OOM from debugResponses accumulation#4982

Merged
wenshao merged 2 commits into
QwenLM:mainfrom
zzhenyao:fix/remove-debug-responses
Jun 12, 2026
Merged

fix(core): eliminate OOM from debugResponses accumulation#4982
wenshao merged 2 commits into
QwenLM:mainfrom
zzhenyao:fix/remove-debug-responses

Conversation

@zzhenyao

@zzhenyao zzhenyao commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

What this PR does

Removes debugResponses array and extractUsageFromGeminiClient. Both are dead code.

Turn.debugResponses pushed every streaming chunk into an array. Nothing in production ever read it. extractUsageFromGeminiClient was exported but never imported by production code. Non-interactive mode uses computeUsageFromMetrics from the telemetry pipeline instead.

Closes: #4815
Follow-up to #4824

Why it is needed

Every streaming chunk was pushed into debugResponses and held in memory for the lifetime of the Turn object. In goal mode, nested turns are not cleaned up between iterations, so the array keeps growing. A single long session can accumulate gigabytes of response objects that are never read, leading to OOM.

debugResponses holds a reference to every streaming chunk for the lifetime of the Turn object. Two scenarios cause OOM:

  • Non-goal mode: a single turn with the model stuck in a thinking loop produces hundreds of chunks, each a full GenerateContentResponse. The array grows unbounded within one turn.
  • Goal mode: nested sendMessageStream calls hold inner Turn objects as references until the outer call completes. Each nested turn accumulates its own debugResponses. The chain never releases until the top-level goal finishes, so memory grows across the entire nesting depth.

Reviewer Test Plan

Before / After

Before: Turn has a debugResponses array that collects every chunk. extractUsageFromGeminiClient is exported but unused.

After: Neither exists. Zero references left.

How to verify

git grep "debugResponses"
git grep "extractUsageFromGeminiClient"

npm run build
npm run typecheck

cd packages/core && npx vitest run src/core/turn.test.ts
cd packages/cli && npx vitest run src/utils/nonInteractiveHelpers.test.ts
cd packages/cli && npx vitest run src/nonInteractiveCli.test.ts

中文

删除 debugResponses 数组和 extractUsageFromGeminiClient 函数。

每个 streaming chunk 都被 push 进 debugResponses 并在 Turn 生命周期内一直持有。goal 模式下嵌套的 turn 不会被清理,数组持续增长,单次长会话可以累积 GB 级别的无用对象,最终 OOM。

extractUsageFromGeminiClient 从初始提交起就没有被生产代码调用过,非交互模式实际走的 computeUsageFromMetrics

@zzhenyao zzhenyao marked this pull request as ready for review June 11, 2026 03:34

@DragonnZhang DragonnZhang left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found. Clean dead-code removal: debugResponses array, getDebugResponses(), and extractUsageFromGeminiClient are fully removed with zero remaining references. Test updates are consistent — the usage test in nonInteractiveCli.test.ts correctly relies on computeUsageFromMetrics via setupMetricsMock, matching the production path. CI test failures (all 3 platforms) appear pre-existing and unrelated to this change; Lint and CodeQL pass. LGTM.

@zzhenyao zzhenyao changed the title fix(core): remove dead debugResponses array to prevent OOM fix(core): eliminate OOM from debugResponses accumulation Jun 11, 2026
@zzhenyao

zzhenyao commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the review! @DragonnZhang

  • Merged main into this branch

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found. Clean dead-code removal: debugResponses array, getDebugResponses(), and extractUsageFromGeminiClient are fully removed with zero remaining references (git grep confirmed). Test updates are consistent — usage test in nonInteractiveCli.test.ts correctly migrated from mockGetDebugResponses to setupMetricsMock / computeUsageFromMetrics path. Build passes, tsc 0 errors, eslint 0 errors, 120 tests pass across all three affected test files. CI 13/13 green. ✅ — qwen3.7-max via Qwen Code /review

@zzhenyao zzhenyao requested a review from DragonnZhang June 12, 2026 08:59
@wenshao

wenshao commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

@qwen-code /triage

@qwen-code-ci-bot

Copy link
Copy Markdown
Collaborator

Thanks for the PR! @zzhenyao

Template is mostly complete — has clear "What this PR does", "Why it is needed", "Reviewer Test Plan", and bilingual description. Minor gaps: missing the "Tested on" OS table and "Risk & Scope" section from the template, but the substantive content is solid.

On direction: this is a clean fix for a real P1 bug (#4815 — OOM from debugResponses accumulation). Removing dead code that holds references to every streaming chunk is exactly the right approach. No product-direction concerns whatsoever.

On approach: scope is perfect. The PR removes the unused debugResponses array from Turn, the orphaned extractUsageFromGeminiClient helper, and all associated test code. +2/-214 across 6 files — there's nothing to cut. I verified both symbols have zero references after the patch. The simplest possible fix.

Moving on to code review. 🔍

中文说明

感谢贡献!

模板基本完整——有清晰的"做什么"、"为什么"、"测试计划"和双语描述。小缺憾:缺少模板中的"Tested on"操作系统表格和"Risk & Scope"部分,但实质内容充分。

方向:这是修复 P1 OOM bug (#4815) 的正确方案。删除持有所有 streaming chunk 引用的死代码,方向毫无疑问。

方案:范围恰当。删除 Turn 中未使用的 debugResponses 数组、孤立的 extractUsageFromGeminiClient 函数及相关测试。+2/-214,6 个文件,无可精简。已验证两个符号在补丁后零引用。最简修复。

进入代码审查 🔍

Qwen Code · qwen3.7-max

@qwen-code-ci-bot

Copy link
Copy Markdown
Collaborator

Code Review

Independent proposal (before reading the diff): Remove debugResponses field from Turn, its accumulation in the streaming loop, its reset in the retry handler, and the getDebugResponses() getter. Then remove extractUsageFromGeminiClient from nonInteractiveHelpers.ts since it depends on the now-removed getter. Update tests. Nothing else needed.

Comparison with the diff: The PR matches my proposal exactly — same files, same lines, same approach. Six files touched: turn.ts (remove field, push, retry clear, getter), nonInteractiveHelpers.ts (remove function), test files (remove assertions and mocks), and a design doc (remove mention). Nothing missed, nothing extra.

No correctness bugs, no security concerns, no AGENTS.md violations. This is a clean dead-code removal with zero residual references confirmed by git grep.

Build & Tests

$ npm run build
✓ build succeeded (0 errors)

$ npm run typecheck
✓ tsc --noEmit passed for all packages

$ cd packages/core && npx vitest run src/core/turn.test.ts
✓ 23 tests passed

$ cd packages/cli && npx vitest run src/utils/nonInteractiveHelpers.test.ts
✓ 52 tests passed

$ cd packages/cli && npx vitest run src/nonInteractiveCli.test.ts
✓ 45 tests passed (1 skipped, pre-existing)

Real-Scenario Verification

This PR removes dead code that causes OOM — the bug manifests only in long-running sessions (minutes to hours of streaming). A quick tmux prompt can't reproduce 8GB heap exhaustion, but I verified the core claim: both symbols are completely absent from the PR's built output.

$ node .qwen/worktrees/triage/dist/cli.js --version
0.17.1

$ grep -c 'debugResponses' .qwen/worktrees/triage/dist/cli.js
0
$ grep -c 'extractUsageFromGeminiClient' .qwen/worktrees/triage/dist/cli.js
0

Before (installed v0.18.0):

$ grep -c 'debugResponses' /usr/local/lib/node_modules/@qwen-code/qwen-code/chunks/chunk-IWAYOW5Q.js
4

After (this PR's bundle):

$ grep -c 'debugResponses' dist/cli.js
0

Source-level verification on main branch (before PR):

$ git show main:packages/core/src/core/turn.ts | grep -n 'debugResponses'
291:  private debugResponses: GenerateContentResponse[] = [];
331:          this.debugResponses = [];
358:        this.debugResponses.push(resp);
483:    return this.debugResponses;

$ git show main:packages/cli/src/utils/nonInteractiveHelpers.ts | grep -n 'extractUsageFromGeminiClient'
105:export function extractUsageFromGeminiClient(

Source-level verification on PR branch (after):

$ git grep 'debugResponses' pr-4982
(no matches)

$ git grep 'extractUsageFromGeminiClient' pr-4982
(no matches)

Both symbols fully removed — from source, tests, and the production bundle.

中文说明

代码审查

独立方案(读 diff 前): 删除 Turn 中的 debugResponses 字段、streaming 循环中的 push、retry 处理中的重置、以及 getDebugResponses() getter。然后删除依赖该 getter 的 extractUsageFromGeminiClient。更新测试。无需其他改动。

与 diff 对比: PR 与我的方案完全一致——同样的文件、同样的代码行、同样的方法。6 个文件:turn.tsnonInteractiveHelpers.ts、测试文件、设计文档。无遗漏,无多余。

无正确性 bug、无安全隐患、无 AGENTS.md 违规。干净的死代码删除,git grep 确认零残留引用。

构建与测试

全部通过:build ✓、typecheck ✓、3 个测试文件共 120 个测试全部通过。

真实场景验证

OOM bug 需要长时间会话才能复现(数分钟到数小时的 streaming)。快速 tmux 测试无法重现 8GB 堆溢出,但已验证核心声明:两个符号在 PR 构建产物中完全不存在。

源码级别验证:main 分支有 4 处 debugResponses 引用 + extractUsageFromGeminiClient 导出函数;PR 分支全部为零匹配。

Qwen Code · qwen3.7-max

@qwen-code-ci-bot

Copy link
Copy Markdown
Collaborator

Final Assessment

This PR is exactly what it should be: a surgical removal of dead code that causes a P1 OOM bug. The motivation is clear (users hitting 8GB heap exhaustion in long sessions), the fix is minimal (+2/-214, 6 files), and the execution is clean.

My independent proposal matched the diff line-for-line. There's nothing to simplify further — you can't make "delete unused code" any simpler.

Build, typecheck, and all 120 tests across the three affected test files pass. Both symbols confirmed absent from the production bundle. The source-level git grep verification on the PR branch returns zero matches for both debugResponses and extractUsageFromGeminiClient.

The PR description is one of the better ones I've seen — it explains both the root cause (chunk accumulation in the Turn object's lifetime) and the two specific OOM scenarios (non-goal mode thinking loops, goal mode nested turns). The bilingual format and reviewer test plan make review straightforward.

Approving. ✅

中文说明

最终评估

这个 PR 恰如其分:精确删除导致 P1 OOM bug 的死代码。动机清晰(用户在长会话中遇到 8GB 堆溢出),修复最小化(+2/-214,6 个文件),执行干净。

我的独立方案与 diff 逐行一致。无法进一步简化——"删除未使用的代码"已经是最简形式。

构建、类型检查和全部 120 个测试通过。两个符号在生产产物中确认为零引用。PR 描述优秀——解释了根因和两种具体的 OOM 场景。双语格式和测试计划让审查很方便。

批准合并 ✅

Qwen Code · qwen3.7-max

@qwen-code-ci-bot qwen-code-ci-bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, looks ready to ship. ✅

@wenshao

wenshao commented Jun 12, 2026

Copy link
Copy Markdown
Collaborator

Local verification report (maintainer)

Verified this PR locally with real runs, not just the test suite. Since the PR's merge-base is ~50 commits behind main, all checks below were run on the merge result of this PR + latest origin/main (92c4a82) to rule out semantic conflicts that GitHub's MERGEABLE flag cannot catch.

1. Dead-code claim — confirmed, plus one stronger finding

  • On latest main, the only references to debugResponses / getDebugResponses / extractUsageFromGeminiClient are the definitions themselves and their test files. No production reader exists.
  • After merging this PR into latest main: whole-repo case-insensitive grep finds zero remaining references (all file types, not just .ts).
  • Stronger than "unused": extractUsageFromGeminiClient is not just unused — it is a no-op. It probes geminiClient.getChat().getDebugResponses(), but on current main the GeminiChat class has no getDebugResponses method at all (only Turn has one, and the helper never touches Turn). The helper can only ever return undefined, and the mock in nonInteractiveCli.test.ts was mocking a method that doesn't exist on the real class. The non-interactive usage path goes through computeUsageFromMetrics (4 call sites in nonInteractiveCli.ts), exactly as the PR description says.
  • One note for the record: Turn is re-exported from core's public index.ts (export * from './core/turn.js'), so removing getDebugResponses() technically shrinks the public surface of @qwen-code/qwen-code-core for external consumers. In-repo impact is zero — nothing imports it anywhere in the monorepo.

2. Build / typecheck / tests (merged tree)

Check Result
npm run build ✅ exit 0
npm run typecheck ✅ exit 0
core targeted: turn.test.ts ✅ 23 passed
cli targeted: nonInteractiveHelpers.test.ts + nonInteractiveCli.test.ts ✅ 97 passed, 1 skipped (the skip is it.skip on main already, unrelated)
core full suite ✅ 405 files, 11466 passed, 4 skipped
cli full suite ⚠️ 8123 passed, 26 failed in 9 files — none attributable to this PR (see below)

On the cli full-suite failures: all 26 are pre-existing on latest main or load-induced flakes, in files that import nothing this PR touches (AuthDialog, InputPrompt, useAtCompletion, server, …).

  • AuthDialog.test.tsx fails identically on pure origin/main and on the merged tree (isolation run: 11 failed / 13 passed on both) — pre-existing on latest main in this environment (vi.waitFor timeouts).
  • useAtCompletion.test.ts fails on both trees as well.
  • The other 7 files only fail under CPU contention (two full suites ran concurrently) and pass when re-run on an idle machine; the failing subset also varies between back-to-back runs on the same tree, which is the signature of timing flakes, not a regression.

3. Real-run smoke tests (bundled CLI from the merged tree, driven via tmux)

  • Non-interactive (-p "What is 2+2? ..." --output-format json): exit 0, correct answer, and the final result event carries real usage — {"input_tokens":19365,"output_tokens":9,"total_tokens":19374} — produced by computeUsageFromMetrics. The removed helper is demonstrably not part of this path.
  • Interactive (tmux): multi-chunk streaming response renders fully; a tool-call turn (ReadFile) executes and returns correct content; /stats reports correct token accounting (58.3k input, 96.9% cached). Turn.run behaves identically without debugResponses.

4. Memory-retention benchmark — the OOM claim holds empirically

Drove Turn.run directly (mocked chat.sendMessageStream) with 2000 chunks carrying ~63 MB of unique text, measured with node --expose-gc on both builds:

Build Retained while Turn alive After Turn released
main (with debugResponses) 469 MB 0.1 MB
this PR (merged) 0 MB 0 MB

Both builds emitted an identical event stream (2001 events) — behavior parity. A WeakRef probe confirms the mechanism: on main, chunk objects are uncollectable for the entire Turn lifetime and become collectable the moment the Turn is released — exactly the retention pattern described in the PR (long thinking loops within one turn; nested goal-mode turns held until the outer call completes). (The 469 MB > 63 MB delta is benchmark string-rope overhead; production chunks are flat JSON.parse strings, so real retention ≈ payload + object overhead — still unbounded and proportional to chunk count.)

Verdict

Safe to merge. Dead code with zero references after merging into latest main; build, typecheck, all targeted tests, and the full core suite are green (the cli suite's 26 failures are pre-existing on pure main or load-induced flakes — none attributable to this PR); real CLI behavior is intact in both modes; and the memory claim is reproducible, not just theoretical.

中文版(Chinese version)

本地验证报告(维护者)

本 PR 在本地做了真实运行验证,而不仅是跑测试。由于 PR 的 merge-base 落后 main 约 50 个提交,以下所有检查都在「本 PR 合入最新 origin/main(92c4a82)的合并结果」上进行,以排除 GitHub MERGEABLE 标志无法发现的语义冲突。

1. 死代码声明 — 成立,且有一个更强的发现

  • 在最新 main 上,debugResponses / getDebugResponses / extractUsageFromGeminiClient 的引用只有定义本身和测试文件,生产代码零调用。
  • 合入最新 main 后:全仓库大小写不敏感 grep(不限文件类型)零残留引用
  • 比"未使用"更强:extractUsageFromGeminiClient 不只是没人调用——它本身就是空操作。它探测 geminiClient.getChat().getDebugResponses(),但当前 main 上的 GeminiChat 类根本没有 getDebugResponses 方法(只有 Turn 有,而该 helper 从不接触 Turn)。这个 helper 永远只能返回 undefined,nonInteractiveCli.test.ts 里 mock 的是真实类上不存在的方法。非交互模式的 usage 实际走 computeUsageFromMetrics(nonInteractiveCli.ts 中 4 处调用),与 PR 描述一致。
  • 备注:Turn 通过 core 的公开 index.ts 对外导出(export * from './core/turn.js'),删除 getDebugResponses() 严格来说缩小了 @qwen-code/qwen-code-core 对外部消费者的公开面。仓库内影响为零——monorepo 中无任何导入。

2. 构建 / 类型检查 / 测试(合并树)

检查 结果
npm run build ✅ exit 0
npm run typecheck ✅ exit 0
core 定向:turn.test.ts ✅ 23 通过
cli 定向:nonInteractiveHelpers.test.ts + nonInteractiveCli.test.ts ✅ 97 通过,1 跳过(该 skip 在 main 上即存在,与本 PR 无关)
core 全套 ✅ 405 文件,11466 通过,4 跳过
cli 全套 ⚠️ 8123 通过,9 个文件 26 个失败 — 均与本 PR 无关(见下)

关于 cli 全套的失败:26 个失败全部是最新 main 上预先存在的问题或负载导致的 flaky,所在文件(AuthDialogInputPromptuseAtCompletionserver 等)不导入本 PR 触及的任何代码。

  • AuthDialog.test.tsxorigin/main 和合并树上失败完全一致(隔离运行:两边均 11 失败 / 13 通过)——是最新 main 在本环境下预先存在的问题(vi.waitFor 超时)。
  • useAtCompletion.test.ts 同样在两棵树上都失败。
  • 其余 7 个文件仅在 CPU 争抢下失败(当时 core/cli 两个全套并发运行),空闲机器重跑即通过;且同一棵树连续两次批量运行的失败子集都不相同——这是时序 flaky 的典型特征,不是回归。

3. 真实运行冒烟测试(合并树打包出的 CLI,tmux 驱动)

  • 非交互模式(-p "What is 2+2? ..." --output-format json):exit 0,回答正确,最终 result 事件携带真实 usage——{"input_tokens":19365,"output_tokens":9,"total_tokens":19374},由 computeUsageFromMetrics 产出。被删除的 helper 确实不在该路径上。
  • 交互模式(tmux):多 chunk 流式响应完整渲染;工具调用 turn(ReadFile)执行并正确返回内容;/stats 的 token 统计正确(输入 58.3k,缓存命中 96.9%)。去掉 debugResponsesTurn.run 行为不变。

4. 内存滞留基准 — OOM 声明有实证支撑

直接驱动 Turn.run(mock chat.sendMessageStream),流式输入 2000 个 chunk、约 63 MB 唯一文本,用 node --expose-gc 在两个构建上测量:

构建 Turn 存活期间滞留 Turn 释放后
main(含 debugResponses) 469 MB 0.1 MB
本 PR(合并后) 0 MB 0 MB

两个构建产出完全相同的事件流(2001 个事件)——行为等价。WeakRef 探针证实了机制:在 main 上,chunk 对象在 Turn 整个生命周期内不可回收,Turn 释放的瞬间即可回收——与 PR 描述的滞留模式完全吻合(单 turn 内长思考循环;goal 模式嵌套 turn 被持有直到外层调用结束)。(469 MB 大于 63 MB 的差额是基准测试字符串 rope 结构的开销;生产环境 chunk 来自 JSON.parse 的扁平字符串,真实滞留 ≈ 载荷 + 对象开销——但仍然无上界、随 chunk 数量线性增长。)

结论

可以安全合并。 死代码在合入最新 main 后零引用残留;构建、类型检查、全部定向测试及 core 全套测试全绿(cli 全套的 26 个失败要么在纯 main 上预先存在、要么是负载导致的 flaky——没有任何一个可归因于本 PR);两种模式下真实 CLI 行为完好;内存声明可复现而非纸面推断。

@wenshao wenshao merged commit 8000342 into QwenLM:main Jun 12, 2026
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: Severe OOM with qwen --resume and Escape key broken

4 participants