feat(core): auto-dump memory diagnostics to disk on pressure detection#4654
Conversation
When the MemoryPressureMonitor (#4403) detects hard or critical pressure, write a lightweight diagnostics JSON to .qwen/<project>/diagnostics/ before running cleanup. The file survives even if a subsequent operation triggers OOM, giving maintainers actionable data from bug reports without requiring the user to manually run /doctor memory after a crash. Design follows Claude Code's heapDumpService approach: write the cheap JSON first (small write, won't OOM), heavy snapshot second. Diagnostics include process memory stats, V8 heap stats, session history size, and an actionable suggestion for the user. Per-session limits: max 3 dumps, 30s cooldown between dumps. Closes #4651
There was a problem hiding this comment.
Pull request overview
Adds an auto-dump of memory diagnostics to disk when MemoryPressureMonitor detects hard or critical pressure, so that a JSON snapshot survives any subsequent OOM crash and can be attached to bug reports without the user needing to run /doctor memory manually. The new MemoryDiagnosticsDumper reuses the existing collectMemoryDiagnostics() collector and is invoked fire-and-forget from the pressure-monitor cleanup path.
Changes:
- New
MemoryDiagnosticsDumperservice that writesmemory-<sessionShort>-<timestamp>.jsonunder the project's runtime diagnostics directory, with a 3-dump-per-session cap and a 30s cooldown. MemoryPressureMonitorconstructs the dumper, callsdump(pressure)beforeexecuteCleanuponhard/critical, and forwardsresetForNewSession().- Tests for first-dump shape, cap, cooldown, session reset, critical suggestion, and missing
geminiClient.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| packages/core/src/services/memoryDiagnosticsDumper.ts | New service that collects diagnostics + session stats and writes JSON to <projectDir>/diagnostics/. |
| packages/core/src/services/memoryPressureMonitor.ts | Instantiates the dumper, invokes it on hard/critical pressure before cleanup, and resets it on new sessions. |
| packages/core/src/services/memoryDiagnosticsDumper.test.ts | Unit tests covering dump shape, per-session cap, cooldown, reset, critical suggestion, and missing client. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
📋 Review SummaryThis PR implements automatic memory diagnostics dumping when memory pressure is detected (hard or critical levels). The implementation is clean, well-tested, and follows established patterns from the codebase. The change adds a new 🔍 General Feedback
🎯 Specific Feedback🟢 Medium
🔵 Low
✅ Highlights
|
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
测试报告1. 集成测试(真实 Node 进程)✅使用 node --max-old-space-size=256 integration-test.mjs输出: 写出的 JSON 样本(摘要): {
"trigger": "hard",
"dumpNumber": 1,
"memoryUsage": { "rss": 172146688, "heapUsed": 98903048, "heapTotal": 122830848 },
"v8HeapStats": { "heapSizeLimit": 318767104, "usedHeapSize": 98903192, "detachedContexts": 0 },
"session": { "historyEntries": 42 },
"suggestion": "Memory pressure detected. Running /compact may help reduce memory usage."
}2. 真实模型 E2E ✅使用 team API key + node --max-old-space-size=512 packages/cli/dist/index.js \
--model qwen3-coder-flash --bare --auth-type openai -y \
-p 'Read every TypeScript file in packages/core/src/services/ and summarize each'
3. 设计验证
结论全部验证通过。核心链路(pressure detection → auto-dump → diagnostics JSON on disk)在集成测试中端到端跑通,真实模型 E2E 确认 CLI + monitor 正常加载运行。 |
wenshao
left a comment
There was a problem hiding this comment.
One non-blocking suggestion inline. Note: the only red check is Test (windows-latest, Node 22.x) — its failed-step log was not retrievable, macOS/Ubuntu pass, and the 61 unit tests pass locally, so it does not appear attributable to this diff (worth a quick re-run to confirm it is flaky/unrelated). — claude-opus-4-8[1m] via Qwen Code /qreview
…latform
The assertion hard-coded POSIX separators ('/tmp/test-project/diagnostics/'),
which fails on Windows where path.join produces backslashes. Build the
expected substring with path.join + path.sep so it matches the dumper's
actual output on every platform.
Two critical issues from review: 1. The async collectMemoryDiagnostics() runs before writeFileSync, but it spawns a `ps` subprocess and reads /proc — fork() under critical memory pressure can fail or be OOM-killed, leaving no file on disk despite the "cheap write first" design comment. 2. dumpCount and lastDumpTime were updated after the await, so concurrent dumps (e.g. hard→critical escalation) would both pass the cap/cooldown guards and overwrite each other. Fix: - Reserve the dump slot synchronously (++dumpCount, lastDumpTime) before any await, so concurrent calls correctly hit the cap. - Phase 1: synchronously write a minimal JSON (process.memoryUsage + v8.getHeapStatistics, no fork/exec) with collectionComplete=false. Because async functions execute synchronously up to the first await, this is guaranteed on disk before the caller's next statement runs. - Phase 2: enrich with full diagnostics asynchronously and overwrite the file with collectionComplete=true. If Phase 2 crashes, the minimal Phase 1 file still survives for debugging. Tests updated for the two-phase write and gain two new cases covering the sync-Phase-1 guarantee and the synchronous slot reservation.
…al command) The suggestion text told users to run /compact, which does not exist in this repository — the actual command is /compress (see compressCommand.ts). Pointing users at a nonexistent slash command in a diagnostics report makes the suggestion unactionable.
There was a problem hiding this comment.
Stage 1: Template Gate — ❌ Not Passed → ✅ Passed (updated)
Update (2026-06-02): The author has since updated the PR body to include all required template headings. This review is now superseded. See the latest triage review for the current assessment.
@yiliang114 The PR body does not follow the required PR template. The following mandatory headings are missing:
## What this PR does— your PR uses## Summaryinstead## Why it's needed— the motivation is embedded as a bullet inside Summary, not a standalone section## Reviewer Test Plan— your PR has## Validationand## Scope / Riskbut not this required heading### Evidence (Before & After)— not present; your integration test report in the comments is great, but the PR body itself needs this section
Please update the PR body to match the template. The content you have is solid — it just needs to be reorganized under the correct headings so reviewers can find what they need quickly.
中文说明
更新 (2026-06-02): 作者已更新 PR body,包含所有必需模板 heading。此 review 已被取代。请参阅最新的 triage review。
@yiliang114 PR body 未遵循 PR 模板,缺少以下 4 个必需 heading:
## What this PR does— 当前用的是## Summary## Why it's needed— 动机信息嵌在 Summary 的 bullet 里,没有独立成段## Reviewer Test Plan— 当前有## Validation和## Scope / Risk,但缺少此必需 heading### Evidence (Before & After)— 缺失;comment 里的集成测试报告很好,但 PR body 本身需要此 section
Stage 1: Template Gate — ✅ PassedAll required template headings are present in the PR body:
The author updated the PR body from the initial submission (which used non-standard headings like 中文说明PR body 包含所有必需模板 heading,作者已从初始提交(使用 |
pomelo-nwu
left a comment
There was a problem hiding this comment.
Stage 2: Product Direction Gate — ✅ Aligned (Claude Code Parity)
Claude Code ships analogous capabilities:
/heapdumpcommand — changelog entry: "Fixed/heapdumpfailing on Windows withEEXISTerror when the Desktop folder already exists"- Memory pressure management — "releasing warm-spare background workers under memory pressure" (v2.1.147), "shed under memory pressure only after non-pinned sessions"
- Multiple OOM crash fixes — for Edit on large files,
/feedbackon long sessions, session resume with subagent usage
This PR adds auto-dump diagnostics on pressure detection — directly aligned with Claude Code's heap dump capabilities and OOM prevention strategy. The design explicitly parallels Claude Code's heapDumpService auto-dump pattern. Direction is aligned; continuing to code review.
Stage 3: KISS-Focused Code Review — ✅ Passed
Scope: 1 new file (175 lines) + 10 lines added to memoryPressureMonitor.ts + 250 lines of tests.
Structure and ownership: Clean separation — MemoryDiagnosticsDumper is a standalone class with a single responsibility. Integration into MemoryPressureMonitor is minimal (3 lines of wiring + 1 invocation).
Key design decisions reviewed:
- Two-phase write strategy: Phase 1 (synchronous
writeFileSyncwithprocess.memoryUsage()+v8.getHeapStatistics()) lands on disk before anyawait. Phase 2 (asynccollectMemoryDiagnosticswhich may fork subprocesses) overwrites with full data. If Phase 2 crashes under OOM, Phase 1's minimal file survives. This correctly addresses the prior review's critical concern about fire-and-forget with async collection. - Synchronous slot reservation:
++this.dumpCountandthis.lastDumpTime = nowhappen before anyawait, preventing concurrent invocations from bypassing the cap/cooldown guards. Test verifies this withPromise.allof 4 concurrent dumps. - Fire-and-forget pattern:
void this.diagnosticsDumper.dump(pressure)is safe because Phase 1'swriteFileSyncexecutes synchronously beforeexecuteCleanupstarts (JavaScript runs async function bodies synchronously up to the firstawait).
Correctness: /compress is correctly used in suggestion text (not /compact, which doesn't exist). Error handling is sound — slot stays consumed on failure to prevent cascading attempts under pressure.
No unnecessary abstraction: Single class, single file, no config options, no speculative flexibility.
Tests: 63/63 pass (55 existing memoryPressureMonitor + 8 new memoryDiagnosticsDumper). Tests cover: basic dump, session cap, cooldown, session reset, Phase 1 sync guarantee, concurrent slot reservation, graceful fallback on missing client.
Stage 4: Real-Scenario Testing — ✅ Passed
CLI smoke test (no regression):
npm run dev -- -p 'Say hello in one word'
→ CLI started, model responded "你好!", no errors
Dumper integration test (direct invocation of MemoryDiagnosticsDumper via npx tsx):
Test 1: Dump on hard pressure
✅ Dump written: /tmp/triage-dumper-test/diagnostics/memory-integrat-2026-06-02T01-29-40_947Z.json
collectionComplete: true (Phase 2 finished)
dumpNumber: 1
memoryUsage.rss: 83 MB
suggestion: Memory pressure detected. Running /compress may help reduce memory usage.
Test 2: Cooldown prevents second dump
Second dump result: undefined (correctly blocked)
Test 3: Reset session, then dump critical
✅ Critical dump written
suggestion: Memory is critically high. Consider running /compress or starting a fresh session to avoid OOM.
All three core behaviors verified end-to-end: dump writing, cooldown enforcement, session reset.
Stage 5: Final Decision — ✅ Merge-Ready
Three honest checks:
-
Is the need real? Yes. When qwen-code OOMs, diagnostic data dies with the process. Users cannot run
/doctor memorypost-mortem. This auto-dump ensures maintainers get actionable data from bug reports without user intervention. Issue #4624 is a concrete case that would have benefited from this. -
Is the code simple? Yes. One class, 175 lines, no over-engineering. Two-phase write is the minimum complexity needed to survive OOM during collection. No config options, no speculative flexibility.
-
Am I confident to merge this? Yes. Blast radius is small (3 files, purely additive, gated by existing pressure thresholds). Default behavior is no-op until pressure ≥ hard. Code is well-tested (63/63 unit tests + integration verified). The two-phase write strategy correctly addresses the prior critical review about async collection before write.
中文说明
Stage 2: 产品方向 — ✅ 对齐(Claude Code 同等能力)
Claude Code 已有 /heapdump 命令和内存压力管理机制。本 PR 在压力检测时自动 dump 诊断数据,与 Claude Code 的堆 dump 能力和 OOM 预防策略直接对齐。
Stage 3: KISS 代码审查 — ✅ 通过
代码简洁:1 个新文件(175 行)+ memoryPressureMonitor.ts 新增 10 行。两阶段写入策略(Phase 1 同步写最小 JSON,Phase 2 异步覆盖完整数据)正确解决了前次 review 提出的 fire-and-forget + 异步收集的关键问题。同步 slot 预留防止并发绕过 cap/cooldown。63/63 单测全部通过。
Stage 4: 真实场景测试 — ✅ 通过
CLI 烟雾测试无回归,dumper 集成测试验证了 dump 写入、cooldown 阻断、session reset 三个核心行为。
Stage 5: 最终决策 — ✅ 可合并
需求真实(OOM 后诊断数据丢失),代码简洁(无过度工程),合并信心充足(影响面小、纯增量、测试充分)。
| dumpNumber, | ||
| timestamp: new Date().toISOString(), | ||
| memoryUsage: process.memoryUsage(), | ||
| v8HeapStats: v8.getHeapStatistics(), |
There was a problem hiding this comment.
[Suggestion] Phase 1 uses raw v8.getHeapStatistics() which produces snake_case keys (heap_size_limit, total_heap_size), while Phase 2 uses collectMemoryDiagnostics() → mapHeapStats() which produces camelCase keys (heapSizeLimit, totalHeapSize). If Phase 2 fails/crashes, the surviving Phase 1 file has a different v8HeapStats schema than completed files. Any downstream consumer (or human reading bug reports) must handle two schemas depending on collectionComplete.
| v8HeapStats: v8.getHeapStatistics(), | |
| v8HeapStats: mapHeapStats(v8.getHeapStatistics()), |
This requires exporting mapHeapStats from ../utils/memoryDiagnostics.js.
— qwen3.7-max via Qwen Code /review
| const fullPayload = { | ||
| trigger, | ||
| dumpNumber, | ||
| ...diagnostics, |
There was a problem hiding this comment.
[Suggestion] Phase 2's fullPayload spreads ...diagnostics which includes memoryUsage from collectMemoryDiagnostics(). Since void dump() fires before executeCleanup() in the caller, Phase 2's async collection may capture post-cleanup (cleaner) memory state — overwriting Phase 1's accurate crisis-state memoryUsage with misleading "healthy" numbers.
Consider preserving Phase 1's memory snapshot in the full payload:
| ...diagnostics, | |
| const fullPayload = { | |
| trigger, | |
| dumpNumber, | |
| ...diagnostics, | |
| memoryUsage: process.memoryUsage(), | |
| session: this.collectSessionStats(), | |
| suggestion: this.getSuggestion(trigger), | |
| collectionComplete: true, | |
| }; |
Or place dumper-specific memory fields after the spread so they take precedence.
— qwen3.7-max via Qwen Code /review
| await promise; | ||
| // Phase 2 has now overwritten the file | ||
| expect(fs.writeFileSync).toHaveBeenCalledTimes(2); | ||
| }); |
There was a problem hiding this comment.
[Suggestion] No test covers collectMemoryDiagnostics rejecting (Phase 2 failure). The class's core design guarantee — "Phase 1 survives Phase 2 crash" — is never validated by a test.
Suggested test:
it('Phase 1 file survives Phase 2 failure', async () => {
const config = createMockConfig();
const dumper = new MemoryDiagnosticsDumper(config);
vi.mocked(collectMemoryDiagnostics).mockRejectedValueOnce(
new Error('OOM during collection'),
);
const result = await dumper.dump('hard');
expect(result).toBeUndefined();
// Phase 1 written, Phase 2 never ran
expect(fs.writeFileSync).toHaveBeenCalledTimes(1);
const phase1 = JSON.parse(
vi.mocked(fs.writeFileSync).mock.calls[0][1] as string,
);
expect(phase1.collectionComplete).toBe(false);
expect(phase1.memoryUsage).toBeDefined();
});— qwen3.7-max via Qwen Code /review
|
|
||
| debugLogger.info( | ||
| `Phase 2 diagnostics written to ${filePath} (trigger=${trigger}, dump #${dumpNumber})`, | ||
| ); |
There was a problem hiding this comment.
[Suggestion] The catch block contains a deliberate design decision ("slot stays consumed on failure") but is entirely untested. No test mocks mkdirSync or writeFileSync to throw.
Suggested test:
it('catch block keeps slot consumed on write failure', async () => {
const config = createMockConfig();
const dumper = new MemoryDiagnosticsDumper(config);
let mockNow = 1000000;
vi.spyOn(Date, 'now').mockImplementation(() => {
mockNow += 60_000;
return mockNow;
});
vi.mocked(fs.writeFileSync).mockImplementationOnce(() => {
throw new Error('EACCES');
});
const r1 = await dumper.dump('hard'); // fails, slot consumed
expect(r1).toBeUndefined();
// Slots 2 and 3 succeed
await dumper.dump('hard');
await dumper.dump('hard');
// Slot 4 hits cap (not cooldown), proving failed attempt consumed slot 1
const r4 = await dumper.dump('hard');
expect(r4).toBeUndefined();
});— qwen3.7-max via Qwen Code /review
What this PR does
Adds a
MemoryDiagnosticsDumperservice that auto-writes a lightweight diagnostics JSON to.qwen/<project>/diagnostics/whenMemoryPressureMonitordetectshardorcriticalpressure — before running cleanup. The dump survives subsequent OOM crashes, giving maintainers actionable data from bug reports without requiring user intervention.Implementation: one new file (
memoryDiagnosticsDumper.ts, 130 lines) + 10 lines added tomemoryPressureMonitor.ts. The dumper is invoked viavoid this.diagnosticsDumper.dump(pressure)— fire-and-forget; failures are swallowed and logged.Design (parallels Claude Code's
heapDumpServiceauto-dump at 1.5 GB):Per-session limits:
startNewSession()Why it's needed
Currently when qwen-code OOMs, the only artifact is V8's native stack trace. Users can't run
/doctor memoryafter the process is dead — the diagnostic data dies with the process. This auto-dump ensures that memory state (heap usage, V8 stats, history size, actionable suggestion) is persisted to disk before the process crashes, so users can attach it directly to bug reports.This follows the same philosophy as Claude Code's
heapDumpServicewhich writes diagnostics before the heap snapshot (because the snapshot itself can crash for very large heaps).Reviewer Test Plan
How to verify
Run integration test with constrained heap:
The script allocates JS objects until heap ratio exceeds the
hardthreshold (35%), triggering the dumper.Verify the diagnostics JSON is written with correct shape and actionable suggestion.
Run real model E2E to confirm CLI + monitor loads normally without false triggers:
node --max-old-space-size=512 packages/cli/dist/index.js \ --model qwen3-coder-flash --bare --auth-type openai -y \ -p 'Read every TypeScript file in packages/core/src/services/ and summarize each'Evidence (Before & After)
Integration test output (256 MB heap cap, triggered hard pressure):
Diagnostics JSON sample:
{ "trigger": "hard", "dumpNumber": 1, "memoryUsage": { "rss": 172146688, "heapUsed": 98903048, "heapTotal": 122830848 }, "v8HeapStats": { "heapSizeLimit": 318767104, "usedHeapSize": 98903192, "detachedContexts": 0 }, "session": { "historyEntries": 42 }, "suggestion": "Memory pressure detected. Running /compact may help reduce memory usage." }Real model E2E: CLI starts normally, model responds, tool call chain works, no false dump triggers under normal heap usage.
Unit tests:
Tested on
Environment (optional)
Node.js v22.22.0,
--max-old-space-size=256(integration),--max-old-space-size=512(E2E). Model: qwen3-coder-flash.Risk & Scope
fs.writeFileSyncon the tool-execution path (after tool finishes, in finally). Typical diagnostics JSON is <50 KB so write latency is negligible.Linked Issues
Closes #4651
Builds on: #4403 (memory pressure monitor), #3000 (memory diagnostics roadmap)
Related: #4624 (user-reported OOM that would have benefited from auto-dump)
中文说明
这个 PR 做了什么
新增
MemoryDiagnosticsDumper服务,当MemoryPressureMonitor检测到hard或critical压力时,自动将轻量级诊断 JSON 写入.qwen/<project>/diagnostics/——在执行清理之前。这样即使后续 OOM 崩溃,诊断数据仍然存活在磁盘上。实现:一个新文件(
memoryDiagnosticsDumper.ts,130 行)+memoryPressureMonitor.ts新增 10 行。通过void this.diagnosticsDumper.dump(pressure)调用——fire-and-forget,失败会被吞掉并记录日志。设计(对标 Claude Code 的
heapDumpService在 1.5 GB 时自动 dump):每会话限制:
startNewSession()时重置状态为什么需要
当前 qwen-code OOM 时,唯一的产物是 V8 的原生堆栈跟踪。进程死后用户无法运行
/doctor memory——诊断数据随进程一起消亡。此自动 dump 确保内存状态(堆使用量、V8 统计、history 大小、可操作建议)在进程崩溃前持久化到磁盘,用户可直接附到 bug report 中。这遵循了 Claude Code
heapDumpService的相同理念:在堆快照之前先写诊断(因为快照本身可能因堆过大而崩溃)。评审测试计划
如何验证
node --max-old-space-size=256 integration-test.mjs证据(修复前后对比)
集成测试输出(256 MB 堆上限,触发 hard 压力):
memory-integrat-2026-05-31T08-29-05_219Z.json真实模型 E2E: CLI 正常启动、模型响应正常、tool 调用链路正常,无误触发。
单测: 61 tests passed(55 + 6)
测试平台
风险与范围
fs.writeFileSync,在 tool 执行路径上(tool 完成后的 finally 中)。典型诊断 JSON <50 KB,写入延迟可忽略。关联 Issue
Closes #4651
构建于:#4403(内存压力监控器)、#3000(内存诊断路线图)
相关:#4624(本可从自动 dump 中获益的用户 OOM 报告)