feat(core): add memory pressure monitor by ZevGit · Pull Request #4403 · QwenLM/qwen-code

ZevGit · 2026-05-21T10:53:15Z

Summary

Added low-overhead runtime memory-pressure handling for long-running sessions.
The monitor classifies pressure using cgroup-aware RSS and V8 heap usage, then conservatively evicts FileReadCache metadata under soft, hard, or critical pressure.
The implementation preserves active conversation state and avoids automatic compaction or user-visible session interruption.
Configuration is opt-in/tunable through environment variables with strict parsing, validation, and diagnostics for invalid values.

Reviewer Focus

Cleanup policy and thresholds for soft, hard, and critical pressure.
Container memory-limit handling for cgroup v2 and cgroup v1.
Session-boundary behavior, especially avoiding stale async cleanup tails after a new session starts.
Ineffective-cleanup diagnostics and adaptive backoff for repeated aggressive cleanups.
The decision to limit automatic cleanup to cache metadata and optional explicit GC, rather than dropping conversation state.

Review Follow-up

Latest update: 55fb90be9 addresses the May 27 review feedback.

Added a session-generation guard so stale async cleanup tails and queued escalations cannot clear a newly-started session FileReadCache.
Added adaptive backoff for repeated ineffective aggressive cleanup, reducing the risk of repeated cache clears or explicit GC when memory pressure comes from non-cache sources.
Split diagnostic counter reset from session-scoped cleanup cancellation via resetForNewSession().
Added regression tests for queued cleanup cancellation on session reset, aggressive cleanup backoff, effective-cleanup counter reset, failure recovery, GC availability, and cooldown behavior.
Tightened test isolation by restoring Vitest spies after each memory-pressure test.

Validation

Commands run locally:

npx eslint packages/core/src/services/memoryPressureMonitor.ts packages/core/src/services/memoryPressureMonitor.test.ts packages/core/src/services/fileReadCache.ts packages/core/src/services/fileReadCache.test.ts packages/core/src/config/config.ts packages/core/src/config/config.test.ts packages/core/src/core/coreToolScheduler.ts packages/core/src/core/coreToolScheduler.test.ts
cd packages/core && npx vitest run src/services/memoryPressureMonitor.test.ts src/config/config.test.ts src/services/fileReadCache.test.ts src/core/coreToolScheduler.test.ts
npm run lint
npm run build
npm run typecheck
git diff --check

Observed locally:

Focused memory/config tests: 237/237 passed.
Related core tests: 451/451 passed.
Prettier, targeted ESLint, root lint, root typecheck, clean root build, and diff whitespace checks passed.
npm run build still prints existing VS Code companion curly-rule warnings and Browserslist freshness warnings; these are outside this PR.

GitHub Actions on 55fb90be9:

Classify PR: passed.
Lint: passed.
Tests on macOS, Ubuntu, and Windows with Node 22.x: passed.
CodeQL: passed.
Post Coverage Comment: skipped by workflow configuration.

Quickest reviewer verification path:

cd packages/core && npx vitest run src/services/memoryPressureMonitor.test.ts src/config/config.test.ts src/services/fileReadCache.test.ts src/core/coreToolScheduler.test.ts

Scope and Risk

The cleanup is intentionally limited to file-read cache metadata and optional explicit GC when configured.
Under sustained pressure, inactive file-read metadata may be evicted, which can reduce cache hit rate but should not drop conversation state or disconnect an active AI session.
Container behavior is covered by unit tests with mocked cgroup v1/v2 files. No live Docker, Podman, or Linux container smoke test was run locally.
No breaking changes. All new environment variables are optional and default to conservative behavior.

Linked Issues

Related to long-session memory pressure and OOM prevention.
Related discussion: [P3] Memory Diagnostics / 内存诊断 #3000.

wenshao · 2026-05-21T15:35:51Z

+   */
+  getPressureLevel(): 'normal' | 'soft' | 'hard' | 'critical' {
+    const mem = process.memoryUsage();
+    const rssRatio = mem.rss / this.effectiveMemoryLimit;


[Critical] Division by zero when effectiveMemoryLimit is 0.

heapSizeLimit on the next line has an explicit > 0 guard, but effectiveMemoryLimit does not. If os.totalmem() returns 0 (possible in some containerized or unusual environments) and both cgroup reads return undefined, then computeEffectiveMemoryLimit() returns 0. This makes rssRatio = Infinity, which always evaluates to 'critical' — causing the monitor to aggressively clear FileReadCache every cleanupCooldownMs (default 5 s), degrading performance with no diagnostic output explaining why.

Suggested change

const rssRatio = mem.rss / this.effectiveMemoryLimit;

const rssRatio =

this.effectiveMemoryLimit > 0

? mem.rss / this.effectiveMemoryLimit

: 0;

— qwen-latest-series-invite-beta-v36 via Qwen Code /review

wenshao · 2026-05-21T15:35:51Z

+
+    if (freedRatio < 0.01) {
+      this.consecutiveIneffectiveCleanups++;
+      if (this.consecutiveIneffectiveCleanups === 3) {


[Suggestion] The memory-cleanup-ineffective event fires only at exactly === 3, never again. By contrast, recordCleanupFailure (line 320) uses >= 3 and keeps emitting on every subsequent failure. If cleanup remains ineffective for 10+ consecutive cycles, only the 3rd emits an event — cycles 4, 5, … are silent. Downstream consumers lose visibility into ongoing memory pressure after the initial alert.

Consider changing to >= 3 to match the failure-path pattern, or emit at periodic thresholds (3, 6, 12…) to limit noise while maintaining visibility:

Suggested change

if (this.consecutiveIneffectiveCleanups === 3) {

if (this.consecutiveIneffectiveCleanups >= 3) {

— qwen-latest-series-invite-beta-v36 via Qwen Code /review

wenshao · 2026-05-21T15:35:51Z

+      if (limit > hostTotal) return undefined;
+
+      return limit;
+    } catch {


[Suggestion] All cgroup read errors are silently swallowed — permission denied, corrupt file content, unexpected format all fall back to hostTotal with no log. If a container has a misconfigured memory limit, the monitor falls back to the host total (potentially much larger), never triggers cleanup, and the container gets OOM-killed while the monitor logs show "normal pressure."

Consider adding a debugLogger.warn(...) when returning undefined for unexpected values, including the file path and raw content:

Suggested change

} catch {

} catch {

debugLogger.debug(`Failed to read cgroup memory limit from ${filePath}`);

return undefined;

— qwen-latest-series-invite-beta-v36 via Qwen Code /review

wenshao · 2026-05-21T15:35:51Z

+        debugLogger.debug(`FileReadCache stale eviction: ${evicted} entries`);
+        break;
+      }
+      case 'trigger_gc': {


[Suggestion] When enableExplicitGC is true (via QWEN_MEMORY_ENABLE_GC=1) but the process wasn't started with --expose-gc, this step silently no-ops with no log or warning. An operator debugging critical-level OOM could see enableExplicitGC=true in config and assume GC is running when it isn't.

Suggested change

case 'trigger_gc': {

case 'trigger_gc': {

if (typeof global.gc === 'function') {

const before = process.memoryUsage().rss;

global.gc();

const after = process.memoryUsage().rss;

debugLogger.debug(`global.gc() freed ${before - after} bytes`);

} else {

debugLogger.warn(

'trigger_gc requested but global.gc is not available; ' +

'start Node.js with --expose-gc',

);

}

break;

}

— qwen-latest-series-invite-beta-v36 via Qwen Code /review

ZevGit · 2026-05-22T03:19:20Z

Update pushed in 33a7353.

Addressed the memory pressure monitor review feedback:

Hardened cgroup memory limit parsing and fallback observability.
Added defensive handling for memoryUsage failures and queued cleanup startup failures.
Added exhaustive switch guards for cleanup recommendations, cleanup steps, and action ranking.
Documented the Object.create child Config monitor isolation contract.

Verification:

Focused core tests: 443/443 passed
Prettier / ESLint / git diff --check passed
npm run typecheck passed
npm run lint passed
Clean npm run build passed
GitHub CI is green across lint, macOS/Ubuntu/Windows tests, and CodeQL.

ZevGit · 2026-05-27T03:13:51Z

Quick follow-up on this PR.

Given the discussion in #3000 and the remaining #4181-#4184 roadmap, I wanted to check whether maintainers still want this PR reviewed as a standalone low-overhead runtime mitigation.

This PR does not try to replace /doctor memory, heap snapshots, memory timeline diagnostics, or bounded tool-output offload. Its scope is narrower: detect RSS/heap pressure during normal tool execution and conservatively release FileReadCache metadata under pressure.

If the preferred direction is to fold this into #4184 or avoid a separate memory-pressure monitor, I am happy to close this PR to reduce maintainer overhead. Otherwise, the branch is currently green and ready for re-review.

wenshao · 2026-05-27T03:57:21Z

+      ) {
+        this.queuedCleanupRecommendation = recommendation;
+        debugLogger.debug(
+          `Queued escalated cleanup "${recommendation.action}" while ` +


[Critical] Queued cleanup recommendation can be silently downgraded by a lower-ranked arrival.

The comparison on line 248 checks recommendation.action > this.activeCleanupAction but does NOT compare against the already-queued this.queuedCleanupRecommendation. Consider this sequence:

aggressive (rank 3) cleanup starts → activeCleanupAction = 'aggressive'

Between await Promise.resolve() boundaries, light (rank 1) arrives → 1 > 3 is false → skipped ✅

But:

light (rank 1) cleanup starts → activeCleanupAction = 'light'

aggressive (rank 3) arrives → 3 > 1 is true → queued ✅

moderate (rank 2) arrives → 2 > 1 is true → overwrites the queued aggressive with moderate ❌

The most aggressive needed cleanup is silently replaced by a weaker one.

Suggested change

`Queued escalated cleanup "${recommendation.action}" while ` +

if (this.cleanupInProgress) {

if (

cleanupActionRank(recommendation.action) >

cleanupActionRank(this.activeCleanupAction)

) {

const queuedRank = this.queuedCleanupRecommendation

? cleanupActionRank(this.queuedCleanupRecommendation.action)

: -1;

if (cleanupActionRank(recommendation.action) > queuedRank) {

this.queuedCleanupRecommendation = recommendation;

debugLogger.debug(

`Queued escalated cleanup "${recommendation.action}" while ` +

`"${this.activeCleanupAction}" is in progress`,

);

}

} else {

debugLogger.debug('Cleanup already in progress, skipping');

}

return;

}

— qwen3.7-max via Qwen Code /review

wenshao · 2026-05-27T03:57:21Z

+    this.config = { ...(pressureConfig ?? DEFAULT_PRESSURE_CONFIG) };
+    validateMemoryPressureConfig(this.config);
+    this.effectiveMemoryLimit = this.computeEffectiveMemoryLimit();
+    this.heapSizeLimit = getHeapStatistics().heap_size_limit;


[Critical] heapSizeLimit is cached once at construction and never refreshed.

V8 dynamically adjusts heap_size_limit over the lifetime of a long-running session (e.g., on a 16 GB machine the initial limit might be ~1.5 GB, growing to 4+ GB as the old generation expands). A stale, too-small denominator inflates heapRatio = mem.heapUsed / this.heapSizeLimit, causing early critical cleanup — even when V8 could grow the heap much further before actually running out. Since ratio = Math.max(rssRatio, heapRatio), the stale heap ratio becomes the dominant signal.

getHeapStatistics() is a cheap V8 intrinsic (no I/O, no GC trigger). Refresh on each call:

Suggested change

this.heapSizeLimit = getHeapStatistics().heap_size_limit;

this.effectiveMemoryLimit = this.computeEffectiveMemoryLimit();

debugLogger.info(

Then in getPressureLevel():

const heapStats = getHeapStatistics(); const heapRatio = heapStats.heap_size_limit > 0 ? mem.heapUsed / heapStats.heap_size_limit : 0;

— qwen3.7-max via Qwen Code /review

wenshao · 2026-05-27T03:57:21Z

+        this.consecutiveCleanupFailures = 0;
+        const memAfter = process.memoryUsage().rss;
+        // RSS can lag behind object graph cleanup, so this is diagnostic
+        // logging only and does not drive failure detection.


[Critical] Comment and code contradict each other — RSS measurement drives failure detection.

The comment on lines 268–269 says: "RSS can lag behind object graph cleanup, so this is diagnostic logging only and does not drive failure detection."

This is factually wrong: logCleanupResult (invoked via setImmediate using the memAfter captured on line 267) increments consecutiveIneffectiveCleanups and emits memory-cleanup-ineffective events when freedRatio < 0.01. The measurement directly drives both counter state and user-visible events.

Additionally, memAfter is captured in .then() before setImmediate fires — but in the default path (enableExplicitGC: false), cleanup only drops FileReadCache Map references. V8 has not yet GC'd the orphaned entries, so RSS is essentially unchanged and freedRatio < 0.01 fires on virtually every cleanup, generating false memory-cleanup-ineffective events.

Move the RSS measurement inside setImmediate so it benefits from the deferred timing:

Suggested change

// logging only and does not drive failure detection.

.then(() => {

this.consecutiveCleanupFailures = 0;

setImmediate(() => {

try {

const memAfter = process.memoryUsage().rss;

this.logCleanupResult(memBefore, memAfter, recommendation);

} catch (err) {

debugLogger.error(

`Cleanup measurement failed: ${getErrorMessage(err)}`,

);

}

});

})

— qwen3.7-max via Qwen Code /review

wenshao · 2026-05-27T03:57:21Z

+
+      const limit = Number(raw);
+      if (!Number.isFinite(limit) || limit <= 0) {
+        debugLogger.warn(


[Critical] No lower bound on cgroup memory limit — a value of "1" (1 byte) passes all validation checks.

A cgroup value like "1" passes Number.isFinite, > 0, Number.isSafeInteger, and <= hostTotal — producing rssRatio = rss / 1 which is astronomically high (~10 billion), triggering aggressive cleanup after every single tool execution. If enableExplicitGC is on with --expose-gc, this becomes a synchronous GC after every tool call — complete denial of service.

No realistic container/VM has a memory limit below a few hundred MB. Add a sensible floor:

Suggested change

debugLogger.warn(

const limit = Number(raw);

if (!Number.isFinite(limit) || limit <= 0) {

debugLogger.warn(

`Ignoring out-of-range cgroup memory limit from ${filePath}: ${raw}`,

);

return undefined;

}

// Reject unrealistically small limits (below 64 MiB).

const MIN_CGROUP_MEMORY_LIMIT = 64 * 1024 * 1024;

if (limit < MIN_CGROUP_MEMORY_LIMIT) {

debugLogger.warn(

`Ignoring unrealistically small cgroup memory limit from ${filePath}: ${raw}`,

);

return undefined;

}

— qwen3.7-max via Qwen Code /review

wenshao · 2026-05-27T03:57:21Z

+      cleanupActionRank(recommendation.action) >
+      cleanupActionRank(this.lastCleanupAction);
+    if (
+      !isEscalation &&


[Suggestion] Cooldown blocking branch has no test verifying it actually blocks.

All tests that set cleanupCooldownMs to a nonzero value (e.g., 60000) exercise the isEscalation bypass branch — they never test that a same-level or lower-level cleanup is actually suppressed within the cooldown window. All other tests set cleanupCooldownMs: 0, making the guard always pass.

The cooldown is the primary throttling mechanism. If the comparison logic were inverted, the test suite would not detect it. Consider adding:

it('blocks same-level cleanup within the cooldown window', async () => { const evictSpy = vi.fn().mockReturnValue(0); const monitor = new MemoryPressureMonitor( createMockConfig({ fileReadCache: { evictNotAccessedSince: evictSpy } }), { ...DEFAULT_PRESSURE_CONFIG, cleanupCooldownMs: 60_000 }, ); setMemUsage(9 * 1024 * 1024 * 1024); // soft pressure monitor.performCheck(); await drainCleanupMeasurement(); expect(evictSpy).toHaveBeenCalledTimes(1); // Second check at same level within cooldown → should be a no-op monitor.performCheck(); await drainCleanupMeasurement(); expect(evictSpy).toHaveBeenCalledTimes(1); // still 1 });

— qwen3.7-max via Qwen Code /review

wenshao · 2026-05-27T03:57:21Z

+    this.consecutiveIneffectiveCleanups = 0;
+  }
+
+  private recordCleanupFailure(


[Suggestion] The consecutiveIneffectiveCleanups = 0 reset on the effective-cleanup branch is untested.

Tests verify the counter increments (when RSS drops < 1%) and can be reset via the public resetConsecutiveFailures(). But no test verifies that an effective cleanup (RSS drops ≥ 1%) automatically resets the counter. If this line were accidentally removed, the monitor would emit spurious memory-cleanup-ineffective events after just 3 total ineffective cleanups spread across effective ones — breaking the consecutive-invariant.

Consider adding a test that interleaves effective and ineffective cleanups and asserts the counter resets.

— qwen3.7-max via Qwen Code /review

wenshao · 2026-05-27T03:57:21Z

+
+    void this.runCleanupSteps(recommendation.steps)
+      .then(() => {
+        this.consecutiveCleanupFailures = 0;


[Suggestion] The consecutiveCleanupFailures = 0 auto-reset after successful cleanup is untested in a failure-then-success scenario.

Tests verify failure counting and the public resetConsecutiveFailures() method, but no test runs failures followed by a successful cleanup and asserts the counter returns to zero. If this reset were removed, the counter would monotonically increase, eventually emitting memory-cleanup-failed events even after recovery.

Consider adding:

it('resets consecutive failures after a successful cleanup', async () => { const evictSpy = vi.fn().mockImplementation(() => { throw new Error('fail'); }); const monitor = new MemoryPressureMonitor( createMockConfig({ fileReadCache: { evictNotAccessedSince: evictSpy } }), { ...DEFAULT_PRESSURE_CONFIG, cleanupCooldownMs: 0 }, ); setMemUsage(9 * 1024 * 1024 * 1024); monitor.performCheck(); await drainCleanupMeasurement(); expect(monitor.getConsecutiveFailures()).toBe(1); evictSpy.mockReturnValue(0); // now succeed monitor.performCheck(); await drainCleanupMeasurement(); expect(monitor.getConsecutiveFailures()).toBe(0); });

— qwen3.7-max via Qwen Code /review

wenshao · 2026-05-27T06:57:53Z

+    return this.consecutiveCleanupFailures;
+  }
+
+  resetConsecutiveFailures(): void {


[Critical] resetConsecutiveFailures() only resets counters — it does not cancel in-flight cleanup. startNewSession() (config.ts:1928-1929) calls getFileReadCache().clear() then resetConsecutiveFailures(), but if cleanup is in-flight at that moment, the async resolution (.then → setImmediate → clear_file_cache) fires against the FileReadCache after startNewSession has cleared it and the new session may have started re-populating it.

The same applies to a queued escalation in queuedCleanupRecommendation — it is drained by finishCleanupAndRunQueued after the session boundary.

Suggested change

resetConsecutiveFailures(): void {

resetConsecutiveFailures(): void {

this.consecutiveCleanupFailures = 0;

this.consecutiveIneffectiveCleanups = 0;

// Cancel any in-flight cleanup so its async tail cannot wipe the

// new session's freshly-populated FileReadCache.

this.cleanupInProgress = false;

this.activeCleanupAction = 'none';

this.queuedCleanupRecommendation = undefined;

}

A generation counter (bumped here, checked in the .then/setImmediate chain) would be more robust if simply resetting state has side effects on the currently-running runCleanupSteps promise.

— qwen3.7-max via Qwen Code /review

wenshao · 2026-05-27T06:57:54Z

+    );
+
+    if (freedRatio < 0.01) {
+      this.consecutiveIneffectiveCleanups++;


[Suggestion] No adaptive backoff when cleanup is repeatedly ineffective. When freedRatio < 0.01 under critical pressure, this counter increments and shouldEmitRepeatedDiagnostic throttles the event — but the next performCheckInternal still triggers executeCleanup({ action: 'aggressive', steps: ['clear_file_cache', 'trigger_gc'] }) at full cooldown cadence (5 s).

If memory pressure comes from non-cache sources (chat history accumulation, native allocations), every 5-second cycle runs a stop-the-world global.gc() (if enabled) plus a full cache clear — neither of which helps — degrading responsiveness without reducing RSS. The memory-cleanup-ineffective event is fire-and-forget; nothing subscribes with corrective action.

Consider applying exponential backoff when consecutiveIneffectiveCleanups exceeds a threshold (e.g., cleanupCooldownMs * min(2^k, 64)) or temporarily suspending the aggressive tier until pressure drops below hardPressureRatio.

— qwen3.7-max via Qwen Code /review

ZevGit · 2026-05-27T09:12:08Z

Follow-up pushed in 55fb90b.

Addressed the latest review feedback from May 27:

Added session-boundary cleanup isolation with a generation guard so stale async cleanup tails and queued escalations cannot clear a newly-started session's FileReadCache.
Added adaptive backoff for repeated ineffective aggressive cleanup so critical pressure from non-cache sources does not keep clearing cache / running explicit GC at the base cooldown cadence.
Split the public reset API so normal diagnostic counter reset remains separate from session-scoped cleanup cancellation.
Added regression coverage for queued cleanup cancellation on session reset and aggressive cleanup backoff, plus tightened test spy isolation.

Verification after the change:

Focused memory/config tests: 237/237 passed
Related core tests: 451/451 passed
Prettier / targeted ESLint / git diff --check passed
npm run lint passed
npm run typecheck passed
clean npm run build passed
GitHub CI is green across Lint, macOS/Ubuntu/Windows tests, and CodeQL.

wenshao

LGTM ✅ — Third commit comprehensively addresses all prior-round Critical and Suggestion findings. Generation-based cleanup isolation, adaptive backoff, and expanded test coverage (451 tests pass) are well-implemented. CI all_pass 8/8. — qwen3.7-max via Qwen Code /review

wenshao · 2026-05-27T12:06:34Z

Maintainer Verification Report

PR: #4403 — feat(core): add memory pressure monitor
Commit: 55fb90be9 (HEAD of feat/memory-pressure-monitor)
Verified by: wenshao
Date: 2026-05-27

Local Build & Test Results

Check	Result	Notes
`npm run build`	✅ Pass	Clean build after `dist/` cleanup. Pre-existing vscode-ide-companion curly-brace warnings (not from this PR)
`npm run lint`	✅ Pass	No lint errors
`npm run typecheck`	⚠️ Pre-existing failure	TS2739 in `packages/cli/src/serve/httpAcpBridge.ts` — not touched by this PR
Targeted ESLint (PR files)	✅ Pass	All 8 PR-changed files pass ESLint
`git diff --check`	✅ Pass	No whitespace issues
Focused tests (4 suites)	✅ 451/451 passed	`memoryPressureMonitor`, `config`, `fileReadCache`, `coreToolScheduler`
Full `packages/core` tests	✅ 9264/9265 passed (1 pre-existing failure)	The single failure is in `anthropicContentGenerator.test.ts` (User-Agent string mismatch `claude-cli` vs `QwenCode`) — not touched by this PR

GitHub Actions CI (on `55fb90be9`)

Job	Status
Classify PR	✅ Pass
Lint	✅ Pass
CodeQL	✅ Pass
Test (macOS, Node 22.x)	✅ Pass
Test (Ubuntu, Node 22.x)	✅ Pass
Test (Windows, Node 22.x)	✅ Pass

Code Review Summary

Architecture & Design: Sound. The monitor fires reactively after tool execution (no background timer), uses dual pressure signals (RSS + V8 heap), and applies graduated escalation (soft → hard → critical). Generation-based cancellation correctly handles session resets. queueMicrotask coalescing for concurrent tool completions is the right approach.

Code Quality: High quality, no bugs found. Cleanup pipeline ordering is correct (executeCleanup → runCleanupSteps → setImmediate → finishCleanupAndRunQueued). Map deletion during iteration is spec-compliant. assertNever exhaustiveness guards on switch defaults. Error isolation via emitSafely prevents listener crashes from breaking the cleanup pipeline.

Integration: Minimal and clean.

coreToolScheduler.ts: 6 lines — scheduleCheck() in the finally block of executeSingleToolCall
config.ts: 90 lines — mirrors existing getFileReadCache() pattern with prototype inheritance
fileReadCache.ts: 30 lines — evictNotAccessedSince() method

Configuration: Well-designed. Strict Number() parsing (rejects partial parses like "0.8extra"), empty-string guards, ordering validation (soft < hard < critical), range bounds [0.3, 0.98], visible stderr warnings on misconfiguration with safe fallback to defaults.

Test Coverage: Comprehensive — 1,233 lines of monitor tests + 197 config + 117 cache + 39 scheduler. Covers all pressure levels, cgroup v1/v2 parsing, escalation queuing, session-reset cancellation, cooldown blocking, failure accounting, ineffective cleanup diagnostics, GC available/unavailable paths, and env var edge cases.

Security: No concerns. Cgroup paths are hardcoded, no user-supplied paths. Memory info is not exposed externally. global.gc() gated behind explicit opt-in.

Performance: Minimal overhead. One process.memoryUsage() + getHeapStatistics() per tool call, cooldown-protected cleanup, exponential back-off for ineffective aggressive cleanups (max ~320s).

Minor Observations (non-blocking)

P3: evictNotAccessedSince skips entries with lastReadAt === undefined — structurally prevented today but not type-enforced. Consider treating undefined as "always stale."
P3: Hard throw in getMemoryPressureMonitor for missing config is fail-fast but could return undefined for defense-in-depth.
P4: No setMaxListeners on EventEmitter (default 10 — fine for typical usage, but diagnostic tooling with many listeners would trigger a warning).

Verdict

✅ Ready to merge. Well-designed, well-implemented, thoroughly tested. All issues are P3/P4 non-blockers. The feature is opt-in and has zero impact on existing functionality when disabled.

Verified locally with tmux parallel execution (build + typecheck + lint + focused tests + full core test suite).

yiliang114

LGTM — confirmed no conflict with current main.

Context: the earlier heap-pressure compaction safety net (#4186) was removed in 924736d6b (2026-05-20), so there is currently no runtime memory pressure handling on main. This PR fills that gap with a conservative, side-effect-minimal approach:

Integration surface is one optional-chaining call in coreToolScheduler.ts finally block — worst case = no-op.
Cleanup only evicts cold FileReadCache entries; does not touch chat history, compaction, or session state.
cgroup v1/v2 awareness is useful for container deployments.
Session-boundary isolation + adaptive backoff are well-designed.

This aligns with the #3000 Memory Diagnostics roadmap (specifically the "check memory pressure" + "proactive memory management" goals) and partially addresses #4181 (interpretation layer) by exposing a getPressureLevel() API.

CI all green across 3 platforms. 237 focused tests + 451 related core tests pass.

Thanks @ZevGit for the thorough iteration across 3 review rounds.

#4654) * feat(core): auto-dump memory diagnostics to disk on pressure detection When the MemoryPressureMonitor (#4403) detects hard or critical pressure, write a lightweight diagnostics JSON to .qwen/<project>/diagnostics/ before running cleanup. The file survives even if a subsequent operation triggers OOM, giving maintainers actionable data from bug reports without requiring the user to manually run /doctor memory after a crash. Design follows Claude Code's heapDumpService approach: write the cheap JSON first (small write, won't OOM), heavy snapshot second. Diagnostics include process memory stats, V8 heap stats, session history size, and an actionable suggestion for the user. Per-session limits: max 3 dumps, 30s cooldown between dumps. Closes #4651 * ci: retrigger CI after Windows flaky failure * test(core): use path.join in memoryDiagnosticsDumper test for cross-platform The assertion hard-coded POSIX separators ('/tmp/test-project/diagnostics/'), which fails on Windows where path.join produces backslashes. Build the expected substring with path.join + path.sep so it matches the dumper's actual output on every platform. * fix(core): two-phase memory diagnostics write to survive OOM Two critical issues from review: 1. The async collectMemoryDiagnostics() runs before writeFileSync, but it spawns a `ps` subprocess and reads /proc — fork() under critical memory pressure can fail or be OOM-killed, leaving no file on disk despite the "cheap write first" design comment. 2. dumpCount and lastDumpTime were updated after the await, so concurrent dumps (e.g. hard→critical escalation) would both pass the cap/cooldown guards and overwrite each other. Fix: - Reserve the dump slot synchronously (++dumpCount, lastDumpTime) before any await, so concurrent calls correctly hit the cap. - Phase 1: synchronously write a minimal JSON (process.memoryUsage + v8.getHeapStatistics, no fork/exec) with collectionComplete=false. Because async functions execute synchronously up to the first await, this is guaranteed on disk before the caller's next statement runs. - Phase 2: enrich with full diagnostics asynchronously and overwrite the file with collectionComplete=true. If Phase 2 crashes, the minimal Phase 1 file still survives for debugging. Tests updated for the two-phase write and gain two new cases covering the sync-Phase-1 guarantee and the synchronous slot reservation. * fix(core): point memory diagnostics suggestion at /compress (the actual command) The suggestion text told users to run /compact, which does not exist in this repository — the actual command is /compress (see compressCommand.ts). Pointing users at a nonexistent slash command in a diagnostics report makes the suggestion unactionable.

ZevGit force-pushed the feat/memory-pressure-monitor branch 3 times, most recently from 49ba200 to 5be2c95 Compare May 21, 2026 12:10

yiliang114 mentioned this pull request May 21, 2026

[P3] Memory Diagnostics / 内存诊断 #3000

Open

wenshao requested changes May 21, 2026

View reviewed changes

feat(core): add memory pressure monitor

33a7353

ZevGit force-pushed the feat/memory-pressure-monitor branch from 5be2c95 to 33a7353 Compare May 22, 2026 02:55

wenshao requested changes May 27, 2026

View reviewed changes

fix(core): address memory pressure review

4074b44

wenshao requested changes May 27, 2026

View reviewed changes

fix(core): isolate session memory cleanup

55fb90b

wenshao approved these changes May 27, 2026

View reviewed changes

yiliang114 approved these changes May 31, 2026

View reviewed changes

yiliang114 merged commit 3fc1849 into QwenLM:main May 31, 2026
10 checks passed

This was referenced May 31, 2026

feat(core): auto-dump memory diagnostics to disk on pressure detection #4651

Closed

feat(core): auto-dump memory diagnostics to disk on pressure detection #4654

Merged

DragonnZhang mentioned this pull request Jun 8, 2026

fix(core): memory-based chat compression to prevent heap OOM #4127

Closed

github-actions Bot mentioned this pull request Jun 8, 2026

Upstream changes detected: ACP spec, Gemini CLI, Codex CLI (App Server Protocol), GitHub Copilot CLI, Qwen Code greenpolo/cc-multi-cli-plugin#7

Open

	if (this.consecutiveIneffectiveCleanups === 3) {
	if (this.consecutiveIneffectiveCleanups >= 3) {

-      case 'trigger_gc': {
+      case 'trigger_gc': {
+        if (typeof global.gc === 'function') {
+          const before = process.memoryUsage().rss;
+          global.gc();
+          const after = process.memoryUsage().rss;
+          debugLogger.debug(`global.gc() freed ${before - after} bytes`);
+        } else {
+          debugLogger.warn(
+            'trigger_gc requested but global.gc is not available; ' +
+              'start Node.js with --expose-gc',
+          );
+        }
+        break;
+      }

-          `Queued escalated cleanup "${recommendation.action}" while ` +
+    if (this.cleanupInProgress) {
+      if (
+        cleanupActionRank(recommendation.action) >
+        cleanupActionRank(this.activeCleanupAction)
+      ) {
+        const queuedRank = this.queuedCleanupRecommendation
+          ? cleanupActionRank(this.queuedCleanupRecommendation.action)
+          : -1;
+        if (cleanupActionRank(recommendation.action) > queuedRank) {
+          this.queuedCleanupRecommendation = recommendation;
+          debugLogger.debug(
+            `Queued escalated cleanup "${recommendation.action}" while ` +
+              `"${this.activeCleanupAction}" is in progress`,
+          );
+        }
+      } else {
+        debugLogger.debug('Cleanup already in progress, skipping');
+      }
+      return;
+    }

	this.heapSizeLimit = getHeapStatistics().heap_size_limit;
	this.effectiveMemoryLimit = this.computeEffectiveMemoryLimit();
	debugLogger.info(

-        // logging only and does not drive failure detection.
+      .then(() => {
+        this.consecutiveCleanupFailures = 0;
+        setImmediate(() => {
+          try {
+            const memAfter = process.memoryUsage().rss;
+            this.logCleanupResult(memBefore, memAfter, recommendation);
+          } catch (err) {
+            debugLogger.error(
+              `Cleanup measurement failed: ${getErrorMessage(err)}`,
+            );
+          }
+        });
+      })

-  resetConsecutiveFailures(): void {
+  resetConsecutiveFailures(): void {
+    this.consecutiveCleanupFailures = 0;
+    this.consecutiveIneffectiveCleanups = 0;
+    // Cancel any in-flight cleanup so its async tail cannot wipe the
+    // new session's freshly-populated FileReadCache.
+    this.cleanupInProgress = false;
+    this.activeCleanupAction = 'none';
+    this.queuedCleanupRecommendation = undefined;
+  }

Conversation

ZevGit commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Reviewer Focus

Review Follow-up

Validation

Scope and Risk

Linked Issues

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ZevGit commented May 22, 2026

Uh oh!

ZevGit commented May 27, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ZevGit commented May 27, 2026

Uh oh!

wenshao left a comment

Choose a reason for hiding this comment

Uh oh!

wenshao commented May 27, 2026

Maintainer Verification Report

Local Build & Test Results

GitHub Actions CI (on 55fb90be9)

Code Review Summary

Minor Observations (non-blocking)

Verdict

Uh oh!

yiliang114 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ZevGit commented May 21, 2026 •

edited

Loading

GitHub Actions CI (on `55fb90be9`)