feat(daemon): optimize ACP child lifecycle — skip relaunch, preheat, idle keep-alive#4751
Conversation
📋 Review SummaryThis PR implements significant performance optimizations for the Qwen Code daemon mode, focusing on ACP (Agent Client Protocol) child process lifecycle management. The changes include: (1) skipping unnecessary relaunch of ACP children, (2) pre-heating the ACP child at daemon boot, (3) configurable idle timeout for keeping the ACP child alive, and (4) container-aware memory allocation. A comprehensive benchmark test suite is also added. The implementation is well-structured with good backward compatibility, but there are several areas that need attention before merging. 🔍 General Feedback
🎯 Specific Feedback🔴 Critical
🟡 High
🟢 Medium
🔵 Low
✅ Highlights
|
There was a problem hiding this comment.
Pull request overview
This PR targets daemon cold-start latency and reconnection performance by optimizing the ACP child process lifecycle: avoiding unnecessary relaunch behavior, adding a daemon boot-time ACP preheat, and introducing an idle keep-alive option for the ACP channel. It also adds a gated benchmark/stress-test suite and a companion report to quantify improvements.
Changes:
- Add ACP channel
preheat()and invoke it after the daemon HTTP server starts listening to parallelize child startup with listener setup. - Introduce
--channel-idle-timeout-ms/ServeOptions.channelIdleTimeoutMsto keep the ACP child alive briefly after the last session closes. - Skip ACP grandchild relaunch by setting
QWEN_CODE_NO_RELAUNCH=trueand pass memory sizing flags directly when spawning the ACP child; add a gated daemon-vs-CLI benchmark suite + report.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/cli/src/serve/workspaceMemory.test.ts | Update bridge test stub to include preheat() method. |
| packages/cli/src/serve/workspaceAgents.test.ts | Update bridge test stub to include preheat() method. |
| packages/cli/src/serve/types.ts | Extend ServeOptions with channelIdleTimeoutMs. |
| packages/cli/src/serve/server.test.ts | Update fake bridge to include preheat(). |
| packages/cli/src/serve/runQwenServe.ts | Wire channelIdleTimeoutMs into bridge options and trigger fire-and-forget bridge.preheat() after listen. |
| packages/cli/src/serve/runQwenServe.test.ts | Update test bridge mock to include preheat(). |
| packages/cli/src/serve/acpHttp/transport.test.ts | Update FakeBridge to include preheat(). |
| packages/cli/src/commands/serve.ts | Add CLI flag --channel-idle-timeout-ms and forward into runQwenServe. |
| packages/acp-bridge/src/spawnChannel.ts | Add container-aware memory sizing + disable relaunch via env; pass --max-old-space-size directly to the ACP child spawn. |
| packages/acp-bridge/src/bridgeTypes.ts | Add preheat() to HttpAcpBridge interface. |
| packages/acp-bridge/src/bridgeOptions.ts | Add channelIdleTimeoutMs option to bridge configuration. |
| packages/acp-bridge/src/bridge.ts | Implement ACP idle keep-alive timer and preheat() method. |
| integration-tests/cli/qwen-daemon-vs-cli-benchmark.test.ts | Add gated POSIX benchmark/stress suite producing JSON/Markdown artifacts. |
| docs/e2e-tests/2026-06-03-daemon-vs-cli-benchmark-report.md | Add a benchmark report snapshot documenting baseline results and analysis. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Review fix summary (d0ad2c9)
Resolved 4/4 threads. |
wenshao
left a comment
There was a problem hiding this comment.
[Suggestion] No unit tests for the three core behavioral changes in this PR: (1) the idle timer lifecycle (arm on closeSession, cancel on ensureChannel, fire after timeout), (2) preheat() spawning and its shutdown guard, (3) getAcpMemoryArgs() branching logic. The benchmark test suite is gated behind QWEN_BENCHMARK_ENABLED=1 and measures performance thresholds, not behavioral correctness. Consider adding unit tests in bridge.test.ts using vi.useFakeTimers() to verify idle timer arming/cancellation/fire, and exporting getAcpMemoryArgs for direct testing of its boundary conditions.
Needs Human Review
- Possibly:
killWithLoglost the BkUyD rationale comment explaining whyisDying = truemust be set synchronously before the async kill. The behavior is preserved, but future maintainers may not understand the invariant. Consider adding a brief comment referencing the full BkUyD block at line ~813. (packages/acp-bridge/src/bridge.ts:789-793) - Possibly: When the idle timer fires and kills the channel, there's no stderr breadcrumb distinguishing it from an unexpected SIGTERM. Consider adding
writeStderrLine('qwen serve: idle timer expired, killing channel')before the kill. (packages/acp-bridge/src/bridge.tsstartIdleTimer callback) - Possibly:
--channel-idle-timeout-msusesdefault: 0while sibling timeout options (prompt-deadline-ms,writer-idle-timeout-ms) have no default (undefined when unset). This eliminates the distinction between "not configured" and "explicitly 0". (packages/cli/src/commands/serve.ts:172)
— qwen3.7-max via Qwen Code /review
Review fix summary (a65fb67) — wenshao round 1
Review-level request (unit tests): Acknowledged — will add in follow-up commit. Resolved 5/5 threads. |
Review fix summary (8e0a991) — wenshao review-level + Needs Human Review
|
Review fix summary (b6e9adf) — wenshao round 2
Resolved 6/6 threads. |
doudouOUC
left a comment
There was a problem hiding this comment.
Review note: GitHub does not allow me to request changes on my own PR, so this is submitted as a comment review. I still consider the first two items blocking before merge. The direction of removing the redundant ACP relaunch parent makes sense, but this PR currently mixes that process-count fix with behavior changes that need tightening.
wenshao
left a comment
There was a problem hiding this comment.
Needs Human Review
- Possibly:
bridge.test.ts:8389— timeout=0 test assertssessionCount === 0but never checks that the channel was actually killed (handle.killed). — qwen3.7-max via Qwen Code /review - Possibly:
bridge.tskillSession path — all idle timer tests exercisecloseSession;killSessionalso callsstartIdleTimerbut has no test coverage for the idle timer behavior. — qwen3.7-max via Qwen Code /review - Possibly:
bridge.ts:4519—preheat()checksshuttingDownbeforeensureChannel()but not after. If shutdown races with the channel spawn, the.catch()inrunQwenServe.ts:1203logs "will retry on first session" — misleading during shutdown. — qwen3.7-max via Qwen Code /review - Possibly:
spawnChannel.ts:16—cachedMemoryArgsmodule-level cache has no reset mechanism for tests;getAcpMemoryArgstests are order-dependent when run in the same vitest worker. — qwen3.7-max via Qwen Code /review
ℹ️ Informational: tsc --noEmit reports TS2741 at packages/cli/src/serve/server.test.ts:727 — Property 'executeShellCommand' is missing in type (FakeBridge mock). This is on a line not modified by this PR and may be a pre-existing type gap from the base branch. Not posted as inline because it falls outside the diff. — qwen3.7-max via Qwen Code /review
…idle keep-alive (#4748) Three optimizations to reduce daemon cold start latency and improve session throughput: 1. Skip unnecessary relaunchAppInChildProcess for ACP children by setting QWEN_CODE_NO_RELAUNCH=true, eliminating a redundant grandchild process spawn. Memory args (--max-old-space-size) are passed directly with container-aware cgroup detection. 2. Pre-spawn ACP child at daemon boot via bridge.preheat(), so the first session doesn't pay cold-start latency. Fire-and-forget with fallback to lazy spawn on failure. 3. Add --channel-idle-timeout-ms flag to keep ACP child alive after last session closes, avoiding cold restart on reconnect. Default 0 (immediate kill) preserves backward compatibility. Also adds daemon-vs-CLI benchmark test suite and report. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
- Fix preheat() immediately killing the preheated channel when channelIdleTimeoutMs=0 (default). Preheat now leaves the channel alive for the first session; idle timer is only armed by session close paths. - Cast process.constrainedMemory via typed intermediate to avoid tsc errors on @types/node versions without the declaration. - Add JSDoc to ServeOptions.channelIdleTimeoutMs. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
…heat idle - Remove unused __dirname + fileURLToPath (TS6133) - Fix body?.code → body?.['code'] for index signature (TS4111) - Fix lastEventId: '0' → 0 type mismatch (TS2322) - Restore await semantics for channel kill in timeout=0 path - Preheat conditionally arms idle timer when channelIdleTimeoutMs > 0 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
…ult, add unit tests - Add stderr breadcrumb before idle timer kills channel, distinguishing idle-timeout reap from unexpected SIGTERM/crash - Remove `default: 0` from --channel-idle-timeout-ms to match sibling options (prompt-deadline-ms, writer-idle-timeout-ms) that use undefined-when-unset - Export getAcpMemoryArgs for direct testing - Add unit tests: channelIdleTimeoutMs lifecycle (immediate kill, warm channel reuse during idle window), preheat (channel reuse, shutdown guard), getAcpMemoryArgs (boundary conditions, 16GB cap) 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
…che memory args - Skip preheat when bridge is test-injected (deps.bridge) to avoid in-flight ensureChannel blocking test shutdown - Cache getAcpMemoryArgs() result — os.totalmem() and cgroup reads are constant for the daemon's lifetime - Correct preheat savings estimate in benchmark report (0-0.5s depending on session arrival timing, not 0.3-0.5s) 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Report moved to: https://alidocs.dingtalk.com/i/nodes/YMyQA2dXW7gYo6Mzc5nYdp7GWzlwrZgb 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
The -p mode startup profiler test runs 6 iterations of full CLI initialization (~20s each), exceeding the previous 105s timeout. Increase to 210s. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
…wrapper bridge.preheat() is async, so synchronous throws are already wrapped in a rejected promise. The Promise.resolve().then() indirection added no safety and confused readers about what edge case it guarded. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
…shold On Node 22+, the default V8 heap limit is ~4.2GB, not ~2GB. The previous `targetMB > 2048` check would set --max-old-space-size to a value lower than the default on 5-8GB hosts, causing a regression. Now compares against the actual V8 heap_size_limit via v8.getHeapStatistics(), matching the approach used by getNodeMemoryArgs() in gemini.tsx. Also adds --max-sessions 0 to warm session and memory baseline benchmark tests to prevent session_limit_exceeded on heavy mode. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
30fe5f1 to
9778dc3
Compare
…use assertions, boot validation - Add vi.useFakeTimers() tests verifying channel kill after idle expiry and timer cancellation on new session arrival - Add factory call counters to idle keep-alive and preheat tests to prove channel reuse (not respawn) - Add handle.killed assertion to immediate-kill test - Remove noisy stderr log on default timeout=0 path - Add channelIdleTimeoutMs boot-time validation in runQwenServe - Fix getAcpMemoryArgs test to not assume os.totalmem() matches process.constrainedMemory() in container environments - Update benchmark description to reflect preheat behavior
Address wenshao review round 3Pushed Fixed (wenshao)
Also fixed (self-review threads)
All 240 bridge tests + 20 spawnChannel tests pass. |
wenshao
left a comment
There was a problem hiding this comment.
[Suggestion] bridge.ts:1372 (doSpawn) — The doSpawn newSession-failure path still uses immediate kill (ci.isDying = true; await ci.channel.kill()) while closeSession/killSession were migrated to startIdleTimer. This is likely intentional (a broken channel shouldn't be kept warm), but the rationale is undocumented. A future maintainer might "fix" the inconsistency by migrating to startIdleTimer, inadvertently keeping a broken channel alive. Consider adding a one-line comment: // Intentionally NOT using startIdleTimer: a channel that failed newSession is potentially broken and must be fully replaced. — qwen3.7-max via Qwen Code /review
…eheat+idle test - Add context parameter to killChannelWithLog/startIdleTimer so kill-failure logs include the sessionId that triggered the kill - Add factoryCalls assertion to preheat "no-op after shutdown" test - Add preheat + channelIdleTimeoutMs interaction test (fake timers): preheat arms idle timer, first session cancels it, closeSession re-arms it, channel killed after expiry
Address wenshao review round 4Pushed Fixed
Pushed back
All 241 bridge tests pass. Resolved 5/5 threads. |
PR #4751 Local Verification ReportBranch: Changes Summary
Test Results
TypeCheck Analysis
Verdict✅ Ready to merge. All 261 unit tests pass across both test files. Lint clean. The 2 new typecheck errors are a known consequence of Verified by wenshao |
The idle timer callback captured the arming context (e.g. closeSession "abc123") instead of identifying the idle-timeout expiry as the cause.
Address wenshao review round 5Pushed Resolved 1/1 thread. 0 unresolved threads remain. |
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — qwen3.7-max via Qwen Code /review
Summary
relaunchAppInChildProcessfor ACP children — eliminates redundant grandchild process spawn, passes--max-old-space-sizedirectly with container-aware cgroup detection viaprocess.constrainedMemory(), capped at 16GBbridge.preheat()— ACP child is ready before the first session arrives, reducing first-session latency by 0-0.5s depending on arrival timing. Fire-and-forget with fallback to lazy spawn on failure--channel-idle-timeout-msflag — keeps ACP child alive after last session closes, avoiding cold restart on reconnect (default: unset = immediate kill, backward compatible). Includes stderr breadcrumb distinguishing idle-timeout kills from unexpected exitsqwen-daemon-vs-cli-benchmark.test.ts) with startup profiler integration, process tree RSS measurement, stress tests (burst, throughput, churn, limit saturation, SSE flood), and resource profiling via/usr/bin/timegetAcpMemoryArgsboundary conditionsCloses #4748
Option comparison
This PR implements Option B. Option A is the smaller alternative if reviewers want to land only the process-count fix first.
--channel-idle-timeout-ms. This keeps the same two-process daemon + ACP worker shape after relaunch removal, but changes cold-start behavior and can keep an idle ACP process around for faster reconnects.Key differences:
Process diagrams
之前:


之后:
flowchart LR subgraph Current["Current before this PR"] CClient["HTTP client"] --> CDaemon["qwen serve daemon"] CDaemon --> CParent["qwen --acp relaunch parent"] CParent --> CWorker["qwen --acp worker"] CWorker --> CSessions["N ACP sessions"] end subgraph OptionB["This PR"] BClient["HTTP client"] --> BDaemon["qwen serve daemon"] BDaemon --> BWorker["preheated / kept-alive<br/>qwen --acp worker"] BWorker --> BSessions["N ACP sessions"] endTest plan
npm run build— no errorsnpx vitest run src/bridge.test.ts— unit tests for idle timer + preheat (220 tests pass)npx vitest run src/spawnChannel.test.ts— unit tests for getAcpMemoryArgs (20 tests pass)QWEN_BENCHMARK_ENABLED=1 npx vitest run integration-tests/cli/qwen-daemon-vs-cli-benchmark.test.ts— all 10 benchmark tests passnpx vitest run integration-tests/cli/qwen-serve-routes.test.ts— regression check (pre-existing capabilities test failure unrelated to this PR)--channel-idle-timeout-ms 10000keeps ACP alive after session close🤖 Generated with Qwen Code