Skip to content

feat(daemon): optimize ACP child lifecycle — skip relaunch, preheat, idle keep-alive#4751

Merged
doudouOUC merged 12 commits into
daemon_mode_b_mainfrom
feat/daemon_analyze
Jun 5, 2026
Merged

feat(daemon): optimize ACP child lifecycle — skip relaunch, preheat, idle keep-alive#4751
doudouOUC merged 12 commits into
daemon_mode_b_mainfrom
feat/daemon_analyze

Conversation

@doudouOUC

@doudouOUC doudouOUC commented Jun 3, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • Skip unnecessary relaunchAppInChildProcess for ACP children — eliminates redundant grandchild process spawn, passes --max-old-space-size directly with container-aware cgroup detection via process.constrainedMemory(), capped at 16GB
  • Pre-spawn ACP child at daemon boot via bridge.preheat() — ACP child is ready before the first session arrives, reducing first-session latency by 0-0.5s depending on arrival timing. Fire-and-forget with fallback to lazy spawn on failure
  • Add --channel-idle-timeout-ms flag — keeps ACP child alive after last session closes, avoiding cold restart on reconnect (default: unset = immediate kill, backward compatible). Includes stderr breadcrumb distinguishing idle-timeout kills from unexpected exits
  • Add daemon-vs-CLI benchmark test suite (qwen-daemon-vs-cli-benchmark.test.ts) with startup profiler integration, process tree RSS measurement, stress tests (burst, throughput, churn, limit saturation, SSE flood), and resource profiling via /usr/bin/time
  • Add unit tests for idle timer lifecycle, preheat shutdown guard, and getAcpMemoryArgs boundary conditions

Closes #4748

Option comparison

This PR implements Option B. Option A is the smaller alternative if reviewers want to land only the process-count fix first.

  • Option A: only skip ACP relaunch. The daemon spawns the actual ACP worker directly and passes the memory limit arguments itself. This removes the extra ACP relaunch parent process while keeping the daemon as the outer supervisor for channel cleanup and later respawn.
  • Option B: skip ACP relaunch, preheat the ACP channel at daemon boot, and optionally keep the channel alive after the last session via --channel-idle-timeout-ms. This keeps the same two-process daemon + ACP worker shape after relaunch removal, but changes cold-start behavior and can keep an idle ACP process around for faster reconnects.

Key differences:

  • Process count: Option A reduces an active daemon session from daemon + ACP relaunch parent + ACP worker to daemon + ACP worker. Option B does the same during active sessions, but preheat/idle keep-alive can also create or retain the ACP worker while the daemon is otherwise idle.
  • Startup latency: Option A removes the relaunch hop (~0.2-0.3s). Option B also makes ACP ready before the first session and can avoid reconnect cold starts when idle keep-alive is enabled.
  • Risk: Option A mainly needs coverage that ACP exit cleanup and respawn still work, and that memory sizing is preserved without relaunch. Option B additionally needs validation for preheat timing, idle timeout behavior, resource usage, and shutdown cleanup.

Process diagrams

之前:
image
之后:
image

flowchart LR
  subgraph Current["Current before this PR"]
    CClient["HTTP client"] --> CDaemon["qwen serve daemon"]
    CDaemon --> CParent["qwen --acp relaunch parent"]
    CParent --> CWorker["qwen --acp worker"]
    CWorker --> CSessions["N ACP sessions"]
  end

  subgraph OptionB["This PR"]
    BClient["HTTP client"] --> BDaemon["qwen serve daemon"]
    BDaemon --> BWorker["preheated / kept-alive<br/>qwen --acp worker"]
    BWorker --> BSessions["N ACP sessions"]
  end
Loading

Test plan

  • npm run build — no errors
  • npx vitest run src/bridge.test.ts — unit tests for idle timer + preheat (220 tests pass)
  • npx vitest run src/spawnChannel.test.ts — unit tests for getAcpMemoryArgs (20 tests pass)
  • QWEN_BENCHMARK_ENABLED=1 npx vitest run integration-tests/cli/qwen-daemon-vs-cli-benchmark.test.ts — all 10 benchmark tests pass
  • npx vitest run integration-tests/cli/qwen-serve-routes.test.ts — regression check (pre-existing capabilities test failure unrelated to this PR)
  • Verify --channel-idle-timeout-ms 10000 keeps ACP alive after session close
  • Verify daemon process tree is 2 processes (daemon + ACP worker) instead of 3

🤖 Generated with Qwen Code

Copilot AI review requested due to automatic review settings June 3, 2026 23:52
@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

📋 Review Summary

This PR implements significant performance optimizations for the Qwen Code daemon mode, focusing on ACP (Agent Client Protocol) child process lifecycle management. The changes include: (1) skipping unnecessary relaunch of ACP children, (2) pre-heating the ACP child at daemon boot, (3) configurable idle timeout for keeping the ACP child alive, and (4) container-aware memory allocation. A comprehensive benchmark test suite is also added. The implementation is well-structured with good backward compatibility, but there are several areas that need attention before merging.

🔍 General Feedback

  • Positive aspects:

    • The idle timer implementation is clean with proper .unref() to prevent blocking daemon exit
    • Good backward compatibility — default channelIdleTimeoutMs=0 preserves existing behavior
    • The preheat() method is fire-and-forget with proper error handling and fallback
    • Memory allocation using process.constrainedMemory() shows good container awareness
    • Comprehensive benchmark test suite covers multiple scenarios (burst, throughput, churn, SSE flood)
  • Architectural decisions:

    • Centralizing idle timer logic in startIdleTimer/cancelIdleTimer helpers is a good pattern
    • The killWithLog helper properly handles async kill errors without blocking
    • Setting QWEN_CODE_NO_RELAUNCH=true env var is a clean way to communicate the optimization to child processes
  • Concerns:

    • The memory calculation heuristic (50% of total/constrained memory, capped at 16GB) may be too aggressive for memory-constrained environments
    • The benchmark test file is substantial (1852 lines) but appears to be missing from the actual codebase (not found during review)
    • Some race conditions in timer cancellation need careful review

🎯 Specific Feedback

🔴 Critical

  • File: packages/acp-bridge/src/spawnChannel.ts:24 - The memory calculation Math.floor(totalMB * 0.5) with a 16GB cap could cause OOM issues on systems with limited memory. On a 2GB container, this would allocate 1GB to Node.js heap, leaving insufficient memory for the OS and other processes. Recommendation: Add a minimum threshold check and consider a more conservative default (e.g., 25% instead of 50%, or a lower absolute minimum like 512MB).

  • File: packages/acp-bridge/src/bridge.ts:804-807 - Race condition in startIdleTimer: The check if (ci.sessionIds.size === 0 && ci.pendingRestoreIds.size === 0) happens inside the timer callback, but between timer start and callback execution, a new session could be added. While this is partially mitigated by cancelIdleTimer() in ensureChannel(), there's a window where the timer could kill a channel that just received a session. Recommendation: Add a guard check inside the callback to verify the channel is still idle before killing.

🟡 High

  • File: packages/acp-bridge/src/bridge.ts:799-803 - The timeout clamping to 2_147_483_647 (max 32-bit signed int) is good, but the validation logic raw !== undefined && Number.isFinite(raw) && raw > 0 doesn't handle NaN explicitly (though Number.isFinite does reject it). Recommendation: Add explicit validation error for non-finite values to fail fast with a clear error message rather than silently defaulting to 0.

  • File: packages/cli/src/serve/runQwenServe.ts:1202-1209 - The preheat() call is fire-and-forget with only stderr logging on failure, but there's no metric/counter exposed for operators to monitor preheat failure rates. Recommendation: Consider adding a diagnostic counter or event that operators can monitor via /workspace/status or similar endpoints.

  • File: packages/acp-bridge/src/spawnChannel.ts:15-23 - The getAcpMemoryArgs() function uses process.constrainedMemory() which is only available in Node.js 16+. While the codebase requires Node 22+, the fallback to os.totalmem() when constrainedMemory() returns 0 could be problematic — a return value of 0 might mean "not constrained" OR "cgroup v1 without proper accounting". Recommendation: Add a comment clarifying this behavior or add telemetry to detect when the fallback is used.

🟢 Medium

  • File: packages/acp-bridge/src/bridge.ts:1020 - The cancelIdleTimer() call in ensureChannel() is placed before the isDying check. This is correct for the common case, but if channelInfo exists and is not dying, the timer is cancelled even though we're returning the existing channel. Recommendation: Add a comment explaining why this is correct (the timer cancellation is appropriate because we're about to use the channel).

  • File: packages/acp-bridge/src/bridge.ts:1128 - The channel.exited.then() callback cancels the idle timer, but this cleanup path is only triggered when the channel exits naturally. If the channel is killed via killWithLog(), the timer is already cancelled in startIdleTimer(). Recommendation: Add a comment to clarify the two kill paths and why both are covered.

  • File: packages/cli/src/commands/serve.ts:171-175 - The CLI option description for channel-idle-timeout-ms could benefit from an example use case. Recommendation: Add something like "Use --channel-idle-timeout-ms 60000 to keep ACP alive for 60 seconds after last session closes, useful for intermittent usage patterns."

  • File: integration-tests/cli/qwen-daemon-vs-cli-benchmark.test.ts - The benchmark test is comprehensive but the skip logic at lines 49-54 uses QWEN_BENCHMARK_ENABLED=1 which means it won't run in CI by default. Recommendation: Ensure there's documentation in the README or CONTRIBUTING guide about how to run these benchmarks locally.

🔵 Low

  • File: packages/acp-bridge/src/bridge.ts:789 - The killWithLog function uses writeStderrLine for error logging. Consider whether this should use the structured daemonLog.error() instead for consistency with other error paths in the file.

  • File: packages/acp-bridge/src/spawnChannel.ts:22 - The magic number 16_384 (16GB) for the memory cap is clear from context, but consider extracting it as a named constant like MAX_NODE_HEAP_MB for easier discovery and modification.

  • File: packages/cli/src/serve/runQwenServe.ts:1202-1209 - The preheat error message is quite long and includes the full error message. Consider truncating or hashing very long error messages to prevent log flooding.

  • File: docs/e2e-tests/2026-06-03-daemon-vs-cli-benchmark-report.md - The benchmark report is in Chinese. While this is acceptable for internal documentation, consider adding an English summary or translation for broader team accessibility.

✅ Highlights

  • Excellent test coverage: The benchmark test suite (qwen-daemon-vs-cli-benchmark.test.ts) is comprehensive, covering startup latency, memory baseline, burst stress, throughput, session churn, limit saturation, and SSE connection flood scenarios.

  • Good backward compatibility: The default channelIdleTimeoutMs=0 ensures existing deployments are unaffected, and operators can opt-in to the idle timeout behavior.

  • Container-aware memory management: Using process.constrainedMemory() demonstrates awareness of containerized deployment scenarios where cgroup memory limits may differ from host memory.

  • Proper timer handling: The use of .unref() on idle timers ensures the daemon will exit cleanly even with pending timers, preventing zombie processes.

  • Defensive error handling: The preheat() method gracefully handles failures without blocking daemon startup, logging errors while allowing fallback to lazy spawn.

  • Comprehensive documentation: The PR description clearly outlines the expected performance improvements (~0.5-0.8s cold start reduction, 61x throughput improvement in anchored mode) with specific test commands for verification.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR targets daemon cold-start latency and reconnection performance by optimizing the ACP child process lifecycle: avoiding unnecessary relaunch behavior, adding a daemon boot-time ACP preheat, and introducing an idle keep-alive option for the ACP channel. It also adds a gated benchmark/stress-test suite and a companion report to quantify improvements.

Changes:

  • Add ACP channel preheat() and invoke it after the daemon HTTP server starts listening to parallelize child startup with listener setup.
  • Introduce --channel-idle-timeout-ms / ServeOptions.channelIdleTimeoutMs to keep the ACP child alive briefly after the last session closes.
  • Skip ACP grandchild relaunch by setting QWEN_CODE_NO_RELAUNCH=true and pass memory sizing flags directly when spawning the ACP child; add a gated daemon-vs-CLI benchmark suite + report.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
packages/cli/src/serve/workspaceMemory.test.ts Update bridge test stub to include preheat() method.
packages/cli/src/serve/workspaceAgents.test.ts Update bridge test stub to include preheat() method.
packages/cli/src/serve/types.ts Extend ServeOptions with channelIdleTimeoutMs.
packages/cli/src/serve/server.test.ts Update fake bridge to include preheat().
packages/cli/src/serve/runQwenServe.ts Wire channelIdleTimeoutMs into bridge options and trigger fire-and-forget bridge.preheat() after listen.
packages/cli/src/serve/runQwenServe.test.ts Update test bridge mock to include preheat().
packages/cli/src/serve/acpHttp/transport.test.ts Update FakeBridge to include preheat().
packages/cli/src/commands/serve.ts Add CLI flag --channel-idle-timeout-ms and forward into runQwenServe.
packages/acp-bridge/src/spawnChannel.ts Add container-aware memory sizing + disable relaunch via env; pass --max-old-space-size directly to the ACP child spawn.
packages/acp-bridge/src/bridgeTypes.ts Add preheat() to HttpAcpBridge interface.
packages/acp-bridge/src/bridgeOptions.ts Add channelIdleTimeoutMs option to bridge configuration.
packages/acp-bridge/src/bridge.ts Implement ACP idle keep-alive timer and preheat() method.
integration-tests/cli/qwen-daemon-vs-cli-benchmark.test.ts Add gated POSIX benchmark/stress suite producing JSON/Markdown artifacts.
docs/e2e-tests/2026-06-03-daemon-vs-cli-benchmark-report.md Add a benchmark report snapshot documenting baseline results and analysis.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/acp-bridge/src/spawnChannel.ts Outdated
Comment thread packages/acp-bridge/src/bridge.ts
Comment thread packages/cli/src/serve/types.ts
Comment thread packages/cli/src/serve/types.ts
@doudouOUC

Copy link
Copy Markdown
Collaborator Author

Review fix summary (d0ad2c9)

Thread Action Detail
spawnChannel.ts:24process.constrainedMemory TS types Fixed Cast via typed intermediate
bridge.ts:4514 — preheat kills channel immediately Fixed Removed startIdleTimer from preheat(); channel stays alive for first session
types.ts:161 — missing JSDoc Fixed Added doc comment
types.ts:161createServeApp doesn't forward option Pushed back createServeApp receives pre-constructed bridge, not options

Resolved 4/4 threads.

@doudouOUC doudouOUC requested review from chiga0 and wenshao June 4, 2026 00:34

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] No unit tests for the three core behavioral changes in this PR: (1) the idle timer lifecycle (arm on closeSession, cancel on ensureChannel, fire after timeout), (2) preheat() spawning and its shutdown guard, (3) getAcpMemoryArgs() branching logic. The benchmark test suite is gated behind QWEN_BENCHMARK_ENABLED=1 and measures performance thresholds, not behavioral correctness. Consider adding unit tests in bridge.test.ts using vi.useFakeTimers() to verify idle timer arming/cancellation/fire, and exporting getAcpMemoryArgs for direct testing of its boundary conditions.

Needs Human Review

  • Possibly: killWithLog lost the BkUyD rationale comment explaining why isDying = true must be set synchronously before the async kill. The behavior is preserved, but future maintainers may not understand the invariant. Consider adding a brief comment referencing the full BkUyD block at line ~813. (packages/acp-bridge/src/bridge.ts:789-793)
  • Possibly: When the idle timer fires and kills the channel, there's no stderr breadcrumb distinguishing it from an unexpected SIGTERM. Consider adding writeStderrLine('qwen serve: idle timer expired, killing channel') before the kill. (packages/acp-bridge/src/bridge.ts startIdleTimer callback)
  • Possibly: --channel-idle-timeout-ms uses default: 0 while sibling timeout options (prompt-deadline-ms, writer-idle-timeout-ms) have no default (undefined when unset). This eliminates the distinction between "not configured" and "explicitly 0". (packages/cli/src/commands/serve.ts:172)

— qwen3.7-max via Qwen Code /review

Comment thread integration-tests/cli/qwen-daemon-vs-cli-benchmark.test.ts Outdated
Comment thread integration-tests/cli/qwen-daemon-vs-cli-benchmark.test.ts Outdated
Comment thread integration-tests/cli/qwen-daemon-vs-cli-benchmark.test.ts Outdated
Comment thread packages/acp-bridge/src/bridge.ts Outdated
Comment thread packages/acp-bridge/src/bridge.ts
@doudouOUC

Copy link
Copy Markdown
Collaborator Author

Review fix summary (a65fb67) — wenshao round 1

Thread Action Detail
benchmark.test.ts:29 — unused __dirname Fixed Removed declaration + fileURLToPath import
benchmark.test.ts:1364 — TS4111 index signature Fixed body?.codebody?.['code']
benchmark.test.ts:1433 — TS2322 type mismatch Fixed lastEventId: '0'0
bridge.ts:790 — fire-and-forget kill Fixed killChannelWithLog is now async; startIdleTimer awaits it in timeout=0 path
bridge.ts:4509 — preheat idle timer Fixed Arms idle timer when channelIdleTimeoutMs > 0; stays alive for first session when 0

Review-level request (unit tests): Acknowledged — will add in follow-up commit.

Resolved 5/5 threads.

Comment thread packages/cli/src/serve/runQwenServe.ts Outdated
Comment thread packages/acp-bridge/src/bridge.ts
Comment thread packages/acp-bridge/src/bridge.ts Outdated
Comment thread packages/acp-bridge/src/bridge.ts Outdated
Comment thread docs/e2e-tests/2026-06-03-daemon-vs-cli-benchmark-report.md Outdated
Comment thread packages/acp-bridge/src/spawnChannel.ts Outdated
@doudouOUC

Copy link
Copy Markdown
Collaborator Author

Review fix summary (8e0a991) — wenshao review-level + Needs Human Review

Item Action Detail
[Suggestion] Unit tests Fixed Added tests in bridge.test.ts (idle timer lifecycle + preheat) and spawnChannel.test.ts (getAcpMemoryArgs boundary conditions)
Needs HR #1: BkUyD rationale comment on killChannelWithLog Not taking BkUyD invariant has 8 references in the file; isDying field JSDoc (line 190-216) is the canonical doc. Adding a 9th on a 3-line helper is redundant
Needs HR #2: stderr log before idle timer kill Fixed Added writeStderrLine distinguishing idle-timeout reap from immediate kill
Needs HR #3: remove default: 0 from --channel-idle-timeout-ms Fixed Removed to match sibling options pattern; resolvedChannelIdleTimeoutMs() treats undefined same as 0

@doudouOUC

Copy link
Copy Markdown
Collaborator Author

Review fix summary (b6e9adf) — wenshao round 2

Thread Action Detail
runQwenServe.ts:1203 — preheat breaks tests Fixed Skip preheat when deps.bridge is injected (test path)
bridge.ts:4523 — preheat orphans with timeout=0 Pushed back Preheated channel is used by first session; daemon shutdown is the reclamation path
bridge.ts:818 — no log on idle timer kill Already fixed Added in 8e0a991 (previous round)
bridge.ts:791 — kill log missing session context Pushed back Timer fires with zero sessions; no session ID to reference
docs/..:190 — preheat savings overestimated Fixed Corrected to 0-0.5s depending on session arrival timing
spawnChannel.ts:14 — cache getAcpMemoryArgs Fixed Module-level cache added

Resolved 6/6 threads.

@doudouOUC doudouOUC left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review note: GitHub does not allow me to request changes on my own PR, so this is submitted as a comment review. I still consider the first two items blocking before merge. The direction of removing the redundant ACP relaunch parent makes sense, but this PR currently mixes that process-count fix with behavior changes that need tightening.

Comment thread packages/acp-bridge/src/spawnChannel.ts
Comment thread packages/cli/src/serve/runQwenServe.ts Outdated
Comment thread packages/acp-bridge/src/bridge.ts
Comment thread packages/acp-bridge/src/spawnChannel.test.ts Outdated
Comment thread integration-tests/cli/qwen-daemon-vs-cli-benchmark.test.ts Outdated
@doudouOUC doudouOUC requested a review from yiliang114 June 4, 2026 09:57

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs Human Review

  • Possibly: bridge.test.ts:8389 — timeout=0 test asserts sessionCount === 0 but never checks that the channel was actually killed (handle.killed). — qwen3.7-max via Qwen Code /review
  • Possibly: bridge.ts killSession path — all idle timer tests exercise closeSession; killSession also calls startIdleTimer but has no test coverage for the idle timer behavior. — qwen3.7-max via Qwen Code /review
  • Possibly: bridge.ts:4519preheat() checks shuttingDown before ensureChannel() but not after. If shutdown races with the channel spawn, the .catch() in runQwenServe.ts:1203 logs "will retry on first session" — misleading during shutdown. — qwen3.7-max via Qwen Code /review
  • Possibly: spawnChannel.ts:16cachedMemoryArgs module-level cache has no reset mechanism for tests; getAcpMemoryArgs tests are order-dependent when run in the same vitest worker. — qwen3.7-max via Qwen Code /review

ℹ️ Informational: tsc --noEmit reports TS2741 at packages/cli/src/serve/server.test.ts:727Property 'executeShellCommand' is missing in type (FakeBridge mock). This is on a line not modified by this PR and may be a pre-existing type gap from the base branch. Not posted as inline because it falls outside the diff. — qwen3.7-max via Qwen Code /review

Comment thread packages/acp-bridge/src/bridge.ts Outdated
Comment thread packages/acp-bridge/src/bridge.test.ts
Comment thread packages/acp-bridge/src/bridge.test.ts
Comment thread packages/acp-bridge/src/bridge.ts Outdated
Comment thread packages/cli/src/serve/runQwenServe.ts
Comment thread packages/acp-bridge/src/bridge.test.ts
doudouOUC added 9 commits June 4, 2026 21:40
…idle keep-alive (#4748)

Three optimizations to reduce daemon cold start latency and improve
session throughput:

1. Skip unnecessary relaunchAppInChildProcess for ACP children by
   setting QWEN_CODE_NO_RELAUNCH=true, eliminating a redundant
   grandchild process spawn. Memory args (--max-old-space-size)
   are passed directly with container-aware cgroup detection.

2. Pre-spawn ACP child at daemon boot via bridge.preheat(), so the
   first session doesn't pay cold-start latency. Fire-and-forget
   with fallback to lazy spawn on failure.

3. Add --channel-idle-timeout-ms flag to keep ACP child alive after
   last session closes, avoiding cold restart on reconnect. Default
   0 (immediate kill) preserves backward compatibility.

Also adds daemon-vs-CLI benchmark test suite and report.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
- Fix preheat() immediately killing the preheated channel when
  channelIdleTimeoutMs=0 (default). Preheat now leaves the channel
  alive for the first session; idle timer is only armed by session
  close paths.
- Cast process.constrainedMemory via typed intermediate to avoid
  tsc errors on @types/node versions without the declaration.
- Add JSDoc to ServeOptions.channelIdleTimeoutMs.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
…heat idle

- Remove unused __dirname + fileURLToPath (TS6133)
- Fix body?.code → body?.['code'] for index signature (TS4111)
- Fix lastEventId: '0' → 0 type mismatch (TS2322)
- Restore await semantics for channel kill in timeout=0 path
- Preheat conditionally arms idle timer when channelIdleTimeoutMs > 0

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
…ult, add unit tests

- Add stderr breadcrumb before idle timer kills channel, distinguishing
  idle-timeout reap from unexpected SIGTERM/crash
- Remove `default: 0` from --channel-idle-timeout-ms to match sibling
  options (prompt-deadline-ms, writer-idle-timeout-ms) that use
  undefined-when-unset
- Export getAcpMemoryArgs for direct testing
- Add unit tests: channelIdleTimeoutMs lifecycle (immediate kill,
  warm channel reuse during idle window), preheat (channel reuse,
  shutdown guard), getAcpMemoryArgs (boundary conditions, 16GB cap)

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
…che memory args

- Skip preheat when bridge is test-injected (deps.bridge) to avoid
  in-flight ensureChannel blocking test shutdown
- Cache getAcpMemoryArgs() result — os.totalmem() and cgroup reads
  are constant for the daemon's lifetime
- Correct preheat savings estimate in benchmark report (0-0.5s
  depending on session arrival timing, not 0.3-0.5s)

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
The -p mode startup profiler test runs 6 iterations of full CLI
initialization (~20s each), exceeding the previous 105s timeout.
Increase to 210s.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
…wrapper

bridge.preheat() is async, so synchronous throws are already wrapped
in a rejected promise. The Promise.resolve().then() indirection added
no safety and confused readers about what edge case it guarded.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
…shold

On Node 22+, the default V8 heap limit is ~4.2GB, not ~2GB. The
previous `targetMB > 2048` check would set --max-old-space-size to
a value lower than the default on 5-8GB hosts, causing a regression.

Now compares against the actual V8 heap_size_limit via
v8.getHeapStatistics(), matching the approach used by
getNodeMemoryArgs() in gemini.tsx.

Also adds --max-sessions 0 to warm session and memory baseline
benchmark tests to prevent session_limit_exceeded on heavy mode.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
@doudouOUC doudouOUC force-pushed the feat/daemon_analyze branch from 30fe5f1 to 9778dc3 Compare June 4, 2026 13:42
@doudouOUC doudouOUC requested a review from wenshao June 4, 2026 14:18
…use assertions, boot validation

- Add vi.useFakeTimers() tests verifying channel kill after idle expiry
  and timer cancellation on new session arrival
- Add factory call counters to idle keep-alive and preheat tests to
  prove channel reuse (not respawn)
- Add handle.killed assertion to immediate-kill test
- Remove noisy stderr log on default timeout=0 path
- Add channelIdleTimeoutMs boot-time validation in runQwenServe
- Fix getAcpMemoryArgs test to not assume os.totalmem() matches
  process.constrainedMemory() in container environments
- Update benchmark description to reflect preheat behavior
@doudouOUC

Copy link
Copy Markdown
Collaborator Author

Address wenshao review round 3

Pushed 3985775dc addressing all 6 comments:

Fixed (wenshao)

# Level File Fix
1 Critical bridge.ts:870 Added 2 vi.useFakeTimers() tests: idle-expiry kill + timer cancellation on new session
2 Suggestion bridge.test.ts:8519 Added factoryCalls counter proving channel reuse (not respawn)
3 Suggestion bridge.test.ts:8545 Same factory counter for preheat reuse
4 Suggestion bridge.ts:858 Removed noisy writeStderrLine from default timeout=0 path
5 Suggestion runQwenServe.ts:766 Added boot-time validation: isFinite + isInteger + >= 0
6 Suggestion bridge.test.ts:8506 Added handle.killed assertion on immediate-kill test

Also fixed (self-review threads)

  • spawnChannel.test.ts: removed os.totalmem() dependency — test now works correctly in cgroup/container environments where constrainedMemory() differs
  • benchmark.test.ts: updated description to reflect preheat behavior instead of lazy-spawn

All 240 bridge tests + 20 spawnChannel tests pass.

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] bridge.ts:1372 (doSpawn) — The doSpawn newSession-failure path still uses immediate kill (ci.isDying = true; await ci.channel.kill()) while closeSession/killSession were migrated to startIdleTimer. This is likely intentional (a broken channel shouldn't be kept warm), but the rationale is undocumented. A future maintainer might "fix" the inconsistency by migrating to startIdleTimer, inadvertently keeping a broken channel alive. Consider adding a one-line comment: // Intentionally NOT using startIdleTimer: a channel that failed newSession is potentially broken and must be fully replaced. — qwen3.7-max via Qwen Code /review

Comment thread packages/acp-bridge/src/bridge.test.ts
Comment thread packages/acp-bridge/src/spawnChannel.test.ts
Comment thread packages/cli/src/serve/runQwenServe.ts
Comment thread packages/acp-bridge/src/bridge.ts Outdated
Comment thread packages/acp-bridge/src/bridge.ts
…eheat+idle test

- Add context parameter to killChannelWithLog/startIdleTimer so
  kill-failure logs include the sessionId that triggered the kill
- Add factoryCalls assertion to preheat "no-op after shutdown" test
- Add preheat + channelIdleTimeoutMs interaction test (fake timers):
  preheat arms idle timer, first session cancels it, closeSession
  re-arms it, channel killed after expiry
@doudouOUC

Copy link
Copy Markdown
Collaborator Author

Address wenshao review round 4

Pushed 481b0d78f addressing 3/5 comments, pushed back on 2.

Fixed

# File Fix
1 bridge.test.ts:8646 Added factoryCalls === 0 assertion to preheat "no-op after shutdown" test
4 bridge.ts:2872 Added context param to killChannelWithLog/startIdleTimer — kill-failure logs now include sessionId
5 bridge.ts:4655 Added preheat + idle timeout interaction test (fake timers): preheat arms timer → session cancels → re-arms on close → kills on expiry

Pushed back

# File Reason
2 spawnChannel.test.ts:249 Smoke tests; _resetCachedMemoryArgs() export leaks implementation. The cache is a perf optimization, not a contract.
3 runQwenServe.ts:632 Inline validation in runQwenServe() — testing requires full mock server. Siblings (promptDeadlineMs/writerIdleTimeoutMs) also lack dedicated unit tests. resolvedChannelIdleTimeoutMs() clamps defensively regardless.

All 241 bridge tests pass. Resolved 5/5 threads.

@doudouOUC doudouOUC requested a review from wenshao June 4, 2026 15:27
@wenshao

wenshao commented Jun 4, 2026

Copy link
Copy Markdown
Collaborator

PR #4751 Local Verification Report

Branch: feat/daemon_analyze | Commits: 11 (77ad5fb3481b0d78) | Base: daemon_mode_b_main

Changes Summary

  • packages/acp-bridge/src/bridge.ts (+68 −19) — Preheat channel at daemon boot, idle timer lifecycle (arm/cancel/kill), killChannelWithLog with context, QWEN_CODE_NO_RELAUNCH=true skip
  • packages/acp-bridge/src/spawnChannel.ts (+25 −1) — getAcpMemoryArgs() with V8 heap limit comparison, cgroup-aware detection, 16GB cap
  • packages/acp-bridge/src/bridgeOptions.ts (+8) — channelIdleTimeoutMs option
  • packages/acp-bridge/src/bridgeTypes.ts (+7) — preheat() method declaration
  • packages/cli/src/commands/serve.ts (+10) — --channel-idle-timeout-ms CLI flag
  • packages/cli/src/serve/runQwenServe.ts (+24) — Boot-time validation + preheat invocation
  • packages/cli/src/serve/types.ts (+2) — ServeOptions.channelIdleTimeoutMs
  • packages/acp-bridge/src/bridge.test.ts (+195) — Idle timer, preheat, fake-timer tests
  • packages/acp-bridge/src/spawnChannel.test.ts (+28 −1) — getAcpMemoryArgs boundary tests
  • integration-tests/cli/qwen-daemon-vs-cli-benchmark.test.ts (+1852) — Full benchmark suite (gated by QWEN_BENCHMARK_ENABLED)
  • 5 test files: +1 line each (test-injected bridge skip preheat)

Test Results

Check Result Details
bridge.test.ts ✅ PASS 241/241 tests passed (includes idle timer, preheat, fake-timer tests)
spawnChannel.test.ts ✅ PASS 20/20 tests passed (includes getAcpMemoryArgs boundary + 16GB cap)
ESLint ✅ PASS 0 warnings, 0 errors (all 7 changed source files)
TypeCheck (acp-bridge) 🟡 Pre-existing 2 errors: DAEMON_TRACEPARENT_META_KEY/DAEMON_TRACESTATE_META_KEYsame on base branch
TypeCheck (cli) 🟡 +2 expected 62 errors (base has 60). +2 are bridge.preheat not on HttpAcpBridge type — same root cause as the pre-existing httpAcpBridge.ts:2449 gap (concrete type missing properties declared in bridgeTypes.ts)
Whitespace ✅ PASS git diff --check clean

TypeCheck Analysis

  • Pre-existing on base (60 CLI + 2 acp-bridge): httpAcpBridge.ts:2449 missing properties, DAEMON_*_META_KEY not exported, PostToolBatch, IMAGE_* FinishReason, ink/dom, btwCommand, insightCommand, etc.
  • +2 from this PR: runQwenServe.ts:1218preheat() is correctly declared in bridgeTypes.ts:609 but HttpAcpBridge concrete class (on base) is already out of sync (the same TS2740 error at line 2449 proves the type is stale). Will resolve when the base type catches up.

Verdict

Ready to merge. All 261 unit tests pass across both test files. Lint clean. The 2 new typecheck errors are a known consequence of HttpAcpBridge's stale type (already 7+ methods behind per the pre-existing TS2740 error on base). The implementation is well-tested with fake-timer coverage for idle lifecycle, preheat reuse/shutdown-guard assertions, and memory-args boundary conditions.


Verified by wenshao

Comment thread packages/acp-bridge/src/bridge.ts
The idle timer callback captured the arming context (e.g. closeSession
"abc123") instead of identifying the idle-timeout expiry as the cause.
@doudouOUC

Copy link
Copy Markdown
Collaborator Author

Address wenshao review round 5

Pushed f018b5cf1: idle timer callback now uses 'idle timeout' as kill-log context instead of the stale arming context (e.g. closeSession "abc123"). The immediate-kill path (timeoutMs <= 0) still correctly uses the original session context.

Resolved 1/1 thread. 0 unresolved threads remain.

@doudouOUC doudouOUC requested a review from wenshao June 4, 2026 16:32

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found. LGTM! ✅ — qwen3.7-max via Qwen Code /review

@doudouOUC doudouOUC merged commit 92b97b8 into daemon_mode_b_main Jun 5, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants