Commit 0788ed7
* test(perf): add daemon baseline harness (#4175 Wave 1 PR 1)
First implementation PR of the Mode B v0.16 rollout (issue #4175 Wave 1
PR 1). Captures reference performance metrics for the `qwen serve`
daemon so subsequent Mode B PRs (M2 MCP shared pool, M3 architecture
refactor, M4 multi-client safety) can be measured against a known
baseline rather than guessed-at numbers.
## What it captures
The new `integration-tests/cli/qwen-serve-baseline.test.ts` runs five
describe blocks against a real `qwen serve` daemon:
- RSS scaling across 1 / 5 / 10 same-workspace `createOrAttachSession`
calls (sampled via `ps -o rss=`).
- Same-workspace attach latency for the 2nd and 5th attach.
- MCP child amplification with two configured idle-mcp servers,
measured via two-level `pgrep -P` walk (daemon → ACP child → MCP
grandchildren).
- SSE backpressure invariants exercised at the unit layer by
instantiating `EventBus` directly: queue overflow → synthetic
`client_evicted` frame; replay across reconnect honors
`lastEventId` up to ring size.
- Prompt p50 / p99 (skipped when `QWEN_TEST_MODEL_KEY` is unset, with
an explicit reason recorded in the snapshot).
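The two SSE invariants in the list above are easiest to see on a toy model. The sketch below is not the real `EventBus`. `ToyEventBus`, its constructor parameters, the `Frame` shape and the `drain` method are all hypothetical, and it assumes one bounded queue per subscriber plus one shared replay ring.

```typescript
// Hypothetical frame shape; the real EventBus frames may differ.
interface Frame {
  id: number;
  type: string;
  data?: unknown;
}

class ToyEventBus {
  private readonly ringSize: number;
  private readonly maxQueue: number;
  private nextId = 1;
  private readonly ring: Frame[] = []; // last `ringSize` published frames, oldest first
  private readonly queues = new Map<string, Frame[]>(); // pending frames per subscriber
  private readonly evicted = new Set<string>();

  constructor(ringSize: number, maxQueue: number) {
    this.ringSize = ringSize;
    this.maxQueue = maxQueue;
  }

  // (Re)subscribe. With lastEventId, replay only newer frames still held in the ring;
  // anything older than the ring's oldest frame is gone for good.
  subscribe(clientId: string, lastEventId?: number): void {
    const replay =
      lastEventId === undefined ? [] : this.ring.filter((f) => f.id > lastEventId);
    this.queues.set(clientId, replay);
    this.evicted.delete(clientId);
  }

  publish(type: string, data?: unknown): void {
    const frame: Frame = { id: this.nextId++, type, data };
    this.ring.push(frame);
    if (this.ring.length > this.ringSize) this.ring.shift();
    for (const [clientId, queue] of this.queues) {
      if (this.evicted.has(clientId)) continue;
      if (queue.length >= this.maxQueue) {
        // Overflow: drop this frame for the slow client and end its stream with a synthetic frame.
        queue.push({ id: frame.id, type: 'client_evicted' });
        this.evicted.add(clientId);
      } else {
        queue.push(frame);
      }
    }
  }

  // Hand over everything queued for a client (stands in for the SSE writer flushing the socket).
  drain(clientId: string): Frame[] {
    const queue = this.queues.get(clientId) ?? [];
    this.queues.set(clientId, []);
    return queue;
  }
}
```

With `maxQueue = 2`, a subscriber that never drains receives two frames and then `client_evicted`; reconnecting with a `lastEventId` older than the ring's oldest frame only recovers what the ring still holds.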
Each run writes a structured JSON snapshot to
`<INTEGRATION_TEST_FILE_DIR>/perf-baseline.json` plus a Markdown
summary, with `gitCommit` / platform / config preserved for cross-PR
correlation.
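One plausible shape for that snapshot and its Markdown summary is sketched below. Only `gitCommit`, `platform`, `config` and `notes` come from the description above; `metrics`, `skipReasons`, `renderMarkdownSummary`, `writeSnapshot` and the `perf-baseline.md` file name are assumptions for illustration.

```typescript
import { mkdirSync, writeFileSync } from 'node:fs';
import { join } from 'node:path';

// Only gitCommit, platform, config and notes are named in the PR; other fields are hypothetical.
interface BaselineSnapshot {
  gitCommit: string;
  platform: string;
  config: Record<string, string | number | boolean>;
  metrics: Record<string, number | null>; // null = not measured in this run
  skipReasons: Record<string, string>; // metric name -> why it was skipped
  notes: string[];
}

function renderMarkdownSummary(s: BaselineSnapshot): string {
  const lines = [
    `# qwen serve baseline @ ${s.gitCommit}`,
    '',
    `Platform: ${s.platform}`,
    '',
    '| Metric | Value |',
    '|---|---|',
  ];
  for (const [name, value] of Object.entries(s.metrics)) {
    const cell =
      value === null ? `skipped (${s.skipReasons[name] ?? 'no reason recorded'})` : String(value);
    lines.push(`| ${name} | ${cell} |`);
  }
  if (s.notes.length > 0) lines.push('', '## Notes', ...s.notes.map((n) => `- ${n}`));
  return lines.join('\n');
}

// Writes the JSON snapshot and its Markdown summary side by side (the .md name is assumed).
function writeSnapshot(dir: string, s: BaselineSnapshot): void {
  mkdirSync(dir, { recursive: true });
  writeFileSync(join(dir, 'perf-baseline.json'), `${JSON.stringify(s, null, 2)}\n`);
  writeFileSync(join(dir, 'perf-baseline.md'), `${renderMarkdownSummary(s)}\n`);
}
```

Recording a skip reason next to a `null` metric keeps a skipped measurement distinguishable from a missing one when two snapshots are compared.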
## Honest documentation of current limits
The captured snapshot includes a `notes` field flagging that with the
default `sessionScope: 'single'`, N successive
`createOrAttachSession` calls return the same sessionId — so the RSS
and MCP metrics here measure "N attaches to one shared session", not
"N distinct sessions". Once Wave 2 PR 5 lands a per-request
`sessionScope: 'thread'` override, the harness will be updated to
optionally force distinct sessions and surface the P1 MCP N×M
amplification before M2 fixes it.
## Reused / new
Reused: existing daemon spawn pattern from `qwen-serve-routes.test.ts`
(port-0 + stdout regex + SIGTERM teardown), `pgrep -P` pattern from
`qwen-serve-streaming.test.ts:144`, `EventBus` invariants from
`eventBus.test.ts`, `DaemonClient` SDK, integration-tests
`globalSetup.ts` env var conventions.
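The two-level `pgrep -P` walk (daemon → ACP child → MCP grandchildren) can be sketched as a bounded breadth-first traversal. `pgrepChildren` and this `countDescendants` signature are hypothetical, not the helper exported from `_daemon-harness.ts`; the child lister is injectable so the walk can be checked without spawning real processes.

```typescript
import { execFileSync } from 'node:child_process';

// Direct children of `pid` via `pgrep -P <pid>`. pgrep exits with status 1 when nothing
// matches and execFileSync throws on any non-zero exit, so a failure is read as "no children".
// Note this also hides a missing pgrep binary.
function pgrepChildren(pid: number): number[] {
  try {
    const out = execFileSync('pgrep', ['-P', String(pid)], { encoding: 'utf8' });
    return out
      .split('\n')
      .map((line) => line.trim())
      .filter((line) => line.length > 0)
      .map(Number);
  } catch {
    return [];
  }
}

// Breadth-first walk limited to `depth` levels; depth 2 = ACP children + MCP grandchildren.
function countDescendants(
  rootPid: number,
  depth = 2,
  listChildren: (pid: number) => number[] = pgrepChildren,
): { perLevel: number[]; total: number } {
  const perLevel: number[] = [];
  let frontier = [rootPid];
  for (let level = 0; level < depth && frontier.length > 0; level++) {
    frontier = frontier.flatMap((pid) => listChildren(pid));
    perLevel.push(frontier.length);
  }
  return { perLevel, total: perLevel.reduce((sum, n) => sum + n, 0) };
}
```

Walking level by level rather than recursing keeps the per-level counts separate, so ACP children and MCP grandchildren can be reported independently.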
New (this PR):
- `integration-tests/cli/_daemon-harness.ts` (~280 lines) — extracts
the inline daemon spawn pattern into a shared helper plus adds
`getRssMB`, `startRssPolling`, `countDescendants`, `percentiles`,
`consumeSseEvents`, `writeWorkspaceSettings`. Future serve test
files can import instead of inlining.
- `integration-tests/fixtures/idle-mcp/{server.mjs,package.json}` — a
minimal stdio MCP fixture that responds to `initialize` /
`tools/list` and then idles. This lets the harness count real MCP children
via `pgrep` without fetching an MCP server package from npm in CI.
- `integration-tests/baselines/baseline-stage-1.json` — the first
captured baseline at this commit. Future Mode B PRs can diff their
run against this file; updating it is a deliberate one-line change
in a follow-up PR.
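A minimal responder in the spirit of the `idle-mcp` fixture might look like the sketch below. The real fixture is `server.mjs`; this TypeScript version, `handleMessage`, `serve`, the `ping` handling and the fallback protocol version `2024-11-05` are assumptions. It relies on the MCP stdio framing of one JSON-RPC message per line.

```typescript
import { createInterface } from 'node:readline';
import { stdin, stdout } from 'node:process';

type JsonRpcRequest = {
  jsonrpc: '2.0';
  id?: number | string;
  method: string;
  params?: Record<string, unknown>;
};
type JsonRpcResponse = {
  jsonrpc: '2.0';
  id: number | string;
  result?: unknown;
  error?: { code: number; message: string };
};

function handleMessage(msg: JsonRpcRequest): JsonRpcResponse | null {
  // Notifications (no id), e.g. notifications/initialized, get no reply.
  if (msg.id === undefined) return null;
  switch (msg.method) {
    case 'initialize': {
      // Echo the client's requested version; fall back to an assumed default.
      const requested = msg.params?.protocolVersion;
      return {
        jsonrpc: '2.0',
        id: msg.id,
        result: {
          protocolVersion: typeof requested === 'string' ? requested : '2024-11-05',
          capabilities: { tools: {} },
          serverInfo: { name: 'idle-mcp', version: '0.0.0' },
        },
      };
    }
    case 'tools/list':
      return { jsonrpc: '2.0', id: msg.id, result: { tools: [] } };
    case 'ping':
      return { jsonrpc: '2.0', id: msg.id, result: {} };
    default:
      return { jsonrpc: '2.0', id: msg.id, error: { code: -32601, message: `Method not found: ${msg.method}` } };
  }
}

// Reads newline-delimited JSON-RPC from stdin; the open stdin keeps the process alive (idle).
// A real fixture would call serve() at startup.
function serve(): void {
  const rl = createInterface({ input: stdin });
  rl.on('line', (line) => {
    let msg: JsonRpcRequest;
    try {
      msg = JSON.parse(line);
    } catch {
      return; // ignore malformed input
    }
    const res = handleMessage(msg);
    if (res) stdout.write(`${JSON.stringify(res)}\n`);
  });
}
```

Keeping the message handling in a pure function means the fixture's protocol behavior can be unit-checked without spawning it.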
## Reference patterns from opencode
JSDoc on the main test file documents the shape borrowed from
`opencode/test/memory/abort-leak.test.ts` (forced-GC heap-growth),
`opencode/src/cli/heap.ts` (RSS poll + threshold-triggered
`writeHeapSnapshot`, useful for Wave 6 production tooling), and
`opencode/src/util/cpu-watchdog.ts` (event-loop lag drift sampling).
The harness here is daemon-level multi-session — a shape neither
opencode nor qwen-code had before.
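The RSS-poll-plus-heap-snapshot idea attributed to `opencode/src/cli/heap.ts` above can be sketched with Node built-ins (`process.memoryUsage()` and `v8.writeHeapSnapshot()`). This is not opencode's code: `RssThresholdWatcher`, `watchOwnRss` and their parameters are hypothetical, and the sketch samples its own process rather than the daemon.

```typescript
import { writeHeapSnapshot } from 'node:v8';

// Tracks RSS samples and reports when a threshold is first crossed.
class RssThresholdWatcher {
  private readonly thresholdMB: number;
  private fired = false;
  peakMB = 0;

  constructor(thresholdMB: number) {
    this.thresholdMB = thresholdMB;
  }

  // Returns true exactly once: on the first sample at or above the threshold.
  observe(rssMB: number): boolean {
    this.peakMB = Math.max(this.peakMB, rssMB);
    if (this.fired || rssMB < this.thresholdMB) return false;
    this.fired = true;
    return true;
  }
}

// Polls this process's own RSS and writes one heap snapshot when the threshold is crossed.
function watchOwnRss(thresholdMB: number, intervalMs = 1000): { stop: () => number } {
  const watcher = new RssThresholdWatcher(thresholdMB);
  const timer = setInterval(() => {
    const rssMB = process.memoryUsage().rss / (1024 * 1024);
    if (watcher.observe(rssMB)) {
      // writeHeapSnapshot() blocks the event loop while it serializes the heap.
      const file = writeHeapSnapshot();
      console.warn(`RSS ${rssMB.toFixed(1)} MB >= ${thresholdMB} MB; heap snapshot: ${file}`);
    }
  }, intervalMs);
  timer.unref(); // sampling alone should not keep the process alive
  return {
    stop: () => {
      clearInterval(timer);
      return watcher.peakMB;
    },
  };
}
```

Firing only once matters because each snapshot pauses the process and can be large on disk.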
## Engineering principles checklist
- [x] Independently mergeable (test-only; no production code touched)
- [x] Backward compatible (no removed routes / event fields / CLI behavior)
- [x] Default off (PR CI does not run integration tests; the baseline
runs in release CI, nightly, or on demand)
- [x] `qwen serve` Stage 1 routes / SDK behavior preserved (no production
code changed)
- [x] Gradual migration (no client adapter migration in this PR)
- [x] Reversible (revert = delete files, no other side effects)
- [x] Tests-first (this IS the test PR; harness exercises real daemon
end-to-end; Windows skipped via existing `process.platform === 'win32'`
precedent)
## Test plan
- [x] `KEEP_OUTPUT=true TEST_CLI_PATH=$(pwd)/packages/cli/dist/index.js
QWEN_BASELINE_SKIP_PROMPT_LATENCY=1 QWEN_BASELINE_RSS_SAMPLE_DURATION_MS=2000
npx vitest run integration-tests/cli/qwen-serve-baseline.test.ts`
— 6 passed / 1 skipped (prompt latency requires model key)
- [x] `npx tsc --noEmit -p integration-tests/tsconfig.json` — only the
pre-existing tsconfig `paths` glob warning remains; no new errors
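The environment switches in the test plan, together with the `QWEN_TEST_MODEL_KEY` gate on prompt latency, could be folded into one config object as below. `readBaselineConfig`, its return shape and the 10 000 ms default are assumptions; only the variable names come from this PR, and treating `QWEN_BASELINE_SKIP_PROMPT_LATENCY=1` as a forced skip is inferred from its name.

```typescript
interface BaselineConfig {
  rssSampleDurationMs: number;
  promptLatency: { run: true } | { run: false; reason: string };
}

// Hypothetical default; the harness's real default is not stated in the PR.
const DEFAULT_RSS_SAMPLE_DURATION_MS = 10_000;

function readBaselineConfig(env: Record<string, string | undefined>): BaselineConfig {
  // Accept only a positive number; anything else falls back to the default.
  const raw = env.QWEN_BASELINE_RSS_SAMPLE_DURATION_MS;
  const parsed = raw === undefined ? NaN : Number(raw);
  const rssSampleDurationMs =
    Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_RSS_SAMPLE_DURATION_MS;

  // Record why prompt latency is skipped so the snapshot can carry the reason.
  let promptLatency: BaselineConfig['promptLatency'];
  if (env.QWEN_BASELINE_SKIP_PROMPT_LATENCY === '1') {
    promptLatency = { run: false, reason: 'QWEN_BASELINE_SKIP_PROMPT_LATENCY=1' };
  } else if (!env.QWEN_TEST_MODEL_KEY) {
    promptLatency = { run: false, reason: 'QWEN_TEST_MODEL_KEY is unset' };
  } else {
    promptLatency = { run: true };
  }
  return { rssSampleDurationMs, promptLatency };
}
```

A test file could call `readBaselineConfig(process.env)` once, pass `!cfg.promptLatency.run` to Vitest's `it.skipIf`, and copy `reason` into the snapshot.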
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
* fix: import exit from node:process in idle-mcp fixture
Fixes the ESLint `no-undef` error ('process' is not defined) by replacing
`process.exit(0)` with `exit(0)` imported from `node:process`.
* fix(test): remove stale baseline lint disable
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
* fix(test): harden daemon baseline harness
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
---------
Co-authored-by: Shaojin Wen <shaojin.wensj@alibaba-inc.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
1 parent 54fd5c5 commit 0788ed7
5 files changed
Lines changed: 1343 additions & 0 deletions