fix(serve): unbreak E2E after #4271 (capabilities + clientCount)#4306
Conversation
Two regressions introduced by #4271 (MCP guardrail push events) had been failing every main E2E run since the PR landed. Both fixes are in integration tests; no source changes. 1. `qwen serve — capabilities envelope > advertises all baseline capabilities`. `mcp_guardrail_events` was added to `SERVE_CAPABILITY_REGISTRY` and to the unit baseline list (`packages/cli/src/serve/server.test.ts:119`) but not to the integration test's hand-maintained list. Same drift class as #4268 / #4284. Fix: append the tag in registry order. 2. `MCP child amplification (P1 baseline) > clientCount matches external pgrep observation`. The test (added by #4271, never passed CI) asserted `pgrep_observed === MCP_SERVERS_CONFIGURED`, ignoring that an ACP child runs TWO `Config` objects — bootstrap (`runAcpAgent` → `config.initialize`) + per-session (`newSessionConfig` → `config.initialize`) — each with its own `McpClientManager`. After one session, pgrep observes 2×N grandchildren while `/workspace/mcp` snapshot (`buildWorkspaceMcpStatus(this.config)`) reads only the bootstrap manager (=N). Fix: encode the 2× architectural amplification literally so a future follow-up that unifies the managers fails this assertion and forces an explicit update; keep `clientCount === MCP_SERVERS_CONFIGURED` and the original `clientCount ≤ pgrep` over-report guard intact. Verified locally: both tests pass on first attempt (no retries) via `vitest run --root ./integration-tests cli/qwen-serve-routes.test.ts cli/qwen-serve-baseline.test.ts`. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
📋 Review SummaryThis PR fixes two test regressions introduced by #4271 in the integration test suite. Both changes are well-documented and address test drift rather than source code issues. The fixes are technically sound, with excellent explanatory comments that will help future maintainers understand the architectural context. 🔍 General Feedback
🎯 Specific Feedback🔵 LowFile: The architectural explanation about dual
Suggestion: Consider documenting this in a dedicated architecture doc or as a code comment in the actual File: const expectedGrandchildren = MCP_SERVERS_CONFIGURED * 2;While the // One manager at bootstrap + one manager per session
const MCP_MANAGERS_PER_SESSION = 2;
const expectedGrandchildren = MCP_SERVERS_CONFIGURED * MCP_MANAGERS_PER_SESSION;This makes the assertion self-documenting and reduces the chance someone "simplifies" the ✅ Highlights
|
There was a problem hiding this comment.
Pull request overview
Test-only PR to restore qwen serve E2E green after regressions introduced by #4271, by realigning integration expectations with the serve capability registry and by fixing the MCP subprocess accounting cross-check to reflect the current ACP child architecture.
Changes:
- Update the integration
/capabilitiesbaseline feature list to includemcp_guardrail_eventsin registry order. - Adjust the MCP
pgrep -Pcross-check to expect the current 2× configured-server-count grandchild process observation while keepingclientCount === MCP_SERVERS_CONFIGUREDand theclientCount <= pgrepguard.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| integration-tests/cli/qwen-serve-routes.test.ts | Syncs the E2E capability baseline with the newly-added mcp_guardrail_events tag. |
| integration-tests/cli/qwen-serve-baseline.test.ts | Fixes the MCP child-process cross-check to match observed 2× amplification from bootstrap + per-session managers. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // idle MCP fixtures are stdio-only so the relationship between | ||
| // `clientCount` and pgrep is exact (no amplification slack | ||
| // required at idle). | ||
| it('clientCount matches external pgrep observation', async () => { |
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — qwen-latest-series-invite-beta-v28 via Qwen Code /review
Summary
Two regressions introduced by #4271 (
feat(serve): MCP guardrail push events + hysteresis) had been failing every main E2E run since the PR landed. Both fixes are in integration tests; no source changes.qwen serve — capabilities envelope > advertises all baseline capabilities—mcp_guardrail_eventswas added toSERVE_CAPABILITY_REGISTRYand to the unit baseline list (packages/cli/src/serve/server.test.ts:119) but missed the integration test's hand-maintained list. Same drift class as fix(serve): add mcp_guardrails to E2E capabilities expectation #4268 / fix(serve): sync E2E baseline capabilities with registry #4284. Fix: append the tag in registry order.MCP child amplification (P1 baseline) > clientCount matches external pgrep observation(added by feat(serve): MCP guardrail push events + hysteresis (#4175 Wave 3 PR 14b) #4271, never passed CI) — the assertionpgrep_observed === MCP_SERVERS_CONFIGUREDignored that an ACP child runs TWOConfigobjects (bootstrap fromrunAcpAgent+ per-session fromnewSessionConfig), each with its ownMcpClientManager. After one session pgrep observes 2×N grandchildren while/workspace/mcpsnapshot reads only the bootstrap manager (=N). Fix: encode the 2× architectural amplification literally so a future proposal(serve): Mode B feature-priority roadmap toward v0.16 production-ready #4175 follow-up that unifies the managers fails this assertion and forces an explicit update; keepclientCount === MCP_SERVERS_CONFIGUREDand the originalclientCount ≤ pgrepover-report guard intact.Test plan
vitest run --root ./integration-tests cli/qwen-serve-routes.test.ts cli/qwen-serve-baseline.test.ts— both green on first attempt (no retries) locallyE2E Test (Linux) - sandbox:none/sandbox:docker/E2E Test - macOSall greenFollow-up (not in this PR)
The "double MCP discovery in ACP child" itself (bootstrap manager + per-session manager both spawning the same servers) is real wasted spawn cost in daemon mode. Worth a separate #4175 follow-up to either gate bootstrap MCP discovery behind a daemon-mode flag, or refactor
newSessionConfigto share the bootstrap'sMcpClientManager. The literalMCP_SERVERS_CONFIGURED * 2in the test is a tripwire for that future change.🤖 Generated with Qwen Code