fix(serve): align integration test mirrors with merged capability + EventBus changes#4245
Conversation
…ventBus changes - qwen-serve-routes.test.ts: expand expected features list to 24, adding slow_client_warning (#4237) and workspace_mcp/workspace_skills/ workspace_providers/session_context/session_supported_commands (#4241). Matches EXPECTED_STAGE1_FEATURES in server.test.ts:76-101. - qwen-serve-baseline.test.ts: update SSE backpressure assertion from 3 to 4 frames (tick, tick, slow_client_warning, client_evicted). PR #4237 changed EventBus to force-push a slow_client_warning synthetic frame when the per-subscriber queue reaches the 75% warn threshold, before the client_evicted terminal frame fires on overflow. Mirrors the unit test at eventBus.test.ts:103-122. Both integration mirrors drifted because integration tests only run on schedule / workflow_dispatch (release.yml:4-9), not PR CI. Fixes the release run 25992130532 failure in both Docker and No-Sandbox jobs. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
📋 Review SummaryThis PR fixes two integration test failures caused by test data drift after recent merges (#4237 and #4241). The changes correctly align the integration test mirrors with the canonical source files. Both fixes are minimal, targeted, and follow the established patterns in the codebase. 🔍 General Feedback
🎯 Specific Feedback🔵 Low
✅ Highlights
|
There was a problem hiding this comment.
Pull request overview
This PR updates integration-test mirrors to match recent serve capability and EventBus behavior changes, fixing stale nightly/release integration expectations without changing product code.
Changes:
- Adds the newly advertised serve feature tags to the integration
/capabilitiesexpected list. - Updates the SSE backpressure baseline assertion to expect
slow_client_warningbeforeclient_evicted.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
integration-tests/cli/qwen-serve-routes.test.ts |
Aligns expected capabilities with the canonical serve feature registry. |
integration-tests/cli/qwen-serve-baseline.test.ts |
Aligns backpressure expectations with EventBus slow-client warning behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅
Both integration-test mirrors correctly synchronized with their canonical sources:
qwen-serve-routes.test.ts: 6 missing capabilities added, order matchesEXPECTED_STAGE1_FEATURES.qwen-serve-baseline.test.ts: SSE backpressure assertion updated from 3→4 frames, accounting for theslow_client_warningsynthetic frame.
All 9 review dimensions (correctness, security, code quality, performance, test coverage, attacker/oncall/maintainer audit, build & test) found no issues. Deterministic analysis clean (eslint, tsc). Integration tests pass (26/26 + 6/6).
— glm-5.1 via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — gpt-5.5 via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — DeepSeek/deepseek-v4-pro via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — claude-opus-4-7 via Qwen Code /review
Resolves conflicts after main advanced through PR 13 (#4251 preflight + env diagnostics) and #4245 (capability registry test alignment). Two intentional integrations: 1. **`SERVE_ERROR_KINDS` taxonomy join.** PR 13 introduced a closed enumeration of structured error kinds (`SERVE_ERROR_KINDS` + sibling `DAEMON_ERROR_KINDS`) replacing PR 14's free-form `errorKind?: string`. Added `'budget_exhausted'` to both lists so the per-server / workspace cell semantics from PR 14 stay typed within the closed set. The `acpAgent.buildBudgetCells` local `errorKind` variable is retyped from `string | undefined` to `ServeErrorKind | undefined`. `status.test.ts` "exposes the seven roadmap-defined error kinds" pin updated to "eight" with `'budget_exhausted'` appended. 2. **`acpAgent.ts` import + errorCell collision.** PR 13 added `mapDomainErrorToErrorKind` + `errorCell(kind, error, errorKind?)` 3-arg signature; PR 14 added `buildBudgetCells` next to it. Both blocks kept verbatim with PR 13's wider `errorCell` signature taking precedence. Imports merged. No behavior change in any pre-existing PR 14 fix; the merge is purely structural alignment with main's PR 13 closed-taxonomy. 693/693 focused tests pass; typecheck + lint clean across 4 workspaces.
Summary
Fixes release run 25992130532, which failed in both Docker and No-Sandbox integration jobs. Two integration-test mirrors drifted from canonical source after recent PRs merged. Both failures are stale test data, not product-code bugs.
Integration tests only run on
schedule(nightly) /workflow_dispatch(release) per.github/workflows/release.yml:4-9, so PR CI didn't catch the drift at merge time of #4237 and #4241.Changes
integration-tests/cli/qwen-serve-routes.test.tsExpand the expected features list from 18 → 24 entries, matching
EXPECTED_STAGE1_FEATURESinpackages/cli/src/serve/server.test.ts:76-101(the verified mirror, kept up-to-date).Six features were missing:
slow_client_warning— added by feat(serve): SSE replay sizing + slow_client_warning backpressure (#4175 Wave 2.5 PR 10) #4237workspace_mcp,workspace_skills,workspace_providers,session_context,session_supported_commands— added by feat(serve): add read-only status routes #4241require_authcorrectly remains absent (conditionally advertised only when--require-authis on).integration-tests/cli/qwen-serve-baseline.test.tsUpdate the SSE backpressure assertion from 3 to 4 frames. PR #4237 changed
EventBusto force-push aslow_client_warningsynthetic frame when the per-subscriber queue reaches the 75% warn threshold, before theclient_evictedterminal frame fires on overflow. New expected order:Mirrors the unit-test assertion at
packages/cli/src/serve/eventBus.test.ts:103-122. Comment is also updated to explain the warn-threshold math (withmaxQueued: 2,warnThreshold = 0.75 × 2 = 1.5; event 2 bringsliveSizeto 2 ≥ 1.5 → warn).What is intentionally NOT changed
importfrompackages/cli). The integration test value is acting as an external wire-shape contract against a spawned daemon — importing the constant would let registry typos silently propagate to the test.snapshot.sseBackpressure.ringSize: 4_000inbaseline.test.ts:495is stale (DEFAULT_RING_SIZEis8000ineventBus.ts:76) but doesn't drive any assertion — onlyperf-baseline.mdrendering. Out of scope for this minimal fix._daemon-harness.ts:502only keys onclient_evictedas terminal and tolerates prior frames via thereceived++loop — no harness change needed.Test plan
npx vitest run packages/cli/src/serve/server.test.ts packages/cli/src/serve/eventBus.test.ts— 165/165 passnpm run build && npm run bundlenpx vitest run --root ./integration-tests cli/qwen-serve-routes.test.ts— 26/26 passQWEN_BASELINE_SKIP_PROMPT_LATENCY=1 npx vitest run --root ./integration-tests cli/qwen-serve-baseline.test.ts— 6/6 pass (1 expected skip for prompt-latency block)gh workflow run release.yml -f dry_run=trueFollow-ups (not in this PR)
snapshot.sseBackpressure.ringSize: 4_000inbaseline.test.tsto8000and extendSnapshotShape.sseBackpressurewith theslow_client_warningbehavior bounds (WARN_THRESHOLD_RATIO,WARN_RESET_RATIO, once-per-episode), soperf-baseline.mdrecords them.🤖 Generated with Qwen Code