fix(test): unbreak qwen serve integration suites after the daemon batch merge#5041
Conversation
…erge Three integration tests have failed every nightly Release and E2E run since the daemon-mode feature batch (#4490) merged, because these suites only run post-merge: - routes: resync the capabilities envelope baseline with the features the batch added (verified against a live daemon), and strip the env toggles that flip conditional tags so the exact-equality assertion is hermetic on dev machines. - baseline: the 2xN MCP grandchildren tripwire fired as designed — the workspace MCP pool eliminated the bootstrap/session duplicate discovery. Assert exactly N pooled children and cross-check the pool's per-server accounting against pgrep. - streaming: the permission test could finish with its turn still blocked on a second permission request nobody would ever answer; the abandoned request wedges the shared session's prompt FIFO and the downstream Last-Event-ID resume test times out waiting for a turn_complete that never comes (reproduced empirically). Pin the session to default approval mode (hermetic vs host user settings) and cancel the possibly-in-flight turn before finishing. The daemon-side wedge (abandoned permission request blocks the FIFO until an explicit cancel) is real beyond tests and tracked separately.
E2E verification report (local, macOS)All runs against the bundled CLI (
The wedge reproduction matches the nightly signature exactly: resume test timing out 60s × 3 retries on both Release jobs and macOS E2E while everything passes in isolation. |
|
Thanks for the PR! Template looks good ✓ On direction: This is a pure test fix unblocking the nightly release pipeline — blocked since 2026-06-12. Three integration test suites ( On approach: The scope is tight and well-motivated. Three files, three distinct root causes, each addressed at the source:
The PR description is unusually thorough — it explains not just what changed but why each assertion was wrong, and the before/after evidence table is clear. No scope creep, no speculative additions. Moving on to code review. 🔍 中文说明感谢贡献! 模板完整 ✓ 方向:纯测试修复,解除自 2026-06-12 起被阻塞的 nightly 发布流水线。三个 方案:范围紧凑,三个文件对应三个独立根因,均从源头修复:
PR 描述非常详尽——不仅说明了改了什么,还解释了每个断言为什么是错的。没有范围蔓延。 进入代码审查 🔍 — Qwen Code · qwen3.7-max |
wenshao
left a comment
There was a problem hiding this comment.
No review findings. Downgraded from Approve to Comment: CI still running. — qwen3.7-max via Qwen Code /review
Code ReviewThe diff is clean — three files, each addressing a distinct root cause, with architectural comments that explain why the assertions changed (not just what). No correctness bugs, no AGENTS.md violations, no scope creep. A few observations:
TestingRan the non-model-dependent suites (routes + baseline) in tmux before and after applying the PR. The streaming suite requires a model credential ( Before (main — installed build)After (this PR)Result: Both previously-failing tests now pass. The capabilities envelope correctly expects 60 features (up from 39), and the MCP pool accounting correctly expects N grandchildren (down from 2×N). Duration also dropped from 103s to 69s — likely because the MCP pool test no longer waits for 4 grandchildren that never appear. Not tested: — Qwen Code · qwen3.7-max |
ReflectionThis PR does exactly what it says — unblocks three integration suites that have been red on every nightly and E2E run since the daemon batch merged. The scope is tight (3 files, +113/-42), the root causes are real and well-explained, and the before/after is unambiguous: 2 failures → 0 failures in the testable suites. The MCP pool accounting fix is the most interesting of the three — the tripwire assertion fired exactly as its comment predicted it would ("a future refactor that unifies bootstrap + session managers fails this assertion"), and this PR is the deliberate update it called for. The new The streaming suite fix (pinning approval mode + cancelling the in-flight turn) is logically sound from code review — the wedge is real (an abandoned permission request blocking the session FIFO forever), and the fix is minimal. I couldn't verify it end-to-end without a model credential, but the analysis in the PR description is convincing. One reservation: these suites don't run on PR CI, which is how this drift went unnoticed for a full release cycle. The PR author flags this explicitly ("the deeper gap is that these suites don't run on PR CI") — worth tracking separately, but not a blocker for this fix. Looks good, ready to ship. ✅ 中文说明总结本 PR 完成了它所承诺的工作——解除自 daemon 批次合入以来每次 nightly 和 E2E 运行都失败的三个集成测试套件。范围紧凑(3 个文件,+113/-42),根因真实且有充分解释,前后对比明确:可测试套件从 2 个失败变为 0 个失败。 MCP 池计数修复是三个修复中最值得关注的——绊线断言如其注释所预言的那样触发了("统一 bootstrap 与 session manager 的重构会使此断言失败"),而本 PR 正是它所期望的有意识更新。新增的 streaming 套件的修复(固定审批模式 + 取消执行中的回合)从代码审查看逻辑正确——卡死问题是真实的(被遗弃的权限请求永久阻塞 session FIFO),修复也很精简。由于缺少模型凭证未能端到端验证,但 PR 描述中的分析是有说服力的。 一个保留意见: 这些套件不在 PR CI 中运行,这正是漂移在一个完整发布周期内未被发现的原因。PR 作者已明确指出这一点——值得单独跟踪,但不阻塞本次修复。 可以合入 ✅ — Qwen Code · qwen3.7-max |
qwen-code-ci-bot
left a comment
There was a problem hiding this comment.
LGTM, looks ready to ship. ✅
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
Local runtime verification on Linux — PASS ✅Since this PR changes only test files, the daemon binary is main's — so beyond running the suites, I verified each fact the new assertions encode against live daemons directly (spawned in tmux from a fresh Setup: Linux, fresh Fact 1 — capabilities envelopeLive daemon (env toggles stripped, no 🔍 Env-strip probe: relaunching with Fact 2 — MCP pool process accountingWorkspace with the suite's two All three new assertions hold against the OS: exactly N grandchildren, Fact 3 — the permission-FIFO wedge (reproduced on the live daemon)Followed the PR's repro recipe against a credentialed daemon, via the SDK: The abandoned second permission request wedges the session's prompt FIFO exactly as described, and a single explicit cancel un-wedges it — the queued prompt then runs to Suite red/green
Findings (non-blocking)
中文版(点击展开)Linux 本地真实运行验证 — 通过 ✅由于本 PR 只改测试文件,守护进程二进制与 main 相同——因此除了跑测试套件,我还把新断言所编码的每一项事实直接对真实守护进程做了独立验证(tmux 中以本分支全新 环境: Linux,全新 事实 1 — capabilities 信封真实守护进程(剥离环境开关、无 🔍 环境剥离探测:设置 事实 2 — MCP 池进程账目工作区配置套件同款的两个 三条新断言对操作系统全部成立:恰好 N 个孙进程、 事实 3 — 权限 FIFO 卡死(在真实守护进程上复现)按 PR 给出的复现步骤,经 SDK 驱动带凭据守护进程: 被遗弃的第二个权限请求恰如描述地卡死了会话的 prompt FIFO,一次显式 cancel 即解除——排队的 prompt 随后约 1.2 秒跑到 套件红绿对照
观察(不阻塞合并)
|
What this PR does
Fixes the three `qwen serve` integration test failures that have failed every nightly Release run and every E2E run on main since the daemon-mode feature batch (#4490) merged:
Why it's needed
The nightly release publish has been blocked since 2026-06-12 (run 27386201557), and every push to main burns a failing E2E run. These suites only run in the Release and E2E workflows — not in required PR checks — so the batch merged green while the drift went unnoticed.
Reviewer Test Plan
How to verify
Evidence (Before & After)
Tested on
Environment (optional)
Local bundle (`npm run build && npm run bundle`), no sandbox; streaming suite verified end-to-end with a real OpenAI-compatible model.
Risk & Scope
Linked Issues
中文说明
本 PR 做了什么
修复自 daemon 功能批次(#4490)合入 main 后,每晚 Release 与每次 E2E 都失败的三个 `qwen serve` 集成测试:
为什么需要
自 2026-06-12 起(run 27386201557)每晚发布被阻塞,且每次推送 main 都浪费一次失败的 E2E。这些套件只在 Release 与 E2E 工作流中运行,不在 PR 必检项中,因此批次合入时是绿的,漂移未被发现。
评审验证方式
风险与范围