test(integration): drop tight 30s timeout in sleep-interception e2e tests#4878
Conversation
The four sleep-interception integration tests hardcoded a 30s per-test timeout. That is shorter than the test rig's own per-operation timeout in CI (60s), and each test performs more than one such operation, so under Docker sandbox load with the CI model the tests intermittently exceed 30s and fail (most recently 'should allow sleep < 2s' timed out on all retries, failing the release Docker integration job). Drop the override so they use the suite's default timeout, consistent with the other integration tests.
|
Thanks for the PR! Template looks good ✓ On direction: clearly aligned — flaky CI blocks releases and wastes contributor time. Fixing test timeouts that are too tight for the Docker sandbox environment is straightforward maintenance. On approach: scope is minimal and correct. Four lines removed, no logic changes. The 30s override was genuinely shorter than the suite default (5 min), so dropping it is the right fix rather than bumping to another arbitrary number. Moving on to code review. 🔍 中文说明感谢贡献! 模板完整 ✓ 方向:明确对齐——不稳定的 CI 会阻塞发布并浪费时间。修复 Docker 沙箱环境中过紧的测试超时是合理的维护。 方案:范围最小且正确。删除四行,无逻辑变更。30s 的覆盖值确实短于套件默认值(5 分钟),直接移除比改为另一个任意值更合理。 进入代码审查 🔍 — Qwen Code · qwen3.7-max |
Code ReviewIndependent proposal before reading the diff: remove the four No correctness bugs, no security concerns, no regressions. The change follows project conventions. Clean. Test ResultsRan the full sleep-interception suite twice — once on Key observation: even on a healthy run, individual tests take 8–19s. Under Docker sandbox load with retries, hitting 30s is plausible — the PR's motivation checks out. Before (main, 30s timeout)After (this PR, default 5 min timeout)Note: test 4 hit 18.7s — demonstrating that even without Docker sandbox overhead, individual tests can consume a significant chunk of the old 30s budget. 中文说明代码审查独立方案:移除 无正确性缺陷、无安全隐患、无回归。符合项目规范。干净。 测试结果在 关键发现:即使在健康运行中,单个测试也需 8–19s。在 Docker 沙箱负载下达到 30s 是合理的——PR 的动机成立。 — Qwen Code · qwen3.7-max |
|
This is about as clean as a fix gets: four lines removed, zero risk, solves a real CI flake that's been blocking releases. The independent proposal matches the PR exactly — there's no simpler path because this is the simplest path. Test results confirm the motivation: individual tests routinely take 10–19s even without Docker sandbox overhead, so a 30s hard cap was genuinely too tight. Dropping to the 5-minute suite default gives ample headroom without being reckless. No concerns. Approving. 中文说明这是一个非常干净的修复:删除四行,零风险,解决了阻塞发布的 CI 不稳定问题。独立方案与 PR 完全一致——没有更简单的路径,因为这本身就是最简单的路径。 测试结果证实了动机:即使没有 Docker 沙箱开销,单个测试也需要 10–19s,30s 硬上限确实过紧。改为 5 分钟套件默认值提供了充足的余量。 无顾虑。批准。 — Qwen Code · qwen3.7-max |
qwen-code-ci-bot
left a comment
There was a problem hiding this comment.
LGTM, looks ready to ship. ✅
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
What this PR does
Removes the hardcoded 30-second per-test timeout from the four sleep-interception integration tests so they use the suite's default timeout (the same as the rest of the integration tests).
Why it's needed
These tests drive a real model and execute a shell command inside the Docker sandbox. They hardcoded a 30s timeout — which is actually shorter than the test rig's own per-operation timeout in CI (60s), and each test performs more than one such operation. Under Docker-sandbox load with the CI model they intermittently exceed 30s and fail the release pipeline's Docker integration job. The most recent failure timed out on "should allow sleep < 2s" across all retries: https://github.com/QwenLM/qwen-code/actions/runs/27178705912/job/80233278943
The interception behavior these tests cover is environment-independent and already passes in the non-sandbox job; the failures are purely a too-tight timeout budget, not a real defect. Dropping the override lets them use the same generous default as every other integration test, which removes the flakiness without losing any coverage.
Reviewer Test Plan
How to verify
Run the sleep-interception integration tests under the Docker sandbox. They complete in roughly 12–20s normally and no longer fail spuriously when the runner or model is slow. Behavior and assertions are unchanged — only the timeout budget.
Evidence (Before & After)
Before: the release Docker integration job failed with "Test timed out in 30000ms" on the sleep-interception suite (linked above). After: the tests use the default timeout and complete normally. Reproduced locally in the Docker sandbox passing 3/3 — twice in isolation and once under induced parallel contention — at ~12s each. Non-UI change, so no screenshots.
Tested on
Environment (optional)
Local Docker sandbox.
Risk & Scope
Linked Issues
Failed CI run that motivated this: https://github.com/QwenLM/qwen-code/actions/runs/27178705912/job/80233278943