test: de-flake two windows-latest timing tests by esengine · Pull Request #3726 · esengine/DeepSeek-Reasonix

esengine · 2026-06-09T15:44:11Z

Summary

windows-latest has two distinct timing-sensitive tests that flake under CI load and intermittently block PRs (both surfaced while running CI on an unrelated branch). This de-flakes both at the root — test-only changes, no product behaviour touched.

1. `internal/plugin` — `TestStartPhaseAReturnsBeforePhaseB`

Asserted StartAvailable returns in <150ms while the helper stalls prompts/list by 200ms — a wall-clock proxy for "phase A doesn't block on the prompt fetch". But StartAvailable spawns a subprocess and runs the MCP initialize handshake, which alone can approach/exceed 150ms on a slow runner (observed 196ms). Dropped the timing assertion; kept the deterministic deferral checks that already encode the contract (prompts empty after phase A, present only after StartPhaseB drains them).

2. `internal/control` — `TestRunShell_CancelStopsCommand`

Waited 5s for TurnDone after Cancel, but cmd.Wait honours shellWaitDelay (also 5s) — and on Windows cmd.Cancel spawns taskkill /F /T — so when the kill rides the full grace, TurnDone arrives at ~shellWaitDelay and the 5s budget loses a dead heat with it (observed 5.18s). Now waits shellWaitDelay + 10s via a new waitForDoneWithin helper that waitForDone delegates to.

Test plan

go test ./internal/plugin/ -run TestStartPhaseAReturnsBeforePhaseB -count=50 and -race -count=10 → pass
go test ./internal/control/ -run TestRunShell_CancelStopsCommand -count=20 → pass
go test ./internal/plugin/ ./internal/control/ → pass
gofmt / go vet clean

The test asserted StartAvailable returns in <150ms while the helper stalls prompts/list by 200ms — a wall-clock proxy for "phase A doesn't block on the prompt fetch". But StartAvailable spawns a subprocess and runs the MCP initialize handshake, and that cost alone can approach or exceed 150ms on a slow CI runner (observed 196ms on windows-latest), so the threshold flaked with no real regression. Drop the timing assertion and keep the deterministic deferral checks that already encode the contract: after phase A the prompts surface is empty, and prompts only materialise after StartPhaseB drains them. A realistic "phase A surfaces prompts inline" regression still trips the empty-surface check; the removed threshold only proxied an implausible fetch-but-not-surface case at the cost of flakiness.

The test waited 5s for TurnDone after Cancel, but cmd.Wait honours shellWaitDelay (also 5s) — and on Windows cmd.Cancel spawns taskkill /F /T — so when the kill rides the full grace, TurnDone arrives at ~shellWaitDelay and the 5s budget loses its own race (observed 5.18s on windows-latest). Wait shellWaitDelay + 10s instead, via a new waitForDoneWithin helper that the existing waitForDone now delegates to. The command is still killed promptly in practice; the larger budget only removes the dead heat with the grace period.

* test(plugin): de-flake TestStartPhaseAReturnsBeforePhaseB The test asserted StartAvailable returns in <150ms while the helper stalls prompts/list by 200ms — a wall-clock proxy for "phase A doesn't block on the prompt fetch". But StartAvailable spawns a subprocess and runs the MCP initialize handshake, and that cost alone can approach or exceed 150ms on a slow CI runner (observed 196ms on windows-latest), so the threshold flaked with no real regression. Drop the timing assertion and keep the deterministic deferral checks that already encode the contract: after phase A the prompts surface is empty, and prompts only materialise after StartPhaseB drains them. A realistic "phase A surfaces prompts inline" regression still trips the empty-surface check; the removed threshold only proxied an implausible fetch-but-not-surface case at the cost of flakiness. * test(control): de-flake TestRunShell_CancelStopsCommand The test waited 5s for TurnDone after Cancel, but cmd.Wait honours shellWaitDelay (also 5s) — and on Windows cmd.Cancel spawns taskkill /F /T — so when the kill rides the full grace, TurnDone arrives at ~shellWaitDelay and the 5s budget loses its own race (observed 5.18s on windows-latest). Wait shellWaitDelay + 10s instead, via a new waitForDoneWithin helper that the existing waitForDone now delegates to. The command is still killed promptly in practice; the larger budget only removes the dead heat with the grace period.

esengine requested a review from SivanCola as a code owner June 9, 2026 15:44

github-actions Bot added v2 Go rewrite (1.x) — main-v2 branch, active development mcp MCP servers / plugins (internal/plugin, codegraph) labels Jun 9, 2026

github-actions Bot added the agent Core agent loop (internal/agent, internal/control) label Jun 9, 2026

esengine changed the title ~~test(plugin): de-flake TestStartPhaseAReturnsBeforePhaseB~~ test: de-flake two windows-latest timing tests Jun 9, 2026

esengine merged commit 6b0fe9b into main-v2 Jun 9, 2026
14 checks passed

esengine deleted the fix/plugin-deflake-phase-handshake branch June 9, 2026 15:58

Bernardxu123 mentioned this pull request Jun 10, 2026

[Meta] Issues 分组审核报告 — 按模块分类 & 优先级排序 #3275

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: de-flake two windows-latest timing tests#3726

test: de-flake two windows-latest timing tests#3726
esengine merged 2 commits into
main-v2from
fix/plugin-deflake-phase-handshake

esengine commented Jun 9, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

esengine commented Jun 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

1. internal/plugin — TestStartPhaseAReturnsBeforePhaseB

2. internal/control — TestRunShell_CancelStopsCommand

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

esengine commented Jun 9, 2026 •

edited

Loading

1. `internal/plugin` — `TestStartPhaseAReturnsBeforePhaseB`

2. `internal/control` — `TestRunShell_CancelStopsCommand`