test: de-flake two windows-latest timing tests #3726
Merged
TestStartPhaseAReturnsBeforePhaseB asserted that StartAvailable returns in <150ms while the helper stalls prompts/list by 200ms, a wall-clock proxy for "phase A doesn't block on the prompt fetch". But StartAvailable spawns a subprocess and runs the MCP initialize handshake, and that cost alone can approach or exceed 150ms on a slow CI runner (observed 196ms on windows-latest), so the threshold flaked with no real regression. Drop the timing assertion and keep the deterministic deferral checks that already encode the contract: after phase A the prompts surface is empty, and prompts only materialise after StartPhaseB drains them. A realistic "phase A surfaces prompts inline" regression still trips the empty-surface check; the removed threshold only proxied an implausible fetch-but-not-surface case at the cost of flakiness.
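The deferral contract described above can be checked without a clock. The following is a minimal sketch only: the `host` type, its fields, and `checkDeferral` are hypothetical stand-ins invented for illustration, not the real `internal/plugin` code, and the real `StartAvailable` also spawns a subprocess and performs the MCP handshake, which this model omits.

```go
package main

import (
	"fmt"
	"sync"
)

// host is a hypothetical stand-in for the plugin host. The real type in
// internal/plugin is not shown in this PR; only the two phase names are.
type host struct {
	mu      sync.Mutex
	prompts []string        // prompts surfaced to callers
	pending func() []string // deferred prompts/list fetch, run in phase B
}

// StartAvailable models phase A: it records the prompt fetch as pending
// work and returns without running it.
func (h *host) StartAvailable(fetch func() []string) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.pending = fetch
}

// StartPhaseB models phase B: it runs the deferred fetch and surfaces
// whatever prompts it returns.
func (h *host) StartPhaseB() {
	h.mu.Lock()
	fetch := h.pending
	h.pending = nil
	h.mu.Unlock()
	if fetch == nil {
		return
	}
	got := fetch() // may be slow; no lock is held while it runs
	h.mu.Lock()
	h.prompts = append(h.prompts, got...)
	h.mu.Unlock()
}

// Prompts returns a copy of the currently surfaced prompts.
func (h *host) Prompts() []string {
	h.mu.Lock()
	defer h.mu.Unlock()
	return append([]string(nil), h.prompts...)
}

// checkDeferral encodes the contract as state checks instead of a latency
// threshold: empty after phase A, non-empty after phase B.
func checkDeferral(h *host, fetch func() []string) error {
	h.StartAvailable(fetch)
	if n := len(h.Prompts()); n != 0 {
		return fmt.Errorf("phase A surfaced %d prompts, want 0", n)
	}
	h.StartPhaseB()
	if n := len(h.Prompts()); n == 0 {
		return fmt.Errorf("phase B surfaced no prompts")
	}
	return nil
}

func main() {
	fetch := func() []string { return []string{"summarise"} }
	if err := checkDeferral(&host{}, fetch); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("deferral contract holds")
}
```

Because both checks inspect state rather than elapsed time, the result does not depend on how long the subprocess spawn or handshake takes on a given runner.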
TestRunShell_CancelStopsCommand waited 5s for TurnDone after Cancel, but cmd.Wait honours shellWaitDelay (also 5s), and on Windows cmd.Cancel spawns taskkill /F /T. When the kill rides the full grace period, TurnDone arrives at ~shellWaitDelay and the 5s budget loses its own race (observed 5.18s on windows-latest). Wait shellWaitDelay + 10s instead, via a new waitForDoneWithin helper that the existing waitForDone now delegates to. The command is still killed promptly in practice; the larger budget only removes the dead heat with the grace period.
SuMuxi66 pushed a commit to SuMuxi66/DeepSeek-Reasonix that referenced this pull request on Jun 10, 2026
* test(plugin): de-flake TestStartPhaseAReturnsBeforePhaseB
* test(control): de-flake TestRunShell_CancelStopsCommand
Summary
windows-latest has two distinct timing-sensitive tests that flake under CI load and intermittently block PRs (both surfaced while running CI on an unrelated branch). This de-flakes both at the root: test-only changes, no product behaviour touched.

1. internal/plugin: TestStartPhaseAReturnsBeforePhaseB
Asserted StartAvailable returns in <150ms while the helper stalls prompts/list by 200ms, a wall-clock proxy for "phase A doesn't block on the prompt fetch". But StartAvailable spawns a subprocess and runs the MCP initialize handshake, which alone can approach or exceed 150ms on a slow runner (observed 196ms). Dropped the timing assertion; kept the deterministic deferral checks that already encode the contract (prompts empty after phase A, present only after StartPhaseB drains them).

2. internal/control: TestRunShell_CancelStopsCommand
Waited 5s for TurnDone after Cancel, but cmd.Wait honours shellWaitDelay (also 5s), and on Windows cmd.Cancel spawns taskkill /F /T. When the kill rides the full grace period, TurnDone arrives at ~shellWaitDelay and the 5s budget loses a dead heat with it (observed 5.18s). Now waits shellWaitDelay + 10s via a new waitForDoneWithin helper that waitForDone delegates to.

Test plan
* go test ./internal/plugin/ -run TestStartPhaseAReturnsBeforePhaseB -count=50 and -race -count=10 → pass
* go test ./internal/control/ -run TestRunShell_CancelStopsCommand -count=20 → pass
* go test ./internal/plugin/ ./internal/control/ → pass
* gofmt / go vet clean