Skip to content

test: de-flake two windows-latest timing tests#3726

Merged
esengine merged 2 commits into
main-v2from
fix/plugin-deflake-phase-handshake
Jun 9, 2026
Merged

test: de-flake two windows-latest timing tests#3726
esengine merged 2 commits into
main-v2from
fix/plugin-deflake-phase-handshake

Conversation

@esengine

@esengine esengine commented Jun 9, 2026

Copy link
Copy Markdown
Owner

Summary

windows-latest has two distinct timing-sensitive tests that flake under CI load and intermittently block PRs (both surfaced while running CI on an unrelated branch). This de-flakes both at the root — test-only changes, no product behaviour touched.

1. internal/pluginTestStartPhaseAReturnsBeforePhaseB

Asserted StartAvailable returns in <150ms while the helper stalls prompts/list by 200ms — a wall-clock proxy for "phase A doesn't block on the prompt fetch". But StartAvailable spawns a subprocess and runs the MCP initialize handshake, which alone can approach/exceed 150ms on a slow runner (observed 196ms). Dropped the timing assertion; kept the deterministic deferral checks that already encode the contract (prompts empty after phase A, present only after StartPhaseB drains them).

2. internal/controlTestRunShell_CancelStopsCommand

Waited 5s for TurnDone after Cancel, but cmd.Wait honours shellWaitDelay (also 5s) — and on Windows cmd.Cancel spawns taskkill /F /T — so when the kill rides the full grace, TurnDone arrives at ~shellWaitDelay and the 5s budget loses a dead heat with it (observed 5.18s). Now waits shellWaitDelay + 10s via a new waitForDoneWithin helper that waitForDone delegates to.

Test plan

  • go test ./internal/plugin/ -run TestStartPhaseAReturnsBeforePhaseB -count=50 and -race -count=10 → pass
  • go test ./internal/control/ -run TestRunShell_CancelStopsCommand -count=20 → pass
  • go test ./internal/plugin/ ./internal/control/ → pass
  • gofmt / go vet clean

The test asserted StartAvailable returns in <150ms while the helper stalls
prompts/list by 200ms — a wall-clock proxy for "phase A doesn't block on the
prompt fetch". But StartAvailable spawns a subprocess and runs the MCP
initialize handshake, and that cost alone can approach or exceed 150ms on a
slow CI runner (observed 196ms on windows-latest), so the threshold flaked
with no real regression.

Drop the timing assertion and keep the deterministic deferral checks that
already encode the contract: after phase A the prompts surface is empty, and
prompts only materialise after StartPhaseB drains them. A realistic
"phase A surfaces prompts inline" regression still trips the empty-surface
check; the removed threshold only proxied an implausible fetch-but-not-surface
case at the cost of flakiness.
@esengine esengine requested a review from SivanCola as a code owner June 9, 2026 15:44
@github-actions github-actions Bot added v2 Go rewrite (1.x) — main-v2 branch, active development mcp MCP servers / plugins (internal/plugin, codegraph) labels Jun 9, 2026
The test waited 5s for TurnDone after Cancel, but cmd.Wait honours
shellWaitDelay (also 5s) — and on Windows cmd.Cancel spawns taskkill /F /T —
so when the kill rides the full grace, TurnDone arrives at ~shellWaitDelay and
the 5s budget loses its own race (observed 5.18s on windows-latest).

Wait shellWaitDelay + 10s instead, via a new waitForDoneWithin helper that the
existing waitForDone now delegates to. The command is still killed promptly in
practice; the larger budget only removes the dead heat with the grace period.
@github-actions github-actions Bot added the agent Core agent loop (internal/agent, internal/control) label Jun 9, 2026
@esengine esengine changed the title test(plugin): de-flake TestStartPhaseAReturnsBeforePhaseB test: de-flake two windows-latest timing tests Jun 9, 2026
@esengine esengine merged commit 6b0fe9b into main-v2 Jun 9, 2026
14 checks passed
@esengine esengine deleted the fix/plugin-deflake-phase-handshake branch June 9, 2026 15:58
SuMuxi66 pushed a commit to SuMuxi66/DeepSeek-Reasonix that referenced this pull request Jun 10, 2026
* test(plugin): de-flake TestStartPhaseAReturnsBeforePhaseB

The test asserted StartAvailable returns in <150ms while the helper stalls
prompts/list by 200ms — a wall-clock proxy for "phase A doesn't block on the
prompt fetch". But StartAvailable spawns a subprocess and runs the MCP
initialize handshake, and that cost alone can approach or exceed 150ms on a
slow CI runner (observed 196ms on windows-latest), so the threshold flaked
with no real regression.

Drop the timing assertion and keep the deterministic deferral checks that
already encode the contract: after phase A the prompts surface is empty, and
prompts only materialise after StartPhaseB drains them. A realistic
"phase A surfaces prompts inline" regression still trips the empty-surface
check; the removed threshold only proxied an implausible fetch-but-not-surface
case at the cost of flakiness.

* test(control): de-flake TestRunShell_CancelStopsCommand

The test waited 5s for TurnDone after Cancel, but cmd.Wait honours
shellWaitDelay (also 5s) — and on Windows cmd.Cancel spawns taskkill /F /T —
so when the kill rides the full grace, TurnDone arrives at ~shellWaitDelay and
the 5s budget loses its own race (observed 5.18s on windows-latest).

Wait shellWaitDelay + 10s instead, via a new waitForDoneWithin helper that the
existing waitForDone now delegates to. The command is still killed promptly in
practice; the larger budget only removes the dead heat with the grace period.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent Core agent loop (internal/agent, internal/control) mcp MCP servers / plugins (internal/plugin, codegraph) v2 Go rewrite (1.x) — main-v2 branch, active development

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant