flaky(windows): it.live session tests hit the 3s timeout ceiling #15

@Astro-Han

Symptom

Four it.live tests in packages/opencode/test/session/prompt-effect.test.ts fail on Windows CI with durations of ~3015-3016ms, which matches the explicit { timeout: 3_000 } passed to each helper call.

| Test | File:Line | Explicit timeout |
|---|---|---|
| prompt submitted during an active run is included in the next LLM input | prompt-effect.test.ts:938 | 3_000 |
| shell rejects with BusyError when loop running | prompt-effect.test.ts:1054 | 3_000 |
| loop waits while shell runs and starts after shell exits | prompt-effect.test.ts:1214 | 3_000 |
| shell completion resumes queued loop callers | prompt-effect.test.ts:1252 | 3_000 |

Evidence (last 5 Windows CI runs on dev, 2026-04-16 → 2026-04-17)

| Test | Ubuntu pass | Windows pass p50 / max | Windows failures (count × duration) |
|---|---|---|---|
| prompt submitted during active run | 324ms | 2281 / 2391ms | 1 × 3015ms |
| shell rejects with BusyError | 238ms | 1531 / 1531ms | 1 × 3016ms |
| loop waits while shell runs | 574ms | 2774 / 2828ms | 3 × 3016ms |
| shell completion resumes | 591ms | 2367 / 2375ms | 2 × 3016ms |

Failing runs (~3015-3016ms) land precisely on the explicit 3s cap. The slowest Windows pass observed is 2828ms, 94% of the 3s ceiling, so any runner jitter pushes a test over.

Windows is 4-7× slower than Ubuntu on these tests. it.live uses a real clock (the live layer replaces TestClock), so runner slowness shows up in these tests in a way it does not for tests that use TestClock.
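The ratios quoted above can be recomputed from the evidence table. A quick sketch, assuming Windows p50 is the statistic behind the 4-7× figure (the issue does not say which one was used) and using the slowest Windows pass for cap usage:

```typescript
// Values copied from the evidence table (ms). CAP_MS is the explicit { timeout: 3_000 }.
const CAP_MS = 3_000

interface Evidence {
  test: string
  ubuntuPass: number
  windowsP50: number
  windowsMax: number
}

const evidence: Evidence[] = [
  { test: "prompt submitted during active run", ubuntuPass: 324, windowsP50: 2281, windowsMax: 2391 },
  { test: "shell rejects with BusyError", ubuntuPass: 238, windowsP50: 1531, windowsMax: 1531 },
  { test: "loop waits while shell runs", ubuntuPass: 574, windowsP50: 2774, windowsMax: 2828 },
  { test: "shell completion resumes", ubuntuPass: 591, windowsP50: 2367, windowsMax: 2375 },
]

// Slowdown: Windows p50 relative to the Ubuntu pass duration (assumed comparison point).
const slowdown = (e: Evidence): number => e.windowsP50 / e.ubuntuPass

// Cap usage: fraction of the 3s cap consumed by the slowest Windows pass.
const capUsage = (e: Evidence): number => e.windowsMax / CAP_MS

for (const e of evidence) {
  console.log(`${e.test}: ${slowdown(e).toFixed(1)}x slower, ${(capUsage(e) * 100).toFixed(0)}% of cap`)
}
```

This gives slowdowns from 4.0× to 7.0× and shows the slowest pass (2828ms) using about 94% of the cap.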

Proposed fix

Add an OS-aware applyScale in packages/opencode/test/lib/effect.ts so it.live and testEffect helpers scale user-specified timeouts on Windows:

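import type { TestOptions } from "bun:test"

// Windows CI runs these it.live tests 4-7× slower than Ubuntu, so stretch explicit timeouts there.
const scaleForOS = (base: number) =>
  process.platform === "win32" ? base * 3 : base

// Scale only an explicit timeout; undefined stays undefined so Bun's global --timeout applies.
const applyScale = (opts?: number | TestOptions) => {
  if (opts === undefined) return undefined
  if (typeof opts === "number") return scaleForOS(opts)
  if (opts.timeout !== undefined) return { ...opts, timeout: scaleForOS(opts.timeout) }
  return opts // options without a timeout pass through unchanged
}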

Apply it inside effect, effect.only, effect.skip, live, live.only, and live.skip by passing applyScale(opts) through to test(name, fn, ...).
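A minimal sketch of that wiring, under stated assumptions: withScaledTimeout, Register, and the fake registrars are hypothetical names standing in for Bun's test, test.only, and test.skip, and platform is passed explicitly instead of read from process.platform so the logic can be checked on any OS. It is not the actual contents of test/lib/effect.ts.

```typescript
// Assumed minimal shapes; the real helper would use Bun's own test types.
type TestOptions = { timeout?: number }
type TestFn = () => unknown
type Register = (name: string, fn: TestFn, opts?: number | TestOptions) => void

// Same scaling rule as the proposal, with the platform injected.
const scaleForOS = (base: number, platform: string): number =>
  platform === "win32" ? base * 3 : base

const applyScale = (opts: number | TestOptions | undefined, platform: string) => {
  if (opts === undefined) return undefined // keep Bun's global --timeout
  if (typeof opts === "number") return scaleForOS(opts, platform)
  if (opts.timeout !== undefined) return { ...opts, timeout: scaleForOS(opts.timeout, platform) }
  return opts
}

// Hypothetical helper: every call through the wrapped registrar gets an OS-scaled timeout.
const withScaledTimeout =
  (register: Register, platform: string): Register =>
  (name, fn, opts) =>
    register(name, fn, applyScale(opts, platform))

// Fake registrars that only record their arguments (stand-ins for test, test.only, test.skip).
type Call = { variant: string; name: string; opts: number | TestOptions | undefined }
const recorded: Call[] = []
const fake =
  (variant: string): Register =>
  (name, _fn, opts) => {
    recorded.push({ variant, name, opts })
  }

const live = withScaledTimeout(fake("test"), "win32")
const liveOnly = withScaledTimeout(fake("test.only"), "win32")
const liveSkip = withScaledTimeout(fake("test.skip"), "win32")

live("prompt submitted during an active run", () => {}, { timeout: 3_000 })
liveOnly("shell rejects with BusyError", () => {}, 3_000)
liveSkip("no explicit timeout", () => {})

console.log(JSON.stringify(recorded.map((r) => r.opts))) // [{"timeout":9000},9000,null]
```

Wrapping each registrar once means all six variants get scaling by construction, rather than relying on every call site to remember applyScale.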

  • Coefficient × 3: the slowest observed Windows pass is 2828ms, so 3s × 3 = 9s gives ~3× headroom over it. If flakes persist, raise it to × 5 rather than adding more escape hatches.
  • Does NOT affect tests that pass undefined (they keep the Bun global --timeout 30000 from package.json).
  • Does NOT affect Ubuntu (the coefficient applies only on win32).
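The headroom for each candidate coefficient is the scaled timeout divided by the slowest observed Windows pass. A short check using the 2828ms maximum from the evidence table (the constant names are illustrative):

```typescript
// Inputs from the issue: the explicit 3s timeout and the slowest Windows pass (ms).
const BASE_TIMEOUT_MS = 3_000
const WINDOWS_MAX_PASS_MS = 2_828

// Headroom = scaled timeout / slowest observed pass.
const headroom = (factor: number): number => (BASE_TIMEOUT_MS * factor) / WINDOWS_MAX_PASS_MS

for (const factor of [1, 3, 5]) {
  console.log(`x${factor}: timeout ${BASE_TIMEOUT_MS * factor}ms, headroom ${headroom(factor).toFixed(2)}x`)
}
```

× 3 gives about 3.18× headroom and × 5 about 5.30×, versus 1.06× for the current unscaled 3s cap.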

Alternative considered

Bump the four explicit 3_000 literals to 9_000 inline. Rejected: the same Windows-runner slowness will hit future it.live tests, and centralizing the scale is reusable and leaves the Ubuntu authoring experience unchanged.

Verification

Run CI 5× after merge and expect these 4 tests to pass on Windows in every run. If any still flake, the coefficient is too low: raise it to × 5 and collect another 5 runs before closing.

Out of scope

Metadata

Assignees: No one assigned

Labels: P2 (Medium priority), bug (Something isn't working), ci (Continuous integration / GitHub Actions), flaky-test (Non-deterministic test failure), windows (Windows-specific)
