Summary
Several CLI UI tests fail intermittently on CI across all three runner platforms (macOS, Windows, Ubuntu). The failures reproduce on main independently of any PR's content. Every flake observed so far falls into the same class: a vitest assertion on a render spy or key sequence that expects an exact call count or argument, where ink's async rerender timing or fake-timer interaction changes what the spy sees, typically by making it fire one extra time.
Recurring failing tests
| Test | First flake observed | Failure shape |
| --- | --- | --- |
| src/ui/AppContainer.test.tsx > AppContainer State Management > Terminal Height Calculation > does not remeasure footer height for sticky todo status-only updates | well before PR #4386 | expected "spy" to be called 1 times, but got 2 times |
| src/ui/components/InputPrompt.test.tsx > InputPrompt > prompt suggestions > accepts and submits the prompt suggestion on Enter when the buffer is empty | well before PR #4386 | expected "spy" to be called with arguments: [ 'commit this' ] |
| src/ui/components/messages/AskUserQuestionDialog.test.tsx > <AskUserQuestionDialog /> > single-select interaction > keeps bare k/j in custom input while Ctrl+P/N still navigates options | well before PR #4386 | (same async-spy shape) |
Evidence — failures on main (recent, sampled)
Five of the eight most recent CI runs on main failed, and all five failures fall in this class: four hit the AppContainer footer-height test and one hit the InputPrompt prompt-suggestion test. PR-level CI runs hit them at roughly the same rate; PR #4386 hit them in three of its first four runs (a different test each time, all in this class).
What we know
- All three tests pass reliably on local dev machines (re-ran each locally; immediate pass).
- All three tests interact with ink rendering plus an async `useEffect` or `useState` rerender; the spy assertion measures a call count or an argument shape that depends on timer/microtask ordering.
- Failures are not deterministic per-platform — the same test that fails on Windows in one run passes on Windows in the next.
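To make the suspected failure shape concrete, here is a minimal, dependency-free sketch of how a spy's call count can depend on whether the assertion runs before or after a queued rerender. It is a model of one plausible mechanism, not a confirmed root cause: `makeSpy`, `renderWithEffect`, and the hand-rolled task queue are all hypothetical stand-ins and do not come from ink, vitest, or this repository.

```typescript
// Illustrative model only: every name here is a hypothetical stand-in.
type Spy = { (): void; calls: number };

function makeSpy(): Spy {
  const spy: Spy = Object.assign(
    () => {
      spy.calls += 1;
    },
    { calls: 0 },
  );
  return spy;
}

// A hand-rolled task queue standing in for the event loop, so the ordering is
// explicit and the example stays deterministic.
const pending: Array<() => void> = [];
const schedule = (task: () => void): void => {
  pending.push(task);
};
const flush = (): void => {
  while (pending.length > 0) {
    const task = pending.shift();
    if (task) task();
  }
};

// The first render calls the spy synchronously; an effect then queues one
// follow-up render that calls it again on a later tick.
function renderWithEffect(measure: Spy): void {
  measure();
  schedule(() => measure());
}

function demo(): { early: number; late: number } {
  const measure = makeSpy();
  renderWithEffect(measure);
  const early = measure.calls; // what the assertion sees if it runs first
  flush();
  const late = measure.calls; // what the assertion sees if the rerender runs first
  return { early, late };
}
```

Calling `demo()` yields a different count at each observation point. In a real test the equivalent of `flush()` is not under the test's control unless timers and pending updates are drained explicitly, which would be consistent with the same test passing and failing on the same platform.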
What would help
This isn't a hard fix request, more a tracking issue so PR authors stop re-triaging the same flake across rounds. Reasonable next steps if someone takes it:
- Quarantine the three tests (e.g. `test.skipIf(process.env.CI)`, or `testTimeout` plus `retry: 2` in `vitest.config.ts`) until root-caused, so CI signal improves immediately.
- Audit those three tests for the underlying race. The most likely candidates are missing `act()` wrappers around state updates, real timers leaking from prior tests in the file, or `useEffect` cleanup running on a different tick than the spy expected.
- Optionally: scope vitest's `retry` option to these three test files rather than enabling it globally, so retries don't mask other tests' bugs.
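As a rough sketch of the scoped-retry idea, the helper below derives retry options from the environment so that only CI runs get retries while local runs stay strict. `quarantineOptions` and `CiEnv` are hypothetical names introduced for illustration; the commented usage assumes a vitest version whose `describe()` accepts an options object as its second argument, which should be checked against the version pinned in this repository.

```typescript
// Sketch only: `quarantineOptions` and `CiEnv` are hypothetical names, not
// part of vitest or of this repository.
type CiEnv = Record<string, string | undefined>;

interface QuarantineOptions {
  retry: number; // how many times a failing test is re-run
}

// Retry only where the flake has been observed (CI); keep local runs strict so
// a genuine regression still fails on the first attempt.
function quarantineOptions(env: CiEnv, ciRetries = 2): QuarantineOptions {
  const onCi = Boolean(env.CI) && env.CI !== "false";
  return { retry: onCi ? ciRetries : 0 };
}

// Intended use inside one of the three affected test files (assumption: the
// installed vitest accepts options as the second argument of describe()):
//
//   import { describe } from "vitest";
//   describe("InputPrompt", quarantineOptions(process.env), () => {
//     /* existing tests, unchanged */
//   });
```

Keeping the retry count in one helper makes the quarantine easy to find and remove once the race is root-caused, and it avoids a global `retry` that would also paper over unrelated failures.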
Why not just fix it here
This is well out of scope for any feature/bugfix PR — the flake exists on main and predates any single PR. Triaging the actual root cause is a focused exercise on infrastructure / test-harness, not on a code change. Filing this issue so the work can be picked up independently.