CI flake: CLI UI tests intermittently fail across macos/windows/ubuntu (AppContainer footer-remeasure, InputPrompt suggestion submit, AskUserQuestionDialog key handling) #4429

@LaZzyMan

Summary

Several CLI UI tests intermittently fail on CI across all three runner platforms (macOS, Windows, Ubuntu). The failures reproduce on main, independent of any PR's content. Every flake observed so far falls into the same class: a vitest assertion on a render spy or key sequence that expects an exact call count or argument list, where ink's async rerender timing or a fake-timer interaction changes what the spy has recorded by the time the assertion runs (typically one extra call).

Recurring failing tests

| Test | First flake observed | Failure shape |
| --- | --- | --- |
| src/ui/AppContainer.test.tsx > AppContainer State Management > Terminal Height Calculation > does not remeasure footer height for sticky todo status-only updates | well before PR #4386 | expected "spy" to be called 1 times, but got 2 times |
| src/ui/components/InputPrompt.test.tsx > InputPrompt > prompt suggestions > accepts and submits the prompt suggestion on Enter when the buffer is empty | well before PR #4386 | expected "spy" to be called with arguments: [ 'commit this' ] |
| src/ui/components/messages/AskUserQuestionDialog.test.tsx > <AskUserQuestionDialog /> > single-select interaction > keeps bare k/j in custom input while Ctrl+P/N still navigates options | well before PR #4386 | (same async-spy shape) |

Evidence — failures on main (recent, sampled)

| Run | Created (UTC) | Platform | Failing test |
| --- | --- | --- | --- |
| https://github.com/QwenLM/qwen-code/actions/runs/26213996190 | 2026-05-21 08:13 | ubuntu | AppContainer > does not remeasure footer height ... |
| https://github.com/QwenLM/qwen-code/actions/runs/26213457435 | 2026-05-21 08:01 | windows | InputPrompt > accepts and submits the prompt suggestion ... |
| https://github.com/QwenLM/qwen-code/actions/runs/26208239117 | 2026-05-21 05:54 | windows | AppContainer > does not remeasure footer height ... |
| https://github.com/QwenLM/qwen-code/actions/runs/26207015376 | 2026-05-21 05:18 | macos | AppContainer > does not remeasure footer height ... |
| https://github.com/QwenLM/qwen-code/actions/runs/26204481218 | 2026-05-21 03:55 | macos | AppContainer > does not remeasure footer height ... |

Five of the eight most recent CI runs on main failed; all five failures fall in this class. PR-level CI runs hit them at roughly the same rate; PR #4386 hit them in three of its first four runs (different test each time, all in this class).

What we know

  • All three tests pass reliably on local dev machines (re-ran each locally; immediate pass).
  • All three tests interact with ink rendering + an async useEffect or useState rerender; the spy assertion measures a call count or an arg shape that depends on timer/microtask ordering.
  • Failures are not deterministic per-platform — the same test that fails on Windows in one run passes on Windows in the next.
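
To make the failure class concrete, here is a minimal sketch of how an exact call-count assertion can depend on whether deferred work has been flushed. It is not taken from the failing tests: `ManualScheduler`, `mount`, and `countRenderCalls` are hypothetical names, and the scheduler only stands in for ink's async rerender or a fake-timer queue.

```typescript
// Hypothetical stand-in for a queue of deferred work (timers, effects).
class ManualScheduler {
  private queue: Array<() => void> = [];

  schedule(task: () => void): void {
    this.queue.push(task);
  }

  // Run everything queued so far, including tasks queued while flushing.
  flush(): void {
    while (this.queue.length > 0) {
      const task = this.queue.shift()!;
      task();
    }
  }
}

// Hypothetical component: renders once synchronously, then schedules
// one deferred rerender (as an effect that sets state might).
function mount(scheduler: ManualScheduler, onRender: () => void): void {
  onRender();
  scheduler.schedule(onRender);
}

// Returns how many times the spy had fired when the "assertion" ran.
function countRenderCalls(flushBeforeAssert: boolean): number {
  const scheduler = new ManualScheduler();
  let calls = 0;
  const spy = (): void => {
    calls += 1;
  };

  mount(scheduler, spy);
  if (flushBeforeAssert) {
    scheduler.flush();
  }
  return calls;
}

console.log(countRenderCalls(false)); // 1: deferred rerender still queued
console.log(countRenderCalls(true)); // 2: deferred rerender already ran
```

If the real tests behave like this sketch, a fast local machine could consistently land the assertion on one side of the flush while a loaded CI runner lands it on the other, which would match the "expected 1 times, but got 2 times" shape. That is a hypothesis, not a confirmed root cause.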

What would help

This isn't a hard fix request, more a tracking issue so PR authors stop re-triaging the same flake across rounds. Reasonable next steps if someone takes it:

  1. Quarantine the three tests until root-caused (e.g. test.skipIf(process.env.CI), or retry: 2 with a longer testTimeout in vitest.config.ts), so CI signal improves immediately.
  2. Audit those three tests for the underlying race — most likely candidates: missing act() wrappers around state updates, real timers leaking from prior tests in the file, or useEffect cleanup running on a different tick than the spy expected.
  3. Optionally: scope retries to these three tests or their files (for example with vitest's per-test or per-suite retry option) rather than enabling them globally, so retries don't mask bugs in other tests.
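
The quarantine and scoped-retry options in the list above could look roughly like the sketch below. The test names and bodies are placeholders, not the real tests. Passing `{ retry: 2 }` as the second argument assumes a reasonably recent vitest version, and the `afterEach` timer reset is only one guess at guarding against leaked fake timers.

```typescript
import { afterEach, describe, expect, it, vi } from 'vitest';

// CI providers such as GitHub Actions set the CI environment variable.
const onCI = Boolean(process.env.CI);

afterEach(() => {
  // Restore real timers so a test that forgot to clean up
  // cannot leak fake timers into the next test in this file.
  vi.useRealTimers();
});

describe('quarantine sketch (placeholder test bodies)', () => {
  // Option 1: skip on CI only; the test still runs locally.
  it.skipIf(onCI)('flaky render-spy test, skipped on CI', () => {
    const spy = vi.fn();
    spy('commit this');
    expect(spy).toHaveBeenCalledWith('commit this');
  });

  // Option 2: retry up to twice, scoped to this one test.
  it('flaky render-spy test, retried on failure', { retry: 2 }, () => {
    const spy = vi.fn();
    spy();
    expect(spy).toHaveBeenCalledTimes(1);
  });
});
```

Either option hides the flake rather than fixing it, so it only makes sense alongside the audit in step 2.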

Why not just fix it here

This is well out of scope for any feature/bugfix PR: the flake exists on main and predates any single PR. Finding the actual root cause is focused infrastructure / test-harness work, not part of a feature change. Filing this issue so the work can be picked up independently.

Labels

category/cli (Command line interface and interaction)
scope/ci-cd (Continuous integration/deployment)
scope/testing (Test frameworks and cases)
type/bug (Something isn't working as expected)
