Skip to content

Flaky test: notification-poller dedup test fails when poller threads leaked by earlier tests race on the global completion_queue #44789

@AIalliAI

Description

@AIalliAI

Summary

tests/test_tui_gateway_server.py::test_notification_poller_emits_distinct_watch_matches_once is flaky in CI: it can fail with AssertionError: assert 3 == 2 (one extra status.update emission). The root cause is a test-isolation gap, not the dedup logic itself: notification-poller threads leaked by earlier tests in the same pytest process race on the global process_registry.completion_queue, and each poller thread keeps its own _emitted dedup set, so an exact-replay event consumed by a different (leaked) poller thread is not deduped.

Observed failure

Seen on PR #44061, run 27404366238 test (6) (the test itself is unmodified there):

__________ test_notification_poller_emits_distinct_watch_matches_once __________
>           assert len(status_calls) == 2
E           AssertionError: assert 3 == 2
E            +  where 3 = len([('status.update', '0abe5e79', {'command': 'tail -f app.log', 'event_type': 'watch_match', ...}),
                               ...
                               ('status.update', 'ec783fb7', {'command': 'tail -f app.log', 'event_type': 'watch_match', ...})])

Smoking gun

The test starts its own poller with sid_watch_dedup, yet the captured emissions carry sids 0abe5e79 and ec783fb7 — sessions from earlier tests. The mechanism:

  1. Any test that exercises the session-init paths starts a daemon poller via _start_notification_poller (tui_gateway/server.py); the only handles to stop it are the returned stop event stored as _sessions[sid]["_notif_stop"] and the session's _finalized flag.
  2. Tests typically clean up with server._sessions.pop(sid) — which removes the dict entry but never sets the stop event or _finalized, so the thread keeps polling process_registry.completion_queue.get(timeout=0.5) for the rest of the pytest process.
  3. The leaked threads call server._emit dynamically, so they emit through whatever monkeypatch the currently running test installed.
  4. The dedup test drains the queue and enqueues 3 watch_match events (2 distinct + 1 exact replay) whose process session_id (proc_watch_dedup) is owned by no live session, so _notification_event_belongs_elsewhere lets any poller consume them. Whichever mix of the test's own drain pass and leaked threads consumes them, the replay lands in a thread with a fresh _emitted set → 3 emissions instead of 2.

The per-thread _emitted set is fine in production (one poller per session); only the cross-test thread leak breaks the invariant.

Suggested fixes (either suffices)

  • Test-side: an autouse fixture in tests/test_tui_gateway_server.py that wraps _start_notification_poller, records every returned stop event, and sets them all at teardown (popping _sessions alone does not stop the thread). Patching completion_queue per-test does not help — the loop re-reads the attribute every iteration, so leaked threads would consume from the fresh queue too.
  • Server-side: have _notification_poller_loop also exit when its sid is no longer in _sessions, which makes the existing _sessions.pop(...) cleanup sufficient and also reaps pollers for dead sessions in production.

Happy to PR whichever direction is preferred.

Metadata

Metadata

Assignees

No one assigned

    Labels

    P3Low — cosmetic, nice to havecomp/tuiTerminal UI (ui-tui/ + tui_gateway/)type/testTest coverage or test infrastructure

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions