Summary
tests/test_tui_gateway_server.py::test_notification_poller_emits_distinct_watch_matches_once is flaky in CI: it can fail with AssertionError: assert 3 == 2 (one extra status.update emission). The root cause is a test-isolation gap, not the dedup logic itself: notification-poller threads leaked by earlier tests in the same pytest process race on the global process_registry.completion_queue, and each poller thread keeps its own _emitted dedup set, so an exact-replay event consumed by a different (leaked) poller thread is not deduped.
Observed failure
Seen on PR #44061, run 27404366238 test (6) (the test itself is unmodified there):
__________ test_notification_poller_emits_distinct_watch_matches_once __________
> assert len(status_calls) == 2
E AssertionError: assert 3 == 2
E + where 3 = len([('status.update', '0abe5e79', {'command': 'tail -f app.log', 'event_type': 'watch_match', ...}),
...
('status.update', 'ec783fb7', {'command': 'tail -f app.log', 'event_type': 'watch_match', ...})])
Smoking gun
The test starts its own poller with sid_watch_dedup, yet the captured emissions carry sids 0abe5e79 and ec783fb7 — sessions from earlier tests. The mechanism:
- Any test that exercises the session-init paths starts a daemon poller via
_start_notification_poller (tui_gateway/server.py); the only handles to stop it are the returned stop event stored as _sessions[sid]["_notif_stop"] and the session's _finalized flag.
- Tests typically clean up with
server._sessions.pop(sid) — which removes the dict entry but never sets the stop event or _finalized, so the thread keeps polling process_registry.completion_queue.get(timeout=0.5) for the rest of the pytest process.
- The leaked threads call
server._emit dynamically, so they emit through whatever monkeypatch the currently running test installed.
- The dedup test drains the queue and enqueues 3
watch_match events (2 distinct + 1 exact replay) whose process session_id (proc_watch_dedup) is owned by no live session, so _notification_event_belongs_elsewhere lets any poller consume them. Whichever mix of the test's own drain pass and leaked threads consumes them, the replay lands in a thread with a fresh _emitted set → 3 emissions instead of 2.
The per-thread _emitted set is fine in production (one poller per session); only the cross-test thread leak breaks the invariant.
Suggested fixes (either suffices)
- Test-side: an autouse fixture in
tests/test_tui_gateway_server.py that wraps _start_notification_poller, records every returned stop event, and sets them all at teardown (popping _sessions alone does not stop the thread). Patching completion_queue per-test does not help — the loop re-reads the attribute every iteration, so leaked threads would consume from the fresh queue too.
- Server-side: have
_notification_poller_loop also exit when its sid is no longer in _sessions, which makes the existing _sessions.pop(...) cleanup sufficient and also reaps pollers for dead sessions in production.
Happy to PR whichever direction is preferred.
Summary
tests/test_tui_gateway_server.py::test_notification_poller_emits_distinct_watch_matches_onceis flaky in CI: it can fail withAssertionError: assert 3 == 2(one extrastatus.updateemission). The root cause is a test-isolation gap, not the dedup logic itself: notification-poller threads leaked by earlier tests in the same pytest process race on the globalprocess_registry.completion_queue, and each poller thread keeps its own_emitteddedup set, so an exact-replay event consumed by a different (leaked) poller thread is not deduped.Observed failure
Seen on PR #44061, run 27404366238
test (6)(the test itself is unmodified there):Smoking gun
The test starts its own poller with
sid_watch_dedup, yet the captured emissions carry sids0abe5e79andec783fb7— sessions from earlier tests. The mechanism:_start_notification_poller(tui_gateway/server.py); the only handles to stop it are the returnedstopevent stored as_sessions[sid]["_notif_stop"]and the session's_finalizedflag.server._sessions.pop(sid)— which removes the dict entry but never sets the stop event or_finalized, so the thread keeps pollingprocess_registry.completion_queue.get(timeout=0.5)for the rest of the pytest process.server._emitdynamically, so they emit through whatever monkeypatch the currently running test installed.watch_matchevents (2 distinct + 1 exact replay) whose processsession_id(proc_watch_dedup) is owned by no live session, so_notification_event_belongs_elsewherelets any poller consume them. Whichever mix of the test's own drain pass and leaked threads consumes them, the replay lands in a thread with a fresh_emittedset → 3 emissions instead of 2.The per-thread
_emittedset is fine in production (one poller per session); only the cross-test thread leak breaks the invariant.Suggested fixes (either suffices)
tests/test_tui_gateway_server.pythat wraps_start_notification_poller, records every returned stop event, and sets them all at teardown (popping_sessionsalone does not stop the thread). Patchingcompletion_queueper-test does not help — the loop re-reads the attribute every iteration, so leaked threads would consume from the fresh queue too._notification_poller_loopalso exit when itssidis no longer in_sessions, which makes the existing_sessions.pop(...)cleanup sufficient and also reaps pollers for dead sessions in production.Happy to PR whichever direction is preferred.