Flaky test: notification-poller dedup test fails when poller threads leaked by earlier tests race on the global completion_queue

## Summary

`tests/test_tui_gateway_server.py::test_notification_poller_emits_distinct_watch_matches_once` is flaky in CI: it can fail with `AssertionError: assert 3 == 2` (one extra `status.update` emission). The root cause is a test-isolation gap, not the dedup logic itself: **notification-poller threads leaked by earlier tests in the same pytest process race on the global `process_registry.completion_queue`**, and each poller thread keeps its own `_emitted` dedup set, so an exact-replay event consumed by a *different* (leaked) poller thread is not deduped.

## Observed failure

Seen on PR #44061, run [27404366238](https://github.com/NousResearch/hermes-agent/actions/runs/27404366238) `test (6)` (the test itself is unmodified there):

```
__________ test_notification_poller_emits_distinct_watch_matches_once __________
>           assert len(status_calls) == 2
E           AssertionError: assert 3 == 2
E            +  where 3 = len([('status.update', '0abe5e79', {'command': 'tail -f app.log', 'event_type': 'watch_match', ...}),
                               ...
                               ('status.update', 'ec783fb7', {'command': 'tail -f app.log', 'event_type': 'watch_match', ...})])
```

## Smoking gun

The test starts its own poller with `sid_watch_dedup`, yet the captured emissions carry sids `0abe5e79` and `ec783fb7` — sessions from *earlier tests*. The mechanism:

1. Any test that exercises the session-init paths starts a daemon poller via `_start_notification_poller` (`tui_gateway/server.py`); the only handles to stop it are the returned `stop` event stored as `_sessions[sid]["_notif_stop"]` and the session's `_finalized` flag.
2. Tests typically clean up with `server._sessions.pop(sid)` — which removes the dict entry but never sets the stop event or `_finalized`, so the thread keeps polling `process_registry.completion_queue.get(timeout=0.5)` for the rest of the pytest process.
3. The leaked threads call `server._emit` dynamically, so they emit through whatever monkeypatch the *currently running* test installed.
4. The dedup test drains the queue and enqueues 3 `watch_match` events (2 distinct + 1 exact replay) whose process `session_id` (`proc_watch_dedup`) is owned by no live session, so `_notification_event_belongs_elsewhere` lets any poller consume them. Whichever mix of the test's own drain pass and leaked threads consumes them, the replay lands in a thread with a fresh `_emitted` set → 3 emissions instead of 2.

The per-thread `_emitted` set is fine in production (one poller per session); only the cross-test thread leak breaks the invariant.

## Suggested fixes (either suffices)

- Test-side: an autouse fixture in `tests/test_tui_gateway_server.py` that wraps `_start_notification_poller`, records every returned stop event, and sets them all at teardown (popping `_sessions` alone does not stop the thread). Patching `completion_queue` per-test does **not** help — the loop re-reads the attribute every iteration, so leaked threads would consume from the fresh queue too.
- Server-side: have `_notification_poller_loop` also exit when its `sid` is no longer in `_sessions`, which makes the existing `_sessions.pop(...)` cleanup sufficient and also reaps pollers for dead sessions in production.

Happy to PR whichever direction is preferred.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Flaky test: notification-poller dedup test fails when poller threads leaked by earlier tests race on the global completion_queue #44789

Summary

Observed failure

Smoking gun

Suggested fixes (either suffices)

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Flaky test: notification-poller dedup test fails when poller threads leaked by earlier tests race on the global completion_queue #44789

Description

Summary

Observed failure

Smoking gun

Suggested fixes (either suffices)

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions