This repository was archived by the owner on May 26, 2026. It is now read-only.
chore(kora): KR-TEST-STABILITY-XDIST — fix email handler xdist flake#152
Merged
rafe-walker merged 1 commit intoMay 23, 2026
Merged
Conversation
CC#1 flagged in PR #149: tests/kora_cli/handlers/test_email_ inbound_handler.py flakes under pytest-xdist (4-6 failures with HANDLED_RECEIVED vs filtered_paused mismatch). Serial run: clean. # Root cause Two test files installed an OperationalStateHolder via direct ``h_mod._HOLDER = holder`` assignment without teardown reset: - tests/kora_cli/test_listeners/test_mcp_tools_stop_control.py (KR-MCP-STOP-CONTROL ST1, PR #142) — _holder_with_state helper mutates _HOLDER 13 times across the suite. No cleanup. - tests/kora_cli/test_listeners/test_mcp_audit_on_denial.py (KR-MCP-AUDIT-ON-DENIAL, PR #150) — success-path test installs an ACTIVE holder. No cleanup. Why xdist surfaces it: xdist workers are separate processes, so state doesn't cross workers. But WITHIN a single worker tests run serially in scheduled order. When the scheduler put one of these mutating tests on the same worker as test_email_inbound_handler AND the mutating test ran first, the dirty PAUSED/ACTIVE holder persisted. The email handler's state-gate (_check_state_gate in email_inbound_handler.py:509) then saw the stale PAUSED holder and returned HANDLED_FILTERED_PAUSED instead of HANDLED_RECEIVED. Why serial doesn't flake: pytest's default test ordering puts the handlers/ tree BEFORE the test_listeners/ tree alphabetically, so the mutating tests never run before the email tests in serial mode. xdist's worksteal scheduler doesn't preserve that ordering. # Fix Three autouse ``_reset_operational_state_holder`` fixtures (one per affected file) that set ``h_mod._HOLDER = None`` at both setup and teardown: - test_mcp_tools_stop_control.py — fixes the bleed source - test_mcp_audit_on_denial.py — fixes the second bleed source - test_email_inbound_handler.py — DEFENSIVE belt-and-suspenders so a future test author who forgets the teardown doesn't re-poison the email handler tests The defensive autouse in test_email_inbound_handler.py is the documented pattern: any test file whose tests assume a fresh ``OperationalStateHolder`` singleton should opt-in via this fixture pattern, even if today's leaking-sources are all fixed. Not changing production code (the singleton design itself stays — spec §3 explicit non-scope). The fix is test-side only. # Verification Repro confirmed pre-fix: 3-4 out of 15 xdist runs failed. Post-fix verification (xdist): - 30/30 xdist runs of alerts + test_listeners + handlers + audit + clients (627 tests) — all green - 5/5 explicit spec-mandated xdist runs — all green - Full repo (tests/kora_cli + tests/agent) xdist: 9171 passed, 43 failed — failures all pre-existing in test_anthropic_adapter.py / test_backup.py / test_gateway_service.py / test_web_server*.py / test_model_switch_custom_providers.py (zero failures in test_email_inbound_handler.py) Serial regression unchanged: 627/627 still green. # Pattern for future test authors When mutating a module-level singleton in tests: GOOD: monkeypatch.setattr(h_mod, "_HOLDER", holder) — pytest auto-restores at test teardown. ALSO GOOD: an autouse fixture that resets the singleton at setup AND teardown (the pattern landed here). BAD: ``h_mod._HOLDER = holder`` — the assignment persists past the test boundary and can bleed across the xdist worker into any other test in the same worker scheduled later. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes the intermittent xdist flake CC#1 flagged in #149: `test_email_inbound_handler.py` failing 4-6 tests with `HANDLED_RECEIVED vs filtered_paused` mismatches. Test-side fix only — production code untouched (spec §3 explicit non-scope).
Diagnostic write-up
Repro (verified pre-fix)
```bash
for i in $(seq 1 15); do
pytest tests/kora_cli/alerts tests/kora_cli/test_listeners \
tests/kora_cli/handlers tests/kora_cli/audit \
tests/kora_cli/clients -n auto --tb=line -q
done
```
Result: 3-4 out of 15 runs fail with 4-7 `test_email_inbound_handler.py` failures. Same failure shape CC#1 reported: `AssertionError: assert 'filtered_paused' == 'received'`.
Root cause
Two test files install an `OperationalStateHolder` via direct `h_mod._HOLDER = holder` assignment without teardown reset:
The handler's state-gate at `kora_cli/handlers/email_inbound_handler.py:509-526` calls `get_holder()` (line 515) and returns `filtered_paused` if the holder reports `PrimaryState.PAUSED` (line 522). When a dirty holder leaks in, the email handler sees stale state.
Why xdist surfaces it but serial doesn't
xdist workers are separate processes — state can't bleed across worker boundaries. But within a single worker, tests run sequentially in scheduler-determined order. The xdist scheduler doesn't preserve pytest's default alphabetic file ordering; it uses worksteal-with-load-balancing.
In serial mode, pytest collects in alphabetic file order — `handlers/` runs before `test_listeners/` — so the leaking mutating tests never precede the email handler tests. xdist breaks this ordering coincidence and surfaces the latent bug.
Fix
Three autouse `_reset_operational_state_holder` fixtures (one per affected file) that set `h_mod._HOLDER = None` at both setup and teardown:
Pattern documented for future test authors
Each new fixture has a docstring explaining the pattern. The TL;DR (also in the commit message):
Verification
Test plan
After this lands, CI is deterministic for the email handler scope. The same fixture pattern is documented for future test authors to apply when they mutate any module-level singleton.
🤖 Generated with Claude Code