Skip to content
This repository was archived by the owner on May 26, 2026. It is now read-only.

chore(kora): KR-TEST-STABILITY-XDIST — fix email handler xdist flake#152

Merged
rafe-walker merged 1 commit into
feature/phase2-upgradesfrom
feat/kora-KR-TEST-STABILITY-XDIST
May 23, 2026
Merged

chore(kora): KR-TEST-STABILITY-XDIST — fix email handler xdist flake#152
rafe-walker merged 1 commit into
feature/phase2-upgradesfrom
feat/kora-KR-TEST-STABILITY-XDIST

Conversation

@rafe-walker

Copy link
Copy Markdown
Owner

Summary

Fixes the intermittent xdist flake CC#1 flagged in #149: `test_email_inbound_handler.py` failing 4-6 tests with `HANDLED_RECEIVED vs filtered_paused` mismatches. Test-side fix only — production code untouched (spec §3 explicit non-scope).

Diagnostic write-up

Repro (verified pre-fix)

```bash
for i in $(seq 1 15); do
pytest tests/kora_cli/alerts tests/kora_cli/test_listeners \
tests/kora_cli/handlers tests/kora_cli/audit \
tests/kora_cli/clients -n auto --tb=line -q
done
```

Result: 3-4 out of 15 runs fail with 4-7 `test_email_inbound_handler.py` failures. Same failure shape CC#1 reported: `AssertionError: assert 'filtered_paused' == 'received'`.

Root cause

Two test files install an `OperationalStateHolder` via direct `h_mod._HOLDER = holder` assignment without teardown reset:

File Source bucket Mutation count
`tests/kora_cli/test_listeners/test_mcp_tools_stop_control.py` KR-MCP-STOP-CONTROL ST1 (#142) 13 sites in `_holder_with_state` helper
`tests/kora_cli/test_listeners/test_mcp_audit_on_denial.py` KR-MCP-AUDIT-ON-DENIAL (#150) 1 site in success-path test

The handler's state-gate at `kora_cli/handlers/email_inbound_handler.py:509-526` calls `get_holder()` (line 515) and returns `filtered_paused` if the holder reports `PrimaryState.PAUSED` (line 522). When a dirty holder leaks in, the email handler sees stale state.

Why xdist surfaces it but serial doesn't

xdist workers are separate processes — state can't bleed across worker boundaries. But within a single worker, tests run sequentially in scheduler-determined order. The xdist scheduler doesn't preserve pytest's default alphabetic file ordering; it uses worksteal-with-load-balancing.

In serial mode, pytest collects in alphabetic file order — `handlers/` runs before `test_listeners/` — so the leaking mutating tests never precede the email handler tests. xdist breaks this ordering coincidence and surfaces the latent bug.

Fix

Three autouse `_reset_operational_state_holder` fixtures (one per affected file) that set `h_mod._HOLDER = None` at both setup and teardown:

File Role
`test_mcp_tools_stop_control.py` Plug the leak source
`test_mcp_audit_on_denial.py` Plug the second leak source
`test_email_inbound_handler.py` Defensive belt-and-suspenders — protects against future leak sources

Pattern documented for future test authors

Each new fixture has a docstring explaining the pattern. The TL;DR (also in the commit message):

Pattern Verdict
`monkeypatch.setattr(h_mod, "_HOLDER", holder)` ✅ Auto-restored by pytest
Autouse fixture resetting at setup + teardown ✅ Reliable
Direct `h_mod._HOLDER = holder` with no reset ❌ Bleeds under xdist

Verification

Run Result
30/30 xdist runs of the failing suite (627 tests) All green
5/5 spec-mandated xdist runs All green
Serial run (627 tests, same scope) All green
Full repo xdist (`tests/kora_cli + tests/agent`) 9171 passed; 43 failed — zero in `test_email_inbound_handler.py` (all pre-existing in `test_anthropic_adapter`, `test_gateway_service`, `test_web_server*`, `test_model_switch_custom_providers`, etc.)

Test plan

  • 30x xdist verification runs all green (spec asks for 5+; ran 30x to confirm)
  • 5x explicit spec-mandated runs all green
  • Serial run sanity (no behavior change)
  • Full repo xdist green for `test_email_inbound_handler.py` (zero failures)
  • Pre-existing failures unchanged (43 failed under xdist matches ST3 baseline)
  • No new flakes introduced
  • Defensive pattern documented in three fixture docstrings + this PR body

After this lands, CI is deterministic for the email handler scope. The same fixture pattern is documented for future test authors to apply when they mutate any module-level singleton.

🤖 Generated with Claude Code

CC#1 flagged in PR #149: tests/kora_cli/handlers/test_email_
inbound_handler.py flakes under pytest-xdist (4-6 failures with
HANDLED_RECEIVED vs filtered_paused mismatch). Serial run: clean.

# Root cause

Two test files installed an OperationalStateHolder via direct
``h_mod._HOLDER = holder`` assignment without teardown reset:

  - tests/kora_cli/test_listeners/test_mcp_tools_stop_control.py
    (KR-MCP-STOP-CONTROL ST1, PR #142) — _holder_with_state helper
    mutates _HOLDER 13 times across the suite. No cleanup.
  - tests/kora_cli/test_listeners/test_mcp_audit_on_denial.py
    (KR-MCP-AUDIT-ON-DENIAL, PR #150) — success-path test
    installs an ACTIVE holder. No cleanup.

Why xdist surfaces it: xdist workers are separate processes, so
state doesn't cross workers. But WITHIN a single worker tests run
serially in scheduled order. When the scheduler put one of these
mutating tests on the same worker as test_email_inbound_handler
AND the mutating test ran first, the dirty PAUSED/ACTIVE holder
persisted. The email handler's state-gate (_check_state_gate in
email_inbound_handler.py:509) then saw the stale PAUSED holder
and returned HANDLED_FILTERED_PAUSED instead of HANDLED_RECEIVED.

Why serial doesn't flake: pytest's default test ordering puts
the handlers/ tree BEFORE the test_listeners/ tree alphabetically,
so the mutating tests never run before the email tests in serial
mode. xdist's worksteal scheduler doesn't preserve that ordering.

# Fix

Three autouse ``_reset_operational_state_holder`` fixtures (one
per affected file) that set ``h_mod._HOLDER = None`` at both
setup and teardown:

  - test_mcp_tools_stop_control.py — fixes the bleed source
  - test_mcp_audit_on_denial.py — fixes the second bleed source
  - test_email_inbound_handler.py — DEFENSIVE belt-and-suspenders
    so a future test author who forgets the teardown doesn't
    re-poison the email handler tests

The defensive autouse in test_email_inbound_handler.py is the
documented pattern: any test file whose tests assume a fresh
``OperationalStateHolder`` singleton should opt-in via this
fixture pattern, even if today's leaking-sources are all fixed.

Not changing production code (the singleton design itself stays
— spec §3 explicit non-scope). The fix is test-side only.

# Verification

Repro confirmed pre-fix: 3-4 out of 15 xdist runs failed.

Post-fix verification (xdist):
  - 30/30 xdist runs of alerts + test_listeners + handlers +
    audit + clients (627 tests) — all green
  - 5/5 explicit spec-mandated xdist runs — all green
  - Full repo (tests/kora_cli + tests/agent) xdist: 9171
    passed, 43 failed — failures all pre-existing in
    test_anthropic_adapter.py / test_backup.py /
    test_gateway_service.py / test_web_server*.py /
    test_model_switch_custom_providers.py (zero failures in
    test_email_inbound_handler.py)

Serial regression unchanged: 627/627 still green.

# Pattern for future test authors

When mutating a module-level singleton in tests:

  GOOD: monkeypatch.setattr(h_mod, "_HOLDER", holder) — pytest
        auto-restores at test teardown.

  ALSO GOOD: an autouse fixture that resets the singleton at
        setup AND teardown (the pattern landed here).

  BAD: ``h_mod._HOLDER = holder`` — the assignment persists past
        the test boundary and can bleed across the xdist worker
        into any other test in the same worker scheduled later.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rafe-walker rafe-walker merged commit 234eb3f into feature/phase2-upgrades May 23, 2026
@rafe-walker rafe-walker deleted the feat/kora-KR-TEST-STABILITY-XDIST branch May 23, 2026 21:32
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant