Draft upstream issue — Kanban dispatcher persistent connection / WAL FD pressure
Title: Gateway embedded Kanban dispatcher opens SQLite WAL connections every tick, causing FD/WAL pressure
Summary
The gateway embedded Kanban dispatcher currently opens and closes Kanban SQLite connections on every dispatcher tick. The dispatch path opens one connection per board, and the health telemetry path opens another connection per board on the same tick. In a long-running gateway process with a short dispatch interval, this creates repeated SQLite WAL/SHM connection churn and file descriptor pressure.
We observed this locally in a long-running Mission Control gateway process: lsof showed multiple open handles for kanban.db, kanban.db-wal, and kanban.db-shm. A prior local patch (DEC-2026-05-23-024) mitigated the close path by using a _WalSafeConnection that runs PRAGMA wal_checkpoint(TRUNCATE) before close, but that does not remove the underlying dispatcher churn pattern.
Affected area
gateway/run.py
- Embedded _kanban_dispatcher_watcher()
- Kanban DB dispatch path and dispatcher health probe
Current behavior
Per tick:
- _tick_once_for_board() opens _kb.connect(board=slug), calls _kb.dispatch_once(...), then closes the connection.
- _ready_nonempty() opens another _kb.connect(board=slug) for health telemetry, checks spawnable ready/review tasks, then closes the connection.
- The watcher uses asyncio.to_thread(...), so work may run on arbitrary default executor threads across ticks.
This keeps blocking DB work off the event loop, but it works against persistent SQLite connection reuse: default sqlite3 connections are thread-affine, so a connection opened on one executor thread cannot safely be reused on another.
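The thread-affinity constraint can be reproduced with the standard library alone. The following is a minimal sketch, not code from gateway/run.py: it uses an in-memory database and a hypothetical helper named use_from_other_thread, and relies only on the documented sqlite3 default of check_same_thread=True.

```python
import sqlite3
import threading


def use_from_other_thread(conn):
    """Use `conn` from a fresh thread and return the exception raised, if any."""
    outcome = {"error": None}

    def worker():
        try:
            conn.execute("SELECT 1")
        except sqlite3.ProgrammingError as exc:
            # Default connections refuse use outside their creating thread.
            outcome["error"] = exc

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return outcome["error"]


# Created on the main thread with the default check_same_thread=True.
conn = sqlite3.connect(":memory:")
error = use_from_other_thread(conn)
conn.close()
```

This is why a cached connection only helps if every tick runs on the same thread, which asyncio.to_thread(...) does not guarantee.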
Expected behavior
The embedded dispatcher should avoid per-tick SQLite WAL connection churn while keeping DB work off the event loop and preserving sqlite thread affinity.
Proposed fix
Use a dedicated single-thread ThreadPoolExecutor for dispatcher DB work and maintain a per-board persistent SQLite connection cache inside the dispatcher watcher:
- one executor thread named kanban-dispatcher,
- one cached connection per active board,
- dispatch and ready/review health probes share the cached board connection,
- fingerprint changes close and reopen the cached connection,
- corrupt-board handling closes/discards cached connection and suppresses retry until DB fingerprint changes,
- watcher shutdown/cancellation closes all cached connections on the dispatcher executor thread.
This is upstreamable because it is a minimal runtime change and does not add deployment-specific assumptions.
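The shape of the proposed fix can be sketched as follows. This is an illustration under stated assumptions, not the actual patch: BoardConnectionCache, tick, and the string fingerprints are hypothetical names, the fingerprint is treated as an opaque value, and an in-memory sqlite3 connection stands in for _kb.connect(board=slug).

```python
import asyncio
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor


class BoardConnectionCache:
    """Hypothetical per-board cache; call every method on the executor thread."""

    def __init__(self, connect):
        self._connect = connect  # stand-in for _kb.connect(board=slug)
        self._entries = {}       # slug -> (fingerprint, connection)
        self._corrupt = {}       # slug -> fingerprint seen when marked corrupt

    def get(self, slug, fingerprint):
        """Return a connection, or None while the board is suppressed."""
        if slug in self._corrupt and self._corrupt[slug] == fingerprint:
            return None  # still the same corrupt DB: do not retry
        self._corrupt.pop(slug, None)
        cached = self._entries.get(slug)
        if cached is not None and cached[0] == fingerprint:
            return cached[1]  # reuse across ticks
        self.discard(slug)    # fingerprint changed: close, then reopen
        conn = self._connect(slug)
        self._entries[slug] = (fingerprint, conn)
        return conn

    def discard(self, slug):
        cached = self._entries.pop(slug, None)
        if cached is not None:
            cached[1].close()

    def mark_corrupt(self, slug, fingerprint):
        self.discard(slug)
        self._corrupt[slug] = fingerprint

    def close_all(self):
        for slug in list(self._entries):
            self.discard(slug)


async def run_demo():
    opened = []  # records every real open so reuse is observable

    def connect(slug):
        opened.append(slug)
        return sqlite3.connect(":memory:")

    executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="kanban-dispatcher")
    cache = BoardConnectionCache(connect)
    loop = asyncio.get_running_loop()

    def tick(slug, fingerprint):
        conn = cache.get(slug, fingerprint)
        if conn is not None:
            conn.execute("SELECT 1").fetchone()  # placeholder for dispatch/health work
        return len(opened), threading.current_thread().name

    results = []
    try:
        for fp in ("fp-1", "fp-1", "fp-2"):
            results.append(await loop.run_in_executor(executor, tick, "default", fp))
        await loop.run_in_executor(executor, cache.mark_corrupt, "default", "fp-2")
        for fp in ("fp-2", "fp-3"):
            results.append(await loop.run_in_executor(executor, tick, "default", fp))
    finally:
        # Close on the same thread that opened the connections.
        await loop.run_in_executor(executor, cache.close_all)
        executor.shutdown(wait=True)
    return results


results = asyncio.run(run_demo())
```

Because the executor has exactly one worker, every open, use, and close happens on the same thread, so the default sqlite3 thread check never fires and no check_same_thread=False escape hatch is needed.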
Local validation
Focused tests added locally:
tests/gateway/test_kanban_dispatcher.py::test_kanban_dispatcher_uses_dedicated_single_thread_executor
tests/gateway/test_kanban_dispatcher.py::test_kanban_dispatcher_reuses_board_connection_across_ticks
tests/gateway/test_kanban_dispatcher.py::test_kanban_dispatcher_health_probe_uses_cached_connection
tests/gateway/test_kanban_dispatcher.py::test_kanban_dispatcher_closes_cached_connection_on_shutdown
tests/gateway/test_kanban_dispatcher.py::test_kanban_dispatcher_reopens_cached_connection_when_fingerprint_changes
tests/gateway/test_kanban_dispatcher.py::test_kanban_dispatcher_corrupt_board_closes_and_suppresses_until_fingerprint_changes
Command:
venv/bin/python -m pytest tests/gateway/test_kanban_dispatcher.py tests/hermes_cli/test_kanban_db.py -q
Result:
Related local evidence
Local deviation DEC-2026-05-23-024 previously addressed the close-path symptom with _WalSafeConnection.close() running PRAGMA wal_checkpoint(TRUNCATE) before super().close(). This issue is the underlying dispatcher lifecycle problem: repeated per-tick open/close cycles. The persistent dispatcher connection refactor reduces dependence on the close-path mitigation but does not replace the need for safe close behavior in the public kanban_db.connect() API.
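For readers without access to the local patch, the close-path mitigation described above might look roughly like this. It is a reconstruction from the description only: WalSafeConnectionSketch and demo are hypothetical names, and swallowing checkpoint errors is an assumption rather than confirmed behavior of _WalSafeConnection.

```python
import os
import sqlite3
import tempfile


class WalSafeConnectionSketch(sqlite3.Connection):
    """Connection that attempts a truncating WAL checkpoint before closing."""

    def close(self):
        try:
            self.execute("PRAGMA wal_checkpoint(TRUNCATE)")
        except sqlite3.Error:
            # Assumption: a failed checkpoint must never block the close itself.
            pass
        super().close()


def demo(path):
    """Write one row in WAL mode and report the WAL size before/after close."""
    conn = sqlite3.connect(path, factory=WalSafeConnectionSketch)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("INSERT INTO tasks (title) VALUES ('example')")
    conn.commit()

    wal_path = path + "-wal"
    wal_before = os.path.getsize(wal_path)
    conn.close()
    # After close the WAL is either truncated to zero bytes or removed.
    wal_after = os.path.getsize(wal_path) if os.path.exists(wal_path) else 0
    return wal_before, wal_after


with tempfile.TemporaryDirectory() as tmp:
    wal_before, wal_after = demo(os.path.join(tmp, "kanban.db"))
```

A wrapper like this only governs what happens at close time. It does nothing about how often connections are opened, which is the part the persistent dispatcher connection addresses.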