Gateway embedded Kanban dispatcher opens SQLite WAL connections every tick, causing FD/WAL pressure

# Draft upstream issue — Kanban dispatcher persistent connection / WAL FD pressure

Title: Gateway embedded Kanban dispatcher opens SQLite WAL connections every tick, causing FD/WAL pressure

## Summary

The gateway embedded Kanban dispatcher currently opens and closes Kanban SQLite connections on every dispatcher tick. The dispatch path opens one connection per board, and the health telemetry path opens another connection per board on the same tick. In a long-running gateway process with a short dispatch interval, this creates repeated SQLite WAL/SHM connection churn and file descriptor pressure.

We observed this locally in a long-running Mission Control gateway process: `lsof` showed multiple open handles for `kanban.db`, `kanban.db-wal`, and `kanban.db-shm`. A prior local patch (DEC-2026-05-23-024) mitigated the close path by using a `_WalSafeConnection` that runs `PRAGMA wal_checkpoint(TRUNCATE)` before close, but that does not remove the underlying dispatcher churn pattern.

## Affected area

- `gateway/run.py`
- Embedded `_kanban_dispatcher_watcher()`
- Kanban DB dispatch path and dispatcher health probe

## Current behavior

Per tick:

1. `_tick_once_for_board()` opens `_kb.connect(board=slug)`, calls `_kb.dispatch_once(...)`, then closes the connection.
2. `_ready_nonempty()` opens another `_kb.connect(board=slug)` for health telemetry, checks spawnable ready/review tasks, then closes the connection.
3. The watcher uses `asyncio.to_thread(...)`, so work may run on arbitrary default executor threads across ticks.

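This keeps blocking DB work off the event loop, but it rules out persistent connection reuse: by default a Python `sqlite3` connection may only be used on the thread that created it, and the default executor gives no guarantee that consecutive ticks run on the same thread.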
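The thread-affinity constraint can be reproduced in isolation with the standard library. This is a minimal sketch, not gateway code: the helper name `use_from_other_thread` is hypothetical, and an in-memory database stands in for `kanban.db`.

```python
import sqlite3
import threading


def use_from_other_thread(conn: sqlite3.Connection) -> str:
    """Try to query `conn` from a freshly started thread and report the outcome."""
    outcome = []

    def worker() -> None:
        try:
            conn.execute("SELECT 1")
            outcome.append("ok")
        except sqlite3.ProgrammingError as exc:
            # Raised because the connection was created on a different thread.
            outcome.append(f"ProgrammingError: {exc}")

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return outcome[0]


# check_same_thread defaults to True, so cross-thread use is rejected.
conn = sqlite3.connect(":memory:")
result = use_from_other_thread(conn)
conn.close()
```

A connection cached across ticks and handed to whichever worker thread `asyncio.to_thread(...)` picks would hit the same error whenever the thread differs from the one that opened it.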

## Expected behavior

The embedded dispatcher should avoid per-tick SQLite WAL connection churn while keeping DB work off the event loop and preserving sqlite thread affinity.

## Proposed fix

Use a dedicated single-thread `ThreadPoolExecutor` for dispatcher DB work and maintain a per-board persistent SQLite connection cache inside the dispatcher watcher:

- one executor thread named `kanban-dispatcher`,
- one cached connection per active board,
- dispatch and ready/review health probes share the cached board connection,
- a fingerprint change closes and reopens the cached connection,
- corrupt-board handling closes and discards the cached connection, then suppresses retries until the DB fingerprint changes,
- watcher shutdown/cancellation closes all cached connections on the dispatcher executor thread.

This is upstreamable because it is a minimal runtime change and does not add deployment-specific assumptions.
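The shape of the proposed fix can be sketched as follows. This is an illustration under stated assumptions, not the actual patch: `BoardConnectionCache`, `run_ticks`, and the `"v1"` fingerprint are hypothetical names, `sqlite3.connect(":memory:")` stands in for `_kb.connect(board=slug)`, and `SELECT 1` stands in for the dispatch and health-probe work.

```python
import asyncio
import sqlite3
import threading
from concurrent.futures import ThreadPoolExecutor


class BoardConnectionCache:
    """Hypothetical per-board cache; every method must run on the dispatcher thread."""

    def __init__(self, connect):
        self._connect = connect  # stand-in for _kb.connect(board=slug)
        self._conns = {}         # slug -> (fingerprint, connection)

    def get(self, slug, fingerprint):
        cached = self._conns.get(slug)
        if cached is not None and cached[0] == fingerprint:
            return cached[1]
        if cached is not None:
            cached[1].close()    # fingerprint changed: drop the stale connection
        conn = self._connect(slug)
        self._conns[slug] = (fingerprint, conn)
        return conn

    def discard(self, slug):
        cached = self._conns.pop(slug, None)
        if cached is not None:
            cached[1].close()

    def close_all(self):
        for slug in list(self._conns):
            self.discard(slug)


async def run_ticks(cache, executor, slug, ticks):
    loop = asyncio.get_running_loop()
    seen = []

    def tick():
        # Dispatch and health probe would both reuse this one connection.
        conn = cache.get(slug, fingerprint="v1")
        conn.execute("SELECT 1").fetchone()
        return id(conn), threading.current_thread().name

    try:
        for _ in range(ticks):
            seen.append(await loop.run_in_executor(executor, tick))
    finally:
        # Close on the same thread that opened the connections.
        await loop.run_in_executor(executor, cache.close_all)
    return seen


executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="kanban-dispatcher")
cache = BoardConnectionCache(lambda slug: sqlite3.connect(":memory:"))
seen = asyncio.run(run_ticks(cache, executor, "demo", ticks=3))
executor.shutdown(wait=True)
```

Because the executor has exactly one worker, every tick and the final `close_all()` run on the same thread, so the cached connection never crosses a thread boundary and the default `check_same_thread` behavior can stay enabled.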

## Local validation

Focused tests added locally:

- `tests/gateway/test_kanban_dispatcher.py::test_kanban_dispatcher_uses_dedicated_single_thread_executor`
- `tests/gateway/test_kanban_dispatcher.py::test_kanban_dispatcher_reuses_board_connection_across_ticks`
- `tests/gateway/test_kanban_dispatcher.py::test_kanban_dispatcher_health_probe_uses_cached_connection`
- `tests/gateway/test_kanban_dispatcher.py::test_kanban_dispatcher_closes_cached_connection_on_shutdown`
- `tests/gateway/test_kanban_dispatcher.py::test_kanban_dispatcher_reopens_cached_connection_when_fingerprint_changes`
- `tests/gateway/test_kanban_dispatcher.py::test_kanban_dispatcher_corrupt_board_closes_and_suppresses_until_fingerprint_changes`

Command:

```bash
venv/bin/python -m pytest tests/gateway/test_kanban_dispatcher.py tests/hermes_cli/test_kanban_db.py -q
```

Result:

```text
178 passed in 4.59s
```

## Related local evidence

Local deviation DEC-2026-05-23-024 previously addressed the close-path symptom with `_WalSafeConnection.close()` running `PRAGMA wal_checkpoint(TRUNCATE)` before `super().close()`. This issue is the underlying dispatcher lifecycle problem: repeated per-tick open/close cycles. The persistent dispatcher connection refactor reduces dependence on the close-path mitigation but does not replace the need for safe close behavior in the public `kanban_db.connect()` API.
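For reference, the close-path mitigation described above has roughly the following shape. This is a sketch reconstructed from the description, not the local patch itself: the class body, the best-effort error handling, and the temporary database path are assumptions.

```python
import os
import sqlite3
import tempfile


class WalSafeConnection(sqlite3.Connection):
    """Sketch of a connection that checkpoints and truncates the WAL before closing."""

    def close(self) -> None:
        try:
            self.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchall()
        except sqlite3.Error:
            # Assumption: the checkpoint is best-effort and must not block close().
            pass
        super().close()


db_path = os.path.join(tempfile.mkdtemp(), "kanban.db")
conn = sqlite3.connect(db_path, factory=WalSafeConnection)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
conn.commit()
conn.close()  # checkpoint runs first, then the underlying close
```

With persistent dispatcher connections this path runs far less often, but any other caller of `kanban_db.connect()` still opens and closes connections and still benefits from it.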

