[Bug]: kanban.db corruption when multiple profile gateways share the same board DB #32424

## Description

When multiple Hermes profile gateways (e.g., `--profile bingge`, `--profile pixiel`, `--profile mafei`) run concurrently and share the same kanban board DB (`~/.hermes/kanban.db`), the SQLite database becomes corrupted. The corruption specifically affects the `kanban_notify_subs` table indexes.

This is **not** a false positive from `PRAGMA integrity_check`: the indexes genuinely become inconsistent with the table data.

## Environment

- macOS (APFS filesystem)
- SQLite 3.51.0
- Hermes Agent (latest main branch)
- 4 gateway processes: default + 3 named profiles, all sharing `kanban.db` by design

## Root Cause Analysis

### Architecture
- `kanban_home()` intentionally resolves to `~/.hermes/` for all profiles (per its docstring: "The kanban board is shared across profiles")
- Each profile gateway runs its own dispatcher, which opens independent SQLite connections
- CLI commands (`hermes kanban create/complete/block/link`) also open new connections

### The Race
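Multiple processes concurrently issue `BEGIN IMMEDIATE` write transactions against the same DB. SQLite WAL mode supports concurrent readers alongside one writer at a time per database (not per connection), so the write transactions themselves should serialize. The suspected trigger is **concurrent WAL checkpoints from separate processes**, which appear to corrupt the main DB file. This is a hypothesis drawn from the evidence below, not a confirmed mechanism.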

### Evidence
1. **4 gateway processes** had open file descriptors on `kanban.db` at time of corruption
2. Last events before corruption show rapid concurrent activity from different profile dispatchers:
   - bingge gateway: completed Sprint 3 PRD (13:04:34)
   - pixiel gateway: spawned Sprint 3 design task (13:04:36)
   - mafei gateway: protocol_violation → gave_up → re-spawned Sprint 2 (13:05:04)
3. Corruption was in `kanban_notify_subs` indexes (`idx_notify_task` + `sqlite_autoindex_kanban_notify_subs_1`)
4. `PRAGMA integrity_check` returned:
   ```
   Tree 10 page 10: btreeInitPage() returns error code 11
   wrong # of entries in index idx_notify_task
   wrong # of entries in index sqlite_autoindex_kanban_notify_subs_1
   ```

## Steps to Reproduce

1. Start 3+ profile gateways: `hermes --profile X gateway run --replace`
2. Run kanban operations that trigger concurrent writes (task create + claim + notify-subscribe)
3. Observe corruption after ~30-60 minutes of active use
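
A faster, gateway-free approximation of step 2 might look like the sketch below. It spawns several independent Python processes that each open their own connection and run `BEGIN IMMEDIATE` inserts against one WAL-mode file, then runs `PRAGMA integrity_check`. The table columns, the `run_stress` helper, and the worker script are assumptions for illustration; only the table and index names come from the integrity-check output above. On a healthy setup this is expected to report `ok`, so a failure here would help separate a SQLite/filesystem problem from a Hermes-specific one.

```python
import os
import sqlite3
import subprocess
import sys

# Hypothetical stand-in for one gateway dispatcher: its own process,
# its own connection, one BEGIN IMMEDIATE transaction per insert.
WORKER_SRC = """
import sqlite3, sys
db_path, worker_id, n_rows = sys.argv[1], sys.argv[2], int(sys.argv[3])
conn = sqlite3.connect(db_path, timeout=30, isolation_level=None)
for i in range(n_rows):
    conn.execute("BEGIN IMMEDIATE")
    conn.execute(
        "INSERT INTO kanban_notify_subs (task_id, subscriber) VALUES (?, ?)",
        (f"t-{worker_id}-{i}", worker_id),
    )
    conn.execute("COMMIT")
conn.close()
"""


def run_stress(db_path: str, n_workers: int = 4, n_rows: int = 500):
    """Run concurrent writer processes, then return (integrity rows, row count)."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    # Assumed schema: the UNIQUE constraint yields an autoindex, plus idx_notify_task.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS kanban_notify_subs ("
        "task_id TEXT, subscriber TEXT, UNIQUE(task_id, subscriber))"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_notify_task ON kanban_notify_subs(task_id)"
    )
    conn.commit()
    conn.close()

    procs = [
        subprocess.Popen([sys.executable, "-c", WORKER_SRC, db_path, str(w), str(n_rows)])
        for w in range(n_workers)
    ]
    for proc in procs:
        if proc.wait() != 0:
            raise RuntimeError(f"worker exited with code {proc.returncode}")

    conn = sqlite3.connect(db_path)
    problems = [row[0] for row in conn.execute("PRAGMA integrity_check")]
    count = conn.execute("SELECT COUNT(*) FROM kanban_notify_subs").fetchone()[0]
    conn.close()
    return problems, count
```
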

## Suggested Fixes

### Short-term
Add a file-level advisory lock (`fcntl.flock`) around all kanban write operations in `kanban_db.py`. The existing `BEGIN IMMEDIATE` handles SQLite-level serialization of writers but, per the analysis above, does not appear to protect against concurrent WAL checkpoints from separate processes.
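
A minimal sketch of what this could look like, assuming a sidecar lock file next to the DB. The names `kanban_write_lock` and `locked_write` and the `.lock` suffix are hypothetical and not existing `kanban_db.py` API. The lock is taken on a separate file rather than on `kanban.db` itself, because opening and closing extra descriptors on the database file can release the POSIX locks SQLite holds on it.

```python
import contextlib
import fcntl
import os
import sqlite3


@contextlib.contextmanager
def kanban_write_lock(db_path: str):
    """Hold an exclusive advisory lock on a sidecar file (hypothetical helper)."""
    fd = os.open(db_path + ".lock", os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until no other process holds it
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def locked_write(db_path: str, sql: str, params=()) -> None:
    """Run one write transaction entirely inside the advisory lock."""
    with kanban_write_lock(db_path):
        conn = sqlite3.connect(db_path, timeout=30, isolation_level=None)
        try:
            conn.execute("BEGIN IMMEDIATE")
            conn.execute(sql, params)
            conn.execute("COMMIT")
        except BaseException:
            if conn.in_transaction:
                conn.execute("ROLLBACK")
            raise
        finally:
            # Closing inside the lock keeps any checkpoint triggered by
            # COMMIT or by the final close serialized as well.
            conn.close()
```

Opening and closing the connection inside the lock is deliberate: automatic checkpoints run at commit or at last-connection close, so both fall under the lock. Long-lived connections held by the dispatchers would need their checkpoints covered separately.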

### Medium-term
Serialize kanban writes through a single writer process/thread. Each gateway could send write requests to a central kanban writer instead of opening independent connections.
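
As a rough in-process illustration of the single-writer idea, the sketch below funnels all writes through one thread that owns the only write connection. `KanbanWriter` and its methods are hypothetical names. A real multi-gateway version would also need an IPC layer (for example a local socket) so that other processes can submit requests, which is left out here.

```python
import queue
import sqlite3
import threading
from concurrent.futures import Future


class KanbanWriter:
    """Hypothetical single writer: one thread, one connection, queued requests."""

    def __init__(self, db_path: str):
        self._db_path = db_path
        self._queue: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, sql: str, params=()) -> Future:
        """Enqueue a write; the Future resolves to the statement's rowcount."""
        fut: Future = Future()
        self._queue.put((sql, params, fut))
        return fut

    def close(self) -> None:
        self._queue.put(None)  # sentinel: drain earlier requests, then stop
        self._thread.join()

    def _run(self) -> None:
        # The connection is created and used only on this thread.
        conn = sqlite3.connect(self._db_path, isolation_level=None)
        while True:
            item = self._queue.get()
            if item is None:
                break
            sql, params, fut = item
            try:
                conn.execute("BEGIN IMMEDIATE")
                cur = conn.execute(sql, params)
                conn.execute("COMMIT")
                fut.set_result(cur.rowcount)
            except Exception as exc:
                if conn.in_transaction:
                    conn.execute("ROLLBACK")
                fut.set_exception(exc)
        conn.close()
```

Callers would do something like `writer.submit("UPDATE ...", params).result()` and never open a write connection of their own; readers can keep using independent connections.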

### Long-term
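Consider PostgreSQL as an optional backend for multi-profile setups. SQLite permits only one writer at a time, and WAL mode requires every process accessing the database to run on the same host, so it may not be the best fit for several long-running writer processes.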

## Workaround

Current workaround:
1. Monitoring for corruption via `PRAGMA integrity_check`
2. Recovering from `.recover.*.sql` dumps when corruption is detected
3. Restarting all gateways after recovery

This is fragile: the recovery SQL can be stale, losing recent task state.
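
The monitoring step of the workaround could be automated roughly as follows. `integrity_problems` is a hypothetical helper name; it returns an empty list when `PRAGMA integrity_check` reports `ok`, and otherwise the reported messages (or the error text if the file cannot be read as a database at all).

```python
import sqlite3


def integrity_problems(db_path: str) -> list[str]:
    """Return integrity_check findings; an empty list means the DB looks healthy."""
    conn = sqlite3.connect(db_path, timeout=30)
    try:
        rows = [row[0] for row in conn.execute("PRAGMA integrity_check")]
    except sqlite3.DatabaseError as exc:
        # Severe damage can make the check itself fail.
        return [str(exc)]
    finally:
        conn.close()
    return [] if rows == ["ok"] else rows
```

Running this on a timer, or before each dispatcher tick, would at least shorten the window between corruption and detection. It does not address the staleness of the recovery dump.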
