Kanban SQLite database corruption under rapid task creation #31502

## Bug Report: Kanban SQLite database corruption under rapid task creation

### Summary
The kanban SQLite database (`~/.hermes/kanban.db`) becomes corrupted (`database disk image is malformed`) when creating ~9-10 tasks in rapid succession via the `kanban_create` tool API. This has happened **3 times in 2 days** under normal orchestrator workflow.

### Environment
- **Hermes Agent version:** v0.14.0 (2026.5.16)
- **Python:** 3.11.15
- **OS:** Ubuntu 22.04 (Linux 6.8.0-117-generic)
- **SQLite:** bundled with Python 3.11
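"Bundled with Python 3.11" does not pin down the SQLite library version. That version can differ between Python builds and is relevant when chasing WAL bugs. Here is a minimal standard-library sketch for collecting the exact versions to paste into this section. `environment_report` is a hypothetical helper name.

```python
import platform
import sqlite3


def environment_report() -> dict[str, str]:
    """Collect the versions that matter for a SQLite/WAL bug report."""
    return {
        "python": platform.python_version(),
        # Version of the SQLite C library that Python's sqlite3 module is linked against
        "sqlite": sqlite3.sqlite_version,
        "platform": platform.platform(),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```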

### Steps to Reproduce
1. Start gateway: `hermes gateway start`
2. Create tasks via the `kanban_create` tool API in a loop (or in rapid succession)
3. After ~9-10 tasks, the next `kanban_create` call fails with:
   ```
   {"error": "kanban_create: database disk image is malformed"}
   ```
4. All subsequent kanban operations fail with the same error
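Driving `kanban_create` outside Hermes would require knowing its internals. Instead, the access pattern these steps describe can be approximated in plain SQLite: one long-lived dispatcher connection plus a burst of short-lived writers. This is a hypothetical harness. The `tasks` table and the one-connection-per-call model are assumptions, not Hermes's schema or code. If it keeps returning `ok` well past 10 inserts, that would suggest the problem lies less in SQLite's WAL handling and more in how Hermes opens, shares, or replaces the database file.

```python
import os
import sqlite3
import tempfile


def stress(db_path: str, n_tasks: int = 50) -> str:
    """Hypothetical stand-in for the reproduction (NOT the Hermes schema or API).

    A long-lived 'dispatcher' connection stays open while each simulated
    kanban_create opens its own short-lived connection and does one INSERT.
    Returns the first row of PRAGMA integrity_check ('ok' when healthy).
    """
    dispatcher = sqlite3.connect(db_path)
    dispatcher.execute("PRAGMA journal_mode=WAL")
    dispatcher.execute(
        "CREATE TABLE IF NOT EXISTS tasks (id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
    )
    dispatcher.commit()
    for i in range(n_tasks):
        # The with-block commits the INSERT; close() then releases the connection.
        with sqlite3.connect(db_path, timeout=5.0) as writer:
            writer.execute("INSERT INTO tasks (title) VALUES (?)", (f"task {i}",))
        writer.close()
        # The dispatcher polls between writes, as a scheduler loop might.
        dispatcher.execute("SELECT count(*) FROM tasks").fetchone()
    result = dispatcher.execute("PRAGMA integrity_check").fetchone()[0]
    dispatcher.close()
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(stress(os.path.join(tmp, "kanban_stress.db")))
```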

### Observed Behavior
- `kanban.db-wal` file becomes **0 bytes** after corruption
- The database file itself appears intact in size but is unreadable by SQLite
- Previously created tasks are lost
- Gateway continues running but cannot dispatch tasks

### Expected Behavior
- Creating 10+ tasks sequentially should not corrupt the database
- WAL mode should handle concurrent access safely
- If corruption occurs, it should be recoverable without full re-initialization

### Recovery Steps (currently required)
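```bash
hermes gateway stop
ts=$(date +%Y%m%d_%H%M%S)   # one timestamp so all backup files match
# Back up the -wal/-shm sidecars too: deleting the WAL discards any commits not yet checkpointed
for f in ~/.hermes/kanban.db ~/.hermes/kanban.db-wal ~/.hermes/kanban.db-shm; do
  [ -e "$f" ] && cp "$f" "$f.backup.$ts"
done
rm -f ~/.hermes/kanban.db-shm ~/.hermes/kanban.db-wal
mv ~/.hermes/kanban.db ~/.hermes/kanban.db.corrupted.$ts
hermes kanban init
hermes gateway start
```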

### Additional Context
- The issue occurs when using the **tool API** (`kanban_create`), not CLI commands
- We added 1-second delays between `kanban_create` calls as a workaround, but this is not a fix
- The dispatcher holds an open DB connection; concurrent writes from tool API calls may race with WAL checkpointing
- Previous corruption incidents: 2025-05-23 (twice), 2025-05-24 (once)

### Suggested Investigation
1. Check if the kanban DB connection uses proper transaction isolation
2. Verify WAL checkpoint behavior under rapid writes
3. Consider adding an application-level write queue or mutex for kanban operations
4. Add automatic WAL recovery on startup if `-wal` or `-shm` files are stale

### Attachments
- [ ] Will attach `kanban.db` and `kanban.db-wal` from next corruption incident if helpful
