Bug Report: Kanban SQLite database corruption under rapid task creation
Summary
The kanban SQLite database (~/.hermes/kanban.db) becomes corrupted (SQLite reports "database disk image is malformed") when ~9-10 tasks are created in rapid succession via the kanban_create tool API. This has happened 3 times in 2 days during normal orchestrator workflow.
Environment
- Hermes Agent version: v0.14.0 (2026.5.16)
- Python: 3.11.15
- OS: Ubuntu 22.04 (Linux 6.8.0-117-generic)
- SQLite: bundled with Python 3.11
Steps to Reproduce
- Start gateway: hermes gateway start
- Create tasks via the kanban_create tool API in a loop (or in rapid succession)
- After ~9-10 tasks, the next kanban_create call fails with:
{"error": "kanban_create: database disk image is malformed"}
- All subsequent kanban operations fail with the same error
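To narrow down whether the write pattern alone is enough to trigger the problem, a stand-alone stress script may help. The sketch below does not call kanban_create and does not touch ~/.hermes/kanban.db; it writes to a scratch database with a hypothetical tasks table (the real kanban schema is not known here) from several threads in WAL mode and then asks SQLite for an integrity verdict.

```python
import sqlite3
import threading


def stress_inserts(db_path: str, writers: int = 4, per_writer: int = 25) -> str:
    """Insert rows from several threads, then return the first line of
    PRAGMA integrity_check ("ok" when SQLite finds no damage)."""
    setup = sqlite3.connect(db_path)
    setup.execute("PRAGMA journal_mode=WAL")
    # Hypothetical table; the real kanban schema is an unknown here.
    setup.execute(
        "CREATE TABLE IF NOT EXISTS tasks (id INTEGER PRIMARY KEY, title TEXT)"
    )
    setup.commit()
    setup.close()

    errors: list[BaseException] = []

    def worker(n: int) -> None:
        # One connection per thread; timeout makes a writer wait for the lock.
        conn = sqlite3.connect(db_path, timeout=30)
        try:
            for i in range(per_writer):
                with conn:  # commit on success, roll back on error
                    conn.execute(
                        "INSERT INTO tasks (title) VALUES (?)", (f"w{n}-t{i}",)
                    )
        except BaseException as exc:  # surface worker failures to the caller
            errors.append(exc)
        finally:
            conn.close()

    threads = [threading.Thread(target=worker, args=(n,)) for n in range(writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    if errors:
        raise errors[0]

    check = sqlite3.connect(db_path)
    try:
        return check.execute("PRAGMA integrity_check").fetchone()[0]
    finally:
        check.close()
```

If this script stays healthy on the affected machine while the tool API path still corrupts the database, that would point at how the gateway or dispatcher manages its connections rather than at SQLite itself.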
Observed Behavior
- The kanban.db-wal file becomes 0 bytes after corruption
- The database file itself appears intact in size but is unreadable by SQLite
- Previously created tasks are lost
- Gateway continues running but cannot dispatch tasks
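To confirm the "intact in size but unreadable" observation independently of the gateway, the file can be checked directly with SQLite's built-in integrity check. This is a minimal sketch; the function name integrity_problems is made up for illustration and is not part of Hermes.

```python
import sqlite3
from pathlib import Path


def integrity_problems(db_path: str) -> list[str]:
    """Return the problems SQLite reports for db_path; an empty list means healthy."""
    path = Path(db_path).expanduser().resolve()
    if not path.exists():
        return [f"{path} does not exist"]
    try:
        # mode=ro opens the file read-only so the check cannot modify it.
        conn = sqlite3.connect(f"{path.as_uri()}?mode=ro", uri=True)
    except sqlite3.Error as exc:
        return [str(exc)]
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    except sqlite3.DatabaseError as exc:
        # A badly damaged file can fail before the check even starts.
        return [str(exc)]
    finally:
        conn.close()
    problems = [row[0] for row in rows]
    return [] if problems == ["ok"] else problems
```

Calling integrity_problems("~/.hermes/kanban.db") after an incident and attaching the returned messages would show whether the damage is a single bad page or something broader.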
Expected Behavior
- Creating 10+ tasks sequentially should not corrupt the database
- WAL mode should handle concurrent access safely
- If corruption occurs, it should be recoverable without full re-initialization
Recovery Steps (currently required)
hermes gateway stop
cp ~/.hermes/kanban.db ~/.hermes/kanban.db.backup.$(date +%Y%m%d_%H%M%S)
rm -f ~/.hermes/kanban.db-shm ~/.hermes/kanban.db-wal
mv ~/.hermes/kanban.db ~/.hermes/kanban.db.corrupted.$(date +%Y%m%d_%H%M%S)
hermes kanban init
hermes gateway start
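Because the recovery steps above discard all existing tasks, a periodic snapshot taken while the database is still healthy could limit the loss until the root cause is found. The sketch below uses the online backup API from Python's sqlite3 module; the function name snapshot_db and the backup directory layout are assumptions for illustration, not Hermes features.

```python
import sqlite3
from datetime import datetime
from pathlib import Path


def snapshot_db(db_path: str, backup_dir: str) -> Path:
    """Copy a live SQLite database to a timestamped file via the backup API."""
    src_path = Path(db_path).expanduser()
    dest_dir = Path(backup_dir).expanduser()
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    dest_path = dest_dir / f"{src_path.name}.snapshot.{stamp}"

    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # backup() copies a consistent view of the source, including
        # committed content that still lives only in the WAL file.
        src.backup(dest)
    finally:
        dest.close()
        src.close()
    return dest_path
```

A snapshot is only as good as its source, so this is a stopgap: if the source is already malformed, the copy will either fail or carry the damage along.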
Additional Context
- The issue occurs when using the tool API (kanban_create), not CLI commands
- We added 1-second delays between kanban_create calls as a workaround, but this is not a fix
- The dispatcher holds an open DB connection; concurrent writes from tool API calls may race with WAL checkpointing
- Previous corruption incidents: 2025-05-23 (twice), 2025-05-24 (once)
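The checkpoint-race hypothesis above is unconfirmed. One way to gather evidence is to record the journal mode, the auto-checkpoint threshold, and the result of a passive checkpoint shortly before and after a burst of kanban_create calls. The helper below is a hypothetical diagnostic (wal_status is not a Hermes function) and should be pointed at a copy of the database if there is any concern about touching the live file.

```python
import sqlite3


def wal_status(db_path: str) -> dict:
    """Report journal mode, auto-checkpoint threshold, and a PASSIVE checkpoint result.

    db_path must already exist; sqlite3.connect would otherwise create it.
    """
    conn = sqlite3.connect(db_path, timeout=5)
    try:
        journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
        autocheckpoint = conn.execute("PRAGMA wal_autocheckpoint").fetchone()[0]
        # PASSIVE checkpoints as much as it can without blocking other connections.
        # The result row is (busy flag, frames in WAL, frames checkpointed).
        busy, wal_frames, checkpointed = conn.execute(
            "PRAGMA wal_checkpoint(PASSIVE)"
        ).fetchone()
    finally:
        conn.close()
    return {
        "journal_mode": journal_mode,
        "wal_autocheckpoint_pages": autocheckpoint,
        "checkpoint_busy": bool(busy),
        "wal_frames": wal_frames,
        "frames_checkpointed": checkpointed,
    }
```

A journal_mode other than "wal" would already be informative, since the expected behavior in this report assumes WAL mode is active.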
Suggested Investigation
- Check if the kanban DB connection uses proper transaction isolation
- Verify WAL checkpoint behavior under rapid writes
- Consider adding an application-level write queue or mutex for kanban operations
- Add automatic WAL recovery on startup if the -wal or -shm files are stale
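As an illustration of the write queue or mutex suggestion, the sketch below funnels all writes within one process through a single connection guarded by a lock, and starts each write with BEGIN IMMEDIATE so the write lock is taken up front. The class name SerializedWriter is hypothetical; how the actual kanban code opens its connections is not known here, and a lock like this does nothing for writers in other processes, which would still rely on the busy timeout.

```python
import sqlite3
import threading


class SerializedWriter:
    """Serialize writes from many threads through one SQLite connection."""

    def __init__(self, db_path: str, busy_timeout_ms: int = 5000) -> None:
        self._lock = threading.Lock()
        # isolation_level=None: transactions are controlled explicitly below.
        # check_same_thread=False: safe here because the lock serializes access.
        self._conn = sqlite3.connect(
            db_path, isolation_level=None, check_same_thread=False
        )
        self._conn.execute("PRAGMA journal_mode=WAL").fetchone()
        self._conn.execute(f"PRAGMA busy_timeout={int(busy_timeout_ms)}").fetchone()

    def write(self, sql: str, params: tuple = ()) -> int:
        """Run one write statement in its own transaction and return lastrowid."""
        with self._lock:
            # IMMEDIATE takes the write lock now instead of on first write.
            self._conn.execute("BEGIN IMMEDIATE")
            try:
                cur = self._conn.execute(sql, params)
                self._conn.execute("COMMIT")
                return cur.lastrowid
            except Exception:
                self._conn.execute("ROLLBACK")
                raise

    def close(self) -> None:
        with self._lock:
            self._conn.close()
```

If corruption stops once all kanban_create writes go through something like this, that would support the race hypothesis; if it persists, the cause more likely lies elsewhere, for example in file handling around the -wal and -shm files.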
Attachments
- kanban.db and kanban.db-wal from the next corruption incident can be provided if helpful