
[Bug] kanban.db corrupted on WSL2 with multiple gateway processes #32749

@culdise

Environment

  • OS: WSL2 (Windows 11), Ubuntu
  • hermes-agent: v0.14.0 (latest main branch)
  • SQLite: 3.37.2
  • Filesystem: ext4 on WSL2 virtual disk (VHDX)

Problem

kanban.db gets corrupted when multiple gateway processes run simultaneously.

Error messages:

  • sqlite3.DatabaseError: database disk image is malformed
  • sqlite3.OperationalError: disk I/O error
  • ERROR kanban dispatcher: board default database is not a valid SQLite database
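For anyone triaging, the errors above can be checked for directly with a small integrity probe. This is a minimal sketch, not code from hermes-agent; the helper name check_db is hypothetical, and it relies only on Python's standard sqlite3 module and SQLite's PRAGMA integrity_check.

```python
import sqlite3

def check_db(path: str) -> list[str]:
    """Return a list of problems; an empty list means integrity_check said 'ok'.

    Hypothetical helper for illustration, not part of hermes-agent.
    """
    try:
        conn = sqlite3.connect(path, timeout=5)
        try:
            rows = conn.execute("PRAGMA integrity_check").fetchall()
        finally:
            conn.close()
    except sqlite3.DatabaseError as exc:
        # DatabaseError is the parent of OperationalError, so this covers
        # "malformed", "disk I/O error" and "file is not a database".
        return [f"{type(exc).__name__}: {exc}"]
    return [] if rows == [("ok",)] else [row[0] for row in rows]
```

Running check_db on ~/.hermes/kanban.db right after a failure should show whether the file itself is damaged or whether the error was transient.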

Root Cause

  1. Multiple gateway processes (12+) open the same ~/.hermes/kanban.db file simultaneously
  2. Each process holds 2-6 file descriptors to the database file
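  3. WAL mode's -shm (shared memory) file appears to have synchronization issues on WSL2. (9p would apply only if the database path resolves to a Windows mount such as /mnt/c; the environment above is ext4 on a VHDX.)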
  4. Even with flock-based serialization in write_txn(), corruption still occurs
  5. Corruption appears to happen at the filesystem level, not at the SQLite locking-protocol level
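Since the analysis above depends on which filesystem actually backs kanban.db, it may help to confirm that first. The sketch below is an illustration only: fs_type_for is a hypothetical helper that takes the text of /proc/mounts and returns the filesystem type of the longest mount point containing the path.

```python
import os

def fs_type_for(path: str, mounts_text: str) -> str:
    """Return the filesystem type of the longest mount point containing path.

    mounts_text is the content of /proc/mounts (fields: device, mount point,
    type, options, ...). Hypothetical helper for illustration.
    """
    real = os.path.realpath(path)
    best_mount, best_type = "", "unknown"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        # Compare with trailing slashes so /mnt/c does not match /mnt/cache.
        prefix = mount_point.rstrip("/") + "/"
        if (real + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fstype
    return best_type
```

Usage would be something like fs_type_for(os.path.expanduser("~/.hermes/kanban.db"), open("/proc/mounts").read()). A result of ext4 points at the VHDX, while 9p means the path lives on a Windows mount.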

Attempted Fixes

  1. Added flock (LOCK_EX/LOCK_UN) in write_txn(): works in isolated tests, but corruption still occurs in production
  2. Added PRAGMA busy_timeout=5000: reduces but doesn't eliminate corruption
  3. Tried tmpfs for kanban.db: works, but data is lost on WSL shutdown
  4. Current workaround: a watchdog script auto-rebuilds the database on corruption
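For reference, fixes 1 and 2 combined look roughly like the sketch below. It is not the actual write_txn() from hermes-agent: the signature, the sidecar lock-file path, and the use of BEGIN IMMEDIATE are assumptions made for illustration.

```python
import contextlib
import fcntl
import sqlite3

@contextlib.contextmanager
def write_txn(db_path: str, lock_path: str):
    """Serialize writers across processes with an exclusive flock.

    Illustrative sketch; signature and lock_path are assumptions.
    """
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        # isolation_level=None: autocommit mode, BEGIN/COMMIT issued manually.
        conn = sqlite3.connect(db_path, isolation_level=None)
        try:
            conn.execute("PRAGMA busy_timeout=5000")
            conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
            try:
                yield conn
                conn.execute("COMMIT")
            except BaseException:
                if conn.in_transaction:
                    conn.execute("ROLLBACK")
                raise
        finally:
            conn.close()
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Note that flock is advisory, so it only protects writers that go through this path. Any process that opens kanban.db directly still touches the -wal and -shm files without the lock.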

Reproduction

  1. Start 12+ gateway processes (multiple profiles)
  2. Dispatch 5+ concurrent kanban tasks
  3. Within minutes, database corruption occurs
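A standalone approximation of these steps, without hermes-agent, might look like the following. It is only a sketch: the tasks table, the stress function, and the worker script are hypothetical stand-ins for the gateways, and it has not been confirmed to trigger the same corruption.

```python
import sqlite3
import subprocess
import sys

# Code run by each child process. WAL mode is persistent in the database
# file, so the workers do not need to set it again.
WORKER = """
import sqlite3, sys
conn = sqlite3.connect(sys.argv[1], timeout=30)
for i in range(int(sys.argv[2])):
    conn.execute("INSERT INTO tasks(body) VALUES (?)", (f"task-{i}",))
    conn.commit()
conn.close()
"""

def stress(db_path: str, workers: int = 12, rows: int = 200) -> tuple[str, int]:
    """Run concurrent writer processes, then return (integrity result, row count)."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("CREATE TABLE IF NOT EXISTS tasks(id INTEGER PRIMARY KEY, body TEXT)")
    conn.commit()
    conn.close()
    procs = [
        subprocess.Popen([sys.executable, "-c", WORKER, db_path, str(rows)])
        for _ in range(workers)
    ]
    for proc in procs:
        proc.wait()
    conn = sqlite3.connect(db_path)
    try:
        status = conn.execute("PRAGMA integrity_check").fetchone()[0]
        count = conn.execute("SELECT COUNT(*) FROM tasks").fetchone()[0]
    finally:
        conn.close()
    return status, count
```

On a healthy filesystem the status should be ok and the count should equal workers times rows. If this script stays clean on the same WSL2 instance while the gateways corrupt the file, that would suggest the problem is in how the gateways open or rebuild the database rather than in SQLite on WSL2 as such.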

Request

Is there a recommended configuration or fix for running multiple gateway processes with a shared kanban.db on WSL2?

Related: microsoft/WSL#2395 (sqlite write locks aren't respected in WSL)

Labels: P3 (Low: cosmetic, nice to have), comp/cron (Cron scheduler and job management), type/bug (Something isn't working)
