
Kanban: corrupted board DB + empty top-level DB → silent recreation with total data loss #30687

@msalles1

Description

When a board-specific kanban.db is corrupted (e.g., by a dispatcher crash mid-write) AND the top-level kanban DB (~/.hermes/kanban/<board>.db) is empty or missing, running any command that touches the board (e.g., hermes kanban boards switch <board>, hermes kanban list) silently recreates both databases from scratch, with zero rows in every table. The result is total data loss.
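One plausible mechanism, not confirmed against the Hermes source: Python's standard `sqlite3.connect()` creates the database file if it does not exist, and SQLite treats a zero-length file as a valid, empty database, so any later `CREATE TABLE IF NOT EXISTS`-style initialization will populate it without complaint. Opening through a `file:` URI with `mode=rw` turns a missing file into an error instead. The sketch below shows the difference; `demo_implicit_creation` is a hypothetical helper for illustration only.

```python
import sqlite3
import tempfile
from pathlib import Path


def demo_implicit_creation(directory: Path) -> tuple[bool, bool]:
    """Hypothetical helper: returns (default_connect_created_file, rw_connect_created_file)."""
    default_path = directory / "default.db"
    rw_path = directory / "rw.db"

    # Default open flags include "create": a missing path becomes a new, empty database file.
    sqlite3.connect(default_path).close()

    # mode=rw drops the "create" flag: opening a missing file raises instead of creating it.
    try:
        sqlite3.connect(rw_path.resolve().as_uri() + "?mode=rw", uri=True).close()
    except sqlite3.OperationalError:
        pass

    return default_path.exists(), rw_path.exists()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo_implicit_creation(Path(tmp)))  # (True, False)
```

A bare `connect()` on a missing path leaves a 0-byte file, the same state as the top-level DB described under Root Cause, though whether that is how it ended up empty in this incident is unverified.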

Timeline from a real incident (2026-05-22)

Time (BRT) Event
14:07 Dispatcher crashes with an exception in dispatch_once (kanban_db.py:4776); a DB write is interrupted mid-operation
15:43 Gateway detects corrupted DB: "not a valid SQLite database; disabling dispatch"
16:00 Cron job runs hermes kanban boards switch casalmaisfertil && hermes kanban list
16:02 New kanban.db created (106KB, full schema, 0 rows in all tables)
16:03 Gateway logs "database changed; retrying dispatch" and resumes dispatch on the empty board

Evidence

15:43 — DB detected as invalid:

ERROR gateway.run: kanban dispatcher: board casalmaisfertil database
/root/.hermes/kanban/boards/casalmaisfertil/kanban.db is not a valid SQLite database;
disabling dispatch for this board until the file changes or the gateway restarts.

16:02 — New DB created without notify_subs table (race condition):

WARNING gateway.run: kanban notifier tick failed: no such table: kanban_notify_subs

16:03 — Dispatch resumes on empty board:

INFO gateway.run: kanban dispatcher: board casalmaisfertil database changed; retrying dispatch

Aftermath: 7 tasks across 2 content waves were lost, together with their dependencies, comments, and run history. They were reconstructed manually from memory and workspace artifacts.

Root Cause

The top-level kanban DB was 0 bytes (either never properly initialized or also corrupted). When a command encounters both of the following:

  1. A top-level DB that is empty/invalid
  2. A board DB that is corrupted/not valid SQLite

...the system silently initializes both from scratch, without:

  • Attempting WAL/journal recovery
  • Creating a backup of the corrupted file
  • Emitting a warning
  • Asking for confirmation
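The two conditions above can be detected cheaply before any initialization code runs. This is a hypothetical sketch: the `DBState` enum and both function names are illustrative, not Hermes APIs. It classifies a file by existence, size, and the 16-byte header string every SQLite 3 database starts with. A header check only catches the "not a valid SQLite database" case seen in the gateway log; page-level corruption behind an intact header still needs `PRAGMA integrity_check`.

```python
import sqlite3
import tempfile
from enum import Enum
from pathlib import Path

# Every SQLite 3 database file starts with these 16 bytes.
SQLITE_MAGIC = b"SQLite format 3\x00"


class DBState(Enum):
    MISSING = "missing"
    EMPTY = "empty"              # 0 bytes: SQLite would silently accept it as a new database
    NOT_SQLITE = "not_sqlite"    # header mismatch: "file is not a database"
    LOOKS_VALID = "looks_valid"  # header OK; pages may still be corrupt


def classify_db_file(path: Path) -> DBState:
    if not path.exists():
        return DBState.MISSING
    if path.stat().st_size == 0:
        return DBState.EMPTY
    with path.open("rb") as fh:
        header = fh.read(len(SQLITE_MAGIC))
    return DBState.LOOKS_VALID if header == SQLITE_MAGIC else DBState.NOT_SQLITE


def is_silent_wipe_scenario(top_level_db: Path, board_db: Path) -> bool:
    """True for the combination described in this issue: both conditions at once."""
    top_level_unusable = classify_db_file(top_level_db) is not DBState.LOOKS_VALID
    board_corrupted = classify_db_file(board_db) is DBState.NOT_SQLITE
    return top_level_unusable and board_corrupted


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        (d / "top.db").write_bytes(b"")                     # 0-byte top-level DB
        (d / "board.db").write_bytes(b"\x00garbage" * 512)  # board DB that is not SQLite
        conn = sqlite3.connect(d / "ok.db")
        conn.execute("CREATE TABLE t (x)")
        conn.commit()
        conn.close()
        print(classify_db_file(d / "ok.db"))                          # DBState.LOOKS_VALID
        print(is_silent_wipe_scenario(d / "top.db", d / "board.db"))  # True
```

A guard like `is_silent_wipe_scenario` returning True is the point where a command should stop and report, rather than fall through to schema creation.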

Expected Behavior

  1. Never silently recreate a board DB. Halt and require an explicit hermes kanban init.
  2. Attempt recovery first: run PRAGMA integrity_check and PRAGMA wal_checkpoint(TRUNCATE).
  3. Back up before wiping: copy the corrupted DB to a .bak file before recreating it.
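The three expected behaviors can be combined into a single guarded open path. This is a minimal sketch under assumptions, not Hermes's actual code: `open_board_db`, `snapshot_db_files`, `BoardDBMissing`, and `BoardDBCorrupted` are hypothetical names, and the timestamped `.bak` naming is an illustrative choice so that one incident's backup is never overwritten by the next. It snapshots the DB and any `-wal`/`-shm` sidecar files before SQLite opens them (opening or closing a connection can roll back a hot journal or checkpoint and delete the WAL), opens without the create flag, runs `PRAGMA integrity_check`, and checkpoints the WAL only once the file is known to be healthy.

```python
import shutil
import sqlite3
import tempfile
from datetime import datetime
from pathlib import Path


class BoardDBMissing(RuntimeError):
    """Hypothetical: raised instead of silently creating a new board DB."""


class BoardDBCorrupted(RuntimeError):
    """Hypothetical: raised after the damaged files have been preserved."""


def snapshot_db_files(db_path: Path) -> list[Path]:
    """Copy the DB and any -wal/-shm sidecars to timestamped .bak files."""
    stamp = datetime.now().strftime("%Y%m%dT%H%M%S%f")
    copies = []
    for suffix in ("", "-wal", "-shm"):
        src = Path(f"{db_path}{suffix}")
        if src.exists():
            dst = src.with_name(f"{src.name}.{stamp}.bak")
            shutil.copy2(src, dst)
            copies.append(dst)
    return copies


def open_board_db(db_path: Path) -> sqlite3.Connection:
    # Never create implicitly. A zero-byte file counts as missing, because
    # SQLite would otherwise accept it as a brand-new empty database.
    if not db_path.exists() or db_path.stat().st_size == 0:
        raise BoardDBMissing(f"{db_path} is missing or empty; run an explicit init")

    # Back up first, before SQLite gets a chance to modify anything.
    backups = snapshot_db_files(db_path)

    # Open read-write WITHOUT the create flag, then check integrity.
    conn = None
    try:
        conn = sqlite3.connect(db_path.resolve().as_uri() + "?mode=rw", uri=True)
        rows = conn.execute("PRAGMA integrity_check").fetchall()
        problems = [row[0] for row in rows if row[0] != "ok"]
    except sqlite3.DatabaseError as exc:  # e.g. "file is not a database"
        problems = [str(exc)]

    if problems:
        if conn is not None:
            conn.close()
        kept = ", ".join(p.name for p in backups)
        raise BoardDBCorrupted(f"{db_path} failed integrity_check ({problems[0]}); kept {kept}")

    # Healthy: fold committed WAL frames into the main file, then drop the snapshot.
    conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
    for backup in backups:
        backup.unlink()
    return conn


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        board = Path(tmp) / "kanban.db"
        board.write_bytes(b"not a sqlite file" * 64)
        try:
            open_board_db(board)
        except BoardDBCorrupted as err:
            print(err)
```

With this shape, a corrupted board DB produces an error and a preserved copy, and the only path to a fresh empty database is an explicit init command.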

Related Issues

Environment

  • Hermes: v0.14.0 (2026.5.16), Python 3.11.15, Linux
  • Gateway embedded dispatcher: kanban.dispatch_in_gateway: true
  • Board: casalmaisfertil (non-default, ~7 tasks)

Labels: P1 (High: major feature broken, no workaround), comp/plugins (Plugin system and bundled plugins), type/bug (Something isn't working)
