fix(kanban): refuse corrupt db auto-init #30862

Merged
teknium1 merged 3 commits into main from hermes/hermes-49e3731c on May 23, 2026
teknium1 (Contributor) commented May 23, 2026

Salvages #30707 (@NickLarcombe) onto current main.

Summary

Adds a full PRAGMA integrity_check guard at kanban DB open time. If the file has a valid SQLite header but is malformed past it (Stefan's reported case: header passes but pages are damaged), the dispatcher now preserves the corrupt file as kanban.db.corrupt.<ts>.bak and raises KanbanDbCorruptError instead of silently recreating the schema on top of it.

Complements the existing header-byte check rather than replacing it. Both layers now run from connect():

  1. _validate_sqlite_header(path) — cheap byte-level check (catches the TLS-overwrite shape from #29507, "Interrupted OpenAI/httpx request thread survives across turns and writes TLS record bytes to unrelated file descriptors on delayed close").
  2. _guard_existing_db_is_healthy(path) — full integrity probe (catches malformed pages, broken internal metadata). Cached per-path in _INITIALIZED_PATHS so it runs once per process per path.

Changes

  • hermes_cli/kanban_db.py: KanbanDbCorruptError, _backup_corrupt_db(), _guard_existing_db_is_healthy(). Called from connect() after the header check.
  • tests/hermes_cli/test_kanban_db.py: 4 new regression tests. _write_corrupt_db() now writes a valid SQLite header followed by malformed pages. This matches the reported corruption shape (Discord thread, May 23 2026) and forces the integrity guard, not the header check, to catch it.
  • scripts/release.py: AUTHOR_MAP entry for @NickLarcombe.

Conflict resolution

  • hermes_cli/kanban_db.py: PR's base wanted to replace _validate_sqlite_header(path) with the new guard. Resolved by keeping both — header check first (cheap), integrity guard second (heavier, cached).
  • tests/hermes_cli/test_kanban_db.py: PR's base lacked the workdir/stale-event tests block that landed on main after the PR was opened. Resolved by keeping both blocks.
  • _write_corrupt_db() rewritten so tests exercise the integrity-guard path (valid header + malformed pages) instead of the header-check path. This is faithful to what the PR actually adds.

Validation

  • tests/hermes_cli/test_kanban_db.py tests/hermes_cli/test_kanban_boards.py tests/hermes_cli/test_kanban_specify_db.py tests/gateway/test_kanban_notifier.py: 235/235 pass.
  • E2E: reproduced Stefan's exact 64-byte header from the Discord thread, padded to 100 bytes, appended malformed pages. Header validator passes (as it does on main); integrity guard catches it, preserves the corrupt bytes to a .bak, raises KanbanDbCorruptError from connect(). Working as intended.
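A comparable reproduction can be sketched without the original bytes from the Discord thread. The helper below is hypothetical and differs from the E2E run described above: instead of padding a captured header, it creates a real database and overwrites everything after the 100-byte header, so the header check passes while PRAGMA integrity_check cannot succeed. The function names write_corrupt_db, header_looks_valid and integrity_ok are illustrative only.

```python
import sqlite3
import tempfile
from pathlib import Path

SQLITE_MAGIC = b"SQLite format 3\x00"
SQLITE_HEADER_SIZE = 100  # the SQLite database header is the first 100 bytes


def write_corrupt_db(path: Path) -> None:
    """Hypothetical fixture: genuine header, damaged pages after it."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany("INSERT INTO tasks (title) VALUES (?)", [("a",), ("b",)])
    conn.commit()
    conn.close()
    raw = path.read_bytes()
    # Keep the header bytes, replace the rest of the file with 0xFF.
    damaged = raw[:SQLITE_HEADER_SIZE] + b"\xff" * (len(raw) - SQLITE_HEADER_SIZE)
    path.write_bytes(damaged)


def header_looks_valid(path: Path) -> bool:
    # Byte-level check only: does the file start with the SQLite magic string?
    with path.open("rb") as fh:
        return fh.read(len(SQLITE_MAGIC)) == SQLITE_MAGIC


def integrity_ok(path: Path) -> bool:
    # Full probe: healthy only if integrity_check returns the single row "ok".
    conn = sqlite3.connect(path)
    try:
        return conn.execute("PRAGMA integrity_check").fetchall() == [("ok",)]
    except sqlite3.DatabaseError:
        return False
    finally:
        conn.close()


if __name__ == "__main__":
    db = Path(tempfile.mkdtemp()) / "kanban.db"
    write_corrupt_db(db)
    print("header valid:", header_looks_valid(db))
    print("integrity ok:", integrity_ok(db))
```

The point of the fixture is that only the second probe tells the two cases apart, which is why a header check alone lets this kind of damage through.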

Closes #30687.

[Infographic: kanban.db corruption defense]

github-actions (bot, Contributor) commented May 23, 2026

🔎 Lint report: hermes/hermes-49e3731c vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 9027 on HEAD, 9027 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 4804 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

Comment thread hermes_cli/kanban_db.py Fixed (9 threads)
alt-glitch added the labels type/bug (Something isn't working), comp/cli (CLI entry point, hermes_cli/, setup wizard), and P3 (Low: cosmetic, nice to have) on May 23, 2026
…indings

Path.resolve() before any I/O and confine backup writes to the resolved
parent directory. Adds explicit parent-equality assertions so static
analyzers see the containment guarantee, and walks WAL/SHM sidecars
through the same resolved-parent path so accidental .. segments are
collapsed before shutil.copy2.

Functionally equivalent to the original PR; preserves the corrupt bytes
to <db>.corrupt.<ts>.bak in the same directory, still raises
KanbanDbCorruptError from connect(). E2E with Stefan's exact hex header
+ malformed pages still passes. 163/163 kanban tests still pass.
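The containment approach described in this commit message might look roughly like the sketch below. It is an assumption-laden illustration rather than the merged _backup_corrupt_db(): the function name backup_corrupt_db, the timestamp format, the collision-counter naming, and the sidecar backup names are all hypothetical. Only the resolve-first step, the parent-equality checks, the "-wal"/"-shm" loop, and the use of shutil.copy2 are taken from the PR.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def backup_corrupt_db(path: Path) -> Optional[Path]:
    """Copy a corrupt DB (and WAL/SHM sidecars) next to the original file."""
    # Resolve once, before any I/O, so ".." segments and symlinks are
    # collapsed and every later path is built from the pinned parent.
    resolved = path.resolve()
    parent = resolved.parent
    base_name = resolved.name

    stamp = time.strftime("%Y%m%d-%H%M%S")  # timestamp format is an assumption
    candidate = parent / f"{base_name}.corrupt.{stamp}.bak"
    counter = 0
    while candidate.exists():
        # Avoid clobbering an earlier backup taken within the same second.
        counter += 1
        candidate = parent / f"{base_name}.corrupt.{stamp}.{counter}.bak"

    # Explicit containment check: never write outside the resolved parent.
    if candidate.parent != parent:
        return None
    try:
        shutil.copy2(resolved, candidate)
    except OSError:
        return None

    for suffix in ("-wal", "-shm"):
        sidecar = parent / (base_name + suffix)
        if sidecar.parent != parent or not sidecar.exists():
            continue
        try:
            shutil.copy2(sidecar, parent / (candidate.name + suffix))
        except OSError:
            pass  # sidecar backup is best-effort in this sketch
    return candidate
```

Because the backup is a copy rather than a move, the corrupt file stays in place for inspection, and the caller remains responsible for refusing to initialize a schema on top of it.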
Comment thread hermes_cli/kanban_db.py
# Resolve once and pin the parent so subsequent path operations cannot
# escape it. ``Path.resolve()`` collapses any ``..`` segments and
# symlinks, and we only ever write inside ``parent``.
resolved = path.resolve()
Comment thread hermes_cli/kanban_db.py
if candidate.parent != parent:
    return None
counter = 0
while candidate.exists():
Comment thread hermes_cli/kanban_db.py
if candidate.parent != parent:
    return None
try:
    shutil.copy2(resolved, candidate)
Comment thread hermes_cli/kanban_db.py
if candidate.parent != parent:
    return None
try:
    shutil.copy2(resolved, candidate)
Comment thread hermes_cli/kanban_db.py
    return None
for suffix in ("-wal", "-shm"):
    sidecar = parent / (base_name + suffix)
    if sidecar.parent != parent or not sidecar.exists():
Comment thread hermes_cli/kanban_db.py Dismissed (5 threads)
teknium1 merged commit c4b8f5e into main on May 23, 2026
25 of 26 checks passed
teknium1 deleted the hermes/hermes-49e3731c branch on May 23, 2026 12:51
Bryce-huang pushed a commit to wbkunlun/hermes-agent that referenced this pull request May 29, 2026
mosaiq-systems pushed a commit to mosaiq-systems/hermes-agent that referenced this pull request May 29, 2026
Development

Successfully merging this pull request may close these issues.

Kanban: corrupted board DB + empty top-level DB → silent recreation with total data loss

4 participants