fix(kanban): make _migrate_add_optional_columns idempotent on concurrent open (salvage #22627)#22800

Merged
teknium1 merged 1 commit into main from salvage/pr-22627
May 9, 2026

@teknium1 teknium1 commented May 9, 2026

Contributor

Summary

Salvage of #22627: _migrate_add_optional_columns is now idempotent on concurrent open. The kanban dispatcher's first tick after a restart no longer crashes with duplicate column name: consecutive_failures.

Root cause

The dispatcher's _tick_once_for_board opens the DB once via connect() (which runs the migration), then calls init_db() which discards the path from _INITIALIZED_PATHS and re-opens — running the migration a second time on a new connection. _migrate_add_optional_columns snapshots cols at function entry from PRAGMA table_info(tasks). If the second connection's snapshot was taken before the first connection committed its ALTER, the guard if "consecutive_failures" not in cols is stale and the bare conn.execute("ALTER TABLE ...") raises duplicate column name.
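The stale-guard failure can be reproduced outside the dispatcher. The following is a minimal sketch, not the project's code: `table_columns` and `reproduce_stale_guard` are hypothetical names, the one-column `tasks` table is a stand-in for the real schema, and the two connections are interleaved by hand instead of racing.

```python
import os
import sqlite3
import tempfile


def table_columns(conn, table):
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def reproduce_stale_guard(db_path):
    """Return the error message raised by the second ALTER, or None if it succeeded."""
    first = sqlite3.connect(db_path)
    second = sqlite3.connect(db_path)
    try:
        # Stand-in schema (assumption): the real tasks table has many more columns.
        first.execute("CREATE TABLE IF NOT EXISTS tasks (id INTEGER PRIMARY KEY)")
        first.commit()

        # The second connection snapshots the columns before the first one migrates.
        stale_cols = table_columns(second, "tasks")

        # The first connection adds the column and commits.
        first.execute(
            "ALTER TABLE tasks ADD COLUMN consecutive_failures INTEGER DEFAULT 0"
        )
        first.commit()

        # The guard still passes because it consults the stale snapshot.
        if "consecutive_failures" not in stale_cols:
            try:
                second.execute(
                    "ALTER TABLE tasks ADD COLUMN consecutive_failures INTEGER DEFAULT 0"
                )
            except sqlite3.OperationalError as exc:
                return str(exc)
        return None
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(reproduce_stale_guard(os.path.join(tmp, "kanban.db")))
```

Re-reading PRAGMA table_info immediately before each ALTER would narrow the window but not close it, which is presumably why the fix handles the error itself.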

Changes (contributor commit)

  • hermes_cli/kanban_db.py: new _add_column_if_missing(conn, table, column, ddl) helper. It catches sqlite3.OperationalError when the lower-cased message contains "duplicate column name" and returns False; any other OperationalError is re-raised. All 13 ALTER call sites in _migrate_add_optional_columns now route through it. The legacy backfills (spawn_failures → consecutive_failures, last_spawn_error → last_failure_error) are gated on the helper's return value so they don't double-backfill.
  • tests/hermes_cli/test_kanban_db.py: 2 idempotency tests covering the unit helper + the higher-level fully-migrated-schema scenario.
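For readers without the diff open, the helper's described behaviour can be sketched roughly as follows. This is an illustration under assumptions, not the merged code: the meaning of `ddl` (taken here to be the type/constraint fragment after the column name), the `True` return on success, and the `migrate` wrapper are all guesses made for the example.

```python
import sqlite3


def add_column_if_missing(conn, table, column, ddl):
    """Try to add a column; return True if added, False if it already existed."""
    try:
        # Assumption: `ddl` is the type/constraint fragment after the column name.
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl}")
    except sqlite3.OperationalError as exc:
        # Only the duplicate-column case is swallowed; everything else propagates.
        if "duplicate column name" in str(exc).lower():
            return False
        raise
    return True


def migrate(conn):
    """Add consecutive_failures and backfill it only when this call created it."""
    added = add_column_if_missing(
        conn, "tasks", "consecutive_failures", "INTEGER DEFAULT 0"
    )
    if added:
        # Hypothetical backfill from the legacy column, gated on the return value.
        conn.execute("UPDATE tasks SET consecutive_failures = spawn_failures")
    return added
```

Gating the backfill on the return value means a connection that loses the race skips the UPDATE, so values written after the first backfill are not overwritten.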

Validation

  • 2/2 new migrate tests pass on the salvage branch.
  • Existing migrate / duplicate-column tests pass (67 deselected, no regressions).

The deeper smell — _tick_once_for_board calling init_db() after connect() (which already ran the migration) — remains. That's a separate cleanup; this PR is the right defensive layer regardless.

Closes #21708 via salvage.

…ent open

ALTER TABLE calls inside _migrate_add_optional_columns were guarded by a
snapshot of PRAGMA table_info taken at function entry.  When the gateway
dispatcher opens the kanban DB twice per tick (once in _tick_once_for_board
and once via init_db's discard-and-reconnect path), a second connection can
run the same migration before the first one commits, causing:

  sqlite3.OperationalError: duplicate column name: consecutive_failures

This crashed the dispatcher on every first tick after a gateway restart
(subsequent ticks succeeded because the columns were then present).

Fix: introduce _add_column_if_missing() which wraps ALTER TABLE in a
try/except that swallows OperationalError whose message contains
'duplicate column name'.  All ALTER TABLE calls in
_migrate_add_optional_columns are routed through this helper.

Closes #21708

github-actions Bot commented May 9, 2026

Contributor

🔎 Lint report: salvage/pr-22627 vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 7906 on HEAD, 7906 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 4180 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@teknium1 teknium1 merged commit 7869838 into main May 9, 2026
16 of 18 checks passed
@teknium1 teknium1 deleted the salvage/pr-22627 branch May 9, 2026 20:36
@alt-glitch alt-glitch added labels type/bug (Something isn't working), P3 (Low: cosmetic, nice to have), and comp/plugins (Plugin system and bundled plugins) May 9, 2026
Development

Successfully merging this pull request may close these issues.

kanban dispatcher: 'duplicate column name: consecutive_failures' on first tick after gateway restart
