Skip to content

fix(kanban): make _migrate_add_optional_columns idempotent on concurrent open#22627

Closed
wesleysimplicio wants to merge 1 commit into
NousResearch:mainfrom
wesleysimplicio:fix/kanban-migration-idempotent
Closed

fix(kanban): make _migrate_add_optional_columns idempotent on concurrent open#22627
wesleysimplicio wants to merge 1 commit into
NousResearch:mainfrom
wesleysimplicio:fix/kanban-migration-idempotent

Conversation

@wesleysimplicio

Copy link
Copy Markdown
Contributor

Summary

The kanban dispatcher opens the DB twice per tick: once in
_tick_once_for_board and once via init_db's discard-and-reconnect path.
Both connections call _migrate_add_optional_columns, which snapshots
PRAGMA table_info(tasks) at entry. When they race, the second connection
sees the columns as absent (its snapshot was taken before the first commit)
and issues a duplicate ALTER TABLE, crashing with:

sqlite3.OperationalError: duplicate column name: consecutive_failures

This fires once per gateway restart, writing one ERROR to errors.log; the
dispatcher recovers silently on the next tick (60 s later).

Root Cause

_migrate_add_optional_columns used raw conn.execute("ALTER TABLE ...").
No error handling for the duplicate-column case existed.

Fix

Introduce _add_column_if_missing(conn, table, column, ddl) — a thin
wrapper that catches OperationalError whose message contains
"duplicate column name" and returns False instead of raising. All
ALTER TABLE calls in _migrate_add_optional_columns are routed through
this helper.

Tests

Two new tests in tests/hermes_cli/test_kanban_db.py:

  • test_add_column_if_missing_is_idempotent_on_race — verifies the helper
    returns True on first add, False on duplicate without raising.
  • test_migrate_add_optional_columns_tolerates_concurrent_migration
    runs the full migration against an already-migrated in-memory schema and
    asserts no exception is raised.

Closes #21708

…ent open

ALTER TABLE calls inside _migrate_add_optional_columns were guarded by a
snapshot of PRAGMA table_info taken at function entry.  When the gateway
dispatcher opens the kanban DB twice per tick (once in _tick_once_for_board
and once via init_db's discard-and-reconnect path), a second connection can
run the same migration before the first one commits, causing:

  sqlite3.OperationalError: duplicate column name: consecutive_failures

This crashed the dispatcher on every first tick after a gateway restart
(subsequent ticks succeeded because the columns were then present).

Fix: introduce _add_column_if_missing() which wraps ALTER TABLE in a
try/except that swallows OperationalError whose message contains
'duplicate column name'.  All ALTER TABLE calls in
_migrate_add_optional_columns are routed through this helper.

Closes NousResearch#21708
Copilot AI review requested due to automatic review settings May 9, 2026 15:13

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a startup-time race in the kanban DB migration where two concurrent connections can both attempt the same ALTER TABLE ... ADD COLUMN, causing sqlite3.OperationalError: duplicate column name: ... and crashing the first dispatcher tick after gateway restart.

Changes:

  • Add _add_column_if_missing() helper that makes ADD COLUMN idempotent by swallowing the specific duplicate-column OperationalError.
  • Route all ALTER TABLE ... ADD COLUMN statements in _migrate_add_optional_columns() through the helper (with backfill updates gated on whether this call actually added the column).
  • Add regression tests covering idempotency and “already migrated schema” behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
hermes_cli/kanban_db.py Introduces idempotent column-add helper and uses it throughout optional-column migrations to prevent duplicate-column crashes under concurrent opens.
tests/hermes_cli/test_kanban_db.py Adds tests validating the helper’s duplicate-column handling and that the full migration doesn’t raise when columns already exist.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread hermes_cli/kanban_db.py
Comment on lines +966 to +968
def _add_column_if_missing(
conn: sqlite3.Connection, table: str, column: str, ddl: str
) -> bool:
@teknium1

teknium1 commented May 9, 2026

Copy link
Copy Markdown
Contributor

Merged via salvage PR #22800. salvage cherry-picked your commit; resolved a conflict with the prior #22669 unlink_tasks regression test. Thanks for the contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/plugins Plugin system and bundled plugins P3 Low — cosmetic, nice to have type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

kanban dispatcher: 'duplicate column name: consecutive_failures' on first tick after gateway restart

4 participants