Summary
The kanban dispatcher fails with sqlite3.OperationalError: duplicate column name: consecutive_failures on the first tick after every gateway restart. This happens on a kanban DB that was already migrated by a prior 0.12.x → 0.13 release. Subsequent ticks succeed, so the visible effect is one error of noise in errors.log per restart.
Version
Hermes Agent v0.13.0 (2026.5.7)
Python: 3.11.15 (macOS 15, Apple Silicon)
OpenAI SDK: 2.32.0
Local main is at origin/main + 3 unrelated local patches (none touch kanban). The DB was created and last-migrated under 0.12.x.
Symptom
~/.hermes/logs/errors.log after gateway restart:
```
2026-05-08 14:21:53,349 ERROR gateway.run: kanban dispatcher: tick failed on board default
Traceback (most recent call last):
  File "/Users/leon/.hermes/hermes-agent/gateway/run.py", line 3931, in _tick_once_for_board
    conn = _kb.connect(board=slug)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/leon/.hermes/hermes-agent/hermes_cli/kanban_db.py", line 928, in connect
    _migrate_add_optional_columns(conn)
  File "/Users/leon/.hermes/hermes-agent/hermes_cli/kanban_db.py", line 996, in _migrate_add_optional_columns
    conn.execute(
sqlite3.OperationalError: duplicate column name: consecutive_failures
```
Only one ERROR is logged per gateway restart; subsequent dispatcher ticks (every 60s after that) succeed silently. Gateway core, Telegram, Weixin, and cron are all healthy.
Database state
~/.hermes/kanban.db has 0 tasks. The schema already includes consecutive_failures, last_failure_error, max_retries (added during a prior 0.12.x migration) plus the legacy spawn_failures, last_spawn_error columns:
```
$ sqlite3 ~/.hermes/kanban.db "PRAGMA table_info(tasks);" | tail -10
17|spawn_failures|INTEGER|1|0|0
18|worker_pid|INTEGER|0||0
19|last_spawn_error|TEXT|0||0
...
25|skills|TEXT|0||0
26|consecutive_failures|INTEGER|1|0|0
27|last_failure_error|TEXT|0||0
28|max_retries|INTEGER|0||0
```
Note that every column the migration wants to add is already present (cids 26-28).
Reproduction (does NOT reproduce in isolation)
A direct reproduction from a fresh Python process succeeds: the migration's column-existence guard (if "consecutive_failures" not in cols) correctly skips the ALTER TABLE:
```python
import os
import sys

# Make the local checkout importable and run from its root.
sys.path.insert(0, '/Users/leon/.hermes/hermes-agent')
os.chdir('/Users/leon/.hermes/hermes-agent')

from hermes_cli.kanban_db import connect

c = connect(board='default')  # succeeds, no error
c.close()
```
But when invoked from the gateway dispatcher's _tick_once_for_board (worker thread via asyncio.to_thread), the same call fails. There appears to be a context-dependent difference in what PRAGMA table_info(tasks) returns at the moment _migrate_add_optional_columns queries it.
Speculation on cause
Two possibilities I can think of:
- Concurrent connections during gateway startup: the dispatcher tick races with another path that also opens the kanban DB (e.g., gateway notifier, board init), and one connection sees mid-migration state.
- Connection-local schema cache: under WAL mode with synchronous=NORMAL, a connection opened concurrently at startup might act on a view of the schema that does not yet include changes another connection has just committed.
The dispatcher path at gateway/run.py:3931 does:
```python
conn = _kb.connect(board=slug)  # ← line 3931, the failing call
try:
    _kb.init_db(board=slug)  # opens another conn that re-runs init
except Exception:
    pass
```
init_db() discards the path from _INITIALIZED_PATHS and re-opens, forcing the migration to re-run on a second connection. So per dispatcher tick, the migration is invoked twice on two different connections.
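The "stale snapshot" part of the first hypothesis can be demonstrated with the standard library alone. This is a minimal sketch, not Hermes code: it assumes each migration step snapshots PRAGMA table_info(tasks) once and then runs ALTER TABLE ... ADD COLUMN behind the "if column not in cols" guard quoted above. The one-column tasks table is an assumption, the column declaration is taken from the PRAGMA output above, and table_columns and race_two_migrations are hypothetical helper names. Both connections take their snapshot before either one alters the table; the first ALTER succeeds and the second fails with the same error seen in errors.log.

```python
import os
import sqlite3
import tempfile

# Declaration matches the PRAGMA output above (INTEGER, NOT NULL, default 0).
ADD_COLUMN_SQL = (
    "ALTER TABLE tasks ADD COLUMN consecutive_failures INTEGER NOT NULL DEFAULT 0"
)


def table_columns(conn: sqlite3.Connection, table: str) -> set[str]:
    """Snapshot column names; row[1] of PRAGMA table_info is the column name."""
    cur = conn.execute(f"PRAGMA table_info({table})")
    try:
        return {row[1] for row in cur.fetchall()}
    finally:
        cur.close()  # release the read statement so the other connection can write


def race_two_migrations(path: str) -> str:
    """Hypothetical reproduction: the second guard checks a stale snapshot."""
    setup = sqlite3.connect(path)
    setup.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY)")
    setup.commit()
    setup.close()

    first = sqlite3.connect(path)
    second = sqlite3.connect(path)
    try:
        # Both connections snapshot the schema before either one migrates.
        cols_first = table_columns(first, "tasks")
        cols_second = table_columns(second, "tasks")

        if "consecutive_failures" not in cols_first:
            first.execute(ADD_COLUMN_SQL)  # succeeds
            first.commit()

        if "consecutive_failures" not in cols_second:  # stale: still True
            try:
                second.execute(ADD_COLUMN_SQL)
                second.commit()
            except sqlite3.OperationalError as exc:
                return str(exc)
        return "no error"
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(race_two_migrations(os.path.join(tmp, "kanban.db")))
```

This only shows that a check-then-ALTER guard is not atomic across connections; it does not prove that this is the path the gateway actually takes.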
Suggested fix
Either:
- Idempotency wrap: catch sqlite3.OperationalError whose message contains "duplicate column name" around each ALTER TABLE in _migrate_add_optional_columns and ignore it. The end state is what we want.
- Re-query: refresh cols from PRAGMA table_info(tasks) immediately before each guard check. The existing comment notes this is intentionally not done, on the assumption that "no step depends on a column added by a previous step in the same call"; that assumption does not protect against another connection mutating the schema between the snapshot and the check.
I lean toward the idempotency-wrap fix as the simplest robust solution.
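A minimal sketch of the idempotency wrap, assuming each step in _migrate_add_optional_columns reduces to one ALTER TABLE ... ADD COLUMN behind a column-existence guard. add_column_if_missing and run_two_connection_migration are hypothetical names, not existing functions in hermes_cli/kanban_db.py. The snapshot guard still handles the common case; the except clause covers the window in which another connection adds the column after the snapshot was taken.

```python
import os
import sqlite3
import tempfile


def add_column_if_missing(
    conn: sqlite3.Connection, table: str, column: str, decl: str, cols: set[str]
) -> bool:
    """Hypothetical helper: return True only if this call added the column.

    `cols` is a (possibly stale) PRAGMA table_info snapshot. `table`, `column`
    and `decl` are assumed to be trusted constants, since they go into the SQL.
    """
    if column in cols:
        return False
    try:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
    except sqlite3.OperationalError as exc:
        # Another connection added it first; the end state is what we want.
        if "duplicate column name" in str(exc):
            return False
        raise  # any other OperationalError is a real failure
    cols.add(column)
    return True


def snapshot(conn: sqlite3.Connection, table: str) -> set[str]:
    cur = conn.execute(f"PRAGMA table_info({table})")
    try:
        return {row[1] for row in cur.fetchall()}
    finally:
        cur.close()


def run_two_connection_migration(path: str) -> tuple[bool, bool, bool]:
    setup = sqlite3.connect(path)
    setup.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY)")
    setup.commit()
    setup.close()

    a = sqlite3.connect(path)
    b = sqlite3.connect(path)
    try:
        # Both snapshots are taken before either connection migrates.
        cols_a, cols_b = snapshot(a, "tasks"), snapshot(b, "tasks")
        decl = "INTEGER NOT NULL DEFAULT 0"
        added_a = add_column_if_missing(a, "tasks", "consecutive_failures", decl, cols_a)
        added_b = add_column_if_missing(b, "tasks", "consecutive_failures", decl, cols_b)
        present = "consecutive_failures" in snapshot(b, "tasks")
        return added_a, added_b, present
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_two_connection_migration(os.path.join(tmp, "kanban.db")))
```

The first connection adds the column, the second skips it without raising, and the column ends up present. By contrast, re-querying PRAGMA table_info right before each ALTER narrows the race window but does not close it, because the check and the ALTER are still separate statements.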
Workaround for affected users
None needed if you don't actively use kanban — the error fires once per restart and doesn't affect anything else. If kanban is in use, the second tick (60s later) succeeds and the dispatcher continues normally.
Relevant recent commits
24d48ffb8 feat(kanban): add specify — auxiliary LLM fleshes out triage tasks (#21435)
ac51c4c1a feat(kanban): per-task max_retries override (#21330) — added max_retries column
a2ff19305 chore: follow-up cleanup for Kanban migration fix
The migration handler in hermes_cli/kanban_db.py:_migrate_add_optional_columns is the relevant code path.