fix(kanban): close leaked decompose connections + don't mask I/O error on rollback #32415
Closed
mw777eds wants to merge 1 commit into
…r on rollback

decompose_task opened the DB with `with kb.connect() as conn:`, but sqlite3's context manager only ends the transaction and never closes the connection, leaking WAL -wal/-shm file descriptors until GC. Sustained auto-decompose exhausted FDs, surfacing as SQLITE_IOERR and corrupting the board DB. Wrap all four connect() sites in contextlib.closing.

write_txn's unconditional ROLLBACK raised "cannot rollback - no transaction is active" after SQLite auto-aborted the txn on SQLITE_IOERR, masking the real error. Swallow the rollback failure and re-raise the original exception.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
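The difference between the two context managers can be checked in isolation. The following is a minimal sketch, not the project's code: `connect()` is a hypothetical stand-in for `kb.connect()` (which the PR says returns a raw `sqlite3.connect(...)`), and `is_open()` is a helper invented here for illustration. It uses an in-memory database, so it shows only the open/closed state and not the WAL file descriptors themselves.

```python
import contextlib
import sqlite3


def connect(path=":memory:"):
    # Hypothetical stand-in for kb.connect(); assumed to return a raw connection.
    return sqlite3.connect(path)


def is_open(conn):
    # Illustrative helper: a closed sqlite3 connection raises ProgrammingError on use.
    try:
        conn.execute("SELECT 1")
        return True
    except sqlite3.ProgrammingError:
        return False


# Pattern the PR replaces: the connection's own context manager only
# commits or rolls back; it does not close the connection.
with connect() as leaked:
    leaked.execute("CREATE TABLE t (x)")
leaked_still_open = is_open(leaked)
leaked.close()  # without this, the handle lives until garbage collection

# Pattern the PR adopts: contextlib.closing calls conn.close() on exit.
with contextlib.closing(connect()) as conn:
    conn.execute("CREATE TABLE t (x)")
closed_after_block = not is_open(conn)
```

Note that `contextlib.closing` does not commit on exit, so this pattern relies on writes being committed explicitly, which the PR description says is already the case via `write_txn`.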
Collaborator
Duplicate — this PR combines two fixes that are already tracked separately:
The broader kanban SQLite hardening cluster is tracked at #31952, #30969, #31740.
Collaborator
Closing as already fixed on main. Both halves of this PR landed via separate paths:
Thanks for catching both.
Summary

Fixes a SQLite data-corruption bug on Hermes Kanban boards (observed on `novadeck-atlas`, ending in `database disk image is malformed`). Two compounding bugs in the gateway auto-decompose path:

1. `decompose_task` opened the DB with `with kb.connect() as conn:`, but `sqlite3.Connection`'s context manager only ends the transaction and never closes the connection. `kb.connect()` returns a raw `sqlite3.connect(...)`, so each block leaked a live WAL connection (with `-wal`/`-shm` FDs) until GC. Sustained auto-decompose accumulated open handles, and FD/`-shm` open failures surfaced to SQLite as `SQLITE_IOERR` ("disk I/O error") even with free disk, which was the observed trigger. Fixed by wrapping all four `kb.connect()` sites in `contextlib.closing`.
2. `write_txn` masked the real error. On `SQLITE_IOERR` SQLite has already aborted the transaction, so the unconditional `conn.execute("ROLLBACK")` raised "cannot rollback - no transaction is active" (the exact production stack), hiding the original I/O error. Fixed by swallowing the rollback failure and re-raising the original exception.

Ruled out: legacy NovaDeck code (none exists; the name is just a user-created board), WAL/DELETE mixed-mode (ext4 always gets WAL), and write contention (WAL serializes writers; that yields `SQLITE_BUSY`, not `SQLITE_IOERR`).

Scope kept tight to the evidenced gateway/decompose path. The dashboard plugin (`plugins/kanban/dashboard/plugin_api.py`) has a similar non-closing pattern but is not in the production stack; noted as a follow-up, not touched here.

Changes

- `hermes_cli/kanban_db.py`: robust `write_txn` rollback (try/except + re-raise original).
- `hermes_cli/kanban_decompose.py`: `import contextlib`; wrap 4 `kb.connect()` sites in `contextlib.closing`. Behavior-preserving: all writes already run inside their own `write_txn` (incl. `recompute_ready`), so the dropped implicit-commit was a no-op.
- `tests/hermes_cli/test_kanban_db.py` and `tests/hermes_cli/test_kanban_decompose.py`.

Test plan

- `pytest tests/hermes_cli/{test_kanban_db,test_kanban_decompose,test_kanban_specify_db,test_kanban_core_functionality}.py`: 346 passed
- `novadeck-atlas/kanban.db`: operator-driven recovery (`.recover` into a fresh DB, or re-init) is separate.

🤖 Generated with Claude Code
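The rollback fix described in the summary can be illustrated with a small sketch. This is not the project's `write_txn`; the function below is a hypothetical reimplementation that assumes a connection opened with `isolation_level=None` and an explicit `BEGIN IMMEDIATE`. The already-aborted transaction is simulated by issuing `ROLLBACK` inside the block before raising, since a real `SQLITE_IOERR` is hard to provoke on demand.

```python
import contextlib
import sqlite3


@contextlib.contextmanager
def write_txn(conn):
    # Hypothetical sketch of a write-transaction helper (assumed shape).
    conn.execute("BEGIN IMMEDIATE")
    try:
        yield conn
        conn.execute("COMMIT")
    except BaseException:
        try:
            conn.execute("ROLLBACK")
        except sqlite3.OperationalError:
            # The transaction is already gone, so "cannot rollback - no
            # transaction is active" is expected here and safe to ignore.
            pass
        raise  # re-raise the original exception, not the rollback failure


# isolation_level=None so the explicit BEGIN/COMMIT/ROLLBACK are not
# interleaved with the module's implicit transaction handling.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (x)")

surfaced = None
try:
    with write_txn(conn):
        conn.execute("INSERT INTO t VALUES (1)")
        conn.execute("ROLLBACK")  # simulate SQLite having aborted the txn
        raise sqlite3.OperationalError("disk I/O error")  # simulated error
except sqlite3.OperationalError as exc:
    surfaced = str(exc)

row_count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
conn.close()
```

With an unconditional `ROLLBACK` in the handler, the caller would instead see the rollback failure, which is the masking behavior the PR describes.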