Skip to content

fix(kanban): close decomposer SQLite connections#32135

Closed
Aliciawque wants to merge 1 commit into
NousResearch:mainfrom
Aliciawque:fix/kanban-decompose-close-sqlite-connections
Closed

fix(kanban): close decomposer SQLite connections#32135
Aliciawque wants to merge 1 commit into
NousResearch:mainfrom
Aliciawque:fix/kanban-decompose-close-sqlite-connections

Conversation

@Aliciawque

Copy link
Copy Markdown

What does this PR do?

Fixes the embedded kanban decomposer path so its SQLite connections are closed deterministically instead of relying on sqlite3.Connection.__exit__.

kanban_decompose.list_triage_ids() runs on every gateway dispatcher tick. In the current implementation, with kb.connect() as conn: only commits/rolls back and leaves the SQLite handle open. Over time that accumulates kanban.db / kanban.db-wal file descriptors and eventually trips Too many open files / unable to open database file in long-lived gateway processes.

This patch switches the decomposer helpers to contextlib.closing(kb.connect()) so the connection is always closed on block exit.

Related Issue

Fixes #29610

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✅ Tests (adding or improving test coverage)

Changes Made

  • hermes_cli/kanban_decompose.py
    • Replace with kb.connect() with contextlib.closing(kb.connect()) in the decomposer helper paths, including list_triage_ids().
  • tests/hermes_cli/test_kanban_decompose.py
    • Add a regression test that proves list_triage_ids() closes the underlying connection.

How to Test

  1. Run the targeted regression suite:
    `uv run --extra dev python -m pytest tests/hermes_cli/test_kanban_decompose.py tests/hermes_cli/test_kanban_decompose_db.py tests/gateway/test_kanban_notifier.py -q -o 'addopts='
  2. Verify the new regression passes:
    `uv run --extra dev python -m pytest tests/hermes_cli/test_kanban_decompose.py::test_list_triage_ids_closes_kanban_connection -q -o 'addopts='
  3. Optional runtime check: repeatedly call list_triage_ids() in a live gateway and confirm kanban.db fd count stays flat.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run the targeted pytest suites above and they pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on macOS 26.3

Documentation & Housekeeping

  • I've updated relevant docstrings — N/A for user-facing docs
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide
  • I've updated tool descriptions/schemas if I changed tool behavior — N/A

Screenshots / Logs

Relevant verification:

  • Regression test passes
  • Live gateway fd count dropped from 329 to 59 after restart and stayed flat across a dispatcher tick

Notes

Related upstream work already in flight:

Use contextlib.closing for kanban_decompose helpers so gateway ticks close SQLite handles deterministically. Add regression coverage for list_triage_ids().

@hclsys hclsys left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified — the fix is correct and safe, and I checked the write-path concern that contextlib.closing usually raises.

contextlib.closing(...).__exit__ calls only conn.close(), not sqlite3.Connection.__exit__, so it drops the implicit commit that with conn: would do. Two of the four converted blocks writekb.specify_triage_task(conn, ...) (hermes_cli/kanban_decompose.py:373) and kb.decompose_triage_task(conn, ...) (:442). Normally swapping with connclosing(conn) there would silently lose those writes.

It's safe here only because kb.connect() opens with isolation_level=None (hermes_cli/kanban_db.py:1137) — i.e. autocommit mode, where each execute() commits immediately and Connection.__exit__ has no open transaction to commit. So the implicit commit was already a no-op, and closing() loses nothing. The fd leak (handle never closed) is real and this fixes all four sites including the per-tick list_triage_ids hot path. Good.

One forward-looking note, not a blocker: this correctness now silently depends on connect() staying isolation_level=None. If anyone later switches kanban to explicit transactions, blocks :373/:442 would start losing writes under closing() with no test catching it (the new FakeConn test only asserts .close() fired). A one-line comment at the write-block sites — "safe under autocommit (connect() uses isolation_level=None); revisit if transactions are introduced" — would make that dependency explicit for the next editor.

LGTM.

@Aliciawque

Copy link
Copy Markdown
Author

我这边已经基于该问题做了一个最小修复 PR:#32135

修复点:hermes_cli/kanban_decompose.py 里的 with kb.connect() as conn: 实际不会关闭 SQLite 连接,长跑 gateway 会累积 kanban.db / kanban.db-wal fd。PR 改成 contextlib.closing(kb.connect()),并补了回归测试证明 list_triage_ids() 会关闭连接。

另外,上游已经有一个非常接近的相关 PR:#29525。我的这个 PR 只聚焦在 decomposer hot path 的已验证路径,避免把尚未复现的其它 kb.connect() 调用一并扩大范围。

@alt-glitch alt-glitch added type/bug Something isn't working comp/cli CLI entry point, hermes_cli/, setup wizard comp/plugins Plugin system and bundled plugins P3 Low — cosmetic, nice to have labels May 25, 2026
@alt-glitch

Copy link
Copy Markdown
Collaborator

Duplicate of #29525 — same fix (contextlib.closing on kb.connect() in kanban_decompose.py). Also see #29550 (another duplicate). Related issue: #29610.

@kshitijk4poor

Copy link
Copy Markdown
Collaborator

Closing as already fixed on main — landed via commit ebe04c66c (fix(kanban): close kanban.db FD after every connect() in long-lived processes). All kanban_decompose.py connection sites now use kb.connect_closing(). Thanks for catching the leak.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp/cli CLI entry point, hermes_cli/, setup wizard comp/plugins Plugin system and bundled plugins P3 Low — cosmetic, nice to have type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: embedded Kanban dispatcher still leaks sqlite/WAL file descriptors after #28301

5 participants