fix(kanban): close decomposer SQLite connections to stop fd leak by abeperl · Pull Request #29525 · NousResearch/hermes-agent

abeperl · 2026-05-20T22:40:18Z

Summary

hermes_cli/kanban_decompose.py opened SQLite connections using 'with kb.connect() as conn:'. Python's sqlite3 connection context manager commits or rolls back the transaction but does not close the connection, so the underlying file descriptors (main DB + WAL) are released only on garbage collection, and connections held by reference cycles linger.

list_triage_ids() is called on every gateway dispatcher tick, for every board (via the auto-decompose loop in gateway/run.py), so this leaked ~1 connection (2 fds) per tick per board.
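For readers who want to confirm the behavior described above, here is a minimal sketch using only the standard library. The in-memory database and the tasks table are stand-ins chosen for illustration; they are not the real kanban schema, and kb.connect() is not reproduced here.

```python
import sqlite3

# Assumption: an in-memory database stands in for kanban.db.
conn = sqlite3.connect(":memory:")

# The connection used as a context manager only manages the transaction.
with conn:
    conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT)")
    conn.execute("INSERT INTO tasks (status) VALUES ('triage')")

# The with-block has exited and committed, yet the connection still works,
# which means it was never closed.
rows_after_with = conn.execute("SELECT COUNT(*) FROM tasks").fetchone()[0]

# Only an explicit close() releases the connection; later use is rejected.
conn.close()
try:
    conn.execute("SELECT 1")
    closed_rejects_queries = False
except sqlite3.ProgrammingError:
    closed_rejects_queries = True
```

With a file-backed database in WAL mode, each connection left open this way keeps its descriptors until it is closed or collected.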

Impact (observed in production)

On a long-running gateway this exhausted the process's open-file limit (default soft RLIMIT_NOFILE of 1024) in ~5 hours. Once the fd table was full:

the Slack Socket Mode client could no longer open sockets and every reconnect failed with ClientConnectorDNSError: Cannot connect to host slack.com:443 ssl:default [Invalid argument] (EINVAL) — the agent went silent on Slack while the process stayed "active";
the kanban DB itself started throwing sqlite3.OperationalError: unable to open database file.

Host DNS/TLS to Slack was fine throughout — it was purely fd starvation. /proc/<pid>/fd showed ~993 of 1024 fds held open against kanban.db / kanban.db-wal across boards.
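The /proc check above can also be scripted. This is a rough sketch that assumes a Linux host with /proc mounted; count_fds_matching is a hypothetical helper name, not something in the hermes-agent codebase.

```python
import os


def count_fds_matching(pid, needle):
    """Count open fds of `pid` whose symlink target contains `needle`.

    `pid` may be an integer or the string "self". Linux-only: relies on
    /proc/<pid>/fd, where each entry is a symlink to the open file.
    """
    fd_dir = f"/proc/{pid}/fd"
    count = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The fd was closed between listdir() and readlink(); skip it.
            continue
        if needle in target:
            count += 1
    return count
```

Calling count_fds_matching(gateway_pid, "kanban.db") at intervals would show whether the count grows between dispatcher ticks; matching on "kanban.db" also catches the kanban.db-wal entries.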

Fix

Wrap all four kb.connect() sites in kanban_decompose.py with contextlib.closing() so the connection is deterministically closed when the block exits. Verified on a live gateway: kanban fds dropped from ~993 to 0 and stay flat across ticks; Slack reconnects immediately.
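A minimal sketch of the pattern, not the actual diff: connect() below is a hypothetical stand-in for kb.connect(), and the tasks table, the seed() helper, and the db_path argument to list_triage_ids() are assumptions made so the example runs on its own.

```python
import contextlib
import sqlite3


def connect(db_path):
    """Hypothetical stand-in for kb.connect(); the real helper may differ."""
    return sqlite3.connect(db_path)


def seed(db_path):
    """Create an illustrative table with two triage rows and one done row."""
    # closing() guarantees conn.close() when the block exits.
    with contextlib.closing(connect(db_path)) as conn:
        # The inner block keeps commit-on-success / rollback-on-error.
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS tasks "
                "(id INTEGER PRIMARY KEY, status TEXT)"
            )
            conn.executemany(
                "INSERT INTO tasks (status) VALUES (?)",
                [("triage",), ("done",), ("triage",)],
            )


def list_triage_ids(db_path):
    """Return ids of triage tasks, closing the connection deterministically."""
    with contextlib.closing(connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT id FROM tasks WHERE status = 'triage' ORDER BY id"
        ).fetchall()
    return [row[0] for row in rows]
```

Note that contextlib.closing() only closes; it does not commit. Read-only sites can use it alone, while write sites need the nested 'with conn:' shown in seed() to keep the transaction handling they had before.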

Note for maintainers

The same 'with kb.connect() as conn:' pattern appears in other modules that are not on the gateway hot path (hermes_cli/kanban_specify.py has an identical list_triage_ids, and there are ~34 sites in hermes_cli/kanban.py). They're latent leaks rather than active ones, so I've kept this PR scoped to the proven culprit. Happy to follow up with a sweep of the rest if you'd prefer them fixed in one go.

Test plan

CI green
Manual: ran a live gateway with the patch; ls -l /proc/<pid>/fd | grep -c kanban stays at 0 across many dispatcher ticks (previously climbed ~1/tick/board to the 1024 ceiling)
Manual: Slack Socket Mode reconnects and stays connected (ss -tnp shows an ESTABLISHED websocket to wss-primary.slack.com)

list_triage_ids() runs every gateway dispatcher tick per board and used 'with kb.connect() as conn:'. Python's sqlite3 connection context manager commits/rolls back the transaction but does NOT close the connection, so each tick leaked a connection (db+wal = 2 fds). Over ~5h this exhausted the 1024 fd soft limit, starving the Slack websocket client of sockets (ClientConnectorDNSError / EINVAL) and the kanban DB of file handles. Wrap all four kb.connect() sites in contextlib.closing().

alt-glitch · 2026-05-20T22:53:10Z

Related: #28802 (same class of SQLite connection leak in kanban_specify helpers), #28803 (companion fix for specify path). This PR targets the decomposer hot path specifically — distinct code path from the specify helpers but same root cause pattern.

kshitijk4poor · 2026-05-28T06:39:47Z

Closing as already fixed on main — landed via commit ebe04c66c (fix(kanban): close kanban.db FD after every connect() in long-lived processes), which introduced the kb.connect_closing() context manager and converted all kanban_decompose.py connection sites to use it. Same fix, different idiom. Thanks for catching the leak.
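For context, a helper of that shape could look roughly like the sketch below. This is a guess at the idiom, not the code from commit ebe04c66c: the real kb.connect_closing() signature and body are not shown in this thread, and the db_path parameter is an assumption.

```python
import contextlib
import sqlite3


@contextlib.contextmanager
def connect_closing(db_path):
    """Hypothetical sketch: yield a connection and always close it on exit."""
    conn = sqlite3.connect(db_path)
    try:
        yield conn
    finally:
        # Runs on normal exit and on exceptions, releasing the descriptors.
        conn.close()


# Usage: the connection is valid inside the block and closed after it.
with connect_closing(":memory:") as conn:
    answer = conn.execute("SELECT 41 + 1").fetchone()[0]
```

Compared with wrapping each call site in contextlib.closing(), a single named helper keeps the close behavior in one place, which is presumably why the two approaches are described as the same fix in a different idiom.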

alt-glitch added the labels type/bug (Something isn't working), P2 (Medium: degraded but workaround exists), and comp/cli (CLI entry point, hermes_cli/, setup wizard) on May 20, 2026

github-actions Bot mentioned this pull request May 24, 2026

🦞 OpenClaw Ecosystem Daily 2026-05-24 ivanweng2077/big_model_radar#82

Open

alt-glitch mentioned this pull request May 25, 2026

Gateway embedded Kanban dispatcher opens SQLite WAL connections every tick, causing FD/WAL pressure #31736

Closed

Aliciawque mentioned this pull request May 25, 2026

fix(kanban): close decomposer SQLite connections #32135

Closed

12 tasks

kshitijk4poor closed this May 28, 2026

abeperl wants to merge 1 commit into NousResearch:main from abeperl:fix/kanban-decompose-fd-leak
