fix(db): close() methods on SQLite classes + WAL FD leak fix#36116
fix(db): close() methods on SQLite classes + WAL FD leak fix#36116someaka wants to merge 3 commits into
Conversation
|
Partially addresses #36111 (ResponseStore FD leak in api_server). The close() methods added here cover several SQLite-owning classes but #36111 also identifies the api_server-specific issue of ResponseStore being instantiated without a valid API_SERVER_KEY and the always-true connected checker. Related to #31130 (WAL FD leak in kanban). |
mxnstrexgl
left a comment
There was a problem hiding this comment.
Note: SQLite close() methods — no test coverage
This PR adds close() on SQLite connections/classes across 5 files. No test coverage in the diff.
Suggestion: Add a test that verifies close() is idempotent (calling twice does not throw) and that WAL FD leak is actually fixed. Even a simple test that opens + closes + reopens would catch regressions.
Added — close() idempotency test suiteThanks for the suggestion @mxnstrexgl. Added
Key behaviors verified:
All 13 tests pass. ✅ |
|
Unsubscribe
…On Mon, Jun 1, 2026 at 6:56 AM someaka ***@***.***> wrote:
*someaka* left a comment (NousResearch/hermes-agent#36116)
<#36116 (comment)>
Added — close() idempotency test suite
Thanks for the suggestion @mxnstrexgl <https://github.com/mxnstrexgl>.
Added tests/hermes_cli/test_db_close_idempotency.py with *13 tests*
covering close() idempotency across 4 classes:
Class Tests Key verifications
_WalSafeConnection 4 Idempotent close, WAL checkpoint fires, in-memory
OK, already-closed graceful
InsightsEngine 3 Nulls _conn + db, idempotent, safe on fresh instance
SessionStore 3 Nulls _db, idempotent, safe when already None
SessionManager (acp_adapter) 3 Nulls _db_instance, idempotent, safe when
already None Key behaviors verified:
- ✅ *close() called twice never raises* — all 4 classes
- ✅ *close() on already-None references is a no-op* — all 4 classes
- ✅ *WAL checkpoint actually fires* — verified data survives
close/reopen cycle
- ✅ *In-memory databases work* — _WalSafeConnection handles :memory:
correctly
All 13 tests pass. ✅
—
Reply to this email directly, view it on GitHub
<#36116?email_source=notifications&email_token=BVAK7EICTDZJQYN3VJCOB5345VOMPA5CNFSNUABFM5UWIORPF5TWS5BNNB2WEL2JONZXKZKDN5WW2ZLOOQXTINJZGE4DSOBUG4Y2M4TFMFZW63VKON2WE43DOJUWEZLEUVSXMZLOOSWGM33PORSXEX3DNRUWG2Y#issuecomment-4591898471>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/BVAK7EN6GOQ63C7QAUZBNCD45VOMPAVCNFSM6AAAAACZUZMKAGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHM2DKOJRHA4TQNBXGE>
.
Triage notifications, keep track of coding agent tasks and review pull
requests on the go with GitHub Mobile for iOS
<https://github.com/notifications/mobile/ios/BVAK7EPS6K2XDP4E5KCJZID45VOMPA5CNFSNUABFM5UWIORPF5TWS5BNNB2WEL2JONZXKZKDN5WW2ZLOOQXTINJZGE4DSOBUG4Y2M4TFMFZW63VKON2WE43DOJUWEZLEUVSXMZLOOSVGM33PORSXEX3JN5ZQ>
and Android
<https://github.com/notifications/mobile/android/BVAK7EJMHLCZIBYVTS3URT345VOMPA5CNFSNUABFM5UWIORPF5TWS5BNNB2WEL2JONZXKZKDN5WW2ZLOOQXTINJZGE4DSOBUG4Y2M4TFMFZW63VKON2WE43DOJUWEZLEUVSXMZLOOSXGM33PORSXEX3BNZSHE33JMQ>.
Download it today!
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
31309bb to
65d3a17
Compare
65d3a17 to
d7093f6
Compare
d7093f6 to
68c0b96
Compare
- _WriteQueue: close thread-local sqlite3 connection - RetainDBMemoryProvider: shutdown + close queue connection - SessionStore: propagate close to underlying SessionDB - InsightsEngine: release borrowed db._conn reference - SessionManager: close lazily-initialised SessionDB Without explicit close(), WAL checkpoints never run on these connections and file descriptors leak in long-running processes.
NousResearch#30799) Long-running gateway processes open a new kanban SQLite connection every dispatcher tick. In WAL mode SQLite defers cleanup of WAL/shm file descriptors, causing a slow FD leak that eventually hits the process limit (observed: ~500 kanban.db + ~500 kanban.db-wal FDs after 14 hours) and triggers cascading failures. Fix: add _WalSafeConnection, a sqlite3.Connection subclass that runs PRAGMA wal_checkpoint(TRUNCATE) before each close. This forces SQLite to consolidate the WAL and release its file descriptors immediately. All existing callers get this behaviour automatically since connect() uses the subclass via the factory parameter.
Addresses review feedback from mxnstrexgl on NousResearch#36116: - 13 tests covering close() idempotency on 4 classes: * _WalSafeConnection: idempotent, WAL checkpoint verified, in-memory OK * InsightsEngine: nulls _conn and db references, safe on fresh instance * SessionStore: nulls _db, safe when already None * SessionManager (acp_adapter): nulls _db_instance, safe when already None - Key behaviors verified: * close() called twice never raises * close() on already-None references is a no-op * WAL checkpoint actually fires before connection close * Data survives close/reopen cycle
68c0b96 to
c042176
Compare
Addresses review feedback from mxnstrexgl on NousResearch#36116: - 13 tests covering close() idempotency on 4 classes: * _WalSafeConnection: idempotent, WAL checkpoint verified, in-memory OK * InsightsEngine: nulls _conn and db references, safe on fresh instance * SessionStore: nulls _db, safe when already None * SessionManager (acp_adapter): nulls _db_instance, safe when already None - Key behaviors verified: * close() called twice never raises * close() on already-None references is a no-op * WAL checkpoint actually fires before connection close * Data survives close/reopen cycle
What
Add
close()methods to 5 SQLite-owning classes (acp_adapter/session.py,agent/insights.py,gateway/session.py,hermes_cli/kanban_db.py,plugins/memory/retaindb/__init__.py): Proper resource cleanup to prevent file descriptor leaks on shutdown.Fix WAL file descriptor leak in kanban DB (
hermes_cli/kanban_db.py): The connect/close cycle was not properly closing WAL journal file descriptors, causing gradual FD exhaustion. Fixes kanban dispatcher FD leak: SQLite connections not releasing file descriptors in WAL mode #30799.Why
SQLite connections that aren't properly closed leak file descriptors. In long-running gateway processes, this eventually hits the FD limit and causes
database is lockederrors.Testing
/proc/self/fdcount before/after repeated connect/close cycles