
fix(gateway): checkpoint WAL before close to fix SQLite FD leak (#37369) #38740

Open
ashishpatel26 wants to merge 1 commit into NousResearch:main from ashishpatel26:fix/fd-leak-response-store-sqlite-37369

@ashishpatel26

Contributor

Summary

  • ResponseStore.close() now runs PRAGMA wal_checkpoint(PASSIVE) before conn.close() to flush WAL pages and release the WAL/SHM sidecar file descriptors
  • Adds a double-close guard: setting self._conn = None before any I/O means a second close() returns early instead of re-entering the exception paths
  • Adds __enter__/__exit__ so callers can use the with ResponseStore() as store: pattern
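The three changes above can be sketched with Python's standard sqlite3 module. This is a minimal illustration, not the gateway's actual ResponseStore: the db_path constructor argument and the choice to swallow a failed checkpoint (sqlite3.Error) so the connection still gets closed are assumptions.

```python
import os
import sqlite3
import tempfile
from typing import Optional


class ResponseStore:
    """Hypothetical stand-in for the gateway's SQLite-backed ResponseStore."""

    def __init__(self, db_path: str) -> None:
        self._conn: Optional[sqlite3.Connection] = sqlite3.connect(db_path)
        # WAL mode creates the -wal and -shm sidecar files next to the database.
        self._conn.execute("PRAGMA journal_mode=WAL")

    def close(self) -> None:
        conn = self._conn
        if conn is None:
            return  # already closed: a second close() is a no-op
        # Drop the reference before any I/O, so an exception below cannot
        # leave a half-closed connection that a later close() would retry.
        self._conn = None
        try:
            # PASSIVE: copy as many WAL pages as possible without waiting
            # for other readers or writers.
            conn.execute("PRAGMA wal_checkpoint(PASSIVE)")
        except sqlite3.Error:
            pass  # assumption: a failed checkpoint must not prevent closing
        finally:
            conn.close()

    def __enter__(self) -> "ResponseStore":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "responses.db")
    with ResponseStore(path) as store:
        store._conn.execute("CREATE TABLE t (x)")
        store._conn.commit()
    store.close()  # safe: the with-block already closed it
```

Clearing self._conn first is what makes close() idempotent. It is not a lock, so two threads calling close() at the same moment could still both see a live connection.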

Root cause (fixes #37369)

SQLite in WAL mode keeps separate WAL and SHM sidecar files open. Without an explicit checkpoint before conn.close(), some platforms never fully release the FDs for those sidecars. Over the lifetime of a busy gateway process these FDs accumulate steadily until the process hits the OS limit and crashes with "Too many open files" (EMFILE).
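One way to observe the accumulation described above is to count how many of the process's descriptors point at the database and its sidecars. count_sqlite_fds is a hypothetical diagnostic helper, not part of the gateway; it reads /proc/self/fd, so it only works on Linux and returns -1 elsewhere (for example on macOS).

```python
import os
import sqlite3
import tempfile


def count_sqlite_fds(db_path: str) -> int:
    """Count open FDs in this process that point at db_path or its -wal/-shm files.

    Linux-only sketch: inspects /proc/self/fd and returns -1 when that
    directory does not exist.
    """
    fd_dir = "/proc/self/fd"
    if not os.path.isdir(fd_dir):
        return -1
    # The WAL and SHM sidecars sit next to the database with these suffixes.
    targets = {os.path.realpath(db_path + suffix) for suffix in ("", "-wal", "-shm")}
    count = 0
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # FD closed between listdir() and readlink() (e.g. listdir's own)
        if target in targets:
            count += 1
    return count


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "responses.db")
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL").fetchall()
    conn.execute("CREATE TABLE t (x)")
    conn.commit()
    print("while open:", count_sqlite_fds(path))
    conn.close()
    print("after close:", count_sqlite_fds(path))
```

Sampling this count periodically in a long-running process would show whether descriptors for the database files grow across request lifecycles.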

A PASSIVE checkpoint copies as many WAL pages back into the database as it can without waiting for other readers or writers, so on a busy server it may not complete fully. That is acceptable here: the goal is only to release this connection's own FDs.
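Whether a PASSIVE checkpoint finished can be read from the row PRAGMA wal_checkpoint returns. Per the SQLite documentation the row is (busy, log, checkpointed): log is the number of pages in the WAL, checkpointed is how many were copied back, and both are -1 when the connection is not in WAL mode. passive_checkpoint is a hypothetical helper for illustration; the PR does not say whether close() inspects this result.

```python
import os
import sqlite3
import tempfile


def passive_checkpoint(conn: sqlite3.Connection) -> bool:
    """Run PRAGMA wal_checkpoint(PASSIVE) and report whether every WAL page was copied.

    PASSIVE never waits on other connections, so checkpointed < log only means
    some pages could not be copied without waiting. log and checkpointed are -1
    when the database is not in WAL mode, which this treats as "not completed".
    """
    _busy, log_pages, checkpointed = conn.execute(
        "PRAGMA wal_checkpoint(PASSIVE)"
    ).fetchone()
    return log_pages >= 0 and checkpointed == log_pages


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "responses.db")
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL").fetchall()
    conn.execute("CREATE TABLE t (x)")
    conn.execute("INSERT INTO t VALUES (1)")
    conn.commit()
    print("checkpoint complete:", passive_checkpoint(conn))
    conn.close()
```

With a single connection and no open read transactions, nothing blocks the checkpoint, so it should complete; on a busy server the same call may return False without that being an error.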

Test plan

  • tests/gateway/test_api_server.py — verifies that the checkpoint runs on close(), that a double close() is a no-op, and that __exit__ delegates to close()
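The three checks in the test plan can be expressed with unittest.mock. This is a sketch, not the contents of tests/gateway/test_api_server.py: the compact ResponseStore stand-in takes an injected connection (a hypothetical constructor signature) so a MagicMock can record the calls, and the test names are invented.

```python
from unittest import mock


class ResponseStore:
    """Compact hypothetical stand-in that accepts an injected connection."""

    def __init__(self, conn):
        self._conn = conn

    def close(self):
        conn, self._conn = self._conn, None  # guard is set before any I/O
        if conn is None:
            return
        try:
            conn.execute("PRAGMA wal_checkpoint(PASSIVE)")
        finally:
            conn.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()


def test_close_runs_checkpoint_before_closing():
    conn = mock.MagicMock()
    ResponseStore(conn).close()
    # Order matters: the checkpoint must run before the connection is closed.
    assert conn.mock_calls == [
        mock.call.execute("PRAGMA wal_checkpoint(PASSIVE)"),
        mock.call.close(),
    ]


def test_double_close_is_noop():
    conn = mock.MagicMock()
    store = ResponseStore(conn)
    store.close()
    store.close()
    conn.execute.assert_called_once()
    conn.close.assert_called_once_with()


def test_exit_delegates_to_close():
    store = ResponseStore(mock.MagicMock())
    with mock.patch.object(store, "close") as close:
        with store:
            pass
    close.assert_called_once_with()
```

Under pytest these functions are collected automatically; they can also be called directly.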

🤖 Generated with Claude Code

fix(gateway): checkpoint WAL before close to fix SQLite FD leak (NousResearch#37369)

ResponseStore.close() now runs PRAGMA wal_checkpoint(PASSIVE) before conn.close()
to flush WAL pages so SQLite can release its WAL and SHM sidecar file
descriptors on close.  Without this checkpoint the WAL/SHM FDs accumulate
across request lifecycles on platforms where SQLite keeps them open.

Also adds context-manager support (__enter__/__exit__) and a double-close
guard (self._conn = None before I/O) so callers can use
with ResponseStore() as store: and repeated close() calls are no-ops.
@alt-glitch added the labels type/bug (Something isn't working), comp/gateway (Gateway runner, session dispatch, delivery) and P2 (Medium: degraded but workaround exists) on Jun 4, 2026

Development

Successfully merging this pull request may close these issues.

FD leak: response_store.db opens multiple SQLite connections without closing, hits ulimit after ~2 days

2 participants