Bug Description
The api_server platform accumulates file descriptors (FDs) over time due to SQLite WAL connections in ResponseStore not being properly closed during platform retry cycles.
Environment
- OS: macOS 26.4.1 (Mac Mini)
- Installation: Homebrew
- Hermes version: (latest via Homebrew)
- Gateway PID: 42506, uptime ~12 hours
- FD limit: 65,535 (raised from Mac default 256)
Reproduction
- Enable api_server platform via API_SERVER_ENABLED=true in ~/.hermes/.env (with no API_SERVER_KEY set)
- Observe FD count:
lsof -p $(pgrep -f 'hermes_cli.main gateway') | grep response_store.db | wc -l
- After 12 hours: 122+ FDs pointing to response_store.db on a single gateway process (PID 42506)
- This equals ~41 complete SQLite WAL connection sets (main db + WAL + SHM = 3 FDs each)
Root Cause
Three contributing factors:
1. api_server auto-enabled without API_SERVER_KEY
In gateway/config.py line 1486:
if api_server_enabled or api_server_key:
    config.platforms[Platform.API_SERVER] = PlatformConfig()
The platform is instantiated even when only API_SERVER_ENABLED=true is set, without a valid API_SERVER_KEY. The HTTP server refuses to start (Refusing to start: API_SERVER_KEY is required) but the adapter is still loaded into the gateway.
2. Connected check always returns True
In gateway/config.py line 425:
_PLATFORM_CONNECTED_CHECKERS = {
    Platform.API_SERVER: lambda cfg: True,  # always returns True
    ...
}
The api_server is always reported as "connected" regardless of whether it is actually running. This is misleading and may prevent proper retry/recovery logic.
3. ResponseStore opened at init, never closed
In gateway/platforms/api_server.py line 706:
class APIServerAdapter(BasePlatformAdapter):
    def __init__(self, config: PlatformConfig):
        ...
        self._response_store = ResponseStore()  # line 706
ResponseStore.__init__ opens a SQLite connection in WAL mode (sqlite3.connect(..., check_same_thread=False) followed by apply_wal_with_fallback). This happens when the adapter is constructed, not when the HTTP server starts. The connection is never explicitly closed: ResponseStore defines no close() method, and APIServerAdapter has no teardown logic for the store.
On each gateway restart or platform reconnect cycle, a new ResponseStore instance may be created while old ones are not garbage-collected, leading to accumulation of SQLite WAL file handles.
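A minimal sketch of how the accumulation could arise, assuming old store instances stay referenced after each retry cycle. UnclosedStore, simulate_retry_cycles, count_open_fds and the table schema are hypothetical stand-ins for illustration, not the actual Hermes code.

```python
import os
import sqlite3
import tempfile


def count_open_fds():
    """Count this process's open FDs (Linux: /proc/self/fd, macOS: /dev/fd)."""
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(fd_dir):
            return len(os.listdir(fd_dir))
    raise RuntimeError("no FD directory available on this platform")


class UnclosedStore:
    """Hypothetical stand-in for ResponseStore: opens a WAL connection, has no close()."""

    def __init__(self, db_path):
        self._conn = sqlite3.connect(db_path, check_same_thread=False)
        self._conn.execute("PRAGMA journal_mode=WAL")
        # Assumed schema; a write makes SQLite open the -wal and -shm files.
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS responses (id TEXT PRIMARY KEY, body TEXT)"
        )
        self._conn.commit()


def simulate_retry_cycles(db_path, cycles):
    """Build one store per cycle and keep every instance referenced."""
    return [UnclosedStore(db_path) for _ in range(cycles)]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "response_store.db")
    before = count_open_fds()
    stores = simulate_retry_cycles(path, 10)
    print("FDs added by 10 unclosed stores:", count_open_fds() - before)
```

The exact number of descriptors per connection depends on how SQLite shares handles within a process, so the sketch only shows that the count grows with every store that is created and never closed.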
Impact
- Gateway hits OSError: [Errno 24] Too many open files after ~17–24 hours of uptime
- Cron jobs fail silently when the FD limit is reached (scheduler can't open files)
- Kanban dispatcher fails (kanban_db.py fails first at line 1111)
- Gateway becomes unresponsive and requires manual restart
- 10+ unexpected restarts observed in one month on this setup
Proposed Fix
Fix 1: Require API_SERVER_KEY for platform to be loaded
# config.py line 1486: change OR to AND
if api_server_enabled and api_server_key:
    config.platforms[Platform.API_SERVER] = PlatformConfig()
Fix 2: Fix the connected checker to validate key presence
_PLATFORM_CONNECTED_CHECKERS = {
    Platform.API_SERVER: lambda cfg: bool(cfg.extra.get("key")) if cfg else False,
    ...
}
Fix 3: Add close() to ResponseStore and call it on adapter teardown
# api_server.py: ResponseStore
def close(self):
    if self._conn:
        self._conn.close()
        self._conn = None

# api_server.py: APIServerAdapter
def stop(self):
    if self._response_store:
        self._response_store.close()
        self._response_store = None
    ...
Alternatively, make ResponseStore a process-wide singleton so repeated adapter instantiation does not create new SQLite connections.
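The singleton alternative could look roughly like the sketch below. It assumes a module-level accessor replaces the direct ResponseStore() call in the adapter. get_response_store, close_response_store, the db_path parameter and the simplified ResponseStore body are hypothetical and do not reflect the real class.

```python
import sqlite3
import threading

_store_lock = threading.Lock()
_shared_store = None  # hypothetical process-wide instance


class ResponseStore:
    """Simplified stand-in; the real class also applies WAL mode and a schema."""

    def __init__(self, db_path=":memory:"):
        self._conn = sqlite3.connect(db_path, check_same_thread=False)

    def close(self):
        if self._conn is not None:
            self._conn.close()
            self._conn = None


def get_response_store(db_path=":memory:"):
    """Return the shared store, creating it on first use only."""
    global _shared_store
    with _store_lock:
        if _shared_store is None:
            _shared_store = ResponseStore(db_path)
        return _shared_store


def close_response_store():
    """Close and forget the shared store, e.g. on gateway shutdown."""
    global _shared_store
    with _store_lock:
        if _shared_store is not None:
            _shared_store.close()
            _shared_store = None
```

With this shape, repeated adapter construction reuses one connection. The trade-off is that the store's lifetime is tied to the process rather than to any single adapter, so it needs an explicit shutdown hook.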
Verification
# Count response_store.db FDs on gateway PID
lsof -p $(pgrep -f 'hermes_cli.main gateway') 2>/dev/null | grep response_store.db | wc -l
# Should be stable at ~3 (one set of db+wal+shm) after fix
# Before fix: grows by ~3 every gateway restart or retry cycle
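For tracking the count over time, the same check can be scripted. The sketch below is a hypothetical helper, not part of Hermes: it parses `lsof -p <pid>` output and tallies rows per SQLite file kind, relying on lsof printing the file path in the last column and on SQLite's `-wal` and `-shm` file name suffixes.

```python
import subprocess


def lsof_output_for_pid(pid):
    """Run `lsof -p <pid>` and return its text output (empty if lsof fails)."""
    result = subprocess.run(
        ["lsof", "-p", str(pid)], capture_output=True, text=True, check=False
    )
    return result.stdout


def count_store_handles(lsof_output, db_name="response_store.db"):
    """Count lsof rows for the main db, WAL and SHM files of one database."""
    counts = {"db": 0, "wal": 0, "shm": 0}
    for line in lsof_output.splitlines():
        name = line.rstrip()  # the NAME column is last on each row
        if name.endswith(db_name + "-wal"):
            counts["wal"] += 1
        elif name.endswith(db_name + "-shm"):
            counts["shm"] += 1
        elif name.endswith(db_name):
            counts["db"] += 1
    return counts
```

Calling count_store_handles(lsof_output_for_pid(pid)) at intervals and logging the result separates main-db, WAL and SHM handles, which a plain `wc -l` cannot do.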