File descriptor leak in api_server platform: ResponseStore SQLite connections not closed on retry #36111

## Bug Description

The api_server platform accumulates file descriptors (FDs) over time due to SQLite WAL connections in ResponseStore not being properly closed during platform retry cycles.

## Environment

- **OS:** macOS 26.4.1 (Mac Mini)
- **Installation:** Homebrew
- **Hermes version:** (latest via Homebrew)
- **Gateway PID:** 42506, uptime ~12 hours
- **FD limit:** 65,535 (raised from Mac default 256)

## Reproduction

1. Enable api_server platform via `API_SERVER_ENABLED=true` in `~/.hermes/.env` (with no `API_SERVER_KEY` set)
2. Observe FD count: `lsof -p $(pgrep -f 'hermes_cli.main gateway') | grep response_store.db | wc -l`
3. After about 12 hours, a single gateway process (PID 42506) holds 122+ FDs pointing to `response_store.db`
4. At 3 FDs per SQLite WAL connection (main db + WAL + SHM), that is roughly 41 connection sets
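
The per-connection arithmetic can be checked by grouping the `lsof` names by suffix. This is a minimal sketch, assuming the standard SQLite sidecar names (`response_store.db-wal`, `response_store.db-shm`); the helper `count_store_fds` and the sample lines are hypothetical and only illustrate the grouping.

```python
from collections import Counter


def count_store_fds(lsof_output: str, db_name: str = "response_store.db") -> Counter:
    """Count open handles per SQLite file kind (db, wal, shm) in lsof output."""
    counts: Counter = Counter()
    for line in lsof_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        name = fields[-1]  # lsof prints the file path in the last column
        if name.endswith(db_name + "-wal"):
            counts["wal"] += 1
        elif name.endswith(db_name + "-shm"):
            counts["shm"] += 1
        elif name.endswith(db_name):
            counts["db"] += 1
    return counts


# Hypothetical sample: two leaked connection sets (paths are illustrative only).
sample = "\n".join(
    f"python 42506 user {fd}u REG 1,4 4096 123 /tmp/demo/response_store.db{suffix}"
    for fd, suffix in [(10, ""), (11, "-wal"), (12, "-shm"), (13, ""), (14, "-wal"), (15, "-shm")]
)
counts = count_store_fds(sample)
complete_sets = min(counts["db"], counts["wal"], counts["shm"])
```

Applied to the numbers above, 122 handles divide into 40 complete sets of 3 with 2 handles left over, which is consistent with the ~41 estimate for a count reported as "122+".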

## Root Cause

Three contributing factors:

### 1. api_server auto-enabled without API_SERVER_KEY

In `gateway/config.py` line 1486:
```python
if api_server_enabled or api_server_key:
    config.platforms[Platform.API_SERVER] = PlatformConfig()
```

The platform is instantiated even when only `API_SERVER_ENABLED=true` is set, without a valid `API_SERVER_KEY`. The HTTP server refuses to start (`Refusing to start: API_SERVER_KEY is required`) but the adapter is still loaded into the gateway.

### 2. Connected check always returns True

In `gateway/config.py` line 425:
```python
_PLATFORM_CONNECTED_CHECKERS = {
    Platform.API_SERVER: lambda cfg: True,  # always returns True
    ...
}
```

The api_server is always reported as "connected" regardless of whether it is actually running. This is misleading and may prevent proper retry/recovery logic.

### 3. ResponseStore opened at __init__, never closed

In `gateway/platforms/api_server.py` line 706:
```python
class APIServerAdapter(BasePlatformAdapter):
    def __init__(self, config: PlatformConfig):
        ...
        self._response_store = ResponseStore()  # line 706
```

`ResponseStore.__init__` opens a SQLite connection with WAL mode (`sqlite3.connect(..., check_same_thread=False)` + `apply_wal_with_fallback`). This is called at adapter `__init__` time, not when the HTTP server starts. The connection is never explicitly closed — no `close()` method is defined on `ResponseStore`, and `APIServerAdapter` has no teardown logic for the store.

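On each platform retry or reconnect cycle, a new `ResponseStore` instance may be created while the old ones are never closed and may not be garbage-collected, so SQLite file handles accumulate. Since all 122+ handles belong to one process (PID 42506), the growth has to come from cycles inside the running gateway; a full process restart would release them.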
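
The accumulation can be reproduced outside the gateway with plain `sqlite3`. This is a sketch under stated assumptions: the `PRAGMA journal_mode=WAL` call stands in for `apply_wal_with_fallback` (whose implementation is not shown in this report), the helper names are hypothetical, and FDs are counted through `/dev/fd`, which exists on macOS and Linux.

```python
import os
import sqlite3
import tempfile


def open_fd_count() -> int:
    """Number of descriptors open in this process (via /dev/fd)."""
    return len(os.listdir("/dev/fd"))


def open_wal_connection(path: str) -> sqlite3.Connection:
    """Open a WAL-mode connection roughly the way the report describes."""
    conn = sqlite3.connect(path, check_same_thread=False)
    conn.execute("PRAGMA journal_mode=WAL").fetchone()
    conn.execute("CREATE TABLE IF NOT EXISTS responses (id TEXT PRIMARY KEY, body TEXT)")
    conn.commit()
    return conn


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "response_store.db")
    baseline = open_fd_count()

    # Simulate five retry cycles that each open a new store while the
    # previous ones stay referenced and are never closed.
    leaked = [open_wal_connection(db_path) for _ in range(5)]
    after_retries = open_fd_count()

    # Explicitly closing every connection releases the descriptors again.
    for conn in leaked:
        conn.close()
    after_close = open_fd_count()

print(baseline, after_retries, after_close)
```

The count after the simulated retries should sit well above the baseline and drop back once every connection is closed, which is the behavior an explicit `close()` on `ResponseStore` is meant to guarantee.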

## Impact

- Gateway hits `OSError: [Errno 24] Too many open files` after ~17–24 hours of uptime
- Cron jobs fail silently when the FD limit is reached (scheduler can't open files)
- Kanban dispatcher fails (`kanban_db.py` fails first at line 1111)
- Gateway becomes unresponsive and requires manual restart
- 10+ unexpected restarts observed in one month on this setup

## Proposed Fix

### Fix 1: Require API_SERVER_KEY for platform to be loaded
```python
# config.py line 1486 — change OR to AND
if api_server_enabled and api_server_key:
    config.platforms[Platform.API_SERVER] = PlatformConfig()
```
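
To make the effect of swapping `or` for `and` explicit, the sketch below enumerates the four configurations. `should_load_api_server` is a hypothetical stand-in for the condition in `gateway/config.py`, not the real function.

```python
from typing import Optional


def should_load_api_server(enabled: bool, key: Optional[str], require_both: bool) -> bool:
    """Hypothetical stand-in for the platform-loading condition."""
    has_key = bool(key)
    return (enabled and has_key) if require_both else (enabled or has_key)


# (API_SERVER_ENABLED, API_SERVER_KEY) combinations
cases = [(True, None), (True, "secret"), (False, "secret"), (False, None)]
before = [should_load_api_server(e, k, require_both=False) for e, k in cases]
after = [should_load_api_server(e, k, require_both=True) for e, k in cases]
```

Under these assumptions, the enabled-without-key case stops loading the platform, which is the goal. The key-without-enabled case also stops loading it, so maintainers may want to confirm that this second change is intended.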

### Fix 2: Fix the connected checker to validate key presence
```python
_PLATFORM_CONNECTED_CHECKERS = {
    Platform.API_SERVER: lambda cfg: bool(cfg.extra.get("key")) if cfg else False,
    ...
}
```

### Fix 3: Add close() to ResponseStore and call it on adapter teardown

```python
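# api_server.py, ResponseStore
class ResponseStore:
    ...

    def close(self) -> None:
        """Close the SQLite connection; safe to call more than once."""
        if self._conn is not None:
            self._conn.close()
            self._conn = None


# api_server.py, APIServerAdapter
class APIServerAdapter(BasePlatformAdapter):
    ...

    def stop(self):
        # Release the store before the rest of the existing teardown runs.
        if self._response_store is not None:
            self._response_store.close()
            self._response_store = None
        ...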
```
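
A self-contained version of the same idea is sketched below so the close semantics can be exercised in isolation. Both classes are illustrative stand-ins rather than the real `ResponseStore` and `APIServerAdapter`. The `start()` method and the context-manager support are assumptions that go slightly beyond the proposed fix by deferring the connection until the adapter actually starts.

```python
import sqlite3
from typing import Optional


class ResponseStore:
    """Illustrative stand-in, not the real class from api_server.py."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn: Optional[sqlite3.Connection] = sqlite3.connect(
            path, check_same_thread=False
        )

    def close(self) -> None:
        """Release the SQLite handle; calling it twice is harmless."""
        if self._conn is not None:
            self._conn.close()
            self._conn = None

    def __enter__(self) -> "ResponseStore":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.close()


class Adapter:
    """Hypothetical adapter that opens the store on start, not in __init__."""

    def __init__(self) -> None:
        self._response_store: Optional[ResponseStore] = None

    def start(self) -> None:
        if self._response_store is None:
            self._response_store = ResponseStore()

    def stop(self) -> None:
        if self._response_store is not None:
            self._response_store.close()
            self._response_store = None
```

With this shape, an adapter that is constructed but never started holds no SQLite handles at all, and repeated `stop()` calls during retry cycles are safe.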

Alternatively, make `ResponseStore` a process-wide singleton so repeated adapter instantiation does not create new SQLite connections.

## Verification

```bash
# Count response_store.db FDs on gateway PID
lsof -p $(pgrep -f 'hermes_cli.main gateway') 2>/dev/null | grep response_store.db | wc -l

# Expected after the fix: stable at ~3 (one set of db + WAL + SHM)
# Before the fix: grows by ~3 on every platform retry cycle in the running gateway
```

