fix(gateway): close ResponseStore on disconnect and require key to load api_server platform#36180
Open
arimu1 wants to merge 1 commit into
Open
fix(gateway): close ResponseStore on disconnect and require key to load api_server platform#36180arimu1 wants to merge 1 commit into
arimu1 wants to merge 1 commit into
Conversation
…ad api_server Three changes that together eliminate the SQLite WAL file descriptor leak reported in NousResearch#36111: 1. gateway/config.py — require API_SERVER_KEY to load the platform. Previously `API_SERVER_ENABLED=true` without a key caused the adapter to be instantiated (opening a ResponseStore/SQLite connection) even though the HTTP server would immediately refuse to start. Changing the condition from `or` to `if api_server_key` prevents the spurious load. Setting only API_SERVER_KEY (without the explicit flag) still works — the key alone is sufficient intent to enable the platform. 2. gateway/config.py — fix _PLATFORM_CONNECTED_CHECKERS for API_SERVER. The checker was a no-op `lambda: True`; it now returns True only when the platform config has a `key` stored, matching actual readiness. 3. gateway/platforms/api_server.py — call response_store.close() in disconnect(). ResponseStore.close() already existed but was never called on adapter teardown, leaving the SQLite connection (+ WAL + SHM = 3 FDs) open. On each gateway retry cycle a new adapter/store was created while the old FDs were not GC'd, accumulating ~3 FDs per cycle until EMFILE after ~17–24 hours uptime. Fixes NousResearch#36111 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
mxnstrexgl
approved these changes
Jun 1, 2026
mxnstrexgl
left a comment
There was a problem hiding this comment.
LGTM — automated review passed. No security, quality, or test coverage issues detected.
tonydwb
approved these changes
Jun 1, 2026
tonydwb
left a comment
There was a problem hiding this comment.
Code Review Summary
Verdict: Approved ✅
Review
fix(gateway): close ResponseStore on disconnect and require key to load api_server platform
Excellent 3-part fix for a production FD leak. Key observations:
- Root cause analysis: Excellent — three layered causes identified (platform loaded without key, connected checker always true, ResponseStore never closed) with clear evidence (122 open FDs after 12 hours).
- Fix completeness: All three root causes are addressed independently:
if api_server_key(notapi_server_enabled or api_server_key) — prevents platform instantiation without authlambda cfg: bool(cfg.extra.get("key"))— honest connected checkerResponseStore.close()+ null indisconnect()— proper teardown
- Backward compatibility: Setting only API_SERVER_KEY (without API_SERVER_ENABLED) continues to work, matching existing docs.
- Tests confirmed: 32 test_api_server_toolset + 32 test_config pass.
Looks Good
- Minimal, targeted changes (5 lines + 2 deletions)
- Clear commit message
- Each root cause has its own independent fix
Reviewed by Hermes Agent
Author
|
Friendly ping — this has two approvals and no blocking comments. Happy to rebase or make any adjustments if needed before merge. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Three targeted changes that together eliminate the SQLite WAL file descriptor leak reported in #36111:
API_SERVER_KEYto load the platform (gateway/config.py)gateway/config.py)ResponseStoreindisconnect()(gateway/platforms/api_server.py)Problem
After ~17–24 hours of uptime on a gateway with
API_SERVER_ENABLED=truebut noAPI_SERVER_KEY, file descriptors accumulated untilOSError: [Errno 24] Too many open filescrashed the gateway. Three layered root causes:Root cause 1 — platform loaded without a key
API_SERVER_ENABLED=truewithoutAPI_SERVER_KEYstill passed theorcondition, instantiatingAPIServerAdapterand opening aResponseStore/SQLite connection, even though the HTTP server immediately refuses to start with"Refusing to start: API_SERVER_KEY is required".Root cause 2 — connected checker always True
Platform.API_SERVER: lambda cfg: Truealways reported the platform as connected, hiding misconfiguration from retry/health logic.Root cause 3 — ResponseStore never closed
ResponseStore.close()existed but was never called on adapter teardown. Each gateway retry cycle created a newResponseStore(3 FDs: db + WAL + SHM) while old connections were not GC'd. The reporter measured 122 open FDs toresponse_store.dbafter 12 hours (~41 leaked connection sets).Changes
gateway/config.pyif api_server_enabled or api_server_key→if api_server_keygateway/config.pylambda: True→lambda cfg: bool(cfg.extra.get("key"))gateway/platforms/api_server.pyself._response_store.close()+ null indisconnect()Backward compatibility: Setting only
API_SERVER_KEY(withoutAPI_SERVER_ENABLED) continues to work — the key alone is sufficient to load the platform, matching the existing docs and the intent of theapi_server_keycheck already present in the block body.Verification
tests/gateway/test_api_server_toolset.py— 32 passedtests/gateway/test_config.py— 32 passedExisting PRs
Searched open and closed PRs — no prior attempt found for this issue.
Checklist
fix(gateway):)AI disclosure: fix developed with Claude Sonnet 4.6 (arimu1 reviewed and approved the diff)