fix(gateway): tighten Telegram proxy-pool keepalive to stop fd leak #31687
Closed
konsisumer wants to merge 1 commit into
The proxy-path HTTPXRequest pools only set max_connections, leaving keepalive_expiry at httpx's 5s default. Behind a flaky proxy, half-closed sockets accumulate faster than they drain and exhaust the fd budget after days of operation. Reuse the shared NousResearch#18451 keepalive tuning while keeping the configured max_connections ceiling. Refs NousResearch#31599
This was referenced May 25, 2026
Closing: deferring to #37400 by @datin-antasena, which addresses the same issue. Reopen if that PR stalls.
Tightens keepalive eviction on the Telegram proxy-path HTTPXRequest pools so half-closed sockets stop accumulating until the process hits its fd limit.
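As a rough illustration of the idea, and not the PR's actual code, the sketch below builds extra httpx client kwargs that keep `max_connections` at the configured pool size while shortening keepalive. The function name `proxy_request_httpx_kwargs`, its parameters, and the values 10 and 2.0 are hypothetical stand-ins; the project's real numbers come from `platform_httpx_limits()`, whose signature is not shown here.

```python
from typing import Any, Dict


def proxy_request_httpx_kwargs(
    pool_size: int,
    max_keepalive: int = 10,        # illustrative value, not the project's
    keepalive_expiry: float = 2.0,  # illustrative value, below httpx's 5s default
) -> Dict[str, Any]:
    """Build extra httpx client kwargs for a proxy-path request pool (sketch)."""
    try:
        import httpx
    except ImportError:
        # httpx unavailable: return nothing so the caller keeps its own limits.
        return {}
    return {
        "limits": httpx.Limits(
            # Keep the configured ceiling so concurrent sends are unaffected.
            max_connections=pool_size,
            # Bound how many idle connections may be kept alive at all.
            max_keepalive_connections=min(max_keepalive, pool_size),
            # Evict idle connections sooner than httpx's default.
            keepalive_expiry=keepalive_expiry,
        )
    }


kwargs = proxy_request_httpx_kwargs(pool_size=512)
limits = kwargs.get("limits")
if limits is not None:
    print(limits.max_connections, limits.max_keepalive_connections, limits.keepalive_expiry)
```

Assuming a python-telegram-bot version whose `HTTPXRequest` accepts an `httpx_kwargs` argument, a dict like this could be passed alongside `connection_pool_size` when constructing the proxy-path request objects; an empty dict leaves PTB's own limits in place.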
What does this PR do?
Behind a flaky local HTTP proxy, the Telegram adapter's general request pool accumulates hundreds of half-closed (`CLOSED`) sockets over days of operation until the process exceeds its fd budget and every `bot.send_message()` / `set_my_commands()` call fails with `httpx.ConnectError: All connection attempts failed` (#31599).

Root cause: PTB's `HTTPXRequest` derives `httpx.Limits` from `connection_pool_size` but only sets `max_connections`, leaving `max_keepalive_connections` / `keepalive_expiry` at httpx's defaults (20 / 5s). Telegram is the only long-lived httpx client in the gateway not using the shared `platform_httpx_limits()` keepalive hardening added for the same fd-exhaustion-through-proxy class of bug in #18451; every other persistent adapter (QQ Bot, Feishu, WeCom, DingTalk, Signal, BlueBubbles, WeCom-callback) already does.

This PR closes that gap for the proxy path: it builds the proxy-path `HTTPXRequest` pools with the shared #18451 keepalive tuning (shorter `keepalive_expiry` so idle/dead sockets drain promptly) while keeping `max_connections` at the configured pool size, so concurrent sends are unaffected and the deliberate 512-slot pool is preserved. The helper returns empty kwargs when httpx is unavailable, so `HTTPXRequest` falls back to its own limits.

Related Issue
Refs #31599
This addresses the proxy-path keepalive accumulation (the reporter's suggested fix 1, adapted to preserve `max_connections`). The periodic general-pool drain (fix 2) and send-path heartbeat observability (fix 3) the reporter also proposed are left as follow-ups, so this is `Refs`, not `Closes`.

Type of Change
Changes Made
- `gateway/platforms/telegram.py`: add `_proxy_request_httpx_kwargs()` that reuses `platform_httpx_limits()` (#18451, "[Bug]: CLOSE_WAIT fd leak causes all platforms to stop responding after ~1-2 hours") to bound keepalive while keeping `max_connections` at the configured pool size; pass it into the proxy-path `HTTPXRequest` constructions.
- `tests/gateway/test_telegram_proxy_pool_limits.py`: regression tests asserting keepalive is tightened below httpx's 5s default, the `max_connections` ceiling is preserved, kwargs are empty when httpx is unavailable, and the shared keepalive env overrides apply.

How to Test
- `pytest tests/gateway/test_telegram_proxy_pool_limits.py tests/gateway/test_platform_http_client_limits.py -q`: all pass.
- … `httpx.Limits` (`keepalive_expiry` < 5s, bounded `max_keepalive_connections`, `max_connections` still equal to `HERMES_TELEGRAM_HTTP_POOL_SIZE`), so idle/half-closed sockets through the proxy drain promptly instead of piling up as `CLOSED` fds.

What platforms tested on
`pytest` + `ruff check` pass; the change is pure-Python httpx pool configuration with no platform-specific syscalls.

Checklist
Code
- … `fix(scope):`, `feat(scope):`, etc.)
- … `pytest tests/ -q` and all tests pass (ran the affected gateway tests)

Documentation & Housekeeping
- … `docs/`, docstrings), or N/A (covered by docstrings)
- … `cli-config.yaml.example` if I added/changed config keys: N/A (reuses existing `HERMES_GATEWAY_HTTPX_*` env overrides)
- … `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows: N/A