fix: Bound Telegram httpx general-pool and drain after reconnect (#31599)#37400
Open
datin-antasena wants to merge 1 commit into
Open
fix: Bound Telegram httpx general-pool and drain after reconnect (#31599)#37400datin-antasena wants to merge 1 commit into
datin-antasena wants to merge 1 commit into
Conversation
19 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes the Telegram adapter FD leak where half-closed httpx sockets accumulate in the general request pool (
_request[1]) through proxy reconnection cycles, eventually hittingOSError: [Errno 24] Too many open filesafter ~2 days.Credit: Excellent root cause analysis by @JustinHuber in #31599. The distinction between polling pool (
_request[0], already drained) and general pool (_request[1], never drained) was the key insight.Root Cause
connection_pool_size=512inrequest_kwargs— far more than any single adapter needs_drain_polling_connections()only resets_request[0](polling)._request[1](general, used bysend_message/set_my_commands) is never drained_http_client_limits.pyalready providesplatform_httpx_limits()with tighter defaults (keepalive 2s, max 10 idle), used by Signal, WeCom, DingTalk, BlueBubbles — but Telegram wasn't using itFix
Three changes, minimal surface:
Bound the general pool —
connection_pool_sizereduced from 512 to 20.platform_httpx_limits()now applied to both HTTPXRequest instances viahttpx_kwargs["limits"], givingmax_connections=20,max_keepalive_connections=10,keepalive_expiry=2.0.HERMES_TELEGRAM_HTTP_POOL_SIZEenv var still works as override.Drain general pool after reconnect — Added
_drain_general_connections()(calls_drain_request_connections(1, "General")). Called after successful polling reconnect in_handle_polling_network_error, when the send path is already marked degraded. Not called during normal polling drain to avoid interrupting active sends.Refactored drain helper —
_drain_request_connections(request_index, label)is the shared implementation._drain_polling_connections()and_drain_general_connections()are thin wrappers.Tests
test_connect_bounds_httpx_request_pools_with_shared_limits— verifies limits applied correctlytest_connect_keeps_telegram_pool_size_env_override— verifies env var override still workstest_reconnect_drains_polling_request_only— polling drain helper doesn't touch general pooltest_reconnect_success_drains_general_request_after_polling_resumes— general pool drained after successful reconnectWhy not drain general pool on every reconnect attempt?
The general pool serves
send_message/edit_message. Draining it mid-reconnect (before polling resumes) would interrupt in-flight user-facing sends. We drain it only after polling successfully resumes, when the send path is already degraded and callers should be using fallback delivery.Closes #31599