fix(gateway): drain stale httpx polling connections on Telegram reconnect #17015

Merged
kshitijk4poor merged 2 commits into main from salvage/telegram-pool-drain
Apr 28, 2026
@kshitijk4poor kshitijk4poor commented Apr 28, 2026

Collaborator

Summary

Salvage of #16466 by @Mirac1eSky: drains stale httpx connections during Telegram polling reconnect to prevent pool exhaustion caused by proxy-related network errors.

What the original PR identified

When Telegram polling drops through a proxy (e.g. sing-box), updater.stop() + start_polling() leaves the underlying httpx connections in a half-closed state. After enough cycles the default 256-connection pool fills up, causing:

Pool timeout: All connections in the connection pool are occupied.
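The exhaustion mechanism can be illustrated with a toy model. This is only a sketch: `ToyPool` and `cycles_until_exhausted` are hypothetical names, and an `asyncio.Semaphore` stands in for the real httpx pool. Each simulated reconnect cycle takes a slot and never returns it, which is how a half-closed connection behaves from the pool's point of view.

```python
import asyncio


class ToyPool:
    """Toy stand-in for a connection pool with a fixed number of slots."""

    def __init__(self, max_connections: int) -> None:
        self._slots = asyncio.Semaphore(max_connections)

    async def acquire(self, timeout: float) -> None:
        # Give up if no slot frees up in time, like a pool timeout.
        try:
            await asyncio.wait_for(self._slots.acquire(), timeout)
        except asyncio.TimeoutError:
            raise RuntimeError("Pool timeout: all connections are occupied") from None

    def release(self) -> None:
        self._slots.release()


async def cycles_until_exhausted(max_connections: int) -> int:
    """Count the reconnect cycles that succeed when each one leaks a slot."""
    pool = ToyPool(max_connections)
    cycles = 0
    while True:
        try:
            await pool.acquire(timeout=0.01)
        except RuntimeError:
            return cycles
        # A half-closed connection never calls pool.release(),
        # so its slot stays occupied for every later cycle.
        cycles += 1
```

With a pool of N slots the model fails on cycle N + 1, which matches the "after enough cycles" behaviour described above.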

What changed from the original

The original PR called bot.shutdown() + bot.initialize() which cycles both httpx connection pools:

  • _request[0] — getUpdates (polling only)
  • _request[1] — general (send_message, edit_message, etc.)

This creates a race condition: any concurrent send_message/edit_message call hitting _request[1] between shutdown and re-initialize gets RuntimeError("This HTTPXRequest is not initialized!"). Additionally, bot.initialize() calls get_me() (a network round-trip) which is likely to fail during network error recovery.
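The race can be reproduced in isolation with a stand-in object. The sketch below does not use PTB: `FakeRequest` is a hypothetical class that only mimics the initialize/shutdown lifecycle and the error message quoted above, and the short sleep stands in for the time spent re-initializing.

```python
import asyncio


class FakeRequest:
    """Hypothetical stand-in for a request object with a lifecycle."""

    def __init__(self) -> None:
        self.initialized = True

    async def shutdown(self) -> None:
        self.initialized = False

    async def initialize(self) -> None:
        await asyncio.sleep(0.01)  # stands in for the rebuild and round-trip
        self.initialized = True

    async def post(self, method: str) -> str:
        if not self.initialized:
            raise RuntimeError("This HTTPXRequest is not initialized!")
        return f"{method}: ok"


async def demo_race() -> list[str]:
    general = FakeRequest()
    outcomes: list[str] = []

    async def cycle_everything() -> None:
        # Mimics cycling the general request during reconnect.
        await general.shutdown()
        await general.initialize()

    async def concurrent_send() -> None:
        await asyncio.sleep(0)  # yield once so the shutdown lands first
        try:
            outcomes.append(await general.post("send_message"))
        except RuntimeError as exc:
            outcomes.append(f"failed: {exc}")

    await asyncio.gather(cycle_everything(), concurrent_send())
    # Once the cycle has finished, sends work again.
    outcomes.append(await general.post("send_message"))
    return outcomes
```

The send that lands inside the shutdown/initialize window fails, while the one issued afterwards succeeds. Leaving the general request alone removes that window entirely.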

Our fix targets only _request[0] (the polling request) via HTTPXRequest.shutdown() + HTTPXRequest.initialize() directly. The general request is never touched, so concurrent message sends are safe. No get_me() call is made.
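A minimal sketch of what such a helper could look like, based only on the description in this PR and not copied from `gateway/platforms/telegram.py`. It assumes `app.bot._request` is a `(get_updates_request, general_request)` tuple whose items expose async `shutdown()` and `initialize()`; the function name, log messages, and `StubRequest` are illustrative.

```python
import asyncio
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)


async def drain_polling_connections(app) -> None:
    """Cycle only the polling request so stale connections are dropped.

    Assumption: app.bot._request[0] is the getUpdates request object.
    This is a private attribute and may change between PTB releases.
    """
    if app is None:
        return  # nothing to drain yet
    polling_request = app.bot._request[0]
    # Separate try blocks: initialize is attempted even if shutdown fails.
    try:
        await polling_request.shutdown()
    except Exception:
        logger.debug("polling request shutdown failed", exc_info=True)
    try:
        await polling_request.initialize()
    except Exception:
        logger.debug("polling request initialize failed", exc_info=True)


class StubRequest:
    """Hypothetical stand-in that records lifecycle calls."""

    def __init__(self, fail_shutdown: bool = False) -> None:
        self.calls: list[str] = []
        self.fail_shutdown = fail_shutdown

    async def shutdown(self) -> None:
        self.calls.append("shutdown")
        if self.fail_shutdown:
            raise OSError("connection already half-closed")

    async def initialize(self) -> None:
        self.calls.append("initialize")
```

Used against stubs, the polling request is cycled even when its shutdown raises, and the general request records no calls:

```python
polling, general = StubRequest(fail_shutdown=True), StubRequest()
app = SimpleNamespace(bot=SimpleNamespace(_request=(polling, general)))
asyncio.run(drain_polling_connections(app))
```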

Additional improvements over original

| Aspect | Original PR | This salvage |
| --- | --- | --- |
| Scope | bot.shutdown(): kills all connections | _request[0] only: polling connections only |
| Race condition | Yes: concurrent sends fail with RuntimeError | None: general request untouched |
| Network call | get_me() during initialize (likely fails) | No network call, just rebuilds the httpx client |
| Coverage | _handle_polling_network_error only | Both _handle_polling_network_error and _handle_polling_conflict |
| Code structure | Inline in one method | Shared _drain_polling_connections() helper |
| Fault isolation | Single try block: shutdown failure skips initialize | Separate try blocks: initialize always attempted even if shutdown fails |
| Diagnosability | Silent except: pass | DEBUG-level logging with exc_info on failures |
| PTB coupling | Undocumented | Docstring notes PTB 22.x _request tuple structure, flags for PTB 23+ review |

Tests

9 tests pass (4 existing + 5 new):

  • test_reconnect_drains_polling_request_only — verifies only _request[0] is cycled, _request[1] untouched
  • test_reconnect_continues_if_drain_fails — both shutdown + initialize fail, reconnect still proceeds
  • test_initialize_still_runs_when_shutdown_fails — shutdown raises but initialize is still called (separate try blocks)
  • test_conflict_retry_also_drains_polling_connections — conflict path also drains
  • test_drain_helper_noop_without_app — graceful no-op when app is None
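As an illustration of how two of these cases might be expressed, here is a sketch using `unittest.mock.AsyncMock`. It is not the code in `tests/gateway/test_telegram_network_reconnect.py`: `drain_stand_in` and `make_app` are hypothetical, and the stand-in suppresses errors silently instead of logging them.

```python
import asyncio
import contextlib
from types import SimpleNamespace
from unittest.mock import AsyncMock


async def drain_stand_in(app) -> None:
    """Compact stand-in for the helper under test (not the repository's code)."""
    if app is None:
        return
    polling = app.bot._request[0]
    with contextlib.suppress(Exception):
        await polling.shutdown()
    with contextlib.suppress(Exception):
        await polling.initialize()


def make_app(shutdown_error=None):
    """Build a fake app whose bot holds (polling, general) request mocks."""
    polling = SimpleNamespace(
        shutdown=AsyncMock(side_effect=shutdown_error),
        initialize=AsyncMock(),
    )
    general = SimpleNamespace(shutdown=AsyncMock(), initialize=AsyncMock())
    return SimpleNamespace(bot=SimpleNamespace(_request=(polling, general)))


def test_initialize_still_runs_when_shutdown_fails():
    app = make_app(shutdown_error=OSError("half-closed socket"))
    asyncio.run(drain_stand_in(app))
    polling, general = app.bot._request
    polling.shutdown.assert_awaited_once()
    polling.initialize.assert_awaited_once()
    # The general request must never be cycled.
    general.shutdown.assert_not_awaited()
    general.initialize.assert_not_awaited()


def test_drain_helper_noop_without_app():
    # Must return without raising when no application exists yet.
    asyncio.run(drain_stand_in(None))
```

`AsyncMock` records an await before raising its `side_effect`, so `assert_awaited_once()` holds for the failing shutdown as well.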

E2E validated with realistic MockHTTPXRequest objects tracking shutdown/initialize state.

Full gateway suite: 3844 passed, 61 skipped, 13 failed (all 13 failures pre-existing on main).

Files changed

  • gateway/platforms/telegram.py — new _drain_polling_connections() helper + calls in both reconnect paths (+40 lines)
  • tests/gateway/test_telegram_network_reconnect.py — 5 new tests (+120 lines)
  • scripts/release.py — add Mirac1eSky to AUTHOR_MAP (+1 line)

@alt-glitch added the type/bug (Something isn't working), P1 (High: major feature broken, no workaround), comp/gateway (Gateway runner, session dispatch, delivery), and platform/telegram (Telegram bot adapter) labels Apr 28, 2026
@alt-glitch

Collaborator

Improved salvage of #16466 — targets only the polling request object to avoid race conditions with concurrent sends. Supersedes #16466.

Mirac1eSky and others added 2 commits April 28, 2026 19:05
…nect

Network errors through proxies (e.g. sing-box) can leave httpx
connections in a half-closed state occupying pool slots.  After enough
reconnect cycles the 256-connection default fills up entirely, causing
Pool timeout: All connections in the connection pool are occupied.

Fix: cycle only the getUpdates request object (_request[0]) via
shut-down + re-initialize before restarting polling.  This drains stale
connections without touching the general request (_request[1]) that
concurrent send_message / edit_message calls rely on.

The drain is applied to both _handle_polling_network_error and
_handle_polling_conflict reconnect paths via a shared
_drain_polling_connections() helper.  Failures in the drain are
swallowed so reconnect always proceeds.

Based on #16466 by @Mirac1eSky.
@kshitijk4poor kshitijk4poor force-pushed the salvage/telegram-pool-drain branch from 2bf8240 to 8f0e2b8 Compare April 28, 2026 13:35
@kshitijk4poor kshitijk4poor merged commit b5905f0 into main Apr 28, 2026
11 of 12 checks passed
@kshitijk4poor kshitijk4poor deleted the salvage/telegram-pool-drain branch April 28, 2026 13:37