fix(gateway): drain stale httpx connections on Telegram polling reconnect by Mirac1eSky · Pull Request #16466 · NousResearch/hermes-agent

Mirac1eSky · 2026-04-27T09:02:22Z

Problem

When the Telegram polling connection drops (e.g. proxy interruption, network blip), the _handle_polling_network_error reconnect path calls updater.stop() followed by start_polling(). However, this does not close the underlying httpx connections in the HTTPXRequest connection pool.

Each network error leaves stale/half-closed connections occupying pool slots. After repeated errors (we observed 40+ per day through a sing-box proxy), the default 256-connection pool fills up entirely, causing:

Pool timeout: All connections in the connection pool are occupied.
Request was *not* sent to Telegram.

At this point the bot becomes completely unresponsive — no inbound messages, no outbound replies.
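The exhaustion mechanism described above can be illustrated with a toy model. This is only a sketch: it uses an `asyncio.Semaphore` as a stand-in for a connection pool rather than httpx itself, and the names `POOL_SIZE`, `leaked_request`, `try_request`, and `simulate` are hypothetical. The pool size is shrunk to 4 so the effect shows up quickly.

```python
import asyncio

POOL_SIZE = 4  # toy value; the description above refers to a 256-connection pool


async def leaked_request(pool: asyncio.Semaphore) -> None:
    """Take a pool slot and never return it, like a half-closed connection."""
    await pool.acquire()


async def try_request(pool: asyncio.Semaphore, timeout: float = 0.05) -> bool:
    """Return True if a slot could be obtained before the timeout expired."""
    try:
        await asyncio.wait_for(pool.acquire(), timeout)
    except asyncio.TimeoutError:
        return False  # analogous to the "Pool timeout" error
    pool.release()
    return True


async def simulate(errors: int) -> bool:
    """Leak one slot per simulated network error, then attempt a request."""
    pool = asyncio.Semaphore(POOL_SIZE)
    for _ in range(errors):
        await leaked_request(pool)
    return await try_request(pool)


if __name__ == "__main__":
    print("request succeeds after 3 errors:", asyncio.run(simulate(3)))
    print("request succeeds after 4 errors:", asyncio.run(simulate(4)))
```

In this model, once the number of leaked slots reaches the pool size, every later request times out, which matches the unresponsive state reported above.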

Fix

During reconnect in _handle_polling_network_error(), shut down and re-initialize the bot's request objects before starting a new polling session:

await self._app.bot.shutdown()   # release all stale connections
await self._app.bot.initialize() # create fresh connections

Both steps are wrapped in try/except so a failure in either one doesn't block the reconnect attempt.
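A minimal sketch of that reconnect flow follows, assuming mock objects in place of the real bot and updater. The function name `reconnect_with_drain` and the position of `updater.stop()` relative to the drain are assumptions for illustration, not the code in `gateway/platforms/telegram.py`.

```python
import asyncio
import logging
from unittest.mock import AsyncMock

logger = logging.getLogger("telegram_reconnect_sketch")


async def reconnect_with_drain(bot, updater) -> None:
    """Stop polling, cycle the bot's request objects, then resume polling."""
    await updater.stop()

    # Each step has its own try/except so that a failure in one does not
    # prevent the other step, or the reconnect itself, from running.
    try:
        await bot.shutdown()  # release stale connections
    except Exception:
        logger.warning("bot.shutdown() failed; continuing", exc_info=True)

    try:
        await bot.initialize()  # create fresh connections
    except Exception:
        logger.warning("bot.initialize() failed; continuing", exc_info=True)

    await updater.start_polling()


async def demo() -> bool:
    """Simulate a failing shutdown and report whether the reconnect still ran."""
    bot = AsyncMock()
    bot.shutdown.side_effect = RuntimeError("shutdown failed")
    updater = AsyncMock()
    await reconnect_with_drain(bot, updater)
    return (
        bot.initialize.await_count == 1
        and updater.start_polling.await_count == 1
    )


if __name__ == "__main__":
    print("reconnect proceeded despite shutdown failure:", asyncio.run(demo()))
```

Logging the swallowed exception, rather than discarding it silently, is a design choice made here so that a failed drain remains visible in the logs.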

Evidence

Verified live: after triggering a network error (restart sing-box), the logs show:

16:26:15  WARNING  Telegram network error, scheduling reconnect
16:26:21  INFO     Bot request objects shut down before reconnect
16:26:22  INFO     Bot request objects re-initialized for reconnect
16:26:22  INFO     Telegram polling resumed after network error (attempt 1)

Tests

3 new tests added to tests/gateway/test_telegram_network_reconnect.py:

test_reconnect_drains_stale_connections — verifies shutdown → initialize → start_polling order
test_reconnect_continues_if_bot_shutdown_fails — shutdown failure doesn't block reconnect
test_reconnect_continues_if_bot_initialize_fails — init failure doesn't block reconnect

All 7 tests in the file pass (4 existing + 3 new).

Platforms tested

Linux (Ubuntu 24.04, Python 3.11)

…nect Network errors (especially through a proxy like sing-box) leave httpx connections in a half-closed state that occupy pool slots. After ~40 errors the 256-connection pool fills up, causing PoolTimeout and making the bot unresponsive to both inbound and outbound messages. Fix: during reconnect in _handle_polling_network_error(), shut down and re-initialize the bot's request objects to release stale connections before starting a new polling session. Regression tests: 3 new tests cover the shutdown/init drain, and graceful continuation if either step fails.

Copilot

Pull request overview

Fixes Telegram polling reconnection in the gateway by explicitly draining and recreating the bot’s underlying HTTPX request objects during network-error recovery, preventing the connection pool from being exhausted after repeated proxy/network interruptions.

Changes:

Add bot.shutdown() + bot.initialize() during _handle_polling_network_error() reconnect flow to release stale httpx pool connections before restarting polling.
Add async tests covering the reconnect flow and ensuring reconnect proceeds even if shutdown/initialize fail.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File	Description
`gateway/platforms/telegram.py`	Drains/reinitializes bot request objects prior to restarting Telegram polling after transient network errors.
`tests/gateway/test_telegram_network_reconnect.py`	Adds tests for the new reconnect behavior (drain order + resilience to shutdown/initialize failures).


+    # The order matters: shutdown → initialize → start_polling
+    call_order = []
+    for call in mock_bot.shutdown.mock_calls:
+        call_order.append("shutdown")
+    for call in mock_updater.stop.mock_calls:
+        call_order.append("stop")
+    for call in mock_bot.initialize.mock_calls:
+        call_order.append("initialize")
+    for call in mock_updater.start_polling.mock_calls:
+        call_order.append("start_polling")
+
+    assert "shutdown" in call_order
+    assert "initialize" in call_order
+    assert call_order.index("shutdown") < call_order.index("start_polling")
+    assert call_order.index("initialize") < call_order.index("start_polling")
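The loop in the hunk above appends names grouped by mock rather than in the order the calls happened, so the index comparisons cannot fail once each method has been called at least once. One possible way to record the real order is to have every mock append to a shared list through `side_effect`. This is a sketch only: `fake_reconnect` is a hypothetical stand-in for the reconnect path under test, and its call sequence is an assumption.

```python
import asyncio
from unittest.mock import AsyncMock


async def fake_reconnect(bot, updater) -> None:
    """Hypothetical stand-in for the reconnect path under test."""
    await updater.stop()
    await bot.shutdown()
    await bot.initialize()
    await updater.start_polling()


def record(order: list, name: str):
    """Build a side_effect that logs `name` at the moment the mock is called."""
    def _side_effect(*args, **kwargs):
        order.append(name)
    return _side_effect


def recorded_call_order() -> list:
    order = []
    bot, updater = AsyncMock(), AsyncMock()
    bot.shutdown.side_effect = record(order, "shutdown")
    bot.initialize.side_effect = record(order, "initialize")
    updater.stop.side_effect = record(order, "stop")
    updater.start_polling.side_effect = record(order, "start_polling")
    asyncio.run(fake_reconnect(bot, updater))
    return order


if __name__ == "__main__":
    order = recorded_call_order()
    # These comparisons now depend on the order in which the calls happened.
    assert order.index("shutdown") < order.index("initialize")
    assert order.index("initialize") < order.index("start_polling")
    print(order)
```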


+                pass
+            try:
+                await self._app.bot.initialize()
+                logger.debug("[%s] Bot request objects re-initialized for reconnect", self.name)
+            except Exception:
+                pass


+    mock_bot = AsyncMock()
+    mock_bot.shutdown = AsyncMock(side_effect=Exception("shutdown failed"))
+    mock_bot.initialize = AsyncMock()
+


+    mock_bot = AsyncMock()
+    mock_bot.shutdown = AsyncMock()
+    mock_bot.initialize = AsyncMock(side_effect=Exception("init failed"))
+


@Mirac1eSky

…nect Network errors through proxies (e.g. sing-box) can leave httpx connections in a half-closed state occupying pool slots. After enough reconnect cycles the 256-connection default fills up entirely, causing Pool timeout: All connections in the connection pool are occupied. Fix: cycle only the getUpdates request object (_request[0]) via shut-down + re-initialize before restarting polling. This drains stale connections without touching the general request (_request[1]) that concurrent send_message / edit_message calls rely on. The drain is applied to both _handle_polling_network_error and _handle_polling_conflict reconnect paths via a shared _drain_polling_connections() helper. Failures in the drain are swallowed so reconnect always proceeds. Based on #16466 by @Mirac1eSky.


kshitijk4poor · 2026-04-28T13:38:03Z

Merged via #17015. Your commit was cherry-picked onto current main with your authorship preserved in git log.

The salvage narrows the fix to only reset the polling request object (_request[0]) instead of calling bot.shutdown() which would cycle both connection pools and race with concurrent send_message/edit_message calls. Also extended the drain to the _handle_polling_conflict path and added separate try blocks so initialize() always runs even if the prior step raises.

Thanks for identifying the root cause — the proxy-related pool exhaustion analysis was spot on!
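The narrowed fix described in this comment could look roughly like the sketch below. It assumes, as the comment states, that the bot holds its request objects in `_request`, with index 0 serving getUpdates and index 1 serving general calls. `FakeRequest` and `FakeBot` are hypothetical stand-ins, and the helper body is an illustration rather than the merged implementation.

```python
import asyncio
import logging

logger = logging.getLogger("telegram_drain_sketch")


class FakeRequest:
    """Hypothetical stand-in for a request object that owns a connection pool."""

    def __init__(self) -> None:
        self.events = []

    async def shutdown(self) -> None:
        self.events.append("shutdown")

    async def initialize(self) -> None:
        self.events.append("initialize")


class FakeBot:
    """Hypothetical bot: _request[0] is for polling, _request[1] for general calls."""

    def __init__(self) -> None:
        self._request = (FakeRequest(), FakeRequest())


async def drain_polling_connections(bot) -> None:
    """Cycle only the polling request object; leave the general one untouched."""
    polling_request = bot._request[0]
    # Separate try blocks: initialize() still runs if shutdown() raises.
    try:
        await polling_request.shutdown()
    except Exception:
        logger.debug("polling request shutdown failed", exc_info=True)
    try:
        await polling_request.initialize()
    except Exception:
        logger.debug("polling request initialize failed", exc_info=True)


def run_drain() -> FakeBot:
    bot = FakeBot()
    asyncio.run(drain_polling_connections(bot))
    return bot


if __name__ == "__main__":
    bot = run_drain()
    print("polling:", bot._request[0].events, "general:", bot._request[1].events)
```

Because the general request object is never touched, concurrent send_message or edit_message calls in this model keep their pool while the polling pool is recycled.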


Copilot AI review requested due to automatic review settings April 27, 2026 09:02

Copilot started reviewing on behalf of Mirac1eSky April 27, 2026 09:02

Copilot AI reviewed Apr 27, 2026


alt-glitch added labels on Apr 27, 2026: type/bug (Something isn't working), P1 (High: major feature broken, no workaround), comp/gateway (Gateway runner, session dispatch, delivery), platform/telegram (Telegram bot adapter)

Mirac1eSky force-pushed the fix/telegram-pool-connection-drain branch from 77351d6 to dd94477 on April 28, 2026 03:26

kshitijk4poor mentioned this pull request Apr 28, 2026

fix(gateway): drain stale httpx polling connections on Telegram reconnect #17015

Merged

kshitijk4poor closed this Apr 28, 2026

teknium1 mentioned this pull request May 2, 2026

fix(telegram): probe polling liveness after reconnect to detect wedged Updater (salvage #18088) #18751

Merged

Mirac1eSky wants to merge 1 commit into NousResearch:main from Mirac1eSky:fix/telegram-pool-connection-drain


4 participants
