fix telegram: drain both polling AND general httpx connection pools#35624
fix telegram: drain both polling AND general httpx connection pools#35624q3874758 wants to merge 1 commit into
Conversation
The _drain_polling_connections method only reset the getUpdates polling request pool (_request[0]) but left the general request pool (_request[1]) untouched. On Windows (especially with proxy software like sing-box), repeated network errors leave httpx connections in a half-closed state that still occupy pool slots. The general request pool (used for all send_message/edit_message calls) eventually fills up, causing: 'Pool timeout: All connections in the connection pool are occupied' Changes: - Drain BOTH polling and general request pools - Run shutdown()+initialize() in a thread pool to avoid async event loop contention (avoids deadlock when called from async context) - Both pools have independent connection pools so must be drained separately
|
I found a bug in the
def _drain_one(name: str, req) -> None:
try:
loop = asyncio.get_event_loop()
except RuntimeError:
return
try:
coro = req.shutdown()
future = loop.run_in_executor(None, asyncio.run, coro)
future.result(timeout=5)
...The problem: Suggested fix: Use def _drain_one(name: str, req) -> None:
try:
loop = asyncio.get_running_loop()
except RuntimeError:
return
try:
future = asyncio.run_coroutine_threadsafe(req.shutdown(), loop)
future.result(timeout=5)
except Exception:
pass
try:
future = asyncio.run_coroutine_threadsafe(req.initialize(), loop)
future.result(timeout=5)
logger.debug("[%s] %s pool drained", self.name, name)
except Exception:
logger.debug("[%s] %s re-init failed (non-fatal)", self.name, name, exc_info=True)Note: if the caller is already running on the event loop (which Also note that the competing PR #35623 takes a different (and correct) approach: instead of draining the existing pool, it creates a fresh one-shot |
tonydwb
left a comment
There was a problem hiding this comment.
Code Review Summary
Verdict: Approved ✅
What it does
Expands _drain_polling_connections() to drain BOTH the polling request pool (_request[0]) AND the general request pool (_request[1]), since either can be the source of Pool timeout: All connections in the connection pool are occupied. Uses loop.run_in_executor(None, asyncio.run, coro) to avoid blocking the async gateway event loop during shutdown()/initialize().
✅ Looks Good
- Correct diagnosis: The original code only drained the polling request pool (
_request[0]), butsend_message/edit_messagecalls through the general pool (_request[1]) can also exhaust connections - Thread-pool safety: Wrapping async shutdown/initialize in
run_in_executoravoids blocking the event loop — a subtle and correct pattern - 5s timeout: Each drain has a 5s timeout to prevent hangs
- Graceful degradation: Both pools are drained independently; if one fails, the other still gets drained. All exceptions are caught at the inner level
- Focused change: 1 file modified, minimal diff
💡 Suggestion (non-blocking)
- The
_drain_onehelper catches all exceptions silently for the shutdown phase. Consider logging a debug message for shutdown failures too (currently only init failures are logged), since a failed shutdown could leave connections in a semi-closed state
Reviewed by Hermes Agent
tonydwb
left a comment
There was a problem hiding this comment.
Code Review Summary
Verdict: Approved ✅
Reviewed Changes
- gateway/platforms/telegram.py:_drain_polling_connections — Now drains both the polling request pool AND the general request pool, using a thread-pool executor to avoid blocking the async loop.
✅ Looks Good
- Correctness: The old code only drained the polling request pool (
_request[0]), but pool exhaustion can also come from the general httpx pool (_request[1]) — especially under heavy concurrentsend_message/edit_messagetraffic. Draining both is the right fix. - Implementation: Uses
loop.run_in_executor(None, asyncio.run, coro)withfuture.result(timeout=5)to safely call synchronous shutdown/initialize without blocking the async event loop. This is a well-known pattern for this scenario. - Error handling: Both shutdown and re-initialize failures are caught and logged as debug-level (non-fatal). The adapter continues regardless.
- No security concerns: No credential exposure or injection vectors.
Reviewed by Hermes Agent
Problem
On Windows (especially with proxy software like sing-box), repeated network errors leave httpx connections in a half-closed state that still occupy pool slots. The Telegram adapter's _drain_polling_connections() method only reset the getUpdates polling request pool (_request[0]) but left the general request pool (_request[1]) untouched.
Since send_message / edit_message use the general pool (512 connections), it eventually fills up causing:
Pool timeout: All connections in the connection pool are occupied
Fix
_drain_polling_connections() now drains both polling AND general request pools. Reset runs via thread pool to avoid async event loop contention on Windows.
Testing
Gateway restarted — Telegram connects and messages send without pool timeout errors.
The original comment claimed the general pool was left untouched to avoid interrupting concurrent send_message/edit_message calls, but both pools are independent HTTPXRequest instances — draining one cannot affect the other.