fix telegram: drain both polling AND general httpx connection pools by q3874758 · Pull Request #35624 · NousResearch/hermes-agent

q3874758 · 2026-05-31T01:47:44Z

Problem

On Windows (especially with proxy software like sing-box), repeated network errors leave httpx connections in a half-closed state that still occupy pool slots. The Telegram adapter's _drain_polling_connections() method only reset the getUpdates polling request pool (_request[0]) but left the general request pool (_request[1]) untouched.

Since send_message / edit_message use the general pool (512 connections), it eventually fills up causing:
Pool timeout: All connections in the connection pool are occupied

Fix

_drain_polling_connections() now drains both polling AND general request pools. Reset runs via thread pool to avoid async event loop contention on Windows.

Testing

Gateway restarted — Telegram connects and messages send without pool timeout errors.

The original comment claimed the general pool was left untouched to avoid interrupting concurrent send_message/edit_message calls, but both pools are independent HTTPXRequest instances — draining one cannot affect the other.

The _drain_polling_connections method only reset the getUpdates polling request pool (_request[0]) but left the general request pool (_request[1]) untouched. On Windows (especially with proxy software like sing-box), repeated network errors leave httpx connections in a half-closed state that still occupy pool slots. The general request pool (used for all send_message/edit_message calls) eventually fills up, causing: 'Pool timeout: All connections in the connection pool are occupied' Changes: - Drain BOTH polling and general request pools - Run shutdown()+initialize() in a thread pool to avoid async event loop contention (avoids deadlock when called from async context) - Both pools have independent connection pools so must be drained separately

alt-glitch · 2026-05-31T02:08:50Z

Complementary to #35623 (retry with fresh request fallback). This PR drains stale connections proactively; #35623 adds send-side retry when pool is already exhausted. Both address Telegram pool timeout issues. Related: #35610, #31599.

liuhao1024 · 2026-05-31T02:28:33Z

I found a bug in the _drain_one helper that will likely cause runtime failures.

gateway/platforms/telegram.py — the new _drain_one function:

def _drain_one(name: str, req) -> None:
    try:
        loop = asyncio.get_event_loop()
    except RuntimeError:
        return
    try:
        coro = req.shutdown()
        future = loop.run_in_executor(None, asyncio.run, coro)
        future.result(timeout=5)
    ...

The problem: loop.run_in_executor(None, asyncio.run, coro) runs asyncio.run(coro) in a thread-pool worker. asyncio.run() creates a brand-new event loop in that worker thread. However, the PTB BaseRequest objects (_request[0], _request[1]) are bound to the original event loop — their internal _session (httpx AsyncClient) and transport were created on the gateway's running loop. Passing them to a different loop's asyncio.run() will either raise RuntimeError: ... is bound to a different event loop or silently create a broken session.

Suggested fix: Use asyncio.run_coroutine_threadsafe(coro, loop) instead, which schedules the coroutine on the existing loop without creating a new one:

def _drain_one(name: str, req) -> None:
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        return
    try:
        future = asyncio.run_coroutine_threadsafe(req.shutdown(), loop)
        future.result(timeout=5)
    except Exception:
        pass
    try:
        future = asyncio.run_coroutine_threadsafe(req.initialize(), loop)
        future.result(timeout=5)
        logger.debug("[%s] %s pool drained", self.name, name)
    except Exception:
        logger.debug("[%s] %s re-init failed (non-fatal)", self.name, name, exc_info=True)

Note: if the caller is already running on the event loop (which _drain_polling_connections is — it's an async def method), run_in_executor is the correct way to offload blocking work, but shutdown()/initialize() are async coroutines that should be scheduled on the same loop, not a new one. If the goal is truly to avoid blocking the event loop, consider await asyncio.wait_for(req.shutdown(), timeout=5) directly (since these are network-level pool resets, they're fast) or use loop.run_in_executor only for genuinely blocking I/O.

Also note that the competing PR #35623 takes a different (and correct) approach: instead of draining the existing pool, it creates a fresh one-shot Bot with its own HTTPXRequest session for the retry, bypassing pool exhaustion entirely.

tonydwb

Code Review Summary

Verdict: Approved ✅

What it does

Expands _drain_polling_connections() to drain BOTH the polling request pool (_request[0]) AND the general request pool (_request[1]), since either can be the source of Pool timeout: All connections in the connection pool are occupied. Uses loop.run_in_executor(None, asyncio.run, coro) to avoid blocking the async gateway event loop during shutdown()/initialize().

✅ Looks Good

Correct diagnosis: The original code only drained the polling request pool (_request[0]), but send_message/edit_message calls through the general pool (_request[1]) can also exhaust connections
Thread-pool safety: Wrapping async shutdown/initialize in run_in_executor avoids blocking the event loop — a subtle and correct pattern
5s timeout: Each drain has a 5s timeout to prevent hangs
Graceful degradation: Both pools are drained independently; if one fails, the other still gets drained. All exceptions are caught at the inner level
Focused change: 1 file modified, minimal diff

💡 Suggestion (non-blocking)

The _drain_one helper catches all exceptions silently for the shutdown phase. Consider logging a debug message for shutdown failures too (currently only init failures are logged), since a failed shutdown could leave connections in a semi-closed state

Reviewed by Hermes Agent

tonydwb

Code Review Summary

Verdict: Approved ✅

Reviewed Changes

gateway/platforms/telegram.py:_drain_polling_connections — Now drains both the polling request pool AND the general request pool, using a thread-pool executor to avoid blocking the async loop.

✅ Looks Good

Correctness: The old code only drained the polling request pool (_request[0]), but pool exhaustion can also come from the general httpx pool (_request[1]) — especially under heavy concurrent send_message/edit_message traffic. Draining both is the right fix.
Implementation: Uses loop.run_in_executor(None, asyncio.run, coro) with future.result(timeout=5) to safely call synchronous shutdown/initialize without blocking the async event loop. This is a well-known pattern for this scenario.
Error handling: Both shutdown and re-initialize failures are caught and logged as debug-level (non-fatal). The adapter continues regardless.
No security concerns: No credential exposure or injection vectors.

Reviewed by Hermes Agent

alt-glitch added type/bug Something isn't working comp/gateway Gateway runner, session dispatch, delivery platform/telegram Telegram bot adapter P1 High — major feature broken, no workaround labels May 31, 2026

alt-glitch mentioned this pull request May 31, 2026

[codex] fix(telegram): retry pool timeout via fresh request fallback #35623

Open

tonydwb approved these changes May 31, 2026

View reviewed changes

alt-glitch mentioned this pull request May 31, 2026

fix(telegram): retry on httpx pool timeout instead of dropping the send #35664

Merged

tonydwb approved these changes May 31, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix telegram: drain both polling AND general httpx connection pools#35624

fix telegram: drain both polling AND general httpx connection pools#35624
q3874758 wants to merge 1 commit into
NousResearch:mainfrom
q3874758:telegram-pool-fix

q3874758 commented May 31, 2026

Uh oh!

alt-glitch commented May 31, 2026

Uh oh!

liuhao1024 commented May 31, 2026

Uh oh!

tonydwb left a comment

Uh oh!

tonydwb left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

q3874758 commented May 31, 2026

Problem

Fix

Testing

Uh oh!

alt-glitch commented May 31, 2026

Uh oh!

liuhao1024 commented May 31, 2026

Uh oh!

tonydwb left a comment

Choose a reason for hiding this comment

Code Review Summary

What it does

✅ Looks Good

💡 Suggestion (non-blocking)

Uh oh!

tonydwb left a comment

Choose a reason for hiding this comment

Code Review Summary

Reviewed Changes

✅ Looks Good

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants