fix: Bound Telegram httpx general-pool and drain after reconnect (#31599) by datin-antasena · Pull Request #37400 · NousResearch/hermes-agent

datin-antasena · 2026-06-02T13:02:57Z

Summary

Fixes the Telegram adapter FD leak where half-closed httpx sockets accumulate in the general request pool (_request[1]) through proxy reconnection cycles, eventually hitting OSError: [Errno 24] Too many open files after ~2 days.

Credit: Excellent root cause analysis by @JustinHuber in #31599. The distinction between polling pool (_request[0], already drained) and general pool (_request[1], never drained) was the key insight.

Root Cause

connection_pool_size=512 in request_kwargs — far more than any single adapter needs
_drain_polling_connections() only resets _request[0] (polling). _request[1] (general, used by send_message/set_my_commands) is never drained
_http_client_limits.py already provides platform_httpx_limits() with tighter defaults (keepalive 2s, max 10 idle), used by Signal, WeCom, DingTalk, BlueBubbles — but Telegram wasn't using it

Fix

Three changes, minimal surface:

Bound the general pool — connection_pool_size reduced from 512 to 20. platform_httpx_limits() now applied to both HTTPXRequest instances via httpx_kwargs["limits"], giving max_connections=20, max_keepalive_connections=10, keepalive_expiry=2.0. HERMES_TELEGRAM_HTTP_POOL_SIZE env var still works as override.
Drain general pool after reconnect — Added _drain_general_connections() (calls _drain_request_connections(1, "General")). Called after successful polling reconnect in _handle_polling_network_error, when the send path is already marked degraded. Not called during normal polling drain to avoid interrupting active sends.
Refactored drain helper — _drain_request_connections(request_index, label) is the shared implementation. _drain_polling_connections() and _drain_general_connections() are thin wrappers.

Tests

test_connect_bounds_httpx_request_pools_with_shared_limits — verifies limits applied correctly
test_connect_keeps_telegram_pool_size_env_override — verifies env var override still works
test_reconnect_drains_polling_request_only — polling drain helper doesn't touch general pool
test_reconnect_success_drains_general_request_after_polling_resumes — general pool drained after successful reconnect
567 Telegram tests pass (canonical per-file runner)

Why not drain general pool on every reconnect attempt?

The general pool serves send_message / edit_message. Draining it mid-reconnect (before polling resumes) would interrupt in-flight user-facing sends. We drain it only after polling successfully resumes, when the send path is already degraded and callers should be using fallback delivery.

Closes #31599

Fix Telegram httpx pool FD leak

a5c376e

datin-antasena mentioned this pull request Jun 2, 2026

Telegram adapter leaks httpx general-pool connections through HTTP proxy (CLOSED sockets accumulate, fd limit hit after ~2 days) #31599

Open

alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/gateway Gateway runner, session dispatch, delivery platform/telegram Telegram bot adapter labels Jun 2, 2026

konsisumer mentioned this pull request Jun 4, 2026

fix(gateway): tighten Telegram proxy-pool keepalive to stop fd leak #31687

Closed

19 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Bound Telegram httpx general-pool and drain after reconnect (#31599)#37400

fix: Bound Telegram httpx general-pool and drain after reconnect (#31599)#37400
datin-antasena wants to merge 1 commit into
NousResearch:mainfrom
datin-antasena:fix/telegram-httpx-pool-fd-leak

datin-antasena commented Jun 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

datin-antasena commented Jun 2, 2026

Summary

Root Cause

Fix

Tests

Why not drain general pool on every reconnect attempt?

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants