fix(gateway): bound Telegram general pool on proxy path to cap fd leak by konsisumer · Pull Request #32003 · NousResearch/hermes-agent

konsisumer · 2026-05-25T10:11:58Z

Bounds the Telegram adapter's httpx connection pool when a proxy is configured, so half-closed sockets can no longer accumulate until the process hits its file-descriptor limit.

What does this PR do?

When the Telegram adapter routes through a local HTTP proxy, half-closed connections accumulate in the httpx general request pool faster than httpx evicts them — httpcore's tunnel-proxy path does not always release the underlying socket on ConnectError. With the current unbounded connection_pool_size default (512), these dead CLOSED sockets pile up over days of flaky-proxy operation until the process exceeds its fd limit and every send_message / set_my_commands fails with httpx.ConnectError: All connection attempts failed.

This implements the reporter's suggested Refs #1: pass a bounded httpx.Limits into the proxy HTTPXRequest construction. Capping max_connections/max_keepalive_connections bounds the leak and makes it surface immediately instead of silently wedging the gateway after ~2 days. The caps are env-overridable (HERMES_TELEGRAM_PROXY_MAX_CONNECTIONS, HERMES_TELEGRAM_PROXY_MAX_KEEPALIVE) for deployments that legitimately need more.

Scoped to the proxy branch only — the non-proxy and fallback-IP paths are unchanged. The issue's other suggested fixes (periodic drain of _request[1], send-path heartbeat, maxfiles startup warning) are deliberately deferred, hence Refs rather than Fixes.

Related Issue

Refs #31599

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
🔒 Security fix
📝 Documentation update
✅ Tests (adding or improving test coverage)
♻️ Refactor (no behavior change)
🎯 New skill (bundled or hub)

Changes Made

gateway/platforms/telegram.py: add _bounded_proxy_limits() helper returning an env-overridable httpx.Limits (defaults: max_connections=20, max_keepalive_connections=10); pass it via httpx_kwargs={"limits": ...} to the proxy HTTPXRequest and get_updates_request construction. PTB merges httpx_kwargs on top of its connection_pool_size-derived limits, so the bounded limits win.
tests/gateway/test_telegram_proxy_pool_bound.py: unit tests for the helper's defaults, env overrides, and invalid-env fallback.

How to Test

pytest tests/gateway/test_telegram_proxy_pool_bound.py -q — passes (3 tests).
Configure TELEGRAM_PROXY (or a system proxy resolved by resolve_proxy_url) pointing at a flaky local proxy; start the gateway with Telegram enabled. Over reconnect cycles, lsof -p <gateway_pid> | wc -l now plateaus near the bounded pool size instead of climbing past launchctl limit maxfiles.
Note: the full pytest tests/gateway/ -k telegram selection shows 6 pre-existing order-dependent failures (MarkdownV2-escaping / model-picker tests) that reproduce on main without this change and pass in isolation; they are unrelated to this proxy fix.

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix/feature (no unrelated commits)
I've run pytest tests/ -q and all tests pass
I've added tests for my changes (required for bug fixes, strongly encouraged for features)
I've tested on my platform: macOS 15 (darwin-arm64)

Documentation & Housekeeping

I've updated relevant documentation (README, docs/, docstrings) — or N/A
I've updated cli-config.yaml.example if I added/changed config keys — or N/A
I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Through a flaky local HTTP proxy, half-closed sockets accumulate in the httpx general request pool faster than httpx evicts them (httpcore's tunnel-proxy path does not always release the socket on ConnectError). With the unbounded connection_pool_size default the dead sockets pile up until the process hits its fd limit and every send fails after ~2 days. Pass a bounded httpx.Limits (max_connections=20, max_keepalive=10, env-overridable) into the proxy HTTPXRequest construction so the leak is capped and surfaces immediately instead of silently wedging the gateway. Refs NousResearch#31599

alt-glitch · 2026-05-25T10:18:51Z

Competing PR — #31885 and #31687 also address the same Telegram proxy-path fd leak (Refs #31599). All three take slightly different approaches to bounding the httpx connection pool on the proxy code path.

hclsys · 2026-05-25T10:19:38Z

The fix is sensible — bounding the proxy-path httpx pool so half-closed tunnel sockets can't accumulate to the fd limit, with env-overridable caps and an immediate-failure-over-slow-leak tradeoff, is the right shape for #31599.

But heads-up: this looks like it overlaps your own still-open #31885 (fix(gateway): bound Telegram proxy httpx pools). Both target #31599, both touch the proxy branch in gateway/platforms/telegram.py, both add an httpx.Limits(...) helper (_bounded_proxy_limits() here vs _proxy_http_limits() there) passed via httpx_kwargs={"limits": ...} on the proxy HTTPXRequest, and both add a near-identical test (test_telegram_proxy_pool_bound.py vs test_telegram_proxy_limits.py). They'll conflict on the same lines, and a maintainer would have to pick one. Might be cleanest to fold this into #31885 (or close that in favor of this) and note which is canonical, so they're not reviewed/merged against each other.

One verification I couldn't do locally (the telegram SDK isn't installed in my env): please confirm the pinned python-telegram-bot version's HTTPXRequest actually accepts httpx_kwargs={"limits": ...} and forwards limits to the underlying httpx.AsyncClient — if that kwarg name changed across PTB versions, the proxy branch would raise at construction. Assuming that's confirmed (and presumably it is, since #31885 uses the same mechanism), the bounding logic itself is sound.

konsisumer · 2026-06-04T08:12:36Z

Closing — deferring to #31885 by @konsisumer which addresses the same. Reopen if that PR stalls.

alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/gateway Gateway runner, session dispatch, delivery platform/telegram Telegram bot adapter labels May 25, 2026

alt-glitch mentioned this pull request May 25, 2026

fix(telegram): bound proxy-path httpx pool keepalive to stop fd leak #32124

Closed

19 tasks

q3874758 mentioned this pull request May 31, 2026

[Bug] Telegram: Pool timeout causes silent send failures — no retry or fallback on exhausted general pool #35610

Closed

konsisumer closed this Jun 4, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(gateway): bound Telegram general pool on proxy path to cap fd leak#32003

fix(gateway): bound Telegram general pool on proxy path to cap fd leak#32003
konsisumer wants to merge 1 commit into
NousResearch:mainfrom
konsisumer:fix/telegram-proxy-pool-bound

konsisumer commented May 25, 2026

Uh oh!

alt-glitch commented May 25, 2026

Uh oh!

hclsys commented May 25, 2026

Uh oh!

konsisumer commented Jun 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

konsisumer commented May 25, 2026

What does this PR do?

Related Issue

Type of Change

Changes Made

How to Test

Checklist

Code

Documentation & Housekeeping

Uh oh!

alt-glitch commented May 25, 2026

Uh oh!

hclsys commented May 25, 2026

Uh oh!

konsisumer commented Jun 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants