fix(gateway): tighten Telegram proxy-pool keepalive to stop fd leak by konsisumer · Pull Request #31687 · NousResearch/hermes-agent

konsisumer · 2026-05-24T22:01:18Z

Tightens keepalive eviction on the Telegram proxy-path HTTPXRequest pools so half-closed sockets stop accumulating until the process hits its fd limit.

What does this PR do?

Behind a flaky local HTTP proxy, the Telegram adapter's general request pool accumulates hundreds of half-closed (CLOSED) sockets over days of operation. Eventually the process exceeds its fd budget, and every bot.send_message() / set_my_commands() call fails with httpx.ConnectError: All connection attempts failed (#31599).

Root cause: PTB's HTTPXRequest derives httpx.Limits from connection_pool_size but only sets max_connections, leaving max_keepalive_connections / keepalive_expiry at httpx's defaults (20 / 5s). Telegram is the only long-lived httpx client in the gateway that does not use the shared platform_httpx_limits() keepalive hardening, which was added in #18451 for the same class of bug (fd exhaustion through a proxy). Every other persistent adapter (QQ Bot, Feishu, WeCom, DingTalk, Signal, BlueBubbles, WeCom-callback) already uses it.

This PR closes that gap for the proxy path: it builds the proxy-path HTTPXRequest pools with the shared #18451 keepalive tuning (shorter keepalive_expiry so idle/dead sockets drain promptly) while keeping max_connections at the configured pool size, so concurrent sends are unaffected and the deliberate 512-slot pool is preserved. The helper returns empty kwargs when httpx is unavailable, so HTTPXRequest falls back to its own limits.
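As a rough illustration of the approach described above, the following sketch builds tightened pool limits while leaving max_connections at the configured pool size. It is not the PR's actual helper: the function names, the EXAMPLE_* environment variables, and the 2.0s / 10-connection values are all hypothetical stand-ins, since this description does not state the real shared defaults or override names. Only httpx.Limits and its three keyword arguments are taken from httpx itself.

```python
import os
from typing import Any, Dict

# Hypothetical tuning values; the real shared defaults are not stated in this PR.
EXAMPLE_KEEPALIVE_EXPIRY_S = 2.0
EXAMPLE_MAX_KEEPALIVE = 10


def example_limit_values(pool_size: int) -> Dict[str, Any]:
    """Compute pool limits: tighter keepalive, unchanged connection ceiling."""
    # Hypothetical env override names, used here only for illustration.
    expiry = float(os.environ.get("EXAMPLE_KEEPALIVE_EXPIRY", EXAMPLE_KEEPALIVE_EXPIRY_S))
    max_keepalive = int(os.environ.get("EXAMPLE_MAX_KEEPALIVE", EXAMPLE_MAX_KEEPALIVE))
    return {
        # The ceiling stays at the configured pool size, so concurrency is unchanged.
        "max_connections": pool_size,
        # Never keep more idle connections than the pool can hold.
        "max_keepalive_connections": min(max_keepalive, pool_size),
        # Idle connections are dropped sooner than httpx's 5s default.
        "keepalive_expiry": expiry,
    }


def example_proxy_request_kwargs(pool_size: int) -> Dict[str, Any]:
    """Return kwargs carrying httpx.Limits, or {} when httpx is not importable."""
    try:
        import httpx
    except ImportError:
        # The caller then falls back to whatever limits it builds on its own.
        return {}
    return {"limits": httpx.Limits(**example_limit_values(pool_size))}
```

How such kwargs reach HTTPXRequest is a separate question. Recent PTB releases accept an httpx_kwargs argument that could carry a limits entry, but whether the PR uses that route is not stated here.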

Related Issue

Refs #31599

This addresses the proxy-path keepalive accumulation (the reporter's suggested fix 1, adapted to preserve max_connections). The periodic general-pool drain (fix 2) and send-path heartbeat observability (fix 3) the reporter also proposed are left as follow-ups, so this is Refs, not Closes.

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
🔒 Security fix
📝 Documentation update
✅ Tests (adding or improving test coverage)
♻️ Refactor (no behavior change)
🎯 New skill (bundled or hub)

Changes Made

gateway/platforms/telegram.py: add _proxy_request_httpx_kwargs() that reuses platform_httpx_limits() (from #18451, "[Bug]: CLOSE_WAIT fd leak causes all platforms to stop responding after ~1-2 hours") to bound keepalive while keeping max_connections at the configured pool size; pass it into the proxy-path HTTPXRequest constructions.
tests/gateway/test_telegram_proxy_pool_limits.py: regression tests asserting keepalive is tightened below httpx's 5s default, the max_connections ceiling is preserved, kwargs are empty when httpx is unavailable, and the shared keepalive env overrides apply.
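One of the regression cases above checks the behavior when httpx is unavailable. A common way to simulate a missing module in a test is to place None in sys.modules for the duration of the call, which makes the import raise ImportError. The sketch below shows that technique against a hypothetical stand-in helper; it is not the code in tests/gateway/test_telegram_proxy_pool_limits.py, and example_kwargs and its 2.0s expiry are assumptions made for illustration.

```python
import sys
from typing import Any, Dict
from unittest import mock


def example_kwargs(pool_size: int) -> Dict[str, Any]:
    """Hypothetical stand-in for a helper that degrades to {} without httpx."""
    try:
        import httpx
    except ImportError:
        return {}
    # Assumed illustrative value; the real shared tuning is not stated in this PR.
    return {"limits": httpx.Limits(max_connections=pool_size, keepalive_expiry=2.0)}


def kwargs_without_httpx(pool_size: int) -> Dict[str, Any]:
    """Call the helper while 'import httpx' is forced to fail."""
    # A None entry in sys.modules makes the import statement raise
    # ModuleNotFoundError (a subclass of ImportError). patch.dict restores
    # the original sys.modules contents when the block exits.
    with mock.patch.dict(sys.modules, {"httpx": None}):
        return example_kwargs(pool_size)


def test_kwargs_empty_when_httpx_unavailable() -> None:
    assert kwargs_without_httpx(512) == {}
```

This works whether or not httpx is installed, because the import is performed inside the helper each time it is called rather than once at module load.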

How to Test

pytest tests/gateway/test_telegram_proxy_pool_limits.py tests/gateway/test_platform_http_client_limits.py -q: all tests pass.
Manual: with a Telegram proxy configured, the proxy-path pools now build with the shared tighter httpx.Limits (keepalive_expiry < 5s, bounded max_keepalive_connections, max_connections still equal to HERMES_TELEGRAM_HTTP_POOL_SIZE), so idle/half-closed sockets through the proxy drain promptly instead of piling up as CLOSED fds.
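For the manual check, it can help to watch how close a process is to its fd limit over time. The following is a minimal, stdlib-only sketch that is not part of this PR. It assumes a Unix-like system that exposes /proc/self/fd (Linux) or /dev/fd (macOS), and it inspects the process that runs it, so it would need to be called from inside the gateway process to be meaningful; fd_usage is a hypothetical name.

```python
import os
import resource
from typing import Tuple


def fd_usage() -> Tuple[int, int]:
    """Return (open_fds, soft_limit) for the current process."""
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        if os.path.isdir(fd_dir):
            # The count is approximate: it may include the transient
            # descriptor opened to read the directory itself.
            open_fds = len(os.listdir(fd_dir))
            break
    else:
        raise RuntimeError("no per-process fd directory found on this platform")
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    return open_fds, soft_limit


if __name__ == "__main__":
    used, limit = fd_usage()
    print(f"open fds: {used} / soft limit: {limit}")
```

Sampling this periodically should show a steadily climbing count if sockets are leaking. From outside the process, lsof -p with the gateway's PID gives a per-socket view, including the connection state.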

What platforms tested on

macOS on darwin-arm64 (local): pytest + ruff check pass; the change is pure-Python httpx pool configuration with no platform-specific syscalls.

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix/feature (no unrelated commits)
I've run pytest tests/ -q and all tests pass (ran the affected gateway tests)
I've added tests for my changes (required for bug fixes, strongly encouraged for features)
I've tested on my platform: macOS 15 (darwin-arm64)

Documentation & Housekeeping

I've updated relevant documentation (README, docs/, docstrings) — or N/A (covered by docstrings)
I've updated cli-config.yaml.example if I added/changed config keys — N/A (reuses existing HERMES_GATEWAY_HTTPX_* env overrides)
I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A
I've considered cross-platform impact (Windows, macOS) per the compatibility guide — N/A (no platform-specific code)
I've updated tool descriptions/schemas if I changed tool behavior — N/A

The proxy-path HTTPXRequest pools only set max_connections, leaving keepalive_expiry at httpx's 5s default. Behind a flaky proxy, half-closed sockets accumulate faster than they drain and exhaust the fd budget after days of operation.

Reuse the shared NousResearch#18451 keepalive tuning while keeping the configured max_connections ceiling.

Refs NousResearch#31599

konsisumer · 2026-06-04T22:29:07Z

Closing in favor of #37400 by @datin-antasena, which addresses the same issue. Reopen if that PR stalls.

alt-glitch added the labels P2 (Medium: degraded but workaround exists), type/bug (Something isn't working), comp/gateway (Gateway runner, session dispatch, delivery), and platform/telegram (Telegram bot adapter) on May 24, 2026

This was referenced May 25, 2026

fix(gateway): bound Telegram proxy httpx pools to stop fd leak #31885 (Closed)

fix(gateway): bound Telegram general pool on proxy path to cap fd leak #32003 (Closed)

fix(telegram): bound proxy-path httpx pool keepalive to stop fd leak #32124 (Closed)

konsisumer closed this Jun 4, 2026

konsisumer wants to merge 1 commit into NousResearch:main from konsisumer:fix/telegram-proxy-pool-keepalive-leak


2 participants
