fix(gateway/telegram): reduce HTTPX pool-timeout failures during reconnect #6897

Closed
borischou wants to merge 1 commit into NousResearch:main from borischou:fix/telegram-httpx-pool-timeout-resilience
borischou commented Apr 10, 2026

Contributor

Summary

This PR hardens Telegram gateway networking under unstable connectivity and reconnect loops.

Changes

  • configure HTTPXRequest with safer defaults (env-overridable):
    • HERMES_TELEGRAM_HTTP_POOL_SIZE (default: 512)
    • HERMES_TELEGRAM_HTTP_POOL_TIMEOUT (default: 8.0s)
    • HERMES_TELEGRAM_HTTP_CONNECT_TIMEOUT (default: 10.0s)
    • HERMES_TELEGRAM_HTTP_READ_TIMEOUT (default: 20.0s)
    • HERMES_TELEGRAM_HTTP_WRITE_TIMEOUT (default: 20.0s)
  • use separate HTTPXRequest instances for request and get-updates paths to reduce pool contention
  • when a proxy is configured, skip fallback-IP transport (proxy already handles routing)
  • add explicit logging for fallback transport decisions
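The first two changes above could be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the environment variable names and defaults come from the list above, while `telegram_request_kwargs`, `build_requests`, and the rule of falling back to the default on invalid input are assumptions made for the example. `HTTPXRequest` and its keyword arguments are those of python-telegram-bot v20+.

```python
import os

# Env var -> (HTTPXRequest keyword, default, cast). Defaults come from the PR
# description; the helper names below are hypothetical.
_SETTINGS = {
    "HERMES_TELEGRAM_HTTP_POOL_SIZE": ("connection_pool_size", 512, int),
    "HERMES_TELEGRAM_HTTP_POOL_TIMEOUT": ("pool_timeout", 8.0, float),
    "HERMES_TELEGRAM_HTTP_CONNECT_TIMEOUT": ("connect_timeout", 10.0, float),
    "HERMES_TELEGRAM_HTTP_READ_TIMEOUT": ("read_timeout", 20.0, float),
    "HERMES_TELEGRAM_HTTP_WRITE_TIMEOUT": ("write_timeout", 20.0, float),
}


def telegram_request_kwargs(env=None):
    """Resolve HTTPXRequest keyword arguments from the environment."""
    env = os.environ if env is None else env
    kwargs = {}
    for name, (keyword, default, cast) in _SETTINGS.items():
        raw = (env.get(name) or "").strip()
        try:
            value = cast(raw) if raw else default
        except ValueError:
            value = default  # unparsable override: keep the default
        # Assumed policy: zero, negative, and NaN values also fall back.
        kwargs[keyword] = value if value > 0 else default
    return kwargs


def build_requests(env=None):
    """Return (request, get_updates_request) backed by two independent pools.

    Needs python-telegram-bot v20+; imported lazily so the helper above
    stays usable without it.
    """
    from telegram.request import HTTPXRequest

    kwargs = telegram_request_kwargs(env)
    return HTTPXRequest(**kwargs), HTTPXRequest(**kwargs)
```

The two objects would then be passed to `ApplicationBuilder().request(...)` and `ApplicationBuilder().get_updates_request(...)`, so that a stalled long poll cannot occupy connections needed by ordinary API calls.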

Why

Observed repeated runtime failures:

  • TimedOut: Pool timeout: All connections in the connection pool are occupied
  • often surfaced during polling reconnect/bootstrap (e.g. delete_webhook)

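The python-telegram-bot (PTB) default of pool_timeout=1.0s is too aggressive under transient network stress: a request that cannot get a free connection within one second fails immediately. Raising the pool size and timeouts, and keeping the request and get-updates paths on separate pools, reduces contention and improves resilience.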
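To make the failure mode concrete, here is a toy model of a connection pool built only on the standard library. It is not how HTTPX implements its pool; `ToyPool`, `PoolTimeout`, and `simulate` are hypothetical names, and the durations are scaled down so the example runs quickly. It shows that when every slot is held by a stalled request, a short pool timeout turns a brief stall into an error, while a longer one rides it out.

```python
import asyncio


class PoolTimeout(Exception):
    """Raised when no connection slot frees up within pool_timeout."""


class ToyPool:
    """Stand-in for a connection pool: a semaphore with an acquire deadline."""

    def __init__(self, size, pool_timeout):
        self._slots = asyncio.Semaphore(size)
        self._pool_timeout = pool_timeout

    async def request(self, duration):
        try:
            # Wait for a free slot, but only for pool_timeout seconds.
            await asyncio.wait_for(self._slots.acquire(), timeout=self._pool_timeout)
        except asyncio.TimeoutError:
            raise PoolTimeout("all connections in the pool are occupied") from None
        try:
            await asyncio.sleep(duration)  # pretend to talk to the network
            return "ok"
        finally:
            self._slots.release()


async def simulate(size, pool_timeout, stalled=0.3):
    pool = ToyPool(size, pool_timeout)
    # `size` requests stall at once (for example during a network blip) ...
    holders = [asyncio.create_task(pool.request(stalled)) for _ in range(size)]
    await asyncio.sleep(0.01)  # give the holders time to take every slot
    # ... and then one more call (think delete_webhook) needs a slot.
    try:
        result = await pool.request(0.0)
    except PoolTimeout:
        result = "pool-timeout"
    await asyncio.gather(*holders)
    return result


if __name__ == "__main__":
    print(asyncio.run(simulate(size=2, pool_timeout=0.05)))  # shorter than the stall
    print(asyncio.run(simulate(size=2, pool_timeout=1.0)))   # longer than the stall
```

In this model the only difference between the two runs is the pool timeout relative to the stall duration, which is the trade-off the new defaults adjust.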

Validation

  • venv/bin/python -m pytest tests/gateway/test_telegram_network.py -q (45 passed)
  • venv/bin/python -m pytest tests/gateway/test_telegram_network_reconnect.py tests/gateway/test_telegram_text_batching.py tests/gateway/test_telegram_reply_mode.py -q (34 passed)
  • local restart validation confirms successful reconnect in polling mode.

Notes

Behavior remains configurable via env vars; defaults are chosen for safer operation on flaky networks.

- configure Telegram HTTPXRequest pool/timeouts with env-overridable defaults
- use separate request/get_updates request objects to reduce pool contention
- skip fallback-IP transport when proxy is configured (or explicitly disabled)

This mitigates recurrent pool-timeout failures during polling reconnect/bootstrap (delete_webhook).
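The fallback-transport rule in the commit message can be read as a small decision function with a log line per branch. The sketch below is an assumption about its shape, not the merged code: `should_use_fallback_transport`, its parameters, the logger name, and the log wording are all hypothetical.

```python
import logging

logger = logging.getLogger("gateway.telegram")  # logger name is an assumption


def should_use_fallback_transport(proxy_url, fallback_disabled=False):
    """Return True when the fallback-IP transport should be installed.

    Hypothetical helper: how the proxy and the disable switch are actually
    configured is not shown in the PR description.
    """
    if fallback_disabled:
        logger.info("Telegram fallback-IP transport explicitly disabled")
        return False
    if proxy_url:
        # The proxy already handles routing, so fallback IPs add nothing.
        logger.info("Proxy configured; skipping Telegram fallback-IP transport")
        return False
    logger.info("No proxy configured; enabling Telegram fallback-IP transport")
    return True
```

The proxy URL itself is deliberately kept out of the log messages, since such URLs can embed credentials.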
@teknium1

Contributor

Merged via PR #7123. Your commit was cherry-picked onto current main with your authorship preserved. The HTTPX pool hardening is a real improvement — thanks!
