fix(gateway): wire HERMES_TELEGRAM_HTTP_MEDIA_WRITE_TIMEOUT to PTB so large media uploads stop hitting the 20s default#21777

Open
tangivis wants to merge 3 commits into NousResearch:main from tangivis:fix/telegram-media-write-timeout

@tangivis tangivis commented May 8, 2026

What does this PR do?

Telegram media uploads (send_video, send_document, large send_photo) hit WriteTimeout after exactly 20 seconds because both Telegram send paths construct PTB's HTTPXRequest without passing media_write_timeout. The existing env var infrastructure suggests this is meant to be tunable — the gateway adapter already plumbs connect_timeout, read_timeout, write_timeout, pool_timeout through HERMES_TELEGRAM_HTTP_* — but media_write_timeout was simply never wired up.

Result: setting HERMES_TELEGRAM_HTTP_MEDIA_WRITE_TIMEOUT=600 in .env does nothing. Bumping HERMES_TELEGRAM_HTTP_WRITE_TIMEOUT doesn't help either, because PTB applies a separate media_write_timeout to multipart uploads — write_timeout only governs JSON message sends.

This PR plugs the gap in both Telegram send paths and defaults to 300s, in line with the adapter's other generous request timeouts. Users who need more for very large files on slow uplinks can override the default via HERMES_TELEGRAM_HTTP_MEDIA_WRITE_TIMEOUT.
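
For reviewers who want the env handling in one place, here is a minimal sketch of reading the variable with a 300s default and a graceful fallback on malformed input. The helper name `read_timeout_env` is hypothetical and not taken from the diff. Rejecting non-positive or non-finite values is an assumption about what "malformed" should cover.

```python
import math
import os
from typing import Mapping, Optional

# Default stated in this PR description.
DEFAULT_MEDIA_WRITE_TIMEOUT = 300.0


def read_timeout_env(
    name: str,
    default: float,
    env: Optional[Mapping[str, str]] = None,
) -> float:
    """Return a timeout in seconds from the environment, or `default`.

    Hypothetical helper: falls back to `default` when the variable is
    unset, empty, not a number, non-finite, or not strictly positive.
    """
    source = os.environ if env is None else env
    raw = source.get(name)
    if raw is None or not raw.strip():
        return default
    try:
        value = float(raw)
    except ValueError:
        return default
    # Assumption: zero, negative, inf and nan are treated as malformed.
    if not math.isfinite(value) or value <= 0:
        return default
    return value


media_write_timeout = read_timeout_env(
    "HERMES_TELEGRAM_HTTP_MEDIA_WRITE_TIMEOUT",
    DEFAULT_MEDIA_WRITE_TIMEOUT,
)
```

The resulting float is what would be passed as `media_write_timeout=` when constructing PTB's HTTPXRequest.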

Related Issue

Fixes #21757

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • gateway/platforms/telegram.py:936-948 — add media_write_timeout to request_kwargs so both the polling client and the dedicated get_updates client receive it. Default 300s, override via HERMES_TELEGRAM_HTTP_MEDIA_WRITE_TIMEOUT.
  • tools/send_message_tool.py:_send_telegram — replace the bare Bot(token=token) with an HTTPXRequest-configured client that mirrors the gateway adapter's env vars. Cron-delivered media and direct send_message tool calls also need the longer timeout, otherwise they hit the same 20s ceiling.
  • tests/gateway/test_telegram_timeouts.py (new) — three tests covering default, env override, and graceful fallback for malformed env values.
  • tests/tools/test_send_message_tool.py — updated _install_telegram_mock helper (now also mocks telegram.request and accepts the new request= kwarg on Bot); added TestSendTelegramTimeoutWiring mirroring the gateway tests.
  • website/docs/reference/environment-variables.md:419 — document the new HERMES_TELEGRAM_HTTP_MEDIA_WRITE_TIMEOUT knob alongside the existing Telegram HTTP timeouts.
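
The tool-path change listed above can be sketched as follows. This is an illustration under stated assumptions, not the code in the diff: `build_telegram_bot`, `FakeRequest`, and `FakeBot` are hypothetical names, and the request and bot classes are injected so the snippet runs without python-telegram-bot installed. With PTB available, `request_cls` would be `telegram.request.HTTPXRequest` and `bot_cls` would be `telegram.Bot`. The other HERMES_TELEGRAM_HTTP_* timeouts are left out because their defaults are not stated here.

```python
from typing import Any, Callable, Mapping

ENV_NAME = "HERMES_TELEGRAM_HTTP_MEDIA_WRITE_TIMEOUT"
DEFAULT_MEDIA_WRITE_TIMEOUT = 300.0  # default stated in this PR


def _float_env(env: Mapping[str, str], name: str, default: float) -> float:
    """Parse a float from env, falling back to default on bad input."""
    try:
        return float(env[name])
    except (KeyError, ValueError):
        return default


def build_telegram_bot(
    token: str,
    env: Mapping[str, str],
    request_cls: Callable[..., Any],
    bot_cls: Callable[..., Any],
) -> Any:
    """Hypothetical wiring: build a bot whose request object carries
    the media write timeout instead of using a bare bot_cls(token=token)."""
    timeout = _float_env(env, ENV_NAME, DEFAULT_MEDIA_WRITE_TIMEOUT)
    request = request_cls(media_write_timeout=timeout)
    return bot_cls(token=token, request=request)


# Stand-ins that record what they were constructed with.
class FakeRequest:
    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


class FakeBot:
    def __init__(self, token: str, request: Any = None) -> None:
        self.token = token
        self.request = request


bot = build_telegram_bot("123:abc", {ENV_NAME: "600"}, FakeRequest, FakeBot)
print(bot.request.kwargs)  # {'media_write_timeout': 600.0}
```

Injecting the classes keeps the wiring testable with plain stand-ins, which is similar in spirit to mocking telegram.request in the updated test helper.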

How to Test

Repro on main (faa13e49f):

  1. Configure a Telegram bot, install the gateway, and have the agent reply with a video file ≥ ~10 MB on a real uplink (≤ 5 Mbps upstream is enough to trigger).
  2. Tail journalctl --user -u hermes-gateway -f and observe httpx.WriteTimeout after ~20s.
  3. Add HERMES_TELEGRAM_HTTP_MEDIA_WRITE_TIMEOUT=600 to ~/.hermes/.env, restart the gateway → same failure, because the env var was never read.

After this PR:

  1. Same setup; large media uploads now succeed (default 300s budget).
  2. With HERMES_TELEGRAM_HTTP_MEDIA_WRITE_TIMEOUT=600 set, HTTPXRequest is constructed with media_write_timeout=600.0 (verified by the new tests).

Test runs

$ python -m pytest tests/gateway/test_telegram_timeouts.py tests/tools/test_send_message_tool.py::TestSendTelegramTimeoutWiring -v -o addopts=
======================== 6 passed in 23.39s ========================

$ python -m pytest tests/tools/test_send_message_tool.py tests/gateway/test_telegram_conflict.py tests/gateway/test_telegram_documents.py tests/gateway/test_telegram_timeouts.py -q -o addopts=
======================== 148 passed, 2 warnings in 112.33s ========================

The suite is pinned via scripts/run_tests.sh for parity, but I ran it in single-process mode here because xdist was thrashing on this machine (unrelated load).

Why a Telegram-specific env var (rather than a generic gateway one)

The repo already has a generic gateway HTTP knob — HERMES_GATEWAY_HTTPX_* in gateway/platforms/_http_client_limits.py — but it controls httpx.Limits (keepalive, max-connections), not request-level timeouts. media_write_timeout is unique to PTB's HTTPXRequest wrapper; raw httpx and the other platforms' SDKs (discord.py, slack_bolt, baileys) don't have an equivalent concept. Keeping this Telegram-specific avoids a leaky abstraction.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(gateway):)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix
  • I've run pytest and the targeted tests pass; full Telegram suite (148 tests) is green
  • I've added tests for my changes (3 gateway + 3 tool path)
  • I've tested on my platform: Ubuntu 22.04, Python 3.11.15, python-telegram-bot 22.7

Documentation & Housekeeping

  • I've updated relevant documentation — website/docs/reference/environment-variables.md
  • I've updated cli-config.yaml.example if I added/changed config keys — N/A (env var only, no YAML key)
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A
  • I've considered cross-platform impact (Windows, macOS) — N/A (env-var read only, no platform-specific code)
  • I've updated tool descriptions/schemas if I changed tool behavior — N/A (timeout change is invisible to the agent)

Both Telegram send paths construct python-telegram-bot's HTTPXRequest
without setting media_write_timeout, so multipart uploads (send_video,
send_document, large send_photo) fall back to PTB's 20s default. That
default is too short for any meaningful media on real-world uplinks,
and the existing HERMES_TELEGRAM_HTTP_* env vars couldn't override it
because the binding was missing in code.

Fixes the wiring in both paths and defaults to 300s — matching the
adapter's other generous request timeouts and giving users a single
env var (HERMES_TELEGRAM_HTTP_MEDIA_WRITE_TIMEOUT) to bump higher when
needed:

- gateway/platforms/telegram.py:936-942 — add the missing key to
  request_kwargs so both the polling client and the dedicated
  get_updates client share the same media-write budget.
- tools/send_message_tool.py:_send_telegram — replace the bare
  Bot(token=token) with an HTTPXRequest-configured client that mirrors
  the adapter's env vars. Without this, cron-driven sends and direct
  send_message tool calls hit the same 20s ceiling.

Tests cover the default value, env override, and graceful fallback on
malformed env values, for both code paths.

Fixes NousResearch#21757
@alt-glitch added labels on May 8, 2026: type/bug (Something isn't working), comp/gateway (Gateway runner, session dispatch, delivery), comp/tools (Tool registry, model_tools, toolsets), platform/telegram (Telegram bot adapter), P2 (Medium, degraded but workaround exists)
tangivis and others added 2 commits May 8, 2026 10:35
Match the surrounding request_kwargs style (other entries are bare).
The block-level rationale already covers PTB defaults at line 920-922.

Per review feedback.
Development

Successfully merging this pull request may close these issues.

Gateway Telegram adapter ignores HERMES_TELEGRAM_HTTP_MEDIA_WRITE_TIMEOUT — large media uploads time out at PTB's 20s default
