fix(telegram): detect wedged send path after reconnect storms (#31165) #31252

Closed
dskwe wants to merge 2 commits into NousResearch:main from dskwe:fix/cron-telegram-live-adapter-silent-drop

Conversation

@dskwe commented May 24, 2026

Contributor

Problem

Cron jobs configured with deliver: telegram:<chat_id> can silently drop messages after the Telegram gateway experiences a reconnect storm (sustained Bad Gateway / TimedOut errors). The gateway logs report successful delivery (delivered to telegram:... via live adapter) but the message never arrives in Telegram. No error is surfaced to the user.

Root Cause

After prolonged network error cycles, the python-telegram-bot library's internal httpx connection pool can enter a wedged state where:

  • bot.send_message() returns a valid Message object (real message_id, no exception)
  • The HTTP request is never actually transmitted to Telegram's API
  • The polling path (getUpdates) recovers independently via reconnection logic
  • The send path (sendMessage) silently fails with no error signal

The cron scheduler's live-adapter delivery path trusts the adapter's SendResult.success return value, so a wedged send looks identical to a successful delivery.
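To make the failure mode concrete, here is a minimal sketch of a caller that trusts the adapter's reported result. The `deliver` function and its two callables are hypothetical stand-ins, not the scheduler's actual code; `SendResult` is reduced to the single field discussed above.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class SendResult:
    # Simplified stand-in for the gateway's SendResult (assumption: only
    # the success field matters for this illustration).
    success: bool


async def deliver(live_send, standalone_send, chat_id, text):
    """Try the live adapter first; fall back only on a reported failure.

    Hypothetical helper: it returns the name of the path that was used.
    """
    result = await live_send(chat_id, text)
    if result.success:
        # A wedged send lands here too, because it reports success=True.
        return "live"
    await standalone_send(chat_id, text)
    return "standalone"
```

With this shape, the fallback can only run if the adapter reports `success=False`, which is why a wedged send that reports success is never retried.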

Fix

Add send-path health tracking to TelegramAdapter:

  1. _send_path_degraded flag: set to True when _handle_polling_network_error fires (reconnect storm detected)
  2. Post-send probe: when degraded, send() runs a lightweight getMe() after the actual send. If the probe times out (5s) or fails, send() returns SendResult(success=False) instead of the false-positive result
  3. Flag clearing: _verify_polling_after_reconnect clears the flag after a successful getMe() probe confirms recovery
  4. Fallback: returning success=False lets the cron scheduler's existing fallback mechanism deliver via the standalone path (fresh HTTP session)
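The four steps above can be sketched roughly as follows. This is an illustrative model, not the actual diff: `AdapterSketch` and `FakeBot` are hypothetical stand-ins for `TelegramAdapter` and the PTB bot, the `error` field on `SendResult` is an assumption, and the configurable `probe_timeout` exists only so the sketch can be exercised quickly.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class SendResult:
    # Simplified stand-in for the gateway's SendResult (fields assumed).
    success: bool
    message_id: Optional[int] = None
    error: Optional[str] = None


class FakeBot:
    """Hypothetical bot that can simulate a wedged client."""

    def __init__(self, wedged: bool = False):
        self.wedged = wedged

    async def send_message(self, chat_id, text):
        # Even when wedged, the call returns a message id and does not raise.
        return 1234

    async def get_me(self):
        if self.wedged:
            await asyncio.sleep(3600)  # simulate a probe that never completes
        return {"id": 1, "is_bot": True}


class AdapterSketch:
    def __init__(self, bot, probe_timeout: float = 5.0):
        self.bot = bot
        self.probe_timeout = probe_timeout  # 5s, as in the description
        self._send_path_degraded = False

    def _handle_polling_network_error(self, exc):
        # Step 1: a reconnect storm marks the send path as suspect.
        self._send_path_degraded = True

    async def _verify_polling_after_reconnect(self):
        # Step 3: a successful getMe() probe clears the flag.
        await asyncio.wait_for(self.bot.get_me(), timeout=self.probe_timeout)
        self._send_path_degraded = False

    async def send(self, chat_id, text):
        message_id = await self.bot.send_message(chat_id, text)
        if self._send_path_degraded:
            # Step 2: probe after the send; a timeout or error means the
            # apparent success cannot be trusted.
            try:
                await asyncio.wait_for(
                    self.bot.get_me(), timeout=self.probe_timeout
                )
            except Exception as exc:
                # Step 4: report failure so the caller can fall back.
                return SendResult(False, error=f"send path degraded: {exc!r}")
        return SendResult(True, message_id=message_id)
```

In this sketch a healthy adapter pays no extra cost per send; the probe only runs while the flag is set, so the added `getMe()` round trip is limited to the window after a reconnect storm.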

Files changed: gateway/platforms/telegram.py (+39 lines)

Changes

  • gateway/platforms/telegram.py: Added _send_path_degraded, _last_reconnect_error_at fields; reconnect storm detection in error handler; post-send health probe in send(); flag clearing in reconnect verification

How to Test

  1. ./venv/bin/python3 -m pytest tests/gateway/test_telegram_send_path_health.py -v — 5 new tests
  2. ./venv/bin/python3 -m pytest tests/gateway/ -v — all existing gateway tests pass
  3. Manual: trigger a Telegram reconnect storm, observe that cron delivery falls back to standalone path instead of silent drop

dskwe added 2 commits May 24, 2026 09:58
…search#31165)

After sustained Bad Gateway / TimedOut reconnect storms, the PTB httpx
client can enter a state where bot.send_message() returns a valid
Message object (real message_id) but the message never reaches the
recipient. The gateway's polling path recovers (getUpdates works) but
the send path (sendMessage) silently drops.

This adds a _send_path_degraded flag to TelegramAdapter:
- Set to True when _handle_polling_network_error fires
- Cleared when _verify_polling_after_reconnect's getMe() probe passes
- When degraded, send() runs a post-send getMe() probe; if the probe
  fails, returns SendResult(success=False) so callers (cron live-adapter
  branch) fall through to the standalone delivery path which uses a
  fresh HTTP session.
@alt-glitch added the type/bug, P1, comp/gateway, platform/telegram, and comp/cron labels May 24, 2026
@teknium1

Contributor

Merged via #31441 with your authorship preserved (commit 476c897 on main). I reshaped the fix down to 14 LOC: reused the existing reconnect heartbeat probe in _verify_polling_after_reconnect as the source of truth instead of a per-send getMe() probe — same _send_path_degraded design, just one source of truth instead of two probe sites. Thanks for the diagnosis and the salvage-friendly shape!

@teknium1 closed this May 24, 2026

Labels

  • comp/cron: Cron scheduler and job management
  • comp/gateway: Gateway runner, session dispatch, delivery
  • P1: High (major feature broken, no workaround)
  • platform/telegram: Telegram bot adapter
  • type/bug: Something isn't working


3 participants