Bug Description
Messages from scheduled cron jobs with deliver: telegram:CHAT_ID can stop arriving in Telegram after the gateway has run through sustained Telegram reconnect storms (Bad Gateway / TimedOut). The scheduler still records the delivery as successful:
- jobs.json shows last_status: ok and last_delivery_error: null
- the cron output file contains a full, non-empty response
- scheduler logs say the job was delivered to telegram:CHAT_ID via live adapter
- but the Telegram message never reaches the user
Restarting the gateway consistently restores delivery. This points to the long-running TelegramAdapter / python-telegram-bot client entering a bad state after reconnect loops, while cron's live-adapter branch still treats sends as successful.
Affected Components
- cron/scheduler.py: _deliver_result, live-adapter branch using runtime_adapter.send(...) via asyncio.run_coroutine_threadsafe
- gateway/platforms/telegram.py: TelegramAdapter.send
- python-telegram-bot 22.7
Observed Behavior
Multiple cron jobs configured with deliver: telegram:... were affected at once, so this does not appear to be job-specific.
Typical evidence from the broken state:
jobs.json: last_status = ok
jobs.json: last_delivery_error = null
~/.hermes/cron/output/{job_id}/...md contains non-empty output
INFO cron.scheduler: Job '{job_id}': delivered to telegram:CHAT_ID via live adapter
Actual result: no Telegram message arrives.
The condition appears after reconnect bursts like:
[Telegram] Telegram network error, scheduling reconnect: Bad Gateway
[Telegram] Telegram network error (attempt 1/10), reconnecting in 5s. Error: Bad Gateway
telegram.error.TimedOut: Timed out
gateway_state.json continues to report platforms.telegram.state == "connected"; its updated_at can remain frozen at the last successful state transition, usually gateway startup.
Expected Behavior
If TelegramAdapter.send() returns SendResult(success=True, message_id="1234"), the message should actually be delivered to the configured chat.
If the live adapter is unhealthy or Telegram refuses/drops the send, the adapter should surface a failure so cron can either:
- fall through to the standalone delivery path, or
- record a delivery error / retryable delivery failure instead of marking the run as successfully delivered.
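The fall-through behavior described above could look roughly like the following. This is a minimal sketch, not the scheduler's actual code: SendResult here is a simplified stand-in, and DeliveryOutcome, deliver_with_fallback, live_send, and standalone_send are hypothetical names introduced for illustration. Note that it only helps once the adapter surfaces a failure (or hangs); it cannot detect the false-success case on its own.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class SendResult:
    # Simplified stand-in for the gateway's SendResult (assumption).
    success: bool
    message_id: Optional[str] = None
    error: Optional[str] = None


@dataclass
class DeliveryOutcome:
    delivered: bool
    path: str                      # "live", "standalone", or "none"
    delivery_error: Optional[str]  # value that would go to last_delivery_error


async def deliver_with_fallback(
    live_send: Optional[Callable[[], Awaitable[SendResult]]],
    standalone_send: Callable[[], Awaitable[SendResult]],
    live_timeout: float = 30.0,
) -> DeliveryOutcome:
    """Try the live adapter first; fall through to standalone on failure."""
    live_error = None
    if live_send is not None:
        try:
            result = await asyncio.wait_for(live_send(), timeout=live_timeout)
            if result.success:
                return DeliveryOutcome(True, "live", None)
            live_error = result.error or "live adapter reported failure"
        except Exception as exc:  # includes timeouts from wait_for
            live_error = f"{type(exc).__name__}: {exc}"

    try:
        result = await standalone_send()
    except Exception as exc:
        result = SendResult(False, error=f"{type(exc).__name__}: {exc}")
    if result.success:
        return DeliveryOutcome(True, "standalone", None)

    # Both paths failed: record an error instead of marking the run delivered.
    errors = [e for e in (live_error, result.error) if e]
    return DeliveryOutcome(False, "none", "; ".join(errors) or "unknown delivery failure")
```

With this shape, a run is only recorded as delivered when one of the two paths reports success, and the combined error text is available for last_delivery_error otherwise.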
Diagnostic Evidence
After a fresh gateway restart, the same manually triggered cron job delivered successfully and diagnostic logging around the live-adapter call showed:
WARNING cron.scheduler: DIAG cron-deliver job=JOB_ID plat=telegram chat=CHAT_ID
adapter='TelegramAdapter' loop_running=True text_len=890 skip_live=False
WARNING cron.scheduler: DIAG live-adapter-result job=JOB_ID type=SendResult
repr=SendResult(success=True, message_id='1245', error=None,
raw_response={'message_ids': ['1245']},
retryable=False, continuation_message_ids=())
success_attr=True
INFO cron.scheduler: Job 'JOB_ID': delivered to telegram:CHAT_ID via live adapter
That message arrived. In the broken state before restart, the same job and configuration had repeatedly reported last_status: ok with no delivery.
A standalone send in a separate process using the same bot token, chat id, and platform config succeeded while the cron/live-adapter path was the suspected failure point:
import asyncio
from gateway.config import Platform, load_gateway_config
from tools.send_message_tool import _send_to_platform
async def main():
    cfg = load_gateway_config()
    pconfig = cfg.platforms.get(Platform.TELEGRAM)
    # Standalone send path; does not touch the gateway's live adapter.
    result = await _send_to_platform(Platform.TELEGRAM, pconfig, "CHAT_ID", "ping")
    print(result)
    # {'success': True, 'platform': 'telegram', 'chat_id': '...', 'message_id': '1243'}
asyncio.run(main())
The standalone message arrived. The later live-adapter cron delivery reported a nearby message_id (1245 vs. 1243), which is consistent with both paths using the same bot/chat backend.
Workaround
Locally, cron delivery was patched to skip the live-adapter branch for Telegram and always use the standalone path. Since standalone delivery is already used by send_message tool calls, this restored cron Telegram delivery in the affected stack.
This is not a complete upstream fix because some platforms may need a live adapter (for example E2EE-only Matrix/Signal paths), but it suggests Telegram cron delivery should not blindly trust a long-lived adapter that has survived repeated reconnect errors.
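The local workaround amounts to a per-platform decision about whether to bypass the live adapter. A minimal sketch, assuming platforms are identified by lowercase name strings; LIVE_ADAPTER_REQUIRED and should_skip_live_adapter are hypothetical names, and the set contents simply mirror the Matrix/Signal example above rather than any confirmed project list.

```python
from typing import FrozenSet

# Assumption: platforms that can only send through a live adapter
# (for example because of E2EE session state).
LIVE_ADAPTER_REQUIRED: FrozenSet[str] = frozenset({"matrix", "signal"})


def should_skip_live_adapter(platform: str, live_adapter_available: bool) -> bool:
    """Return True when cron should go straight to the standalone send path."""
    if not live_adapter_available:
        # Nothing to use; the standalone path is the only option.
        return True
    # Telegram and other platforms not in the set bypass the long-lived client.
    return platform.lower() not in LIVE_ADAPTER_REQUIRED
```

Keeping the exception list explicit means Telegram defaults to the standalone path while platforms that need live-adapter semantics keep their current behavior.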
Suspected Root Cause
After sustained Bad Gateway / TimedOut reconnect loops, the python-telegram-bot Bot instance held by TelegramAdapter._bot may enter a wedged state where bot.send_message() returns a Message object (so TelegramAdapter.send returns SendResult(success=True, message_id=...)), but the message is not transmitted in a way that reaches the recipient.
The gateway's own state machine still reports Telegram as connected because polling/reconnect state and send-path health are not independently verified.
Possible mechanisms:
- PTB/httpx client is wedged on a stale connection and incorrectly reports success.
- Polling/getUpdates recovers but sendMessage is not healthy.
- The request is accepted against an unexpected chat/topic context, though a standalone probe with the same chat id worked.
Suggested Fix Directions
In order of increasing intrusiveness:
- Add periodic Telegram adapter health checks (getMe() or a configured debug-channel self-send) and force a full adapter reconnect/rebuild if checks fail.
- Count consecutive Bad Gateway / TimedOut reconnect errors. After a threshold, discard and recreate the PTB Bot and Application objects rather than reusing the same client.
- In cron delivery, prefer the standalone Telegram path over the live adapter unless the platform explicitly requires live-adapter semantics.
- At minimum, fall through if SendResult lacks raw_response, message_id, or other strong delivery evidence. This will not catch the observed real-looking message_id case, but it is still defensive.
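The first two directions (periodic health checks plus a consecutive-error threshold) could be combined along these lines. This is a sketch under stated assumptions: ReconnectBudget and probe_and_maybe_rebuild are hypothetical names, probe stands for any awaitable health check (for example a wrapper around python-telegram-bot's Bot.get_me(), or a debug-channel self-send), and rebuild stands for whatever the adapter would do to discard and recreate its Bot and Application objects.

```python
import asyncio
from typing import Awaitable, Callable


class ReconnectBudget:
    """Counts consecutive failures and signals when a rebuild is due."""

    def __init__(self, threshold: int = 5) -> None:
        self.threshold = threshold
        self.consecutive_errors = 0

    def record_error(self) -> bool:
        """Return True once the threshold is reached."""
        self.consecutive_errors += 1
        return self.consecutive_errors >= self.threshold

    def record_success(self) -> None:
        self.consecutive_errors = 0


async def probe_and_maybe_rebuild(
    probe: Callable[[], Awaitable[object]],
    rebuild: Callable[[], Awaitable[None]],
    budget: ReconnectBudget,
    probe_timeout: float = 10.0,
) -> bool:
    """Run one health probe; return True if the client was rebuilt."""
    try:
        await asyncio.wait_for(probe(), timeout=probe_timeout)
    except Exception:  # network errors and probe timeouts
        if budget.record_error():
            await rebuild()
            budget.record_success()  # start counting afresh on the new client
            return True
        return False
    budget.record_success()
    return False
```

One caveat follows from the possible mechanisms listed earlier: if getUpdates or getMe can recover while sendMessage stays unhealthy, a getMe-based probe may pass in the broken state, so a self-send to a debug channel is likely the stronger check.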
Steps to Reproduce
This is non-deterministic and depends on Telegram/network instability:
- Run hermes-gateway continuously for several days with Telegram enabled.
- Let Telegram encounter repeated Bad Gateway / timeout reconnect bursts, or simulate intermittent outbound HTTPS failures to api.telegram.org.
- Trigger any cron job with deliver: telegram:CHAT_ID.
- Observe that cron reports successful live-adapter delivery but the message does not arrive.
- Restart the gateway.
- Trigger the same cron job again; delivery resumes.
Environment
- hermes-agent commit b833d8501 / tag v2026.5.7
- Python 3.11.2
- python-telegram-bot 22.7
- Debian 12, Linux 6.1.0-44 amd64
- Gateway managed by systemd as a non-root user
- Telegram was the only configured messaging platform in the affected stack
Related / Not Duplicates
Existing related issues cover nearby symptoms but not this exact false-success live-adapter failure mode:
- Bad Gateway reconnect loop can make the gateway unresponsive. This issue differs because the gateway and cron keep running, but live-adapter cron delivery silently drops while reporting success.
- send_message_tool._send_telegram lacking Telegram fallback transport on blocked networks. This report is the inverse: standalone send works, live adapter is suspected broken.