fix(telegram): self-reschedule reconnect when start_polling fails after 502 #3268
Merged
After a Telegram 502, _handle_polling_network_error calls updater.stop() then start_polling(). If start_polling() also raises, the old code logged a warning and returned — but the comment 'The next network error will trigger another attempt' was wrong. The updater loop is dead after stop(), so no further error callbacks ever fire. The gateway stays alive but permanently deaf to messages. Fix: when start_polling() fails in the except branch, schedule a new _handle_polling_network_error task to continue the exponential backoff retry chain. The task is tracked in _background_tasks (preventing GC). Guarded by has_fatal_error to avoid spurious retries during shutdown. Closes #3173. Salvaged from PR #3177 by Mibayy.
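The retry chain described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `FakeAdapter` class, its `_stop_updater` and `_start_polling` helpers, the `attempt` parameter, and the done-callback cleanup are all assumptions made for the example. Only the names `_handle_polling_network_error`, `_background_tasks`, and `has_fatal_error` come from the PR text. The backoff wait is replaced by `asyncio.sleep(0)` so the sketch runs instantly.

```python
import asyncio


class FakeAdapter:
    """Hypothetical stand-in for the Telegram adapter, for illustration only."""

    MAX_ATTEMPTS = 10  # retry limit quoted in the PR

    def __init__(self, failures_before_success):
        self.failures_left = failures_before_success
        self.has_fatal_error = False
        self.polling = False
        self.attempts = 0
        self._background_tasks = set()

    async def _stop_updater(self):
        # Stands in for updater.stop(): once the loop is stopped it can no
        # longer report network errors, so nothing else will trigger a retry.
        self.polling = False

    async def _start_polling(self):
        # Stands in for start_polling(); fails a configurable number of times.
        if self.failures_left > 0:
            self.failures_left -= 1
            raise ConnectionError("simulated 502")
        self.polling = True

    async def _handle_polling_network_error(self, attempt=1):
        if self.has_fatal_error or attempt > self.MAX_ATTEMPTS:
            return
        self.attempts = attempt
        await asyncio.sleep(0)  # real code would wait for the backoff delay
        await self._stop_updater()
        try:
            await self._start_polling()
        except Exception:
            if self.has_fatal_error:
                return  # shutting down: do not schedule another retry
            # Self-reschedule: the handler queues its own next attempt.
            task = asyncio.ensure_future(
                self._handle_polling_network_error(attempt + 1)
            )
            # Hold a strong reference so the task is not garbage-collected.
            self._background_tasks.add(task)
            task.add_done_callback(self._background_tasks.discard)


async def demo(failures):
    adapter = FakeAdapter(failures)
    await adapter._handle_polling_network_error()
    # Drain the chain of rescheduled retry tasks.
    while adapter._background_tasks:
        await asyncio.gather(*list(adapter._background_tasks))
    return adapter
```

With `asyncio.run(demo(3))`, the first three attempts fail and the fourth succeeds, leaving `polling` set and `_background_tasks` empty. Without the rescheduling branch, the handler would return after the first failure and polling would never resume.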
Summary
Closes #3173.
After a Telegram 502, _handle_polling_network_error calls updater.stop() then start_polling(). If start_polling() also raises, the old code logged a warning and returned with this comment: # The next network error will trigger another attempt.
That comment was wrong. The polling error callback only fires from the updater's internal loop; once stop() kills that loop, no further callbacks ever fire. The gateway stays alive but permanently deaf to messages.
Fix
When start_polling() fails, schedule a new _handle_polling_network_error task to continue the exponential backoff retry chain (5s → 10s → 20s → 40s → 60s cap, up to 10 attempts). The task is tracked in _background_tasks to prevent GC. Guarded by has_fatal_error to avoid spurious retries during shutdown.
Improvements over original PR
- Uses asyncio.ensure_future() instead of deprecated asyncio.get_event_loop().create_task()
- Tracks the task in the _background_tasks set (consistent with the task-tracking fix just merged in #3267, "fix: store asyncio task references to prevent GC mid-execution")
- Tests check _background_tasks state instead of mocking the event loop
Salvaged from #3177 by @Mibayy.
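The delays listed under Fix (5s → 10s → 20s → 40s → 60s cap, up to 10 attempts) are consistent with doubling from a 5-second base and clamping at 60 seconds. The sketch below shows one way to compute that schedule. The formula, the constant names, and `backoff_delay` are assumptions for illustration; the PR text does not show how the real code derives the delay.

```python
BASE_DELAY = 5     # seconds; first retry delay quoted in the PR
MAX_DELAY = 60     # seconds; cap quoted in the PR
MAX_ATTEMPTS = 10  # retry limit quoted in the PR


def backoff_delay(attempt: int) -> int:
    """Delay before retry number `attempt` (1-based): doubles each time, capped."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return min(BASE_DELAY * 2 ** (attempt - 1), MAX_DELAY)


# Full schedule for one retry chain under these assumptions.
schedule = [backoff_delay(n) for n in range(1, MAX_ATTEMPTS + 1)]
```

Under these assumptions the schedule is 5, 10, 20, 40, then 60 seconds for each of the remaining six attempts, so a chain that never reconnects waits 435 seconds in total before giving up.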
Test plan
test_telegram_network_reconnect.py