Skip to content

fix(telegram): self-reschedule reconnect when start_polling fails after 502#3268

Merged
teknium1 merged 1 commit into
mainfrom
hermes/hermes-dd753a5f
Mar 26, 2026
Merged

fix(telegram): self-reschedule reconnect when start_polling fails after 502#3268
teknium1 merged 1 commit into
mainfrom
hermes/hermes-dd753a5f

Conversation

@teknium1

Copy link
Copy Markdown
Contributor

Summary

Closes #3173.

After a Telegram 502, _handle_polling_network_error calls updater.stop() then start_polling(). If start_polling() also raises, the old code logged a warning and returned with this comment:

# The next network error will trigger another attempt.

That comment was wrong. The polling error callback only fires from the updater's internal loop — once stop() kills that loop, no further callbacks ever fire. The gateway stays alive but permanently deaf to messages.

Fix

When start_polling() fails, schedule a new _handle_polling_network_error task to continue the exponential backoff retry chain (5s → 10s → 20s → 40s → 60s cap, up to 10 attempts). The task is tracked in _background_tasks to prevent GC. Guarded by has_fatal_error to avoid spurious retries during shutdown.

Improvements over original PR

Salvaged from #3177 by @Mibayy.

Test plan

  • 4 new tests in test_telegram_network_reconnect.py
  • Gateway suite: 1471 passed

After a Telegram 502, _handle_polling_network_error calls updater.stop()
then start_polling(). If start_polling() also raises, the old code logged
a warning and returned — but the comment 'The next network error will
trigger another attempt' was wrong. The updater loop is dead after stop(),
so no further error callbacks ever fire. The gateway stays alive but
permanently deaf to messages.

Fix: when start_polling() fails in the except branch, schedule a new
_handle_polling_network_error task to continue the exponential backoff
retry chain. The task is tracked in _background_tasks (preventing GC).
Guarded by has_fatal_error to avoid spurious retries during shutdown.

Closes #3173.
Salvaged from PR #3177 by Mibayy.
@teknium1 teknium1 merged commit 6610c37 into main Mar 26, 2026
1 of 2 checks passed
angelburgosrosado pushed a commit to angelburgosrosado/hermes-agent that referenced this pull request Apr 27, 2026
…usResearch#3268)

After a Telegram 502, _handle_polling_network_error calls updater.stop()
then start_polling(). If start_polling() also raises, the old code logged
a warning and returned — but the comment 'The next network error will
trigger another attempt' was wrong. The updater loop is dead after stop(),
so no further error callbacks ever fire. The gateway stays alive but
permanently deaf to messages.

Fix: when start_polling() fails in the except branch, schedule a new
_handle_polling_network_error task to continue the exponential backoff
retry chain. The task is tracked in _background_tasks (preventing GC).
Guarded by has_fatal_error to avoid spurious retries during shutdown.

Closes NousResearch#3173.
Salvaged from PR NousResearch#3177 by Mibayy.
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
…usResearch#3268)

After a Telegram 502, _handle_polling_network_error calls updater.stop()
then start_polling(). If start_polling() also raises, the old code logged
a warning and returned — but the comment 'The next network error will
trigger another attempt' was wrong. The updater loop is dead after stop(),
so no further error callbacks ever fire. The gateway stays alive but
permanently deaf to messages.

Fix: when start_polling() fails in the except branch, schedule a new
_handle_polling_network_error task to continue the exponential backoff
retry chain. The task is tracked in _background_tasks (preventing GC).
Guarded by has_fatal_error to avoid spurious retries during shutdown.

Closes NousResearch#3173.
Salvaged from PR NousResearch#3177 by Mibayy.
olympus-terminal pushed a commit to olympus-terminal/hermes-agent that referenced this pull request May 16, 2026
…usResearch#3268)

After a Telegram 502, _handle_polling_network_error calls updater.stop()
then start_polling(). If start_polling() also raises, the old code logged
a warning and returned — but the comment 'The next network error will
trigger another attempt' was wrong. The updater loop is dead after stop(),
so no further error callbacks ever fire. The gateway stays alive but
permanently deaf to messages.

Fix: when start_polling() fails in the except branch, schedule a new
_handle_polling_network_error task to continue the exponential backoff
retry chain. The task is tracked in _background_tasks (preventing GC).
Guarded by has_fatal_error to avoid spurious retries during shutdown.

Closes NousResearch#3173.
Salvaged from PR NousResearch#3177 by Mibayy.
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…usResearch#3268)

After a Telegram 502, _handle_polling_network_error calls updater.stop()
then start_polling(). If start_polling() also raises, the old code logged
a warning and returned — but the comment 'The next network error will
trigger another attempt' was wrong. The updater loop is dead after stop(),
so no further error callbacks ever fire. The gateway stays alive but
permanently deaf to messages.

Fix: when start_polling() fails in the except branch, schedule a new
_handle_polling_network_error task to continue the exponential backoff
retry chain. The task is tracked in _background_tasks (preventing GC).
Guarded by has_fatal_error to avoid spurious retries during shutdown.

Closes NousResearch#3173.
Salvaged from PR NousResearch#3177 by Mibayy.
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
…usResearch#3268)

After a Telegram 502, _handle_polling_network_error calls updater.stop()
then start_polling(). If start_polling() also raises, the old code logged
a warning and returned — but the comment 'The next network error will
trigger another attempt' was wrong. The updater loop is dead after stop(),
so no further error callbacks ever fire. The gateway stays alive but
permanently deaf to messages.

Fix: when start_polling() fails in the except branch, schedule a new
_handle_polling_network_error task to continue the exponential backoff
retry chain. The task is tracked in _background_tasks (preventing GC).
Guarded by has_fatal_error to avoid spurious retries during shutdown.

Closes NousResearch#3173.
Salvaged from PR NousResearch#3177 by Mibayy.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Gateway crashes on Telegram Bad Gateway (502) — reconnect loop fails

1 participant