fix(telegram): increase cold-boot retry budget and refresh fallback IPs #5770
Closed
Bartok9 wants to merge 1 commit into
On systems where Hermes starts at boot time (launchd, systemd), the Telegram platform's connect() can fail before the OS network stack is ready. The gateway stays alive but Telegram is silently dead.

Root causes:
1. Retry budget too small (3 attempts / ~3s) for cold boot (10-30s+)
2. discover_fallback_ips() called once before retry loop, caching the failure state for all subsequent attempts

Changes:
- Increase retry budget to 8 attempts (~60s total) with capped backoff
- Move fallback IP discovery and app building inside the retry loop so each attempt gets fresh network state
- Log exhaustion clearly before raising

Total backoff: 1+2+4+8+15+15+15 = ~60 seconds, covering typical delays.

Fixes NousResearch#5729
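The retry structure described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `connect_with_retry`, the `discover` and `connect` callables, and the injectable `sleep` are hypothetical names, and the sketch assumes network failures surface as `OSError`. Only the attempt count (8), the 15-second cap, and the idea of re-running discovery inside the loop come from the description.

```python
import asyncio
import logging
from typing import Awaitable, Callable, List

logger = logging.getLogger("telegram.connect")

MAX_ATTEMPTS = 8       # from the PR description
BACKOFF_CAP_S = 15.0   # from the PR description


async def connect_with_retry(
    discover: Callable[[], Awaitable[List[str]]],     # hypothetical stand-in for discover_fallback_ips()
    connect: Callable[[List[str]], Awaitable[None]],  # hypothetical stand-in for app build + connect()
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
    max_attempts: int = MAX_ATTEMPTS,
    cap: float = BACKOFF_CAP_S,
) -> int:
    """Return the attempt number that succeeded; re-raise the last error on exhaustion."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            # Discovery runs on every attempt, so a result obtained while the
            # network was still down is never reused once it comes up.
            fallback_ips = await discover()
            await connect(fallback_ips)
            return attempt
        except OSError as exc:  # assumption: network failures raise OSError
            if attempt == max_attempts:
                logger.error("connect failed after %d attempts: %s", attempt, exc)
                raise
            delay = min(2.0 ** (attempt - 1), cap)  # 1, 2, 4, 8, then capped at 15
            logger.warning("attempt %d failed (%s); retrying in %.0fs", attempt, exc, delay)
            await sleep(delay)
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter is a design choice for this sketch only: it lets a test simulate a network that comes up after a few failed attempts without actually waiting.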
teknium1 pushed a commit that referenced this pull request Apr 16, 2026
Bump connect retry attempts from 3 to 8 and cap exponential backoff at 15 seconds. Old budget: 3 attempts, 1+2+4=7s total — insufficient for cold boot on slow networks or embedded devices. New budget: 8 attempts, 1+2+4+8+15+15+15=~60s total. Inspired by PR #5770 by @Bartok9 (re-implemented against current main since original was 913 commits stale with conflicts).
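The two budgets quoted in this commit message can be checked with a few lines of arithmetic. The helper below is a hypothetical sketch (`backoff_delays` is not a function from the repository). It takes the number of sleeps directly, because the message lists three sleeps for the old 3-attempt budget and seven for the new 8-attempt budget, and the source does not say whether the loop sleeps after the final failure.

```python
from typing import List


def backoff_delays(num_sleeps: int, base: float = 1.0, cap: float = 15.0) -> List[float]:
    """Exponential backoff: base * 2**i seconds for the i-th sleep, capped at `cap`."""
    return [min(base * 2 ** i, cap) for i in range(num_sleeps)]


old_budget = backoff_delays(3)  # the sleeps listed for the old budget
new_budget = backoff_delays(7)  # the sleeps listed for the new budget

print(old_budget, sum(old_budget))
print(new_budget, sum(new_budget))
```

With the 15-second cap, the fifth sleep and every later one is 15 seconds instead of 16, 32 and 64, which is what keeps the seven-sleep total at 60 seconds.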
lauchiwa pushed a commit to lauchiwa/hermes-agent that referenced this pull request Apr 17, 2026
Bump connect retry attempts from 3 to 8 and cap exponential backoff at 15 seconds. Old budget: 3 attempts, 1+2+4=7s total — insufficient for cold boot on slow networks or embedded devices. New budget: 8 attempts, 1+2+4+8+15+15+15=~60s total. Inspired by PR NousResearch#5770 by @Bartok9 (re-implemented against current main since original was 913 commits stale with conflicts). (cherry picked from commit f055907)
ulasbilgen pushed a commit to ulasbilgen/hermes-adhd-agent that referenced this pull request May 1, 2026
Bump connect retry attempts from 3 to 8 and cap exponential backoff at 15 seconds. Old budget: 3 attempts, 1+2+4=7s total — insufficient for cold boot on slow networks or embedded devices. New budget: 8 attempts, 1+2+4+8+15+15+15=~60s total. Inspired by PR NousResearch#5770 by @Bartok9 (re-implemented against current main since original was 913 commits stale with conflicts).
aj-nt pushed a commit to aj-nt/hermes-agent that referenced this pull request May 1, 2026
Bump connect retry attempts from 3 to 8 and cap exponential backoff at 15 seconds. Old budget: 3 attempts, 1+2+4=7s total — insufficient for cold boot on slow networks or embedded devices. New budget: 8 attempts, 1+2+4+8+15+15+15=~60s total. Inspired by PR NousResearch#5770 by @Bartok9 (re-implemented against current main since original was 913 commits stale with conflicts).
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
Bump connect retry attempts from 3 to 8 and cap exponential backoff at 15 seconds. Old budget: 3 attempts, 1+2+4=7s total — insufficient for cold boot on slow networks or embedded devices. New budget: 8 attempts, 1+2+4+8+15+15+15=~60s total. Inspired by PR NousResearch#5770 by @Bartok9 (re-implemented against current main since original was 913 commits stale with conflicts).
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
Bump connect retry attempts from 3 to 8 and cap exponential backoff at 15 seconds. Old budget: 3 attempts, 1+2+4=7s total — insufficient for cold boot on slow networks or embedded devices. New budget: 8 attempts, 1+2+4+8+15+15+15=~60s total. Inspired by PR NousResearch#5770 by @Bartok9 (re-implemented against current main since original was 913 commits stale with conflicts).
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
Bump connect retry attempts from 3 to 8 and cap exponential backoff at 15 seconds. Old budget: 3 attempts, 1+2+4=7s total — insufficient for cold boot on slow networks or embedded devices. New budget: 8 attempts, 1+2+4+8+15+15+15=~60s total. Inspired by PR NousResearch#5770 by @Bartok9 (re-implemented against current main since original was 913 commits stale with conflicts).
Problem
On systems where Hermes starts at boot time (macOS launchd, systemd), the Telegram platform's connect() can fail before the OS network stack is ready.

The gateway stays alive but Telegram is silently dead until a manual restart. This is especially bad because:
- launchctl/systemctl KeepAlive doesn't help because the process stays alive

Root Causes
- discover_fallback_ips() called once before retry loop: At cold boot, DoH queries also fail, caching the failure state for all subsequent attempts

Changes

Testing
- sudo ifconfig en0 down, start hermes, wait 20s, re-enable. Should see retries, eventual success.

Impact

Fixes #5729