Description
The method used for dialing addresses configured as persistent peers is the following:
Lines 391 to 438 in b47d18e

    func (sw *Switch) reconnectToPeer(addr *NetAddress) {
        if sw.reconnecting.Has(string(addr.ID)) {
            return
        }
        sw.reconnecting.Set(string(addr.ID), addr)
        defer sw.reconnecting.Delete(string(addr.ID))

        start := time.Now()
        sw.Logger.Info("Reconnecting to peer", "addr", addr)
        for i := 0; i < reconnectAttempts; i++ {
            if !sw.IsRunning() {
                return
            }

            err := sw.DialPeerWithAddress(addr)
            if err == nil {
                return // success
            } else if _, ok := err.(ErrCurrentlyDialingOrExistingAddress); ok {
                return
            }

            sw.Logger.Info("Error reconnecting to peer. Trying again", "tries", i, "err", err, "addr", addr)
            // sleep a set amount
            sw.randomSleep(reconnectInterval)
            continue
        }

        sw.Logger.Error("Failed to reconnect to peer. Beginning exponential backoff",
            "addr", addr, "elapsed", time.Since(start))
        for i := 0; i < reconnectBackOffAttempts; i++ {
            if !sw.IsRunning() {
                return
            }

            // sleep an exponentially increasing amount
            sleepIntervalSeconds := math.Pow(reconnectBackOffBaseSeconds, float64(i))
            sw.randomSleep(time.Duration(sleepIntervalSeconds) * time.Second)

            err := sw.DialPeerWithAddress(addr)
            if err == nil {
                return // success
            } else if _, ok := err.(ErrCurrentlyDialingOrExistingAddress); ok {
                return
            }
            sw.Logger.Info("Error reconnecting to peer. Trying again", "tries", i, "err", err, "addr", addr)
        }
        sw.Logger.Error("Failed to reconnect to peer. Giving up", "addr", addr, "elapsed", time.Since(start))
    }

According to the comments in the code and some of our documentation, the total duration of the reconnection attempts should be around 1 day:
Lines 23 to 31 in b47d18e

    // repeatedly try to reconnect for a few minutes
    // ie. 5 * 20 = 100s.
    reconnectAttempts = 20
    reconnectInterval = 5 * time.Second

    // then move into exponential backoff mode for ~1day
    // ie. 3**10 = 16hrs.
    reconnectBackOffAttempts    = 10
    reconnectBackOffBaseSeconds = 3

However, this snippet, derived from the code above, shows that the total time spent sleeping across the 30 configured attempts is actually around 8h14m: https://go.dev/play/p/JWfU6lerps5.
In other words, the implemented behavior does not match the expected behavior.