Fix a bug where servers could be marked as up when they were failing#16506
Conversation
@@ -0,0 +1 @@
+ Fix a bug introduced in Synapse 1.59.0 where servers would be incorrectly marked as available when a request resulted in an error.
I'm not sure if there's a more visible symptom? Perhaps it would cause things to be retried too often?
synapse/util/retryutils.py
Outdated
# the notifier.
self.replication_client.send_remote_server_up(self.destination)
# If the server was previously failing, but is no longer.
if previously_failing:
@erikjohnston this might need some thoughts from you as the original author of #12500 -- was this done on purpose and I'm missing some understanding?
Actually, I think this is not quite right still, it will end up calling this code if we were previously failing & still failing. I think?
> Actually, I think this is not quite right still, it will end up calling this code if we were previously failing & still failing. I think?
It should be OK now. 👍
The logic could be simplified to check only `not currently_failing`, which relies on the earlier `return` to avoid kicking off the background process. But I find it a bit clearer to check both `previously_failing` and `not currently_failing`. 🤷
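The transition being discussed can be sketched as a small predicate. This is an illustrative sketch, not the actual Synapse code; the function name is hypothetical, though the two flags mirror the ones named in the comment above:

```python
# Hypothetical sketch of the exit-handler decision discussed above:
# only notify "remote server up" on a failing -> not-failing transition.

def should_notify_server_up(previously_failing: bool, currently_failing: bool) -> bool:
    """Return True if a "remote server up" notification should be sent.

    Checking both flags (rather than just `not currently_failing`) makes
    the state transition explicit, even though an earlier early-return
    could make the first check redundant.
    """
    return previously_failing and not currently_failing


# The server recovered: notify.
assert should_notify_server_up(True, False)
# Still failing, or was never failing: stay quiet.
assert not should_notify_server_up(True, True)
assert not should_notify_server_up(False, False)
```

Writing the condition this way also keeps the code correct even if the earlier early-return is later refactored away.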
As described in #15226, the `RetryDestinationLimiter` seems to mark servers as up when the connection fails. This appears to be a regression from #12500; see #12500 (comment).

I think this will cause the notifier to be woken up and additional replication traffic, and it will then cause federation traffic via a new transaction being sent to the unreachable server.