Fix a bug where servers could be marked as up when they were failing#16506
Conversation
@@ -0,0 +1 @@
+ Fix a bug introduced in Synapse 1.59.0 where servers would be incorrectly marked as available when a request resulted in an error.
I'm not sure if there's a more visible symptom? Perhaps it would cause things to be retried too often?
synapse/util/retryutils.py
Outdated
# the notifier.
self.replication_client.send_remote_server_up(self.destination)
# If the server was previously failing, but is no longer.
if previously_failing:
@erikjohnston this might need some thoughts from you as the original author of #12500 -- was this done on purpose and I'm missing some understanding?
Actually, I think this is not quite right still, it will end up calling this code if we were previously failing & still failing. I think?
> Actually, I think this is not quite right still, it will end up calling this code if we were previously failing & still failing. I think?
It should be OK now. 👍
The logic could be simplified to check only `not currently_failing`, which relies on the earlier `return` to avoid kicking off the background process. But I find it a bit clearer to check both `previously_failing` and `not currently_failing`. 🤷
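The transition being discussed can be sketched as a small predicate. This is an illustrative sketch, not the actual Synapse code; the function name is hypothetical, though the two flags mirror the ones named in the comment above:

```python
# Hypothetical sketch of the exit-handler decision discussed above:
# only notify "remote server up" on a failing -> not-failing transition.

def should_notify_server_up(previously_failing: bool, currently_failing: bool) -> bool:
    """Return True if a "remote server up" notification should be sent.

    Checking both flags (rather than just `not currently_failing`) makes
    the state transition explicit, even though an earlier early-return
    could make the first check redundant.
    """
    return previously_failing and not currently_failing


# The server recovered: notify.
assert should_notify_server_up(True, False)
# Still failing, or was never failing: stay quiet.
assert not should_notify_server_up(True, True)
assert not should_notify_server_up(False, False)
```

Writing the condition this way also keeps the code correct even if the earlier early-return is later refactored away.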
As described in #15226, the `RetryDestinationLimiter` seems to mark servers as up when the connection fails. This appears to be a regression from #12500; see #12500 (comment).

I think this will cause the notifier to be woken up and additional replication traffic, and it will then cause federation traffic via a new transaction being sent to the unreachable server.