Make manual failover reset the on-going election to promote failover#1274
Merged
enjoy-binbin merged 7 commits intoNov 22, 2024
Merged
Conversation
If a manual failover got timed out, like the election don't get the enough votes, since we have a auth_timeout and a auth_retry_time, a new manual failover will not be able to proceed on the replica side. Like if we initiate a new manual failover after a election timed out, we will pause the primary, but on the replica side, due to retry_time, replica does not trigger the new election and the manual failover will eventually time out. In this case, if we initiate manual failover again and there is an ongoing election, we will reset it so that the replica can initiate a new election at the manual failover's request. Signed-off-by: Binbin <binloveplay1314@qq.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## unstable #1274 +/- ##
============================================
+ Coverage 70.68% 70.70% +0.02%
============================================
Files 115 115
Lines 63177 63178 +1
============================================
+ Hits 44657 44673 +16
+ Misses 18520 18505 -15
🚀 New features to boost your workflow:
|
Member
Author
|
A log demo from the test case (before the fix). replica: the primary: |
hpatro
reviewed
Nov 8, 2024
…_reset Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
enjoy-binbin
commented
Nov 11, 2024
madolson
approved these changes
Nov 14, 2024
Signed-off-by: Binbin <binloveplay1314@qq.com>
zuiderkwast
approved these changes
Nov 17, 2024
zuiderkwast
left a comment
Contributor
There was a problem hiding this comment.
Not a full review. The idea looks good.
…_reset Signed-off-by: Binbin <binloveplay1314@qq.com>
zuiderkwast
requested changes
Nov 21, 2024
zuiderkwast
left a comment
Contributor
There was a problem hiding this comment.
21K added lines? Lots of temp files?
Signed-off-by: Binbin <binloveplay1314@qq.com>
Member
Author
|
opps, sorry, a bad conflict handling. |
zuiderkwast
approved these changes
Nov 21, 2024
zuiderkwast
left a comment
Contributor
There was a problem hiding this comment.
OK, we should probably add .cmake (etc.) to .gitignore to prevent those files from being added by mistake.
hpatro
approved these changes
Nov 21, 2024
enjoy-binbin
added a commit
to vitarb/valkey
that referenced
this pull request
Jun 17, 2025
…alkey-io#1274) If a manual failover got timed out, like the election don't get the enough votes, since we have a auth_timeout and a auth_retry_time, a new manual failover will not be able to proceed on the replica side. Like if we initiate a new manual failover after a election timed out, we will pause the primary, but on the replica side, due to retry_time, replica does not trigger the new election and the manual failover will eventually time out. In this case, if we initiate manual failover again and there is an ongoing election, we will reset it so that the replica can initiate a new election at the manual failover's request. Signed-off-by: Binbin <binloveplay1314@qq.com>
Merged
enjoy-binbin
added a commit
to vitarb/valkey
that referenced
this pull request
Jun 17, 2025
…alkey-io#1274) If a manual failover got timed out, like the election don't get the enough votes, since we have a auth_timeout and a auth_retry_time, a new manual failover will not be able to proceed on the replica side. Like if we initiate a new manual failover after a election timed out, we will pause the primary, but on the replica side, due to retry_time, replica does not trigger the new election and the manual failover will eventually time out. In this case, if we initiate manual failover again and there is an ongoing election, we will reset it so that the replica can initiate a new election at the manual failover's request. Signed-off-by: Binbin <binloveplay1314@qq.com>
zuiderkwast
pushed a commit
to vitarb/valkey
that referenced
this pull request
Aug 15, 2025
…alkey-io#1274) If a manual failover got timed out, like the election don't get the enough votes, since we have a auth_timeout and a auth_retry_time, a new manual failover will not be able to proceed on the replica side. Like if we initiate a new manual failover after a election timed out, we will pause the primary, but on the replica side, due to retry_time, replica does not trigger the new election and the manual failover will eventually time out. In this case, if we initiate manual failover again and there is an ongoing election, we will reset it so that the replica can initiate a new election at the manual failover's request. Signed-off-by: Binbin <binloveplay1314@qq.com>
zuiderkwast
pushed a commit
to vitarb/valkey
that referenced
this pull request
Aug 15, 2025
…alkey-io#1274) If a manual failover got timed out, like the election don't get the enough votes, since we have a auth_timeout and a auth_retry_time, a new manual failover will not be able to proceed on the replica side. Like if we initiate a new manual failover after a election timed out, we will pause the primary, but on the replica side, due to retry_time, replica does not trigger the new election and the manual failover will eventually time out. In this case, if we initiate manual failover again and there is an ongoing election, we will reset it so that the replica can initiate a new election at the manual failover's request. Signed-off-by: Binbin <binloveplay1314@qq.com> Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
zuiderkwast
pushed a commit
to vitarb/valkey
that referenced
this pull request
Aug 21, 2025
…alkey-io#1274) If a manual failover got timed out, like the election don't get the enough votes, since we have a auth_timeout and a auth_retry_time, a new manual failover will not be able to proceed on the replica side. Like if we initiate a new manual failover after a election timed out, we will pause the primary, but on the replica side, due to retry_time, replica does not trigger the new election and the manual failover will eventually time out. In this case, if we initiate manual failover again and there is an ongoing election, we will reset it so that the replica can initiate a new election at the manual failover's request. Signed-off-by: Binbin <binloveplay1314@qq.com> Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
zuiderkwast
pushed a commit
that referenced
this pull request
Aug 22, 2025
…1274) If a manual failover got timed out, like the election don't get the enough votes, since we have a auth_timeout and a auth_retry_time, a new manual failover will not be able to proceed on the replica side. Like if we initiate a new manual failover after a election timed out, we will pause the primary, but on the replica side, due to retry_time, replica does not trigger the new election and the manual failover will eventually time out. In this case, if we initiate manual failover again and there is an ongoing election, we will reset it so that the replica can initiate a new election at the manual failover's request. Signed-off-by: Binbin <binloveplay1314@qq.com> Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
sarthakaggarwal97
pushed a commit
to sarthakaggarwal97/valkey
that referenced
this pull request
Sep 16, 2025
…alkey-io#1274) If a manual failover got timed out, like the election don't get the enough votes, since we have a auth_timeout and a auth_retry_time, a new manual failover will not be able to proceed on the replica side. Like if we initiate a new manual failover after a election timed out, we will pause the primary, but on the replica side, due to retry_time, replica does not trigger the new election and the manual failover will eventually time out. In this case, if we initiate manual failover again and there is an ongoing election, we will reset it so that the replica can initiate a new election at the manual failover's request. Signed-off-by: Binbin <binloveplay1314@qq.com> Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
If a manual failover got timed out, like the election don't get the
enough votes, since we have a auth_timeout and a auth_retry_time, a
new manual failover will not be able to proceed on the replica side.
Like if we initiate a new manual failover after a election timed out,
we will pause the primary, but on the replica side, due to retry_time,
replica does not trigger the new election and the manual failover will
eventually time out.
In this case, if we initiate manual failover again and there is an
ongoing election, we will reset it so that the replica can initiate
a new election at the manual failover's request.