Rewrite the faster failover test case#3495
Merged
Merged
Conversation
The test introduced in valkey-io#2227 is fragile because it unconditionally asserted that "best ranked replica" would be logged for every shard's replica. This assumption was incorrect for two reasons: 1. myselfIsBestRankedReplica() requires failover_failed_primary_rank == 0. When two primaries fail simultaneously, only the shard with the lowest shard_id gets failed_primary_rank == 0. The other shard's replicas can only trigger "Myself become the best ranked replica" after the first shard completes failover and the result propagates via gossip. 2. clusterAllReplicasThinkPrimaryIsFail() requires all sibling replicas to have their MY_PRIMARY_FAIL flag propagated via gossip. If the election delay is shorter than the gossip propagation time, the replica may start the election before receiving the flag. It's also too strict and checks too many things. Like we can check that there is no delay before starting the failover, but we shouldn't check that the rank 0 replica actually wins, because maybe it doesn't. Also we're not sure the other replica does a psync afterwards, it can also do a full sync. Restructure the test into three focused scenarios with retry logic: - Test 0 (3 primaries + 1 replica): The sole replica is deterministically the best ranked replica. All conditions are trivially satisfied. - Test 1 (3 primaries + 2 replicas): Single primary failure with two competing replicas. failed_primary_rank is deterministically 0, but MY_PRIMARY_FAIL gossip timing may prevent triggering in some runs. Uses internal retry with cluster recovery to ensure at least one successful trigger. - Test 2 (5 primaries + 7 replicas): Two primaries failure. Verifies no failover timeout and uses retry to cover the "best ranked replica" path for at least one shard. Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Member
Author
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## unstable #3495 +/- ##
============================================
- Coverage 76.80% 76.53% -0.27%
============================================
Files 157 157
Lines 79048 79048
============================================
- Hits 60711 60499 -212
- Misses 18337 18549 +212 🚀 New features to boost your workflow:
|
zuiderkwast
approved these changes
Apr 16, 2026
zuiderkwast
left a comment
Contributor
There was a problem hiding this comment.
Very advanced test cases! Looks definitely more robust that before.
sarthakaggarwal97
pushed a commit
to sarthakaggarwal97/valkey
that referenced
this pull request
Apr 16, 2026
The test introduced in valkey-io#2227 is fragile because it unconditionally asserted that "best ranked replica" would be logged for every shard's replica. This assumption was incorrect for two reasons: 1. myselfIsBestRankedReplica() requires failover_failed_primary_rank == 0. When two primaries fail simultaneously, only the shard with the lowest shard_id gets failed_primary_rank == 0. The other shard's replicas can only trigger "Myself become the best ranked replica" after the first shard completes failover and the result propagates via gossip. 2. clusterAllReplicasThinkPrimaryIsFail() requires all sibling replicas to have their MY_PRIMARY_FAIL flag propagated via gossip. If the election delay is shorter than the gossip propagation time, the replica may start the election before receiving the flag. It's also too strict and checks too many things. Like we can check that there is no delay before starting the failover, but we shouldn't check that the rank 0 replica actually wins, because maybe it doesn't. Also we're not sure the other replica does a psync afterwards, it can also do a full sync. Restructure the test into three focused scenarios with retry logic: - Test 0 (3 primaries + 1 replica): The sole replica is deterministically the best ranked replica. All conditions are trivially satisfied. - Test 1 (3 primaries + 2 replicas): Single primary failure with two competing replicas. failed_primary_rank is deterministically 0, but MY_PRIMARY_FAIL gossip timing may prevent triggering in some runs. Uses internal retry with cluster recovery to ensure at least one successful trigger. - Test 2 (5 primaries + 7 replicas): Two primaries failure. Verifies no failover timeout and uses retry to cover the "best ranked replica" path for at least one shard. Signed-off-by: Binbin <binloveplay1314@qq.com>
madolson
pushed a commit
that referenced
this pull request
Apr 27, 2026
The test introduced in #2227 is fragile because it unconditionally asserted that "best ranked replica" would be logged for every shard's replica. This assumption was incorrect for two reasons: 1. myselfIsBestRankedReplica() requires failover_failed_primary_rank == 0. When two primaries fail simultaneously, only the shard with the lowest shard_id gets failed_primary_rank == 0. The other shard's replicas can only trigger "Myself become the best ranked replica" after the first shard completes failover and the result propagates via gossip. 2. clusterAllReplicasThinkPrimaryIsFail() requires all sibling replicas to have their MY_PRIMARY_FAIL flag propagated via gossip. If the election delay is shorter than the gossip propagation time, the replica may start the election before receiving the flag. It's also too strict and checks too many things. Like we can check that there is no delay before starting the failover, but we shouldn't check that the rank 0 replica actually wins, because maybe it doesn't. Also we're not sure the other replica does a psync afterwards, it can also do a full sync. Restructure the test into three focused scenarios with retry logic: - Test 0 (3 primaries + 1 replica): The sole replica is deterministically the best ranked replica. All conditions are trivially satisfied. - Test 1 (3 primaries + 2 replicas): Single primary failure with two competing replicas. failed_primary_rank is deterministically 0, but MY_PRIMARY_FAIL gossip timing may prevent triggering in some runs. Uses internal retry with cluster recovery to ensure at least one successful trigger. - Test 2 (5 primaries + 7 replicas): Two primaries failure. Verifies no failover timeout and uses retry to cover the "best ranked replica" path for at least one shard. Signed-off-by: Binbin <binloveplay1314@qq.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The test introduced in #2227 is fragile because it unconditionally asserted
that "best ranked replica" would be logged for every shard's replica. This
assumption was incorrect for two reasons:
myselfIsBestRankedReplica() requires failover_failed_primary_rank == 0.
When two primaries fail simultaneously, only the shard with the lowest
shard_id gets failed_primary_rank == 0. The other shard's replicas can
only trigger "Myself become the best ranked replica" after the first
shard completes failover and the result propagates via gossip.
clusterAllReplicasThinkPrimaryIsFail() requires all sibling replicas
to have their MY_PRIMARY_FAIL flag propagated via gossip. If the
election delay is shorter than the gossip propagation time, the
replica may start the election before receiving the flag.
It's also too strict and checks too many things. Like we can check that there
is no delay before starting the failover, but we shouldn't check that the rank
0 replica actually wins, because maybe it doesn't. Also we're not sure the other
replica does a psync afterwards, it can also do a full sync.
Restructure the test into three focused scenarios with retry logic:
Test 0 (3 primaries + 1 replica): The sole replica is deterministically
the best ranked replica. All conditions are trivially satisfied.
Test 1 (3 primaries + 2 replicas): Single primary failure with two
competing replicas. failed_primary_rank is deterministically 0, but
MY_PRIMARY_FAIL gossip timing may prevent triggering in some runs.
Uses internal retry with cluster recovery to ensure at least one
successful trigger.
Test 2 (5 primaries + 7 replicas): Two primaries failure. Verifies no
failover timeout and uses retry to cover the "best ranked replica"
path for at least one shard.