Fix replica (the old primary) claims to sitll have slots after manual failover#2301
Conversation
… failover Signed-off-by: Binbin <binloveplay1314@qq.com>
|
In the test, there will be a short time window that the replica claims to still have slots. We can see 84088c54fcae0666b552229b806e4f96eaba2a0b become a replica but own the slots I will try to think of a way to add the test, there are currently no stable steps to reproduce it. |
Signed-off-by: Binbin <binloveplay1314@qq.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## unstable #2301 +/- ##
============================================
- Coverage 71.49% 71.46% -0.04%
============================================
Files 123 123
Lines 66937 67104 +167
============================================
+ Hits 47859 47954 +95
- Misses 19078 19150 +72
🚀 New features to boost your workflow:
|
zuiderkwast
left a comment
There was a problem hiding this comment.
LGTM
Is this the failures we saw in Daily a few days ago?
I guess not. I found this from another code review. |
|
btw, there is a saying that, we should not update sender->replicaof info based on the sender infomartion, like in here, we updated the sender_claimed_primary flag, so i guess it’s OK if we update the slots info. But I also want to hear your thoughts, like, should sender->replicaof information be updated based on sender information? |
I don'tknow this saying. Is it written anywhere? I think information about sender directly from the sender can be trusted. It is direct information, so it should be correct. Why not? |
ohh, maybe i gave the wrond understanding, i mean, update the other node (sender->replicaof) infomation based on the sender infomartion.
I guess not, i did not see this, a person from my team said this, he mentioned something that we usually don't update the info of another node based on the info of one node. Like in here, we update the info of sender->replicaof (flags / slots info) based on the info of sender. |
I think this should be a transitional state and eventually the replica drops these slots. that being said, this pr makes sense to me and is the right thing to do, since the role change and the slot ownership change should've been an atomic operation.
yeah these 3 changes are about avoiding the |
PingXie
left a comment
There was a problem hiding this comment.
LGTM - thanks @enjoy-binbin. great investigation.
Signed-off-by: Binbin <binloveplay1314@qq.com>
|
@PingXie @zuiderkwast Please check the new changes, i added some assert, a new debug command, and the test (Tests can fail reliably before the fix). |
Co-authored-by: Ping Xie <pingxie@outlook.com> Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
|
Merged, very nice. Backport or not? |
This is a safe change so I vote for backport |
|
Added to 8.0/8.1 project for backport. |
…r manual failover (valkey-io#2301)" This reverts commit 507042d. Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
…er manual failover (valkey-io#2301)" This reverts commit 6ce01c1. Signed-off-by: Roshan Khatri <rvkhatri@amazon.com>
…r manual failover (valkey-io#2301)" This reverts commit 507042d.
…g slots (#2370) In #2301, we added clusterMoveNodeSlots to implement the logic of moving slots from old primary to new primary, when myself receives the replica (old primary) message first and the new primary message later in a shard failover. However due to this, when myself receives the new primary message later next time, there is no way to call clusterUpdateSlotsConfigWith, because we have already updated the slots of the new primary before. This result in, for example, importing slots and migrating slots not being updated, see #445. In this commit, we also make clusterMoveNodeSlots to move importing slots and migrating slots. Fixes #2363. Signed-off-by: Binbin <binloveplay1314@qq.com>
…r manual failover (valkey-io#2301)" This reverts commit 507042d.
@PingXie, unless we are extremely confident, I wouldn't consider backporting novel severAsserts to be safe. This can introduce new crashes that have caused CVEs in the past. |
The assert was added in valkey-io#2301 and we found that there are some situations would trigger assert and crash the server. The reason we added the assert is because, in the code: 1. sender_claimed_primary and sender are in the same shard 2. and sender is the old primary, sender_claimed_primary is the old replica 3. and now sender become a replica, sender_claimed_primary become a primary That means a failover happend in the shard, and sender should be the primary of sender_claimed_primary. But obviously this assumption may be wrong, we rely on shard_id to determine whether it is in a same shard, and assume that a shard can only have one primary. But this is wrong, from valkey-io#2279 we can know there will be a case that we can create two primaries in the same shard due to the untimely update of shard_id. So we can create a test that trigger the assert in this way: 1. pre condition: two primaries in the same shard, one has slots and one is empty. 2. replica doing a cluster failover 3. the empty primary doing a cluster replicate with the replica (new primary) We change the assert to an if condition to fix it. Signed-off-by: Binbin <binloveplay1314@qq.com>
…2431) The assert was added in #2301 and we found that there are some situations would trigger assert and crash the server. The reason we added the assert is because, in the code: 1. sender_claimed_primary and sender are in the same shard 2. and sender is the old primary, sender_claimed_primary is the old replica 3. and now sender become a replica, sender_claimed_primary become a primary That means a failover happend in the shard, and sender should be the primary of sender_claimed_primary. But obviously this assumption may be wrong, we rely on shard_id to determine whether it is in a same shard, and assume that a shard can only have one primary. But this is wrong, from #2279 we can know there will be a case that we can create two primaries in the same shard due to the untimely update of shard_id. So we can create a test that trigger the assert in this way: 1. pre condition: two primaries in the same shard, one has slots and one is empty. 2. replica doing a cluster failover 3. the empty primary doing a cluster replicate with the replica (new primary) We change the assert to an if condition to fix it. Closes #2423. Note that the test written here also exposes the issue in #2441, so these two may need to be addressed together. Signed-off-by: Binbin <binloveplay1314@qq.com>
…alkey-io#2431) The assert was added in valkey-io#2301 and we found that there are some situations would trigger assert and crash the server. The reason we added the assert is because, in the code: 1. sender_claimed_primary and sender are in the same shard 2. and sender is the old primary, sender_claimed_primary is the old replica 3. and now sender become a replica, sender_claimed_primary become a primary That means a failover happend in the shard, and sender should be the primary of sender_claimed_primary. But obviously this assumption may be wrong, we rely on shard_id to determine whether it is in a same shard, and assume that a shard can only have one primary. But this is wrong, from valkey-io#2279 we can know there will be a case that we can create two primaries in the same shard due to the untimely update of shard_id. So we can create a test that trigger the assert in this way: 1. pre condition: two primaries in the same shard, one has slots and one is empty. 2. replica doing a cluster failover 3. the empty primary doing a cluster replicate with the replica (new primary) We change the assert to an if condition to fix it. Closes valkey-io#2423. Note that the test written here also exposes the issue in valkey-io#2441, so these two may need to be addressed together. Signed-off-by: Binbin <binloveplay1314@qq.com>
|
I don't think we should backport these. I removed from the 8.0 and 8.1 projects. Let me know if anyone disagrees. |
…alkey-io#2431) The assert was added in valkey-io#2301 and we found that there are some situations would trigger assert and crash the server. The reason we added the assert is because, in the code: 1. sender_claimed_primary and sender are in the same shard 2. and sender is the old primary, sender_claimed_primary is the old replica 3. and now sender become a replica, sender_claimed_primary become a primary That means a failover happend in the shard, and sender should be the primary of sender_claimed_primary. But obviously this assumption may be wrong, we rely on shard_id to determine whether it is in a same shard, and assume that a shard can only have one primary. But this is wrong, from valkey-io#2279 we can know there will be a case that we can create two primaries in the same shard due to the untimely update of shard_id. So we can create a test that trigger the assert in this way: 1. pre condition: two primaries in the same shard, one has slots and one is empty. 2. replica doing a cluster failover 3. the empty primary doing a cluster replicate with the replica (new primary) We change the assert to an if condition to fix it. Closes valkey-io#2423. Note that the test written here also exposes the issue in valkey-io#2441, so these two may need to be addressed together. Signed-off-by: Binbin <binloveplay1314@qq.com> Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
…g slots (valkey-io#2370) In valkey-io#2301, we added clusterMoveNodeSlots to implement the logic of moving slots from old primary to new primary, when myself receives the replica (old primary) message first and the new primary message later in a shard failover. However due to this, when myself receives the new primary message later next time, there is no way to call clusterUpdateSlotsConfigWith, because we have already updated the slots of the new primary before. This result in, for example, importing slots and migrating slots not being updated, see valkey-io#445. In this commit, we also make clusterMoveNodeSlots to move importing slots and migrating slots. Fixes valkey-io#2363. Signed-off-by: Binbin <binloveplay1314@qq.com>
…g slots (valkey-io#2370) In valkey-io#2301, we added clusterMoveNodeSlots to implement the logic of moving slots from old primary to new primary, when myself receives the replica (old primary) message first and the new primary message later in a shard failover. However due to this, when myself receives the new primary message later next time, there is no way to call clusterUpdateSlotsConfigWith, because we have already updated the slots of the new primary before. This result in, for example, importing slots and migrating slots not being updated, see valkey-io#445. In this commit, we also make clusterMoveNodeSlots to move importing slots and migrating slots. Fixes valkey-io#2363. Signed-off-by: Binbin <binloveplay1314@qq.com> (cherry picked from commit a3907ad)
update
Please also pick #2370 and #2431 and #2441 and #2477 you want to pick this one.
================
After a failover occurs, when the PING-PONG of the replica, that is,
the old primary, reaches other shard nodes faster, the corresponding
code path will be reached. We will adjust the sender to be a replica
and sender_claimed_primary to be the primary node, but in myself view,
slots still belong to the sender, which is a replica.
This actually restores part of the code in 28976a9.
It was lost in the changes to #445 and #754.
These 3 changes are about avoiding the replicaof cycle but ended up creating
a "cycle" of assumptions themselves. when #445 removed this logic, it was
assumed that a replica could also drive clusterUpdateSlotsConfigWith
but this was actually a regression that resulted in a replicaof cycle (#753).
#754 fixed the replicaof cycle by disallowing a replica to drive
clusterUpdateSlotsConfigWith but missed restoring the original logic.
Added a new
DEBUG DISABLE-CLUSTER-RECONNECTION <0|1>this will preventcluster nodes from reconnecting so that the issue can be reproduce in test.