Ensure only primary sender drives slot ownership updates#754
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## unstable #754 +/- ##
============================================
+ Coverage 70.25% 70.28% +0.03%
============================================
Files 112 112
Lines 60590 60587 -3
============================================
+ Hits 42567 42586 +19
+ Misses 18023 18001 -22
|
hpatro
left a comment
There was a problem hiding this comment.
Question from the issue (repeating here):
Is it any problematic to handle topology update from replica with higher config epoch?
zuiderkwast
left a comment
There was a problem hiding this comment.
The change seems to make sense but I don't fully understand the fix and how it solves the problem. The essence of the problem can be seen in the diff screenshot in #753? Please update the PR description to independently describe the change, rather than only referring to the issue.
It's harder to review the fix when there's refactoring in the same commit. It's easier if the refacoring and the acutal fix are at least in two separate commits.
(I imagine Oran Agra would have rejected the renaming of variables, to avoid any kind of unnecessary changes.)
Yeah I think it will work but I feel that it might involve a bigger change. This change on the other hand is more scoped, relatively speaking. We can definitely explore this idea separately/incrementally. |
Yes - replicas updating the slot ownership on behalf of their primaries is the regression (or behavior change) introduced by #445.
I don't agree with the "unnecessary changes". All these changes are either a continuation/fine-tune of #445 or for a good reason of improving readability. I have gone through this code path more times than I can count but I still get lost easily because the naming is just so confusing. A blank statement of "unnecessary changes" is just saying "I don't know what I am doing" - sorry for the unnecessarily strong response :) |
Sorry for causing frustration! I know you've very familiar with this code. More so than the rest of us. With "unnecessary changes", I just meant changes that don't change the logic, i.e. refactoring, not strictly necessary to fix the bug. I didn't say I'm against it. Only that I'd prefer them in separate commits, since it would have made it easier for me to read the diff and spot the actual behaviour change. The diff is +68 −82, while the actual fix is ~5 lines. Even the title "ensure only primary sender drives..." seemed a bit cryptic to me at first, but of course it's obvious to anyone who has this code fresh in memory. (My though processes involved "What's a primary sender again... Hm.. right, it's when we're receiving a packet on the cluster bus from a primary.") I'm trying to make reviewing easier for myself by requiring more from the contributors. Not sure if it's sane but I want to be able to handle more PRs faster without being burnt out. I should know you're in the same position. |
No worries - we are all doing our jobs.
I disagree. I don't consider this a refactoring, which I agree is better handled in a separate PR. There is a reason why I introduced the regression in the first place. Our discussions on the names show the exact problem of this code not having established clear mental concepts and this was partly the reason why I introduced the regression in the first place. Making these changes in a separate PR loses the exact context of why these changes are needed. And refactoring PRs don't normally get the right amount of scrutinization, which is what I want for this PR. I am all for any nitpick that could help improve the code quality in such a critical part of the code base.
I believe I have explained the issue to my best ability in #753, which I linked to this PR as well. Have you got a chance to check it out? I have a tendency to split the bug and PR. Will make sure to carry the context over for this (and future) PR. |
Signed-off-by: Ping Xie <pingxie@google.com>
Signed-off-by: Ping Xie <pingxie@google.com>
Signed-off-by: Ping Xie <pingxie@google.com>
Signed-off-by: Ping Xie <pingxie@google.com>
Signed-off-by: Ping Xie <pingxie@google.com>
Signed-off-by: Ping Xie <pingxie@google.com>
|
I have updated the names per recommendation. I think they look clearer now. @zuiderkwast PTAL. Also I think there is another (0-day) bug. We never check the sender's config epoch before accepting its claim of being a primary. I will address it in a separate PR :). https://github.com/valkey-io/valkey/blob/unstable/src/cluster_legacy.c#L3119 |
zuiderkwast
left a comment
There was a problem hiding this comment.
Great! These "sender-claimed" names definitely make things more clear.
Also I think there is another (0-day) bug. We never check the sender's config epoch before accepting its claim of being a primary. I will address it in a separate PR :).
OK. I wouldn't mind having it in the same PR (just a separate commit within the PR for easier reading). But a separate PR is also good to if we want to backport it or something.
I don't consider this a refactoring, which I agree is better handled in a separate PR.
I never said separate PR. That would be excessive. I just meant separate commits within the same PR. It might have made it easier to review, though this may just have been me trying to find excuses for not understanding the fix.
I believe I have explained the issue to my best ability in #753, which I linked to this PR as well. Have you got a chance to check it out? I have a tendency to split the bug and PR. Will make sure to carry the context over for this (and future) PR.
Yes, the issue is very good. The problem description and the description of the change are not the same things though. In the PR I think it's good to describe the actual change. The problem description doesn't need to be repeated in detail in the PR when it's explained in the issue.
For the PR description, I'd be happy just what you wrote under "Fix:", and possibly mentioning the additional changes to make it easier to understand the diff. No need to update it now, but I'm copy-pasting some sentences here just to show what I mean, what I think is enough, yet useful for reviewing and also enough to have in the commit log after merging:
Fixes a regression introduced in PR #445, which allowed a message from a replica
to update the slot ownership of its primary. The regression results in a
replicaof cycle, causing server crashes due to the cycle detection assert. The
fix restores the previous behavior where only primary senders can trigger
clusterUpdateSlotsConfigWith.
Additional changes:
* Handling of primaries without slots is obsoleted by new handling of when a
sender that was a replica announces that it is now a primary.
* Replication loop detection code is unchanged but shifted downwards.
* Some variables are renamed for better readability and some are introduced to
avoid repeated memcmp() calls.
|
Thanks @zuiderkwast!
Let me start a new PR on this one. It is probably easier that way.
Got it. My bad. I misunderstood. Reviewing commits is a good point.
Make sense. I can do that too. |
I want it in a separate PR. I'm going to fairly strongly disagree with Viktor that I think in this specific case the refactor made the change harder for me to follow. I think that is because I'm more familiar with the code, outside of the newly introduced variables I found everything harder to follow. |
Yeah, allowing replicas to update slot ownership on behalf of their primaries is still the change that makes me feel uncomfortable. In the past the algorithm was to just trust primaries as much as possible. |
This is likely proximity bias :). I would suggest also reviewing the function without diff'ing line by line, and then back to the diff. This is how I come to being at peace with it. Let me know what you think afterwards. |
Everyone is biased. |
I'll stand behind my original statement that the refactoring made evaluating the technical changes more difficult, and would still prefer it in a separate PR. I also think that is a better broad decision to take as a team. The problem with a separate commit is that we squash and merge, and lose the commit. In some cases you are introducing new concepts, and for that the new names make sense. But in other it just seems like you are renaming variables. |
madolson
left a comment
There was a problem hiding this comment.
I think the change makes sense. We probably should avoid taking input from the replica about three state of its primaries as much as possible.
…eplicaof-cycle
Sure. I can do a separate PR in the future.
Will clean this PR up in a bit. |
Signed-off-by: Ping Xie <pingxie@google.com>
Signed-off-by: Ping Xie <pingxie@google.com>
… failover (#2301) After a failover occurs, when the PING-PONG of the replica, that is, the old primary, reaches other shard nodes faster, the corresponding code path will be reached. We will adjust the sender to be a replica and sender_claimed_primary to be the primary node, but in myself view, slots still belong to the sender, which is a replica. This actually restores part of the code in 28976a9. It was lost in the changes to #445 and #754. These 3 changes are about avoiding the replicaof cycle but ended up creating a "cycle" of assumptions themselves. when #445 removed this logic, it was assumed that a replica could also drive clusterUpdateSlotsConfigWith but this was actually a regression that resulted in a replicaof cycle (#753). #754 fixed the replicaof cycle by disallowing a replica to drive clusterUpdateSlotsConfigWith but missed restoring the original logic. Added a new `DEBUG DISABLE-CLUSTER-RECONNECTION <0|1>` this will prevent cluster nodes from reconnecting so that the issue can be reproduce in test. Signed-off-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Ping Xie <pingxie@outlook.com>
Fixes a regression introduced in PR #445, which allowed a message from a replica
to update the slot ownership of its primary. The regression results in a
replicaofcycle, causing server crashes due to the cycle detection assert. Thefix restores the previous behavior where only primary senders can trigger
clusterUpdateSlotsConfigWith.Additional changes:
sender that was a replica announces that it is now a primary.
avoid repeated memcmp() calls.
Fixes #753.