Retain a down primary's slot info after automatic fail over #10163
Conversation
in the "cluster nodes" output after an automatic failover. The idea is to bring consistency to the behavior of "cluster nodes" regardless of whether or not a replica exists for the failed primary.
…le with previous Redis releases
This reverts commit b4f5737.
@madolson when you get a chance :-) this PR improves the "cluster nodes" output consistency, irrespective of # of replicas in the cluster.
@zuiderkwast would really appreciate it if you could help review this PR. Thanks!
zuiderkwast left a comment
I understand that it can be useful to know which shard a failed master belonged to, but I have some concerns:
- Can there be multiple masters for the same slot range, where only one is not failed? What if a client fails to parse this output and selects the failed master instead of the healthy master? Assume the following two lines are included in the output (see the parser sketch after this list):

  72f66f4ecbcfc3055faae01a68bbeadc5079bb92 127.0.0.1:30001@40001 master,fail - 1642667704665 1642667703661 1 disconnected 0-5460
  ab87c879682639786876de7868976f9100c0c09a 127.0.0.1:30002@40002 master - 1642667704665 1642667703661 1 connected 0-5460

  If a client only looks for the string "master" and a slot range, then this change can cause the client to select the failed master. It is a breaking change.
- The intended solution to the problem that a master has no replicas is replica migration. Perhaps it is possible to improve this automatic process so replicas can be migrated to a master which has < N replicas (if the target is N replicas for each master). Maybe it is not too difficult to add a config for that. WDYT?
  Perhaps we could even improve that algorithm so a new replica can decide to migrate itself to another master even before it starts replicating its first master. Or we could allow configuring it as a replica without specifying which master (for example, allow CLUSTER REPLICATE without a node-id) and then it can automatically decide which master it shall replicate. (Automatic is better than manual...)
- If we want to promote the new CLUSTER SHARDS command as a replacement for CLUSTER NODES and CLUSTER SLOTS, it would be good if failed masters could be included in the CLUSTER SHARDS output, under the correct shard. Otherwise, if CLUSTER NODES is the only command that provides this information, clients might still prefer it over the other commands.
- CLUSTER SHARDS is a new command, so we don't introduce a breaking change if we include this information in it from the beginning, instead of changing CLUSTER NODES.
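To make the first concern concrete, here is a minimal, self-contained sketch (not taken from any real client library; the helper name and the hard-coded lines are purely illustrative) of a naive client that scans CLUSTER NODES output for the "master" flag and a slot range. With the failed primary still advertising 0-5460, such a parser can route traffic to the dead node:

```c
/* A minimal sketch of a naive CLUSTER NODES parser that only looks for
 * "master" and a slot range, ignoring the "fail" flag. Hypothetical code,
 * only to illustrate the breaking-change concern above. */
#include <stdio.h>
#include <string.h>

/* Returns 1 if the line looks like a master serving slots 0-5460 to a
 * client that never checks for the "fail" flag. */
static int naive_owner_of_slot0(const char *line) {
    return strstr(line, "master") != NULL && strstr(line, " 0-5460") != NULL;
}

int main(void) {
    const char *lines[] = {
        /* Failed master, but still advertising 0-5460 under this proposal. */
        "72f66f4ecbcfc3055faae01a68bbeadc5079bb92 127.0.0.1:30001@40001 "
        "master,fail - 1642667704665 1642667703661 1 disconnected 0-5460",
        /* The healthy master actually serving 0-5460. */
        "ab87c879682639786876de7868976f9100c0c09a 127.0.0.1:30002@40002 "
        "master - 1642667704665 1642667703661 1 connected 0-5460",
    };

    for (int i = 0; i < 2; i++) {
        if (naive_owner_of_slot0(lines[i])) {
            /* Picks the failed master first: the breaking-change scenario. */
            printf("naive client would route slot 0 to: %.40s...\n", lines[i]);
            break;
        }
    }
    return 0;
}
```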
                goto fmterr;
            }
            while(start <= stop) clusterAddSlot(n, start++);
            /* Add the slot ranges to the officical slots bitmap only if the
Suggested change:
- /* Add the slot ranges to the officical slots bitmap only if the
+ /* Add the slot ranges to the official slots bitmap only if the
Thanks @zuiderkwast. You had me on the compatibility concern. It is indeed a breaking change. I will close this PR next.
We might want to generalize it a bit further and consider the case where the entire shard is lost (either because there is no replica or because all nodes in the shard are dead). Ideally, the implementation would also need to take AZs into account in addition to the replica count. I guess I was looking for something like "CLUSTER FIX". Under the hood, the node receiving this command would seek out uncovered slot ranges first and, if none is found, try to become a replica of a primary that is missing the expected number of replicas. Taking this thought further, could the ultimate form of this idea be integrated into "CLUSTER MEET" or "CLUSTER MEET [FIX]", perhaps? In any case, this is not a trivial task when there are multiple new nodes roaming the cluster looking for jobs.
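As a purely illustrative sketch of that idea (none of these types, names, or helpers exist in Redis; it only demonstrates the two-step decision described above), a "CLUSTER FIX"-style node could first look for uncovered slots and, only if the whole keyspace is covered, attach itself to the primary with the fewest replicas:

```c
/* Hypothetical decision logic for a "CLUSTER FIX"-style command run on a
 * fresh node. Step 1: claim (or at least report) uncovered slots.
 * Step 2: otherwise replicate the most under-replicated primary. */
#include <stdio.h>

#define CLUSTER_SLOTS 16384

typedef struct {
    const char *name;   /* primary node id (illustrative) */
    int replica_count;  /* current number of replicas */
} primary_info;

/* Step 1: return the first unowned slot, or -1 if all slots are covered. */
static int find_uncovered_slot(const unsigned char *slot_owned) {
    for (int s = 0; s < CLUSTER_SLOTS; s++)
        if (!slot_owned[s]) return s;
    return -1;
}

/* Step 2: pick the primary with the fewest replicas (a real version would
 * also weigh availability zones, as noted above). */
static const primary_info *pick_under_replicated(const primary_info *p, int n) {
    const primary_info *best = NULL;
    for (int i = 0; i < n; i++)
        if (!best || p[i].replica_count < best->replica_count) best = &p[i];
    return best;
}

int main(void) {
    unsigned char slot_owned[CLUSTER_SLOTS];
    for (int s = 0; s < CLUSTER_SLOTS; s++) slot_owned[s] = 1; /* all covered */

    primary_info primaries[] = { {"primary-A", 2}, {"primary-B", 0}, {"primary-C", 1} };

    int slot = find_uncovered_slot(slot_owned);
    if (slot >= 0) {
        printf("claim uncovered slot %d first\n", slot);
    } else {
        const primary_info *target = pick_under_replicated(primaries, 3);
        printf("no uncovered slots; replicate %s (replicas=%d)\n",
               target->name, target->replica_count);
    }
    return 0;
}
```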
This is a great idea. Will continue the discussion on the CLUSTER SHARDS thread.
Fixes #10155
This PR improves the consistency of "cluster nodes" output such that, regardless of whether there exists a replica for the failed primary node, "cluster nodes" executed on any remaining node in the cluster now contains the slot info for the failed primary node.
[Before the change]
...
72f66f4ecbcfc3055faae01a68bbeadc5079bb92 127.0.0.1:30001@40001 master,fail - 1642667704665 1642667703661 1 disconnected
...
[With the change]
...
72f66f4ecbcfc3055faae01a68bbeadc5079bb92 127.0.0.1:30001@40001 master,fail - 1642667704665 1642667703661 1 disconnected 0-5460
...