@PingXie PingXie commented Jan 23, 2022

Fixes #10155

This PR improves the consistency of the "cluster nodes" output: regardless of whether the failed primary has a replica, "cluster nodes" executed on any remaining node in the cluster now includes the slot info for the failed primary.

Before the change:

...
72f66f4ecbcfc3055faae01a68bbeadc5079bb92 127.0.0.1:30001@40001 master,fail - 1642667704665 1642667703661 1 disconnected
...

After the change:

...
72f66f4ecbcfc3055faae01a68bbeadc5079bb92 127.0.0.1:30001@40001 master,fail - 1642667704665 1642667703661 1 disconnected 0-5460
...
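For illustration, here is a minimal client-side sketch (hypothetical, not part of this PR or any real client) of tokenizing one "cluster nodes" line. With this change, a failed primary's line keeps its trailing slot ranges, so the same parsing path works for healthy and failed primaries alike:

```python
# Hypothetical client-side parser for one "cluster nodes" line.
# Field positions follow the sample output above:
# id, addr, flags, master-id, ping-sent, pong-recv, epoch, link-state, slots...
def parse_cluster_nodes_line(line):
    fields = line.split()
    return {
        "id": fields[0],
        "addr": fields[1],
        "flags": fields[2].split(","),
        "link_state": fields[7],
        "slots": fields[8:],  # with this change, present even for a failed primary
    }

failed = ("72f66f4ecbcfc3055faae01a68bbeadc5079bb92 127.0.0.1:30001@40001 "
          "master,fail - 1642667704665 1642667703661 1 disconnected 0-5460")
node = parse_cluster_nodes_line(failed)
print(node["flags"], node["slots"])  # ['master', 'fail'] ['0-5460']
```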

Ping Xie added 4 commits January 23, 2022 02:28
in the "cluster nodes" output after an automatic failover. The idea is
to bring consistency to the behavior of "cluster nodes" regardless of
whether or not a replica exists for the failed primary.
@madolson madolson self-requested a review January 30, 2022 19:54
PingXie commented Feb 17, 2022

@madolson when you get a chance :-) This PR improves the "cluster nodes" output consistency, irrespective of the number of replicas in the cluster.

PingXie commented Feb 23, 2022

@zuiderkwast would really appreciate it if you could help review this PR. Thanks!

@zuiderkwast zuiderkwast left a comment


I understand that it can be useful to know which shard a failed master belonged to, but I have some concerns:

  1. Can there be multiple masters for the same slot range, where only one is not failed? What if a client fails to parse this output correctly and selects the failed master instead of the healthy one? Assume the following two lines are included in the output:

     72f66f4ecbcfc3055faae01a68bbeadc5079bb92 127.0.0.1:30001@40001 master,fail - 1642667704665 1642667703661 1 disconnected 0-5460
     ab87c879682639786876de7868976f9100c0c09a 127.0.0.1:30002@40002 master - 1642667704665 1642667703661 1 connected 0-5460
    

    If a client only looks for the string "master" and a slot range, then this change can cause the client to select the failed master. It is a breaking change.

  2. The intended solution to the problem of a master having no replicas is replica migration. Perhaps this automatic process can be improved so that replicas migrate to a master with fewer than N replicas (if the target is N replicas per master). Maybe it is not too difficult to add a config for that. WDYT?

    Perhaps we could even improve that algorithm so a new replica can decide to migrate itself to another master even before it starts replicating its first master. Or we could allow configuring a node as a replica without specifying which master (for example, allow CLUSTER REPLICATE without a node-id) and let it automatically decide which master to replicate. (Automatic is better than manual...)

  3. If we want to promote the new CLUSTER SHARDS command as a replacement for CLUSTER NODES and CLUSTER SLOTS, it would be good if failed masters could be included in the CLUSTER SHARDS output, under the correct shard. Otherwise, if CLUSTER NODES is the only command that provides this information, clients might still prefer it over the other commands.

  4. CLUSTER SHARDS is a new command, so we would not introduce a breaking change if we added this information there from the beginning, instead of changing CLUSTER NODES.
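To make concern 1 concrete, here is a hypothetical client-side sketch (the function and its naive/strict split are invented for illustration, not taken from any real client). An owner lookup that matches only the "master" flag plus a slot range picks the failed node; also filtering out the "fail" flag avoids the breaking behavior:

```python
# Hypothetical slot-owner lookup over "cluster nodes" output lines.
# naive=True mimics a client that only checks for "master" and a slot
# range; naive=False additionally skips nodes flagged as failed.
def pick_owner(lines, naive=False):
    for line in lines:
        fields = line.split()
        flags = fields[2].split(",")
        has_slots = len(fields) > 8
        if "master" in flags and has_slots:
            if naive or "fail" not in flags:
                return fields[0]
    return None

lines = [
    "72f66f4ecbcfc3055faae01a68bbeadc5079bb92 127.0.0.1:30001@40001 "
    "master,fail - 1642667704665 1642667703661 1 disconnected 0-5460",
    "ab87c879682639786876de7868976f9100c0c09a 127.0.0.1:30002@40002 "
    "master - 1642667704665 1642667703661 1 connected 0-5460",
]
print(pick_owner(lines, naive=True))  # the failed primary's id
print(pick_owner(lines))              # the healthy primary's id
```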

goto fmterr;
}
while(start <= stop) clusterAddSlot(n, start++);
/* Add the slot ranges to the officical slots bitmap only if the
Suggested change
/* Add the slot ranges to the officical slots bitmap only if the
/* Add the slot ranges to the official slots bitmap only if the


PingXie commented Feb 24, 2022

Thanks @zuiderkwast.

You had me on the compatibility concern. It is indeed a breaking change. I will close this PR next.

> allow CLUSTER REPLICATE without a node-id

We might want to generalize it a bit further and consider the case where an entire shard is lost (either because there is no replica or because all nodes in the shard are dead). Ideally, the implementation would also take availability zones (AZs) into account, in addition to the replica count. I guess I was looking for something like "CLUSTER FIX". Under the hood, the node receiving this command would seek out uncovered slot ranges first and, if none are found, try to become a replica of a primary that is missing its expected number of replicas. Taking this thought further, could the ultimate form of this idea be integrated into "CLUSTER MEET", perhaps as "CLUSTER MEET [FIX]"? In any case, this is not a trivial task when there are multiple new nodes roaming the cluster looking for jobs.
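A rough sketch of the hypothetical "CLUSTER FIX" decision order described above (no such command exists; the function name, inputs, and target default are all invented for illustration):

```python
# Hypothetical decision logic for a joining node: claim uncovered slot
# ranges first; only otherwise become a replica of the primary furthest
# below the target replica count. A real version would also weigh AZs
# and coordinate among multiple joining nodes.
def choose_role(covered_slots, replicas_by_primary, target_replicas=1):
    uncovered = [s for s in range(16384) if s not in covered_slots]
    if uncovered:
        return ("take_slots", uncovered)
    # Pick the primary missing the most replicas, if any is below target.
    primary, count = min(replicas_by_primary.items(), key=lambda kv: kv[1])
    if count < target_replicas:
        return ("replicate", primary)
    return ("idle", None)

print(choose_role(set(range(16384)), {"p1": 1, "p2": 0}))  # ('replicate', 'p2')
```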

> it would be good if failed masters can be included in the CLUSTER SHARDS output

This is a great idea. Will continue the discussion on the CLUSTER SHARDS thread.

@PingXie PingXie closed this Feb 24, 2022
@PingXie PingXie mentioned this pull request Feb 24, 2022

Successfully merging this pull request may close these issues.

Retain a down primary's slot info after automatic fail over

2 participants