raft: improve the availability related to member change #7625
Description
Hi,
The current member change implementation needs at least two working nodes to keep a cluster available. If one node fails in a three-node cluster, there is a short window, after the failed node is removed and before the new node is added, in which the cluster has only two members and a second node failure makes it unavailable.
Another availability issue arises when balancing nodes across racks/data centers. The usual approach is to add a new node and then remove an old node in a different rack/data center. After adding a node to a three-node cluster spread over three racks/data centers, two nodes sit in the same rack/data center. If that rack/data center fails, only two of the four members remain, which is less than the majority of three, and the cluster is unavailable. This issue is elaborated in tikv/tikv#1468.
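To make the arithmetic behind both cases concrete, here is a minimal sketch in Go. It is not etcd/raft code: `quorum`, `faultTolerance`, `available`, and the node/rack names are all hypothetical helpers invented for illustration, and the model only counts voting members against a simple majority.

```go
package main

import "fmt"

// quorum returns the majority size for a cluster of n voting members.
func quorum(n int) int { return n/2 + 1 }

// faultTolerance returns how many members may fail while a majority survives.
func faultTolerance(n int) int { return n - quorum(n) }

// available reports whether the cluster still has a majority after every
// member placed in failedRack goes down. members maps node name -> rack name.
// (Hypothetical helper, not part of any raft library.)
func available(members map[string]string, failedRack string) bool {
	alive := 0
	for _, rack := range members {
		if rack != failedRack {
			alive++
		}
	}
	return alive >= quorum(len(members))
}

func main() {
	// Case 1: remove the failed node first, add the replacement later.
	// Between the two steps the cluster has 2 members.
	fmt.Println("tolerance with 3 members:", faultTolerance(3))
	fmt.Println("tolerance with 2 members:", faultTolerance(2))

	// Case 2: add a node first, remove the old one later.
	// Between the two steps rack A holds two of the four members.
	before := map[string]string{"n1": "A", "n2": "B", "n3": "C"}
	during := map[string]string{"n1": "A", "n2": "B", "n3": "C", "n4": "A"}
	fmt.Println("3 members, rack A fails, available:", available(before, "A"))
	fmt.Println("4 members, rack A fails, available:", available(during, "A"))
}
```

Under these assumptions the two-member cluster tolerates zero failures, and the four-member layout loses its majority when rack A fails, while the original three-member layout survives the same rack failure.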
Both availability issues stem from the member change implementation. To fix them, I suggest adding a "ReplaceNode" primitive to member change. It would write and then commit a single log entry that achieves the target "remove one existing node and add a new node".
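As a rough illustration of what such a primitive could look like, the sketch below applies a single hypothetical `ReplaceNode` entry to a voter set. None of these names (`ReplaceNode`, `Membership`, `ApplyReplace`, `IDs`) exist in etcd/raft; they are assumptions made for this example. It models only the membership bookkeeping at commit time and says nothing about how safety is preserved while the entry is in flight.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// ReplaceNode is a hypothetical log entry payload: one entry that both
// removes an existing member and adds a new one. Not an etcd/raft type.
type ReplaceNode struct {
	RemoveID uint64
	AddID    uint64
}

// Membership is a simplified stand-in for the set of voting members.
type Membership map[uint64]struct{}

// ApplyReplace returns the membership that results from committing e.
// The receiver is left unchanged, so the swap is all-or-nothing.
func (m Membership) ApplyReplace(e ReplaceNode) (Membership, error) {
	if _, ok := m[e.RemoveID]; !ok {
		return nil, errors.New("node to remove is not a member")
	}
	if _, ok := m[e.AddID]; ok {
		return nil, errors.New("node to add is already a member")
	}
	next := make(Membership, len(m))
	for id := range m {
		if id != e.RemoveID {
			next[id] = struct{}{}
		}
	}
	next[e.AddID] = struct{}{}
	return next, nil
}

// IDs returns the member IDs in ascending order, for stable printing.
func (m Membership) IDs() []uint64 {
	ids := make([]uint64, 0, len(m))
	for id := range m {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

func main() {
	members := Membership{1: {}, 2: {}, 3: {}}
	next, err := members.ApplyReplace(ReplaceNode{RemoveID: 3, AddID: 4})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The member count is the same before and after the entry is applied.
	fmt.Println(len(members), len(next), next.IDs())
}
```

The point of the sketch is that the cluster size never passes through two or four members: the count is three before the entry and three after it, which is the property both scenarios above depend on.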