raft: support joint consensus for cluster membership change #1468
Description
etcd uses a simple implementation for membership changes: it adds or removes one peer at a time when applying the Raft log.
This works well most of the time, but it can still be risky, especially when PD rebalances peers.
E.g., there are three racks 1, 2 and 3, and each rack has 2 machines (we use h11 and h12 for the machines in rack 1, and so on). PD first schedules three peers p1, p2 and p3 to h11, h21 and h31. It then finds that h11 has a high load, so it decides to add a new peer p4 on h12 and remove p1 from h11.
If rack 1 goes down after p4 is added but before p1 is removed, the region has four peers and needs three of them for a quorum, but only p2 and p3 survive, so the region can't serve requests. To avoid this, we must add p4 and remove p1 atomically, which we can't do now.
Supporting joint consensus can fix this problem, but it differs from etcd's implementation, and we must do many tests to verify correctness and cover the corner cases.
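
To make the rack 1 scenario concrete, here is a minimal sketch of the quorum arithmetic. It is not TiKV or etcd code; the helper names (`has_quorum`, `joint_has_quorum`, `alive_after_rack_failure`) and the `PLACEMENT` table are hypothetical and only model the example above. It assumes the usual Raft rules: a configuration needs a strict majority of its voters, and a joint configuration needs a majority of both the old and the new voter sets.

```python
# Hypothetical model of the example above: peer -> rack it is placed on.
PLACEMENT = {"p1": "rack1", "p2": "rack2", "p3": "rack3", "p4": "rack1"}


def has_quorum(voters, alive):
    """True if a strict majority of `voters` is in `alive`."""
    return len(voters & alive) >= len(voters) // 2 + 1


def joint_has_quorum(old_voters, new_voters, alive):
    """Joint consensus: a majority is required in BOTH configurations."""
    return has_quorum(old_voters, alive) and has_quorum(new_voters, alive)


def alive_after_rack_failure(peers, failed_rack):
    """Peers that survive when every machine in `failed_rack` is down."""
    return {p for p in peers if PLACEMENT[p] != failed_rack}


old_voters = {"p1", "p2", "p3"}          # before the change
intermediate = old_voters | {"p4"}       # one-at-a-time: p4 added, p1 not yet removed
new_voters = {"p2", "p3", "p4"}          # target configuration

# Rack 1 fails, taking p1 (h11) and p4 (h12) with it.
alive = alive_after_rack_failure(intermediate, "rack1")

# One-at-a-time change: 4 voters need 3, only p2 and p3 are alive.
single_step_ok = has_quorum(intermediate, alive)

# Joint change: p2 and p3 are a majority of both {p1,p2,p3} and {p2,p3,p4}.
joint_ok = joint_has_quorum(old_voters, new_voters, alive)

print("single-step available:", single_step_ok)
print("joint available:", joint_ok)
```

In this model the four-voter intermediate configuration loses its quorum when rack 1 fails, while the joint configuration stays available because p2 and p3 form a majority of both voter sets.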