Range replica removal #768
Closed
Labels
C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Description
We don't currently do anything special when a replica is removed from a range; this issue tracks changes we need to make.
- We don't actually inform the removed node that a change has occurred. If it was online at the time of its removal, it may have received the event in its log, but if the entry didn't make it through on the first attempt, it will not be retried once it has been committed on a quorum. The removed replica will come back online in follower mode, still believing it is part of the group. It may try to start elections because it is no longer receiving heartbeats from any leader, although coalesced heartbeats may (incorrectly) suppress elections if the node still has groups in common with its former leader (see storage: Coalesced Heartbeats Corner Cases #315). For a short time after its removal, its elections might actually succeed (see raft thesis.pdf page 41). We need some way to inform the node that it has been removed from the group, while still allowing the node to be re-added in the future.
- If the node being removed was the leader at the time of its removal, all followers must "forget" their knowledge of this leader so they will stop fanning out coalesced heartbeats and trigger elections. Ideally we would implement a more orderly leadership handoff for this case instead of relying on followers timing out and triggering elections as if the leader had died.
- Stale messages may arrive after the range has been removed; these will try to recreate the group. If we cannot tell the difference between messages that predate the removal and future messages that re-add this replica, we have to let them through and ensure that all necessary cleanup happens in an asynchronous GC process instead of a trigger on the original replica removal.
- After removing the raft group, we also need to remove the data that belonged to the range, while being mindful of NodeID reuse (see Raft NodeID reuse #756). If the node is later re-added, we need to be careful that the population of the new incarnation of the range does not overlap with the deletion of the old incarnation.
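The asynchronous GC process mentioned above could look roughly like the following. This is only a minimal sketch under stated assumptions: `Store`, `Descriptor`, `GCOnce`, and the `lookup` callback are hypothetical names invented for illustration and are not taken from the codebase. The idea is that each node periodically compares its local replicas against an authoritative membership record and deletes those it no longer belongs to, instead of relying on a trigger at removal time.

```go
package main

import (
	"fmt"
	"sort"
)

// NodeID and RangeID are hypothetical simplified identifiers for this sketch.
type NodeID int
type RangeID int

// Descriptor is a hypothetical authoritative membership record for a range.
type Descriptor struct {
	Replicas map[NodeID]bool
}

// Store is a hypothetical per-node container of local replicas.
type Store struct {
	Self  NodeID
	Local map[RangeID]bool // ranges for which this node holds a raft group and data
}

// GCOnce makes one pass over the local replicas. For each one it asks lookup
// for the current descriptor and drops the replica if this node is no longer
// a member. Replicas whose membership cannot be confirmed are kept.
// It returns the IDs of the removed replicas in ascending order.
func (s *Store) GCOnce(lookup func(RangeID) (Descriptor, bool)) []RangeID {
	var removed []RangeID
	for id := range s.Local {
		desc, ok := lookup(id)
		if !ok {
			continue // membership unknown: be conservative and keep the replica
		}
		if !desc.Replicas[s.Self] {
			delete(s.Local, id) // stands in for removing the raft group and its data
			removed = append(removed, id)
		}
	}
	sort.Slice(removed, func(i, j int) bool { return removed[i] < removed[j] })
	return removed
}

func main() {
	descs := map[RangeID]Descriptor{
		1: {Replicas: map[NodeID]bool{1: true, 2: true, 3: true}},
		2: {Replicas: map[NodeID]bool{2: true, 3: true, 4: true}}, // node 1 was removed
	}
	lookup := func(id RangeID) (Descriptor, bool) {
		d, ok := descs[id]
		return d, ok
	}
	// Range 3 has no descriptor available, so it is left alone.
	s := &Store{Self: 1, Local: map[RangeID]bool{1: true, 2: true, 3: true}}
	fmt.Println(s.GCOnce(lookup)) // prints [2]
}
```

Because the pass is idempotent, a stale message that recreates a removed group is cleaned up again on a later pass. The sketch does not address the re-add race from the last bullet: a real implementation would still need to ensure that deleting the old incarnation's data cannot overlap with populating the new one.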