raft: use learner replicas instead of preemptive snapshots #34058
Description
We could stop using preemptive snapshots if we used Raft's support for learner replicas. These catch up on the log and can receive snapshots, but they don't vote, campaign, or count in the commit quorum.
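To make the "don't count in the commit quorum" part concrete, here is a minimal sketch of how a commit index can be computed over voters only. The `Replica` type, its fields, and both functions are hypothetical simplifications for illustration, not the actual Raft library or CockroachDB code.

```go
package main

import (
	"fmt"
	"sort"
)

// ReplicaType distinguishes voting members from learners.
type ReplicaType int

const (
	Voter ReplicaType = iota
	Learner
)

// Replica is a hypothetical, simplified view of one member of a Raft group.
type Replica struct {
	ID         int
	Type       ReplicaType
	MatchIndex uint64 // highest log index known to be replicated on this replica
}

// quorumSize returns the number of voters needed for a majority.
// Learners are ignored entirely.
func quorumSize(replicas []Replica) int {
	voters := 0
	for _, r := range replicas {
		if r.Type == Voter {
			voters++
		}
	}
	return voters/2 + 1
}

// committedIndex returns the highest log index that is replicated on a
// quorum of voters. A learner's MatchIndex never influences the result.
func committedIndex(replicas []Replica) uint64 {
	var matches []uint64
	for _, r := range replicas {
		if r.Type == Voter {
			matches = append(matches, r.MatchIndex)
		}
	}
	if len(matches) == 0 {
		return 0
	}
	// Sort descending; the quorum-th largest value is held by a majority.
	sort.Slice(matches, func(i, j int) bool { return matches[i] > matches[j] })
	return matches[quorumSize(replicas)-1]
}

func main() {
	group := []Replica{
		{ID: 1, Type: Voter, MatchIndex: 10},
		{ID: 2, Type: Voter, MatchIndex: 8},
		{ID: 3, Type: Voter, MatchIndex: 5},
		{ID: 4, Type: Learner, MatchIndex: 3}, // still catching up
	}
	fmt.Println("quorum size:", quorumSize(group))
	fmt.Println("committed index:", committedIndex(group))
}
```

In this sketch a lagging learner cannot hold back commits, and a learner that is ahead cannot advance them, which is the property that makes learners safe to add before they have caught up.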
Doing so would unify Raft snapshots and preemptive snapshots, would allow us to treat learners as regular replicas in much of the code, and would prevent the replica GC queue from erroneously removing preemptive snapshots (which is a medium-size nuisance today). In exchange, we'd need a mechanism that regularly inspects the range descriptor for learners and decides whether a learner should be removed (for example, after a replication change fails without cleaning up the learner).
Naively, we could rely on the Raft snapshot queue to catch up the learners, but that queue currently works very poorly. Until it has been redesigned, we are probably better off sending the snapshot manually, as we do today.
Learner replicas could also be used in conjunction with follower reads to enable historical queries at a remote location. For example, a satellite DC could hold a learner replica of each range whose full members live in another DC, which would allow the satellite DC to carry out historical reads on a local copy of the data. Whether this is useful in the presence of CDC is speculation, but I think @bdarnell had opinions in the past.
cc @nvanbenschoten (who suggested learner replicas)