kv: enable delegate snapshots #83991
Conversation
This is the restore of the initial PR from amygao9@7c525d9 - this is still a WIP
nvb
left a comment
Reviewed 5 of 19 files at r4, 17 of 17 files at r5, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks and @andrewbaptist)
pkg/cmd/roachtest/tests/decommissionbench.go line 90 at r5 (raw file):
timeout time.Duration // An override for the decommission node to make it choose a predictable node.
nit: For readers that are new to this code, an "instead of a random node" at the end would be helpful.
pkg/kv/kvserver/replica_command.go line 2536 at r5 (raw file):
coordinator, err := r.GetReplicaDescriptor()
if err != nil {
	// If there is no local replica descriptor, return an empty list.
Should we return this error instead of requiring callers to handle an empty list?
pkg/kv/kvserver/replica_command.go line 2579 at r5 (raw file):
// Get the localities of the candidate replicas, including the original sender.
localities := storePool.GetLocalitiesPerReplica(candidates)
recipientLocality := storePool.GetLocalitiesPerReplica([]roachpb.ReplicaDescriptor{recipient})[recipient]
nit: consider making the function variadic.
pkg/kv/kvserver/replica_command.go line 2583 at r5 (raw file):
// Construct a map from replica to its diversity score compared to the
// recipient. Also track the best score we see.
replicaDistance := make(map[roachpb.ReplicaDescriptor]float64)
nit: presize the capacity to len(localities)
pkg/kv/kvserver/replica_command.go line 2611 at r5 (raw file):
Shuffle the replicas to prevent always choosing them in the same order.
Won't the rng with the same seed lead to the replicas always being shuffled in the same order? Is the randomness coming from the map iteration above?
pkg/kv/kvserver/replica_command.go line 2817 at r5 (raw file):
// Check whether we should queue on the delegate.
delegateRequest.QueueOnDelegateLen = MaxQueueOnDelegateLimit.Get(&r.ClusterSettings().SV)
nit: why is this pulled outside of the struct initialization?
pkg/kv/kvserver/replica_command.go line 2845 at r5 (raw file):
if retErr == nil {
	return
}
If we're looping and retrying on a different sender blindly without checking for a specific type of error, then it's probably best to at least log the errors that we're seeing here when looping. I'm sure we'll unintentionally delegate to multiple senders on an error that we weren't expecting at least once while stabilizing this work. Better to at least have some evidence of what the error is when that happens.
pkg/kv/kvserver/replica_command.go line 2857 at r5 (raw file):
// is validated twice, once before "queueing" and once after. This reduces the
// chance of false positives (snapshots which are sent but can't be used),
// however it is impossible to completely eliminate them.
"impossible to completely eliminate them" is too strong of wording, and it also doesn't help the reader understand the interactions here. Consider expanding on this point as a way to illustrate why we don't make stronger guarantees. What would we have to synchronize with to guarantee that we will never hit a false positive? Freeze all range splits, merge, rebalances, leadership changes, and log truncations?
pkg/kv/kvserver/replica_command.go line 2861 at r5 (raw file):
ctx context.Context, req *kvserverpb.DelegateSnapshotRequest,
) error {
	// If the generation has changed, this snapshot will be useless, so don't
Why does a descriptor generation change necessarily cause the snapshot to be useless?
pkg/kv/kvserver/replica_command.go line 2863 at r5 (raw file):
// If the generation has changed, this snapshot will be useless, so don't
// attempt to send it.
if r.Desc().Generation != req.DescriptorGeneration {
nit: pull desc := r.Desc() out so that there's only a single instance.
pkg/kv/kvserver/replica_command.go line 2875 at r5 (raw file):
// Check that the snapshot we generated has a descriptor that includes the
// recipient. If it doesn't, the recipient will reject it, so it's better to
I'm surprised that we need this check if we're already checking that the descriptor generation is the same on the delegate and the leaseholder. Is that the case in the examples you discuss in this comment?
pkg/kv/kvserver/replica_command.go line 2895 at r5 (raw file):
// Check the raft applied state index and term to determine if this replica
// is not too far behind the leaseholder that it needs a snapshot.
Play this comment out one more step. If the delegate is too far behind that it also needs a snapshot, then any snapshot it sends will be useless.
pkg/kv/kvserver/replica_command.go line 2909 at r5 (raw file):
// Delegate has a different term than the coordinator. This typically means
// the lease has been transferred, and we should not process this request.
Can you explain why we shouldn't process the request in this case?
pkg/kv/kvserver/replica_command.go line 2925 at r5 (raw file):
}
// Sender replica's will be rejected if the sender replica's raft applied index is lower than
"Sender replica's snapshot"?
pkg/kv/kvserver/replica_command.go line 2927 at r5 (raw file):
// Sender replica's will be rejected if the sender replica's raft applied index is lower than
// or equal to the truncated state on the leaseholder, as this replica's snapshot will be wasted.
// Note that its possible that we can enforce strictly lesser than if etcd does not require
s/its/it is/
pkg/kv/kvserver/replica_command.go line 2950 at r5 (raw file):
settings.SystemOnly,
"kv.snapshot_delegation.num_follower",
"the number of delegates to try when sending snapshots",
"when sending snapshots, before falling back to sending from the leaseholder"
pkg/kv/kvserver/replica_command.go line 3011 at r5 (raw file):
// obtain the send semaphore. We check after to make sure the snapshot request
// still makes sense (e.g the range hasn't split under us). The check is
// lightweight, so better to be safe than sorry.
This makes it sound like the check before the reservation is best-effort and the check after is properly synchronized and makes a stronger guarantee. But that's not correct, is it? Consider being explicit about this in these two comments, like you are in the comment on validateSnapshotDelegationRequest ("however it is impossible to completely eliminate them").
pkg/kv/kvserver/replica_command.go line 3034 at r5 (raw file):
// TODO: Review this to make sure that there isn't any inconsistency since we
// may be on a delegate. The GetSnapshot call will add a LogTruncation
// constraint, but it isn't clear that this will help.
This is a good call. We want the log truncation constraint to be created on the leaseholder, not on the delegate.
pkg/kv/kvserver/store_snapshot.go line 696 at r5 (raw file):
}
// If the queue is too long on a delegate, don't attempt to wait before sending.
if req.QueueOnDelegateLen >= 0 {
Do we want to push this queue depth limit into the MultiQueue itself to avoid raciness? Currently, this limit can be exceeded by multiple concurrent calls to reserveSendSnapshot.
pkg/kv/kvserver/testing_knobs.go line 427 at r5 (raw file):
// send snapshot semaphore.
AfterSendSnapshotThrottle func()
// SelectDelegateSnapshotSender returns a replica to send delegated snapshots.
"returns a replica"
It returns a slice, so this contract isn't quite right. Want to explain what it means for multiple replicas to be returned?
pkg/kv/kvserver/allocator/storepool/store_pool.go line 1208 at r5 (raw file):
func (sp *StorePool) GetLocalitiesPerReplica(
	replicas []roachpb.ReplicaDescriptor,
) map[roachpb.ReplicaDescriptor]roachpb.Locality {
Would it be more natural for this to map from ReplicaID to Locality? The ReplicaDescriptor struct contains a number of fields, some of which (e.g. Type) are mutable over the lifetime of a replica.
pkg/kv/kvserver/kvserverpb/raft.proto line 274 at r5 (raw file):
// The priority of the snapshot.
// TODO(abaptist): Remove this field for v23.1.
Can we remove them now?
pkg/kv/kvserver/kvserverpb/raft.proto line 286 at r5 (raw file):
// The truncated state of the Raft log on the coordinator replica.
uint64 truncated_index = 8;
If we're changing this, we should evaluate whether we want to send the "truncated index" or the "first index" (truncated index + 1). I believe it is more common around the Raft layer to refer to the log's first index, but I could be wrong about that. Doing so would allow you to use the existing Replica.GetFirstIndex method, instead of introducing the new Replica.GetRaftTruncatedState method.
andrewbaptist
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @andrewbaptist, @herkolategan, @nvanbenschoten, and @smg260)
pkg/cmd/roachtest/tests/decommissionbench.go line 90 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: For readers that are new to this code, an "instead of a random node" at the end would be helpful.
Done
pkg/kv/kvserver/replica_command.go line 2536 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Should we return this error instead of requiring callers to handle an empty list?
Done
pkg/kv/kvserver/replica_command.go line 2579 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: consider making the function variadic.
Done
pkg/kv/kvserver/replica_command.go line 2583 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: presize the capacity to len(localities)
Done
pkg/kv/kvserver/replica_command.go line 2611 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Shuffle the replicas to prevent always choosing them in the same order.
Won't the rng with the same seed lead to the replicas always being shuffled in the same order? Is the randomness coming from the map iteration above?
For any given replica, they will shuffle in the same way, but if there are multiple different replicas that are being transmitted they will shuffle differently. This is typically only called once per replica, because if it fails then it falls back to the leaseholder.
pkg/kv/kvserver/replica_command.go line 2817 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: why is this pulled outside of the struct initialization?
Done
pkg/kv/kvserver/replica_command.go line 2845 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
If we're looping and retrying on a different sender blindly without checking for a specific type of error, then it's probably best to at least log the errors that we're seeing here when looping. I'm sure we'll unintentionally delegate to multiple senders on an error that we weren't expecting at least once while stabilizing this work. Better to at least have some evidence of what the error is when that happens.
Done, good point!
pkg/kv/kvserver/replica_command.go line 2857 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"impossible to completely eliminate them" is too strong of wording, and it also doesn't help the reader understand the interactions here. Consider expanding on this point as a way to illustrate why we don't make stronger guarantees. What would we have to synchronize with to guarantee that we will never hit a false positive? Freeze all range splits, merge, rebalances, leadership changes, and log truncations?
Done
pkg/kv/kvserver/replica_command.go line 2863 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: pull desc := r.Desc() out so that there's only a single instance.
Done
pkg/kv/kvserver/replica_command.go line 2875 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I'm surprised that we need this check if we're already checking that the descriptor generation is the same on the delegate and the leaseholder. Is that the case in the examples you discuss in this comment?
You are right about this being redundant. I don't see how this could happen either if the descriptor generation has changed. I could change this to an assertion, or remove the check.
pkg/kv/kvserver/replica_command.go line 2895 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Play this comment out one more step. If the delegate is too far behind that is also needs a snapshot, then any snapshot it sends will be useless.
Done, added a comment.
pkg/kv/kvserver/replica_command.go line 2925 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"Sender replica's snapshot"?
Done
pkg/kv/kvserver/replica_command.go line 2927 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/its/it is/
Done
pkg/kv/kvserver/replica_command.go line 2950 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"when sending snapshots, before falling back to sending from the leaseholder"
Done
pkg/kv/kvserver/replica_command.go line 3011 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This makes it sound like the check before the reservation is best-effort and the check after is properly synchronized and makes a stronger guarantee. But that's not correct, is it? Consider being explicit about this in these two comments, like you are in the comment on validateSnapshotDelegationRequest ("however it is impossible to completely eliminate them").
Done, good point; I cleaned up the comment on this.
pkg/kv/kvserver/replica_command.go line 3034 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This is a good call. We want the log truncation constraint to be created on the leaseholder, not on the delegate.
I'm not positive about the best way to handle this. If this is called on the leaseholder, then it will correctly have the constraint in place, but if it is called on a delegate then there isn't really a way to guarantee this. I could change to make the constraint occur much earlier (on the leaseholder before it sends the delegation request) but that risks holding the constraint for a longer window. I don't think it would be hard to make that change, so maybe that is the way to go here.
pkg/kv/kvserver/store_snapshot.go line 696 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Do we want to push this queue depth limit into the MultiQueue itself to avoid raciness? Currently, this limit can be exceeded by multiple concurrent calls to reserveSendSnapshot.
Done. It probably wouldn't have been an issue, but since this is the only use of the MultiQueue, it is simpler to do this.
pkg/kv/kvserver/testing_knobs.go line 427 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"returns a replica"
It returns a slice, so this contract isn't quite right. Want to explain what it means for multiple replicas to be returned?
Done
pkg/kv/kvserver/allocator/storepool/store_pool.go line 1208 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Would it be more natural for this to map from ReplicaID to Locality? The ReplicaDescriptor struct contains a number of fields, some of which (e.g. Type) are mutable over the lifetime of a replica.
Done, it requires an extra loop to convert back to descriptors, but probably the right thing to do as it makes the equality safer.
pkg/kv/kvserver/kvserverpb/raft.proto line 274 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Can we remove them now?
Unfortunately, this is a bigger change than I had planned for this patch. I was going to do this as a follow-on PR. I chatted with @AlexTalks a little about this and there are UI changes related to it. I think this can still get done in 23.1, but I didn't want to tie it to this. If we later remove these fields, it will not matter as the only thing they are still used for is reporting stats right now.
pkg/kv/kvserver/kvserverpb/raft.proto line 286 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
If we're changing this, we should evaluate whether we want to send the "truncated index" or the "first index" (truncated index + 1). I believe it is more common around the Raft layer to refer to the log's first index, but I could be wrong about that. Doing so would allow you to use the existing
Replica.GetFirstIndex method, instead of introducing the new Replica.GetRaftTruncatedState method.
Done, this makes more sense now.
nvb
left a comment
Reviewed 20 of 21 files at r6, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @andrewbaptist, @herkolategan, and @smg260)
pkg/kv/kvserver/replica_command.go line 2861 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Why does a descriptor generation change necessarily cause the snapshot to be useless?
I'm still curious about this. Are there descriptor generation changes that don't necessarily mean this, but we're ok with false positives?
pkg/kv/kvserver/replica_command.go line 3034 at r5 (raw file):
but that risks holding the constraint for a longer window
The leaseholder also doesn't know which index to bound log truncation up to. It may have applied further than the delegated sender, and it doesn't want to truncate the log any higher than the index that the delegated snapshot is being sent.
I think this is why the RPC protocol was streaming before. We had envisioned a handshake where the leaseholder would ask the delegate for its applied index, install a constraint at that index (rechecking that it hadn't truncated above this already), then allow the delegated snapshot to proceed.
pkg/kv/kvserver/replica_command.go line 2591 at r6 (raw file):
// regardless of score.
var tiedReplicas []roachpb.ReplicaID
for desc, score := range replicaDistance {
s/desc/replID/
pkg/kv/kvserver/replica_command.go line 2611 at r6 (raw file):
// Only keep the top numFollowers replicas.
if len(tiedReplicas) > numFollowers {
	tiedReplicas = tiedReplicas[0:numFollowers]
small nit: the 0 is unnecessary.
pkg/kv/kvserver/replica_command.go line 2619 at r6 (raw file):
for n, replicaId := range tiedReplicas {
	found := false
	for _, desc := range rangeDesc.Replicas().Descriptors() {
Can we use ReplicaSet.GetReplicaDescriptorByID here?
andrewbaptist
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @herkolategan, @nvanbenschoten, and @smg260)
pkg/kv/kvserver/replica_command.go line 2861 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I'm still curious about this. Are there descriptor generation changes that don't necessarily mean this, but we're ok with false positives?
Just to record what we discussed (I added a comment for it): this check is probably a little overly strict, which means some delegated snapshots may be rejected unnecessarily. This is not really a problem and should be pretty rare. We could likely remove this check in the future.
pkg/kv/kvserver/replica_command.go line 3034 at r5 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
but that risks holding the constraint for a longer window
The leaseholder also doesn't know which index to bound log truncation up to. It may have applied further than the delegated sender, and it doesn't want to truncate the log any higher than the index that the delegated snapshot is being sent.
I think this is why the RPC protocol was streaming before. We had envisioned a handshake where the leaseholder would ask the delegate for its applied index, install a constraint at that index (rechecking that it hadn't truncated above this already), then allow the delegated snapshot to proceed.
After discussing options on this, I added the log truncation constraint up front on the coordinator; there is a risk of holding this longer than strictly required. There is a path to change this if necessary in the future, but it is unlikely to be a concern for most systems. https://cockroachlabs.atlassian.net/wiki/spaces/~6268113f52310b0068ffd245/pages/2869854482/Delegate+snapshot+overview
There are a few ways to solve this, but I don't think the streaming RPC is correct since the data required is disjoint for the different parts of the flow. There is a risk with the latest change that a delegated request will fail since it won't have the latest applied index from the leaseholder. If that occurs often, it would be worth changing to retry on the delegate, since it will "eventually" catch up in most cases.
pkg/kv/kvserver/replica_command.go line 2591 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/desc/replID/
Done
pkg/kv/kvserver/replica_command.go line 2611 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
small nit: the 0 is unnecessary.
Done
pkg/kv/kvserver/replica_command.go line 2619 at r6 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Can we use ReplicaSet.GetReplicaDescriptorByID here?
Done
nvb
left a comment
Reviewed 18 of 20 files at r7, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @andrewbaptist, @herkolategan, and @smg260)
pkg/kv/kvserver/replica_command.go line 2821 at r7 (raw file):
Term: status.Term,
DelegatedSender: sender,
FirstIndex: appliedIndex,
Could you add a comment here about why you are setting FirstIndex to the value of appliedIndex on the leaseholder? It's subtle and could benefit from a discussion about the consequences and plans for future improvement.
pkg/kv/kvserver/replica_command.go line 2880 at r7 (raw file):
// If the generation has changed, this snapshot may be useless, so don't
// attempt to send it.
//NB: This is an overly strict check. If other delegates are added to this
nit: missing a space after // on each line.
andreimatei
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @andrewbaptist, @herkolategan, @nvanbenschoten, and @smg260)
-- commits line 29 at r8:
nit: consider adding more words to the release note for this awesome work so that the average user can understand it
andrewbaptist
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @herkolategan, @nvanbenschoten, and @smg260)
pkg/kv/kvserver/replica_command.go line 2821 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Could you add a comment here about why you are setting FirstIndex to the value of appliedIndex on the leaseholder? It's subtle and could benefit from a discussion about the consequences and plans for future improvement.
Done
pkg/kv/kvserver/replica_command.go line 2880 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: missing a space after // on each line.
Done
nvb
left a comment
Reviewed all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @andrewbaptist, @herkolategan, and @smg260)
pkg/kv/kvserver/replica_command.go line 2821 at r7 (raw file):
Previously, andrewbaptist (Andrew Baptist) wrote…
Done
Did you miss a git push?
andrewbaptist
left a comment
Reviewed 2 of 4 files at r8.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @andreimatei, @herkolategan, @nvanbenschoten, and @smg260)
Previously, andreimatei (Andrei Matei) wrote…
nit: consider adding more words to the release note for this awesome work so that the average user can understand it
Thanks, I added some more to the release notes. I will likely publish a short blog on this as well when 23.1 comes out!
pkg/kv/kvserver/replica_command.go line 2821 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Did you miss a git push?
I'm not sure what happened - I rewrote and pushed again.
nvb
left a comment
Reviewed 1 of 1 files at r10, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @AlexTalks, @andreimatei, @andrewbaptist, @herkolategan, and @smg260)
bors r=nvanbenschoten

Build succeeded:

congrats!!
kvserver: delegate snapshots to followers
Fixes: #42491
This commit allows a snapshot to be sent by a follower instead of the
leader of a range. The follower(s) are chosen based on locality to the
final recipient of the snapshot. If the follower is not able to
quickly send the snapshot, the attempt is aborted and the leader sends
the snapshot instead.
By choosing a delegate rather than sending the snapshot directly, WAN
traffic can be minimized. Additionally the snapshot will likely be
delivered faster.
There are two settings that control this feature. The first,
`kv.snapshot_delegation.num_follower`, controls how many followers
the snapshot is attempted to be delegated through. If set to 0, then
snapshot delegation is disabled. The second,
`kv.snapshot_delegation_queue.enabled`, controls whether delegated
snapshots will queue on the delegate or return failure immediately. This
is useful to prevent a delegation request from spending a long time
waiting before it is sent.
Before the snapshot is sent from the follower, checks are done to
verify that the delegate is able to send a snapshot that will be valid
for the recipient. If not, the request is rerouted to the leader.
Release note (performance improvement): Adds delegated snapshots which can reduce WAN traffic for snapshot movement. If there is another replica for this range with a closer locality than the delegate, the leaseholder will attempt to have that delegate send the snapshot. This is particularly useful in the case of a decommission of a node where most snapshots are transferred to another replica in the same locality.