
kv: enable delegate snapshots#83991

Merged
craig[bot] merged 1 commit into cockroachdb:master from andrewbaptist:follower_snapshots
Feb 3, 2023

Conversation

@andrewbaptist

@andrewbaptist andrewbaptist commented Jul 7, 2022

kvserver: delegate snapshots to followers

Fixes: #42491

This commit allows a snapshot to be sent by a follower instead of the
leader of a range. The follower(s) are chosen based on locality to the
final recipient of the snapshot. If the follower is not able to
quickly send the snapshot, the attempt is aborted and the leader sends
the snapshot instead.

By choosing a delegate rather than sending the snapshot directly, WAN
traffic can be minimized. Additionally, the snapshot will likely be
delivered faster.

There are two settings that control this feature. The first,
`kv.snapshot_delegation.num_follower`, controls how many followers
the leader will attempt to delegate the snapshot through. If set to 0,
snapshot delegation is disabled. The second,
`kv.snapshot_delegation_queue.enabled`, controls whether delegated
snapshots will queue on the delegate or fail immediately. This is
useful to prevent a delegation request from waiting a long time
before it is sent.
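The interaction of the two settings can be sketched as follows. This is an illustrative Go sketch of the control flow only, not CockroachDB's actual API; the function and parameter names (`trySendSnapshot`, `numFollowers`, `send`) are hypothetical.

```go
package main

import "fmt"

// trySendSnapshot sketches the delegation flow described above: attempt
// up to numFollowers delegates in order, and fall back to sending from
// the leader if delegation is disabled (numFollowers == 0) or every
// delegate fails. Returns the node that sent the snapshot.
func trySendSnapshot(numFollowers int, delegates []string, send func(string) error) string {
	if numFollowers > 0 {
		for i, d := range delegates {
			if i >= numFollowers {
				break // only try the configured number of delegates
			}
			if err := send(d); err == nil {
				return d // a delegate sent the snapshot
			}
		}
	}
	// Delegation disabled or all delegates failed: the leader sends.
	if err := send("leader"); err == nil {
		return "leader"
	}
	return ""
}

func main() {
	send := func(target string) error {
		if target == "slow-delegate" {
			return fmt.Errorf("delegate %s timed out", target)
		}
		return nil
	}
	// Only one delegate is tried; it fails, so the leader sends.
	fmt.Println(trySendSnapshot(1, []string{"slow-delegate", "near-delegate"}, send)) // prints: leader
}
```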

Before the snapshot is sent from the follower, checks are done to
verify that the delegate is able to send a snapshot that will be valid
for the recipient. If not, the request is rerouted to the leader.

Release note (performance improvement): Adds delegated snapshots which can reduce WAN traffic for snapshot movement.

@cockroach-teamcity
Member

This change is Reviewable

@andrewbaptist
Author

This is a restore of the initial PR from amygao9@7c525d9; this is still a WIP.

@andrewbaptist andrewbaptist force-pushed the follower_snapshots branch 9 times, most recently from 94241ca to 1a6329c Compare July 12, 2022 17:43
@andrewbaptist andrewbaptist changed the title Initial commit kv: complete delegate snapshots Jul 12, 2022
@andrewbaptist andrewbaptist force-pushed the follower_snapshots branch 5 times, most recently from 2e76245 to 89ad698 Compare July 25, 2022 19:22
@andrewbaptist andrewbaptist changed the title kv: complete delegate snapshots kv: enable delegate snapshots Jul 26, 2022
@andrewbaptist andrewbaptist force-pushed the follower_snapshots branch 8 times, most recently from 432c493 to f8d5d89 Compare August 1, 2022 18:25
@andrewbaptist andrewbaptist force-pushed the follower_snapshots branch 2 times, most recently from 7e6fa1d to 5878898 Compare September 28, 2022 13:05
@andrewbaptist andrewbaptist force-pushed the follower_snapshots branch 2 times, most recently from ccd450b to e02c7ba Compare November 29, 2022 20:14
@andrewbaptist andrewbaptist marked this pull request as ready for review November 29, 2022 20:45
@andrewbaptist andrewbaptist requested a review from a team as a code owner November 29, 2022 20:45
Contributor

@nvb nvb left a comment


Reviewed 5 of 19 files at r4, 17 of 17 files at r5, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks and @andrewbaptist)


pkg/cmd/roachtest/tests/decommissionbench.go line 90 at r5 (raw file):

	timeout time.Duration

	// An override for the decommission node to make it choose a predictable node.

nit: For readers that are new to this code, an "instead of a random node" at the end would be helpful.


pkg/kv/kvserver/replica_command.go line 2536 at r5 (raw file):

	coordinator, err := r.GetReplicaDescriptor()
	if err != nil {
		// If there is no local replica descriptor, return an empty list.

Should we return this error instead of requiring callers to handle an empty list?


pkg/kv/kvserver/replica_command.go line 2579 at r5 (raw file):

	// Get the localities of the candidate replicas, including the original sender.
	localities := storePool.GetLocalitiesPerReplica(candidates)
	recipientLocality := storePool.GetLocalitiesPerReplica([]roachpb.ReplicaDescriptor{recipient})[recipient]

nit: consider making the function variadic.


pkg/kv/kvserver/replica_command.go line 2583 at r5 (raw file):

	// Construct a map from replica to its diversity score compared to the
	// recipient. Also track the best score we see.
	replicaDistance := make(map[roachpb.ReplicaDescriptor]float64)

nit: presize the capacity to len(localities)


pkg/kv/kvserver/replica_command.go line 2611 at r5 (raw file):

Shuffle the replicas to prevent always choosing them in the same order.

Won't the rng with the same seed lead to the replicas always being shuffled in the same order? Is the randomness coming from the map iteration above?


pkg/kv/kvserver/replica_command.go line 2817 at r5 (raw file):

	// Check whether we should queue on the delegate.
	delegateRequest.QueueOnDelegateLen = MaxQueueOnDelegateLimit.Get(&r.ClusterSettings().SV)

nit: why is this pulled outside of the struct initialization?


pkg/kv/kvserver/replica_command.go line 2845 at r5 (raw file):

		if retErr == nil {
			return
		}

If we're looping and retrying on a different sender blindly without checking for a specific type of error, then it's probably best to at least log the errors that we're seeing here when looping. I'm sure we'll unintentionally delegate to multiple senders on an error that we weren't expecting at least once while stabilizing this work. Better to at least have some evidence of what the error is when that happens.


pkg/kv/kvserver/replica_command.go line 2857 at r5 (raw file):

// is validated twice, once before "queueing" and once after. This reduces the
// chance of false positives (snapshots which are sent but can't be used),
// however it is impossible to completely eliminate them.

"impossible to completely eliminate them" is too strong of wording, and it also doesn't help the reader understand the interactions here. Consider expanding on this point as a way to illustrate why we don't make stronger guarantees. What would we have to synchronize with to guarantee that we will never hit a false positive? Freeze all range splits, merge, rebalances, leadership changes, and log truncations?


pkg/kv/kvserver/replica_command.go line 2861 at r5 (raw file):

	ctx context.Context, req *kvserverpb.DelegateSnapshotRequest,
) error {
	// If the generation has changed, this snapshot will be useless, so don't

Why does a descriptor generation change necessarily cause the snapshot to be useless?


pkg/kv/kvserver/replica_command.go line 2863 at r5 (raw file):

	// If the generation has changed, this snapshot will be useless, so don't
	// attempt to send it.
	if r.Desc().Generation != req.DescriptorGeneration {

nit: pull desc := r.Desc() out so that there's only a single instance.


pkg/kv/kvserver/replica_command.go line 2875 at r5 (raw file):

	// Check that the snapshot we generated has a descriptor that includes the
	// recipient. If it doesn't, the recipient will reject it, so it's better to

I'm surprised that we need this check if we're already checking that the descriptor generation is the same on the delegate and the leaseholder. Is that the case in the examples you discuss in this comment?


pkg/kv/kvserver/replica_command.go line 2895 at r5 (raw file):

	// Check the raft applied state index and term to determine if this replica
	// is not too far behind the leaseholder that it needs a snapshot.

Play this comment out one more step. If the delegate is too far behind that is also needs a snapshot, then any snapshot it sends will be useless.


pkg/kv/kvserver/replica_command.go line 2909 at r5 (raw file):

	// Delegate has a different term than the coordinator. This typically means
	// the lease has been transferred, and we should not process this request.

Can you explain why we shouldn't process the request in this case?


pkg/kv/kvserver/replica_command.go line 2925 at r5 (raw file):

	}

	// Sender replica's will be rejected if the sender replica's raft applied index is lower than

"Sender replica's snapshot"?


pkg/kv/kvserver/replica_command.go line 2927 at r5 (raw file):

	// Sender replica's will be rejected if the sender replica's raft applied index is lower than
	// or equal to the truncated state on the leaseholder, as this replica's snapshot will be wasted.
	// Note that its possible that we can enforce strictly lesser than if etcd does not require

s/its/it is/


pkg/kv/kvserver/replica_command.go line 2950 at r5 (raw file):

		settings.SystemOnly,
		"kv.snapshot_delegation.num_follower",
		"the number of delegates to try when sending snapshots",

"when sending snapshots, before falling back to sending from the leaseholder"


pkg/kv/kvserver/replica_command.go line 3011 at r5 (raw file):

	// obtain the send semaphore. We check after to make sure the snapshot request
	// still makes sense (e.g the range hasn't split under us). The check is
	// lightweight, so better to be safe than sorry.

This makes it sound like the check before the reservation is best-effort and the check after is properly synchronized and makes a stronger guarantee. But that's not correct, is it? Consider being explicit about this in these two comments, like you are in the comment on validateSnapshotDelegationRequest ("however it is impossible to completely eliminate them").


pkg/kv/kvserver/replica_command.go line 3034 at r5 (raw file):

	// TODO: Review this to make sure that there isn't any inconsistency since we
	// may be on a delegate. The GetSnapshot call will add a LogTruncation
	// constraint, but it isn't clear that this will help.

This is a good call. We want the log truncation constraint to be created on the leaseholder, not on the delegate.


pkg/kv/kvserver/store_snapshot.go line 696 at r5 (raw file):

	}
	// If the queue is too long on a delegate, don't attempt to wait before sending.
	if req.QueueOnDelegateLen >= 0 {

Do we want to push this queue depth limit into the MultiQueue itself to avoid raciness? Currently, this limit can be exceeded by multiple concurrent calls to reserveSendSnapshot.


pkg/kv/kvserver/testing_knobs.go line 427 at r5 (raw file):

	// send snapshot semaphore.
	AfterSendSnapshotThrottle func()
	// SelectDelegateSnapshotSender returns a replica to send delegated snapshots.

"returns a replica"

It returns a slice, so this contract isn't quite right. Want to explain what it means for multiple replicas to be returned?


pkg/kv/kvserver/allocator/storepool/store_pool.go line 1208 at r5 (raw file):

func (sp *StorePool) GetLocalitiesPerReplica(
	replicas []roachpb.ReplicaDescriptor,
) map[roachpb.ReplicaDescriptor]roachpb.Locality {

Would it be more natural for this to map from ReplicaID to Locality? The ReplicaDescriptor struct contains a number of fields, some of which (e.g. Type) are mutable over the lifetime of a replica.


pkg/kv/kvserver/kvserverpb/raft.proto line 274 at r5 (raw file):

  // The priority of the snapshot.
  // TODO(abaptist): Remove this field for v23.1.

Can we remove them now?


pkg/kv/kvserver/kvserverpb/raft.proto line 286 at r5 (raw file):

  // The truncated state of the Raft log on the coordinator replica.
  uint64 truncated_index = 8;

If we're changing this, we should evaluate whether we want to send the "truncated index" or the "first index" (truncated index + 1). I believe it is more common around the Raft layer to refer to the log's first index, but I could be wrong about that. Doing so would allow you to use the existing Replica.GetFirstIndex method, instead of introducing the new Replica.GetRaftTruncatedState method.
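The relation between the two fields is that the log's first index is the truncated index plus one, so the validity check on the delegate reduces to a single comparison. A minimal sketch, with hypothetical names (`canDelegateSend`), of the check discussed above:

```go
package main

import "fmt"

// canDelegateSend sketches the validity check: a delegate's snapshot is
// only useful if its raft applied index is at or past the log's first
// index on the coordinator, i.e. strictly above the truncated index
// (firstIndex = truncatedIndex + 1). Otherwise the recipient would
// still be missing entries between the snapshot and the leader's log.
func canDelegateSend(delegateApplied, coordinatorFirstIndex uint64) bool {
	return delegateApplied >= coordinatorFirstIndex
}

func main() {
	// Applied index equal to the first index is still usable; one below
	// the first index (i.e. equal to the truncated index) is not.
	fmt.Println(canDelegateSend(10, 10), canDelegateSend(9, 10)) // prints: true false
}
```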

@andrewbaptist andrewbaptist requested a review from a team as a code owner December 29, 2022 17:44
@andrewbaptist andrewbaptist requested review from a team, herkolategan and smg260 and removed request for a team December 29, 2022 17:44
Author

@andrewbaptist andrewbaptist left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @andrewbaptist, @herkolategan, @nvanbenschoten, and @smg260)


pkg/cmd/roachtest/tests/decommissionbench.go line 90 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: For readers that are new to this code, an "instead of a random node" at the end would be helpful.

Done


pkg/kv/kvserver/replica_command.go line 2536 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Should we return this error instead of requiring callers to handle an empty list?

Done


pkg/kv/kvserver/replica_command.go line 2579 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: consider making the function variadic.

Done


pkg/kv/kvserver/replica_command.go line 2583 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: presize the capacity to len(localities)

Done


pkg/kv/kvserver/replica_command.go line 2611 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Shuffle the replicas to prevent always choosing them in the same order.

Won't the rng with the same seed lead to the replicas always being shuffled in the same order? Is the randomness coming from the map iteration above?

For any given replica, they will shuffle in the same way, but if there are multiple different replicas that are being transmitted they will shuffle differently. This is typically only called once per replica, because if it fails then it falls back to the leaseholder.
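The per-replica behavior described above can be sketched as follows. This is a toy illustration (not the actual CockroachDB code): score candidates by locality distance to the recipient, keep only the ties at the best (lowest) distance, and shuffle them with a per-range seed, so a single range shuffles deterministically while different ranges pick different delegates. The names (`selectDelegates`, `distance`) are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// selectDelegates keeps the candidates whose distance to the recipient
// ties for the minimum, then shuffles them seeded by the range ID and
// truncates to at most max delegates.
func selectDelegates(distance map[string]float64, rangeID int64, max int) []string {
	best, first := 0.0, true
	for _, d := range distance {
		if first || d < best {
			best, first = d, false
		}
	}
	var tied []string
	for id, d := range distance {
		if d == best {
			tied = append(tied, id)
		}
	}
	sort.Strings(tied) // fixed order before the seeded shuffle
	r := rand.New(rand.NewSource(rangeID))
	r.Shuffle(len(tied), func(i, j int) { tied[i], tied[j] = tied[j], tied[i] })
	if len(tied) > max {
		tied = tied[:max]
	}
	return tied
}

func main() {
	// s1 and s2 tie for the closest locality; s3 is farther and is
	// never chosen. Which of s1/s2 wins depends on the range ID.
	d := map[string]float64{"s1": 0.25, "s2": 0.25, "s3": 1.0}
	fmt.Println(selectDelegates(d, 42, 1))
}
```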


pkg/kv/kvserver/replica_command.go line 2817 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: why is this pulled outside of the struct initialization?

Done


pkg/kv/kvserver/replica_command.go line 2845 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

If we're looping and retrying on a different sender blindly without checking for a specific type of error, then it's probably best to at least log the errors that we're seeing here when looping. I'm sure we'll unintentionally delegate to multiple senders on an error that we weren't expecting at least once while stabilizing this work. Better to at least have some evidence of what the error is when that happens.

Done, good point!


pkg/kv/kvserver/replica_command.go line 2857 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"impossible to completely eliminate them" is too strong of wording, and it also doesn't help the reader understand the interactions here. Consider expanding on this point as a way to illustrate why we don't make stronger guarantees. What would we have to synchronize with to guarantee that we will never hit a false positive? Freeze all range splits, merge, rebalances, leadership changes, and log truncations?

Done


pkg/kv/kvserver/replica_command.go line 2863 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: pull desc := r.Desc() out so that there's only a single instance.

Done


pkg/kv/kvserver/replica_command.go line 2875 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'm surprised that we need this check if we're already checking that the descriptor generation is the same on the delegate and the leaseholder. Is that the case in the examples you discuss in this comment?

You are right about this being redundant. I don't see how this could happen either if the descriptor generation has changed. I could change this to an assertion, or remove the check.


pkg/kv/kvserver/replica_command.go line 2895 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Play this comment out one more step. If the delegate is too far behind that is also needs a snapshot, then any snapshot it sends will be useless.

Done, added a comment.


pkg/kv/kvserver/replica_command.go line 2925 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"Sender replica's snapshot"?

Done


pkg/kv/kvserver/replica_command.go line 2927 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/its/it is/

Done


pkg/kv/kvserver/replica_command.go line 2950 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"when sending snapshots, before falling back to sending from the leaseholder"

Done


pkg/kv/kvserver/replica_command.go line 3011 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This makes it sound like the check before the reservation is best-effort and the check after is properly synchronized and makes a stronger guarantee. But that's not correct, is it? Consider being explicit about this in these two comments, like you are in the comment on validateSnapshotDelegationRequest ("however it is impossible to completely eliminate them").

Done, good point; I cleaned up the comment on this.


pkg/kv/kvserver/replica_command.go line 3034 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This is a good call. We want the log truncation constraint to be created on the leaseholder, not on the delegate.

I'm not positive about the best way to handle this. If this is called on the leaseholder, then it will correctly have the constraint in place, but if it is called on a delegate then there isn't really a way to guarantee this. I could change to make the constraint occur much earlier (on the leaseholder before it sends the delegation request) but that risks holding the constraint for a longer window. I don't think it would be hard to make that change, so maybe that is the way to go here.


pkg/kv/kvserver/store_snapshot.go line 696 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Do we want to push this queue depth limit into the MultiQueue itself to avoid raciness? Currently, this limit can be exceeded by multiple concurrent calls to reserveSendSnapshot.

Done, probably wouldn't have been an issue, but since this is the only use of the MultiQueue, simpler to do this.
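The race being closed here is that a check-then-enqueue done outside the queue's lock lets concurrent callers all pass the depth check before any of them enqueues. A minimal sketch (not the actual MultiQueue implementation; names like `boundedQueue` are hypothetical) of enforcing the limit inside the queue:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// boundedQueue enforces its maximum depth under its own lock, so
// concurrent callers cannot race past the limit the way an external
// check-then-enqueue could.
type boundedQueue struct {
	mu    sync.Mutex
	depth int
	limit int
}

var errQueueFull = errors.New("delegate queue full")

func (q *boundedQueue) tryEnqueue() error {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.depth >= q.limit {
		return errQueueFull // reject immediately rather than wait
	}
	q.depth++
	return nil
}

func (q *boundedQueue) dequeue() {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.depth--
}

func main() {
	q := &boundedQueue{limit: 1}
	fmt.Println(q.tryEnqueue(), q.tryEnqueue()) // prints: <nil> delegate queue full
	q.dequeue()
	fmt.Println(q.tryEnqueue()) // prints: <nil>
}
```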


pkg/kv/kvserver/testing_knobs.go line 427 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"returns a replica"

It returns a slice, so this contract isn't quite right. Want to explain what it means for multiple replicas to be returned?

Done


pkg/kv/kvserver/allocator/storepool/store_pool.go line 1208 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Would it be more natural for this to map from ReplicaID to Locality? The ReplicaDescriptor struct contains a number of fields, some of which (e.g. Type) are mutable over the lifetime of a replica.

Done, it requires an extra loop to convert back to descriptors, but probably the right thing to do as it makes the equality safer.


pkg/kv/kvserver/kvserverpb/raft.proto line 274 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Can we remove them now?

Unfortunately, this is a bigger change than I had planned for this patch. I was going to do this as a follow-on PR. I chatted with @AlexTalks a little about this and there are UI changes related to it. I think this can still get done in 23.1, but I didn't want to tie it to this. If we later remove these fields, it will not matter as the only thing they are still used for is reporting stats right now.


pkg/kv/kvserver/kvserverpb/raft.proto line 286 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

If we're changing this, we should evaluate whether we want to send the "truncated index" or the "first index" (truncated index + 1). I believe it is more common around the Raft layer to refer to the log's first index, but I could be wrong about that. Doing so would allow you to use the existing Replica.GetFirstIndex method, instead of introducing the new Replica.GetRaftTruncatedState method.

Done, this makes more sense now.

Contributor

@nvb nvb left a comment


Reviewed 20 of 21 files at r6, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @andrewbaptist, @herkolategan, and @smg260)


pkg/kv/kvserver/replica_command.go line 2861 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Why does a descriptor generation change necessarily cause the snapshot to be useless?

I'm still curious about this. Are there descriptor generation changes that don't necessarily mean this, but we're ok with false positives?


pkg/kv/kvserver/replica_command.go line 3034 at r5 (raw file):

but that risks holding the constraint for a longer window

The leaseholder also doesn't know which index to bound log truncation up to. It may have applied further than the delegated sender, and it doesn't want to truncate the log any higher than the index that the delegated snapshot is being sent.

I think this is why the RPC protocol was streaming before. We had envisioned a handshake where the leaseholder would ask the delegate for its applied index, install a constraint at that index (rechecking that it hadn't truncated above this already), then allow the delegated snapshot to proceed.


pkg/kv/kvserver/replica_command.go line 2591 at r6 (raw file):

	// regardless of score.
	var tiedReplicas []roachpb.ReplicaID
	for desc, score := range replicaDistance {

s/desc/replID/


pkg/kv/kvserver/replica_command.go line 2611 at r6 (raw file):

	// Only keep the top numFollowers replicas.
	if len(tiedReplicas) > numFollowers {
		tiedReplicas = tiedReplicas[0:numFollowers]

small nit: the 0 is unnecessary.


pkg/kv/kvserver/replica_command.go line 2619 at r6 (raw file):

	for n, replicaId := range tiedReplicas {
		found := false
		for _, desc := range rangeDesc.Replicas().Descriptors() {

Can we use ReplicaSet.GetReplicaDescriptorByID here?

Author

@andrewbaptist andrewbaptist left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @herkolategan, @nvanbenschoten, and @smg260)


pkg/kv/kvserver/replica_command.go line 2861 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'm still curious about this. Are there descriptor generation changes that don't necessarily mean this, but we're ok with false positives?

Just to record what we discussed: I added a comment about it. This check is probably a little overly strict, which means some delegated snapshots may be rejected unnecessarily. This is not really a problem and should be pretty rare. We could likely remove this check in the future.


pkg/kv/kvserver/replica_command.go line 3034 at r5 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

but that risks holding the constraint for a longer window

The leaseholder also doesn't know which index to bound log truncation up to. It may have applied further than the delegated sender, and it doesn't want to truncate the log any higher than the index that the delegated snapshot is being sent.

I think this is why the RPC protocol was streaming before. We had envisioned a handshake where the leaseholder would ask the delegate for its applied index, install a constraint at that index (rechecking that it hadn't truncated above this already), then allow the delegated snapshot to proceed.

After discussing options on this, I added the log truncation constraint up front on the coordinator, so there is a risk of holding it longer than strictly required. There is a path to change this if necessary in the future, but it is unlikely to be a concern for most systems. https://cockroachlabs.atlassian.net/wiki/spaces/~6268113f52310b0068ffd245/pages/2869854482/Delegate+snapshot+overview
There are a few ways to solve this, but I don't think the streaming RPC is correct, since the data required is disjoint for the different parts of the flow. There is a risk with the latest change that a delegated request will fail because it won't have the latest applied index from the leaseholder. If that occurs often, it would be worth changing to retry on the delegate, since it will "eventually" catch up in most cases.


pkg/kv/kvserver/replica_command.go line 2591 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/desc/replID/

Done


pkg/kv/kvserver/replica_command.go line 2611 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

small nit: the 0 is unnecessary.

Done


pkg/kv/kvserver/replica_command.go line 2619 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Can we use ReplicaSet.GetReplicaDescriptorByID here?

Done

Contributor

@nvb nvb left a comment


Reviewed 18 of 20 files at r7, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @andrewbaptist, @herkolategan, and @smg260)


pkg/kv/kvserver/replica_command.go line 2821 at r7 (raw file):

		Term:                 status.Term,
		DelegatedSender:      sender,
		FirstIndex:           appliedIndex,

Could you add a comment here about why you are setting FirstIndex to the value of appliedIndex on the leaseholder? It's subtle and could benefit from a discussion about the consequences and plans for future improvement.


pkg/kv/kvserver/replica_command.go line 2880 at r7 (raw file):

	// If the generation has changed, this snapshot may be useless, so don't
	// attempt to send it.
	//NB: This is an overly strict check. If other delegates are added to this

nit: missing a space after // on each line.

Contributor

@andreimatei andreimatei left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @andrewbaptist, @herkolategan, @nvanbenschoten, and @smg260)


-- commits line 29 at r8:
nit: consider adding more words to the release note for this awesome work so that the average user can understand it

Author

@andrewbaptist andrewbaptist left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @herkolategan, @nvanbenschoten, and @smg260)


pkg/kv/kvserver/replica_command.go line 2821 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Could you add a comment here about why you are setting FirstIndex to the value of appliedIndex on the leaseholder? It's subtle and could benefit from a discussion about the consequences and plans for future improvement.

Done


pkg/kv/kvserver/replica_command.go line 2880 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: missing a space after // on each line.

Done

Contributor

@nvb nvb left a comment


Reviewed all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @andrewbaptist, @herkolategan, and @smg260)


pkg/kv/kvserver/replica_command.go line 2821 at r7 (raw file):

Previously, andrewbaptist (Andrew Baptist) wrote…

Done

Did you miss a git push?

Author

@andrewbaptist andrewbaptist left a comment


Reviewed 2 of 4 files at r8.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks, @andreimatei, @herkolategan, @nvanbenschoten, and @smg260)


-- commits line 29 at r8:

Previously, andreimatei (Andrei Matei) wrote…

nit: consider adding more words to the release note for this awesome work so that the average user can understand it

Thanks, I added some more to the release notes. I will likely publish a short blog on this as well when 23.1 comes out!


pkg/kv/kvserver/replica_command.go line 2821 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Did you miss a git push?

I'm not sure what happened - I rewrote and pushed again.

Contributor

@nvb nvb left a comment


:lgtm:

Reviewed 1 of 1 files at r10, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @AlexTalks, @andreimatei, @andrewbaptist, @herkolategan, and @smg260)

Fixes: cockroachdb#42491

This commit allows a snapshot to be sent by a follower instead of the
leader of a range. The follower(s) are chosen based on locality to the
final recipient of the snapshot.  If the follower is not able to
quickly send the snapshot, the attempt is aborted and the leader sends
the snapshot instead.

By choosing a delegate rather than sending the snapshot directly, WAN
traffic can be minimized. Additionally, the snapshot will likely be
delivered faster.

There are two settings that control this feature. The first,
`kv.snapshot_delegation.num_follower`, controls how many followers
the leader will attempt to delegate the snapshot through. If set to 0,
snapshot delegation is disabled. The second,
`kv.snapshot_delegation_queue.enabled`, controls whether delegated
snapshots will queue on the delegate or fail immediately. This is
useful to prevent a delegation request from waiting a long time
before it is sent.

Before the snapshot is sent from the follower, checks are done to
verify that the delegate is able to send a snapshot that will be valid
for the recipient. If not, the request is rerouted to the leader.

Release note (performance improvement): Adds delegated snapshots which
can reduce WAN traffic for snapshot movement. If there is another
replica for this range with a closer locality than the delegate, the
leaseholder will attempt to have that delegate send the snapshot. This
is particularly useful in the case of a decommission of a node where
most snapshots are transferred to another replica in the same locality.
@andrewbaptist
Author

bors r=nvanbenschoten

@craig
Contributor

craig bot commented Feb 3, 2023

Build succeeded:

@shralex
Contributor

shralex commented Feb 3, 2023

congrats!!



Development

Successfully merging this pull request may close these issues.

Send Raft snapshots between follower replicas
