kvserver: honor priority on send and receive snapshots#86701
kvserver: honor priority on send and receive snapshots#86701craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
9fedf4c to
1d259f7
Compare
636782a to
e0e3799
Compare
e0e3799 to
e586410
Compare
|
This can be rebased onto master now, removing the first two commits (or we can switch to a new branch - either way is fine with me!) |
e586410 to
9738028
Compare
|
Rebased onto master. |
2646918 to
1ec7bbe
Compare
1ec7bbe to
2db3573
Compare
|
From a testing perspective, the decommissioning tests were all run. With a combination of this and Lidor's changes, the time to decommission has dropped from ~55 minutes to ~24 minutes. This will be even faster if drain is called before decommission. |
2db3573 to
666ca01
Compare
nvb
left a comment
There was a problem hiding this comment.
You were asking about adding a cluster version flag in https://cockroachlabs.slack.com/archives/C02H8JCC7DM/p1661545679399539. That seems apt to me, as there is a possibility that without such a flag, mixed version clusters will prioritize snapshots from v22.2 nodes. You could avoid that behavior by adding a sender-side cluster version check that strips the source and priority fields before sending a SnapshotRequest if the cluster is not yet all running v22.2 processes.
We also talked about adding a cluster setting to disable this. Did you still want to do that? If so, the logic could live in the same place as the cluster version check.
Reviewed 8 of 8 files at r1, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15 and @AlexTalks)
pkg/kv/kvserver/replica_learner_test.go line 246 at r1 (raw file):
// This test assumes that the snapshot send limit is set to 1 so the second // snapshot send will block waiting for the first to complete. cleanup := envutil.TestSetEnv(t, "COCKROACH_CONCURRENT_SNAPSHOT_SEND_LIMIT", "1")
The use of env vars to control a TestCluster feels like a bit of an anti-pattern. Can we copy concurrentSnapshotApplyLimit and concurrentSnapshotSendLimit to KVConfig and TestServerArgs and then configure it directly in this test? See ScanMinIdleTime for an example of how that plumbing would work.
pkg/kv/kvserver/store.go line 1161 at r1 (raw file):
if sc.concurrentSnapshotSendLimit == 0 { sc.concurrentSnapshotSendLimit = envutil.EnvOrDefaultInt("COCKROACH_CONCURRENT_SNAPSHOT_SEND_LIMIT", 2)
Let's comment here (or server/config.go if we do my suggestion from above) about why we want an apply-size limit of 1 but a send-side limit of 2.
pkg/kv/kvserver/store_snapshot.go line 691 at r1 (raw file):
fn() } requestSource := int32(req.SenderQueueName)
nit: there seems to be a good amount of repetition between this function and the previous one. That's not something to solve here, but then let's make the exact same changes to each so it doesn't look like there's a difference. For instance, define these variables in both or in neither.
pkg/kv/kvserver/store_snapshot.go line 715 at r1 (raw file):
waitingSnapshotMetric, inProgressSnapshotMetric, totalInProgressSnapshotMetric *metric.Gauge, ) (cleanup func(), _err error) { // TODO: Is this necessary
Yes, it is necessary if you want the caller to be able to call the function on error. However, this contract that the cleanup function must be called even on error is not standard. Usually, functions will handle this themselves on error paths and not hand the responsibility to their caller. I think here that would just mean calling semaphore.Cancel(task) directly in the case <-queueCtx.Done(): case. Consider switching to that approach.
pkg/kv/kvserver/store_test.go line 2929 at r1 (raw file):
} func TestSendSnapshotConcurrency(t *testing.T) {
nit: Could you give this test a comment explaining what it is testing.
666ca01 to
3b68364
Compare
1367ae5 to
81a270e
Compare
andrewbaptist
left a comment
There was a problem hiding this comment.
Done, added enable/disable flag.
Reviewed 1 of 10 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @andrewbaptist, and @nvanbenschoten)
pkg/kv/kvserver/store_snapshot.go line 693 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Don't we want this in
Replica.sendSnapshot, so that the name and priority won't make it to the receiver (who might be a v22.2 node)?
Done, moved
pkg/kv/kvserver/store_snapshot.go line 709 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I think this comment is now stale, as are the comments on the preceding two methods.
Done
nvb
left a comment
There was a problem hiding this comment.
Reviewed 1 of 1 files at r5, 2 of 2 files at r7, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15)
AlexTalks
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15)
pkg/kv/kvserver/replica_command.go line 2671 at r7 (raw file):
// queue name or priority if the receiver may not understand them or the // setting is disabled. if !r.store.ClusterSettings().Version.IsActive(ctx, clusterversion.PrioritizeSnapshots) ||
nit: Might be worth clarifying that the only thing we for sure want to remove in 23.1 is the cluster version.
pkg/kv/kvserver/store.go line 259 at r7 (raw file):
ProtectedTimestampReader: spanconfig.EmptyProtectedTSReader(clock), SnapshotSendLimit: 2, SnapshotApplyLimit: 1,
I think (as we discussed) this should not use hard-coded values. Apologies for making you keep moving these around, but maybe it would make sense to move the defaults into kvserver and that way you can reference them both here and in the server pkg. You'll just have to make the defaults public however.
Code quote:
SnapshotSendLimit: 2,
SnapshotApplyLimit: 1,pkg/kv/kvserver/store_snapshot.go line 67 at r7 (raw file):
var snapshotPrioritizationEnabled = settings.RegisterBoolSetting( settings.SystemOnly, "kv.snapshot.prioritization_enabled",
nit: kv.snapshot_prioritization.enabled
pkg/kv/kvserver/store_snapshot.go line 69 at r7 (raw file):
"kv.snapshot.prioritization_enabled", "if true, then prioritize enqueued snapshots on both the send or receive sides", true)
nit: does this pass lint? I'd assume we'd need:
true,
)
pkg/kv/kvserver/store_snapshot.go line 676 at r7 (raw file):
return s.throttleSnapshot(ctx, s.snapshotApplySem, int(header.SenderQueueName), header.SenderQueuePriority,
nit: I still don't love the casting to ints, it feels like we should at least have had a multiqueue.QueueID type or something. This isn't something I'd worry about changing now, but just wanted to note it.
pkg/kv/kvserver/store_snapshot.go line 723 at r7 (raw file):
// getting stuck behind large snapshots managed by the replicate queue. if rangeSize != 0 || s.cfg.TestingKnobs.ThrottleEmptySnapshots { task := semaphore.Add(requestSource, requestPriority)
should we still call this semaphore? It seems inaccurate - can we rename to something like snapshotQueue?
pkg/kv/kvserver/store_test.go line 2933 at r7 (raw file):
// prioritization of the snapshots as that is covered by the multi-queue // testing. func TestSendSnapshotConcurrency(t *testing.T) {
This test is certainly useful, though I think if at all possible it would be useful for us to have a test where we validate the entire snapshot flow (on sender/receiver) running concurrently w/ multiple senders.
pkg/kv/kvserver/kvserverpb/raft.proto line 207 at r7 (raw file):
// The sending queue's name, to be utilized to ensure fairness across // different snapshot sending sources. OTHER is a reserved name(0) and all
The default queue name, OTHER, is reserved for any uncategorized
and unprioritized snapshots, and requests with sender queue name OTHER
may not specify a non-zero sender_queue_priority. To prioritize snapshots
categorized as OTHER, first...
pkg/server/config.go line 66 at r7 (raw file):
defaultScanMaxIdleTime = 1 * time.Second defaultSnapshotSendLimit = 2 defaultSnapshotApplyLimit = 1
These are the ones I was thinking would be good to move. I think you may have added a comment elsewhere, it might make sense to add a comment here that says
// ... See server.KVConfig for more info.
(or wherever else the comment lives).
pkg/server/config.go line 791 at r7 (raw file):
cfg.ScanMaxIdleTime = envutil.EnvOrDefaultDuration("COCKROACH_SCAN_MAX_IDLE_TIME", cfg.ScanMaxIdleTime) cfg.SnapshotSendLimit = envutil.EnvOrDefaultInt64("COCKROACH_SNAPSHOT_SEND_LIMIT", cfg.SnapshotSendLimit) cfg.SnapshotApplyLimit = envutil.EnvOrDefaultInt64("COCKROACH_SNAPSHOT_APPLY_LIMIT", cfg.SnapshotApplyLimit)
Let's make sure these environment variables match the existing ones we had (I doubt anything relies on this, but I don't see much benefit in changing it).
AlexTalks
left a comment
There was a problem hiding this comment.
Per the testing discussion, in replica_learner_test.go something like the operations within TestRaftSnapshotQueueSeesLearner concurrently with some of the operations of TestAddReplicaWithReceiverThrottling or even TestLearnerAdminChangeReplicasRace could be useful - particularly if we can inject testing knobs for some snapshots to fail while others succeed. testRaftSnapshotsToNonVoters seems particularly useful.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15)
e086547 to
9a70dff
Compare
andrewbaptist
left a comment
There was a problem hiding this comment.
Working, looking at options for how to do this best.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15, @AlexTalks, and @nvanbenschoten)
pkg/kv/kvserver/replica_command.go line 2671 at r7 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
nit: Might be worth clarifying that the only thing we for sure want to remove in 23.1 is the cluster version.
Done
pkg/kv/kvserver/store.go line 259 at r7 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
I think (as we discussed) this should not use hard-coded values. Apologies for making you keep moving these around, but maybe it would make sense to move the defaults into
kvserverand that way you can reference them both here and in theserverpkg. You'll just have to make the defaults public however.
Done
pkg/kv/kvserver/store_snapshot.go line 67 at r7 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
nit:
kv.snapshot_prioritization.enabled
Done
pkg/kv/kvserver/store_snapshot.go line 69 at r7 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
nit: does this pass lint? I'd assume we'd need:
true, )
Done (it did pass lint, but probably shouldn't have)
pkg/kv/kvserver/store_snapshot.go line 676 at r7 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
nit: I still don't love the casting to
ints, it feels like we should at least have had amultiqueue.QueueIDtype or something. This isn't something I'd worry about changing now, but just wanted to note it.
I don't love this either, but it does help in the case of version compatibility (it should definitely be strings over the wire, so it needs to cast to "something" - ints are just as good as strings from that perspective?
pkg/kv/kvserver/store_snapshot.go line 723 at r7 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
should we still call this
semaphore? It seems inaccurate - can we rename to something likesnapshotQueue?
Done, Renamed everywhere
pkg/kv/kvserver/kvserverpb/raft.proto line 207 at r7 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
The default queue name, OTHER, is reserved for any uncategorized and unprioritized snapshots, and requests with sender queue name OTHER may not specify a non-zero sender_queue_priority. To prioritize snapshots categorized as OTHER, first...
Done
pkg/server/config.go line 66 at r7 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
These are the ones I was thinking would be good to move. I think you may have added a comment elsewhere, it might make sense to add a comment here that says
// ... See server.KVConfig for more info.(or wherever else the comment lives).
Done
pkg/server/config.go line 791 at r7 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
Let's make sure these environment variables match the existing ones we had (I doubt anything relies on this, but I don't see much benefit in changing it).
Done, I hadn't thought about the compatibility aspect here.
9a70dff to
3f5a0ea
Compare
3f5a0ea to
d848c78
Compare
Previously on both the send and receive snapshot side, the various places that sent snapshots were uncoordinated, so the choice of which snapshot was sent was somewhat arbitrary. This PR uses the new multiqueue to prioritize them correctly. Release note (performance improvement): Snapshots use a fair round-robin approach for choosing which one to send next. This allows decommissioning to complete much faster. Release justification: A number of customer support cases are caused by incorrect snapshot prioritization.
d848c78 to
2dc2da8
Compare
AlexTalks
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15, @herkolategan, and @nvanbenschoten)
pkg/kv/kvserver/store_test.go line 2933 at r7 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
This test is certainly useful, though I think if at all possible it would be useful for us to have a test where we validate the entire snapshot flow (on sender/receiver) running concurrently w/ multiple senders.
Part of #87319
AlexTalks
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15, @herkolategan, and @nvanbenschoten)
|
bors r+ |
|
👎 Rejected by code reviews |
|
bors r+ |
|
Build succeeded: |

Previously on both the send and receive snapshot side, the
various places that sent snapshots were uncoordinated, so
the choice of which snapshot was sent was somewhat arbitrary.
This PR uses the new multiqueue to prioritize them correctly.
Release note (performance improvement): Snapshots use a fair
round-robin approach for choosing which one to send next. This
allows decommissioning to complete much faster.
Release Justification: A number of customer support cases
are caused by incorrect snapshot prioritization.