release-22.1: kvserver: don't transfer leases to draining nodes during scatters#80834
Conversation
Thanks for opening a backport. Please check the backport criteria before merging:
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
Add a brief release justification to the body of your PR to justify this backport. Some other things to consider:
nvb
left a comment
I did not do a pass on some parts of this before they landed on master, so I left a few comments below. Feel free to address them in a separate PR on master and then add a commit to this backport PR.
Also, we saw in https://github.com/cockroachlabs/support/issues/1569 that this fixes a bug where the allocator could skip the excludeReplicasInNeedOfSnapshots protection when lease preferences are present. This backport will fix the bug for release-22.1. We'd like to also fix it for release-21.2. Do you have a sense of the difficulty of a backport of this change to release-21.2? We could do something more targeted if it's too much of a lift.
Reviewed 1 of 1 files at r1, 3 of 3 files at r2, 4 of 4 files at r3, 6 of 6 files at r4, 1 of 1 files at r5, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15)
pkg/kv/kvserver/allocator.go line 1501 at r2 (raw file):
```go
candidates := make([]roachpb.ReplicaDescriptor, 0, len(existing))
for i := range existing {
	if existing[i].GetType() != roachpb.VOTER_FULL {
```
In #74546, we began allowing VOTER_INCOMING replicas to receive the lease. Should we be doing the same here?
Should we be using CheckCanReceiveLease in this function?
pkg/kv/kvserver/allocator.go line 1579 at r2 (raw file):
```go
// If there are any replicas that do match lease preferences, then we check if
// the existing leaseholder is one of them.
preferred := a.preferredLeaseholders(conf, candidates)
```
Should these preferred replicas be passed through excludeReplicasInNeedOfSnapshots, or is the idea that a later call to ValidLeaseTargets will perform this check?
pkg/kv/kvserver/replicate_queue.go line 1052 at r1 (raw file):
```go
conf,
transferLeaseOptions{
	goal: leaseCountConvergence,
```
Is this changing behavior? Isn't followTheWorkload the zero-value for this enum? Is the change in behavior intentional?
pkg/kv/kvserver/store_rebalancer.go line 489 at r4 (raw file):
```go
replWithStats.repl.leaseholderStats,
true, /* forceDecisionWithoutStats */
transferLeaseOptions{
```
nit: consider adding the ExcludeLeaseRepl field back in so that this call is explicit.
pkg/kv/kvserver/allocator_test.go line 1746 at r4 (raw file):
```go
existing       []roachpb.ReplicaDescriptor
leaseholder    roachpb.StoreID
allowLeaseRepl bool
```
nit: should we use excludeLeaseRepl here as well so that readers of the test don't need to negate the input? Same question below.
aayushshah15
left a comment
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/allocator.go line 1501 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
In #74546, we began allowing `VOTER_INCOMING` replicas to receive the lease. Should we be doing the same here?
Should we be using `CheckCanReceiveLease` in this function?
I changed it to use the `IsVoterNewConfig()` method that `CheckCanReceiveLease` calls into, for now, since plumbing the range descriptor through into `TransferLeaseTarget()` for a call into `CheckCanReceiveLease` has quite a large footprint in terms of the tests that need to be changed. LMK if you prefer something else.
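To illustrate the check being discussed, here is a minimal sketch in plain Go. The types are simplified stand-ins for `roachpb`, and `isVoterNewConfig` only mirrors the idea behind the real `IsVoterNewConfig()` (a replica may take the lease if it will be a voter in the new config, i.e. `VOTER_FULL` or `VOTER_INCOMING`); none of this is the actual CockroachDB code.

```go
package main

import "fmt"

// Simplified stand-ins for roachpb replica types (illustrative only).
type ReplicaType int

const (
	VoterFull ReplicaType = iota
	VoterIncoming
	VoterOutgoing
	NonVoter
)

type ReplicaDescriptor struct {
	StoreID int
	Type    ReplicaType
}

// isVoterNewConfig sketches the idea behind roachpb's IsVoterNewConfig():
// a replica is lease-eligible if it will be a voter once the in-flight
// config change completes, i.e. VOTER_FULL or VOTER_INCOMING.
func isVoterNewConfig(t ReplicaType) bool {
	return t == VoterFull || t == VoterIncoming
}

func filterLeaseCandidates(existing []ReplicaDescriptor) []ReplicaDescriptor {
	candidates := make([]ReplicaDescriptor, 0, len(existing))
	for _, repl := range existing {
		if isVoterNewConfig(repl.Type) {
			candidates = append(candidates, repl)
		}
	}
	return candidates
}

func main() {
	existing := []ReplicaDescriptor{
		{StoreID: 1, Type: VoterFull},
		{StoreID: 2, Type: VoterIncoming}, // kept: will be a voter
		{StoreID: 3, Type: VoterOutgoing}, // dropped: leaving the range
		{StoreID: 4, Type: NonVoter},      // dropped: cannot hold the lease
	}
	fmt.Println(filterLeaseCandidates(existing))
}
```

Compared to the original `!= roachpb.VOTER_FULL` filter, the only behavioral difference is that incoming voters are no longer excluded.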
pkg/kv/kvserver/allocator.go line 1579 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Should these preferred replicas be passed through `excludeReplicasInNeedOfSnapshots`, or is the idea that a later call to `ValidLeaseTargets` will perform this check?
It does seem like a good idea for `leaseholderShouldMoveDueToPreferences()` to not fire if all preferred replicas need snapshots. I've made that change, and it looks like nothing breaks (because, as you said, `ValidLeaseTargets()` was filtering these replicas out anyway).
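The guard described here could look roughly like the following sketch. Store IDs stand in for replica descriptors, and `shouldMoveDueToPreferences` is a hypothetical simplification of `leaseholderShouldMoveDueToPreferences()`, not the real implementation.

```go
package main

import "fmt"

// shouldMoveDueToPreferences sketches the fix: the preference check should
// not fire when every preferred replica still needs a snapshot, because no
// preferred replica could actually take the lease.
func shouldMoveDueToPreferences(leaseholder int, preferred, needsSnapshot []int) bool {
	snap := make(map[int]bool, len(needsSnapshot))
	for _, id := range needsSnapshot {
		snap[id] = true
	}
	// Filter out preferred replicas that are behind and need a snapshot.
	var viable []int
	for _, id := range preferred {
		if !snap[id] {
			viable = append(viable, id)
		}
	}
	if len(viable) == 0 {
		return false // no viable preferred replica: leave the lease alone
	}
	for _, id := range viable {
		if id == leaseholder {
			return false // leaseholder already satisfies the preferences
		}
	}
	return true
}

func main() {
	// Preferred stores {2, 3} both need snapshots: don't move.
	fmt.Println(shouldMoveDueToPreferences(1, []int{2, 3}, []int{2, 3})) // false
	// Store 3 is caught up: move.
	fmt.Println(shouldMoveDueToPreferences(1, []int{2, 3}, []int{2})) // true
}
```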
pkg/kv/kvserver/replicate_queue.go line 1052 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Is this changing behavior? Isn't `followTheWorkload` the zero-value for this enum? Is the change in behavior intentional?
The thing is that whenever `excludeLeaseRepl` is true, `followTheWorkload` always falls back to `leaseCountConvergence`. I just wanted to make that explicit here for this caller.
It's sort of gross how `TransferLeaseTarget` has become a bit of a monstrosity to use due to all these options subtly interacting with each other. I didn't really attempt to improve that as part of the original PR though.
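The fallback being made explicit can be sketched as follows, with a simplified stand-in for `transferLeaseOptions` (in the real code, as the question notes, `followTheWorkload` is the enum's zero value, which is exactly what makes the interaction easy to miss). `effectiveGoal` is a hypothetical helper, not an actual function in the codebase.

```go
package main

import "fmt"

// Simplified stand-in for the transfer-goal enum; followTheWorkload is the
// zero value, as in the real code.
type transferGoal int

const (
	followTheWorkload transferGoal = iota
	leaseCountConvergence
)

type transferLeaseOptions struct {
	goal             transferGoal
	excludeLeaseRepl bool
}

// effectiveGoal sketches the fallback: follow-the-workload only makes sense
// when the current leaseholder may keep the lease, so excluding the
// leaseholder forces lease-count convergence.
func effectiveGoal(opts transferLeaseOptions) transferGoal {
	if opts.excludeLeaseRepl && opts.goal == followTheWorkload {
		return leaseCountConvergence
	}
	return opts.goal
}

func main() {
	// Excluding the leaseholder silently overrides the zero-value goal.
	fmt.Println(effectiveGoal(transferLeaseOptions{excludeLeaseRepl: true}) == leaseCountConvergence)
	fmt.Println(effectiveGoal(transferLeaseOptions{}) == followTheWorkload)
}
```

Spelling out `goal: leaseCountConvergence` at the call site, as this commit does, documents the behavior the zero value would have produced implicitly.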
pkg/kv/kvserver/store_rebalancer.go line 489 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: consider adding the `ExcludeLeaseRepl` field back in so that this call is explicit.
Done.
pkg/kv/kvserver/allocator_test.go line 1746 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: should we use `excludeLeaseRepl` here as well so that readers of the test don't need to negate the input? Same question below.
Done.
aayushshah15
left a comment
I've opened #81403 to address the review comments you left, PTAL. I'll add that commit to this backport PR once you approve it.
I tried backporting to 21.2 but there have been a bunch of changes in the interim that make it complicated. Even after resolving the conflicts, I don't have a ton of confidence there and we should probably just do a targeted fix for 21.2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
…erLeaseTarget()` This commit addresses miscellaneous review comments from cockroachdb#80834. Release note: None
81279: storage: specify and test iterator visibility semantics r=jbowens a=erikgrinaker

`@jbowens` I split this out from the range key PR in #77417, since we'll need to audit/update existing code for the new point key batch semantics, and it seems cleaner/easier to do this separately. As you know, the test currently fails because existing batch iterators see concurrent batch writes. This only covers point keys, but range keys should have identical semantics, and the tests will be extended for range keys in #77417.

This patch specifies and tests iterator visibility semantics. See the comment on `Engine.NewMVCCIterator` for details.

Release note: None

81403: kvserver: address assorted comments from previous refactor of `TransferLeaseTarget()` r=aayushshah15 a=aayushshah15

This commit addresses miscellaneous review comments from #80834.

Release note: None

81679: sql: enforce nullability when evaluating expressions in index backfills r=ajwerner a=ajwerner

Prior to this change, we never enforced nullability of values when performing index backfills. Up until #76983, all column additions used the legacy schema changer and its column backfill protocol. The column backfiller indeed checks for nullability violations [here](https://github.com/cockroachdb/cockroach/blob/d5313438aaa61a71a5a66c5718963bbb0fd7268b/pkg/sql/backfill/backfill.go#L355-L357).

Release note (bug fix): Prior to this change, virtual computed columns which were marked as NOT NULL could be added to a new secondary index. After this change, attempts to add such columns to a secondary index will result in an error. Note that such invalid columns can still be added to tables. Work to resolve that bug is tracked in #81675.

81777: util/metric: delete Rate r=andreimatei a=andreimatei

This exponentially moving Rate was unused. It was also exporting to Prometheus as a gauge. It seems unlikely that we'll use this in the future; I think we're happy enough with counters and computing rates in the monitoring system. The Rate seems to have come from a time when we liked moving measurements more; since then, we've adapted the Histogram, for example, to include non-windowed state.

Release note: None

81798: sql: enable streamer for lookup joins r=yuzefovich a=yuzefovich

We've just fixed an issue with incorrectly using leaf txns for mutations because of the streamer, so I think it is ok to fully enable the streamer. I expect that there might be some fallout, so I'm curious to see the run of nightlies.

Release note: None

81804: build: use bazel for roachtest stress r=rickystewart a=nicktrav

Currently, the Roachtest Stress TeamCity job uses `make` (via the `mkrelease.sh` script) for building the required binaries. Recently, the pipeline started hanging while building the binary. Rather than investigating and fixing the existing `make`-based pipeline, convert the job to using Bazel for building the binaries. Retain the existing entrypoint to the job, converting it to use [the existing pattern][1] of calling `run_bazel` with the required environment variables, invoking an `_impl.sh` script. The existing logic is moved into the new `_impl.sh` script, with some minor additions to source some additional scripts. Additionally, the TeamCity job itself was updated to pass in the `TARGET_CLOUD` environment variable (set to `gce`), and to remove the need to build within a container (instead deferring to Bazel to manage the build environment).

The diff for the TeamCity pipeline is as follows (for posterity):

```diff
18c18
<
---
> <param name="env.TARGET_CLOUD" value="gce" />
24c24
< <param name="plugin.docker.imageId" value="%builder.dockerImage%" />
---
>
28,29c28,29
< export BUILD_TAG="$(git describe --abbrev=0 --tags --match=v[0-9]*)"
< ./build/teamcity-roachtest-stress.sh]]></param>
---
> build_tag="$(git describe --abbrev=0 --tags --match=v[0-9]*)"
> CLOUD=${TARGET_CLOUD} BUILD_TAG="${build_tag}" ./build/teamcity-roachtest-stress.sh]]></param>
```

Release note: None.

[1]: https://github.com/cockroachdb/cockroach/blob/12198a51408e7333cd4f96b221e6734239479765/build/teamcity/cockroach/nightlies/roachtest_nightly_gce.sh#L10-L11

81813: liveness: run sync disk write in a stopper task r=stevendanna,nicktrav a=erikgrinaker

liveness: move stopper to `NodeLivenessOptions`

Release note: None

liveness: run sync disk write in a stopper task

This patch runs the sync disk write during node heartbeats in a stopper task. The write is done in a goroutine, so that we can respect the caller's context cancellation (even though the write itself won't). However, this could race with engine shutdown when stopping the node, violating the Pebble contract and triggering the race detector. Running it as a stopper task will cause the node to wait for the disk write to complete before closing the engine. Of course, if the disk stalls then node shutdown will now never complete. This is very unfortunate, since stopping the node is often the only mitigation to recover stuck ranges with stalled disks. This is mitigated by Pebble panicking the node on stalled disks, and by Kubernetes and other orchestration tools killing the process after some time.

Touches #81786. Resolves #81511. Resolves #81827.

Release note: None

81828: sql: deflake TestPrimaryKeyDropIndexNotCancelable r=chengxiong-ruan a=ajwerner

The problem was that we weren't atomically setting the boolean, at least, I think. I repro'd another flake of it pretty quickly.

Release note: None

Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
Co-authored-by: Aayush Shah <aayush.shah15@gmail.com>
Co-authored-by: Andrew Werner <awerner32@gmail.com>
Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Nick Travers <travers@cockroachlabs.com>
Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
@aayushshah15 it looks like #79295 landed. Do we want to patch that in here and then get this merged?
Previously, it wasn't very clear how a call into `maybeTransferLeaseAway` would find a lease transfer target. This commit makes a small effort to make this obvious. Release note: None
This commit is a minor refactor of the `Allocator.TransferLeaseTarget` logic in order to make it more readable and to abstract out a new exported `Allocator` method called `ValidLeaseTargets()`. The contract of `ValidLeaseTargets()` is as follows:

```
// ValidLeaseTargets returns a set of candidate stores that are suitable to be
// transferred a lease for the given range.
//
// - It excludes stores that are dead, or marked draining or suspect.
// - If the range has lease_preferences, and there are any non-draining,
// non-suspect nodes that match those preferences, it excludes stores that don't
// match those preferences.
// - It excludes replicas that may need snapshots. If the replica calling this
// method is not the Raft leader (meaning that it doesn't know whether follower
// replicas need a snapshot or not), it produces no results.
```

Previously, there were multiple places where we were performing the logic that's encapsulated by `ValidLeaseTargets()`, which was a potential source of bugs. This is an attempt to unify this logic in one place that's relatively well-tested. This commit is only a refactor, and does not attempt to change any behavior. As such, no existing tests have been changed, with the exception of a subtest inside `TestAllocatorTransferLeaseTargetDraining`. See the comment over that subtest to understand why the behavior change made by this patch is desirable. The next commit in this PR uses this method to fix (at least part of) cockroachdb#74691. Release note: none
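The contract quoted in the commit message above can be sketched as a toy filter. The `Store` fields and `validLeaseTargets` here are simplified, hypothetical stand-ins for the real `StorePool` and Raft plumbing; only the three filtering rules are taken from the source.

```go
package main

import "fmt"

// Store is a toy model of a candidate store; the real code consults the
// StorePool and Raft progress rather than precomputed booleans.
type Store struct {
	ID                    int
	DeadDrainingOrSuspect bool
	MatchesPreference     bool
	NeedsSnapshot         bool
}

func validLeaseTargets(stores []Store, isRaftLeader bool) []int {
	// Rule 1: drop dead, draining, and suspect stores.
	var live []Store
	for _, s := range stores {
		if !s.DeadDrainingOrSuspect {
			live = append(live, s)
		}
	}
	// Rule 2: if any surviving store matches the lease preferences,
	// restrict the candidates to those.
	var preferred []Store
	for _, s := range live {
		if s.MatchesPreference {
			preferred = append(preferred, s)
		}
	}
	if len(preferred) > 0 {
		live = preferred
	}
	// Rule 3: only the Raft leader knows follower progress; a non-leader
	// cannot rule out snapshots, so it produces no targets at all.
	if !isRaftLeader {
		return nil
	}
	var out []int
	for _, s := range live {
		if !s.NeedsSnapshot {
			out = append(out, s.ID)
		}
	}
	return out
}

func main() {
	stores := []Store{
		{ID: 1, DeadDrainingOrSuspect: true, MatchesPreference: true},
		{ID: 2, MatchesPreference: true},
		{ID: 3},
		{ID: 4, MatchesPreference: true, NeedsSnapshot: true},
	}
	fmt.Println(validLeaseTargets(stores, true)) // [2]
}
```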
Previously, `AdminScatter` called with the `RandomizeLeases` option could potentially transfer leases to nodes marked draining. This commit leverages the refactor from the last commit to fix this bug by first filtering the set of candidates down to a set of valid candidates that meet lease preferences and are not marked suspect or draining. Release note (bug fix): Fixes a bug where draining / drained nodes could re-acquire leases during an import or an index backfill.
This commit inverts this boolean and renames it to `excludeLeaseRepl`, which is a lot more intuitive for the callers of `Allocator.TransferLeaseTarget`. Release note: none
Fixes cockroachdb#79566. Fallout from a6a8d5c.
Force-pushed 6caee95 to 1a7b6d2.
Force-pushed 1a7b6d2 to a199a3c.
…erLeaseTarget()` This commit addresses miscellaneous review comments from cockroachdb#80834. Release note: None
Force-pushed a199a3c to cebaac1.
Backport 4/4 commits from #79295.
Backport 1/1 commit from #79581.
Backport 1/1 commit from #81403.
/cc @cockroachdb/release
kvserver: introduce Allocator.ValidLeaseTargets()
This commit is a minor refactor of the `Allocator.TransferLeaseTarget` logic in order to make it more readable and to abstract out a new exported `Allocator` method called `ValidLeaseTargets()`. The contract of `ValidLeaseTargets()` is as follows:

- It excludes stores that are dead, or marked draining or suspect.
- If the range has lease preferences, and there are any non-draining, non-suspect nodes that match those preferences, it excludes stores that don't match those preferences.
- It excludes replicas that may need snapshots. If the replica calling this method is not the Raft leader (meaning that it doesn't know whether follower replicas need a snapshot or not), it produces no results.

Previously, there were multiple places where we were performing the logic that's encapsulated by `ValidLeaseTargets()`, which was a potential source of bugs. This is an attempt to unify this logic in one place that's relatively well-tested. This commit is only a refactor, and does not attempt to change any behavior. As such, no existing tests have been changed, with the exception of a subtest inside `TestAllocatorTransferLeaseTargetDraining`. See the comment over that subtest to understand why the behavior change made by this patch is desirable.
The next commit in this PR uses this method to fix (at least part of) #74691.
Release note: none
kvserver: don't transfer leases to draining nodes during scatters
Previously, `AdminScatter` called with the `RandomizeLeases` option could potentially transfer leases to nodes marked draining. This commit leverages the refactor from the last commit to fix this bug by first filtering the set of candidates down to a set of valid candidates that meet lease preferences and are not marked suspect or draining.
Relates to and fixes a part of #74691.
Release note (bug fix): Fixes a bug where draining / drained nodes could
re-acquire leases during an import or an index backfill.
Release justification: bug fix.