
release-22.1: kvserver: don't transfer leases to draining nodes during scatters#80834

Merged
aayushshah15 merged 6 commits into cockroachdb:release-22.1 from aayushshah15:backport22.1-79295 on Jun 10, 2022

Conversation

Contributor

@aayushshah15 commented Apr 30, 2022

Backport 4/4 commits from #79295.

Backport 1/1 commit from #79581.

Backport 1/1 commit from #81403.

/cc @cockroachdb/release


kvserver: introduce Allocator.ValidLeaseTargets()

This commit is a minor refactor of the Allocator.TransferLeaseTarget logic in
order to make it more readable and to abstract out a new exported Allocator
method called ValidLeaseTargets().

The contract of ValidLeaseTargets() is as follows:

// ValidLeaseTargets returns a set of candidate stores that are suitable to be
// transferred a lease for the given range.
//
// - It excludes stores that are dead, or marked draining or suspect.
// - If the range has lease_preferences, and there are any non-draining,
// non-suspect nodes that match those preferences, it excludes stores that don't
// match those preferences.
// - It excludes replicas that may need snapshots. If the replica calling this
// method is not the Raft leader (meaning that it doesn't know whether follower
// replicas need a snapshot or not), this method produces no results.

Previously, there were multiple places where we were performing the logic
that's encapsulated by ValidLeaseTargets(), which was a potential source of
bugs. This is an attempt to unify this logic in one place that's relatively
well-tested. This commit is only a refactor, and does not attempt to change any
behavior. As such, no existing tests have been changed, with the exception of a
subtest inside TestAllocatorTransferLeaseTargetDraining. See the comment over
that subtest to understand why the behavior change made by this patch is
desirable.

The next commit in this PR uses this method to fix (at least part of) #74691.

Release note: none
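The contract above can be sketched as a small, self-contained filter. The types and names below (`store`, `storeStatus`, `validLeaseTargets`) are hypothetical simplifications for illustration, not the actual kvserver code:

```go
package main

import "fmt"

// storeStatus and store are simplified stand-ins for the real kvserver
// structures; this is an illustrative sketch, not CockroachDB's API.
type storeStatus int

const (
	storeAlive storeStatus = iota
	storeDead
	storeDraining
	storeSuspect
)

type store struct {
	id              int
	status          storeStatus
	matchesPrefs    bool // does this store satisfy the range's lease_preferences?
	mayNeedSnapshot bool // might this replica need a Raft snapshot?
}

// validLeaseTargets mirrors the contract quoted above: drop dead, draining,
// and suspect stores; drop replicas that may need snapshots; and if any
// surviving store matches the lease preferences, keep only those. If the
// caller is not the Raft leader it cannot know snapshot status, so it
// produces no results.
func validLeaseTargets(existing []store, isRaftLeader bool) []store {
	if !isRaftLeader {
		return nil
	}
	var healthy []store
	anyPreferred := false
	for _, s := range existing {
		if s.status != storeAlive || s.mayNeedSnapshot {
			continue
		}
		healthy = append(healthy, s)
		if s.matchesPrefs {
			anyPreferred = true
		}
	}
	if !anyPreferred {
		return healthy
	}
	var preferred []store
	for _, s := range healthy {
		if s.matchesPrefs {
			preferred = append(preferred, s)
		}
	}
	return preferred
}

func main() {
	stores := []store{
		{id: 1, status: storeAlive, matchesPrefs: true},
		{id: 2, status: storeDraining, matchesPrefs: true},
		{id: 3, status: storeAlive, matchesPrefs: false},
	}
	// Only store 1 survives: 2 is draining, 3 loses to the preferred store.
	fmt.Println(len(validLeaseTargets(stores, true)))
}
```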

kvserver: don't transfer leases to draining nodes during scatters

Previously, AdminScatter called with the RandomizeLeases option could
potentially transfer leases to nodes marked draining. This commit leverages
the refactor from the last commit to fix this bug by first filtering the set of
candidates down to a set of valid candidates that meet lease preferences and
are not marked suspect or draining.

Relates to and fixes a part of #74691.

Release note (bug fix): Fixes a bug where draining / drained nodes could
re-acquire leases during an import or an index backfill.

Release justification: bug fix.

@aayushshah15 aayushshah15 requested a review from a team as a code owner April 30, 2022 22:14
@blathers-crl

blathers-crl bot commented Apr 30, 2022

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues or test-only changes.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
  • Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user that doesn’t know & care about this backport, has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?

@cockroach-teamcity
Member

This change is Reviewable

@aayushshah15 aayushshah15 requested a review from nvb April 30, 2022 22:18
Contributor

@nvb nvb left a comment


:lgtm:

I did not do a pass on some parts of this before they landed on master, so I left a few comments below. Feel free to address them in a separate PR on master and then add a commit to this backport PR.

Also, we saw in https://github.com/cockroachlabs/support/issues/1569 that this fixes a bug where the allocator could skip the excludeReplicasInNeedOfSnapshots protection when lease preferences are present. This backport will fix the bug for release-22.1. We'd like to also fix it for release-21.2. Do you have a sense of the difficulty of a backport of this change to release-21.2? We could do something more targeted if it's too much of a lift.

Reviewed 1 of 1 files at r1, 3 of 3 files at r2, 4 of 4 files at r3, 6 of 6 files at r4, 1 of 1 files at r5, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15)


pkg/kv/kvserver/allocator.go line 1501 at r2 (raw file):

	candidates := make([]roachpb.ReplicaDescriptor, 0, len(existing))
	for i := range existing {
		if existing[i].GetType() != roachpb.VOTER_FULL {

In #74546, we began allowing VOTER_INCOMING replicas to receive the lease. Should we be doing the same here?

Should we be using CheckCanReceiveLease in this function?


pkg/kv/kvserver/allocator.go line 1579 at r2 (raw file):

	// If there are any replicas that do match lease preferences, then we check if
	// the existing leaseholder is one of them.
	preferred := a.preferredLeaseholders(conf, candidates)

Should these preferred replicas be passed through excludeReplicasInNeedOfSnapshots, or is the idea that a later call to ValidLeaseTargets will perform this check?


pkg/kv/kvserver/replicate_queue.go line 1052 at r1 (raw file):

		conf,
		transferLeaseOptions{
			goal:   leaseCountConvergence,

Is this changing behavior? Isn't followTheWorkload the zero-value for this enum? Is the change in behavior intentional?


pkg/kv/kvserver/store_rebalancer.go line 489 at r4 (raw file):

			replWithStats.repl.leaseholderStats,
			true, /* forceDecisionWithoutStats */
			transferLeaseOptions{

nit: consider adding the ExcludeLeaseRepl field back in so that this call is explicit.


pkg/kv/kvserver/allocator_test.go line 1746 at r4 (raw file):

		existing       []roachpb.ReplicaDescriptor
		leaseholder    roachpb.StoreID
		allowLeaseRepl bool

nit: should we use excludeLeaseRepl here as well so that readers of the test don't need to negate the input? Same question below.

Contributor Author

@aayushshah15 aayushshah15 left a comment


Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/allocator.go line 1501 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

In #74546, we began allowing VOTER_INCOMING replicas to receive the lease. Should we be doing the same here?

Should we be using CheckCanReceiveLease in this function?

I changed it to use the IsVoterNewConfig() method that CheckCanReceiveLease calls into, for now, since plumbing the range descriptor through into TransferLeaseTarget() for a call into CheckCanReceiveLease has quite a large footprint in terms of the tests that need to be changed. LMK if you prefer something else.
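The voter-eligibility check being discussed can be sketched as below. `replicaType` and `isVoterNewConfig` are simplified stand-ins that mirror, but are not, the real roachpb types:

```go
package main

import "fmt"

// replicaType is a simplified stand-in for roachpb's replica types.
type replicaType int

const (
	voterFull replicaType = iota
	voterIncoming
	voterOutgoing
	learner
	nonVoter
)

// isVoterNewConfig sketches the check that CheckCanReceiveLease performs via
// IsVoterNewConfig: a replica may receive the lease iff it will be a voter
// once the in-flight configuration change completes, which admits
// VOTER_INCOMING but excludes VOTER_OUTGOING, learners, and non-voters.
func isVoterNewConfig(t replicaType) bool {
	return t == voterFull || t == voterIncoming
}

func main() {
	for _, t := range []replicaType{voterFull, voterIncoming, voterOutgoing, learner, nonVoter} {
		fmt.Println(t, isVoterNewConfig(t))
	}
}
```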


pkg/kv/kvserver/allocator.go line 1579 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Should these preferred replicas be passed through excludeReplicasInNeedOfSnapshots, or is the idea that a later call to ValidLeaseTargets will perform this check?

It does seem like a good idea for leaseholderShouldMoveDueToPreferences() to not fire if all preferred replicas need snapshots. I've made that change, and it looks like nothing breaks (because, as you said, ValidLeaseTargets() was filtering these replicas out anyway).


pkg/kv/kvserver/replicate_queue.go line 1052 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is this changing behavior? Isn't followTheWorkload the zero-value for this enum? Is the change in behavior intentional?

The thing is that whenever excludeLeaseRepl is true, followTheWorkload always falls back to leaseCountConvergence. I just wanted to make that explicit here for this caller.

It's sort of gross how TransferLeaseTarget has become a bit of a monstrosity to use due to all these options subtly interacting with each other. I didn't really attempt to improve that as part of the original PR though.
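The fallback being described can be made concrete with a small sketch. `transferGoal`, `transferLeaseOptions`, and `effectiveGoal` here are simplified illustrations, not the real kvserver types:

```go
package main

import "fmt"

// transferGoal models the goal enum; followTheWorkload is its zero value,
// which is what made the explicit goal in the caller worth spelling out.
type transferGoal int

const (
	followTheWorkload transferGoal = iota
	leaseCountConvergence
)

type transferLeaseOptions struct {
	goal             transferGoal
	excludeLeaseRepl bool
}

// effectiveGoal makes the subtle interaction explicit: when the current
// leaseholder is excluded as a transfer target, follow-the-workload cannot
// keep the lease where the load is, so the allocator falls back to
// lease-count convergence.
func effectiveGoal(o transferLeaseOptions) transferGoal {
	if o.goal == followTheWorkload && o.excludeLeaseRepl {
		return leaseCountConvergence
	}
	return o.goal
}

func main() {
	fmt.Println(effectiveGoal(transferLeaseOptions{excludeLeaseRepl: true}) == leaseCountConvergence)
}
```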


pkg/kv/kvserver/store_rebalancer.go line 489 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: consider adding the ExcludeLeaseRepl field back in so that this call is explicit.

Done.


pkg/kv/kvserver/allocator_test.go line 1746 at r4 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: should we use excludeLeaseRepl here as well so that readers of the test don't need to negate the input? Same question below.

Done.

Contributor Author

@aayushshah15 aayushshah15 left a comment


I've opened #81403 to address the review comments you left, PTAL. I'll add that commit to this backport PR once you approve it.

I tried backporting to 21.2 but there have been a bunch of changes in the interim that make it complicated. Even after resolving the conflicts, I don't have a ton of confidence there and we should probably just do a targeted fix for 21.2.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)

aayushshah15 added a commit to aayushshah15/cockroach that referenced this pull request May 18, 2022
kvserver: address assorted comments from previous refactor of `TransferLeaseTarget()`

This commit addresses miscellaneous review comments from cockroachdb#80834.

Release note: None
aayushshah15 added a commit to aayushshah15/cockroach that referenced this pull request May 25, 2022
kvserver: address assorted comments from previous refactor of `TransferLeaseTarget()`

This commit addresses miscellaneous review comments from cockroachdb#80834.

Release note: None
craig bot pushed a commit that referenced this pull request May 25, 2022
81279: storage: specify and test iterator visibility semantics r=jbowens a=erikgrinaker

`@jbowens` I split this out from the range key PR in #77417, since we'll need to audit/update existing code for the new point key batch semantics, and it seems cleaner/easier to do this separately. As you know, the test currently fails because existing batch iterators see concurrent batch writes.

This only covers point keys, but range keys should have identical semantics, and the tests will be extended for range keys in #77417.

---

This patch specifies and tests iterator visibility semantics. See
comment on `Engine.NewMVCCIterator` for details.

Release note: None

81403: kvserver: address assorted comments from previous refactor of `TransferLeaseTarget()` r=aayushshah15 a=aayushshah15

This commit addresses miscellaneous review comments from #80834.

Release note: None

81679: sql: enforce nullability when evaluating expressions in index backfills r=ajwerner a=ajwerner

Prior to this change, we never enforced nullability of values when performing
index backfills. Up until #76983, all column additions used the legacy schema
changer and its column backfill protocol. The column backfiller indeed checks
on nullability violations for columns [here](https://github.com/cockroachdb/cockroach/blob/d5313438aaa61a71a5a66c5718963bbb0fd7268b/pkg/sql/backfill/backfill.go#L355-L357).

Release note (bug fix): Prior to this change, virtual computed columns which
were marked as NOT NULL could be added to a new secondary index. After this
change, attempts to add such columns to a secondary index will result in an
error. Note that such invalid columns can still be added to tables. Work to
resolve that bug is tracked in #81675.

81777: util/metric: delete Rate r=andreimatei a=andreimatei

This exponentially moving Rate was unused. It also was exporting to
Prometheus as a gauge. It seems unlikely that we'll use this in the
future; I think we're happy enough with counters and computing rates in
the monitoring system. The Rate seems to have come from a time when we
liked moving measurements more; since then, we've adapted the Histogram,
for example, to include non-windowed state.

Release note: None

81798: sql: enable streamer for lookup joins r=yuzefovich a=yuzefovich

We've just fixed an issue with incorrectly using leaf txns for mutations
because of the streamer, so I think it is ok to fully enable the
streamer. I expect that there might be some fallout, so I'm curious to
see the run of nightlies.

Release note: None

81804: build: use bazel for roachtest stress r=rickystewart a=nicktrav

Currently, the Roachtest Stress TeamCity job uses `make` (via the
`mkrelease.sh` script) for building the required binaries. Recently, the
pipeline started hanging while building the binary. Rather than
investigating and fixing the existing `make`-based pipeline, convert the
job to using Bazel for building the binaries.

Retain the existing entrypoint to the job, converting it to use [the
existing pattern][1] of calling `run_bazel` with the required
environment variables, invoking an `_impl.sh` script. The existing logic
is moved into the new `_impl.sh` script, with some minor additions to
source some additional scripts.

Additionally, the TeamCity job itself was updated to pass in the
`TARGET_CLOUD` environment variable (set to `gce`), and to remove the need
to build within a container (instead deferring to Bazel to manage the
build environment). The diff for the TeamCity pipeline is as follows
(for posterity):

```diff
18c18
<
---
>       <param name="env.TARGET_CLOUD" value="gce" />
24c24
<           <param name="plugin.docker.imageId" value="%builder.dockerImage%" />
---
>
28,29c28,29
< export BUILD_TAG="$(git describe --abbrev=0 --tags --match=v[0-9]*)"
< ./build/teamcity-roachtest-stress.sh]]></param>
---
> build_tag="$(git describe --abbrev=0 --tags --match=v[0-9]*)"
> CLOUD=${TARGET_CLOUD} BUILD_TAG="${build_tag}" ./build/teamcity-roachtest-stress.sh]]></param>
```

Release note: None.

[1]: https://github.com/cockroachdb/cockroach/blob/12198a51408e7333cd4f96b221e6734239479765/build/teamcity/cockroach/nightlies/roachtest_nightly_gce.sh#L10-L11

81813: liveness: run sync disk write in a stopper task r=stevendanna,nicktrav a=erikgrinaker

**liveness: move stopper to `NodeLivenessOptions`**

Release note: None

**liveness: run sync disk write in a stopper task**

This patch runs the sync disk write during node heartbeats in a stopper
task. The write is done in a goroutine, so that we can respect the
caller's context cancellation (even though the write itself won't).
However, this could race with engine shutdown when stopping the node,
violating the Pebble contract and triggering the race detector.  Running
it as a stopper task will cause the node to wait for the disk write to
complete before closing the engine.

Of course, if the disk stalls then node shutdown will now never
complete. This is very unfortunate, since stopping the node is often
the only mitigation to recover stuck ranges with stalled disks. This is
mitigated by Pebble panicking the node on stalled disks, and by Kubernetes
and other orchestration tools killing the process after some time.

Touches #81786.
Resolves #81511.
Resolves #81827.

Release note: None

81828: sql: deflake TestPrimaryKeyDropIndexNotCancelable r=chengxiong-ruan a=ajwerner

The problem was that we weren't atomically setting the boolean, at least, I
think. I repro'd another flake of it pretty quickly.

Release note: None
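A minimal sketch of the kind of fix described — a flag written by one goroutine and read by another must be accessed atomically to satisfy the race detector. `cancelFlag` is a hypothetical illustration, not the test's actual code:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// cancelFlag replaces a plain bool that raced between a writer goroutine and
// a reader: sync/atomic gives the store/load the ordering a bare bool lacks.
type cancelFlag struct{ v int32 }

func (f *cancelFlag) set()        { atomic.StoreInt32(&f.v, 1) }
func (f *cancelFlag) isSet() bool { return atomic.LoadInt32(&f.v) == 1 }

func main() {
	var f cancelFlag
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		f.set()
	}()
	wg.Wait()
	fmt.Println(f.isSet()) // true once the writer goroutine has finished
}
```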

Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
Co-authored-by: Aayush Shah <aayush.shah15@gmail.com>
Co-authored-by: Andrew Werner <awerner32@gmail.com>
Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Nick Travers <travers@cockroachlabs.com>
Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
@nvb
Contributor

nvb commented Jun 9, 2022

@aayushshah15 it looks like #79295 landed. Do we want to patch that in here and then get this merged?

aayushshah15 and others added 5 commits June 10, 2022 15:05
Previously, it wasn't very clear how a call into `maybeTransferLeaseAway` would
find a lease transfer target. This commit makes a small effort to make this
obvious.

Release note: None
This commit is a minor refactor of the `Allocator.TransferLeaseTarget` logic in
order to make it more readable and to abstract out a new exported `Allocator`
method called `ValidLeaseTargets()`.

The contract of `ValidLeaseTargets()` is as follows:
```
// ValidLeaseTargets returns a set of candidate stores that are suitable to be
// transferred a lease for the given range.
//
// - It excludes stores that are dead, or marked draining or suspect.
// - If the range has lease_preferences, and there are any non-draining,
// non-suspect nodes that match those preferences, it excludes stores that don't
// match those preferences.
// - It excludes replicas that may need snapshots. If the replica calling this
// method is not the Raft leader (meaning that it doesn't know whether follower
// replicas need a snapshot or not), this method produces no results.

```

Previously, there were multiple places where we were performing the logic
that's encapsulated by `ValidLeaseTargets()`, which was a potential source of
bugs. This is an attempt to unify this logic in one place that's relatively
well-tested. This commit is only a refactor, and does not attempt to change any
behavior. As such, no existing tests have been changed, with the exception of a
subtest inside `TestAllocatorTransferLeaseTargetDraining`. See the comment over
that subtest to understand why the behavior change made by this patch is
desirable.

The next commit in this PR uses this method to fix (at least part of) cockroachdb#74691.

Release note: none
Previously, `AdminScatter` called with the `RandomizeLeases` option could
potentially transfer leases to nodes marked draining.  This commit leverages
the refactor from the last commit to fix this bug by first filtering the set of
candidates down to a set of valid candidates that meet lease preferences and
are not marked suspect or draining.

Release note (bug fix): Fixes a bug where draining / drained nodes could
re-acquire leases during an import or an index backfill.
This commit inverts this boolean and renames it to `excludeLeaseRepl`, which is
a lot more intuitive for the callers of `Allocator.TransferLeaseTarget`.

Release note: none
kvserver: address assorted comments from previous refactor of `TransferLeaseTarget()`

This commit addresses miscellaneous review comments from cockroachdb#80834.

Release note: None
@aayushshah15 aayushshah15 merged commit 1867ba4 into cockroachdb:release-22.1 Jun 10, 2022