storage: remove preemptive snapshots #44615
Conversation
@tbg here's a prototype of removing preemptive snapshots. It still needs testing. One thing I tried to disambiguate was the word "initialized" as it applies to the […]. I'd love to hear your thoughts on further improvements. Thanks!
pkg/storage/replica_command.go
Outdated
// TODO(ajwerner): Should a recipient be able to decline a LEARNER snapshot?
// It seems like yes. The recipient may know that it does not want to get
// added to a range. Confer with danhhz/tbg/darin about the implications of
// making this true for SnapshotRequest_LEARNER.
This was just discussed on #44247 (comment)
TLDR: when you refuse the learner, you've already added it to the desc, so we need to be careful. We don't want to roll back replication changes en masse which is what would happen here.
	return false, errors.Errorf("found uninitialized RangeDescriptor: %+v", desc)
}
replicaDesc, found := desc.GetReplicaDescriptor(s.StoreID())
if !found {
We also get replicas like that when a replica catches up across a change replicas trigger that removes it. When you made these replicas GC themselves proactively when they hit the trigger, you did so atomically with the application of the CRT raft command, right? In other words, in this branch, are we really 100% looking at a preemptive snap? I suppose at the very least we might be looking at a replica that did catch up across the trigger before we had code that insta-gc'ed, but it seems fine to treat that as a preemptive snapshot and nuke it (which would happen to it eventually anyway, since it's not in the desc any more).
pkg/storage/store_create_replica.go
Outdated
// If this is intended for replicaID 0 then it's either a preemptive
// snapshot or a split/merge lock in which case we'll let it go through.
if replicaID == 0 {
	// TODO(ajwerner): fix this case.
I have this all paged out again, but is the split/merge lock the last remaining use of replicaID zero? I'd be so happy to not have that be a thing. (not in this PR)
Even better! I removed all uses of ReplicaID 0 in this PR already, just forgot to come back to this conditional after I did that. This can be a fatal now.
Only a very partial review, I will have to give this a look with fresh eyes next week. Excited that this is happening!
Force-pushed 1d0bd8a to 444b9a7
This is RFAL. I'll put up a second PR to update the Grafana dashboards.
Force-pushed 55ef87f to df22915
tbg
left a comment
Very, very nice. I'm so happy you chewed through and removed all of that detritus. I feel a lot better about all of the code you touched.
Reviewed 14 of 25 files at r1, 14 of 14 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)
pkg/storage/replica_init.go, line 278 at r2 (raw file):
// batch. We could avoid setting the local descriptor in this case.
// 2) When the DisableEagerReplicaRemoval testing knob is enabled. We
// could
missing some words
I agree that it would be nice to be more coherent here. Not necessarily for this PR, looks like a decent amount of individual cleanup is needed.
pkg/storage/replica_raft.go, line 1538 at r2 (raw file):
	ctx context.Context, split *roachpb.SplitTrigger,
) (func(), error) {
	// We pass a 0 replicaID because we want to lock the RHS even if we know
Comment stale.
pkg/storage/split_queue_test.go, line 77 at r2 (raw file):
// ensures that the store won't be mucking with our replica concurrently
// during testing (e.g. via the system config gossip update).
copy := *tc.repl.Desc()
Shadowing a built-in? I usually go with cpy in that case.
pkg/storage/store.go, line 1342 at r2 (raw file):
// range which has processed a raft command to remove itself (which is
// possible prior to 19.2 or if the DisableEagerReplicaRemoval is
// enabled) and has not yet been removed by the replica gc queue.
weird spaces
pkg/storage/store_split.go, line 51 at r2 (raw file):
... if it hasn't been removed in the meantime (handled below).
pkg/storage/store_split.go, line 53 at r2 (raw file):
// acquired) in Replica.acquireSplitLock. It must be present here.
rightRepl, err := r.store.GetReplica(split.RightDesc.RangeID)
if err != nil && !roachpb.IsRangeNotFoundError(err) {
I'd rewrite this for clarity as
rightRepl, err := r.store.GetReplica(split.RightDesc.RangeID)
if roachpb.IsRangeNotFoundError(err) {
	// The right hand side we were planning to populate has already
	// been removed. We handle this below.
	rightRepl, err = nil, nil
} else if err != nil {
	log.Fatalf(ctx, "failed to get RHS replica: %v", err)
}
pkg/storage/store_split.go, line 62 at r2 (raw file):
// We're in the rare case where we know that the RHS has been removed
// and re-added with a higher replica ID (and then maybe removed again).
// We know we've never processed a snapshot for the right range because up
Worth an assertion (if rightRng isn't nil).
pkg/storage/store_split.go, line 63 at r2 (raw file):
// and re-added with a higher replica ID (and then maybe removed again).
// We know we've never processed a snapshot for the right range because up
// to this point it would overlap with the left and ranges cannot move
spaces
pkg/storage/store_split.go, line 66 at r2 (raw file):
While you're here, rework this comment - it's unclear what it's about until one hits this paragraph. Something like
// We're in the rare case where we know that the RHS has been removed
// and re-added with a higher replica ID (and then maybe removed again).
//
// To apply the split, we need to "throw away" the data that would belong to
// the RHS, i.e. we clear the user data the RHS would have inherited from the
// LHS due to the split and additionally clear all of the range ID local state
// that the split trigger writes into the RHS.
//
// We know we've never processed a snapshot for the right range because the
// LHS prevents any incoming snapshots until the split has executed (i.e. now).
// It is important to preserve the HardState because we might however have
// already voted at a higher term. In general this shouldn't happen because
// we add learners and then promote them only after we snapshot but we're
// going to be extra careful in case future versions of cockroach somehow
// promote replicas without ensuring that a snapshot has been received.
//
// Rather than specifically deleting around the data we want to preserve
// we read the HardState to preserve it, clear everything and write back
// the HardState and tombstone. Note that we only do this if rightRepl
// exists; if it doesn't, there's no Raft state to massage (when rightRepl
// was removed, a tombstone was written instead).
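The preserve-clear-rewrite approach described in this comment can be sketched against a toy key-value map. Everything here (the `engine` type, the key layout, the function name) is invented for illustration and is not the storage engine's actual schema:

```go
package main

import (
	"fmt"
	"strings"
)

// engine is a toy stand-in for the on-disk key-value store.
type engine map[string]string

// clearRHSPreservingHardState models the comment's strategy: rather than
// deleting carefully around the data we want to keep, read the HardState,
// clear everything under the RHS prefix, then write the HardState and a
// tombstone back.
func clearRHSPreservingHardState(eng engine, prefix, tombstone string) {
	hardState, ok := eng[prefix+"/hardstate"]
	// Clear all state under the RHS prefix (user data inherited from the
	// LHS plus range-ID-local keys the split trigger would have written).
	for k := range eng {
		if strings.HasPrefix(k, prefix) {
			delete(eng, k)
		}
	}
	// Write back the HardState (it may record a vote at a higher term)
	// and lay down a tombstone.
	if ok {
		eng[prefix+"/hardstate"] = hardState
	}
	eng[prefix+"/tombstone"] = tombstone
}

func main() {
	eng := engine{
		"r2/hardstate": "term=7,vote=n3",
		"r2/raftlog/1": "entry",
		"r2/userdata":  "stale",
	}
	clearRHSPreservingHardState(eng, "r2", "nextReplicaID=5")
	fmt.Println(eng["r2/hardstate"], eng["r2/tombstone"], len(eng))
	// → term=7,vote=n3 nextReplicaID=5 2
}
```

The design choice mirrors the comment: blanket-clearing and rewriting the small amount of preserved state is simpler and safer than enumerating every key range to delete around.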
pkg/storage/store_split.go, line 111 at r2 (raw file):
// split with the Store. Requires that Replica.raftMu is held.
//
// TODO(tschottdorf): Want to merge this with SplitRange, but some legacy
Remove this TODO.
pkg/storage/store_split.go, line 116 at r2 (raw file):
	ctx context.Context, deltaMS enginepb.MVCCStats, split *roachpb.SplitTrigger, r *Replica,
) {
	rightRngOrNil := prepareRightReplicaForSplit(ctx, split, r)
Mention when it's nil.
pkg/storage/store_split.go, line 163 at r2 (raw file):
// If the RHS replica at the point of the split was known to be removed
// during the application of the split then we may not find it here. That's
// fine, carry on.
// See also:
_, _ = r.acquireSplitLock, r.splitPreApply
Force-pushed eb0e8a3 to 0dd71ca
ajwerner
left a comment
TFTR!
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @tbg)
pkg/storage/replica_command.go, line 1878 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
This was just discussed on #44247 (comment)
TLDR: when you refuse the learner, you've already added it to the desc, so we need to be careful. We don't want to roll back replication changes en masse which is what would happen here.
Ack.
pkg/storage/replica_init.go, line 278 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
missing some words
I agree that it would be nice to be more coherent here. Not necessarily for this PR, looks like a decent amount of individual cleanup is needed.
Done.
pkg/storage/replica_raft.go, line 1538 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Comment stale.
Done.
pkg/storage/split_queue_test.go, line 77 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Shadowing a built-in? I usually go with cpy in that case.
Done. This was ancient code.
pkg/storage/store.go, line 1342 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
weird spaces
Done.
pkg/storage/store_split.go, line 51 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
... if it hasn't been removed in the meantime (handled below).
Done.
pkg/storage/store_split.go, line 53 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
I'd rewrite this for clarity as
rightRepl, err := r.store.GetReplica(split.RightDesc.RangeID)
if roachpb.IsRangeNotFoundError(err) {
	// The right hand side we were planning to populate has already
	// been removed. We handle this below.
	rightRepl, err = nil, nil
} else if err != nil {
	log.Fatalf(ctx, "failed to get RHS replica: %v", err)
}
Done.
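The pattern suggested above, treating one specific "not found" error as an expected absence and fataling on anything else, can be sketched with the standard library's errors package. The names (`getReplica`, `lookupRHS`, `errRangeNotFound`) are illustrative stand-ins, not CockroachDB's API:

```go
package main

import (
	"errors"
	"fmt"
)

// errRangeNotFound stands in for roachpb.RangeNotFoundError; the real
// code tests for it with roachpb.IsRangeNotFoundError.
var errRangeNotFound = errors.New("range not found")

// getReplica is a toy stand-in for store.GetReplica.
func getReplica(id int, replicas map[int]string) (string, error) {
	r, ok := replicas[id]
	if !ok {
		return "", errRangeNotFound
	}
	return r, nil
}

// lookupRHS treats "not found" as an expected outcome and anything else
// as a fatal invariant violation.
func lookupRHS(id int, replicas map[int]string) (string, bool) {
	repl, err := getReplica(id, replicas)
	if errors.Is(err, errRangeNotFound) {
		// The RHS we were planning to populate has already been removed;
		// the caller handles the nil case.
		return "", false
	} else if err != nil {
		// Unexpected error: the real code calls log.Fatalf here.
		panic(fmt.Sprintf("failed to get RHS replica: %v", err))
	}
	return repl, true
}

func main() {
	replicas := map[int]string{2: "r2"}
	if _, ok := lookupRHS(1, replicas); !ok {
		fmt.Println("rhs already removed") // expected-absence path
	}
	if repl, ok := lookupRHS(2, replicas); ok {
		fmt.Println("found", repl)
	}
}
```

Handling the sentinel first and fataling on the remainder keeps the happy path unindented and makes the "expected absence" case impossible to miss, which is the clarity win of the suggested rewrite.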
pkg/storage/store_split.go, line 62 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Worth an assertion (if rightRng isn't nil).
Done.
pkg/storage/store_split.go, line 63 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
spaces
Done.
pkg/storage/store_split.go, line 66 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
While you're here, rework this comment - it's unclear what it's about until one hits this paragraph. Something like
// We're in the rare case where we know that the RHS has been removed
// and re-added with a higher replica ID (and then maybe removed again).
//
// To apply the split, we need to "throw away" the data that would belong to
// the RHS, i.e. we clear the user data the RHS would have inherited from the
// LHS due to the split and additionally clear all of the range ID local state
// that the split trigger writes into the RHS.
//
// We know we've never processed a snapshot for the right range because the
// LHS prevents any incoming snapshots until the split has executed (i.e. now).
// It is important to preserve the HardState because we might however have
// already voted at a higher term. In general this shouldn't happen because
// we add learners and then promote them only after we snapshot but we're
// going to be extra careful in case future versions of cockroach somehow
// promote replicas without ensuring that a snapshot has been received.
//
// Rather than specifically deleting around the data we want to preserve
// we read the HardState to preserve it, clear everything and write back
// the HardState and tombstone. Note that we only do this if rightRepl
// exists; if it doesn't, there's no Raft state to massage (when rightRepl
// was removed, a tombstone was written instead).
Done.
pkg/storage/store_split.go, line 111 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Remove this TODO.
Done.
pkg/storage/store_split.go, line 116 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Mention when it's nil.
Done.
pkg/storage/store_split.go, line 163 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
// See also:
_, _ = r.acquireSplitLock, r.splitPreApply
Done.
Force-pushed 0dd71ca to 454296b
tbg
left a comment
Reviewed 11 of 11 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)
pkg/storage/store_split.go, line 91 at r3 (raw file):
// snapshot to initialize this newer RHS.
if rightRepl.IsInitialized() {
	log.Fatalf(ctx, "cannot delete ")
not that we're going to ever see this fire 🤞 but something's missing here
This commit adds a migration to remove any remaining on-disk pre-emptive
snapshots and then excises the concept from the codebase. With this commit
we gain the invariant that a Replica always has a non-zero replicaID that
never changes.

There's some cruft in replica_init that I'd love to clean up but don't
have a clear vision of how.

Release note: None
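A minimal sketch of what the commit's invariant (a Replica always has a non-zero replica ID that never changes) buys at replica-creation time. The types, the tombstone field, and the checks are illustrative assumptions, not the PR's actual code:

```go
package main

import (
	"errors"
	"fmt"
)

type replica struct{ id int }

var errStaleReplicaID = errors.New("replica ID below tombstone")

// newReplica refuses ID 0 outright (formerly the preemptive-snapshot
// marker) and refuses any ID below the tombstone's next replica ID,
// which guarantees a removed-and-re-added replica always comes back
// with a strictly higher ID.
func newReplica(id, tombstoneNextID int) (*replica, error) {
	if id == 0 {
		// With preemptive snapshots gone this can be a fatal error.
		panic("replicaID 0 is no longer possible")
	}
	if id < tombstoneNextID {
		return nil, errStaleReplicaID
	}
	return &replica{id: id}, nil
}

func main() {
	if _, err := newReplica(3, 5); err != nil {
		fmt.Println("rejected:", err) // stale incoming message
	}
	r, _ := newReplica(5, 5)
	fmt.Println("created replica", r.id)
}
```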
Force-pushed 454296b to b2850e1
ajwerner
left a comment
TFTR!
bors r=tbg
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @tbg)
pkg/storage/store_split.go, line 91 at r3 (raw file):
Previously, tbg (Tobias Grieger) wrote…
not that we're going to ever see this fire 🤞 but something's missing here
Done.
44564: colexec: clean up memory infra on panic in NewColOperator r=yuzefovich a=yuzefovich

It is possible that an unexpected error occurs during the NewColOperator call after we have created some memory monitoring infrastructure. In order to correctly clean up the infra in such a scenario we will now have a separate panic catcher that closes the infra within NewColOperator. We will also now clean up the infra when an error occurs.

Release note: None

44615: storage: remove preemptive snapshots r=tbg a=ajwerner

This commit adds a migration to remove any remaining on-disk pre-emptive snapshots and then excises the concept from the codebase.

There's some cruft in replica_init that I'd love to clean up but don't have a clear vision of how.

Release note: None

44660: colexec: reject CASE operator with Bytes output type r=yuzefovich a=yuzefovich

Currently, there is a contradiction between the way the CASE operator works (which populates its output in arbitrary order) and the flat bytes implementation of the Bytes type (which prohibits sets in arbitrary order), so we reject such a scenario to fall back to the row-by-row engine.

Fixes: #44624.

Release note: None

44687: cli: add system.namespace_deprecated to debug zip r=knz a=otan

As we are migrating system.namespace from system.namespace_deprecated in 20.2, it may be useful to have a zip of this old table, in case it comes in handy for debugging in the future.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
Build succeeded
We recently saw an abandoned "leaked" uninitialized replica on a 19.2 node. The unfortunate truth is that in that version we aren't sure of its replica ID and thus can't remove it. In 20.1 and later we can remove uninitialized replicas (see cockroachdb#44615).

Much of the time, uninitialized replicas are removed either because the range is re-added to the store at a higher replica ID or because the node receives a ReplicaTooOldError which leads to the removal. The replicaGCQueue is the obvious tool to remove "leaked" uninitialized replicas. This commit has to do some work to allow that queue the opportunity to interact with uninitialized replicas.

There are at least two cases where such replicas can certainly come to exist. The first is when a split is rapidly followed by a removal of both the LHS and RHS but the LHS never processes the split because it receives a ReplicaTooOldError. The second is when a snapshot to add a learner fails but the learner never discovers that it has been removed.

Release note (bug fix): Garbage collect abandoned, uninitialized replicas.
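The GC decision described above, removing an uninitialized replica whose store no longer appears in the authoritative range descriptor, can be sketched as follows. The types and function are illustrative, not the replicaGCQueue's real signature:

```go
package main

import "fmt"

// rangeDesc is a toy stand-in for the authoritative RangeDescriptor,
// reduced to the set of store IDs that hold a replica of the range.
type rangeDesc struct{ storeIDs []int }

// shouldGC reports whether the local (possibly uninitialized) replica on
// storeID is removable: if the store is absent from the descriptor, the
// replica has been abandoned and can be garbage collected.
func shouldGC(desc rangeDesc, storeID int) bool {
	for _, s := range desc.storeIDs {
		if s == storeID {
			return false
		}
	}
	return true
}

func main() {
	desc := rangeDesc{storeIDs: []int{1, 2, 4}}
	fmt.Println(shouldGC(desc, 3)) // store 3 was removed from the range
	fmt.Println(shouldGC(desc, 2)) // store 2 still holds a replica
}
```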
This commit adds a migration to remove any remaining on-disk pre-emptive
snapshots and then excises the concept from the codebase.
There's some cruft in replica_init that I'd love to clean up but don't
have a clear vision of how.
Release note: None