storage: remove preemptive snapshots #44615

Merged

craig[bot] merged 1 commit into cockroachdb:master from ajwerner:ajwerner/remove-preemptive-snapshots on Feb 4, 2020

Conversation

@ajwerner (Contributor) commented Jan 31, 2020

This commit adds a migration to remove any remaining on-disk pre-emptive
snapshots and then excises the concept from the codebase.

There's some cruft in replica_init that I'd love to clean up but don't
have a clear vision of how.

Release note: None

@cockroach-teamcity (Member)

This change is Reviewable

@ajwerner (Contributor, Author)

@tbg here's a prototype of removing preemptive snapshots. It still needs testing.

One thing I tried to disambiguate was the word "initialized" as it applies to the Replica. Confusingly, calls to init did not previously always result in an "initialized" Replica. I carved out instead a notion of "loaded" as distinct from initialized. I sort of hate that that concept needs to exist at all tbh.

I'd love to hear your thoughts on further improvements. Thanks!
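The "loaded" vs "initialized" distinction described above can be illustrated with a minimal sketch. This is hypothetical code, not CockroachDB's actual types: the real Replica is far richer, but the convention it reflects is that a replica is "loaded" once its in-memory state exists and only "initialized" once it knows the key span it is responsible for.

```go
package main

import "fmt"

// rangeDesc stands in for the range descriptor; names are illustrative.
type rangeDesc struct {
	startKey, endKey string
}

type replica struct {
	replicaID int
	desc      *rangeDesc // nil until a snapshot or split populates it
}

// isInitialized mirrors the convention that an initialized replica
// knows which key span it is responsible for.
func (r *replica) isInitialized() bool {
	return r.desc != nil
}

func main() {
	// Loaded but uninitialized: created in response to a raft message
	// before any snapshot has arrived.
	r := &replica{replicaID: 7}
	fmt.Println(r.isInitialized()) // false

	// A snapshot supplies the descriptor, initializing the replica.
	r.desc = &rangeDesc{startKey: "a", endKey: "m"}
	fmt.Println(r.isInitialized()) // true
}
```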

// TODO(ajwerner): Should a recipient be able to decline a LEARNER snapshot?
// It seems like yes. The recipient may know that it does not want to get
// added to a range. Confer with danhhz/tbg/darin about the implications of
// making this true for SnapshotRequest_LEARNER.
tbg (Member), on pkg/storage/replica_command.go, line 1878 at r1:

This was just discussed on #44247 (comment)

TLDR: when you refuse the learner, you've already added it to the desc, so we need to be careful. We don't want to roll back replication changes en masse which is what would happen here.

return false, errors.Errorf("found uninitialized RangeDescriptor: %+v", desc)
}
replicaDesc, found := desc.GetReplicaDescriptor(s.StoreID())
if !found {
tbg (Member):

We also get replicas like that when a replica catches up across a change replicas trigger that removes it. When you made these replicas GC themselves proactively when they hit the trigger, you did so atomically with the application of the CRT raft command, right? In other words, in this branch, are we really 100% looking at a preemptive snap? I suppose at the very least we might be looking at a replica that did catch up across the trigger before we had code that insta-gc'ed, but it seems fine to treat that as a preemptive snapshot and nuke it (which would happen to it eventually anyway, since it's not in the desc any more).

// If this is intended for replicaID 0 then it's either a preemptive
// snapshot or a split/merge lock in which case we'll let it go through.
if replicaID == 0 {
// TODO(ajwerner): fix this case.
tbg (Member):

I have this all paged out again, but is the split/merge lock the last remaining use of replicaID zero? I'd be so happy to not have that be a thing. (not in this PR)

ajwerner (Contributor, Author):

Even better! I removed all uses of ReplicaID 0 in this PR already, just forgot to come back to this conditional after I did that. This can be a fatal now.

tbg (Member):

❤️

@tbg (Member) commented Jan 31, 2020

Only a very partial review, I will have to give this a look with fresh eyes next week. Excited that this is happening!

@ajwerner force-pushed the ajwerner/remove-preemptive-snapshots branch 3 times, most recently from 1d0bd8a to 444b9a7 on February 3, 2020 05:43
@ajwerner changed the title from "[DNM] storage: remove preemptive snapshots" to "storage: remove preemptive snapshots" on Feb 3, 2020
@ajwerner (Contributor, Author) commented Feb 3, 2020

This is RFAL. I'll put up a second PR to update the grafana dashboards.

@ajwerner force-pushed the ajwerner/remove-preemptive-snapshots branch 5 times, most recently from 55ef87f to df22915 on February 3, 2020 14:51
@tbg self-requested a review on February 4, 2020 08:47
tbg (Member) left a comment:

:lgtm_strong: very very nice. I'm so happy you chewed through and removed all of that detritus. I feel a lot better about all of the code you touched.

Reviewed 14 of 25 files at r1, 14 of 14 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)


pkg/storage/replica_init.go, line 278 at r2 (raw file):

	//      batch. We could avoid setting the local descriptor in this case.
	//   2) When the DisableEagerReplicaRemoval testing knob is enabled. We
	//      could

missing some words

I agree that it would be nice to be more coherent here. Not necessarily for this PR, looks like a decent amount of individual cleanup is needed.


pkg/storage/replica_raft.go, line 1538 at r2 (raw file):

	ctx context.Context, split *roachpb.SplitTrigger,
) (func(), error) {
	// We pass a 0 replicaID because we want to lock the RHS even if we know

Comment stale.


pkg/storage/split_queue_test.go, line 77 at r2 (raw file):

		// ensures that the store won't be mucking with our replica concurrently
		// during testing (e.g. via the system config gossip update).
		copy := *tc.repl.Desc()

Shadowing a built-in? I usually go with cpy in that case.
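The `copy := *tc.repl.Desc()` line drew this comment because `copy` is a predeclared Go identifier: a local variable of that name shadows the builtin for the rest of the scope, so any later `copy(dst, src)` call in the same function would no longer compile. A minimal sketch of the suggested fix, with illustrative data:

```go
package main

import "fmt"

type desc struct{ rangeID int }

// demo mirrors the test's pattern: it takes a value copy of a
// descriptor, naming the local `cpy` as the reviewer suggests so the
// builtin `copy` stays usable. A local named `copy` would shadow the
// builtin here, and the copy(dst, src) call below would then fail to
// compile.
func demo() (desc, int) {
	d := desc{rangeID: 42}
	cpy := d // value copy; mutating cpy leaves d untouched
	cpy.rangeID = 99

	src := []byte("ok")
	dst := make([]byte, len(src))
	n := copy(dst, src) // builtin copy is still in scope
	return d, n
}

func main() {
	d, n := demo()
	fmt.Println(d.rangeID, n) // 42 2
}
```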


pkg/storage/store.go, line 1342 at r2 (raw file):

				// range which has processed a raft command to remove itself (which is
				// possible prior to 19.2 or if the DisableEagerReplicaRemoval is
				// enabled) and has not yet been removed by the replica	 gc queue.

weird spaces


pkg/storage/store_split.go, line 51 at r2 (raw file):

... if it hasn't been removed in the meantime (handled below).


pkg/storage/store_split.go, line 53 at r2 (raw file):

	// acquired) in Replica.acquireSplitLock. It must be present here.
	rightRepl, err := r.store.GetReplica(split.RightDesc.RangeID)
	if err != nil && !roachpb.IsRangeNotFoundError(err) {

I'd rewrite this for clarity as

rightRepl, err := r.store.GetReplica(split.RightDesc.RangeID)
if roachpb.IsRangeNotFoundError(err) {
  // The right hand side we were planning to populate has already been removed.
  // We handle this below.
  rightRepl = nil
}
if err != nil {
  log.Fatalf(ctx, "failed to get RHS replica: %v", err)
}

pkg/storage/store_split.go, line 62 at r2 (raw file):

		// We're in the rare case where we know that the RHS has been removed
		// and re-added with a higher replica ID (and then maybe removed again).
		// We know we've never processed a snapshot for the right range because up

Worth an assertion (if rightRng isn't nil).


pkg/storage/store_split.go, line 63 at r2 (raw file):

		// and re-added with a higher replica ID (and then maybe removed again).
		// We know we've never processed a snapshot for the right range because up
		// to this point it would overlap	with the left and ranges cannot move

spaces


pkg/storage/store_split.go, line 66 at r2 (raw file):
While you're here, rework this comment - it's unclear what it's about until one hits this paragraph. Something like

// We're in the rare case where we know that the RHS has been removed
// and re-added with a higher replica ID (and then maybe removed again).
//
// To apply the split, we need to "throw away" the data that would belong to
// the RHS, i.e. we clear the user data the RHS would have inherited from the
// LHS due to the split and additionally clear all of the range ID local state
// that the split trigger writes into the RHS.
//
// We know we've never processed a snapshot for the right range because the
// LHS prevents any incoming snapshots until the split has executed (i.e. now).
// It is important to preserve the HardState because we might however have
// already voted at a higher term. In general this shouldn't happen because
// we add learners and then promote them only after we snapshot but we're
// going to be extra careful in case future versions of cockroach somehow
// promote replicas without ensuring that a snapshot has been received.
//
// Rather than specifically deleting around the data we want to preserve
// we read the HardState to preserve it, clear everything and write back
// the HardState and tombstone. Note that we only do this if rightRepl
// exists; if it doesn't, there's no Raft state to massage (when rightRepl
// was removed, a tombstone was written instead).
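The "read the HardState, clear everything, write the HardState and tombstone back" sequence described in the comment above can be sketched generically. This is only an illustration of the pattern, not the real storage code: a plain map stands in for the storage engine and the key names are made up.

```go
package main

import "fmt"

// clearButPreserveHardState sketches the "read, clear wholesale,
// restore" pattern: rather than deleting carefully around the data we
// want to keep, read it out first, clear everything, then write it
// back along with a tombstone marking the replica as removed.
func clearButPreserveHardState(kv map[string]string, tombstone string) {
	hs, ok := kv["hardstate"] // read the state that must survive
	for k := range kv {       // clear all RHS data wholesale
		delete(kv, k)
	}
	if ok {
		kv["hardstate"] = hs // restore the preserved HardState
	}
	kv["tombstone"] = tombstone // mark the replica as removed
}

func main() {
	kv := map[string]string{
		"hardstate": "term=5,vote=2",
		"user/a":    "x",
		"user/b":    "y",
	}
	clearButPreserveHardState(kv, "nextReplicaID=4")
	fmt.Println(len(kv), kv["hardstate"], kv["tombstone"])
}
```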


pkg/storage/store_split.go, line 111 at r2 (raw file):

// split with the Store. Requires that Replica.raftMu is held.
//
// TODO(tschottdorf): Want to merge this with SplitRange, but some legacy

Remove this TODO.


pkg/storage/store_split.go, line 116 at r2 (raw file):

	ctx context.Context, deltaMS enginepb.MVCCStats, split *roachpb.SplitTrigger, r *Replica,
) {
	rightRngOrNil := prepareRightReplicaForSplit(ctx, split, r)

Mention when it's nil.


pkg/storage/store_split.go, line 163 at r2 (raw file):

	// If the RHS replica at the point of the split was known to be removed
	// during the application of the split then we may not find it here. That's
	// fine, carry on.

// See also:
_, _ = r.acquireSplitLock, r.splitPreApply

@ajwerner force-pushed the ajwerner/remove-preemptive-snapshots branch 3 times, most recently from eb0e8a3 to 0dd71ca on February 4, 2020 15:52
ajwerner (Contributor, Author) left a comment:

TFTR!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @tbg)


pkg/storage/replica_command.go, line 1878 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

This was just discussed on #44247 (comment)

TLDR: when you refuse the learner, you've already added it to the desc, so we need to be careful. We don't want to roll back replication changes en masse which is what would happen here.

Ack.


pkg/storage/replica_init.go, line 278 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

missing some words

I agree that it would be nice to be more coherent here. Not necessarily for this PR, looks like a decent amount of individual cleanup is needed.

Done.


pkg/storage/replica_raft.go, line 1538 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Comment stale.

Done.


pkg/storage/split_queue_test.go, line 77 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Shadowing a built-in? I usually go with cpy in that case.

Done. This was ancient code.


pkg/storage/store.go, line 1342 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

weird spaces

Done.


pkg/storage/store_split.go, line 51 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

... if it hasn't been removed in the meantime (handled below).

Done.


pkg/storage/store_split.go, line 53 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I'd rewrite this for clarity as

rightRepl, err := r.store.GetReplica(split.RightDesc.RangeID)
if roachpb.IsRangeNotFoundError(err) {
  // The right hand side we were planning to populate has already been removed.
  // We handle this below.
  rightRepl = nil
}
if err != nil {
  log.Fatalf(ctx, "failed to get RHS replica: %v", err)
}

Done.


pkg/storage/store_split.go, line 62 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Worth an assertion (if rightRng isn't nil).

Done.


pkg/storage/store_split.go, line 63 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

spaces

Done.


pkg/storage/store_split.go, line 66 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

While you're here, rework this comment - it's unclear what it's about until one hits this paragraph. Something like

// We're in the rare case where we know that the RHS has been removed
// and re-added with a higher replica ID (and then maybe removed again).
//
// To apply the split, we need to "throw away" the data that would belong to
// the RHS, i.e. we clear the user data the RHS would have inherited from the
// LHS due to the split and additionally clear all of the range ID local state
// that the split trigger writes into the RHS.
//
// We know we've never processed a snapshot for the right range because the
// LHS prevents any incoming snapshots until the split has executed (i.e. now).
// It is important to preserve the HardState because we might however have
// already voted at a higher term. In general this shouldn't happen because
// we add learners and then promote them only after we snapshot but we're
// going to be extra careful in case future versions of cockroach somehow
// promote replicas without ensuring that a snapshot has been received.
//
// Rather than specifically deleting around the data we want to preserve
// we read the HardState to preserve it, clear everything and write back
// the HardState and tombstone. Note that we only do this if rightRepl
// exists; if it doesn't, there's no Raft state to massage (when rightRepl
// was removed, a tombstone was written instead).

Done.


pkg/storage/store_split.go, line 111 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Remove this TODO.

Done.


pkg/storage/store_split.go, line 116 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Mention when it's nil.

Done.


pkg/storage/store_split.go, line 163 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

// See also:
_, _ = r.acquireSplitLock, r.splitPreApply

Done.



@ajwerner force-pushed the ajwerner/remove-preemptive-snapshots branch from 0dd71ca to 454296b on February 4, 2020 16:29
tbg (Member) left a comment:

:lgtm:

Reviewed 11 of 11 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)


pkg/storage/store_split.go, line 91 at r3 (raw file):

			// snapshot to initialize this newer RHS.
			if rightRepl.IsInitialized() {
				log.Fatalf(ctx, "cannot delete ")

not that we're going to ever see this fire 🤞 but something's missing here

This commit adds a migration to remove any remaining on-disk pre-emptive
snapshots and then excises the concept from the codebase. With this commit
we gain the invariant that a Replica always has a non-zero replicaID that
never changes.

There's some cruft in replica_init that I'd love to clean up but don't
have a clear vision of how.

Release note: None
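The amended commit message above states the new invariant: a Replica always has a non-zero replica ID that never changes. A minimal sketch of enforcing such an invariant at construction time; the names here are hypothetical, not CockroachDB's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// replica keeps its ID unexported so it can only be set at
// construction; combined with the check in newReplica this sketches
// the invariant "non-zero replicaID that never changes".
type replica struct {
	replicaID int
}

// newReplica rejects the zero ID, which previously denoted a
// preemptive snapshot or a split/merge lock.
func newReplica(id int) (*replica, error) {
	if id == 0 {
		return nil, errors.New("replicaID must be non-zero")
	}
	return &replica{replicaID: id}, nil
}

func (r *replica) ID() int { return r.replicaID }

func main() {
	if _, err := newReplica(0); err != nil {
		fmt.Println("rejected:", err)
	}
	r, err := newReplica(3)
	if err != nil {
		panic(err)
	}
	fmt.Println("replica ID:", r.ID())
}
```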
@ajwerner force-pushed the ajwerner/remove-preemptive-snapshots branch from 454296b to b2850e1 on February 4, 2020 17:49
ajwerner (Contributor, Author) left a comment:

TFTR!

bors r=tbg

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @tbg)


pkg/storage/store_split.go, line 91 at r3 (raw file):

Previously, tbg (Tobias Grieger) wrote…

not that we're going to ever see this fire 🤞 but something's missing here

Done.

craig bot pushed a commit that referenced this pull request Feb 4, 2020
44564: colexec: clean up memory infra on panic in NewColOperator r=yuzefovich a=yuzefovich

It is possible that an unexpected error occurs during NewColOperator
call after we have created some memory monitoring infrastructure. In
order to correctly clean up the infra in such scenario we will now have
a separate panic catcher that closes the infra within NewColOperator.
We will also now clean up the infra when an error occurs.

Release note: None

44615: storage: remove preemptive snapshots r=tbg a=ajwerner

This commit adds a migration to remove any remaining on-disk pre-emptive
snapshots and then excises the concept from the codebase.

There's some cruft in replica_init that I'd love to clean up but don't
have a clear vision of how.

Release note: None

44660: colexec: reject CASE operator with Bytes output type r=yuzefovich a=yuzefovich

Currently, there is a contradiction between the way CASE operator
works (which populates its output in arbitrary order) and the flat
bytes implementation of Bytes type (which prohibits sets in arbitrary
order), so we reject such scenario to fall back to row-by-row engine.

Fixes: #44624.

Release note: None

44687: cli: add system.namespace_deprecated to debug zip r=knz a=otan

As we are migrating system.namespace from system.namespace_deprecated in
20.2, it may be useful to have a zip of this old table, in case it comes
in handy for debugging in future.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
@craig (bot, Contributor) commented Feb 4, 2020

Build succeeded

@craig craig bot merged commit b2850e1 into cockroachdb:master Feb 4, 2020
ajwerner pushed a commit to ajwerner/cockroach that referenced this pull request Apr 24, 2020
We recently saw an abandoned "leaked" uninitialized replica on a 19.2 node.
The unfortunate truth is in that version we aren't sure of its replica ID and
thus can't remove it. In 20.1 and later we can remove uninitialized replicas
(see cockroachdb#44615). Much of the time, uninitialized replicas are removed either
because the range is re-added to the store at a higher replica ID or because
the node receives a ReplicaTooOldError which leads to the removal.

The replicaGCQueue is the obvious tool to remove "leaked" uninitialized
replicas. This commit has to do some work to allow that queue the opportunity
to interact with uninitialized replicas.

There are at least two cases where such replicas can certainly come to exist.
The first is when a split is rapidly followed by a removal of both the LHS and
RHS but the LHS never processes the split because it receives a
ReplicaTooOldError. The second is when a snapshot to add a learner fails but
the learner never discovers that it has been removed.

Release note (bug fix): Garbage collect abandoned, uninitialized replicas.