kvserver/loqrecovery: check full key coverage in quorum recovery by aliher1911 · Pull Request #73756 · cockroachdb/cockroach

aliher1911 · 2021-12-13T20:09:42Z

Previously when doing unsafe replica recovery, if some ranges are
missing or represented by stale replicas that were split or merged,
recovery will change cluster to inconsistent state with gaps or
overlaps in keyspace.
This change adds checks for range completeness as well as adds a
preference for replicas with higher range applied index.

Release note: None

Fixes #73662

cockroach-teamcity · 2021-12-13T20:09:49Z

This change is

tbg

Did not review first commit.

Still have to review the logic changes.

Reviewed 25 of 25 files at r1, 4 of 7 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, and @tbg)

pkg/kv/kvserver/loqrecovery/loqrecoverypb/recovery.go, line 72 at r2 (raw file):

	}
	return roachpb.ReplicaID(0), errors.Errorf(
		"invalid replica info %s, replicas doesn't contain its own descriptor", m)

Can this be hit in practice?

pkg/kv/kvserver/loqrecovery/testdata/incomplete_range_fails, line 1 at r2 (raw file):

# Tests verifying that gaps or overlaps in ranges block recovery.

nit on the test name, "incomplete range" isn't particularly evocative of what's being exercised here, perhaps keyspace_covering since that is the property we're checking?

pkg/kv/kvserver/loqrecovery/testdata/incomplete_range_fails, line 3 at r2 (raw file):

# Tests verifying that gaps or overlaps in ranges block recovery.

# Check range gap where range 2 is missing leaving a hole between ranges 1 and 2

1 and 3

Throughout: consistent comment style: punctuation and capitalization for free-standing comments, no caps nor puncts for inline comments.

pkg/kv/kvserver/loqrecovery/testdata/incomplete_range_fails, line 8 at r2 (raw file):

  RangeID: 1
  StartKey: /Min
  EndKey: /Table/3  # First range end short of the second one leaving a missing Table/3

ends

missing [/Table/3,/Table,4)

pkg/kv/kvserver/loqrecovery/testdata/incomplete_range_fails, line 34 at r2 (raw file):
"key range is incomplete" is borderline confusing how about (I assume this message will be visible in cli as well)

unable to form a non-overlapping covering of the addressable keyspace: range gap ...

This isn't printing the key span for r1 nor n3, it would be helpful to group all spans, something like this:

range gap: r1: [/Min,/Table/3) gap: /Table/{3-4} r3: [/Table/4,/Max)

or the below, which might be easier to parse if there are lots of problems:

range gap /Table/{3-4}:
  r1: [/Min,/Table/3)
  r3: [/Table/4,/Max)

Could you also add an example that has more than one overlap to make sure that it works? I assume you have nontrivial coverage of the "exactly touching" case as this is "all of the test cases so far", right?

pkg/kv/kvserver/loqrecovery/testdata/incomplete_range_fails, line 77 at r2 (raw file):

make-plan
----
key range is incomplete. discovered inconsistencies: range overlap /{Table/1-Max} between ranges r1 and r3, range overlap /{Table/3-Max} between ranges r1 and r4

This isn't printing the r1 key span, which would be helpful. Ideally the format is consistent with the range overlap case.

pkg/kv/kvserver/loqrecovery/testdata/max_applied_voter_wins, line 1 at r2 (raw file):

# Test verifying that voter with max range applied index is chosen.

Are these two independent descriptions of the test? Perhaps delete the first line.

pkg/kv/kvserver/loqrecovery/testdata/max_applied_voter_wins, line 13 at r2 (raw file):

  - { NodeID: 1, StoreID: 1, ReplicaID: 1}  # This replica has higher range applied index so has preference
  - { NodeID: 2, StoreID: 2, ReplicaID: 2}
  - { NodeID: 3, StoreID: 3, ReplicaID: 3}

down

comments on 3-5 would be helpful

tbg

Reviewed 3 of 7 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @erikgrinaker)

erikgrinaker

Flushing some comments after a first pass.

Reviewed 25 of 25 files at r1, 7 of 7 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, and @tbg)

pkg/cli/debug_recover_loss_of_quorum.go, line 283 at r2 (raw file):

	for _, r := range report.PlannedUpdates {
		_, _ = fmt.Fprintf(stderr, "Recovering range r%d:%s promoting replica r%d at store s%d to r%d. "+
			"Discarding available replicas: [%s], Discarded dead replicas: [%s]\n",

Nit: use "discarding" for both.

pkg/kv/kvserver/loqrecovery/plan.go, line 82 at r2 (raw file):

func (r *PlanningReport) addUpdate(rr ReplicaUpdateReport) {
	r.PlannedUpdates = append(r.PlannedUpdates, rr)

nit: why not append directly to the slice instead of using this method?

pkg/kv/kvserver/loqrecovery/plan.go, line 121 at r2 (raw file):

	replicasByRangeID := groupReplicasByRangeID(replicas)
	// proposedWinners contain decisions for all ranges in keyspace. it contains ranges

We already have a lot of different names for this (survivor, promoted, new, updated, etc), let's not add yet another. We should pick a consistent name and stick with it.

pkg/kv/kvserver/loqrecovery/plan.go, line 212 at r2 (raw file):

	groupedRanges := make(map[roachpb.RangeID][]loqrecoverypb.ReplicaInfo)
	for _, descriptor := range descriptors {
		groupedRanges[descriptor.Desc.RangeID] = append(groupedRanges[descriptor.Desc.RangeID], descriptor)

nit: 100 columns throughout.

pkg/kv/kvserver/loqrecovery/plan.go, line 219 at r2 (raw file):

// proposedDecision contains replica resolution details e.g. preferred replica as
// well as extra info to produce a report of the planned action.
type proposedDecision struct {

Instead of introducing a new, opaque type for this, how about using a ranked list of replicas instead, where the first entry would be the preferred one? Then we'd pass around something like rankedReplicas []loqrecoverypb.ReplicaInfo, which is a lot more transparent. That's also essentially what pickPotentialWinningReplica already does, it's a ranking function that only at the end splits off the winner.

pkg/kv/kvserver/loqrecovery/plan.go, line 240 at r2 (raw file):

// Note that descriptors argument would be sorted in process of picking a winner
func pickPotentialWinningReplica(replicas []loqrecoverypb.ReplicaInfo) proposedDecision {
	// Maybe that's too expensive to compute multiple times per replica?

Should be fine (this is linear and cheap), but you could always generate a large dataset and try it out.

pkg/kv/kvserver/loqrecovery/plan.go, line 250 at r2 (raw file):

			}
		}
		// This is suspicious, our descriptor is not in replicas. Panic maybe?

Panics should only be used when internal invariants are broken, while the input here is external. I think we can keep this function as a non-fallible ranking function, and leave it to the caller to verify that the input satisfies all conditions.

pkg/kv/kvserver/loqrecovery/plan.go, line 278 at r2 (raw file):

			(voterI == voterJ &&
				(replicas[i].RaftAppliedIndex > replicas[j].RaftAppliedIndex ||
					(replicas[i].RaftAppliedIndex == replicas[j].RaftAppliedIndex && replicas[i].StoreID > replicas[j].StoreID)))

This chain is a bit hard to read. Can it be restructured as e.g. a switch statement or sequence of conditionals?

pkg/kv/kvserver/loqrecovery/plan.go, line 300 at r2 (raw file):

// checkRangeCompleteness given slice of all survivor ranges, checks that full keyspace is covered.
// Note that slice would be sorted in process of the check.
func checkRangeCompleteness(replicas []proposedDecision) error {

Use "covering" in the name here, e.g. checkKeyspaceCovering, to tie it into the Covering property from the RFC.

pkg/kv/kvserver/loqrecovery/plan.go, line 335 at r2 (raw file):

		default:
			prevRange = desc.RangeID()
			lastKey = desc.EndKey()

Can we move the prevRange and lastKey assignments out of the switch, since I think we can set them for all branches and still remain correct?

pkg/kv/kvserver/loqrecovery/plan.go, line 352 at r2 (raw file):

			descriptions = append(descriptions, fmt.Sprintf("%v", id))
		}
		return errors.Newf("key range is incomplete. discovered inconsistencies: %s", strings.Join(descriptions, ", "))

I think we can pass in anomalies here and it'll do the comma-separated list of strings automatically?

pkg/kv/kvserver/loqrecovery/plan.go, line 364 at r2 (raw file):

	ctx context.Context, p proposedDecision, liveStoreIDs storeIDSet,
) (loqrecoverypb.ReplicaUpdate, bool) {
	if !p.winner.Desc.Replicas().CanMakeProgress(func(rep roachpb.ReplicaDescriptor) bool {

nit: consider inverting this and returning early, to avoid the entire function being inside a branch.

pkg/kv/kvserver/loqrecovery/plan.go, line 371 at r2 (raw file):

		// all available replicas. We want to ensure that there would be no conflict
		// without bumping it by arbitrary number.
		nextReplicaID := p.winner.Desc.NextReplicaID

As we discussed elsewhere, how about collecting the canonical range descriptors from the meta-ranges during collect-info, and then using them if they still have quorum? That would give us an authoritative view of all range descriptors in the system, which would take a lot of the guesswork out of this decision and many others (e.g. if there were any splits that we haven't caught).

I'm fine with putting that off for a subsequent pull request though.

pkg/kv/kvserver/loqrecovery/plan.go, line 380 at r2 (raw file):

		replicaID, err := p.winner.ReplicaID()
		if err != nil {
			log.Fatalf(ctx, "replica descriptor doesn't contain replica for its own store: %s", p.winner.Desc)

We should return this up the stack, since this is based on externally provided input.

aliher1911

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, and @tbg)

pkg/kv/kvserver/loqrecovery/plan.go, line 335 at r2 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Can we move the prevRange and lastKey assignments out of the switch, since I think we can set them for all branches and still remain correct?

We don't do it if we have multiple ranges overlapping (first case).
Lets say R1 splits to R1(LHS) and R2(RHS) then R2 splits to R2(LHS) and R3(RHS). We lose R1(LHS) and are left with original R1. It would overlap with R2 and R3 and we don't advance prevRange at R2 because R1 is still covering wider range.

pkg/kv/kvserver/loqrecovery/plan.go, line 352 at r2 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

I think we can pass in anomalies here and it'll do the comma-separated list of strings automatically?

I tried that and getting a [] enclosed list but values are space separated which makes it harder to read. Anyway, Tobi suggests breaking it down into rows and making gaps and overlaps more specific e.g. which ranges before/after or what is overlapping. Would need more creative formatting.

pkg/kv/kvserver/loqrecovery/plan.go, line 371 at r2 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

As we discussed elsewhere, how about collecting the canonical range descriptors from the meta-ranges during collect-info, and then using them if they still have quorum? That would give us an authoritative view of all range descriptors in the system, which would take a lot of the guesswork out of this decision and many others (e.g. if there were any splits that we haven't caught).

I'm fine with putting that off for a subsequent pull request though.

Adding meta collection would be interesting especially if quorum is lost on meta rage!

pkg/kv/kvserver/loqrecovery/plan.go, line 380 at r2 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

We should return this up the stack, since this is based on externally provided input.

I'm torn on this. If we add errors here, that would complicate things since we'll have to bubble it up. While we should never have a replica info coming from store that doesn't have its own replica missing from descriptor, or is this assumption incorrect?

pkg/kv/kvserver/loqrecovery/loqrecoverypb/recovery.go, line 72 at r2 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Can this be hit in practice?

I don't think this is possible unless someone would edit or corrupt the replica info file. Or we have a bug. We can Fatal here, but that would require context so it is done on the level above. I'd rather not have to think about it at all.

erikgrinaker

Reviewed 4 of 11 files at r3, 7 of 7 files at r4, 27 of 27 files at r5, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @tbg)

pkg/kv/kvserver/loqrecovery/plan.go, line 219 at r2 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Instead of introducing a new, opaque type for this, how about using a ranked list of replicas instead, where the first entry would be the preferred one? Then we'd pass around something like rankedReplicas []loqrecoverypb.ReplicaInfo, which is a lot more transparent. That's also essentially what pickPotentialWinningReplica already does, it's a ranking function that only at the end splits off the winner.

I was thinking we wouldn't need a separate type for this at all -- just passing around [][]ReplicaInfo would be more transparent. But not a strongly held view, this is fine if you think it's better.

pkg/kv/kvserver/loqrecovery/plan.go, line 335 at r2 (raw file):

Previously, aliher1911 (Oleg) wrote…

We don't do it if we have multiple ranges overlapping (first case).
Lets say R1 splits to R1(LHS) and R2(RHS) then R2 splits to R2(LHS) and R3(RHS). We lose R1(LHS) and are left with original R1. It would overlap with R2 and R3 and we don't advance prevRange at R2 because R1 is still covering wider range.

Got it. I meant that we'd move whatever condition was necessary out of the switch, which I see is what we did here.

pkg/kv/kvserver/loqrecovery/plan.go, line 352 at r2 (raw file):

Previously, aliher1911 (Oleg) wrote…

I tried that and getting a [] enclosed list but values are space separated which makes it harder to read. Anyway, Tobi suggests breaking it down into rows and making gaps and overlaps more specific e.g. which ranges before/after or what is overlapping. Would need more creative formatting.

Makes sense.

pkg/kv/kvserver/loqrecovery/plan.go, line 371 at r2 (raw file):

Previously, aliher1911 (Oleg) wrote…

Adding meta collection would be interesting especially if quorum is lost on meta rage!

Yeah, we'd have to fall back to per-range heuristics if we don't have quorum on the meta ranges. But that's going to be the exception rather than the norm, so we'd get better guarantees in the typical case.

pkg/kv/kvserver/loqrecovery/plan.go, line 380 at r2 (raw file):

Previously, aliher1911 (Oleg) wrote…

I'm torn on this. If we add errors here, that would complicate things since we'll have to bubble it up. While we should never have a replica info coming from store that doesn't have its own replica missing from descriptor, or is this assumption incorrect?

Well, as a rule of thumb, user input should never be able to cause a panic -- panics are only appropriate when internal system invariants are violated. Sure, I get the idea that the input file "should" have been generated by us in a previous command, but it's still provided as input from the user so I don't think it's appropriate to panic on it.

pkg/kv/kvserver/loqrecovery/utils.go, line 90 at r5 (raw file):

// GetReport returns a properly formatted report that could be presented
// to user. It is separate from the message to respect error conventions.

What convention is this? Should error messages be single-line or something?

aliher1911

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, and @tbg)

pkg/kv/kvserver/loqrecovery/plan.go, line 219 at r2 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

I was thinking we wouldn't need a separate type for this at all -- just passing around [][]ReplicaInfo would be more transparent. But not a strongly held view, this is fine if you think it's better.

In the follow up PR, I need something to keep track of record indexes to save storelocal values. It gets back to struct or there would be a need for some external per store counter. As more features are added it becomes beneficial to have more context beside ReadWriter.

pkg/kv/kvserver/loqrecovery/plan.go, line 335 at r2 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Got it. I meant that we'd move whatever condition was necessary out of the switch, which I see is what we did here.

Yeah I moved it out, the condition could only fire for the first case, but it looks better now so worth the effort!

pkg/kv/kvserver/loqrecovery/plan.go, line 380 at r2 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Well, as a rule of thumb, user input should never be able to cause a panic -- panics are only appropriate when internal system invariants are violated. Sure, I get the idea that the input file "should" have been generated by us in a previous command, but it's still provided as input from the user so I don't think it's appropriate to panic on it.

Hard to argue with that. Maybe validate unmarshalled info upon load and fail fast and then skip checks altogether then? Otherwise this logic creeps unnecessarily into the code.

pkg/kv/kvserver/loqrecovery/utils.go, line 90 at r5 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

What convention is this? Should error messages be single-line or something?

Messages start with lowercase. Unless I'm making it up of course. I'm running full lint to find it out. But in general having details separate from message seems... reasonable?
It would be nice to do something similar to the rest of loqrecovery vs cli separation where cli would present results. This case though would be different because datadriven tests would have to present error results and for that they'll need cli code. So I put it into error itself.
In worst case I can turn it into Error() method.

erikgrinaker

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @tbg)

pkg/kv/kvserver/loqrecovery/plan.go, line 380 at r2 (raw file):

Previously, aliher1911 (Oleg) wrote…

Hard to argue with that. Maybe validate unmarshalled info upon load and fail fast and then skip checks altogether then? Otherwise this logic creeps unnecessarily into the code.

Yeah, doing an initial verification pass is fine if the error checking ends up being excessive.

pkg/kv/kvserver/loqrecovery/utils.go, line 90 at r5 (raw file):

Previously, aliher1911 (Oleg) wrote…

Messages start with lowercase. Unless I'm making it up of course. I'm running full lint to find it out. But in general having details separate from message seems... reasonable?
It would be nice to do something similar to the rest of loqrecovery vs cli separation where cli would present results. This case though would be different because datadriven tests would have to present error results and for that they'll need cli code. So I put it into error itself.
In worst case I can turn it into Error() method.

Yeah, this approach is totally fine. I was just wondering if the error type assertion was strictly necessary or if we could do something simpler, but this works.

aliher1911

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @tbg)

pkg/kv/kvserver/loqrecovery/utils.go, line 90 at r5 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Yeah, this approach is totally fine. I was just wondering if the error type assertion was strictly necessary or if we could do something simpler, but this works.

So I grepped through the sources and see that we always use lowercase for errors.New(f) message except the cases where first word is some name and no periods at the end. So we are pretty consistent in that. Outliers are in tests only. But we don't have a lint rule for that. It must have been pointed to me on some of the reviews before.
@tbg maybe you have some insight what's the common approach?

erikgrinaker

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @tbg)

pkg/kv/kvserver/loqrecovery/utils.go, line 90 at r5 (raw file):

Previously, aliher1911 (Oleg) wrote…

So I grepped through the sources and see that we always use lowercase for errors.New(f) message except the cases where first word is some name and no periods at the end. So we are pretty consistent in that. Outliers are in tests only. But we don't have a lint rule for that. It must have been pointed to me on some of the reviews before.
@tbg maybe you have some insight what's the common approach?

Yeah, no, the lowercase convention is totally a thing -- that's a Go convention, not a CRDB thing. The reasoning is that errors will often be combined via other errors, e.g. due to errors.Wrap or fmt.Printf("something went wrong: %s", err), and uppercase just looks weird in long error chains like this.

https://github.com/golang/go/wiki/CodeReviewComments#error-strings

I suppose one could argue for an exception here, or even just go with lowercase, but not sure if it's worth it. What we have is fine too, nbd.

knz

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @tbg)

pkg/kv/kvserver/loqrecovery/utils.go, line 90 at r5 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Yeah, no, the lowercase convention is totally a thing -- that's a Go convention, not a CRDB thing. The reasoning is that errors will often be combined via other errors, e.g. due to errors.Wrap or fmt.Printf("something went wrong: %s", err), and uppercase just looks weird in long error chains like this.

https://github.com/golang/go/wiki/CodeReviewComments#error-strings

I suppose one could argue for an exception here, or even just go with lowercase, but not sure if it's worth it. What we have is fine too, nbd.

you can also have a lowercase error message, then add the report with errors.WithDetail. This way the report is properly presented "below" the error in the CLI.

The details section is meant to contain fully formed sentences (with capitals and periods).

aliher1911

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, and @tbg)

pkg/kv/kvserver/loqrecovery/plan.go, line 380 at r2 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Yeah, doing an initial verification pass is fine if the error checking ends up being excessive.

Actually I have a different idea. Looking on how we use our "local" replica early in the process we would always do this check when doing keyspace coverage. So it would return the error and subsequent checks are unnecessary. But to be sure we don't do anything funny, I'll just add an invalid input test to cover this case.

aliher1911

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, and @tbg)

pkg/kv/kvserver/loqrecovery/utils.go, line 90 at r5 (raw file):

Previously, knz (kena) wrote…

you can also have a lowercase error message, then add the report with errors.WithDetail. This way the report is properly presented "below" the error in the CLI.

The details section is meant to contain fully formed sentences (with capitals and periods).

This is exactly what we need here! I already made a change. CLI will handle this error automagically and test will just use GetAllDetails to compare if available.

erikgrinaker

Reviewed 3 of 6 files at r6, 4 of 4 files at r7, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911 and @tbg)

pkg/kv/kvserver/loqrecovery/plan.go, line 398 at r7 (raw file):

	}

	replica, _ := p.survivor().Replica()

Even if we've checked this earlier, we shouldn't just ignore the error here. It's better to fatal with a comment that we "should" never hit the branch because we expect the caller to already have checked this.

erikgrinaker

Reviewed 3 of 3 files at r8.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, and @tbg)

pkg/kv/kvserver/loqrecovery/plan.go, line 249 at r8 (raw file):

// could be used to do key covering validation.
// Note that descriptors argument would be sorted in process of picking a survivor
func pickPotentialSurvivingReplica(replicas []loqrecoverypb.ReplicaInfo) rankedReplicas {

nit: this isn't really picking a single replica though, it's ranking all of them. So I'd call it something like rankSurvivingReplicas.

pkg/kv/kvserver/loqrecovery/plan.go, line 301 at r8 (raw file):

}

// checkKeyspaceCovering given slice of all survivor ranges, checks that full keyspace is covered.

nit: comments should be wrapped at 80 columns, see style guide.

pkg/kv/kvserver/loqrecovery/plan.go, line 303 at r8 (raw file):

// checkKeyspaceCovering given slice of all survivor ranges, checks that full keyspace is covered.
// Note that slice would be sorted in process of the check.
func checkKeyspaceCovering(replicas []rankedReplicas) error {

This turned out very clean and readable. 👍

pkg/kv/kvserver/loqrecovery/plan.go, line 311 at r8 (raw file):

		RangeID:  roachpb.RangeID(0),
		StartKey: roachpb.RKey{},
		EndKey:   roachpb.RKey{}}}}

nit: it isn't really necessary to specify these values, since they're all zero values.

pkg/kv/kvserver/loqrecovery/plan.go, line 316 at r8 (raw file):

	// fail, we record this as anomaly to indicate there's a gap between ranges or an overlap
	// between two or more ranges.
	for _, desc := range replicas {

nit: this isn't a descriptor, it's a ranked list of descriptors.

pkg/kv/kvserver/loqrecovery/plan.go, line 342 at r8 (raw file):

		case prevDesc.endKey().Less(desc.startKey()):
			anomalies = append(anomalies, anomaly{
				span:       roachpb.Span{Key: roachpb.Key(prevDesc.endKey()), EndKey: roachpb.Key(desc.startKey())},

nit: 100 columns. Please set up some sort of formatting or warning for this in your editor.

pkg/kv/kvserver/loqrecovery/plan.go, line 390 at r8 (raw file):

	// We want to have replicaID which is greater or equal nextReplicaID across
	// all available replicas. We want to ensure that there would be no conflict
	// without bumping it by arbitrary number.

Aren't we bumping it by an arbitrary number below, though?

pkg/kv/kvserver/loqrecovery/utils.go, line 58 at r8 (raw file):

}

type anomaly struct {

nit: when this is located separately from where it's used, it needs a more specific name -- what kind of anomaly?

pkg/kv/kvserver/loqrecovery/testdata/invalid_input, line 2 at r8 (raw file):

# Test verifies that if we have replicas with incorrect descriptors that makes them
# impossible to process, we detect that and don't produce bad results.

nit: the name of this test should be more specific -- there are all sorts of invalid inputs we could be providing.

pkg/kv/kvserver/loqrecovery/testdata/keyspace_coverage, line 1 at r8 (raw file):

# Tests verifying that gaps between range spans or overlaps of range spans block recovery.

Good test coverage here. Consider another couple of cases: empty input set, and two ranges with identical key spans.

pkg/kv/kvserver/loqrecovery/testdata/keyspace_coverage, line 90 at r8 (raw file):

make-plan
----
Key space covering is not complete. Discovered following inconsistencies:

nit: perhaps we should output error before any error output in these tests, for clarity.

pkg/kv/kvserver/loqrecovery/testdata/max_applied_voter_wins, line 30 at r8 (raw file):

  - { NodeID: 5, StoreID: 5, ReplicaID: 5}
  RangeAppliedIndex: 10    # applied index takes precedence over store ID so this replica loses
  RaftCommittedIndex: 13

We should set this to 14, to make sure the applied index takes precedence over the committed index too.

pkg/kv/kvserver/loqrecovery/testdata/max_store_voter_wins, line 160 at r8 (raw file):

make-plan
----
[]

This changes because the post-split ranges actually have quorum, right? So there is no recovery needed?

Are there other related cases we should test, e.g. a merge scenario?

aliher1911

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, and @tbg)

pkg/kv/kvserver/loqrecovery/plan.go, line 390 at r8 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

Aren't we bumping it by an arbitrary number below, though?

Yeah, that's unfortunate comment left after the concept was revised that we can have some stray replicas emerging from "dead" nodes that were brought up by mistake.

pkg/kv/kvserver/loqrecovery/testdata/max_store_voter_wins, line 160 at r8 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

This changes because the post-split ranges actually have quorum, right? So there is no recovery needed?

Are there other related cases we should test, e.g. a merge scenario?

Those cases shouldn't be any different because we don't analyze how keys changed only committed index which is greater in both cases. I think that was a mistake to put it inco max_store_voter_wins instead of max_applied_voter_wind test file. I'll change that.

erikgrinaker

Reviewed 8 of 8 files at r10, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aliher1911 and @tbg)

pkg/kv/kvserver/loqrecovery/recovery_env_test.go, line 140 at r10 (raw file):

		details := errors.GetAllDetails(err)
		if len(details) > 0 {
			return fmt.Sprintf("ERROR: %s", strings.Join(details, "\n"))

nit: I was thinking we could do this for all errors, not just this specific error. But your call.

aliher1911

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aliher1911, @erikgrinaker, and @tbg)

pkg/kv/kvserver/loqrecovery/recovery_env_test.go, line 140 at r10 (raw file):

Previously, erikgrinaker (Erik Grinaker) wrote…

nit: I was thinking we could do this for all errors, not just this specific error. But your call.

Good point! I'll change it for tests but now I'm thinking that maybe all errors that propagate up should have a separate message and detail. Definitely not a blocker but might look more consistent.

Previously when doing unsafe replica recovery, if some ranges are missing or represented by stale replicas that were split or merged, recovery will change cluster to inconsistent state with gaps or overlaps in keyspace. This change adds checks for range completeness as well as adds a preference for replicas with higher range applied index. Release note: None

aliher1911 · 2022-01-12T13:22:55Z

bors r=erikgrinaker

craig · 2022-01-12T14:02:55Z

Build succeeded:

GitHub CI (Cockroach)

aliher1911 self-assigned this Dec 13, 2021

aliher1911 requested review from a team as code owners December 13, 2021 20:09

aliher1911 requested review from erikgrinaker and tbg and removed request for a team December 13, 2021 20:10

aliher1911 mentioned this pull request Dec 14, 2021

kvserver/loqrecovery: record and post replica recovery events #73785

Merged

tbg reviewed Dec 20, 2021

View reviewed changes

tbg self-requested a review December 20, 2021 16:59

tbg reviewed Dec 20, 2021

View reviewed changes

aliher1911 force-pushed the loq_check_range_full branch from 1d0cbe5 to 9287b52 Compare December 21, 2021 14:01

erikgrinaker reviewed Dec 21, 2021

View reviewed changes

aliher1911 force-pushed the loq_check_range_full branch from 9287b52 to a450cec Compare December 22, 2021 14:04

aliher1911 commented Dec 22, 2021

View reviewed changes

erikgrinaker reviewed Dec 22, 2021

View reviewed changes

aliher1911 commented Dec 22, 2021

View reviewed changes

erikgrinaker reviewed Dec 22, 2021

View reviewed changes

aliher1911 commented Dec 22, 2021

View reviewed changes

erikgrinaker reviewed Dec 22, 2021

View reviewed changes

knz reviewed Dec 22, 2021

View reviewed changes

aliher1911 force-pushed the loq_check_range_full branch from a450cec to 69e4fd5 Compare December 22, 2021 18:02

aliher1911 commented Dec 22, 2021

View reviewed changes

aliher1911 force-pushed the loq_check_range_full branch 2 times, most recently from db12ad7 to 89af3ba Compare December 22, 2021 23:12

aliher1911 commented Dec 22, 2021

View reviewed changes

erikgrinaker reviewed Dec 23, 2021

View reviewed changes

aliher1911 force-pushed the loq_check_range_full branch 3 times, most recently from bbd2422 to 5d8aba1 Compare December 23, 2021 20:10

aliher1911 force-pushed the loq_check_range_full branch from 5d8aba1 to ca94cc5 Compare December 29, 2021 15:19

aliher1911 mentioned this pull request Dec 30, 2021

kvserver/loqrecovery: write replica recovery events to rangelog #74333

Merged

aliher1911 force-pushed the loq_check_range_full branch from ca94cc5 to 3c10d6c Compare January 5, 2022 10:45

erikgrinaker reviewed Jan 5, 2022

View reviewed changes

aliher1911 force-pushed the loq_check_range_full branch from 3c10d6c to 3e90828 Compare January 6, 2022 14:34

aliher1911 commented Jan 6, 2022

View reviewed changes

erikgrinaker approved these changes Jan 6, 2022

View reviewed changes

aliher1911 commented Jan 7, 2022

View reviewed changes

aliher1911 force-pushed the loq_check_range_full branch from 3e90828 to c1b8782 Compare January 7, 2022 08:15

craig bot merged commit 4149ca7 into cockroachdb:master Jan 12, 2022

Conversation

aliher1911 commented Dec 13, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cockroach-teamcity commented Dec 13, 2021

Uh oh!

tbg left a comment

Choose a reason for hiding this comment

down

Uh oh!

tbg left a comment

Choose a reason for hiding this comment

Uh oh!

erikgrinaker left a comment

Choose a reason for hiding this comment

Uh oh!

aliher1911 left a comment

Choose a reason for hiding this comment

Uh oh!

erikgrinaker left a comment

Choose a reason for hiding this comment

Uh oh!

aliher1911 left a comment

Choose a reason for hiding this comment

Uh oh!

erikgrinaker left a comment

Choose a reason for hiding this comment

Uh oh!

aliher1911 left a comment

Choose a reason for hiding this comment

Uh oh!

erikgrinaker left a comment

Choose a reason for hiding this comment

Uh oh!

knz left a comment

Choose a reason for hiding this comment

Uh oh!

aliher1911 left a comment

Choose a reason for hiding this comment

Uh oh!

aliher1911 left a comment

Choose a reason for hiding this comment

Uh oh!

erikgrinaker left a comment

Choose a reason for hiding this comment

Uh oh!

erikgrinaker left a comment

Choose a reason for hiding this comment

Uh oh!

aliher1911 left a comment

Choose a reason for hiding this comment

Uh oh!

erikgrinaker left a comment

Choose a reason for hiding this comment

Uh oh!

aliher1911 left a comment

Choose a reason for hiding this comment

Uh oh!

aliher1911 commented Jan 12, 2022

Uh oh!

craig bot commented Jan 12, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

aliher1911 commented Dec 13, 2021 •

edited

Loading