Skip to content

storage: fix incorrect LookupRange error causing TestSplitSnapshotRace_SplitWins flakiness#8988

Merged
RaduBerinde merged 2 commits intocockroachdb:masterfrom
RaduBerinde:fix-split-test-loop
Aug 31, 2016
Merged

storage: fix incorrect LookupRange error causing TestSplitSnapshotRace_SplitWins flakiness#8988
RaduBerinde merged 2 commits intocockroachdb:masterfrom
RaduBerinde:fix-split-test-loop

Conversation

@RaduBerinde
Copy link
Copy Markdown
Member

@RaduBerinde RaduBerinde commented Aug 31, 2016

The test occasionally times out under stressrace. The culprit is a condition
where the first replica doesn't find the range and incorrectly returns
RangeKeyMismatchError. The request is not retried on other replicas, but is then
retried endlessly by dist_sender.

The fix is to return a RangeNotFound in this case. Rewriting the code in the
function a bit to make it more readable.

Fixes #8868.


This change is Reviewable

repDesc, err = replica.GetReplicaDescriptor()
if err != nil {
if _, ok := err.(*roachpb.RangeNotFoundError); ok {
// Ignore this replica.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More words

@tamird
Copy link
Copy Markdown
Contributor

tamird commented Aug 31, 2016

:lgtm:


Reviewed 1 of 1 files at r1, 2 of 2 files at r2.
Review status: all files reviewed at latest revision, 5 unresolved discussions, all commit checks successful.


storage/client_test.go, line 398 [r2] (raw file):

  nodeIndex := int(rep.NodeID) - 1
  if log.V(2) {
      log.Infof(context.TODO(), "SendNext nodeIndex=%d", nodeIndex)

this is test code - feel free to remove the V(2)


storage/stores.go, line 234 [r2] (raw file):

      repDescFound = true
      rangeID = replica.RangeID

can you move this up to where repDesc is assigned? slightly more expected there.

rangeID = replica.RangeID

var err error
repDesc, err = replica.GetReplicaDescriptor()
...

storage/stores.go, line 237 [r2] (raw file):

  }
  if !repDescFound {
      return 0, roachpb.ReplicaDescriptor{}, roachpb.NewRangeNotFoundError(0)

This looks good to me. Can you write a focused test for it?


storage/stores_test.go, line 205 [r1] (raw file):

      if tc.expError != "" {
          if !testutils.IsError(err, tc.expError) {
              t.Errorf("got unexpected error %v", err)

you need to log the expected error here - it is no longer evident from the code. also, every one of these calls to Errorf should begin with the loop index variable "%d: "


Comments from Reviewable

@RaduBerinde
Copy link
Copy Markdown
Member Author

Review status: 0 of 3 files reviewed at latest revision, 5 unresolved discussions, some commit checks pending.


storage/stores.go, line 221 [r2] (raw file):

Previously, tschottdorf (Tobias Schottdorf) wrote…

More words

Done.

storage/stores.go, line 234 [r2] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

can you move this up to where repDesc is assigned? slightly more expected there.

rangeID = replica.RangeID

var err error
repDesc, err = replica.GetReplicaDescriptor()
...
Done.

storage/stores.go, line 237 [r2] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

This looks good to me. Can you write a focused test for it?

Done.

storage/stores_test.go, line 205 [r1] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

you need to log the expected error here - it is no longer evident from the code. also, every one of these calls to Errorf should begin with the loop index variable "%d: "

Done.

Comments from Reviewable

@tamird
Copy link
Copy Markdown
Contributor

tamird commented Aug 31, 2016

Reviewed 3 of 3 files at r3, 3 of 3 files at r4.
Review status: all files reviewed at latest revision, 2 unresolved discussions, some commit checks pending.


storage/stores.go, line 223 [r4] (raw file):

      if err != nil {
          if _, ok := err.(*roachpb.RangeNotFoundError); ok {
              // We are not holding a lock across this block so the range could have

"range could have gone away" isn't very descriptive. specifically, this would only happen if this replica was removed from the range via down-replication.


Comments from Reviewable

Rewrite part of TestStoresLookupReplica to use a table.
@tbg
Copy link
Copy Markdown
Member

tbg commented Aug 31, 2016

Reviewed 3 of 3 files at r5, 3 of 3 files at r6.
Review status: all files reviewed at latest revision, 2 unresolved discussions.


Comments from Reviewable

…e_SplitWins flakiness

The test occasionally times out under `stressrace`. The culprit is a condition
where the first replica doesn't find the range and incorrectly returns
RangeKeyMismatchError. The request is not retried on other replicas, but is then
retried endlessly by dist_sender.

The fix is to return a `RangeNotFound` in this case. Rewriting the code in the
function a bit to make it more readable.

Fixes cockroachdb#8868.
@RaduBerinde
Copy link
Copy Markdown
Member Author

Review status: 2 of 3 files reviewed at latest revision, 2 unresolved discussions, some commit checks pending.


storage/stores.go, line 223 [r4] (raw file):

Previously, tamird (Tamir Duberstein) wrote…

"range could have gone away" isn't very descriptive. specifically, this would only happen if this replica was removed from the range via down-replication.

Done.

Comments from Reviewable

@tamird
Copy link
Copy Markdown
Contributor

tamird commented Aug 31, 2016

Reviewed 2 of 3 files at r6, 1 of 1 files at r7.
Review status: all files reviewed at latest revision, 1 unresolved discussion, some commit checks pending.


Comments from Reviewable

@RaduBerinde RaduBerinde merged commit 1a34ca3 into cockroachdb:master Aug 31, 2016
@RaduBerinde RaduBerinde deleted the fix-split-test-loop branch August 31, 2016 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Failed tests (21860): TestSplitSnapshotRace_SplitWins

3 participants