Skip to content

storage: create new TestSnapshotAfterTruncationWithUncommittedTail test#37058

Merged
craig[bot] merged 2 commits intocockroachdb:masterfrom
nvb:nvanbenschoten/testStuckProp
Apr 24, 2019
Merged

storage: create new TestSnapshotAfterTruncationWithUncommittedTail test#37058
craig[bot] merged 2 commits intocockroachdb:masterfrom
nvb:nvanbenschoten/testStuckProp

Conversation

@nvb
Copy link
Copy Markdown
Contributor

@nvb nvb commented Apr 24, 2019

This PR adds a regression test for #37056. In doing so, it also confirms
the theory that #37055 is the proper fix for that bug.

Before #37055, the test would get stuck with the following logs repeatedly
firing:

I190424 00:15:52.338808 12 storage/client_test.go:1242  SucceedsSoon: expected [21 21 21], got [12 21 21]
I190424 00:15:52.378060 78 vendor/go.etcd.io/etcd/raft/raft.go:1317  [s1,r1/1:/M{in-ax}] 1 [logterm: 6, index: 31] rejected msgApp [logterm: 8, index: 31] from 2
I190424 00:15:52.378248 184 vendor/go.etcd.io/etcd/raft/raft.go:1065  [s2,r1/2:/M{in-ax}] 2 received msgApp rejection(lastindex: 31) from 1 for index 31
I190424 00:15:52.378275 184 vendor/go.etcd.io/etcd/raft/raft.go:1068  [s2,r1/2:/M{in-ax}] 2 decreased progress of 1 to [next = 31, match = 31, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]

After #37055, the test passes.

@nvb nvb requested review from a team, ajwerner and bdarnell April 24, 2019 01:00
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor

@bdarnell bdarnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 2 files at r1, 1 of 1 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)


pkg/storage/client_merge_test.go, line 2994 at r2 (raw file):

	if len(req.Heartbeats)+len(req.HeartbeatResps) > 0 {
		reqCpy := *req
		*req = reqCpy

Why make a shallow copy in both directions? This doesn't look like it does anything. Should the second line be req = &reqCpy?


pkg/storage/client_raft_test.go, line 867 at r2 (raw file):

	ctx := context.Background()
	mtc := &multiTestContext{
		// This test was written before the multiTestContext started creating many

Not quite true in this case, but OK.


pkg/storage/client_raft_test.go, line 941 at r2 (raw file):

	for i := 0; i < 32; i++ {
		cCtx, cancel := context.WithTimeout(ctx, 25*time.Millisecond)
		defer cancel()

These deferred cancels will pile up until the end of the outer function. Should this be wrapped in a func literal to run them sooner?


pkg/storage/client_raft_test.go, line 961 at r2 (raw file):

	testutils.SucceedsSoon(t, func() error {
		mtc.advanceClock(ctx)
		err := newLeaderRepl.AdminTransferLease(ctx, newLeaderRepl.StoreID())

I don't think we need a transfer here. What's happening is that we're waiting for the old leader's lease to time out and then the first operation on newLeaderRepl will acquire the lease. Here that operation is an immediate transfer to itself, which is redundant - it could just be a Get.


pkg/storage/client_raft_test.go, line 993 at r2 (raw file):

	}

	// The partitioned replica should catch up after a snapshot.

Can we assert that a snapshot was used? (maybe via a metric?)

Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: I have nothing to add. Thanks for typing this up!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/storage/client_raft_test.go, line 936 at r2 (raw file):

 cancelled,

gets me every time

@nvb nvb force-pushed the nvanbenschoten/testStuckProp branch from 00f5876 to 4c93815 Compare April 24, 2019 02:29
Copy link
Copy Markdown
Contributor Author

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @bdarnell)


pkg/storage/client_merge_test.go, line 2994 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Why make a shallow copy in both directions? This doesn't look like it does anything. Should the second line be req = &reqCpy?

Done.


pkg/storage/client_raft_test.go, line 936 at r2 (raw file):

Previously, ajwerner wrote…
 cancelled,

gets me every time

Done.


pkg/storage/client_raft_test.go, line 941 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

These deferred cancels will pile up until the end of the outer function. Should this be wrapped in a func literal to run them sooner?

Done.


pkg/storage/client_raft_test.go, line 961 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I don't think we need a transfer here. What's happening is that we're waiting for the old leader's lease to time out and then the first operation on newLeaderRepl will acquire the lease. Here that operation is an immediate transfer to itself, which is redundant - it could just be a Get.

I did it like this to be explicit, but I'm fine removing the AdminTransferLease. In that case, I can just put the increment in the SucceedsSoon loop. Done.


pkg/storage/client_raft_test.go, line 993 at r2 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Can we assert that a snapshot was used? (maybe via a metric?)

Done.

@nvb nvb force-pushed the nvanbenschoten/testStuckProp branch from 4c93815 to f14c931 Compare April 24, 2019 04:28
Copy link
Copy Markdown
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r3, 1 of 1 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @bdarnell)

This PR adds a regression test for cockroachdb#37056. In doing so, it also confirms
the theory that cockroachdb#37055 is the proper fix for that bug.

Before cockroachdb#37055, the test would get stuck with the following logs repeatedly
firing:
```
I190424 00:15:52.338808 12 storage/client_test.go:1242  SucceedsSoon: expected [21 21 21], got [12 21 21]
I190424 00:15:52.378060 78 vendor/go.etcd.io/etcd/raft/raft.go:1317  [s1,r1/1:/M{in-ax}] 1 [logterm: 6, index: 31] rejected msgApp [logterm: 8, index: 31] from 2
I190424 00:15:52.378248 184 vendor/go.etcd.io/etcd/raft/raft.go:1065  [s2,r1/2:/M{in-ax}] 2 received msgApp rejection(lastindex: 31) from 1 for index 31
I190424 00:15:52.378275 184 vendor/go.etcd.io/etcd/raft/raft.go:1068  [s2,r1/2:/M{in-ax}] 2 decreased progress of 1 to [next = 31, match = 31, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
```

After cockroachdb#37055, the test passes.

Release note: None
@nvb nvb force-pushed the nvanbenschoten/testStuckProp branch from f14c931 to 32e7d43 Compare April 24, 2019 04:55
@nvb
Copy link
Copy Markdown
Contributor Author

nvb commented Apr 24, 2019

bors r+

craig bot pushed a commit that referenced this pull request Apr 24, 2019
37058: storage: create new TestSnapshotAfterTruncationWithUncommittedTail test r=nvanbenschoten a=nvanbenschoten

This PR adds a regression test for #37056. In doing so, it also confirms
the theory that #37055 is the proper fix for that bug.

Before #37055, the test would get stuck with the following logs repeatedly
firing:
```
I190424 00:15:52.338808 12 storage/client_test.go:1242  SucceedsSoon: expected [21 21 21], got [12 21 21]
I190424 00:15:52.378060 78 vendor/go.etcd.io/etcd/raft/raft.go:1317  [s1,r1/1:/M{in-ax}] 1 [logterm: 6, index: 31] rejected msgApp [logterm: 8, index: 31] from 2
I190424 00:15:52.378248 184 vendor/go.etcd.io/etcd/raft/raft.go:1065  [s2,r1/2:/M{in-ax}] 2 received msgApp rejection(lastindex: 31) from 1 for index 31
I190424 00:15:52.378275 184 vendor/go.etcd.io/etcd/raft/raft.go:1068  [s2,r1/2:/M{in-ax}] 2 decreased progress of 1 to [next = 31, match = 31, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
```

After #37055, the test passes.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Apr 24, 2019

Build succeeded

@craig craig bot merged commit 32e7d43 into cockroachdb:master Apr 24, 2019
nvb added a commit to nvb/cockroach that referenced this pull request Apr 24, 2019
This commit speeds up the test added in cockroachdb#37058 by parallelizing its
failed writes.

Release note: None
craig bot pushed a commit that referenced this pull request Apr 24, 2019
37079: storage: speed up TestSnapshotAfterTruncationWithUncommittedTail r=nvanbenschoten a=nvanbenschoten

This commit speeds up the test added in #37058 by parallelizing its
failed writes.

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
nvb added a commit to nvb/cockroach that referenced this pull request Apr 24, 2019
This commit speeds up the test added in cockroachdb#37058 by parallelizing its
failed writes.

Release note: None
@nvb nvb deleted the nvanbenschoten/testStuckProp branch April 24, 2019 20:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants