storage: create new TestSnapshotAfterTruncationWithUncommittedTail test #37058
craig[bot] merged 2 commits into cockroachdb:master
Conversation
bdarnell
left a comment
Reviewed 2 of 2 files at r1, 1 of 1 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)
pkg/storage/client_merge_test.go, line 2994 at r2 (raw file):
```
if len(req.Heartbeats)+len(req.HeartbeatResps) > 0 {
	reqCpy := *req
	*req = reqCpy
```
Why make a shallow copy in both directions? This doesn't look like it does anything. Should the second line be req = &reqCpy?
pkg/storage/client_raft_test.go, line 867 at r2 (raw file):
```
ctx := context.Background()
mtc := &multiTestContext{
	// This test was written before the multiTestContext started creating many
```
Not quite true in this case, but OK.
pkg/storage/client_raft_test.go, line 941 at r2 (raw file):
```
for i := 0; i < 32; i++ {
	cCtx, cancel := context.WithTimeout(ctx, 25*time.Millisecond)
	defer cancel()
```
These deferred cancels will pile up until the end of the outer function. Should this be wrapped in a func literal to run them sooner?
pkg/storage/client_raft_test.go, line 961 at r2 (raw file):
```
testutils.SucceedsSoon(t, func() error {
	mtc.advanceClock(ctx)
	err := newLeaderRepl.AdminTransferLease(ctx, newLeaderRepl.StoreID())
```
I don't think we need a transfer here. What's happening is that we're waiting for the old leader's lease to time out and then the first operation on newLeaderRepl will acquire the lease. Here that operation is an immediate transfer to itself, which is redundant - it could just be a Get.
pkg/storage/client_raft_test.go, line 993 at r2 (raw file):
} // The partitioned replica should catch up after a snapshot.
Can we assert that a snapshot was used? (maybe via a metric?)
ajwerner
left a comment
I have nothing to add. Thanks for typing this up!
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/storage/client_raft_test.go, line 936 at r2 (raw file):
cancelled,
gets me every time
nvb
left a comment
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @bdarnell)
pkg/storage/client_merge_test.go, line 2994 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Why make a shallow copy in both directions? This doesn't look like it does anything. Should the second line be
req = &reqCpy?
Done.
pkg/storage/client_raft_test.go, line 936 at r2 (raw file):
Previously, ajwerner wrote…
cancelled, gets me every time
Done.
pkg/storage/client_raft_test.go, line 941 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
These deferred cancels will pile up until the end of the outer function. Should this be wrapped in a func literal to run them sooner?
Done.
pkg/storage/client_raft_test.go, line 961 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
I don't think we need a transfer here. What's happening is that we're waiting for the old leader's lease to time out and then the first operation on newLeaderRepl will acquire the lease. Here that operation is an immediate transfer to itself, which is redundant - it could just be a Get.
I did it like this to be explicit, but I'm fine removing the AdminTransferLease. In that case, I can just put the increment in the SucceedsSoon loop. Done.
pkg/storage/client_raft_test.go, line 993 at r2 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Can we assert that a snapshot was used? (maybe via a metric?)
Done.
tbg
left a comment
Reviewed 3 of 3 files at r3, 1 of 1 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @bdarnell)
This PR adds a regression test for cockroachdb#37056. In doing so, it also confirms the theory that cockroachdb#37055 is the proper fix for that bug.

Before cockroachdb#37055, the test would get stuck with the following logs repeatedly firing:

```
I190424 00:15:52.338808 12 storage/client_test.go:1242  SucceedsSoon: expected [21 21 21], got [12 21 21]
I190424 00:15:52.378060 78 vendor/go.etcd.io/etcd/raft/raft.go:1317  [s1,r1/1:/M{in-ax}] 1 [logterm: 6, index: 31] rejected msgApp [logterm: 8, index: 31] from 2
I190424 00:15:52.378248 184 vendor/go.etcd.io/etcd/raft/raft.go:1065  [s2,r1/2:/M{in-ax}] 2 received msgApp rejection(lastindex: 31) from 1 for index 31
I190424 00:15:52.378275 184 vendor/go.etcd.io/etcd/raft/raft.go:1068  [s2,r1/2:/M{in-ax}] 2 decreased progress of 1 to [next = 31, match = 31, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
```

After cockroachdb#37055, the test passes.

Release note: None
bors r+
37058: storage: create new TestSnapshotAfterTruncationWithUncommittedTail test r=nvanbenschoten a=nvanbenschoten

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Build succeeded
This commit speeds up the test added in cockroachdb#37058 by parallelizing its failed writes.

Release note: None