storage: drop raftentry.Cache data in applySnapshot#37055
storage: drop raftentry.Cache data in applySnapshot#37055craig[bot] merged 2 commits intocockroachdb:masterfrom
Conversation
|
This PR could benefit from a storage test |
a6fa12e to
e917cc3
Compare
|
|
083666b to
7c7d0a9
Compare
5546e70 to
9a6b4fa
Compare
nvb
left a comment
There was a problem hiding this comment.
mod two nits. Thanks for getting this up so quickly.
Reviewed 2 of 2 files at r1.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @nvanbenschoten)
pkg/storage/replica_raftstorage.go, line 861 at r2 (raw file):
// Clear the cached raft log entries to ensure that old or uncommitted // entries don't impact the in-memory state. r.store.raftEntryCache.Drop(r.RangeID)
You might consider adding a TODO to the method to use it in more places.
pkg/storage/raftentry/cache_test.go, line 456 at r1 (raw file):
var wg sync.WaitGroup // Test using both Clear and Drop to remove the added entries. testutils.RunTrueAndFalse(t, "dropOrClear", func(t *testing.T, useClear bool) {
nit: this isn't a great use of RunTrueAndFalse because it will be printed as:
TestConcurrentUpdates/dropOrClear=false
TestConcurrentUpdates/dropOrClear=true
It's not really clear what either sub-test does. Maybe consider:
for _, t := range struct{
name string
clear func()
}{
{"drop", func() { c.Drop(1) }},
{"clear", func() { c.Clear(1, 22) }},
}{
...
Release note: None
Add logic to clear cached raft log entries when applying a snapshot. Release note: None
9a6b4fa to
f8de052
Compare
ajwerner
left a comment
There was a problem hiding this comment.
Thanks for the quick review and for writing the test!
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/storage/replica_raftstorage.go, line 861 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
You might consider adding a TODO to the method to use it in more places.
Anywhere in particular come to mind? A quick glance at the uses of Clear didn't reveal much to me in terms of replacement outside of some tests. One place we could use it is in replica_destroy. Where would the TODO go? Here or on the method? Either one feels like pollution without something specific in mind. If the thought was just about replica gc, I can just type a PR instead of a TODO.
pkg/storage/raftentry/cache_test.go, line 456 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: this isn't a great use of
RunTrueAndFalsebecause it will be printed as:TestConcurrentUpdates/dropOrClear=false TestConcurrentUpdates/dropOrClear=trueIt's not really clear what either sub-test does. Maybe consider:
for _, t := range struct{ name string clear func() }{ {"drop", func() { c.Drop(1) }}, {"clear", func() { c.Clear(1, 22) }}, }{ ...
Done
|
@nvanbenschoten cockroach/pkg/storage/replica_destroy.go Lines 77 to 81 in 8c4ba4b |
nvb
left a comment
There was a problem hiding this comment.
Is this where you had in mind for the additional Drop call?
I was thinking one level above that, in Replica.destroyRaftMuLocked. After the batch is actually committed.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
nvb
left a comment
There was a problem hiding this comment.
No need to make that change now though. We'll want this to be small enough to backport.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
|
bors r=nvanbenschoten |
37055: storage: drop raftentry.Cache data in applySnapshot r=nvanbenschoten a=ajwerner This PR adds a new `.Drop` method to the `raftentry.Cache` which will clear all data associated with a range more efficiently than calling `.Clear` with a large index. The second commit then uses this call when applying a snapshot to ensure that stale cached raft entries are never used. Fixes #37056. Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Build succeeded |
This PR adds a regression test for cockroachdb#37056. In doing so, it also confirms the theory that cockroachdb#37055 is the proper fix for that bug. Before cockroachdb#37055, the test would get stuck with the following logs repeatedly firing: ``` I190424 00:15:52.338808 12 storage/client_test.go:1242 SucceedsSoon: expected [21 21 21], got [12 21 21] I190424 00:15:52.378060 78 vendor/go.etcd.io/etcd/raft/raft.go:1317 [s1,r1/1:/M{in-ax}] 1 [logterm: 6, index: 31] rejected msgApp [logterm: 8, index: 31] from 2 I190424 00:15:52.378248 184 vendor/go.etcd.io/etcd/raft/raft.go:1065 [s2,r1/2:/M{in-ax}] 2 received msgApp rejection(lastindex: 31) from 1 for index 31 I190424 00:15:52.378275 184 vendor/go.etcd.io/etcd/raft/raft.go:1068 [s2,r1/2:/M{in-ax}] 2 decreased progress of 1 to [next = 31, match = 31, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0] ``` After cockroachdb#37055, the test passes. Release note: None
This PR adds a regression test for cockroachdb#37056. In doing so, it also confirms the theory that cockroachdb#37055 is the proper fix for that bug. Before cockroachdb#37055, the test would get stuck with the following logs repeatedly firing: ``` I190424 00:15:52.338808 12 storage/client_test.go:1242 SucceedsSoon: expected [21 21 21], got [12 21 21] I190424 00:15:52.378060 78 vendor/go.etcd.io/etcd/raft/raft.go:1317 [s1,r1/1:/M{in-ax}] 1 [logterm: 6, index: 31] rejected msgApp [logterm: 8, index: 31] from 2 I190424 00:15:52.378248 184 vendor/go.etcd.io/etcd/raft/raft.go:1065 [s2,r1/2:/M{in-ax}] 2 received msgApp rejection(lastindex: 31) from 1 for index 31 I190424 00:15:52.378275 184 vendor/go.etcd.io/etcd/raft/raft.go:1068 [s2,r1/2:/M{in-ax}] 2 decreased progress of 1 to [next = 31, match = 31, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0] ``` After cockroachdb#37055, the test passes. Release note: None
This PR adds a regression test for cockroachdb#37056. In doing so, it also confirms the theory that cockroachdb#37055 is the proper fix for that bug. Before cockroachdb#37055, the test would get stuck with the following logs repeatedly firing: ``` I190424 00:15:52.338808 12 storage/client_test.go:1242 SucceedsSoon: expected [21 21 21], got [12 21 21] I190424 00:15:52.378060 78 vendor/go.etcd.io/etcd/raft/raft.go:1317 [s1,r1/1:/M{in-ax}] 1 [logterm: 6, index: 31] rejected msgApp [logterm: 8, index: 31] from 2 I190424 00:15:52.378248 184 vendor/go.etcd.io/etcd/raft/raft.go:1065 [s2,r1/2:/M{in-ax}] 2 received msgApp rejection(lastindex: 31) from 1 for index 31 I190424 00:15:52.378275 184 vendor/go.etcd.io/etcd/raft/raft.go:1068 [s2,r1/2:/M{in-ax}] 2 decreased progress of 1 to [next = 31, match = 31, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0] ``` After cockroachdb#37055, the test passes. Release note: None
37058: storage: create new TestSnapshotAfterTruncationWithUncommittedTail test r=nvanbenschoten a=nvanbenschoten This PR adds a regression test for #37056. In doing so, it also confirms the theory that #37055 is the proper fix for that bug. Before #37055, the test would get stuck with the following logs repeatedly firing: ``` I190424 00:15:52.338808 12 storage/client_test.go:1242 SucceedsSoon: expected [21 21 21], got [12 21 21] I190424 00:15:52.378060 78 vendor/go.etcd.io/etcd/raft/raft.go:1317 [s1,r1/1:/M{in-ax}] 1 [logterm: 6, index: 31] rejected msgApp [logterm: 8, index: 31] from 2 I190424 00:15:52.378248 184 vendor/go.etcd.io/etcd/raft/raft.go:1065 [s2,r1/2:/M{in-ax}] 2 received msgApp rejection(lastindex: 31) from 1 for index 31 I190424 00:15:52.378275 184 vendor/go.etcd.io/etcd/raft/raft.go:1068 [s2,r1/2:/M{in-ax}] 2 decreased progress of 1 to [next = 31, match = 31, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0] ``` After #37055, the test passes. Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
This PR adds a regression test for cockroachdb#37056. In doing so, it also confirms the theory that cockroachdb#37055 is the proper fix for that bug. Before cockroachdb#37055, the test would get stuck with the following logs repeatedly firing: ``` I190424 00:15:52.338808 12 storage/client_test.go:1242 SucceedsSoon: expected [21 21 21], got [12 21 21] I190424 00:15:52.378060 78 vendor/go.etcd.io/etcd/raft/raft.go:1317 [s1,r1/1:/M{in-ax}] 1 [logterm: 6, index: 31] rejected msgApp [logterm: 8, index: 31] from 2 I190424 00:15:52.378248 184 vendor/go.etcd.io/etcd/raft/raft.go:1065 [s2,r1/2:/M{in-ax}] 2 received msgApp rejection(lastindex: 31) from 1 for index 31 I190424 00:15:52.378275 184 vendor/go.etcd.io/etcd/raft/raft.go:1068 [s2,r1/2:/M{in-ax}] 2 decreased progress of 1 to [next = 31, match = 31, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0] ``` After cockroachdb#37055, the test passes. Release note: None
This PR adds a new
.Dropmethod to theraftentry.Cachewhich will clear all data associated with a range more efficiently than calling.Clearwith a large index. The second commit then uses this call when applying a snapshot to ensure that stale cached raft entries are never used.Fixes #37056.