storage: assert around snapshot sending/receiving #42011
tbg merged 1 commit into cockroachdb:release-2.1
f7d1a40 to 31929d3
@tbg: I'm likely missing other spots where asserts would be useful. If you know of any I missed, let me know.
31929d3 to 60ec8ce
tbg left a comment
I can't think of other places to put these. There's some more information that we'll generally want to put into these errors plus we'll want to create RocksDB checkpoints whenever they occur, but other than that this looks good!
pkg/storage/replica_raftstorage.go (outdated)
    if expLen := (s.RaftAppliedIndex - s.TruncatedState.Index); expLen != uint64(len(logEntries)) {
        log.Fatalf(ctx,
            "received inconsistent number of log entries: got %d entries, expected %d entries",
Print the raft applied index, truncated state, HardState and actual range ([a..b]) of indexes read.
For all of the fatals, also create a RocksDB checkpoint (CreateCheckpoint, see cockroach/pkg/storage/replica_proposal.go, line 222 in a769be1).
It seems that maybe you want to extract a helper that you then call in all the places.
CreateCheckpoint is not available in v2.1, it was only added in #36867.
Print the raft applied index, truncated state, HardState and actual range ([a..b]) of indexes read.
Done.
60ec8ce to ab8683d
tbg left a comment
so far, thanks! Curious if we can get CreateCheckpoint onto release-2.1. It's really the biggest punch we can land if this bug comes up again.
Reviewed 2 of 7 files at r1, 5 of 5 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif and @tbg)
pkg/storage/raft_log_queue.go, line 354 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
Done (but we don't modify decision.Input, nor does it seem that it should, which is why I put this up).
I know, just de-risking the change regardless. As a reviewer, I can't tell whether Input is embedded in truncateDecision. If, for example, ChosenVia were really in Input, we'd have introduced some weirdness. I know nothing like that happened, but I like to keep the diffs braindead on release branches.
pkg/storage/raft_log_queue.go, line 397 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
I certainly wouldn't want it to return a truncate decision that has NewFirstIndex > LastIndex
This already does happen (and why my first revision failed teamcity).
input.FirstIndex is set to TruncatedState.Index + 1, so 11 for uninit'ed replicas, whereas LastIndex is 10.
The code in line 384 above seems like it would set NewFirstIndex := 10 in this case. Is that not what happens?
It does, but given 10 < 11 (input.FirstIndex), it's brought back up to 11. And thus we have NewFirstIndex > LastIndex. So this is funky, only for uninit'ed replicas can we have input.FirstIndex > input.LastIndex (input.FirstIndex = input.LastIndex + 1).
Add that in a comment, please.
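The corner case discussed in this thread can be sketched in a few lines of Go. This is only an illustration under stated assumptions: truncateInput and clampNewFirstIndex are hypothetical names, not the actual types or functions in raft_log_queue.go, and the values 10 and 11 are the ones quoted above for uninitialized replicas.

```go
package main

import "fmt"

// truncateInput is a hypothetical stand-in for the truncation inputs
// discussed in the thread; it is not the actual CockroachDB type.
type truncateInput struct {
	FirstIndex uint64 // TruncatedState.Index + 1
	LastIndex  uint64
}

// clampNewFirstIndex raises a candidate truncation index to at least
// in.FirstIndex, mirroring the "brought back up to 11" step from the thread.
func clampNewFirstIndex(in truncateInput, candidate uint64) uint64 {
	if candidate < in.FirstIndex {
		return in.FirstIndex
	}
	return candidate
}

func main() {
	// Uninitialized replica as described above: FirstIndex = LastIndex + 1.
	in := truncateInput{FirstIndex: 11, LastIndex: 10}
	newFirst := clampNewFirstIndex(in, in.LastIndex)
	fmt.Printf("NewFirstIndex=%d LastIndex=%d exceeds=%t\n",
		newFirst, in.LastIndex, newFirst > in.LastIndex)
}
```

Any assertion of the form NewFirstIndex <= LastIndex would therefore need to tolerate this one case, which is presumably why the thread asks for a code comment explaining it.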
pkg/storage/replica_raftstorage.go, line 941 at r1 (raw file):
Previously, irfansharif (irfan sharif) wrote…
CreateCheckpoint is not available in v2.1, it was only added in #36867.
Print the raft applied index, truncated state, HardState and actual range ([a..b]) of indexes read.
Done.
Ah, that's a real bummer because some deployments tend to auto-restart crashing nodes which will wipe the evidence. Does c1d8a2e backport to release-2.1 somewhat cleanly?
pkg/storage/replica_raftstorage.go, line 944 at r2 (raw file):
"(RaftAppliedIndex=%d, TruncatedState.Index=%d, HardState=%s, ReceivedLogEntries=[%d,%d])", len(logEntries), expLen, s.RaftAppliedIndex, s.TruncatedState.Index, hs.String(), logEntries[0].Index, logEntries[len(logEntries)-1].Index)
logEntries will be empty if we see the same bug again, so make sure the assertion doesn't panic in that case.
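A minimal sketch of one way to avoid that panic: format the received index range through a helper that checks for an empty slice before indexing into it. The entry type and the indexRange helper are hypothetical and only stand in for the real raft entry type and whatever the PR ended up doing.

```go
package main

import "fmt"

// entry is a hypothetical stand-in for a raft log entry; only Index matters here.
type entry struct {
	Index uint64
}

// indexRange formats the inclusive range of indexes covered by entries
// without indexing into an empty slice.
func indexRange(entries []entry) string {
	if len(entries) == 0 {
		return "[]"
	}
	return fmt.Sprintf("[%d,%d]", entries[0].Index, entries[len(entries)-1].Index)
}

func main() {
	fmt.Println(indexRange(nil))
	fmt.Println(indexRange([]entry{{Index: 11}, {Index: 12}, {Index: 13}}))
}
```

The fatal message can then take the helper's result as a single %s argument, so the assertion stays safe in exactly the case it is meant to report.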
pkg/storage/store_snapshot.go, line 283 at r2 (raw file):
    // snapshot) and the truncated index should equal the number of log entries
    // shipped over.
    expLen := endIndex - firstIndex
This is the assertion that we expect to fire - the snapshot that was sent had zero entries. Make sure this gets an engine snapshot if you find that CreateCheckpoint does backport well enough. Without the snapshot I think we'll be unlikely to have much evidence left by the time we get to take a look. Like in the other place, also print the actual range of indexes we got here.
ab8683d to aa2b5a9
irfansharif left a comment
TFTR. I'll try backporting CreateCheckpoint in a separate PR (also it looks like it's missing from 19.1). I'll ping back here if successful 🤞
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @tbg)
pkg/storage/raft_log_queue.go, line 354 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
I know, just de-risking the change regardless. As a reviewer, I can't tell whether Input is embedded in truncateDecision. If, for example, ChosenVia were really in Input, we'd have introduced some weirdness. I know nothing like that happened, but I like to keep the diffs braindead on release branches.
Gotcha, that makes sense. I'll keep this heuristic in mind going forward.
pkg/storage/raft_log_queue.go, line 397 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Add that in a comment, please.
Done.
pkg/storage/replica_raftstorage.go, line 941 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Ah, that's a real bummer because some deployments tend to auto-restart crashing nodes which will wipe the evidence. Does c1d8a2e backport to release-2.1 somewhat cleanly?
Trying this in a separate PR.
pkg/storage/replica_raftstorage.go, line 944 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
logEntries will be empty if we see the same bug again, so make sure the assertion doesn't panic in that case.
Whoops, fixed.
pkg/storage/store_snapshot.go, line 283 at r2 (raw file):
Previously, tbg (Tobias Grieger) wrote…
This is the assertion that we expect to fire - the snapshot that was sent had zero entries. Make sure this gets an engine snapshot if you find that CreateCheckpoint does backport well enough. Without the snapshot I think we'll be unlikely to have much evidence left by the time we get to take a look. Like in the other place, also print the actual range of indexes we got here.
Trying this in a separate PR.
tbg left a comment
Reviewed 3 of 3 files at r3.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
aa2b5a9 to 3feb097
Rebased atop #42042, PTA(brief)L.
tbg left a comment
Minor comments, but I'm confused about the go version check.
Reviewed 5 of 5 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif)
pkg/storage/replica_command.go, line 1033 at r4 (raw file):
        sent,
    ); err != nil {
        if _, ok := err.(*MalformedSnapshotError); ok {
This looks brittle, I'd go for errors.Cause to make sure that an intermediate errors.Wrap isn't letting the error bypass this check. Also just test it manually (by always returning this error) and make sure that any test that involves snapshots fails with the proper fatal.
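To illustrate the concern, here is a self-contained sketch of why a direct type assertion misses a wrapped error while unwrapping to the root cause first does not. Everything here is an assumption made for illustration: the fields of MalformedSnapshotError are invented, and the wrapped type and cause function are local stand-ins that imitate the Cause() convention of github.com/pkg/errors so the example needs only the standard library.

```go
package main

import "fmt"

// MalformedSnapshotError is named after the type in the reviewed snippet;
// its field is invented for this illustration.
type MalformedSnapshotError struct{ msg string }

func (e *MalformedSnapshotError) Error() string { return e.msg }

// wrapped mimics what a wrapping helper produces: a new error that adds
// context and exposes the original error via Cause().
type wrapped struct {
	msg   string
	cause error
}

func (w *wrapped) Error() string { return w.msg + ": " + w.cause.Error() }
func (w *wrapped) Cause() error { return w.cause }

// cause walks the Cause() chain to the innermost error. It is a local
// stand-in for errors.Cause, not the library function itself.
func cause(err error) error {
	type causer interface{ Cause() error }
	for err != nil {
		c, ok := err.(causer)
		if !ok {
			break
		}
		err = c.Cause()
	}
	return err
}

// isMalformed reports whether the root cause of err is a *MalformedSnapshotError.
func isMalformed(err error) bool {
	_, ok := cause(err).(*MalformedSnapshotError)
	return ok
}

func main() {
	var err error = &wrapped{
		msg:   "sending snapshot",
		cause: &MalformedSnapshotError{msg: "zero log entries"},
	}
	_, direct := err.(*MalformedSnapshotError)
	fmt.Println("direct assertion matches:", direct)
	fmt.Println("root-cause assertion matches:", isMalformed(err))
}
```

The direct assertion only inspects the outermost error, so a single layer of added context is enough to hide the type; checking the root cause keeps the fatal path reachable regardless of how many layers sit on top.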
3feb097 to eebb3f3
In v2.1 log entries are shipped alongside snapshots. The log entries included in snapshots (which also include the truncated state and the applied index) cover all indexes in the range [truncated-state.index + 1, applied-state]. We simply assert that this is always the case. We also assert during log truncations that the number of deleted entries is no more than what we expect (last index - first index).
Additionally rename PendingPreemptiveSnapshotIndex to PendingSnapshotIndex, as it applies to both raft snapshots and pre-emptive snapshots.
Release note: None
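The invariant stated in the commit message can be sketched as a standalone check. This is an illustration rather than the code in the PR: entry and checkSnapshotEntries are hypothetical names, the per-entry contiguity loop is an addition beyond the count comparison quoted in the review, and the sketch returns an error where the actual change calls log.Fatalf.

```go
package main

import "fmt"

// entry is a hypothetical stand-in for a raft log entry.
type entry struct{ Index uint64 }

// checkSnapshotEntries returns an error unless entries cover exactly the
// indexes truncatedIndex+1 through appliedIndex, in order and without gaps.
func checkSnapshotEntries(truncatedIndex, appliedIndex uint64, entries []entry) error {
	if appliedIndex < truncatedIndex {
		return fmt.Errorf("applied index %d is below truncated index %d",
			appliedIndex, truncatedIndex)
	}
	expLen := appliedIndex - truncatedIndex
	if uint64(len(entries)) != expLen {
		return fmt.Errorf("got %d entries, expected %d (applied=%d, truncated=%d)",
			len(entries), expLen, appliedIndex, truncatedIndex)
	}
	for i, e := range entries {
		if want := truncatedIndex + 1 + uint64(i); e.Index != want {
			return fmt.Errorf("entry %d has index %d, expected %d", i, e.Index, want)
		}
	}
	return nil
}

func main() {
	ok := []entry{{Index: 11}, {Index: 12}, {Index: 13}}
	fmt.Println(checkSnapshotEntries(10, 13, ok))
	// A snapshot that shipped zero entries, the case the review expects to fire.
	fmt.Println(checkSnapshotEntries(10, 13, nil))
}
```

Checking the count first keeps the message close to the one quoted in the review, while the loop additionally catches a snapshot with the right number of entries but the wrong indexes.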
eebb3f3 to 93db76c
irfansharif left a comment
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @tbg)
pkg/storage/replica_command.go, line 1033 at r4 (raw file):
Previously, tbg (Tobias Grieger) wrote…
This looks brittle, I'd go for errors.Cause to make sure that an intermediate errors.Wrap isn't letting the error bypass this check. Also just test it manually (by always returning this error) and make sure that any test that involves snapshots fails with the proper fatal.
Done. Also checked manually that checkpoints are created (tests create in-memory RocksDB instances, which don't seem to create checkpoints).
tbg left a comment
Reviewed 3 of 3 files at r5.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale)
</god merge>