Do not load v2 snapshot on bootstrap#21107
Conversation
d517cd1 to
881a355
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files
... and 20 files with indirect coverage changes @@ Coverage Diff @@
## main #21107 +/- ##
==========================================
+ Coverage 68.39% 68.46% +0.07%
==========================================
Files 429 429
Lines 35244 35242 -2
==========================================
+ Hits 24105 24130 +25
+ Misses 9745 9728 -17
+ Partials 1394 1384 -10 Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
881a355 to
d5f4a0d
Compare
|
@fuweid @serathius PTAL Note We still need to generate v2 snapshot in 3.7, so as to be compatible with 3.6. We don't need to generate v2 snapshot at all in 3.8. |
| }, | ||
| } | ||
| if walSnaps[idx].ConfState != nil { | ||
| snapshot.Metadata.ConfState = *walSnaps[idx].ConfState |
There was a problem hiding this comment.
Shouldn't we read ConfState from v3?
There was a problem hiding this comment.
This is the next step.
I am trying to keep this PR as small as possible so that it's easier to review. The change in this PR should be independent. The goal is that we don't load v2snapshot at all in this PR, so that it's safe to completely remove v2snapshot in 3.8.
There was a problem hiding this comment.
I don't follow how dropping snapshotter logic to carefully select a snapshot and replacing it with just picking the last one based on WAL helps with that.
There was a problem hiding this comment.
I think my comment above #21107 (comment) already clarify the goal of this PR.
also #21107 (comment)
There was a problem hiding this comment.
No, having small is not a goal, it's a good practice. I think the goal should be avoiding incorrect etcd behavior, even if it's temporary.
Main thing confusing for me is title of PR doesn't match the code. We still load v2 snapshot, we just don't use snapshotter. Don't know if there is much benefit of dropping snapshotter, it's just logic to select which snapshot to open, so we don't really loading v2 snapshot, we just downgraded our logic to select it.
There was a problem hiding this comment.
No, having small is not a goal, it's a good practice
As mentioned above multiple times, the end goal is to completely get rid of v2 snapshot, which means we don't read v2 snapshot, nor write it. But in order to be compatible with 3.6, we still need to write v2 snapshot. This PR just stops read v2 snapshot, so that it's safe to completely remove it in 3.8.
I think the goal should be avoiding incorrect etcd behavior,
Do you see any incorrect behaviour?
Main thing confusing for me is title of PR doesn't match the code. We still load v2 snapshot
No, etcd doesn't load v2 snapshot anymore in this PR.
Note that snapshot is a generic concept in raft. raft may bootstrap from a snapshot. etcd also processes snapshots in two ways:
- send snapshot to slow followers
- generate snapshot every 10K entries. This may not be necessary, but it's a good timing to purge old entries.
There was a problem hiding this comment.
There is not undermining the end goal. I'm asking about the goal of this PR.
I missed one thing, this PR replaces snapshotter loading snapshot with just picking last Snapshot entry from WAL. As we no longer need the snapshot file to exist, just picking the last entry is correct. I think it makes sense now, this is a correct next step.
To speed up the review, please consider adding context to the PR. What the change is doing and why this is correct. Just "Do not load v2 snapshot on bootstrap" is not enough, as there is no direct code that you change that touches snapshots.
There was a problem hiding this comment.
What the change is doing and why this is correct. Just "Do not load v2 snapshot on bootstrap" is not enough, as there is no direct code that you change that touches snapshots.
Just updated the description. thx
d5f4a0d to
d68f180
Compare
686f0f9 to
f6241ee
Compare
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ahrtr, serathius The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
In this commit, we don't load v2 snapshot files (*.snap) anymore, instead we just pick the latest snapshot entry from WAL. We don't need the raftpb.Snapshot.Data anymore, so it's correct to just read the latest snapshot entry from WAL. The end goal is to completely get rid of v2 snapshot, which means we don't read v2 snapshot, nor write it. But in order to be compatible with 3.6, we still need to write v2 snapshot. This PR just stops reading v2 snapshot, so that it's safe to completely remove it in 3.8. Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>
f6241ee to
e74bb6a
Compare
|
@ahrtr: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
| "recovered v2 store from snapshot", | ||
| var snapshot *raftpb.Snapshot | ||
| if len(walSnaps) > 0 { | ||
| idx := len(walSnaps) - 1 |
There was a problem hiding this comment.
ETCD commits the snapshot to disk first and then appends the WAL. It should be fine and it won't break test cases.
Link to #20187
In this PR, we don't load v2 snapshot files (*.snap) anymore, instead we just pick the latest snapshot entry from WAL. We don't need the raftpb.Snapshot.Data anymore, so it's correct to just read the latest snapshot entry from WAL.
The end goal is to completely get rid of v2 snapshot, which means we don't read v2 snapshot, nor write it. But in order to be compatible with 3.6, we still need to write v2 snapshot. This PR just stops reading v2 snapshot, so that it's safe to completely remove it in 3.8.
Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.