Skip to content

Do not load v2 snapshot on bootstrap#21107

Merged
ahrtr merged 1 commit intoetcd-io:mainfrom
ahrtr:20260112_bootstrap_v3_not_read_v2snap
Jan 13, 2026
Merged

Do not load v2 snapshot on bootstrap#21107
ahrtr merged 1 commit intoetcd-io:mainfrom
ahrtr:20260112_bootstrap_v3_not_read_v2snap

Conversation

@ahrtr
Copy link
Copy Markdown
Member

@ahrtr ahrtr commented Jan 12, 2026

Link to #20187

In this PR, we don't load v2 snapshot files (*.snap) anymore, instead we just pick the latest snapshot entry from WAL. We don't need the raftpb.Snapshot.Data anymore, so it's correct to just read the latest snapshot entry from WAL.

The end goal is to completely get rid of v2 snapshot, which means we don't read v2 snapshot, nor write it. But in order to be compatible with 3.6, we still need to write v2 snapshot. This PR just stops reading v2 snapshot, so that it's safe to completely remove it in 3.8.

Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.

@codecov
Copy link
Copy Markdown

codecov bot commented Jan 12, 2026

Codecov Report

❌ Patch coverage is 90.90909% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.46%. Comparing base (f5e6640) to head (e74bb6a).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
etcdutl/etcdutl/common.go 60.00% 2 Missing ⚠️
Additional details and impacted files
Files with missing lines Coverage Δ
server/etcdserver/bootstrap.go 65.81% <100.00%> (+0.56%) ⬆️
etcdutl/etcdutl/common.go 35.29% <60.00%> (-10.86%) ⬇️

... and 20 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #21107      +/-   ##
==========================================
+ Coverage   68.39%   68.46%   +0.07%     
==========================================
  Files         429      429              
  Lines       35244    35242       -2     
==========================================
+ Hits        24105    24130      +25     
+ Misses       9745     9728      -17     
+ Partials     1394     1384      -10     

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f5e6640...e74bb6a. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ahrtr ahrtr force-pushed the 20260112_bootstrap_v3_not_read_v2snap branch from 881a355 to d5f4a0d Compare January 12, 2026 15:50
@ahrtr ahrtr changed the title [WIP] Do not load v2 snapshot on bootstrap Do not load v2 snapshot on bootstrap Jan 12, 2026
@ahrtr
Copy link
Copy Markdown
Member Author

ahrtr commented Jan 12, 2026

@fuweid @serathius PTAL

Note We still need to generate v2 snapshot in 3.7, so as to be compatible with 3.6. We don't need to generate v2 snapshot at all in 3.8.

},
}
if walSnaps[idx].ConfState != nil {
snapshot.Metadata.ConfState = *walSnaps[idx].ConfState
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we read ConfState from v3?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the next step.

I am trying to keep this PR as small as possible so that it's easier to review. The change in this PR should be independent. The goal is that we don't load v2snapshot at all in this PR, so that it's safe to completely remove v2snapshot in 3.8.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow how dropping snapshotter logic to carefully select a snapshot and replacing it with just picking the last one based on WAL helps with that.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think my comment above #21107 (comment) already clarify the goal of this PR.

also #21107 (comment)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, having small is not a goal, it's a good practice. I think the goal should be avoiding incorrect etcd behavior, even if it's temporary.

Main thing confusing for me is title of PR doesn't match the code. We still load v2 snapshot, we just don't use snapshotter. Don't know if there is much benefit of dropping snapshotter, it's just logic to select which snapshot to open, so we don't really loading v2 snapshot, we just downgraded our logic to select it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, having small is not a goal, it's a good practice

As mentioned above multiple times, the end goal is to completely get rid of v2 snapshot, which means we don't read v2 snapshot, nor write it. But in order to be compatible with 3.6, we still need to write v2 snapshot. This PR just stops read v2 snapshot, so that it's safe to completely remove it in 3.8.

I think the goal should be avoiding incorrect etcd behavior,

Do you see any incorrect behaviour?

Main thing confusing for me is title of PR doesn't match the code. We still load v2 snapshot

No, etcd doesn't load v2 snapshot anymore in this PR.

Note that snapshot is a generic concept in raft. raft may bootstrap from a snapshot. etcd also processes snapshots in two ways:

  • send snapshot to slow followers
  • generate snapshot every 10K entries. This may not be necessary, but it's a good timing to purge old entries.

Copy link
Copy Markdown
Member

@serathius serathius Jan 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is not undermining the end goal. I'm asking about the goal of this PR.

I missed one thing, this PR replaces snapshotter loading snapshot with just picking last Snapshot entry from WAL. As we no longer need the snapshot file to exist, just picking the last entry is correct. I think it makes sense now, this is a correct next step.

To speed up the review, please consider adding context to the PR. What the change is doing and why this is correct. Just "Do not load v2 snapshot on bootstrap" is not enough, as there is no direct code that you change that touches snapshots.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What the change is doing and why this is correct. Just "Do not load v2 snapshot on bootstrap" is not enough, as there is no direct code that you change that touches snapshots.

Just updated the description. thx

@ahrtr ahrtr force-pushed the 20260112_bootstrap_v3_not_read_v2snap branch from d5f4a0d to d68f180 Compare January 12, 2026 17:18
@ahrtr ahrtr force-pushed the 20260112_bootstrap_v3_not_read_v2snap branch 2 times, most recently from 686f0f9 to f6241ee Compare January 12, 2026 17:32
Comment thread server/etcdserver/api/snap/snapshotter_test.go
@k8s-ci-robot
Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahrtr, serathius

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

In this commit, we don't load v2 snapshot files (*.snap) anymore,
instead we just pick the latest snapshot entry from WAL. We don't
need the raftpb.Snapshot.Data anymore, so it's correct to just
read the latest snapshot entry from WAL.

The end goal is to completely get rid of v2 snapshot, which means
we don't read v2 snapshot, nor write it. But in order to be compatible
with 3.6, we still need to write v2 snapshot. This PR just stops
reading v2 snapshot, so that it's safe to completely remove it in 3.8.

Signed-off-by: Benjamin Wang <benjamin.ahrtr@gmail.com>
@ahrtr ahrtr force-pushed the 20260112_bootstrap_v3_not_read_v2snap branch from f6241ee to e74bb6a Compare January 13, 2026 10:42
@k8s-ci-robot
Copy link
Copy Markdown

@ahrtr: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-etcd-coverage-report e74bb6a link true /test pull-etcd-coverage-report

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@ahrtr ahrtr merged commit c8a05d6 into etcd-io:main Jan 13, 2026
31 of 32 checks passed
@ahrtr ahrtr deleted the 20260112_bootstrap_v3_not_read_v2snap branch January 13, 2026 11:33
Copy link
Copy Markdown
Member

@fuweid fuweid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

"recovered v2 store from snapshot",
var snapshot *raftpb.Snapshot
if len(walSnaps) > 0 {
idx := len(walSnaps) - 1
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ETCD commits the snapshot to disk first and then appends the WAL. It should be fine and it won't break test cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants