etcdserver: separate maybeCompactRaftLog function to compact raft log independently #18635
clement2026 wants to merge 9 commits into etcd-io:main
Conversation
…ly from snapshots Signed-off-by: Clement <gh.2lgqz@aleeas.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: clement2026. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files. Approvers can indicate their approval by writing …

Hi @clement2026. Thanks for your PR. I'm waiting for an etcd-io member to verify that this patch is reasonable to test. Once the patch is verified, the new status will be reflected. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@serathius Please take a look. This PR focuses on one task only. |
Codecov Report

❌ Patch coverage is … Please upload reports for the commit f34bee5 to get more accurate results. ... and 13 files with indirect coverage changes.

@@ Coverage Diff @@
##             main   #18635      +/-   ##
==========================================
+ Coverage   68.79%   68.88%   +0.08%
==========================================
  Files         420      420
  Lines       35522    35545      +23
==========================================
+ Hits        24438    24485      +47
+ Misses       9657     9639      -18
+ Partials     1427     1421       -6

Continue to review the full report in Codecov by Sentry.
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
I can't test …, so I added log statements:

```go
ep := etcdProgress{
	confState: sn.Metadata.ConfState,
	snapi:     sn.Metadata.Index,
	appliedt:  sn.Metadata.Term,
	appliedi:  sn.Metadata.Index,
	compacti:  fi - 1,
}
s.Logger().Info("initial ep.compacti", zap.Uint64("ep.compacti", ep.compacti), zap.Uint64("ep.snapi", ep.snapi))
```

```go
compacti := ep.snapi - s.Cfg.SnapshotCatchUpEntries
if compacti <= ep.compacti {
	return
}
s.Logger().Info("compacti > ep.compacti", zap.Uint64("compacti", compacti), zap.Uint64("ep.compacti", ep.compacti))
```

Start a new etcd instance and make 110 put requests:

```shell
rm -rf default.etcd; bin/etcd --experimental-snapshot-catchup-entries=5 --snapshot-count=10 2>&1 | grep 'ep.compacti'
benchmark put --key-space-size=9999999 --val-size=100 --total=110
```

Logs: …

Restart the etcd instance and make 50 put requests:

```shell
bin/etcd --experimental-snapshot-catchup-entries=5 --snapshot-count=10 2>&1 | grep 'ep.compacti'
benchmark put --key-space-size=9999999 --val-size=100 --total=50
```

Logs: …
```go
	snapi:    sn.Metadata.Index,
	appliedt: sn.Metadata.Term,
	appliedi: sn.Metadata.Index,
	compacti: fi - 1,
```
It would be good to add a comment explaining why we subtract one. The compact index should be the last index at which raftStorage.Compact was called. Whether we should subtract depends on whether the Compact operation is inclusive or exclusive; we need to confirm this.
```diff
-err = s.r.raftStorage.Compact(compacti)
+err := s.r.raftStorage.Compact(compacti)
 ep.compacti = compacti
```
I think we should update the compact index only if there was no error.
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
```go
// TestMemoryStorageCompaction tests that after calling raftStorage.Compact(compacti)
// without errors, the dummy entry becomes {Index: compacti} and
// raftStorage.FirstIndex() returns (compacti+1, nil).
func TestMemoryStorageCompaction(t *testing.T) {
```
Thanks for the test; it's great to protect our assumption with a test.
Signed-off-by: Clement <gh.2lgqz@aleeas.com>
force-pushed from 99db0bb to bc9a3fa
force-pushed from 7a5bfd6 to f34bee5
```go
	}
}

// The first snapshot and compaction shouldn't happen because applied index is less than 11
logOccurredAtMostNTimes(t, mem, 5*time.Second, "saved snapshot", 0)
```
Those timeouts are very high; can we use a smaller timeout? Snapshot and compaction happen synchronously with operations, so the logs should be available immediately.
Alright, I'll reduce the timeout to 1 or 2 seconds and see how that works out.
```go
}

// The first snapshot and compaction shouldn't happen because applied index is less than 11
logOccurredAtMostNTimes(t, mem, 5*time.Second, "saved snapshot", 0)
logOccurredAtMostNTimes(t, mem, time.Second, "compacted Raft logs", 0)
```
Is there a reason we sometimes wait for 1 second and sometimes for 5 seconds?
The first logOccurredAtMostNTimes waits for 5 seconds. Once it's done, we can assume the log is synced up, so the second logOccurredAtMostNTimes doesn't need to wait as long.
Logs should appear in a matter of tens of milliseconds, not multiple seconds. The whole test should take seconds.
Suggested change:

```diff
-// logOccurredAtMostNTimes ensures that the log has exactly `count` occurrences of `s` before timing out, no more, no less.
-func logOccurredAtMostNTimes(t *testing.T, m *integration.Member, timeout time.Duration, s string, count int) {
+func logOccurredExactlyNTimes(t *testing.T, m *integration.Member, timeout time.Duration, s string, count int) {
```
```go
// Retain all log entries up to the latest snapshot index to ensure any member can recover from that snapshot.
// Beyond the snapshot index, preserve the most recent s.Cfg.SnapshotCatchUpEntries entries in memory.
// This allows slow followers to catch up by synchronizing entries instead of requiring a full snapshot transfer.
if ep.snapi <= s.Cfg.SnapshotCatchUpEntries {
```
Suggested change:

```diff
-if ep.snapi <= s.Cfg.SnapshotCatchUpEntries {
+if ep.appliedi <= s.Cfg.SnapshotCatchUpEntries {
```
snapi is more correct here; the reason is described in the comment.
```go
	return
}
```
Suggested change:

```diff
-compacti := ep.snapi - s.Cfg.SnapshotCatchUpEntries
+compacti := ep.appliedi - s.Cfg.SnapshotCatchUpEntries
```
What's the purpose of the change? If you use ep.snapi, the behaviour is exactly the same as the existing behaviour, because ep.snapi only gets updated after creating the (v2) snapshot.
Please see #17098 (comment). Can we get that confirmed first?
This change doesn't alter when compaction runs or how many times it executes. I was aware of the code at https://github.com/etcd-io/raft/blob/5d6eb55c4e6929e461997c9113aba99a5148e921/storage.go#L266-L269; that's why I was proposing compacting only every X applies.
```go
<-apply.notifyc

s.triggerSnapshot(ep)
s.maybeCompactRaftLog(ep)
```
If you really want to get this small change merged first, please ensure it's as independent as possible. Currently s.snapshot performs both the snapshot and the compaction operations. It makes sense to extract the compaction operation into an independent function/method, but let's call the method inside s.triggerSnapshot:

```go
func (s *EtcdServer) triggerSnapshot(ep *etcdProgress) {
	if !s.shouldSnapshot(ep) {
		return
	}
	lg := s.Logger()
	lg.Info(
		"triggering snapshot",
		zap.String("local-member-id", s.MemberID().String()),
		zap.Uint64("local-member-applied-index", ep.appliedi),
		zap.Uint64("local-member-snapshot-index", ep.snapi),
		zap.Uint64("local-member-snapshot-count", s.Cfg.SnapshotCount),
		zap.Bool("snapshot-forced", s.forceSnapshot),
	)
	s.forceSnapshot = false

	s.snapshot(ep.appliedi, ep.confState)
	ep.snapi = ep.appliedi
	s.compact(xxxxx) // call the new method here, so we still do it each time after creating a snapshot
}
```
No rush on merging this PR. If we do merge it, we need to ensure etcd actually benefits from it. Let's resolve #17098 (comment) first.
The goal of the PR is to make compaction independent from snapshots, not just to refactor it into a function.
> Not just to refactor it into a function.

Just refactoring the function (extracting the compaction into a separate method) is an independent and safe change, and accordingly can be merged soon.

> The goal of the PR is to make compaction independent from snapshots.

That modifies the logic/semantics, so it's no longer an independent change.
"Refactoring it into a function" is a subset of "refactoring".
To be clearer about #18635 (comment), I am proposing an independent and safe minor refactoring below as the very first step.
```go
// The first snapshot and compaction should happen because applied index is 11
logOccurredAtMostNTimes(t, mem, 5*time.Second, "saved snapshot", 1)
logOccurredAtMostNTimes(t, mem, time.Second, "compacted Raft logs", 1)
expectMemberLog(t, mem, time.Second, "\"compact-index\": 6", 1)
```
Do we really need to check that "compacted Raft logs" occurred at most N times? This is hard to check; why isn't checking for "compact-index": X enough?
PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions. |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions. |
Closed with #18825 |
Part of #17098
Goal: separate maybeCompactRaftLog function to compact raft log independently from snapshots.
TODO