raft: improve commentary on Ready by tbg · Pull Request #10861 · etcd-io/etcd

tbg · 2019-06-27T12:21:59Z

The correct way to handle a Ready was elusive and comes with a surprising
amount of pitfalls. There are also certain optimizations that need
justification and were previously not documented.

Fix both.

The correct way to handle a Ready was elusive and comes with a surprising amount of pitfalls. There are also certain optimizations that need justification and were previously not documented. Fix both.

softState() allocates for no good reason, but plenty of callers use it in the hot path. This is probably not major or we would've knocked it out already.

codecov-io · 2019-06-27T13:03:40Z

Codecov Report

Merging #10861 into master will decrease coverage by 0.09%.
The diff coverage is n/a.

@@            Coverage Diff            @@
##           master   #10861     +/-   ##
=========================================
- Coverage   62.82%   62.73%   -0.1%     
=========================================
  Files         398      398             
  Lines       37446    37446             
=========================================
- Hits        23526    23492     -34     
- Misses      12339    12375     +36     
+ Partials     1581     1579      -2

Impacted Files	Coverage Δ
raft/node.go	`92.74% <ø> (ø)`	⬆️
raft/log.go	`79.56% <ø> (ø)`	⬆️
raft/raft.go	`91.98% <ø> (+0.69%)`	⬆️
clientv3/balancer/grpc1.7-health.go	`17.44% <0%> (-43.03%)`	⬇️
auth/options.go	`57.5% <0%> (-35%)`	⬇️
proxy/grpcproxy/register.go	`69.44% <0%> (-11.12%)`	⬇️
clientv3/naming/grpc.go	`68.42% <0%> (-7.02%)`	⬇️
clientv3/namespace/watch.go	`87.87% <0%> (-6.07%)`	⬇️
auth/range_perm_cache.go	`56.96% <0%> (-3.8%)`	⬇️
etcdserver/api/rafthttp/msgappv2_codec.go	`69.56% <0%> (-1.74%)`	⬇️
... and 24 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 948e276...64ad156. Read the comment docs.

xiang90 · 2019-06-27T17:01:54Z

raft/node.go

+// optimizations for simplicity. These applications will use the following
+// sequence of steps:
+//
+// 1 apply snapshot (if any) and sync


add dot after the number like line 60 to 62?

xiang90 · 2019-06-27T17:04:23Z

raft/node.go

 // be saved to stable storage, committed or sent to other peers.
 // All fields in Ready are read-only.
+//
+// Handling the Ready can be done with or without performance optimizations.


i feel we should move this explanation to https://github.com/etcd-io/etcd/blob/master/raft/README.md and add a link here.

nvb · 2019-06-27T18:08:53Z

raft/node.go

+// appended and synced before the HardState is updated. This avoids the
+// situation in which the HardState references a Commit index which is not
+// durably appended yet. This does not penalize performance because updates to
+// the HardState must only be synced when MustSync is set, and those occur only


This isn't actually true. Unfortunately, we set MustSync to true if there are entries to append but no change to HardState.Vote or HardState.Term. We might want to expose a new MustSyncHardState or something.

Oh, right. I forgot that what I wanted to do was to change this to MustSyncHardState and add an accessor func (rd *Ready) MustSync() { return len(rd.Entries) > 0 || rd.MustSyncHardState }.

nvb · 2019-06-27T18:10:42Z

raft/node.go

+// before persisting its Entries, it won't be the leader for its old term any
+// more and thus will not handle the MsgAppResp. Additionally, the fact that a
+// commit index that is higher than that ultimately persisted on the crashed
+// leader is now known to a follower (via the MsgApp) is benign. And, for one


"the fact ... is benign"

Is that what you meant to say?

Heh, not quite. Will fix.

Going to ping when it's ready, won't be today

tbg · 2019-07-18T21:56:58Z

Reminder to self: work this new wall of text into the growing body of this documentation project 😄

// A Snapshot is sent by the leader to catch up a follower that has fallen behind.
// The snapshot corresponds to the on-disk replicated state of the distributed
// state machine as of the log index described by the metadata's Index and Term.
// When raft asks a follower to apply a Snapshot (as part of Ready handling), it
// is informing the application that the contents of the Raft log are made obsolete
// by the Snapshot. 
//
// If all data is on the same storage engine and can be manipulated atomically,
// the procedure is easy: apply the snapshot and the remainder of the Ready (it
// will typically at least include an updated HardState to reflect a new Commit
// index) in one atomic operation.
//
// If atomicity between the two is not available, a more complicated approach
// has to be taken. Assume that the application state (i.e the Data field) can
// be written atomically along with the index it belongs to (Metadata.Index).
// Assume further that the Raft log and HardState are on different storage
// engines, respectively, and no atomicity is available between the three.
//
// The Raft log poses the smallest problem. We must not delete from it before
// the snapshot data is durably applied (otherwise we break all of the promises
// we made by accepting entries into it). Assuming that, it won't matter whether
// the log gets removed before or after the HardState is updated. Care must be
// taken when initializing the system after a crash to understand that the Raft
// log needs to be compacted synchronously if the applied index is ahead of the
// last index. To avoid watering down detection of problems in general, this
// should be considered only if the data partition holds the immediate result
// of a snapshot application following the procedure detailed later in this
// comment.
//
// The bigger
// constraint exists between the HardState and snapshot data because they
// interact via HardState.Commit: if that is updated but then the system
// crash-restarts before the snapshot data is persisted, we'll see a commit
// index ahead of the last index, which is an undefined state and will panic. On
// the other hand, if the snapshot data is persisted but not the commit index,
// we will see the applied index ahead of the commit index, which is similarly
// problematic. To avoid that, the app can adapt Storage.InitialState so that it
// takes the commit index not only from the persisted HardState, but also takes
// into account any larger applied index in its data engine. Naively this waters
// down Raft's internal detection against data loss (if the storage engine
// unexpectedly lost HardState updates, this mechanism could mask it) but the
// app can additionally make this behavior conditional: when snapshot data is
// written to the data engine, it (atomically) includes a marker. At startup,
// only when the marker is present is the applied index taken into account and
// the HardState transparently updated, and then the flag removed again.
//
// It remains to detail how the Raft log is removed. As discussed, this is the
// last step and requires essentially carrying out a log compaction
// (occasionally called truncation) that removes the entirety of the log
// entries.
message Snapshot {
	optional bytes            data     = 1;
	optional SnapshotMetadata metadata = 2 [(gogoproto.nullable) = false];
}

stale · 2020-04-06T19:55:37Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

ptabor · 2022-09-02T10:16:35Z

(reopening)

tbg added 2 commits June 27, 2019 14:19

raft: improve commentary on Ready

f3d7228

The correct way to handle a Ready was elusive and comes with a surprising amount of pitfalls. There are also certain optimizations that need justification and were previously not documented. Fix both.

raft: add a small perf TODO

64ad156

softState() allocates for no good reason, but plenty of callers use it in the hot path. This is probably not major or we would've knocked it out already.

tbg force-pushed the ready-docs branch from ae59219 to 64ad156 Compare June 27, 2019 12:24

xiang90 reviewed Jun 27, 2019

View reviewed changes

nvb reviewed Jun 27, 2019

View reviewed changes

tbg mentioned this pull request Aug 9, 2019

raft/rafttest: introduce datadriven testing #11005

Merged

stale bot added the stale label Apr 6, 2020

stale bot closed this Apr 27, 2020

This was referenced Sep 1, 2022

Durability API guarantee broken in single node cluster #14370

Closed

raft: re-work leader self-ack mechanism cockroachdb/cockroach#87264

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

raft: improve commentary on Ready#10861

raft: improve commentary on Ready#10861
tbg wants to merge 2 commits intoetcd-io:masterfrom
tbg:ready-docs

tbg commented Jun 27, 2019

Uh oh!

codecov-io commented Jun 27, 2019

Uh oh!

xiang90 Jun 27, 2019

Uh oh!

xiang90 Jun 27, 2019

Uh oh!

nvb Jun 27, 2019

Uh oh!

tbg Jun 27, 2019

Uh oh!

nvb Jun 27, 2019

Uh oh!

tbg Jun 27, 2019

Uh oh!

tbg commented Jul 18, 2019

Uh oh!

stale bot commented Apr 6, 2020

Uh oh!

ptabor commented Sep 2, 2022

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

5 participants

Conversation

tbg commented Jun 27, 2019

Uh oh!

codecov-io commented Jun 27, 2019

Codecov Report

Uh oh!

xiang90 Jun 27, 2019

Choose a reason for hiding this comment

Uh oh!

xiang90 Jun 27, 2019

Choose a reason for hiding this comment

Uh oh!

nvb Jun 27, 2019

Choose a reason for hiding this comment

Uh oh!

tbg Jun 27, 2019

Choose a reason for hiding this comment

Uh oh!

nvb Jun 27, 2019

Choose a reason for hiding this comment

Uh oh!

tbg Jun 27, 2019

Choose a reason for hiding this comment

Uh oh!

tbg commented Jul 18, 2019

Uh oh!

stale bot commented Apr 6, 2020

Uh oh!

ptabor commented Sep 2, 2022

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

5 participants