storage: avoid reading uncommitted tail of Raft log when becoming leader #18601

@petermattis

The work will all be upstream in etcd/raft. Filing an issue here for tracking purposes.

Forked from a comment on #18199:

#13231 is mainly talking about reducing some inefficiencies in the scan for uncommitted config changes, but the scan would still be there. It's tricky to eliminate the scan completely given the current raft semantics (if we cached some information about the presence of config changes we'd have to update that when the uncommitted tail gets truncated).

Fortunately, with a little refactoring in etcd/raft I think we can avoid the need for a precise count. What we require here is to ensure that we never have more than one config change in flight at a time. If we simply assume pessimistically that the tail of the log has a config change (so that the new leader cannot propose a config change until it has applied all entries up to the point of its election), we can skip the scan.

@petermattis says:

When would you clear raft.pendingConf? Currently that field gets cleared when the conf change is applied. Keeping track of the current last index and watching for when that index is committed seems doable, but a bit tricky. Did you have a simpler idea in mind?

@bdarnell responds:

My idea is to track the last index (instead of a bool pendingConf, it would be configChangeBlockedUntilIndex). I don't think it will be tricky because it doesn't need to persist across leadership changes.
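
A rough sketch of how that could look, to make the proposal concrete. This is not etcd/raft code: the `leaderState` type and its methods are hypothetical stand-ins, and the only name taken from the discussion is `configChangeBlockedUntilIndex`. It also ignores the empty entry a real leader appends on election.

```go
package main

// leaderState is a hypothetical stand-in for the leader-side fields of the
// raft struct; it is not the etcd/raft type.
type leaderState struct {
	// configChangeBlockedUntilIndex replaces the bool pendingConf. A config
	// change may be proposed only once applied has reached this index.
	configChangeBlockedUntilIndex uint64
	lastIndex                     uint64 // index of the last entry in the log
	applied                       uint64 // index of the last applied entry
}

// becomeLeader pessimistically assumes the uncommitted tail holds a config
// change, so no scan of the tail is needed. The value is recomputed on every
// election, so it does not need to persist across leadership changes.
func (l *leaderState) becomeLeader() {
	l.configChangeBlockedUntilIndex = l.lastIndex
}

// proposeConfChange appends a config change if none can be in flight.
// It returns the index of the new entry and whether it was accepted.
func (l *leaderState) proposeConfChange() (uint64, bool) {
	if l.applied < l.configChangeBlockedUntilIndex {
		return 0, false // an earlier config change may still be unapplied
	}
	l.lastIndex++
	// Block further config changes until this one has been applied.
	l.configChangeBlockedUntilIndex = l.lastIndex
	return l.lastIndex, true
}

// applyTo records that entries up to and including index have been applied.
// Nothing has to be cleared explicitly: the block lifts once applied catches up.
func (l *leaderState) applyTo(index uint64) {
	if index > l.applied {
		l.applied = index
	}
}
```

With this shape, the question of when to clear the field goes away: the comparison against the applied index does the job, and truncation of the uncommitted tail on a follower does not matter because the value is only meaningful while leading.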
