raft: re-work leader self-ack mechanism #87264
Description
Describe the problem
The raft.Ready contract has some deficiencies that have caused trouble upstream, and it is in our interest to help resolve them.
I believe we're not affected, since we are aware of this behavior and handle it:
cockroach/pkg/kv/kvserver/apply/task.go
Lines 182 to 193 in d064059
// 4. etcd/raft may provided a series of CommittedEntries in a Ready struct that
// haven't actually been appended to our own log. This is most common in single
// node replication groups, but it is possible when a follower in a multi-node
// replication group is catching up after falling behind. In the first case,
// the entries are not yet committed so acknowledging them would be a lie. In
// the second case, the entries are committed so we could acknowledge them at
// this point, but doing so seems risky. To avoid complications in either case,
// the method takes a maxIndex parameter that limits the indexes that it will
// acknowledge. Typically, callers will supply the highest index that they have
// durably written to their raft log for this upper bound.
//
func (t *Task) AckCommittedEntriesBeforeApplication(ctx context.Context, maxIndex uint64) error {
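The bound described in the comment above can be illustrated with a minimal sketch. This is not the actual body of AckCommittedEntriesBeforeApplication; the `entry` type and the `ackable` helper are hypothetical stand-ins, and the sketch assumes committed entries arrive in increasing index order.

```go
package main

import "fmt"

// entry is a hypothetical stand-in for a raft log entry; only the index
// matters for this illustration.
type entry struct {
	Index uint64
}

// ackable returns the prefix of committed whose indexes are <= maxIndex.
// maxIndex is meant to be the highest index durably written to the local
// raft log, so entries beyond it are never acknowledged early.
// Assumes committed is sorted by increasing Index.
func ackable(committed []entry, maxIndex uint64) []entry {
	n := 0
	for n < len(committed) && committed[n].Index <= maxIndex {
		n++
	}
	return committed[:n]
}

func main() {
	// Entries 5..7 are handed out as committed, but only up to index 6
	// has been durably appended locally.
	committed := []entry{{Index: 5}, {Index: 6}, {Index: 7}}
	for _, e := range ackable(committed, 6) {
		fmt.Println("may acknowledge index", e.Index)
	}
}
```

With this bound in place, the entry at index 7 is acknowledged only on a later pass, once the caller's durable index has advanced past it.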
To Reproduce
See the thread above.
Expected behavior
Make the single-node case look like any other case: don't emit a Ready whose .CommittedEntries are contingent on handling its .Entries. This reduces single-node throughput, but we could claw the performance back on our end by special-casing single nodes (which we probably don't need to do, since if performance is the goal we can elide raft more completely).
In practice, this means shepherding etcd-io/etcd#14411.
Also revive etcd-io/etcd#10861, which among other things points out the above problem.
When thinking about issues such as these, it is important to have all of the invariants properly documented and discoverable.
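One way to make such an invariant discoverable is to state it as a check over each Ready. The following is only a sketch of the expected behavior described above, not upstream code: `entry` and `ready` are hypothetical stand-ins for the real raftpb.Entry and raft.Ready types, and `checkNoContingentCommit` is an invented name.

```go
package main

import "fmt"

// Hypothetical stand-ins for raftpb.Entry and raft.Ready; only the fields
// relevant to the invariant are modeled.
type entry struct {
	Index uint64
}

type ready struct {
	Entries          []entry // entries that still need to be appended to the log
	CommittedEntries []entry // entries handed out for application
}

// checkNoContingentCommit returns an error if the Ready hands out a committed
// entry at or beyond the first index that still has to be appended, i.e. if
// CommittedEntries are contingent on handling Entries from the same Ready.
// Assumes both slices are sorted by increasing Index.
func checkNoContingentCommit(rd ready) error {
	if len(rd.Entries) == 0 || len(rd.CommittedEntries) == 0 {
		return nil
	}
	firstUnstable := rd.Entries[0].Index
	lastCommitted := rd.CommittedEntries[len(rd.CommittedEntries)-1].Index
	if lastCommitted >= firstUnstable {
		return fmt.Errorf("committed index %d not yet appended (first unstable index %d)",
			lastCommitted, firstUnstable)
	}
	return nil
}

func main() {
	// The single-node shape discussed in this issue: index 7 is both
	// to-be-appended and already handed out as committed.
	rd := ready{
		Entries:          []entry{{Index: 7}},
		CommittedEntries: []entry{{Index: 7}},
	}
	if err := checkNoContingentCommit(rd); err != nil {
		fmt.Println("invariant violated:", err)
	}
}
```

A check of this shape could run in tests against every Ready, so that a regression in the contract surfaces as a failed assertion rather than as a subtle acknowledgement bug.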
Jira issue: CRDB-19244