raft: re-work leader self-ack mechanism #87264

@tbg

Describe the problem

The raft.Ready contract has some deficiencies that have caused trouble upstream, and it is in our interest to help resolve them.

etcd-io/etcd#14370 (comment)

I believe we're not affected, since we are aware of this behavior and handle it:

// 4. etcd/raft may provide a series of CommittedEntries in a Ready struct that
// haven't actually been appended to our own log. This is most common in single
// node replication groups, but it is possible when a follower in a multi-node
// replication group is catching up after falling behind. In the first case,
// the entries are not yet committed so acknowledging them would be a lie. In
// the second case, the entries are committed so we could acknowledge them at
// this point, but doing so seems risky. To avoid complications in either case,
// the method takes a maxIndex parameter that limits the indexes that it will
// acknowledge. Typically, callers will supply the highest index that they have
// durably written to their raft log for this upper bound.
//
func (t *Task) AckCommittedEntriesBeforeApplication(ctx context.Context, maxIndex uint64) error {
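The bounding behavior described in the comment above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual CockroachDB implementation: the `entry` type and the `ackable` helper are hypothetical names introduced here, and the sketch assumes the committed entries arrive sorted by ascending index.

```go
package main

import "fmt"

// entry is a hypothetical stand-in for a raft log entry; only the index
// matters for this sketch.
type entry struct {
	Index uint64
}

// ackable returns the prefix of committed whose indexes are <= maxIndex.
// Callers would typically pass the highest index they have durably written
// to their raft log, so entries that are in CommittedEntries but not yet in
// the local log are never acknowledged early.
// Assumption: committed is sorted by ascending index.
func ackable(committed []entry, maxIndex uint64) []entry {
	n := 0
	for n < len(committed) && committed[n].Index <= maxIndex {
		n++
	}
	return committed[:n]
}

func main() {
	committed := []entry{{Index: 5}, {Index: 6}, {Index: 7}}
	// Suppose only indexes up to 6 have been durably appended so far.
	fmt.Println(len(ackable(committed, 6))) // prints 2
}
```

Entry 7 is left for a later pass, once the corresponding append has been synced.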

To Reproduce

See the thread above.

Expected behavior

Make the single-node case look like any other case: don't emit a Ready whose .CommittedEntries are contingent on handling the .Entries in that same Ready. This reduces single-node throughput, but we could claw the performance back on our end by special-casing single-node groups (which we probably don't need to do, since if performance is the goal we can elide raft more completely).
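To make the proposed contract concrete, here is a toy sketch of withholding committed entries until they are durable. It does not use the etcd/raft API and is not what the upstream change implements; `toyReady`, `logEntry`, and `withholdUnstable` are hypothetical names, and the sketch assumes CommittedEntries is sorted by ascending index.

```go
package main

import "fmt"

// logEntry is a hypothetical stand-in for a raft log entry.
type logEntry struct{ Index uint64 }

// toyReady is a simplified, hypothetical model of a Ready: Entries must be
// appended to stable storage, CommittedEntries may be applied.
type toyReady struct {
	Entries          []logEntry
	CommittedEntries []logEntry
}

// withholdUnstable returns a copy of rd whose CommittedEntries only contain
// indexes <= stableIndex (the last durably appended index), together with
// the committed entries that were held back for a later Ready.
func withholdUnstable(rd toyReady, stableIndex uint64) (toyReady, []logEntry) {
	n := 0
	for n < len(rd.CommittedEntries) && rd.CommittedEntries[n].Index <= stableIndex {
		n++
	}
	out := toyReady{Entries: rd.Entries, CommittedEntries: rd.CommittedEntries[:n]}
	return out, rd.CommittedEntries[n:]
}

func main() {
	// Single-node style Ready: entry 11 is both "to append" and "committed".
	rd := toyReady{
		Entries:          []logEntry{{Index: 11}},
		CommittedEntries: []logEntry{{Index: 10}, {Index: 11}},
	}
	now, later := withholdUnstable(rd, 10)
	fmt.Println(len(now.CommittedEntries), len(later)) // prints "1 1"
}
```

With this shape, the caller never sees a committed entry that it has not already written, at the cost of an extra Ready cycle for entry 11.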

In practice, this means shepherding etcd-io/etcd#14411.

Also revive etcd-io/etcd#10861, which among other things points out the above problem.

When thinking about issues such as these:

#38322
#17500 (comment)

it is important to have all of the invariants properly documented and discoverable.

Jira issue: CRDB-19244

Labels: C-enhancement (Solution expected to add code/behavior + preserve backward-compat; pg compat issues are exception)
