storage: Handle raft.ErrProposalDropped #21849

@bdarnell

etcd-io/etcd#9067 introduced a new error, raft.ErrProposalDropped. Handling this error properly would let us reduce the occurrence of ambiguous failures (Replica.executeWriteBatch doesn't need to return an AmbiguousResultError if the proposal was never successfully made), and may allow us to be more intelligent in our raft-level retries.

This error alone does not allow us to eliminate time-based reproposals because raft's MsgProp forwarding is fire-and-forget. However, if we disabled raft-level forwarding and did our own forwarding, I think we could make the retry logic more deterministic (or at least hide the timing elements in the RPC layer).

Note that in typical usage of raft, you'd respond to this error by passing it up the stack to a layer that can try again on a different replica. We can't do that because of our leases: until the lease expires, no other node could successfully handle the request, so we have to wait and retry on the lease holder. (We might be able to use this to make lease requests themselves fail fast, though.)

Jira issue: CRDB-5872


Labels

A-kv-replication: Relating to Raft, consensus, and coordination.
C-cleanup: Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior.
T-kv: KV Team
