Skip to content

kvserver: add ReplicaID to the persistent raft state #75740

@sumeerbhola

Description

@sumeerbhola

We currently do not store the ReplicaID when instantiating an uninitialized replica. We do know the ReplicaID when creating the corresponding uninitialized Replica object, since we pass it to newUnloadedReplica. Without the ReplicaID, if the node were to crash before the replica were initialized, the restarting node will have HardState without any knowledge of what ReplicaID it corresponds to. This creates difficulties with the ReplicaStorage design and can create a leak (see details below).

(copy-paste of conversation)
[sumeer] while implementing ReplicasStorage.Init I realized something I had missed earlier: for an UninitializedStateMachine replica, there is nothing in the engine that contains its ReplicaID.

  • We happen to luck out that we don’t need the ReplicaID for such a replica on the SplitReplica path (when this RangeID is the RHS of a split), since we never compare the ReplicaID of the RHS in the split with that of the uninitialized replica (we only compare the former with the RangeTombstone).
  • But this situation is peculiar in that after Init , ReplicasStorage has incomplete knowledge and would need a higher layer to provide this ReplicaID. How would a higher layer know the ReplicaID if there is no persistent record of it? Presumably, it would need to talk to other nodes before it figures this out, which means there could be an arbitrary time interval for which ReplicasStorage has 2 kinds of uninitialized replicas — those with a ReplicaID and those without.
  • Is there also a leak here? If the raft group has already removed this replica from this store, there isn’t going to be any raft activity. Someone needs to figure that out before we can clean up this state (i.e., we can’t unilaterally remove the HardState since the Term and Vote values may be relevant).

[tbg] Yeah, there’s always been something odd about the replicaID not getting stored. There’s definitely a leak, if you have an uninitialized replica and the node restarts, nothing will ever instantiate this replica in-memory again (unless it receives another message, which is doubtful at that point) but also if we did, we wouldn’t know the replicaID. So even if we tried, we couldn’t destroy this state, as we can’t ever be sure that it’s not still needed. This is definitely leaking today.
Seems kind of clear that we want the replicaID to be stored. Seems like it would have to be on the raft engine, parallel to the HardState (i.e. store ReplicaID whenever it changes, which should imply doing it on the first write to the HardState and never again since replicas can’t exist across replicaIDs).

cc: @tbg, @erikgrinaker

Epic: CRDB-220
Jira issue: CRDB-12811

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-kv-replicationRelating to Raft, consensus, and coordination.C-enhancementSolution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions