Skip to content

kvserver: improve locking around replica destruction #64459

@erikgrinaker

Description

@erikgrinaker

The Replica.mu.destroyStatus field is used to signal atomic replica destruction. To update this, we often need to take out a bunch of locks -- typically raftMu, readOnlyCmdMu, and mu -- in a specific order:

// Locking notes: To avoid deadlocks, the following lock order must be
// obeyed: baseQueue.mu < Replica.raftMu < Replica.readOnlyCmdMu < Store.mu
// < Replica.mu < Replica.unreachablesMu < Store.coalescedMu < Store.scheduler.mu.
// (It is not required to acquire every lock in sequence, but when multiple
// locks are held at the same time, it is incorrect to acquire a lock with
// "lesser" value in this sequence after one with "greater" value).

This is tedious and error-prone. We should revisit this to see if it can be improved, at least in terms of ergonomics and safety. We should also use convenience methods on the replica to interact with destroyStatus, named by the locks that need to be held (e.g. Replica.destroyRaftMuReadOnlyCmdMuMuLocked 😐), and assert that the proper locks are in fact held with Mutex.AssertHeld() or Mutex.AssertRHeld().

See also #64324 for some discussion.

Jira issue: CRDB-7066

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-kv-replicationRelating to Raft, consensus, and coordination.C-cleanupTech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions