-
Notifications
You must be signed in to change notification settings - Fork 4.1k
kvserver: improve locking around replica destruction #64459
Description
The Replica.mu.destroyStatus field is used to signal atomic replica destruction. To update this, we often need to take out a bunch of locks -- typically raftMu, readOnlyCmdMu, and mu -- in a specific order:
cockroach/pkg/kv/kvserver/store.go
Lines 471 to 476 in f8e1c90
| // Locking notes: To avoid deadlocks, the following lock order must be | |
| // obeyed: baseQueue.mu < Replica.raftMu < Replica.readOnlyCmdMu < Store.mu | |
| // < Replica.mu < Replica.unreachablesMu < Store.coalescedMu < Store.scheduler.mu. | |
| // (It is not required to acquire every lock in sequence, but when multiple | |
| // locks are held at the same time, it is incorrect to acquire a lock with | |
| // "lesser" value in this sequence after one with "greater" value). |
This is tedious and error-prone. We should revisit this to see if it can be improved, at least in terms of ergonomics and safety. We should also use convenience methods on the replica to interact with destroyStatus, named by the locks that need to be held (e.g. Replica.destroyRaftMuReadOnlyCmdMuMuLocked 😐), and assert that the proper locks are in fact held with Mutex.AssertHeld() or Mutex.AssertRHeld().
See also #64324 for some discussion.
Jira issue: CRDB-7066