kvserver: leaked replica mu #106568

@tbg

Describe the problem

In two instances, we saw a leaked replica mutex:

In both cases, the goroutine that acquired the mutex appears to have exited without releasing it. We made an extensive attempt to reproduce the second instance (~8k runs over a week), but the issue did not recur.

#106254 has a "deadlock detector" that also prints the stack trace of the mutex acquisition. However, it is likely too expensive to be always-on, especially for a hot mutex like Replica.mu.
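To illustrate the general idea (and its cost), here is a minimal sketch of a mutex that records the stack of its most recent acquisition. This is not the detector from #106254; `tracedMutex`, `HolderStack`, and `leak` are hypothetical names made up for this example. Note that `debug.Stack()` runs on every `Lock`, which is the kind of overhead that would likely be prohibitive on a hot mutex.

```go
package main

import (
	"fmt"
	"runtime/debug"
	"sync"
)

// tracedMutex is a hypothetical wrapper (not the #106254 implementation)
// that remembers where the mutex was last acquired.
type tracedMutex struct {
	mu      sync.Mutex
	stackMu sync.Mutex // guards holder so it can be read while mu is held elsewhere
	holder  []byte     // stack of the current holder, nil when unlocked
}

// Lock acquires the mutex and captures the caller's stack.
// Capturing a stack on every acquisition is expensive.
func (m *tracedMutex) Lock() {
	m.mu.Lock()
	stack := debug.Stack()
	m.stackMu.Lock()
	m.holder = stack
	m.stackMu.Unlock()
}

// Unlock clears the recorded stack and releases the mutex.
func (m *tracedMutex) Unlock() {
	m.stackMu.Lock()
	m.holder = nil
	m.stackMu.Unlock()
	m.mu.Unlock()
}

// HolderStack returns the stack recorded at the last acquisition,
// or nil if the mutex is not currently held.
func (m *tracedMutex) HolderStack() []byte {
	m.stackMu.Lock()
	defer m.stackMu.Unlock()
	return m.holder
}

// leak acquires the mutex and returns without releasing it.
func leak(m *tracedMutex) {
	m.Lock()
}

func main() {
	var m tracedMutex
	done := make(chan struct{})
	go func() {
		defer close(done)
		leak(&m)
	}()
	<-done // the acquiring goroutine has exited, but the mutex is still held

	fmt.Printf("mutex still held; acquired at:\n%s", m.HolderStack())
}
```

Even after the acquiring goroutine is gone, the recorded stack still points at the code path that took the lock, which is the information missing from the two reports above.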

Another angle could be #105366, i.e. proving through static analysis that all acquisitions are defer-unlocked (and thus in a deadlock scenario, the lock holder would still be around).
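As a sketch of why the defer-unlock pattern matters, the example below contrasts a manual unlock that is skipped on an early return with a deferred unlock that runs on every exit path. The `replica` type and both methods are hypothetical stand-ins for illustration, not the actual kvserver code.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// replica is a hypothetical stand-in for a mutex-guarded structure;
// it is not the real kvserver Replica type.
type replica struct {
	mu    sync.Mutex
	count int
}

// incrementLeaky unlocks manually. The early return skips the Unlock,
// so the mutex stays held after the function (and its goroutine) is gone.
func (r *replica) incrementLeaky(fail bool) error {
	r.mu.Lock()
	if fail {
		return errors.New("early return while holding mu") // BUG: mu is leaked
	}
	r.count++
	r.mu.Unlock()
	return nil
}

// incrementSafe pairs the Lock with a deferred Unlock, so every exit path
// (normal return, early return, or panic) releases the mutex.
func (r *replica) incrementSafe(fail bool) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if fail {
		return errors.New("early return, mu released by defer")
	}
	r.count++
	return nil
}

func main() {
	safe := &replica{}
	_ = safe.incrementSafe(true)
	// TryLock (Go 1.18+) succeeds only if the mutex is currently free.
	fmt.Println("safe: mutex free afterwards:", safe.mu.TryLock())

	leaky := &replica{}
	_ = leaky.incrementLeaky(true)
	fmt.Println("leaky: mutex free afterwards:", leaky.mu.TryLock())
}
```

With the deferred form, a held mutex implies that the holder's stack frame still exists, so a goroutine dump taken during a deadlock would show who holds the lock.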

In my view we ought to only use a provably safe unlock pattern, so the second option (static analysis, #105366) seems appealing, especially given how rarely the bug occurs.

Jira issue: CRDB-29618

Metadata

Labels

C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

T-kv: KV Team

branch-master: Failures and bugs on the master branch.

release-blocker: Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
