kv/concurrency: clear lock holder in tryFreeLockOnReplicatedAcquire #50173

craig[bot] merged 1 commit into `cockroachdb:master`.
Conversation
**sumeerbhola** left a comment
> it was possible for requests with existing snapshots of the tree to begin waiting on the partially cleared but not empty `lockState`. We would then remove the `lockState` from the lock table's main tree, abandoning any waiter that managed to slip in and begin waiting on the removed `lockState`.
It sounds like it should be easy to add a data-driven test case for lockTable that would fail prior to this fix. If so, can you add one?
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola)
This commit fixes a fairly bad bug introduced by cockroachdb#49980. The bug was that we were not clearing `lockState.holder` while under lock in `tryFreeLockOnReplicatedAcquire`, so the lock was never marked as "empty" (see `isEmptyLock`). This meant that even though we held the `tree.mu` in `AcquireLock` continuously between the calls to `tryFreeLockOnReplicatedAcquire` and `tree.Delete`, it was possible for requests with existing snapshots of the tree to begin waiting on the partially cleared **but not empty** `lockState`. We would then remove the `lockState` from the lock table's main tree, abandoning any waiter that managed to slip in and begin waiting on the removed `lockState`. These waiters would eventually push the original holder of the lock, but even after their push succeeded and they resolved the corresponding intent, they would not receive any update on their `lockTableGuard`'s channel.

This should have been caught by some of the YCSB roachtests, but they haven't actually run for the past week due to various CI flakes. It would have also been caught by the new assertion the commit adds to `lockState.lockIsFree`.
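To illustrate the shape of the fix, here is a minimal, self-contained Go sketch. All names (`lockState`, `isEmptyLock`, `tryFreeLock`, the `holder`/`queued` fields) are simplified stand-ins, not CockroachDB's actual lock table code; the point is only the invariant: the holder must be cleared while the per-lock mutex is held, so that a request arriving through a stale tree snapshot observes an empty lock rather than parking itself on a `lockState` that is about to be deleted.

```go
package main

import (
	"fmt"
	"sync"
)

// lockState is a simplified stand-in for the lock table's per-key state.
type lockState struct {
	mu     sync.Mutex
	holder string // empty means unheld
	queued int    // number of requests waiting on this lock
}

// isEmptyLock reports whether the lock can be garbage collected. It must
// only return true when no waiter could still be parked here.
func (l *lockState) isEmptyLock() bool {
	return l.holder == "" && l.queued == 0
}

// tryFreeLock models the fixed behavior: clear the holder while holding
// l.mu. A request that reached this lockState via a snapshot of the tree
// will then see an empty lock and retry against the main tree instead of
// waiting on soon-to-be-deleted state.
func (l *lockState) tryFreeLock() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.queued > 0 {
		return false // a waiter slipped in; cannot free the lock
	}
	l.holder = "" // the step that was missing in the original bug
	return l.isEmptyLock()
}

func main() {
	l := &lockState{holder: "txn1"}
	fmt.Println(l.tryFreeLock()) // true: safe to delete from the tree
	fmt.Println(l.isEmptyLock()) // true: snapshot readers see it empty
}
```

In the buggy version, `tryFreeLock` returned without clearing `holder`, so `isEmptyLock` stayed false even after the lock was removed from the main tree, and late-arriving waiters queued on an abandoned `lockState` whose channel would never be signaled.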
Force-pushed from `711e0a1` to `c90702a`.
Good idea, done. I realized late into doing this that the tree snapshots made it much easier to hit than I had thought. I was thinking it required a race between the call to […]

TFTR! bors r+
Build succeeded.