storage/concurrency: release reqs from same txn as discovered lock#45601
Conversation
4f28cb6 to
2c7a4be
Compare
sumeerbhola
left a comment
There was a problem hiding this comment.
Reviewed 3 of 3 files at r1.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/storage/concurrency/lock_table.go, line 1211 at r1 (raw file):
// If there are waiting requests from the same txn, they no longer need to wait. if l.releaseWritersFromTxn(txn) { informWaiters = true
I think we can tighten up the semantics of informWaiters and remove the return value from releaseWritersFromTxn.
The only correctness need for informWaiters=false is when the lock is already held. Currently the code also sets it to false when l.reservation == nil but that is an premature performance optimization that I think we should remove.
nvb
left a comment
There was a problem hiding this comment.
TFTR!
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola)
pkg/storage/concurrency/lock_table.go, line 1211 at r1 (raw file):
Previously, sumeerbhola wrote…
I think we can tighten up the semantics of
informWaitersand remove the return value fromreleaseWritersFromTxn.
The only correctness need forinformWaiters=falseis when the lock is already held. Currently the code also sets it to false whenl.reservation == nilbut that is an premature performance optimization that I think we should remove.
I'm not sure I understand why it's ever incorrect to call informActiveWaiters. You said that there's a correctness need for informWaiters=false when the lock is already held. Could you expand on that bit more?
Also, in place of the return value on releaseWritersFromTxn, are you suggesting that it call tryMakeNewDistinguished directly when it detects that doing so it necessary.
2c7a4be to
ad50d9c
Compare
sumeerbhola
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten and @sumeerbhola)
pkg/storage/concurrency/lock_table.go, line 1211 at r1 (raw file):
I'm not sure I understand why it's ever incorrect to call informActiveWaiters. You said that there's a correctness need for informWaiters=false when the lock is already held. Could you expand on that bit more?
I agree that from a correctness perspective it is ok to call informActiveWaiters even if nothing has changed. The use of informWaiters is trying to be a performance optimization. When l.reservation==nil and the lock was not held this performance optimization is limited to not calling informActiveWaiters since that function won't have any work to do -- there can't be any waiters. So we can remove it. Regarding setting informWaiters to false when the lock was already held -- now that there are more fields in waitingState, I don't think that optimization is correct since the waiters want to know whether they are waiting on a replicated or unreplicated lock. So that would mean removing informWaiters altogether.
Also, in place of the return value on releaseWritersFromTxn, are you suggesting that it call tryMakeNewDistinguished directly when it detects that doing so it necessary.
No, we'd still call informActiveWaiters here. I should have said that the return value from releaseWritersFromTxn should not be needed to decide whether to call informActiveWaiters.
ad50d9c to
e8faf7c
Compare
nvb
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @sumeerbhola)
pkg/storage/concurrency/lock_table.go, line 1211 at r1 (raw file):
Previously, sumeerbhola wrote…
I'm not sure I understand why it's ever incorrect to call informActiveWaiters. You said that there's a correctness need for informWaiters=false when the lock is already held. Could you expand on that bit more?
I agree that from a correctness perspective it is ok to call
informActiveWaiterseven if nothing has changed. The use ofinformWaitersis trying to be a performance optimization. Whenl.reservation==niland the lock was not held this performance optimization is limited to not callinginformActiveWaiterssince that function won't have any work to do -- there can't be any waiters. So we can remove it. Regarding settinginformWaitersto false when the lock was already held -- now that there are more fields inwaitingState, I don't think that optimization is correct since the waiters want to know whether they are waiting on a replicated or unreplicated lock. So that would mean removinginformWaitersaltogether.Also, in place of the return value on releaseWritersFromTxn, are you suggesting that it call tryMakeNewDistinguished directly when it detects that doing so it necessary.
No, we'd still call
informActiveWaitershere. I should have said that the return value fromreleaseWritersFromTxnshould not be needed to decide whether to callinformActiveWaiters.
Done, I think. Mind double checking that that's the change you were thinking of?
sumeerbhola
left a comment
There was a problem hiding this comment.
Reviewed 2 of 2 files at r2.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola)
This commit updates the AddDiscoveredLock state transition in lockTableImpl to release all waiting writers that are part of the same transaction as a discovered lock. This caused issues in cockroachdb#45482, where we saw txn deadlocks in the `pkg/sql/tests.Bank` benchmark. This issue was triggered because we were forgetting to inform the lockTable of a lock acquisition in a subtle case. That is now fixed and it's not clear that we can actually hit the bug here anymore given the current policy on how wait-queues form in the lock-table. Regardless, this is worth handling correctly in case things ever change.
e8faf7c to
7381a72
Compare
|
bors r+ |
Build failed (retrying...) |
|
bors r+ |
Build failed (retrying...) |
Build succeeded |
This testdata file was added in cockroachdb#45601. That skewed with the package rename. Release justification: testing only
46631: kv/kvserver: mv testdata/lock_table/add_discovered out of pkg/storage r=nvanbenschoten a=nvanbenschoten This testdata file was added in #45601. That skewed with the mass package rename. Release justification: testing only Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
This commit updates the AddDiscoveredLock state transition in lockTableImpl to release all waiting writers that are part of the same transaction as a discovered lock.
This caused issues in #45482, where we saw txn deadlocks in the
pkg/sql/tests.Bankbenchmark. This issue was triggered because we were forgetting to inform the lockTable of a lock acquisition in a subtle case. That is now fixed and it's not clear that we can actually hit the bug here anymore given the current policy on how wait-queues form in the lock-table. Regardless, this is worth handling correctly in case things ever change.