concurrency: small refactors in preparation for reservation removal#103373
concurrency: small refactors in preparation for reservation removal#103373craig[bot] merged 3 commits intocockroachdb:masterfrom
Conversation
|
@nvanbenschoten I didn't bother cleaning |
nvb
left a comment
There was a problem hiding this comment.
Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 1 of 1 files at r3.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @arulajmani)
-- commits line 21 at r2:
s/non-transactional/transactional/
-- commits line 37 at r3:
nit: "preparation"
pkg/kv/kvserver/concurrency/lock_table.go line 2327 at r1 (raw file):
// from the lock's queuedReaders list. Returns whether the reader was the // distinguished waiter or not. // TODO(arul): Check that this doesn't cause e to escape to the heap.
e is already a pointer, so I don't think anything that isn't already on the heap is escaping to the heap.
pkg/kv/kvserver/concurrency/lock_table.go line 1288 at r2 (raw file):
// OR it belongs to the same transaction that waiters in the lock wait queue // are waiting on, because a transaction doesn't push itself (it just sits // tight).
I know you don't want to use the term "reservation" anymore, but it would be helpful to mention that this case implies that the waitingTxn is in the queue and not holding the lock, otherwise this distinguished waiter would have been released by releaseWritersFromTxn.
pkg/kv/kvserver/concurrency/lock_table.go line 1342 at r2 (raw file):
} // waitingTxn returns the transaction that requests in the lock wait queue are
nit: "waiting" isn't very descriptive in a structure where everyone is waiting for someone else. In some sense, this is the only txn that is not waiting. Is there a better for this?
pkg/kv/kvserver/concurrency/lock_table.go line 2422 at r3 (raw file):
l.tryMakeNewDistinguished() } return l.queuedWriters.Len() == 0 && l.waitingReaders.Len() == 0 && l.reservation == nil
Does this cause us to GC held locks that don't have waiters?
pkg/kv/kvserver/concurrency/lock_table.go line 2500 at r3 (raw file):
// releases the first transactional writer it finds. The function is a no-op if // no transactional writer is present or if the first transactional writer is // not active. Any non-transactional writers before the first transactional
"no-op" is misleading here. Even if this is the case, the function can still release non-transaction waiters.
pkg/kv/kvserver/concurrency/lock_table.go line 2510 at r3 (raw file):
panic("maybeReleaseFirstTransactionalWriter called when lock is held") } if l.waitingReaders.Len() != 0 {
Where did we enforce this in the requestDone caller? Don't we clear the readers after calling this function?
7bfb972 to
b75c8f2
Compare
|
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
This commit is a pure refactor; it pulls out code to remove readers and writers from a lock's wait queues into separate functions. This lets us replace and simplify all calls to `doneWaitingAtLock` where `hasReservation` was false. It also means `doneWaitingAtLock` (now renamed to `doneActivelyWaitingAtLock` no longer needs to concern itself with the concept of reservations -- this will prove useful once we get rid of the concept entirely in a future commit. Release note: None
This patch pulls out logic to compute which transaction requests in a lock's wait queue are waiting on. This is forward looking to a time where we'll support multiple lock holders on a different key. More near term, once we get rid of reservations and replace them with the first transactional inactive writer, this will serve as a single point to make this change. Informs: cockroachdb#103361 Release note: None
b75c8f2 to
cf935c2
Compare
arulajmani
left a comment
There was a problem hiding this comment.
Like we spoke about offline, I pulled out the 4th commit into its own thing over at #103478.
RFAL.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/concurrency/lock_table.go line 2327 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
eis already a pointer, so I don't think anything that isn't already on the heap is escaping to the heap.
Ack, thanks for confirming.
pkg/kv/kvserver/concurrency/lock_table.go line 1288 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I know you don't want to use the term "reservation" anymore, but it would be helpful to mention that this case implies that the
waitingTxnis in the queue and not holding the lock, otherwise this distinguished waiter would have been released byreleaseWritersFromTxn.
Added a comment. Do you think this deserves a panic instead?
pkg/kv/kvserver/concurrency/lock_table.go line 1342 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: "waiting" isn't very descriptive in a structure where everyone is waiting for someone else. In some sense, this is the only txn that is not waiting. Is there a better for this?
I think this needs a new concept. For now, I've called this distinguishedTxn -- I'm not sure whether the parallel with distinguishedWaiter is a good thing or not. Definitely open to suggestions here.
pkg/kv/kvserver/concurrency/lock_table.go line 2422 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Does this cause us to GC held locks that don't have waiters?
Nice catch! I've fixed this condition to use l.isEmptyLock instead.
I wasn't sure why our existing tests weren't failing, so I tried crafting a test that would trigger this scenario. Turns out, we're being saved by this logic:
cockroach/pkg/kv/kvserver/concurrency/lock_table.go
Lines 2859 to 2865 in 2e6d3c5
pkg/kv/kvserver/concurrency/lock_table.go line 2500 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"no-op" is misleading here. Even if this is the case, the function can still release non-transaction waiters.
You're right. Improved this comment a bit, let me know what you think.
pkg/kv/kvserver/concurrency/lock_table.go line 2510 at r3 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Where did we enforce this in the
requestDonecaller? Don't we clear the readers after calling this function?
Yeah, but we only call into this function from requestDone if the lock isn't held, which means there can't be any waiting readers (as readers only queue behind held locks).
nvb
left a comment
There was a problem hiding this comment.
mod the discussion about naming. It feels important to get that right, given that it's a central concept in deadlock detection.
Thanks for splitting this into a separate PR, by the way. It's encouraging to see a green CI after these improvements.
Reviewed 1 of 1 files at r7, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @arulajmani)
pkg/kv/kvserver/concurrency/lock_table.go line 1288 at r2 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Added a comment. Do you think this deserves a panic instead?
Yeah, I think an assertion is appropriate. There are a lot of subtle and intertwined invariants in this data structure. I think we'd benefit from leaning into the use of more assertions. Even before #94986 lands, I would be enthusiastic about you defining an assertion utility at the bottom of this file and then using assertions throughout as you refactor. That's especially true now that you're splitting these functions apart, leading to more functions with subtle preconditions.
Something simple like the following would make this lightweight:
func assert(cond bool, msg string) {
if !cond {
panic(msg)
}
}pkg/kv/kvserver/concurrency/lock_table.go line 1342 at r2 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
I think this needs a new concept. For now, I've called this
distinguishedTxn-- I'm not sure whether the parallel withdistinguishedWaiteris a good thing or not. Definitely open to suggestions here.
To be honest, I don't think the "distinguished" parallel is helpful. If anything, it's confusing. I asked ChatGPT to weigh in. Some of the names I got from it were "blockerTxn", "blockingTxn", and "contenderTxn". It also said "firstInLine", which I thought was kind of nice, if we can separate it enough from the "wait-queue" to also be applicable to a lock holder.
cf935c2 to
062d331
Compare
arulajmani
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
pkg/kv/kvserver/concurrency/lock_table.go line 1288 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Yeah, I think an assertion is appropriate. There are a lot of subtle and intertwined invariants in this data structure. I think we'd benefit from leaning into the use of more assertions. Even before #94986 lands, I would be enthusiastic about you defining an assertion utility at the bottom of this file and then using assertions throughout as you refactor. That's especially true now that you're splitting these functions apart, leading to more functions with subtle preconditions.
Something simple like the following would make this lightweight:
func assert(cond bool, msg string) { if !cond { panic(msg) } }
Done. In a future commit, I'll move all the panics sprinkled around this package to make use of this function instead.
pkg/kv/kvserver/concurrency/lock_table.go line 1342 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
To be honest, I don't think the "distinguished" parallel is helpful. If anything, it's confusing. I asked ChatGPT to weigh in. Some of the names I got from it were "blockerTxn", "blockingTxn", and "contenderTxn". It also said "firstInLine", which I thought was kind of nice, if we can separate it enough from the "wait-queue" to also be applicable to a lock holder.
Changed to claimantTxn, after my own conversation with Chat GPT and conferring with you on Slack. Like I mentioned there, I'm not in love with this name but it seems like the best one we have for now. How do you feel about merging this and revisiting this naming if we can come up with something better in the next few weeks?
062d331 to
c256a8b
Compare
nvb
left a comment
There was a problem hiding this comment.
Reviewed 1 of 2 files at r8, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @arulajmani)
pkg/kv/kvserver/concurrency/lock_table.go line 1342 at r2 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Changed to
claimantTxn, after my own conversation with Chat GPT and conferring with you on Slack. Like I mentioned there, I'm not in love with this name but it seems like the best one we have for now. How do you feel about merging this and revisiting this naming if we can come up with something better in the next few weeks?
I feel good about that. I also find "claimed" to be a good description of what we are trying to describe here, especially after reading the comment, so we might not need to change it.
pkg/kv/kvserver/concurrency/lock_table.go line 1300 at r9 (raw file):
// the lock is not held. assert( l.distinguishedWaiter == nil || !l.holder.locked, errors.AssertionFailedf(
We want to make these assertions as close to free as possible, so we'll need to avoid constructing errors ahead of time and then passing them in. Let's instead push this into the assertion function.
Also, is there a benefit to panicking with an errors.AssertionFailed, as opposed to a fmt.Sprintf()?
This patch reworks the lockIsFree contract. Previously, it could be called when a lock transitioned from locked/reserved to no longer locked/reserved. We no longer do the latter -- lockIsFree is only called when a lock is released. We instead move the reservation specific logic into a new method called `maybeReleaseFirstTransactionalWriter`. This is in preparation for a future commit where we'll remove reservations entirely. Release note: None
c256a8b to
8e72fe0
Compare
arulajmani
left a comment
There was a problem hiding this comment.
TFTR!
bors r=nvanbenschoten
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
pkg/kv/kvserver/concurrency/lock_table.go line 1342 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I feel good about that. I also find "claimed" to be a good description of what we are trying to describe here, especially after reading the comment, so we might not need to change it.
Yeah, after reworking some of the comments that previously spoke about reservations, I'm warming up to this as well. The comment around IsKeyLockedByConflictingTxn did it for me.
pkg/kv/kvserver/concurrency/lock_table.go line 1300 at r9 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
We want to make these assertions as close to free as possible, so we'll need to avoid constructing errors ahead of time and then passing them in. Let's instead push this into the assertion function.
Also, is there a benefit to panicking with an
errors.AssertionFailed, as opposed to afmt.Sprintf()?
I don't think there is -- switched this to just using a string.
|
Build succeeded: |
103478: concurrency: get rid of reservations in the lock table r=nvanbenschoten a=arulajmani First 3 commits from #103373 This patch removes the notion of reservations from the lock table. Reservations served as a claim that prevented multiple requests from racing when a lock was released. Typically, when a lock was released, only the first transactional writer was released from the list of queued writers. It would do so by claiming a "reservation" on the lock. All requests that are sequenced through the lock table are associated with a sequence number based on arrival order. These sequence numbers are used to uphold ~fairness as requests are sequenced. They also serve a correctness purpose -- because all locks are not known upfront (as uncontended replicated locks may be discovered during evaluation), sequence numbers are used to break potential deadlocks that arise from out of order locking. This motivated the concept of reservation breaking, which could happen if a lower sequence number request encountered a reservation by a request with a higher sequence number. This would lead to somewhat complex state management, where requests could move from being reservations to inactive waiters multiple times during their lifetime. A lot of this can be simplified if we make no distinction between a reservation and an inactive waiter. This patch gets rid of reservations entirely. Instead, it offers a new invariant: The head of the list of waiting writers should always be an inactive, transactional writer if the lock isn't held. In practice, this works out functionally the same as how reservations operated, albeit with less state transitions. Being an inactive waiter at the head of the lock's wait-queue serves as the request's claim on the key. As such, verbiage that referenced "reservations" previously is now updated to talk about claims and claimant transactions. There's a bit of comment churn as a result. There's also some datadriven test churn as part of this patch -- but it should be helpful in convincing ourselves that this just changes concepts, and not functionality. In particular, what was previously a reservation holder, is now the first inactive queued qwriter at the lock. Closes #103361 Release note: None Co-authored-by: Arul Ajmani <arulajmani@gmail.com>
See individual commits for details.
Informs: #103361