concurrency: use lock.Modes to determine conflicts in tryActiveWait #104261
concurrency: use lock.Modes to determine conflicts in tryActiveWait #104261arulajmani wants to merge 3 commits intocockroachdb:masterfrom
Conversation
We'll make use of these when constructing lock modes. Epic: none Release note: None
nvb
left a comment
There was a problem hiding this comment.
Reviewed 3 of 3 files at r1, 1 of 2 files at r2, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @arulajmani)
pkg/kv/kvserver/concurrency/lock_table.go line 1640 at r2 (raw file):
var ok bool replicatedFinalizedTxn, ok = g.lt.finalizedTxnCache.get(l.holder.holder[lock.Replicated].txn.ID) assert(ok, "expected txn to be finalized")
Is the cache guaranteed to still contain the transaction at this point? It's a small LRU cache that can be manipulated concurrently.
pkg/kv/kvserver/concurrency/lock_table.go line 1660 at r2 (raw file):
// a claim like normal. That is, it should add itself to the lock's wait // queue as an inactive writer. if !l.holder.locked && l.queuedWriters.Len() == 0 {
Is this the same condition as if transitionedToFree { return false, true }?
pkg/kv/kvserver/concurrency/lock_table.go line 1683 at r2 (raw file):
// prepareActiveWait and returning true here; there won't really be any // wait here. l.prepareActiveWait(g, notify, clock)
Doesn't this also call into adjustWaitingWritersWaitQueue? I'm surprised that we need this second handling of isMaxQueueLengthExceededError here.
pkg/kv/kvserver/concurrency/lock_table.go line 1712 at r2 (raw file):
// it is). guardLockMode := lock.MakeModeIntent(g.ts) for e := l.queuedWriters.Front(); e != nil; e = e.Next() {
I wonder whether there's room to unify this logic with the incremental handling of lock releasing. There's some duplicated logic between this function and maybeReleaseFirstTransactionalWriter.
I think one of the biggest sources of complexity in this code is the multiple, non-idempotent code paths that attempt to maintain the same invariants while handling different state transitions. That's only warranted if the redundancy is necessary for efficiency. But is it in this case? Could we treat queuing as the combination of:
- adding to the end of the wait queue, assuming we're going to have to wait
- releasing the first transactional writer, which may be us
pkg/kv/kvserver/concurrency/lock_table.go line 1714 at r2 (raw file):
for e := l.queuedWriters.Front(); e != nil; e = e.Next() { qqg := e.Value.(*queuedGuard) if qqg.guard == g {
Let's call out what's happened here:
// We found our request while scanning from the front before finding any conflicting waiters. There is no one for us to wait on.
pkg/kv/kvserver/concurrency/lock_table.go line 1721 at r2 (raw file):
} // Transactional writer. return false, transitionedToFree
Is it safe to return transitionedToFree = true here if we just added our own waiter?
pkg/kv/kvserver/concurrency/lock_table.go line 1725 at r2 (raw file):
waiterLockMode := lock.MakeModeIntent(qg.guard.ts) if lock.Conflicts(waiterLockMode, guardLockMode, &g.lt.settings.SV) { l.prepareActiveWait(g, notify, clock)
Same question. Are we inserting into the queue twice by first calling adjustWaitingWritersWaitQueue directly and then calling prepareActiveWait?
pkg/kv/kvserver/concurrency/lock_table.go line 1850 at r2 (raw file):
} // updateWaitingState updates the waiting state for the supplied request to wait
The comment on this function is incorrect.
pkg/kv/kvserver/concurrency/lock_table.go line 3323 at r2 (raw file):
} type maxQueueLengthExceededError struct{}
Let's give this a comment. I'm also not sure how useful it is to have an error return path that can only return one kind of error. Up to you.
arulajmani
left a comment
There was a problem hiding this comment.
Flushing replies to a few comments.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/concurrency/lock_table.go line 1583 at r2 (raw file):
// resolution (or done it), so this request does not need to do the resolution. // // TODO(XXX): The paragraph above is now stale. We're letting writers accumulate
Wanted to call out this part for discussion.
pkg/kv/kvserver/concurrency/lock_table.go line 1640 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Is the cache guaranteed to still contain the transaction at this point? It's a small LRU cache that can be manipulated concurrently.
Good point. The only reason I needed this was because I didn't want to I need a roachpb.Transaction here. I could just modify conflictsWithLockHolder to return the replicatedFinalizedTxn instead of belongsToFinalizedTxn.
pkg/kv/kvserver/concurrency/lock_table.go line 1660 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Is this the same condition as
if transitionedToFree { return false, true }?
Yeah, it is. I can make the switch if we feel that's easier to reason about.
pkg/kv/kvserver/concurrency/lock_table.go line 1683 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Doesn't this also call into
adjustWaitingWritersWaitQueue? I'm surprised that we need this second handling ofisMaxQueueLengthExceededErrorhere.
Yeah, but handling the the isMaxQueueLengthExceededError involves the same changes prepareForActiveWait makes when it hits this error. I wasn't sure what's clearer -- duplicating those changes here, or reusing prepareActiveWait. We could instead handle with this as well:
--- a/pkg/kv/kvserver/concurrency/lock_table.go
+++ b/pkg/kv/kvserver/concurrency/lock_table.go
@@ -1680,7 +1680,15 @@ func (l *lockState) tryActiveWait(
// based on their waiting state. The mechanism by which this happens is
// prepareActiveWait and returning true here; there won't really be any
// wait here.
- l.prepareActiveWait(g, notify, clock)
+ g.key = l.key
+ g.mu.startWait = true
+ g.mu.curLockWaitStart = clock.PhysicalTime()
+ ws := l.constructWaitingState(g)
+ ws.kind = waitQueueMaxLengthExceeded
+ g.updateWaitingStateLocked(ws)
+ if notify {
+ g.notify()
+ }
return true, transitionedToFree
}
panic(err) // should not happen
Is this clearer?
pkg/kv/kvserver/concurrency/lock_table.go line 1712 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I wonder whether there's room to unify this logic with the incremental handling of lock releasing. There's some duplicated logic between this function and
maybeReleaseFirstTransactionalWriter.I think one of the biggest sources of complexity in this code is the multiple, non-idempotent code paths that attempt to maintain the same invariants while handling different state transitions. That's only warranted if the redundancy is necessary for efficiency. But is it in this case? Could we treat queuing as the combination of:
- adding to the end of the wait queue, assuming we're going to have to wait
- releasing the first transactional writer, which may be us
Yeah. Now that you mention it, I think we need a call to informActiveWaiters here in the future -- we may have inserted this guy at the front of the queue, and the claimant txn may have changed as a result.
pkg/kv/kvserver/concurrency/lock_table.go line 1721 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Is it safe to return
transitionedToFree = truehere if we just added our own waiter?
It should be safe, because we check if a lock is non-empty before GC-ing it. It might result in an unnecessary call to tryGCLocks, but it shouldn't be a correctness issue. I could also see why we should just return false instead. I could go either way here -- I actually did switch this locally right before pushing. Let me know if you have a preference either way.
pkg/kv/kvserver/concurrency/lock_table.go line 1725 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Same question. Are we inserting into the queue twice by first calling
adjustWaitingWritersWaitQueuedirectly and then callingprepareActiveWait?
adjustWaitingWritersWaitQueue has logic to handle the case where a request already exists. In this case, it'll effectively mark the request as an active waiter (we tentatively inserted it as an inactive waiter above).
Previously, `informActiveWaiters` would unconditionally signal the new state channel, even if the waiting state for the active waiter hadn't changed. This is fine, as the contract for signaling the new state channel makes no guarantees that there has been a state change. With this patch, `informActiveWaiters` detects such no-op cases and elides signaling the channel. This will prove handy when we expand the use of `informActiveWaiters`. Release note: None
This patch refactors `tryActiveWait` to make use of lock.Modes to determine conflicts. It's done so with the imminent introduction of more locking strengths in mind. To that effect, a lot of things are already generalized and the structure is such that it'll be easy to extend to multiple locks on a key. At a high level, the journey of a request through tryActiveWait is as follows: - First, it checks if its transaction already holds a lock at a equal or higher lock strength (read: it's good enough for its purpose). If this is the case, it can proceed without bothering with any bookkeeping. - It then checks if there's a lock holder with which it conflicts. It decides to actively wait if that is the case. - Finalized transactions/lock cleanup is handled next, if required. - At this point, a non-locking request can proceed. - A locking request first inserts itself in the correct position (as dictated by its sequence number) in the lock's wait-queue. It then iterates through the head of the wait-queue until it finds itself. While doing so, it checks for conflicts with requests waiting in front of it. If there is a conflict, it decides to actively wait. Otherwise, it can proceed with its scan. - If the request is of the transactional kind, it establishes a (possibly joint) claim on the lock before proceeding with its scan. A non-transactional request does not. Closes cockroachdb#102210 Release note: None
6b6e205 to
0c637b8
Compare
arulajmani
left a comment
There was a problem hiding this comment.
Thanks, RFAL.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/kvserver/concurrency/lock_table.go line 1640 at r2 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Good point. The only reason I needed this was because I didn't want to I need a
roachpb.Transactionhere. I could just modifyconflictsWithLockHolderto return thereplicatedFinalizedTxninstead ofbelongsToFinalizedTxn.
I meant finalizedTxn here. Updated.
pkg/kv/kvserver/concurrency/lock_table.go line 1660 at r2 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Yeah, it is. I can make the switch if we feel that's easier to reason about.
(haven't done this)
pkg/kv/kvserver/concurrency/lock_table.go line 1683 at r2 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Yeah, but handling the the
isMaxQueueLengthExceededErrorinvolves the same changesprepareForActiveWaitmakes when it hits this error. I wasn't sure what's clearer -- duplicating those changes here, or reusingprepareActiveWait. We could instead handle with this as well:--- a/pkg/kv/kvserver/concurrency/lock_table.go +++ b/pkg/kv/kvserver/concurrency/lock_table.go @@ -1680,7 +1680,15 @@ func (l *lockState) tryActiveWait( // based on their waiting state. The mechanism by which this happens is // prepareActiveWait and returning true here; there won't really be any // wait here. - l.prepareActiveWait(g, notify, clock) + g.key = l.key + g.mu.startWait = true + g.mu.curLockWaitStart = clock.PhysicalTime() + ws := l.constructWaitingState(g) + ws.kind = waitQueueMaxLengthExceeded + g.updateWaitingStateLocked(ws) + if notify { + g.notify() + } return true, transitionedToFree } panic(err) // should not happenIs this clearer?
Made this change for now. Let me know how you feel about this.
pkg/kv/kvserver/concurrency/lock_table.go line 1712 at r2 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Yeah. Now that you mention it, I think we need a call to
informActiveWaitershere in the future -- we may have inserted this guy at the front of the queue, and the claimant txn may have changed as a result.
I sat with this for a bit, and I'm leaning towards not making the change. A source of complexity in tryActiveWait is the somewhat surprising side effects of this function. For example, the calls to clearLockHolder and lockIsFree which can uncork unrelated requests, like I was talking to you about on Slack. Longer term, I think it'll be easier to reason about tryActiveWait if it only performs state changes that directly fall out of whether the supplied request should wait or not.
In the future, maybeReleaseFirstTransactionalWriter will probably be replaced by maybeReleaseAllCompatibleRequestsFromHeadOfTheQueue[1]. Such a function is more general than what we need here. It begs the same question as above, why is tryActiveWait uncorking other requests?
What's even more subtle is that it'll actually never uncork any other request at all! It'll either mark itself as inactive or be a no-op [2]. So making this change feels like we're needlessly expanding the scope of states we can be in at this point.
I did update this to call informActiveWaiters though. This motivated adding a new commit (2 of 3).
[1] maybeReleaseAllCompatibleRequestsFromHeadOfTheQueue will traverse the head of the queued locking requests list, removing any non-transactional write requests it encounters, and marking all compatible requests as inactive waiters and nudging them to proceed with their scan. It'll stop when it discovers an incompatible request.
[2] That's because if there were a request that was compatible with the head of the lock's wait queue (and potentially the lock holder), it shouldn't have been an active waiter to begin with.
pkg/kv/kvserver/concurrency/lock_table.go line 1850 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
The comment on this function is incorrect.
Thanks, fixed.
pkg/kv/kvserver/concurrency/lock_table.go line 3323 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Let's give this a comment. I'm also not sure how useful it is to have an error return path that can only return one kind of error. Up to you.
Added some comments. Mild preference to keep these using errors to indicate cases where we didn't modify the wait queues instead of returning a boolean, so keeping this as is.
nvb
left a comment
There was a problem hiding this comment.
Reviewed all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @arulajmani)
-- commits line 11 at r3:
What do you think about keeping this "elide signaling" change separate? It feels self-contained and probably deserves more scrutiny than it will get if grouped into this PR. A few questions to ask ourselves about it:
- what kind of bugs can occur from under-eliding? What kind of bugs can occur from over-eliding?
- is the logic in
canElideWaitingStateUpdateprone to rot as new fields are added towaitingState? - does the logic in
canElideWaitingStateUpdateneed to consider every field? Even observability-related fields likequeuedWriters? - can we push this into
updateWaitingStateLockedso that we don't add complexity to existing methods?
pkg/kv/kvserver/concurrency/lock_table.go line 1660 at r2 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
(haven't done this)
Yeah, I think it is easier to reason about. It makes it clear to readers why this lock is now unheld and with an empty wait queue. In fact, maybe move it up to where transitionedToFree is set so that it doesn't mix with other paths through this function.
+1 to removing the special case (in a future PR), by the way.
pkg/kv/kvserver/concurrency/lock_table.go line 1721 at r2 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
It should be safe, because we check if a lock is non-empty before GC-ing it. It might result in an unnecessary call to
tryGCLocks, but it shouldn't be a correctness issue. I could also see why we should just return false instead. I could go either way here -- I actually did switch this locally right before pushing. Let me know if you have a preference either way.
Related to the thread above, it's mildly confusing that we set transitionedToFree way up above and then keep using the variable throughout. It raises questions about whether any of the state transitions that we've performed have invalidated this stale piece of information. I'd suggest only using transitionedToFree on the one path that sets it and returning false (the literal) everywhere else.
|
Closing this one in favour of #104620. None of the outstanding comments are relevant to the new structure, so a fresh slate will probably be easier. Hopefully 3rd time is the charm! Btw, thanks for the collaboration on this one @nvanbenschoten. I won't jinx it by calling it fun before we actually land this, but I'm mostly there. |
104620: concurrency: use lock modes to find conflicts during lock table scans r=nvanbenschoten a=arulajmani Previous attempt over at: #104261 First commit from: #104537 Second commit from: #104600 This patch majorly refactors the lock table scan codepath, all in the name of shared locks. At its core is a desire to use lock modes to perform conflict resolution between an incoming request and locks held on one particular key. In doing so, we rip out tryActiveWait. At a high level (for a particular key), a request's journey looks like the following: - It first checks if the transaction already holds a lock at a equal or higher lock strength (read: It's good enough for its use). If this is the case, it can proceed without any bookkeeping. - It then checks if any finalized transactions hold locks on the key. Such locks do not conflict, but need to be resolved before the transaction can evaluate. They're thus accumulated for later. - The request then enqueues itself in the appropriate wait queue. - It then determines if it needs to actively wait at this lock because of a conflict. If that's the case, the lock table scan short circuits. - Otherwise, the request lays a claim (if it can) before proceeding with its scan. Closes #102210 Release note: None 105482: sqltelemetry: add missing schema telemetry r=postamar a=rafiss CREATE [ SCHEMA | INDEX | FUNCTION | TYPE ] and ALTER FUNCTION did not have any telemetry, but they should. Epic: None Release note: None 105579: sql: disallow cross-database type references in CTAS r=chengxiong-ruan a=chengxiong-ruan Fixes: #105393 Release note (bug fix): reviously, cross-database type references could sneaked in through `CREATE TABLE...AS` statements if the source table is from another database and any of its columns is of a user defined type. This introduced bug where the source table can be dropped and type could not be found for the CTAS table. This commit disallow such CTAS as a fix. 105581: optbuilder: reset annotations when building CREATE FUNCTION r=rafiss a=rafiss In 22dabb0 we started overriding the annotations for each statement in the UDF body. We should reset them to the original values, so we don't accidentally leave the old reference. Epic: None Release note: None 105596: storage: make `storage.max_sync_duration` public and `TenantReadOnly` r=erikgrinaker a=erikgrinaker Users have asked why this setting is not public, this patch makes it so. Furthermore, these settings were `TenantWritable`. We do not want these to be writable by tenants, where they can potentially cause problems on SQL nodes, considering e.g. SQL disk spilling uses Pebble. Epic: none Release note: None Co-authored-by: Arul Ajmani <arulajmani@gmail.com> Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com> Co-authored-by: Chengxiong Ruan <chengxiongruan@gmail.com> Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
This patch refactors
tryActiveWaitto make use of lock.Modes todetermine conflicts. It's done so with the imminent introduction of
more locking strengths in mind. To that effect, a lot of things are
already generalized and the structure is such that it'll be easy to
extend to multiple locks on a key.
At a high level, the journey of a request through tryActiveWait is
as follows:
or higher lock strength (read: it's good enough for its purpose). If
this is the case, it can proceed without bothering with any bookkeeping.
decides to actively wait if that is the case.
dictated by its sequence number) in the lock's wait-queue. It then
iterates through the head of the wait-queue until it finds itself.
While doing so, it checks for conflicts with requests waiting in front
of it. If there is a conflict, it decides to actively wait. Otherwise,
it can proceed with its scan.
(possibly joint) claim on the lock before proceeding with its scan.
A non-transactional request does not.
Closes #102210
Release note: None