Skip to content

concurrency: small refactors in preparation for reservation removal#103373

Merged
craig[bot] merged 3 commits intocockroachdb:masterfrom
arulajmani:remove-reservations-v2
May 18, 2023
Merged

concurrency: small refactors in preparation for reservation removal#103373
craig[bot] merged 3 commits intocockroachdb:masterfrom
arulajmani:remove-reservations-v2

Conversation

@arulajmani
Copy link
Copy Markdown
Collaborator

@arulajmani arulajmani commented May 16, 2023

See individual commits for details.

Informs: #103361

@arulajmani arulajmani requested a review from nvb May 16, 2023 04:51
@arulajmani arulajmani requested a review from a team as a code owner May 16, 2023 04:51
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@arulajmani
Copy link
Copy Markdown
Collaborator Author

@nvanbenschoten I didn't bother cleaning tryActiveWait up in this patch -- I just made some minimal changes to make sure things functionally remain the same. I'm assuming that's okay, given we're basically rewriting the thing over in #102973.

Copy link
Copy Markdown
Contributor

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 1 of 1 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani)


-- commits line 21 at r2:
s/non-transactional/transactional/


-- commits line 37 at r3:
nit: "preparation"


pkg/kv/kvserver/concurrency/lock_table.go line 2327 at r1 (raw file):

// from the lock's queuedReaders list. Returns whether the reader was the
// distinguished waiter or not.
// TODO(arul): Check that this doesn't cause e to escape to the heap.

e is already a pointer, so I don't think anything that isn't already on the heap is escaping to the heap.


pkg/kv/kvserver/concurrency/lock_table.go line 1288 at r2 (raw file):

		// OR it belongs to the same transaction that waiters in the lock wait queue
		// are waiting on, because a transaction doesn't push itself (it just sits
		// tight).

I know you don't want to use the term "reservation" anymore, but it would be helpful to mention that this case implies that the waitingTxn is in the queue and not holding the lock, otherwise this distinguished waiter would have been released by releaseWritersFromTxn.


pkg/kv/kvserver/concurrency/lock_table.go line 1342 at r2 (raw file):

}

// waitingTxn returns the transaction that requests in the lock wait queue are

nit: "waiting" isn't very descriptive in a structure where everyone is waiting for someone else. In some sense, this is the only txn that is not waiting. Is there a better for this?


pkg/kv/kvserver/concurrency/lock_table.go line 2422 at r3 (raw file):

		l.tryMakeNewDistinguished()
	}
	return l.queuedWriters.Len() == 0 && l.waitingReaders.Len() == 0 && l.reservation == nil

Does this cause us to GC held locks that don't have waiters?


pkg/kv/kvserver/concurrency/lock_table.go line 2500 at r3 (raw file):

// releases the first transactional writer it finds. The function is a no-op if
// no transactional writer is present or if the first transactional writer is
// not active. Any non-transactional writers before the first transactional

"no-op" is misleading here. Even if this is the case, the function can still release non-transaction waiters.


pkg/kv/kvserver/concurrency/lock_table.go line 2510 at r3 (raw file):

		panic("maybeReleaseFirstTransactionalWriter called when lock is held")
	}
	if l.waitingReaders.Len() != 0 {

Where did we enforce this in the requestDone caller? Don't we clear the readers after calling this function?

@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented May 16, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@arulajmani arulajmani changed the title concurrency: get rid of reservations in the lock table concurrency: small refactors in preparation for reservation removal May 16, 2023
This commit is a pure refactor; it pulls out code to remove readers
and writers from a lock's wait queues into separate functions. This
lets us replace and simplify all calls to `doneWaitingAtLock` where
`hasReservation` was false. It also means `doneWaitingAtLock` (now
renamed to `doneActivelyWaitingAtLock` no longer needs to concern
itself with the concept of reservations -- this will prove useful once
we get rid of the concept entirely in a future commit.

Release note: None
This patch pulls out logic to compute which transaction requests in a
lock's wait queue are waiting on. This is forward looking to a time
where we'll support multiple lock holders on a different key. More near
term, once we get rid of reservations and replace them with the first
transactional inactive writer, this will serve as a single point to
make this change.

Informs: cockroachdb#103361

Release note: None
@arulajmani arulajmani force-pushed the remove-reservations-v2 branch from b75c8f2 to cf935c2 Compare May 17, 2023 00:45
Copy link
Copy Markdown
Collaborator Author

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like we spoke about offline, I pulled out the 4th commit into its own thing over at #103478.

RFAL.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/kvserver/concurrency/lock_table.go line 2327 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

e is already a pointer, so I don't think anything that isn't already on the heap is escaping to the heap.

Ack, thanks for confirming.


pkg/kv/kvserver/concurrency/lock_table.go line 1288 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I know you don't want to use the term "reservation" anymore, but it would be helpful to mention that this case implies that the waitingTxn is in the queue and not holding the lock, otherwise this distinguished waiter would have been released by releaseWritersFromTxn.

Added a comment. Do you think this deserves a panic instead?


pkg/kv/kvserver/concurrency/lock_table.go line 1342 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: "waiting" isn't very descriptive in a structure where everyone is waiting for someone else. In some sense, this is the only txn that is not waiting. Is there a better for this?

I think this needs a new concept. For now, I've called this distinguishedTxn -- I'm not sure whether the parallel with distinguishedWaiter is a good thing or not. Definitely open to suggestions here.


pkg/kv/kvserver/concurrency/lock_table.go line 2422 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Does this cause us to GC held locks that don't have waiters?

Nice catch! I've fixed this condition to use l.isEmptyLock instead.

I wasn't sure why our existing tests weren't failing, so I tried crafting a test that would trigger this scenario. Turns out, we're being saved by this logic:

l.mu.Lock()
empty := l.isEmptyLock()
l.mu.Unlock()
if empty {
tree.Delete(l)
atomic.AddInt64(&tree.numLocks, -1)
}


pkg/kv/kvserver/concurrency/lock_table.go line 2500 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"no-op" is misleading here. Even if this is the case, the function can still release non-transaction waiters.

You're right. Improved this comment a bit, let me know what you think.


pkg/kv/kvserver/concurrency/lock_table.go line 2510 at r3 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Where did we enforce this in the requestDone caller? Don't we clear the readers after calling this function?

Yeah, but we only call into this function from requestDone if the lock isn't held, which means there can't be any waiting readers (as readers only queue behind held locks).

@arulajmani arulajmani requested a review from nvb May 17, 2023 00:48
Copy link
Copy Markdown
Contributor

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: mod the discussion about naming. It feels important to get that right, given that it's a central concept in deadlock detection.

Thanks for splitting this into a separate PR, by the way. It's encouraging to see a green CI after these improvements.

Reviewed 1 of 1 files at r7, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani)


pkg/kv/kvserver/concurrency/lock_table.go line 1288 at r2 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Added a comment. Do you think this deserves a panic instead?

Yeah, I think an assertion is appropriate. There are a lot of subtle and intertwined invariants in this data structure. I think we'd benefit from leaning into the use of more assertions. Even before #94986 lands, I would be enthusiastic about you defining an assertion utility at the bottom of this file and then using assertions throughout as you refactor. That's especially true now that you're splitting these functions apart, leading to more functions with subtle preconditions.

Something simple like the following would make this lightweight:

func assert(cond bool, msg string) {
    if !cond {
        panic(msg)
    }
}

pkg/kv/kvserver/concurrency/lock_table.go line 1342 at r2 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

I think this needs a new concept. For now, I've called this distinguishedTxn -- I'm not sure whether the parallel with distinguishedWaiter is a good thing or not. Definitely open to suggestions here.

To be honest, I don't think the "distinguished" parallel is helpful. If anything, it's confusing. I asked ChatGPT to weigh in. Some of the names I got from it were "blockerTxn", "blockingTxn", and "contenderTxn". It also said "firstInLine", which I thought was kind of nice, if we can separate it enough from the "wait-queue" to also be applicable to a lock holder.

@arulajmani arulajmani force-pushed the remove-reservations-v2 branch from cf935c2 to 062d331 Compare May 17, 2023 21:29
Copy link
Copy Markdown
Collaborator Author

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)


pkg/kv/kvserver/concurrency/lock_table.go line 1288 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Yeah, I think an assertion is appropriate. There are a lot of subtle and intertwined invariants in this data structure. I think we'd benefit from leaning into the use of more assertions. Even before #94986 lands, I would be enthusiastic about you defining an assertion utility at the bottom of this file and then using assertions throughout as you refactor. That's especially true now that you're splitting these functions apart, leading to more functions with subtle preconditions.

Something simple like the following would make this lightweight:

func assert(cond bool, msg string) {
    if !cond {
        panic(msg)
    }
}

Done. In a future commit, I'll move all the panics sprinkled around this package to make use of this function instead.


pkg/kv/kvserver/concurrency/lock_table.go line 1342 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

To be honest, I don't think the "distinguished" parallel is helpful. If anything, it's confusing. I asked ChatGPT to weigh in. Some of the names I got from it were "blockerTxn", "blockingTxn", and "contenderTxn". It also said "firstInLine", which I thought was kind of nice, if we can separate it enough from the "wait-queue" to also be applicable to a lock holder.

Changed to claimantTxn, after my own conversation with Chat GPT and conferring with you on Slack. Like I mentioned there, I'm not in love with this name but it seems like the best one we have for now. How do you feel about merging this and revisiting this naming if we can come up with something better in the next few weeks?

@arulajmani arulajmani force-pushed the remove-reservations-v2 branch from 062d331 to c256a8b Compare May 17, 2023 21:32
Copy link
Copy Markdown
Contributor

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 2 files at r8, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @arulajmani)


pkg/kv/kvserver/concurrency/lock_table.go line 1342 at r2 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Changed to claimantTxn, after my own conversation with Chat GPT and conferring with you on Slack. Like I mentioned there, I'm not in love with this name but it seems like the best one we have for now. How do you feel about merging this and revisiting this naming if we can come up with something better in the next few weeks?

I feel good about that. I also find "claimed" to be a good description of what we are trying to describe here, especially after reading the comment, so we might not need to change it.


pkg/kv/kvserver/concurrency/lock_table.go line 1300 at r9 (raw file):

		// the lock is not held.
		assert(
			l.distinguishedWaiter == nil || !l.holder.locked, errors.AssertionFailedf(

We want to make these assertions as close to free as possible, so we'll need to avoid constructing errors ahead of time and then passing them in. Let's instead push this into the assertion function.

Also, is there a benefit to panicking with an errors.AssertionFailed, as opposed to a fmt.Sprintf()?

This patch reworks the lockIsFree contract. Previously, it could be
called when a lock transitioned from locked/reserved to no longer
locked/reserved. We no longer do the latter -- lockIsFree is only
called when a lock is released.

We instead move the reservation specific logic into a new method
called `maybeReleaseFirstTransactionalWriter`. This is in preparation
for a future commit where we'll remove reservations entirely.

Release note: None
@arulajmani arulajmani force-pushed the remove-reservations-v2 branch from c256a8b to 8e72fe0 Compare May 18, 2023 19:47
Copy link
Copy Markdown
Collaborator Author

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

bors r=nvanbenschoten

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)


pkg/kv/kvserver/concurrency/lock_table.go line 1342 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I feel good about that. I also find "claimed" to be a good description of what we are trying to describe here, especially after reading the comment, so we might not need to change it.

Yeah, after reworking some of the comments that previously spoke about reservations, I'm warming up to this as well. The comment around IsKeyLockedByConflictingTxn did it for me.


pkg/kv/kvserver/concurrency/lock_table.go line 1300 at r9 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We want to make these assertions as close to free as possible, so we'll need to avoid constructing errors ahead of time and then passing them in. Let's instead push this into the assertion function.

Also, is there a benefit to panicking with an errors.AssertionFailed, as opposed to a fmt.Sprintf()?

I don't think there is -- switched this to just using a string.

@craig
Copy link
Copy Markdown
Contributor

craig bot commented May 18, 2023

Build succeeded:

@craig craig bot merged commit fe71b7c into cockroachdb:master May 18, 2023
craig bot pushed a commit that referenced this pull request May 31, 2023
103478: concurrency: get rid of reservations in the lock table r=nvanbenschoten a=arulajmani

First 3 commits from #103373

This patch removes the notion of reservations from the lock table.
Reservations served as a claim that prevented multiple requests from
racing when a lock was released. Typically, when a lock was released,
only the first transactional writer was released from the list of
queued writers. It would do so by claiming a "reservation" on the
lock.

All requests that are sequenced through the lock table are associated
with a sequence number based on arrival order. These sequence numbers
are used to uphold ~fairness as requests are sequenced. They also serve
a correctness purpose -- because all locks are not known upfront (as
uncontended replicated locks may be discovered during evaluation),
sequence numbers are used to break potential deadlocks that arise from
out of order locking. This motivated the concept of reservation
breaking, which could happen if a lower sequence number request
encountered a reservation by a request with a higher sequence number.
This would lead to somewhat complex state management, where requests
could  move from being reservations to inactive waiters multiple times
during their lifetime. A lot of this can be simplified if we make no
distinction between a reservation and an inactive waiter.

This patch gets rid of reservations entirely. Instead, it offers a new
invariant:

The head of the list of waiting writers should always be an inactive,
transactional writer if the lock isn't held.

In practice, this works out functionally the same as how reservations
operated, albeit with less state transitions. Being an inactive waiter
at the head of the lock's wait-queue serves as the request's claim on
the key. As such, verbiage that referenced "reservations" previously is
now updated to talk about claims and claimant transactions. There's a
bit of comment churn as a result. There's also some datadriven test
churn as part of this patch -- but it should be helpful in convincing
ourselves that this just changes concepts, and not functionality. In
particular, what was previously a reservation holder, is now the first
inactive queued qwriter at the lock.

Closes #103361

Release note: None

Co-authored-by: Arul Ajmani <arulajmani@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants