kv: add LockTimeout option to BatchRequest by nvb · Pull Request #68026 · cockroachdb/cockroach

nvb · 2021-07-24T02:07:07Z

Informs #67513.

This commit introduces a new lock_timeout field on the BatchRequest Header struct.

lock_timeout specifies the maximum amount of time that the batch request will wait while attempting to acquire a lock on a key or while blocking on an existing lock in order to perform a non-locking read on a key. The time limit applies separately to each lock acquisition attempt. If the timeout elapses when waiting for a lock, a WriteIntentError will be returned.

Unlike in some other systems like PostgreSQL, where non-locking reads do not wait on conflicting locks, in CockroachDB, non-locking reads do wait for conflicting locks to be released. Because of this, the lock_timeout configuration applies to non-locking reads in read-write and read-only transactions as well.

Only the (default) Block wait policy will allow a request to wait on conflicting locks, so the timeout only makes sense in conjunction with the Block wait policy. The Error wait policy will throw an error immediately if a conflicting lock held by an active transaction is encountered, so this timeout can never be hit with an Error wait policy.

A value of zero disables the timeout.

cockroach-teamcity · 2021-07-24T02:07:18Z

This change is

Fixes cockroachdb#67513. In cockroachdb#68026, we added support for a new `lock_timeout` field on the `BatchRequest.Header` struct. A batch's lock_timeout specifies the maximum amount of time that it will wait while attempting to acquire a lock on a key or while blocking on an existing lock in order to perform a non-locking read on a key. The time limit applies separately to each lock acquisition attempt. If the timeout elapses when waiting for a lock, a WriteIntentError is returned. This extension to the KV API allows us to properly support the `lock_timeout` session variable in SQL, which matches these semantics closely. When configured, this session variable aborts any statement that waits longer than the specified amount of time while attempting to acquire a single lock. The only deviation from Postgres' behavior is that the lock_timeout configuration applies to non-locking reads in read-write and read-only transactions. This is because unlike in Postgres, where where non-locking reads do not wait on conflicting locks, in CockroachDB, non-locking reads do wait for conflicting locks to be released. Release note (sql change): The lock_timeout session variable is now supported. The configuration can be used to abort a query with an error if it waits longer than the configured duration while blocking on a single row-level lock acquisition.

sumeerbhola

I haven't read the tests yet. Looks good, with just some small questions.

Reviewed 2 of 2 files at r1, 10 of 18 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @nvanbenschoten, and @sumeerbhola)

pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 299 at r2 (raw file):

						ctx, req.LockTimeout,
						func(ctx context.Context) *Error { return w.pushLockTxn(ctx, req, state) },
						func(ctx context.Context) *Error { return w.pushLockTxnAfterTimeout(ctx, req, state) },

If it timed out while doing the push, I don't understand the reason for doing another push with error wait. The probability of success would be low. Is there extra information we need, like the latest state of the pushee?

pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 456 at r2 (raw file):

		return doWithTimeoutAndFallback(
			ctx, req.LockTimeout,
			func(ctx context.Context) *Error { return w.pushLockTxn(ctx, req, state) },

same question here

pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 777 at r2 (raw file):

		// NOTE: we look at timeoutCtx.Err() and not err to avoid the
		// potential for bugs if context cancellation is not propagated
		// correctly on some error paths.

So you are worried about err being nil, or err not correctly specifying that it was due to the deadline being exceeded?

pkg/kv/kvserver/concurrency/lock_table_waiter_test.go, line 477 at r1 (raw file):

	testutils.RunTrueAndFalse(t, "lockHeld", func(t *testing.T, lockHeld bool) {
		testutils.RunTrueAndFalse(t, "pusheeActive", func(t *testing.T, pusheeActive bool) {
			if !lockHeld && !pusheeActive {

can you add a comment like
// !lockHeld means a lock reservation, so is only possible when pusheeActive is true.

nvb

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @stevendanna, and @sumeerbhola)

pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 299 at r2 (raw file):

Previously, sumeerbhola wrote…

If it timed out while doing the push, I don't understand the reason for doing another push with error wait. The probability of success would be low. Is there extra information we need, like the latest state of the pushee?

The idea is that on a timeout, we don't know whether the PUSH_TIMESTAMP/PUSH_ABORT actually got to the point of proving that the pushee was active and began waiting in its txnwait.Queue. The push may have timed out before this point. But just like with WaitPolicy_Error, we don't want to throw an error on abandoned locks. So on timeout, we issue a PUSH_TOUCH request (like we do for WaitPolicy_Error) that is not subject to the lock_timeout to check with certainty whether the conflict is active or not, but without blocking if it happens to be active.

If you buy all this, I'll add some more commentary to spell this out in the code.

pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 777 at r2 (raw file):

Previously, sumeerbhola wrote…

So you are worried about err being nil, or err not correctly specifying that it was due to the deadline being exceeded?

The latter. As it turns out, this is also what contextutil.RunWithTimeout does, presumable for the same reason.

pkg/kv/kvserver/concurrency/lock_table_waiter_test.go, line 477 at r1 (raw file):

Previously, sumeerbhola wrote…

can you add a comment like
// !lockHeld means a lock reservation, so is only possible when pusheeActive is true.

Done.

sumeerbhola

sorry about the delay

Reviewed 18 of 18 files at r6.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15, @nvanbenschoten, @stevendanna, and @sumeerbhola)

pkg/kv/kvserver/concurrency/concurrency_manager_test.go, line 162 at r6 (raw file):

					// without a delay by the lockTableWaiter, ensuring that the lock
					// timeout logic deterministically fires.
					// See (*lockTableWaiterImpl).timeUntilDeadline.

Does this mean we are not testing the doWithTimeoutAndFallback code?

pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 299 at r2 (raw file):

actually got to the point of proving that the pushee was active and began waiting in its txnwait.Queue

And it may not because of slowness elsewhere like network or cpu overload?

If you buy all this, I'll add some more commentary to spell this out in the code.

sounds good

pkg/kv/kvserver/concurrency/lock_table_waiter_test.go, line 701 at r6 (raw file):

					h roachpb.Header,
					pushType roachpb.PushTxnType,
				) (*roachpb.Transaction, *Error) {

never mind my earlier question about testing of the fallback path.

The test was not exercising the case where a pushee was still active during the PUSH_TOUCH issued by requests with an Error wait policy.

Informs cockroachdb#67513. This commit introduces a new `lock_timeout` field on the BatchRequest Header struct. lock_timeout specifies the maximum amount of time that the batch request will wait while attempting to acquire a lock on a key or while blocking on an existing lock in order to perform a non-locking read on a key. The time limit applies separately to each lock acquisition attempt. If the timeout elapses when waiting for a lock, a WriteIntentError will be returned. Unlike in some other systems like PostgreSQL, where non-locking reads do not wait on conflicting locks, in CockroachDB, non-locking reads do wait for conflicting locks to be released. Because of this, the lock_timeout configuration applies to non-locking reads in read-write and read-only transactions as well. Only the (default) Block wait policy will allow a request to wait on conflicting locks, so the timeout only makes sense in conjunction with the Block wait policy. The Error wait policy will throw an error immediately if a conflicting lock held by an active transaction is encountered, so this timeout can never be hit with an Error wait policy. A value of zero disables the timeout.

nvb

sorry about the delay

Not a problem at all, you were away for a few weeks and this was a big PR. Thanks for getting through all of it!

bors r+

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15, @stevendanna, and @sumeerbhola)

pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 299 at r2 (raw file):

And it may not because of slowness elsewhere like network or cpu overload?

Exactly. I added some commentary along these lines.

craig · 2021-08-10T02:43:04Z

Build succeeded:

GitHub CI (Cockroach)

Fixes cockroachdb#67513. In cockroachdb#68026, we added support for a new `lock_timeout` field on the `BatchRequest.Header` struct. A batch's lock_timeout specifies the maximum amount of time that it will wait while attempting to acquire a lock on a key or while blocking on an existing lock in order to perform a non-locking read on a key. The time limit applies separately to each lock acquisition attempt. If the timeout elapses when waiting for a lock, a WriteIntentError is returned. This extension to the KV API allows us to properly support the `lock_timeout` session variable in SQL, which matches these semantics closely. When configured, this session variable aborts any statement that waits longer than the specified amount of time while attempting to acquire a single lock. The only deviation from Postgres' behavior is that the lock_timeout configuration applies to non-locking reads in read-write and read-only transactions. This is because unlike in Postgres, where where non-locking reads do not wait on conflicting locks, in CockroachDB, non-locking reads do wait for conflicting locks to be released. Release note (sql change): The lock_timeout session variable is now supported. The configuration can be used to abort a query with an error if it waits longer than the configured duration while blocking on a single row-level lock acquisition.

@jordanlewis

68042: sql: support lock_timeout session variable r=nvanbenschoten a=nvanbenschoten Fixes #67513. Please ignore the first two commits, which are from #68026. In #68026, we added support for a new `lock_timeout` field on the `BatchRequest.Header` struct. A batch's lock_timeout specifies the maximum amount of time that it will wait while attempting to acquire a lock on a key or while blocking on an existing lock in order to perform a non-locking read on a key. The time limit applies separately to each lock acquisition attempt. If the timeout elapses when waiting for a lock, a WriteIntentError is returned. This extension to the KV API allows us to properly support the `lock_timeout` session variable in SQL, which matches these semantics closely. When configured, this session variable aborts any statement that waits longer than the specified amount of time while attempting to acquire a single lock. The only deviation from Postgres' behavior is that the lock_timeout configuration applies to non-locking reads in read-write and read-only transactions. This is because unlike in Postgres, where where non-locking reads do not wait on conflicting locks, in CockroachDB, non-locking reads do wait for conflicting locks to be released. Release note (sql change): The lock_timeout session variable is now supported. The configuration can be used to abort a query with an error if it waits longer than the configured duration while blocking on a single row-level lock acquisition. @jordanlewis I'm adding you for the review, but feel free to re-assign. Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>

@jordanlewis

68042: sql: support lock_timeout session variable r=nvanbenschoten a=nvanbenschoten Fixes #67513. Please ignore the first two commits, which are from #68026. In #68026, we added support for a new `lock_timeout` field on the `BatchRequest.Header` struct. A batch's lock_timeout specifies the maximum amount of time that it will wait while attempting to acquire a lock on a key or while blocking on an existing lock in order to perform a non-locking read on a key. The time limit applies separately to each lock acquisition attempt. If the timeout elapses when waiting for a lock, a WriteIntentError is returned. This extension to the KV API allows us to properly support the `lock_timeout` session variable in SQL, which matches these semantics closely. When configured, this session variable aborts any statement that waits longer than the specified amount of time while attempting to acquire a single lock. The only deviation from Postgres' behavior is that the lock_timeout configuration applies to non-locking reads in read-write and read-only transactions. This is because unlike in Postgres, where where non-locking reads do not wait on conflicting locks, in CockroachDB, non-locking reads do wait for conflicting locks to be released. Release note (sql change): The lock_timeout session variable is now supported. The configuration can be used to abort a query with an error if it waits longer than the configured duration while blocking on a single row-level lock acquisition. @jordanlewis I'm adding you for the review, but feel free to re-assign. 68313: kv: avoid regressions in closed timestamp sidetransportAccess r=nvanbenschoten a=nvanbenschoten This commit strengthens the guarantees of the `sidetransportAccess`, which now promises that the returned timestamp from its get method will never regress across sequential calls. This is made possible by retaining two closed timestamps in the access - a `cur` closed timestamp that is known to be applied and an optional `next` closed timestamp that is not yet applied. Using two timestamps ensures that we never lose information about an applied closed timestamp, so we can make the stronger guarantee to users of the abstraction that returned timestamp will always increase monotonically across consecutive calls. This was causing flakiness in tests added in #68194. The property-based tests in that PR were asserting (in one form or another) that after a `QueryResolvedTimestamp` request returns a resolved timestamp, a subsequent read-only request at that timestamp will never block on replication (i.e. the closed timestamp) or on conflicting transactions when executed. These tests were very rarely finding this not to be the case. The reason for this was that it was possible for the `sidetransportAccess` to regress across two sequential calls. cc. @andreimatei, but not assigning because you're away. 68553: util/log: start garbage collection routines for old log files r=knz a=rauchenstein This fixes a regression. The call the routine that deletes old log files when max-group-size is configured was incorrectly removed in a refactor, so that functionality was disabled. This change reinstates it. Release note (bug fix): File logs respect max-group-size configuration. Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com> Co-authored-by: Jay Rauchenstein <rauchenstein@cockroachlabs.com>

nvb requested review from andreimatei and sumeerbhola July 24, 2021 02:07

nvb force-pushed the nvanbenschoten/lockTimeout branch 3 times, most recently from b40ca3a to 38f2d63 Compare July 25, 2021 04:19

nvb mentioned this pull request Jul 25, 2021

sql: support lock_timeout session variable #68042

Merged

nvb requested review from aayushshah15 and removed request for andreimatei July 28, 2021 19:30

sumeerbhola requested changes Aug 3, 2021

View reviewed changes

nvb force-pushed the nvanbenschoten/lockTimeout branch from 38f2d63 to e976aa8 Compare August 4, 2021 17:13

nvb requested review from a team as code owners August 4, 2021 17:13

nvb requested review from stevendanna and removed request for a team August 4, 2021 17:13

nvb commented Aug 4, 2021

View reviewed changes

nvb force-pushed the nvanbenschoten/lockTimeout branch from e976aa8 to a0380c8 Compare August 4, 2021 19:11

sumeerbhola approved these changes Aug 9, 2021

View reviewed changes

nvb added 2 commits August 9, 2021 18:46

kv: add active pushee variants TestLockTableWaiterWithErrorWaitPolicy

5dbc576

The test was not exercising the case where a pushee was still active during the PUSH_TOUCH issued by requests with an Error wait policy.

nvb force-pushed the nvanbenschoten/lockTimeout branch from a0380c8 to 77405a5 Compare August 9, 2021 23:00

nvb commented Aug 10, 2021

View reviewed changes

craig bot merged commit 847514d into cockroachdb:master Aug 10, 2021

nvb deleted the nvanbenschoten/lockTimeout branch August 10, 2021 02:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kv: add LockTimeout option to BatchRequest#68026

kv: add LockTimeout option to BatchRequest#68026
craig[bot] merged 2 commits intocockroachdb:masterfrom
nvb:nvanbenschoten/lockTimeout

nvb commented Jul 24, 2021

Uh oh!

cockroach-teamcity commented Jul 24, 2021

Uh oh!

sumeerbhola left a comment

Uh oh!

nvb left a comment

Uh oh!

sumeerbhola left a comment

Uh oh!

nvb left a comment

Uh oh!

craig bot commented Aug 10, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

nvb commented Jul 24, 2021

Uh oh!

cockroach-teamcity commented Jul 24, 2021

Uh oh!

sumeerbhola left a comment

Choose a reason for hiding this comment

Uh oh!

nvb left a comment

Choose a reason for hiding this comment

Uh oh!

sumeerbhola left a comment

Choose a reason for hiding this comment

Uh oh!

nvb left a comment

Choose a reason for hiding this comment

Uh oh!

craig bot commented Aug 10, 2021

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants