kv: add LockTimeout option to BatchRequest#68026
kv: add LockTimeout option to BatchRequest#68026craig[bot] merged 2 commits intocockroachdb:masterfrom
Conversation
b40ca3a to
38f2d63
Compare
Fixes cockroachdb#67513. In cockroachdb#68026, we added support for a new `lock_timeout` field on the `BatchRequest.Header` struct. A batch's lock_timeout specifies the maximum amount of time that it will wait while attempting to acquire a lock on a key or while blocking on an existing lock in order to perform a non-locking read on a key. The time limit applies separately to each lock acquisition attempt. If the timeout elapses when waiting for a lock, a WriteIntentError is returned. This extension to the KV API allows us to properly support the `lock_timeout` session variable in SQL, which matches these semantics closely. When configured, this session variable aborts any statement that waits longer than the specified amount of time while attempting to acquire a single lock. The only deviation from Postgres' behavior is that the lock_timeout configuration applies to non-locking reads in read-write and read-only transactions. This is because unlike in Postgres, where where non-locking reads do not wait on conflicting locks, in CockroachDB, non-locking reads do wait for conflicting locks to be released. Release note (sql change): The lock_timeout session variable is now supported. The configuration can be used to abort a query with an error if it waits longer than the configured duration while blocking on a single row-level lock acquisition.
sumeerbhola
left a comment
There was a problem hiding this comment.
I haven't read the tests yet. Looks good, with just some small questions.
Reviewed 2 of 2 files at r1, 10 of 18 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @nvanbenschoten, and @sumeerbhola)
pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 299 at r2 (raw file):
ctx, req.LockTimeout, func(ctx context.Context) *Error { return w.pushLockTxn(ctx, req, state) }, func(ctx context.Context) *Error { return w.pushLockTxnAfterTimeout(ctx, req, state) },
If it timed out while doing the push, I don't understand the reason for doing another push with error wait. The probability of success would be low. Is there extra information we need, like the latest state of the pushee?
pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 456 at r2 (raw file):
return doWithTimeoutAndFallback( ctx, req.LockTimeout, func(ctx context.Context) *Error { return w.pushLockTxn(ctx, req, state) },
same question here
pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 777 at r2 (raw file):
// NOTE: we look at timeoutCtx.Err() and not err to avoid the // potential for bugs if context cancellation is not propagated // correctly on some error paths.
So you are worried about err being nil, or err not correctly specifying that it was due to the deadline being exceeded?
pkg/kv/kvserver/concurrency/lock_table_waiter_test.go, line 477 at r1 (raw file):
testutils.RunTrueAndFalse(t, "lockHeld", func(t *testing.T, lockHeld bool) { testutils.RunTrueAndFalse(t, "pusheeActive", func(t *testing.T, pusheeActive bool) { if !lockHeld && !pusheeActive {
can you add a comment like
// !lockHeld means a lock reservation, so is only possible when pusheeActive is true.
38f2d63 to
e976aa8
Compare
nvb
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @aayushshah15, @stevendanna, and @sumeerbhola)
pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 299 at r2 (raw file):
Previously, sumeerbhola wrote…
If it timed out while doing the push, I don't understand the reason for doing another push with error wait. The probability of success would be low. Is there extra information we need, like the latest state of the pushee?
The idea is that on a timeout, we don't know whether the PUSH_TIMESTAMP/PUSH_ABORT actually got to the point of proving that the pushee was active and began waiting in its txnwait.Queue. The push may have timed out before this point. But just like with WaitPolicy_Error, we don't want to throw an error on abandoned locks. So on timeout, we issue a PUSH_TOUCH request (like we do for WaitPolicy_Error) that is not subject to the lock_timeout to check with certainty whether the conflict is active or not, but without blocking if it happens to be active.
If you buy all this, I'll add some more commentary to spell this out in the code.
pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 777 at r2 (raw file):
Previously, sumeerbhola wrote…
So you are worried about err being nil, or err not correctly specifying that it was due to the deadline being exceeded?
The latter. As it turns out, this is also what contextutil.RunWithTimeout does, presumable for the same reason.
pkg/kv/kvserver/concurrency/lock_table_waiter_test.go, line 477 at r1 (raw file):
Previously, sumeerbhola wrote…
can you add a comment like
// !lockHeld means a lock reservation, so is only possible when pusheeActive is true.
Done.
e976aa8 to
a0380c8
Compare
sumeerbhola
left a comment
There was a problem hiding this comment.
Reviewed 18 of 18 files at r6.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @aayushshah15, @nvanbenschoten, @stevendanna, and @sumeerbhola)
pkg/kv/kvserver/concurrency/concurrency_manager_test.go, line 162 at r6 (raw file):
// without a delay by the lockTableWaiter, ensuring that the lock // timeout logic deterministically fires. // See (*lockTableWaiterImpl).timeUntilDeadline.
Does this mean we are not testing the doWithTimeoutAndFallback code?
pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 299 at r2 (raw file):
actually got to the point of proving that the pushee was active and began waiting in its txnwait.Queue
And it may not because of slowness elsewhere like network or cpu overload?
If you buy all this, I'll add some more commentary to spell this out in the code.
sounds good
pkg/kv/kvserver/concurrency/lock_table_waiter_test.go, line 701 at r6 (raw file):
h roachpb.Header, pushType roachpb.PushTxnType, ) (*roachpb.Transaction, *Error) {
never mind my earlier question about testing of the fallback path.
The test was not exercising the case where a pushee was still active during the PUSH_TOUCH issued by requests with an Error wait policy.
Informs cockroachdb#67513. This commit introduces a new `lock_timeout` field on the BatchRequest Header struct. lock_timeout specifies the maximum amount of time that the batch request will wait while attempting to acquire a lock on a key or while blocking on an existing lock in order to perform a non-locking read on a key. The time limit applies separately to each lock acquisition attempt. If the timeout elapses when waiting for a lock, a WriteIntentError will be returned. Unlike in some other systems like PostgreSQL, where non-locking reads do not wait on conflicting locks, in CockroachDB, non-locking reads do wait for conflicting locks to be released. Because of this, the lock_timeout configuration applies to non-locking reads in read-write and read-only transactions as well. Only the (default) Block wait policy will allow a request to wait on conflicting locks, so the timeout only makes sense in conjunction with the Block wait policy. The Error wait policy will throw an error immediately if a conflicting lock held by an active transaction is encountered, so this timeout can never be hit with an Error wait policy. A value of zero disables the timeout.
a0380c8 to
77405a5
Compare
nvb
left a comment
There was a problem hiding this comment.
sorry about the delay
Not a problem at all, you were away for a few weeks and this was a big PR. Thanks for getting through all of it!
bors r+
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15, @stevendanna, and @sumeerbhola)
pkg/kv/kvserver/concurrency/lock_table_waiter.go, line 299 at r2 (raw file):
And it may not because of slowness elsewhere like network or cpu overload?
Exactly. I added some commentary along these lines.
|
Build succeeded: |
Fixes cockroachdb#67513. In cockroachdb#68026, we added support for a new `lock_timeout` field on the `BatchRequest.Header` struct. A batch's lock_timeout specifies the maximum amount of time that it will wait while attempting to acquire a lock on a key or while blocking on an existing lock in order to perform a non-locking read on a key. The time limit applies separately to each lock acquisition attempt. If the timeout elapses when waiting for a lock, a WriteIntentError is returned. This extension to the KV API allows us to properly support the `lock_timeout` session variable in SQL, which matches these semantics closely. When configured, this session variable aborts any statement that waits longer than the specified amount of time while attempting to acquire a single lock. The only deviation from Postgres' behavior is that the lock_timeout configuration applies to non-locking reads in read-write and read-only transactions. This is because unlike in Postgres, where where non-locking reads do not wait on conflicting locks, in CockroachDB, non-locking reads do wait for conflicting locks to be released. Release note (sql change): The lock_timeout session variable is now supported. The configuration can be used to abort a query with an error if it waits longer than the configured duration while blocking on a single row-level lock acquisition.
68042: sql: support lock_timeout session variable r=nvanbenschoten a=nvanbenschoten Fixes #67513. Please ignore the first two commits, which are from #68026. In #68026, we added support for a new `lock_timeout` field on the `BatchRequest.Header` struct. A batch's lock_timeout specifies the maximum amount of time that it will wait while attempting to acquire a lock on a key or while blocking on an existing lock in order to perform a non-locking read on a key. The time limit applies separately to each lock acquisition attempt. If the timeout elapses when waiting for a lock, a WriteIntentError is returned. This extension to the KV API allows us to properly support the `lock_timeout` session variable in SQL, which matches these semantics closely. When configured, this session variable aborts any statement that waits longer than the specified amount of time while attempting to acquire a single lock. The only deviation from Postgres' behavior is that the lock_timeout configuration applies to non-locking reads in read-write and read-only transactions. This is because unlike in Postgres, where where non-locking reads do not wait on conflicting locks, in CockroachDB, non-locking reads do wait for conflicting locks to be released. Release note (sql change): The lock_timeout session variable is now supported. The configuration can be used to abort a query with an error if it waits longer than the configured duration while blocking on a single row-level lock acquisition. @jordanlewis I'm adding you for the review, but feel free to re-assign. Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
68042: sql: support lock_timeout session variable r=nvanbenschoten a=nvanbenschoten Fixes #67513. Please ignore the first two commits, which are from #68026. In #68026, we added support for a new `lock_timeout` field on the `BatchRequest.Header` struct. A batch's lock_timeout specifies the maximum amount of time that it will wait while attempting to acquire a lock on a key or while blocking on an existing lock in order to perform a non-locking read on a key. The time limit applies separately to each lock acquisition attempt. If the timeout elapses when waiting for a lock, a WriteIntentError is returned. This extension to the KV API allows us to properly support the `lock_timeout` session variable in SQL, which matches these semantics closely. When configured, this session variable aborts any statement that waits longer than the specified amount of time while attempting to acquire a single lock. The only deviation from Postgres' behavior is that the lock_timeout configuration applies to non-locking reads in read-write and read-only transactions. This is because unlike in Postgres, where where non-locking reads do not wait on conflicting locks, in CockroachDB, non-locking reads do wait for conflicting locks to be released. Release note (sql change): The lock_timeout session variable is now supported. The configuration can be used to abort a query with an error if it waits longer than the configured duration while blocking on a single row-level lock acquisition. @jordanlewis I'm adding you for the review, but feel free to re-assign. 68313: kv: avoid regressions in closed timestamp sidetransportAccess r=nvanbenschoten a=nvanbenschoten This commit strengthens the guarantees of the `sidetransportAccess`, which now promises that the returned timestamp from its get method will never regress across sequential calls. This is made possible by retaining two closed timestamps in the access - a `cur` closed timestamp that is known to be applied and an optional `next` closed timestamp that is not yet applied. Using two timestamps ensures that we never lose information about an applied closed timestamp, so we can make the stronger guarantee to users of the abstraction that returned timestamp will always increase monotonically across consecutive calls. This was causing flakiness in tests added in #68194. The property-based tests in that PR were asserting (in one form or another) that after a `QueryResolvedTimestamp` request returns a resolved timestamp, a subsequent read-only request at that timestamp will never block on replication (i.e. the closed timestamp) or on conflicting transactions when executed. These tests were very rarely finding this not to be the case. The reason for this was that it was possible for the `sidetransportAccess` to regress across two sequential calls. cc. @andreimatei, but not assigning because you're away. 68553: util/log: start garbage collection routines for old log files r=knz a=rauchenstein This fixes a regression. The call the routine that deletes old log files when max-group-size is configured was incorrectly removed in a refactor, so that functionality was disabled. This change reinstates it. Release note (bug fix): File logs respect max-group-size configuration. Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com> Co-authored-by: Jay Rauchenstein <rauchenstein@cockroachlabs.com>
Informs #67513.
This commit introduces a new
lock_timeoutfield on the BatchRequest Header struct.lock_timeout specifies the maximum amount of time that the batch request will wait while attempting to acquire a lock on a key or while blocking on an existing lock in order to perform a non-locking read on a key. The time limit applies separately to each lock acquisition attempt. If the timeout elapses when waiting for a lock, a WriteIntentError will be returned.
Unlike in some other systems like PostgreSQL, where non-locking reads do not wait on conflicting locks, in CockroachDB, non-locking reads do wait for conflicting locks to be released. Because of this, the lock_timeout configuration applies to non-locking reads in read-write and read-only transactions as well.
Only the (default) Block wait policy will allow a request to wait on conflicting locks, so the timeout only makes sense in conjunction with the Block wait policy. The Error wait policy will throw an error immediately if a conflicting lock held by an active transaction is encountered, so this timeout can never be hit with an Error wait policy.
A value of zero disables the timeout.