Skip to content

sql: support lock_timeout session variable#68042

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
nvb:nvanbenschoten/lockTimeoutSQL
Aug 10, 2021
Merged

sql: support lock_timeout session variable#68042
craig[bot] merged 1 commit intocockroachdb:masterfrom
nvb:nvanbenschoten/lockTimeoutSQL

Conversation

@nvb
Copy link
Copy Markdown
Contributor

@nvb nvb commented Jul 25, 2021

Fixes #67513.

Please ignore the first two commits, which are from #68026.

In #68026, we added support for a new lock_timeout field on the BatchRequest.Header struct.

A batch's lock_timeout specifies the maximum amount of time that it will wait while attempting to acquire a lock on a key or while blocking on an existing lock in order to perform a non-locking read on a key. The time limit applies separately to each lock acquisition attempt. If the timeout elapses when waiting for a lock, a WriteIntentError is returned.

This extension to the KV API allows us to properly support the lock_timeout session variable in SQL, which matches these semantics closely. When configured, this session variable aborts any statement that waits longer than the specified amount of time while attempting to acquire a single lock.

The only deviation from Postgres' behavior is that the lock_timeout configuration applies to non-locking reads in read-write and read-only transactions. This is because unlike in Postgres, where where non-locking reads do not wait on conflicting locks, in CockroachDB, non-locking reads do wait for conflicting locks to be released.

Release note (sql change): The lock_timeout session variable is now supported. The configuration can be used to abort a query with an error if it waits longer than the configured duration while blocking on a single row-level lock acquisition.

@jordanlewis I'm adding you for the review, but feel free to re-assign.

@nvb nvb requested review from a team and jordanlewis July 25, 2021 23:17
@nvb nvb requested a review from a team as a code owner July 25, 2021 23:17
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@nvb nvb force-pushed the nvanbenschoten/lockTimeoutSQL branch 3 times, most recently from 9d003c5 to b3fa491 Compare July 26, 2021 07:03
@jordanlewis jordanlewis requested review from a team and removed request for a team July 26, 2021 13:13
@jordanlewis
Copy link
Copy Markdown
Member

jordanlewis commented Jul 26, 2021

Awesome! I feel like most of the salient changes here are in the KV interface, no? I think we'd want a KV reviewer too.

One thing that I'd like to call to @rafiss's attention:

  // lock_timeout specifies the maximum amount of time that the batch request
  // will wait while attempting to acquire a lock on a key or while blocking on
  // an existing lock in order to perform a non-locking read on a key. The time
  // limit applies separately to each lock acquisition attempt. If the timeout
  // elapses when waiting for a lock, a WriteIntentError will be returned.
  //
  // Unlike in some other systems like PostgreSQL, where non-locking reads do
  // not wait on conflicting locks, in CockroachDB, non-locking reads do wait
  // for conflicting locks to be released. Because of this, the lock_timeout
  // configuration applies to non-locking reads in read-write and read-only
  // transactions as well.

This is a different definition than Postgres's lock_timeout. How much trouble do you think that'll put us in?

@nvb
Copy link
Copy Markdown
Contributor Author

nvb commented Jul 26, 2021

I feel like most of the salient changes here are in the KV interface, no? I think we'd want a KV reviewer too.

Absolutely, this cuts across SQL and KV, so we'll want a KV reviewer on the first two commits. I had pulled those out into #68026 so they can be reviewed separately. This PR is really just about the third commit. But also, there's no rush on this, so no need to review before #68026 is merged and the first two commits are rebased away.

This is a different definition than Postgres's lock_timeout. How much trouble do you think that'll put us in?

If anything, I was thinking that not deviating from Postgres here would put us in trouble. It would lead to behavior where the lock_timeout would apply to queries like SELECT * FROM t FOR UPDATE but not to queries like SELECT * FROM t. So the former would abort when stuck on a lock for longer than the timeout, while the latter would hang indefinitely. I'm actually imagining the deviation to be one of the biggest reasons why people will want to use this because people are often surprised that reads block on writes in CRDB. But I'd love to get others' opinions.

@jordanlewis
Copy link
Copy Markdown
Member

Oh, sorry, I ignored your disclaimer about the first 2 commit by accident. Makes sense.

Copy link
Copy Markdown
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a pretty useful feature! The reasoning about the deviation from Postgres sgtm, and the last commit :lgtm:, but I'd like to hear Rafi's opinion about the deviation too.

Reviewed 32 of 32 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis)

@rafiss
Copy link
Copy Markdown
Collaborator

rafiss commented Jul 27, 2021

One thing that I'd like to call to @rafiss's attention:

  // lock_timeout specifies the maximum amount of time that the batch request
  // will wait while attempting to acquire a lock on a key or while blocking on
  // an existing lock in order to perform a non-locking read on a key. The time
  // limit applies separately to each lock acquisition attempt. If the timeout
  // elapses when waiting for a lock, a WriteIntentError will be returned.
  //
  // Unlike in some other systems like PostgreSQL, where non-locking reads do
  // not wait on conflicting locks, in CockroachDB, non-locking reads do wait
  // for conflicting locks to be released. Because of this, the lock_timeout
  // configuration applies to non-locking reads in read-write and read-only
  // transactions as well.

This is a different definition than Postgres's lock_timeout. How much trouble do you think that'll put us in?

Is the concern that people may be surprised to see non-locking read transactions get cancelled due to lock_timeout?

I don't know either way if this is a significant/painful divergence. My personal feeling based on no data is that this divergence is acceptable.

One thing that might go a long way to making it more acceptable is if we make two changes in the message:

  • a different message for the two cases you describe: i.e., "attempting to acquire a lock" versus "blocking on an existing lock for a non-locking read"
  • return an error hint that links to documentation about CockroachDB locking behavior. or even better, documentation about what the user should do to resolve this problem.

Fixes cockroachdb#67513.

In cockroachdb#68026, we added support for a new `lock_timeout` field on the
`BatchRequest.Header` struct.

A batch's lock_timeout specifies the maximum amount of time that it will
wait while attempting to acquire a lock on a key or while blocking on an
existing lock in order to perform a non-locking read on a key. The time
limit applies separately to each lock acquisition attempt. If the
timeout elapses when waiting for a lock, a WriteIntentError is returned.

This extension to the KV API allows us to properly support the
`lock_timeout` session variable in SQL, which matches these semantics
closely. When configured, this session variable aborts any statement
that waits longer than the specified amount of time while attempting to
acquire a single lock.

The only deviation from Postgres' behavior is that the lock_timeout
configuration applies to non-locking reads in read-write and read-only
transactions. This is because unlike in Postgres, where where
non-locking reads do not wait on conflicting locks, in CockroachDB,
non-locking reads do wait for conflicting locks to be released.

Release note (sql change): The lock_timeout session variable is now
supported. The configuration can be used to abort a query with an error
if it waits longer than the configured duration while blocking on a
single row-level lock acquisition.
@nvb nvb force-pushed the nvanbenschoten/lockTimeoutSQL branch from b3fa491 to 38da713 Compare August 10, 2021 03:08
@nvb
Copy link
Copy Markdown
Contributor Author

nvb commented Aug 10, 2021

Is the concern that people may be surprised to see non-locking read transactions get cancelled due to lock_timeout?

My guess is that no more than those who may be surprised to see non-locking reads block on locks. And once people understand that non-locking reads block on locks (which isn't crazy in serializable isolation, it's just not how PG works), they'll be happy that this setting applies there as well.

a different message for the two cases you describe: i.e., "attempting to acquire a lock" versus "blocking on an existing lock for a non-locking read"

Good idea. I removed the "while locking row" language of the error message and made it more general.

return an error hint that links to documentation about CockroachDB locking behavior. or even better, documentation about what the user should do to resolve this problem.

This is also a great idea! Unfortunately, it doesn't look like we have any documentation about CockroachDB's locking behavior. cc. @rmloveland.

bors r+

craig bot pushed a commit that referenced this pull request Aug 10, 2021
68042: sql: support lock_timeout session variable r=nvanbenschoten a=nvanbenschoten

Fixes #67513.

Please ignore the first two commits, which are from #68026.

In #68026, we added support for a new `lock_timeout` field on the `BatchRequest.Header` struct. 

A batch's lock_timeout specifies the maximum amount of time that it will wait while attempting to acquire a lock on a key or while blocking on an existing lock in order to perform a non-locking read on a key. The time limit applies separately to each lock acquisition attempt. If the timeout elapses when waiting for a lock, a WriteIntentError is returned.

This extension to the KV API allows us to properly support the `lock_timeout` session variable in SQL, which matches these semantics closely. When configured, this session variable aborts any statement that waits longer than the specified amount of time while attempting to acquire a single lock.

The only deviation from Postgres' behavior is that the lock_timeout configuration applies to non-locking reads in read-write and read-only transactions. This is because unlike in Postgres, where where non-locking reads do not wait on conflicting locks, in CockroachDB, non-locking reads do wait for conflicting locks to be released.

Release note (sql change): The lock_timeout session variable is now supported. The configuration can be used to abort a query with an error if it waits longer than the configured duration while blocking on a single row-level lock acquisition.

@jordanlewis I'm adding you for the review, but feel free to re-assign.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 10, 2021

Timed out.

@nvb
Copy link
Copy Markdown
Contributor Author

nvb commented Aug 10, 2021

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 10, 2021

Build succeeded:

@craig craig bot merged commit 679ee6e into cockroachdb:master Aug 10, 2021
@nvb nvb deleted the nvanbenschoten/lockTimeoutSQL branch August 10, 2021 20:55
@rmloveland
Copy link
Copy Markdown
Collaborator

Unfortunately, it doesn't look like we have any documentation about CockroachDB's locking behavior

@nvanbenschoten we have some stuff on concurrency control that was added to the Transaction Layer architecture docs in cockroachdb/docs#7020

But given that this PR adds a session var, do you mean that we should add something focused on SQL users? E.g., something similar to the PG locking docs? If so I can create an issue, just wanted to be sure I understand ~what is needed

Right now I think the only things on locking we have in SQL user docs are a few mentions scattered around Indexes, Oracle migration, and the SELECT FOR UPDATE docs.

@nvb
Copy link
Copy Markdown
Contributor Author

nvb commented Aug 13, 2021

E.g., something similar to the PG locking docs? If so I can create an issue, just wanted to be sure I understand ~what is needed

I think this would be a fantastic addition! I also think our discussion about "Transactions, locking, and concurrency control" on the Oracle migration doc is great. Have we considered adding a similar discussion to the Postgres migration page? I imagine that that would be a useful resource for people comparing Cockroach's behavior to Postgres'.

But given that this PR adds a session var, do you mean that we should add something focused on SQL users?

That all said, I don't think improving the documentation on locking is a firm prerequisite to documenting this new session var.

@rmloveland
Copy link
Copy Markdown
Collaborator

rmloveland commented Aug 23, 2021 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sql/kv: implement lock_timeout session variable

6 participants