sql: add a retry loop for stmts in READ COMMITTED txns#107044
sql: add a retry loop for stmts in READ COMMITTED txns#107044craig[bot] merged 3 commits intocockroachdb:masterfrom
Conversation
1d1d2d9 to
0b0b287
Compare
0b0b287 to
633add0
Compare
fqazi
left a comment
There was a problem hiding this comment.
Reviewed 1 of 1 files at r1, 3 of 3 files at r2, 5 of 5 files at r3, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @nvanbenschoten)
633add0 to
9c30984
Compare
pkg/sql/vars.go
Outdated
| // CockroachDB extension. Configures the maximum number of automatic retries | ||
| // to perform for statements in explicit READ COMMITTED transactions that | ||
| // see a transaction retry error. | ||
| `max_retries_for_read_committed_transactions`: { |
There was a problem hiding this comment.
how about a cluster setting for this purpose?
There was a problem hiding this comment.
i lean towards session variable, since cluster settings are generally intended for usage by operators, and require higher permissions on the cluster. also, in this case, it is reasonable that different workloads may want different values here, since it depends on how much contention they see.
michae2
left a comment
There was a problem hiding this comment.
Nice work!
Reviewed 1 of 1 files at r1, 3 of 3 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @chengxiong-ruan, @fqazi, @nvanbenschoten, and @rafiss)
pkg/sql/conn_executor_exec.go line 963 at r2 (raw file):
if ex.state.mu.txn.IsOpen() && ex.state.mu.txn.IsoLevel() == isolation.ReadCommitted && !ex.implicitTxn() &&
Do we also want to defer to the state machine retry for the first statement of explicit transactions?
pkg/sql/conn_executor_exec.go line 980 at r2 (raw file):
} maxExecCount := 1
Rather than going through this loop codepath in every case, I think it would be cleaner to put the retry loop in a separate conn executor function that wraps dispatchToExecutionEngine and is only called for read committed transactions. That will do two things: (a) get rid of some of the conditional logic, and (b) add a function to the stack that will be visible when debugging which could help us know we're in a read committed transaction.
pkg/sql/conn_executor_exec.go line 987 at r2 (raw file):
} for attemptNum := 0; attemptNum < maxExecCount; attemptNum++ {
Does this loop need some kind of (randomized) backoff? For example, I think two update statements could both conflict with each other and both need to retry repeatedly.
pkg/sql/conn_executor_exec.go line 1005 at r2 (raw file):
break } if !errIsRetriable(maybeRetriableErr) || attemptNum == maxRetries {
Does errIsRetriable return true for any errors that need a full transaction retry (rather than just a statement retry)? If so, we might be performing extra useless statement retries in those cases, when we should go straight to a transaction-level retry.
pkg/sql/vars.go line 2034 at r4 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
i lean towards session variable, since cluster settings are generally intended for usage by operators, and require higher permissions on the cluster. also, in this case, it is reasonable that different workloads may want different values here, since it depends on how much contention they see.
bikeshedding: I understand that "_transactions" refers to the fact that the whole transaction is at read-committed isolation, but find it a little confusing for this setting. I think it would be clearer to leave it off or change it to "_statements".
rafiss
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @chengxiong-ruan, @fqazi, and @nvanbenschoten)
pkg/sql/conn_executor_exec.go line 1019 at r4 (raw file):
} res.SetError(nil) ex.state.mu.txn.PrepareForRetry(ctx)
we should wait for kv: remove TxnCoordSender.PrepareRetryableError, rationalize ManualRestart to complete.
9c30984 to
0c74392
Compare
|
For some reason I can't type into Reviewable, but replying here:
I don't think so - the state machine doesn't do retries if the
Very great idea. Done, and hopefully it's simpler to read now.
Great catch. I've now rebased on top of #105161 so that I can check with
I just removed "_transactions" from the end now. |
0c74392 to
f4f17c6
Compare
f4f17c6 to
2e3268d
Compare
f81d19e to
7d1895f
Compare
4e7f87e to
6be2709
Compare
|
tftrs! bors r+ |
|
Build failed: |
This test ensures that we do not do per-statement retries for READ COMMITTED transactions if results were already sent to the client. Release note: None
6be2709 to
0e57591
Compare
|
bors r+ |
|
Build failed: |
|
flake was: bors r+ |
|
Build failed: |
|
bors r+ |
|
Build succeeded: |
147682: sql: increase observability of automatic statement retries r=yuzefovich a=michae2 In #107044 we added statement-level automatic retries to the conn executor. These retries are used under Read Committed isolation to try to prevent retryable errors from escaping to the application. This PR adds more observability to statement-level retries, in some cases bringing it up to par with transaction-level retries. See individual commits for details. Informs: #145377 Release note: None Co-authored-by: Michael Erickson <michae2@cockroachlabs.com>
In cockroachdb#107044 we added a retry loop for individual statements executing under Read Committed isolation. We chose 10 as the default retry limit, but testing has shown that this is too low for high-contention workloads. (By comparison, automatic transaction retries in conn_fsm are unlimited.) Increase the default limit to 100. Informs: cockroachdb#145377 Release note (sql change): This change increases the default value for the `max_retries_for_read_committed` session variable from 10 to 100. Testing has shown that some high-contention workloads running under Read Committed isolation benefit from more statement retries.
In cockroachdb#107044 we added a retry loop for individual statements executing under Read Committed isolation. We chose 10 as the default retry limit, but testing has shown that this is too low for high-contention workloads. (By comparison, automatic transaction retries in conn_fsm are unlimited.) Increase the default limit to 100. Informs: cockroachdb#145377 Release note (sql change): This change increases the default value for the `max_retries_for_read_committed` session variable from 10 to 100. Testing has shown that some high-contention workloads running under Read Committed isolation benefit from more statement retries.
fixes #100145
This new loop retries individual statements inside an explicit READ
COMMITTED transaction. This is possible because each statement in a READ
COMMITTED transaction has a different read timestamp. The conn executor
already has retry logic for all implicit transactions, and we continue to
use those when possible.
A session setting controls how many retries will be performed for a statement
inside of an explicit READ COMMITTED transaction.
Release note (sql change): Added the
max_retries_for_read_committed session variable. It
defaults to 10, and determines the number of times an individual
statement in an explicit READ COMMITTED transaction will be retried
if it encounters a retriable transaction error.