Skip to content

workload/tpcc: optionally disable txn retry loops#117096

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
nvb:nvanbenschoten/tpccNoRetryLoop
Jan 2, 2024
Merged

workload/tpcc: optionally disable txn retry loops#117096
craig[bot] merged 1 commit intocockroachdb:masterfrom
nvb:nvanbenschoten/tpccNoRetryLoop

Conversation

@nvb
Copy link
Copy Markdown
Contributor

@nvb nvb commented Dec 26, 2023

Informs #115191.

This commit adds a --txn-retries flag to tpcc, allowing users of the workload to disable transaction retry loops for 40001 errors.

For Serializable transactions (controllable with --isolation-level), this quickly leads to errors being thrown by the workload. For Read Committed transaction, the workload eventually hits an error due to a lease transfer. Combined with a prototype fix for #61986, Read Committed transactions survive for at least a few minutes without error.

Release note: None

Informs cockroachdb#115191.

This commit adds a `--txn-retries` flag to `tpcc`, allowing users of the
workload to disable transaction retry loops for 40001 errors.

For Serializable transactions (controllable with `--isolation-level`),
this quickly leads to errors being thrown by the workload. For Read
Committed transaction, the workload eventually hits an error due to
a lease transfer. Combined with a prototype fix for cockroachdb#61986, Read
Committed transactions survive for at least a few minutes without
error.

Release note: None
@nvb nvb requested review from arulajmani and michae2 December 26, 2023 22:11
@nvb nvb requested a review from a team as a code owner December 26, 2023 22:11
@nvb nvb requested review from renatolabs and srosenberg and removed request for a team December 26, 2023 22:11
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Collaborator

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Good idea. Should we add a roachtest that tries to run TPC-C under RC without retries as long as possible without an error?

Read Committed transactions survive for at least a few minutes without error.

Hmm. A few minutes is better, of course, but it seems like applications will probably still need to use client-side retries for 23.2? I left a few comments to this effect on cockroachdb/docs#18098 (review)

Reviewed 6 of 6 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani, @nvanbenschoten, @renatolabs, and @srosenberg)


pkg/workload/tpcc/tpcc.go line 910 at r1 (raw file):

	txOpts := pgx.TxOptions{}
	if w.txnRetries {
		return crdbpgx.ExecuteTx(ctx, conn, txOpts, fn)

@rafiss: Do you think we should add a variant of crdbpgx.ExecuteTx that doesn't use client-side retries? Or some kind of configuration helper? I guess there is a crdb.WithMaxRetries, but it assumes 0 to mean "infinite retries".

@nvb
Copy link
Copy Markdown
Contributor Author

nvb commented Jan 2, 2024

TFTR!

bors r=michae2

Should we add a roachtest that tries to run TPC-C under RC without retries as long as possible without an error?

Yes, I think that's a good idea. I mentioned it in #115191.

Hmm. A few minutes is better, of course, but it seems like applications will probably still need to use client-side retries for 23.2?

Sorry, to clarify:

  1. kv: use tscache summary to eliminate txn retries due to lease transfers and range merges #61986 is a non-trivial project and won't make it into v23.2. Without it, certain lease transfers (those on ranges that txns use to store their records) are likely to cause a few retry errors.
  2. with a prototype of kv: use tscache summary to eliminate txn retries due to lease transfers and range merges #61986 that I have, I haven't seen any retry errors in the few minutes of testing (with rapid lease transfers) that I was testing.
  3. give 1 and 2 here, I think we should prioritize this project for v24.1.

Do you think we should add a variant of crdbpgx.ExecuteTx that doesn't use client-side retries? Or some kind of configuration helper? I guess there is a crdb.WithMaxRetries, but it assumes 0 to mean "infinite retries".

I looked into crdb.WithMaxRetries for this, but there was no way to disable all retries.

It's also worth pointing out that when retries are omitted, we can avoid the SAVEPOINT cockroach_restart and RELEASE SAVEPOINT cockroach_restart statements in each transaction, which will provide a (to be quantified) performance win.

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jan 2, 2024

Build succeeded!

And happy new year! 🎉

@craig craig bot merged commit a378e2e into cockroachdb:master Jan 2, 2024
nvb added a commit to nvb/cockroach that referenced this pull request Jan 19, 2024
Closes cockroachdb#115191.
Depends on cockroachdb#61986.

This commit switches the two nightly Read Committed variants of the TPC-C
roachtest to run without transaction retry loops, using the `--txn-retries` flag
introduced in cockroachdb#117096. With cockroachdb#117630 and cockroachdb#61986 resolved (the latter of which
is still in review and under development), these tests both pass.

Release note: None
nvb added a commit to nvb/cockroach that referenced this pull request Jan 24, 2024
Closes cockroachdb#115191.
Depends on cockroachdb#61986.

This commit switches the two nightly Read Committed variants of the TPC-C
roachtest to run without transaction retry loops, using the `--txn-retries` flag
introduced in cockroachdb#117096. With cockroachdb#117630 and cockroachdb#61986 resolved (the latter of which
is still in review and under development), these tests both pass.

Release note: None
craig bot pushed a commit that referenced this pull request Feb 2, 2024
118000: roachtest: run Read Committed variants of TPC-C without txn retry loops r=nvanbenschoten a=nvanbenschoten

Closes #115191.
Depends on #61986.

This commit switches the two nightly Read Committed variants of the TPC-C roachtest to run without transaction retry loops, using the `--txn-retries` flag introduced in #117096. With #117630 and #61986 resolved (the latter of which is still in review and under development), these tests both pass.

Release note: None

118600: gceworker bootstrap: fix checksum for go download r=rickystewart a=jlinder

On the last go version upgrade to cockroach, the wrong checksum was entered in the gceworker bootstrap script for the downloaded tar file. This fixes it to be the correct checksum.

Epic: none
Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: James H. Linder <jamesl@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants