Skip to content

kv: support recovery from retry error with partial rollback under RC#105161

Merged
craig[bot] merged 2 commits intocockroachdb:masterfrom
nvb:nvanbenschoten/errReadCommitted
Aug 16, 2023
Merged

kv: support recovery from retry error with partial rollback under RC#105161
craig[bot] merged 2 commits intocockroachdb:masterfrom
nvb:nvanbenschoten/errReadCommitted

Conversation

@nvb
Copy link
Copy Markdown
Contributor

@nvb nvb commented Jun 20, 2023

Fixes #104233.

This commit adds support for recovering from a retry error without starting a transaction from the beginning under Read Committed isolation.

The commit does so by first adjusting how Read Committed transactions (or any isolation level that uses a per-statement read snapshot) react to retry errors. Instead of immediately advancing their epochs, they simply adjust their read snapshot to the timestamp of the retry error. The commit then introduces a new TxnMustRestartFromBeginning method on TransactionRetryWithProtoRefreshError, to inform callers whether the transaction's writes were discarded due to the retry error, meaning that it must restart from the beginning. When this method returns false, callers are permitted to employ savepoints to selectively discard the writes from the current statement and then call a new Txn.PrepareForPartialRetry API to escape the retryable error state and continue using the transaction.

Release note: None

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@cockroach-teamcity
Copy link
Copy Markdown
Member

cockroach-teamcity commented Aug 10, 2023

CLA assistant check
All committers have signed the CLA.

@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Aug 10, 2023

Thank you for updating your pull request.

Before a member of our team reviews your PR, I have some potential action items for you:

  • We notice you have more than one commit in your PR. We try break logical changes into separate commits, but commits such as "fix typo" or "address review commits" should be squashed into one commit and pushed with --force
  • Please ensure your git commit message contains a release note.
  • When CI has completed, please ensure no errors have appeared.

I have added a few people who may be able to assist in reviewing:

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl blathers-crl bot added O-community Originated from the community X-blathers-triaged blathers was able to find an owner labels Aug 10, 2023
@nvb nvb force-pushed the nvanbenschoten/errReadCommitted branch 2 times, most recently from 75d9fa3 to 92723d6 Compare August 10, 2023 15:50
Copy link
Copy Markdown
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/sender.go line 183 at r7 (raw file):

	// error that will cause the transaction to be retried.
	//
	// The transaction's epoch is bumped and its timestamp is upgraded to the

is it still true that the epoch is bumped with this interface if this is used in a RC transaction? the test i added in #107044 doesn't work right now since it seems that the epoch stays at 0.

Copy link
Copy Markdown
Contributor Author

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rafiss)


pkg/kv/sender.go line 183 at r7 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

is it still true that the epoch is bumped with this interface if this is used in a RC transaction? the test i added in #107044 doesn't work right now since it seems that the epoch stays at 0.

No, this is no longer true, now that we've made GenerateForcedRetryableErr generate a statement-level retryable error for RC transactions. An epoch bump implies that all of the writes in a transaction have been discarded and that the transaction must restart from the very beginning. Transactions can't retry a single statement to get around these.

Copy link
Copy Markdown
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)


pkg/kv/sender.go line 183 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

No, this is no longer true, now that we've made GenerateForcedRetryableErr generate a statement-level retryable error for RC transactions. An epoch bump implies that all of the writes in a transaction have been discarded and that the transaction must restart from the very beginning. Transactions can't retry a single statement to get around these.

ah ok, thanks, that's helpful for me to understand.

since inject_retry_errors_enabled is just a testing knob, i can do something else in the SQL layer to limit it to 3 injections at most for RC txns

@rafiss rafiss removed O-community Originated from the community X-blathers-triaged blathers was able to find an owner labels Aug 11, 2023
@nvb nvb force-pushed the nvanbenschoten/errReadCommitted branch from 5443e8a to 9286fcd Compare August 11, 2023 22:22
@nvb nvb requested a review from arulajmani August 11, 2023 22:25
@nvb nvb marked this pull request as ready for review August 11, 2023 22:27
@nvb nvb requested review from a team as code owners August 11, 2023 22:27
@nvb
Copy link
Copy Markdown
Contributor Author

nvb commented Aug 11, 2023

@arulajmani this should now be good for a review. I've rebased away all of the intermediate cleanup that's already been landed on master, and added a few layers of testing to ensure that the savepoint+retry error dance works correctly. Let me know if there's anything you'd like to talk about.

@rafiss feel free to rebase #107044 on this. There shouldn't be much churn going forward.

@arulajmani
Copy link
Copy Markdown
Collaborator

@arulajmani this should now be good for a review. I've rebased away all of the intermediate cleanup that's already been landed on master, and added a few layers of testing to ensure that the savepoint+retry error dance works correctly. Let me know if there's anything you'd like to talk about.

I'll give this a look Monday morning and maybe bother you with questions if I have any.

@nvb nvb force-pushed the nvanbenschoten/errReadCommitted branch from 9286fcd to 2082462 Compare August 15, 2023 14:06
Copy link
Copy Markdown
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Thanks for walking me through this, made reviewing this a breeze! :shipit:

Reviewed 1 of 8 files at r6, 18 of 18 files at r10, 17 of 17 files at r11, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @rafiss)


pkg/kv/txn.go line 1046 at r11 (raw file):

	if !retryErr.TxnMustRestartFromBeginning() {
		// If the retry error does not require the transaction to restart from

Should we throw in a verbose log in this case?


pkg/kv/kvclient/kvcoord/txn_coord_sender.go line 868 at r11 (raw file):

	tc.mu.storedRetryableErr = retErr

	// If the ID changed, it means we had to start a new transaction and the

nit: If the previous transaction is aborted, it means ...


pkg/kv/kvpb/errors.proto line 479 at r11 (raw file):

  // The ID of the transaction being retried. The client can compare this ID
  // that of the new transaction (below) to check whether the transaction was
  // aborted and will restart from the beginning with a new identity.

This commentary cleanup feels long overdue. Thanks for doing it!

Did we want to preserve some of the old commentary around making sure errors from inner transactions are handled correctly when running nested transactions? On one hand this is pretty niche, but it's also subtle usage we need to guard against. Up to you.


pkg/kv/kvpb/errors.proto line 504 at r11 (raw file):

  // - the same Transaction as before (same ID) with the same epoch as before,
  // but with an incremented read timestamp. The transaction can proceed without
  // the loss of writes, although clients may employ savepoints to selectively

nit: "without the loss of any writes"


pkg/kv/kvnemesis/validator_test.go line 38 at r11 (raw file):

var retryableError = kvpb.NewTransactionRetryWithProtoRefreshError(
	``, uuid.MakeV4(), 0, roachpb.Transaction{})

nit: /* prevTxnEpoch */


pkg/kv/kvpb/data_test.go line 240 at r11 (raw file):

		},
	}
	isolation.RunEachLevel(t, func(t *testing.T, isoLevel isolation.Level) {

I may be missing something, but this isoLevel field looks unused. Did you mean to create the txn inside of this loop and inject it into the test cases?


pkg/kv/kvserver/reports/reporter_test.go line 465 at r11 (raw file):

		var err error
		err = kvpb.NewTransactionRetryWithProtoRefreshError(
			"injected err", uuid.Nil, 0, roachpb.Transaction{})

nit: /* prevTxnEpoch */

nvb added 2 commits August 16, 2023 00:50
…ransaction

Also rename `TxnID` to `PrevTxnID`.

Pure renames.
Fixes cockroachdb#104233.

This commit adds support for recovering from a retry error without starting a
transaction from the beginning under Read Committed isolation.

The commit does so by first adjusting how Read Committed transactions (or any
isolation level that uses a per-statement read snapshot) react to retry errors.
Instead of immediately advancing their epochs, they simply adjust their read
snapshot to the timestamp of the retry error. The commit then introduces a new
`TxnMustRestartFromBeginning` method on `TransactionRetryWithProtoRefreshError`,
to inform callers whether the transaction's writes were discarded due to the
retry error, meaning that it must restart from the beginning. When this method
returns false, callers are permitted to employ savepoints to selectively
discard the writes from the current statement and then call a new
`Txn.PrepareForPartialRetry` API to escape the retryable error state and
continue using the transaction.

Release note: None
@nvb nvb force-pushed the nvanbenschoten/errReadCommitted branch from 2082462 to 1e1a063 Compare August 16, 2023 05:07
Copy link
Copy Markdown
Contributor Author

@nvb nvb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani and @rafiss)


pkg/kv/txn.go line 1046 at r11 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Should we throw in a verbose log in this case?

Done.


pkg/kv/kvclient/kvcoord/txn_coord_sender.go line 868 at r11 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

nit: If the previous transaction is aborted, it means ...

Done.


pkg/kv/kvpb/errors.proto line 479 at r11 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

This commentary cleanup feels long overdue. Thanks for doing it!

Did we want to preserve some of the old commentary around making sure errors from inner transactions are handled correctly when running nested transactions? On one hand this is pretty niche, but it's also subtle usage we need to guard against. Up to you.

I'm ok getting rid of it. I don't think this is real concern these days.


pkg/kv/kvpb/errors.proto line 504 at r11 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

nit: "without the loss of any writes"

Done.


pkg/kv/kvnemesis/validator_test.go line 38 at r11 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

nit: /* prevTxnEpoch */

Done.


pkg/kv/kvpb/data_test.go line 240 at r11 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

I may be missing something, but this isoLevel field looks unused. Did you mean to create the txn inside of this loop and inject it into the test cases?

Yep! Nice catch. Done.


pkg/kv/kvserver/reports/reporter_test.go line 465 at r11 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

nit: /* prevTxnEpoch */

Done.

@nvb
Copy link
Copy Markdown
Contributor Author

nvb commented Aug 16, 2023

bors r=arulajmani

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 16, 2023

Build failed:

@nvb
Copy link
Copy Markdown
Contributor Author

nvb commented Aug 16, 2023

That was green, bors.

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Aug 16, 2023

Build succeeded:

@craig craig bot merged commit 126274a into cockroachdb:master Aug 16, 2023
@nvb nvb deleted the nvanbenschoten/errReadCommitted branch August 16, 2023 14:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

kv: support savepoint rollback recovery from retry error under RC

4 participants