kv: support recovery from retry error with partial rollback under RC#105161
kv: support recovery from retry error with partial rollback under RC#105161craig[bot] merged 2 commits intocockroachdb:masterfrom
Conversation
f8b3d48 to
75d9fa3
Compare
|
Thank you for updating your pull request. Before a member of our team reviews your PR, I have some potential action items for you:
I have added a few people who may be able to assist in reviewing:
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
75d9fa3 to
92723d6
Compare
rafiss
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/sender.go line 183 at r7 (raw file):
// error that will cause the transaction to be retried. // // The transaction's epoch is bumped and its timestamp is upgraded to the
is it still true that the epoch is bumped with this interface if this is used in a RC transaction? the test i added in #107044 doesn't work right now since it seems that the epoch stays at 0.
nvb
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @rafiss)
pkg/kv/sender.go line 183 at r7 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
is it still true that the epoch is bumped with this interface if this is used in a RC transaction? the test i added in #107044 doesn't work right now since it seems that the epoch stays at 0.
No, this is no longer true, now that we've made GenerateForcedRetryableErr generate a statement-level retryable error for RC transactions. An epoch bump implies that all of the writes in a transaction have been discarded and that the transaction must restart from the very beginning. Transactions can't retry a single statement to get around these.
rafiss
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/kv/sender.go line 183 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
No, this is no longer true, now that we've made
GenerateForcedRetryableErrgenerate a statement-level retryable error for RC transactions. An epoch bump implies that all of the writes in a transaction have been discarded and that the transaction must restart from the very beginning. Transactions can't retry a single statement to get around these.
ah ok, thanks, that's helpful for me to understand.
since inject_retry_errors_enabled is just a testing knob, i can do something else in the SQL layer to limit it to 3 injections at most for RC txns
5443e8a to
9286fcd
Compare
|
@arulajmani this should now be good for a review. I've rebased away all of the intermediate cleanup that's already been landed on master, and added a few layers of testing to ensure that the savepoint+retry error dance works correctly. Let me know if there's anything you'd like to talk about. @rafiss feel free to rebase #107044 on this. There shouldn't be much churn going forward. |
I'll give this a look Monday morning and maybe bother you with questions if I have any. |
9286fcd to
2082462
Compare
arulajmani
left a comment
There was a problem hiding this comment.
Thanks for walking me through this, made reviewing this a breeze! ![]()
Reviewed 1 of 8 files at r6, 18 of 18 files at r10, 17 of 17 files at r11, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten and @rafiss)
pkg/kv/txn.go line 1046 at r11 (raw file):
if !retryErr.TxnMustRestartFromBeginning() { // If the retry error does not require the transaction to restart from
Should we throw in a verbose log in this case?
pkg/kv/kvclient/kvcoord/txn_coord_sender.go line 868 at r11 (raw file):
tc.mu.storedRetryableErr = retErr // If the ID changed, it means we had to start a new transaction and the
nit: If the previous transaction is aborted, it means ...
pkg/kv/kvpb/errors.proto line 479 at r11 (raw file):
// The ID of the transaction being retried. The client can compare this ID // that of the new transaction (below) to check whether the transaction was // aborted and will restart from the beginning with a new identity.
This commentary cleanup feels long overdue. Thanks for doing it!
Did we want to preserve some of the old commentary around making sure errors from inner transactions are handled correctly when running nested transactions? On one hand this is pretty niche, but it's also subtle usage we need to guard against. Up to you.
pkg/kv/kvpb/errors.proto line 504 at r11 (raw file):
// - the same Transaction as before (same ID) with the same epoch as before, // but with an incremented read timestamp. The transaction can proceed without // the loss of writes, although clients may employ savepoints to selectively
nit: "without the loss of any writes"
pkg/kv/kvnemesis/validator_test.go line 38 at r11 (raw file):
var retryableError = kvpb.NewTransactionRetryWithProtoRefreshError( ``, uuid.MakeV4(), 0, roachpb.Transaction{})
nit: /* prevTxnEpoch */
pkg/kv/kvpb/data_test.go line 240 at r11 (raw file):
}, } isolation.RunEachLevel(t, func(t *testing.T, isoLevel isolation.Level) {
I may be missing something, but this isoLevel field looks unused. Did you mean to create the txn inside of this loop and inject it into the test cases?
pkg/kv/kvserver/reports/reporter_test.go line 465 at r11 (raw file):
var err error err = kvpb.NewTransactionRetryWithProtoRefreshError( "injected err", uuid.Nil, 0, roachpb.Transaction{})
nit: /* prevTxnEpoch */
…ransaction Also rename `TxnID` to `PrevTxnID`. Pure renames.
Fixes cockroachdb#104233. This commit adds support for recovering from a retry error without starting a transaction from the beginning under Read Committed isolation. The commit does so by first adjusting how Read Committed transactions (or any isolation level that uses a per-statement read snapshot) react to retry errors. Instead of immediately advancing their epochs, they simply adjust their read snapshot to the timestamp of the retry error. The commit then introduces a new `TxnMustRestartFromBeginning` method on `TransactionRetryWithProtoRefreshError`, to inform callers whether the transaction's writes were discarded due to the retry error, meaning that it must restart from the beginning. When this method returns false, callers are permitted to employ savepoints to selectively discard the writes from the current statement and then call a new `Txn.PrepareForPartialRetry` API to escape the retryable error state and continue using the transaction. Release note: None
2082462 to
1e1a063
Compare
nvb
left a comment
There was a problem hiding this comment.
TFTR!
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @arulajmani and @rafiss)
pkg/kv/txn.go line 1046 at r11 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
Should we throw in a verbose log in this case?
Done.
pkg/kv/kvclient/kvcoord/txn_coord_sender.go line 868 at r11 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
nit: If the previous transaction is aborted, it means ...
Done.
pkg/kv/kvpb/errors.proto line 479 at r11 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
This commentary cleanup feels long overdue. Thanks for doing it!
Did we want to preserve some of the old commentary around making sure errors from inner transactions are handled correctly when running nested transactions? On one hand this is pretty niche, but it's also subtle usage we need to guard against. Up to you.
I'm ok getting rid of it. I don't think this is real concern these days.
pkg/kv/kvpb/errors.proto line 504 at r11 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
nit: "without the loss of any writes"
Done.
pkg/kv/kvnemesis/validator_test.go line 38 at r11 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
nit: /* prevTxnEpoch */
Done.
pkg/kv/kvpb/data_test.go line 240 at r11 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
I may be missing something, but this isoLevel field looks unused. Did you mean to create the txn inside of this loop and inject it into the test cases?
Yep! Nice catch. Done.
pkg/kv/kvserver/reports/reporter_test.go line 465 at r11 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
nit: /* prevTxnEpoch */
Done.
|
bors r=arulajmani |
|
Build failed: |
|
That was green, bors. bors r+ |
|
Build succeeded: |
Fixes #104233.
This commit adds support for recovering from a retry error without starting a transaction from the beginning under Read Committed isolation.
The commit does so by first adjusting how Read Committed transactions (or any isolation level that uses a per-statement read snapshot) react to retry errors. Instead of immediately advancing their epochs, they simply adjust their read snapshot to the timestamp of the retry error. The commit then introduces a new
TxnMustRestartFromBeginningmethod onTransactionRetryWithProtoRefreshError, to inform callers whether the transaction's writes were discarded due to the retry error, meaning that it must restart from the beginning. When this method returns false, callers are permitted to employ savepoints to selectively discard the writes from the current statement and then call a newTxn.PrepareForPartialRetryAPI to escape the retryable error state and continue using the transaction.Release note: None