kv: preemptively refresh transaction timestamps #52884
craig[bot] merged 2 commits into cockroachdb:master
Conversation
Force-pushed from 1ae783e to 924c4ba
Fixes cockroachdb#51294. First two commits from cockroachdb#52884.

This commit updates the txnSpanRefresher to split off EndTxn requests into their own partial batches on auto-retries after successful refreshes as a means of preventing starvation.

This avoids starvation in two ways. First, it helps ensure that we lay down intents if any of the other requests in the batch are writes. Second, it ensures that if any writes are getting pushed due to contention with reads or due to the closed timestamp, they will still succeed and allow the batch to make forward progress. Without this, each retry attempt may get pushed because of writes in the batch and then rejected wholesale when the EndTxn tries to evaluate the pushed batch. When split, the writes will be pushed but succeed, the transaction will be refreshed, and the EndTxn will succeed.

I still need to confirm that this fixes the indefinite stall described [here](https://github.com/cockroachlabs/misc_projects_glenn/tree/master/rw_blockage#implicit-query-hangs--explict-query-works), but I suspect that it will.

Release note (bug fix): A change in v20.1 caused a certain class of bulk UPDATE and DELETE statements to hang indefinitely if run in an implicit transaction. We now break up these statements to avoid starvation and prevent them from hanging indefinitely.
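For intuition, the split described above can be sketched with hypothetical, simplified types; `Request` and `splitEndTxn` are illustrative names, not the actual roachpb/kvcoord API:

```go
package main

import "fmt"

// Request is a toy stand-in for a roachpb request in a batch.
type Request struct {
	Name     string
	IsEndTxn bool
}

// splitEndTxn illustrates the split: everything before the committing
// EndTxn is sent first (so writes get pushed but still succeed), then the
// transaction is refreshed, then the EndTxn is sent on its own.
func splitEndTxn(ba []Request) (prefix, suffix []Request) {
	for i, r := range ba {
		if r.IsEndTxn {
			return ba[:i], ba[i:]
		}
	}
	return ba, nil
}

func main() {
	ba := []Request{{Name: "Put"}, {Name: "Put"}, {Name: "EndTxn", IsEndTxn: true}}
	prefix, suffix := splitEndTxn(ba)
	fmt.Println(len(prefix), len(suffix)) // 2 1
}
```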
andreimatei
left a comment
Good stuff
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)
pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go, line 168 at r2 (raw file):
```go
ba.CanForwardReadTimestamp = sr.canForwardReadTimestampWithoutRefresh(ba.Txn)
// If we know that the transaction will need a refresh at some point because
```
As long as we're writing a long comment here (which I hope can be hidden inside maybeRefreshPreemptively(); see the following comment), mind also putting a TODO(andrei) explaining that whether or not we can still auto-retry at the SQL level should also play a role in deciding whether we want to refresh eagerly or not.
pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go, line 395 at r2 (raw file):
```go
	retryErr = preemptiveWriteTooOldErr
}
pErr := roachpb.NewErrorWithTxn(retryErr, ba.Txn)
```
I find it weird that we're creating this error here before we know that we need it (although I appreciate that we don't allocate). I'm also bothered by the fact that only a small part of CanTransactionRetryAtRefreshedTimestamp() seems to apply; that function doesn't seem to fit well here. For one, the no-op handling of the synthetic WriteTooOldError makes the timestamp flow hard to follow I think. I don't think this function should care about "write too old" at all.
I also find the splitting of responsibilities between maybeRefreshPreemptively and its caller pretty surprising: the caller checks that we'd want to refresh, and then this guy checks whether we can refresh.
I would try to do a Family Guy window blinds pulling a bit to see if it'd look better with this function taking on the checking of whether we want to refresh, and on the other hand I'd leave the error creation to the caller. This method could return OK or CommitHopelessBecauseRefreshIsNeededButImpossible. Or I dunno, I'm less sure about where the error creation should go, but one way or another I do think it should go after tryUpdatingTxnSpans returns false.
And then we don't have to argue about whether this function should be called maybe or try or maybeTry.
pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go, line 426 at r2 (raw file):
```go
defer func() {
	if ok {
		sr.refreshSuccess.Inc(1)
```
I don't think we should increment this when there was nothing to refresh
pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher_test.go, line 379 at r2 (raw file):
```go
// it observes a batch containing an EndTxn request that will necessarily throw
// a serializable error.
func TestTxnSpanrefresherPreemptiveRefresh(t *testing.T) {
```
s/Spanrefresher/SpanRefresher
pkg/kv/kvclient/kvcoord/txn_metrics.go, line 76 at r2 (raw file):
```go
metaRefreshSuccess = metric.Metadata{
	Name: "txn.refresh.success",
	Help: "Number of successful refreshes",
```
pls add to the help to link to txn.refresh.auto_retries and explain that some refreshes are preemptive and don't lead to a retry.
pkg/kv/kvclient/kvcoord/txn_metrics.go, line 105 at r2 (raw file):
```go
metaRefreshAutoRetries = metric.Metadata{
	Name: "txn.refresh.auto_retries",
	Help: "Number of batch retries after successful refreshes",
```
nit: s/batch/request
Force-pushed from 924c4ba to e6ef4f5
nvb
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)
pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go, line 168 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
As long as we're writing a long comment here (which I hope can be hidden inside maybeRefreshPreemptively(); see the following comment), mind also putting a TODO(andrei) explaining that whether or not we can still auto-retry at the SQL level should also play a role in deciding whether we want to refresh eagerly or not.
Done.
pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go, line 395 at r2 (raw file):
For one, the no-op handling of the synthetic WriteTooOldError makes the timestamp flow hard to follow I think. I don't think this function should care about "write too old" at all.
I don't quite understand what you mean here. This caller is guaranteed to hit the TransactionRetryError case, not the WriteTooOldError case.
I also find the splitting of responsibilities between maybeRefreshPreemptively and its caller pretty surprising: the caller checks that we'd want to refresh, and then this guy checks whether we can refresh.
I would try to do a Family Guy window blinds pulling a bit to see if it'd look better with this function taking on the checking of whether we want to refresh, and on the other hand I'd leave the error creation to the caller. This method could return OK or CommitHopelessBecauseRefreshIsNeededButImpossible. Or I dunno, I'm less sure about where the error creation should go, but one way or another I do think it should go after tryUpdatingTxnSpans returns false.
And then we don't have to argue about whether this function should be called maybe or try or maybeTry.
Done.
pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go, line 426 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I don't think we should increment this when there was nothing to refresh
We were before. You think we should change behavior? I tend to disagree because this still fits the definition of a "refresh". If we want to see how many refresh requests were sent, we have a metric for that too.
pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher_test.go, line 379 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
s/Spanrefresher/SpanRefresher
Done.
pkg/kv/kvclient/kvcoord/txn_metrics.go, line 76 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
pls add to the help to link to txn.refresh.auto_retries and explain that some refreshes are preemptive and don't lead to a retry.
Done.
pkg/kv/kvclient/kvcoord/txn_metrics.go, line 105 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
nit: s/batch/request
Done.
andreimatei
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)
pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go, line 395 at r2 (raw file):
For one, the no-op handling of the synthetic WriteTooOldError makes the timestamp flow hard to follow I think. I don't think this function should care about "write too old" at all.
I don't quite understand what you mean here. This caller is guaranteed to hit the TransactionRetryError case, not the WriteTooOldError case.
Then I'm even more bothered by calling that fat function. But now that I look more, I think the preemptiveWriteTooOldErr can't be triggered because this interceptor terminates the WriteTooOld flag. Let's just assert that ba.Txn.WriteTooOld is not set, like we do in sendLockedWithRefreshAttempts.
Even if we get rid of preemptiveWriteTooOldErr, I'd then still prefer to get rid of preemptiveSerializableErr too; I would break CanTransactionRetryAtRefreshedTimestamp() into two: an inner function that takes a retriable error and gives you back the timestamp that the error wants the client to retry at, and an outer function that takes an error and a timestamp and prepares the transaction for retrying at that given timestamp (i.e.
```go
func prepareForRetry(txn, timestamp) (bool, txn) {
	if txn.CommitTimestampFixed {
		return false, txn
	}
	newTxn := txn.Clone()
	newTxn.WriteTimestamp.Forward(timestamp)
	newTxn.ReadTimestamp.Forward(newTxn.WriteTimestamp)
	newTxn.WriteTooOld = false
	return true, newTxn
}
```
maybeRefreshPreemptively() would then do
txn, ok := prepareForRetry(txn, ba.Txn.WriteTime)
The handling of write timestamp is just too obfuscated through the pErr := roachpb.NewErrorWithTxn(retryErr, ba.Txn) line, for no good reason IMO.
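A runnable, self-contained version of the prepareForRetry split suggested above might look like the following. This is a sketch under simplifying assumptions: int64 timestamps stand in for hlc.Timestamp, a value copy stands in for Clone(), and the types are illustrative rather than the actual roachpb code.

```go
package main

import "fmt"

// txn is a toy stand-in for roachpb.Transaction.
type txn struct {
	ReadTimestamp        int64
	WriteTimestamp       int64
	WriteTooOld          bool
	CommitTimestampFixed bool
}

// prepareForRetry forwards the transaction's timestamps to the retry
// timestamp and clears the WriteTooOld flag, refusing if the commit
// timestamp can no longer move.
func prepareForRetry(t txn, timestamp int64) (txn, bool) {
	if t.CommitTimestampFixed {
		return t, false
	}
	newTxn := t // value copy plays the role of Clone()
	if timestamp > newTxn.WriteTimestamp {
		newTxn.WriteTimestamp = timestamp
	}
	if newTxn.WriteTimestamp > newTxn.ReadTimestamp {
		newTxn.ReadTimestamp = newTxn.WriteTimestamp
	}
	newTxn.WriteTooOld = false
	return newTxn, true
}

func main() {
	newTxn, ok := prepareForRetry(txn{ReadTimestamp: 5, WriteTimestamp: 8, WriteTooOld: true}, 12)
	fmt.Println(ok, newTxn.ReadTimestamp, newTxn.WriteTimestamp, newTxn.WriteTooOld) // true 12 12 false
}
```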
pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go, line 426 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
We were before. You think we should change behavior? I tend to disagree because this still fits the definition of a "refresh". If we want to see how many refresh requests were sent, we have a metric for that too.
ok
Force-pushed from e6ef4f5 to 92ee907
nvb
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)
pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go, line 395 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
For one, the no-op handling of the synthetic WriteTooOldError makes the timestamp flow hard to follow I think. I don't think this function should care about "write too old" at all.
I don't quite understand what you mean here. This caller is guaranteed to hit the TransactionRetryError case, not the WriteTooOldError case.
Then I'm even more bothered by calling that fat function. But now that I look more, I think the preemptiveWriteTooOldErr can't be triggered because this interceptor terminates the WriteTooOld flag. Let's just assert that ba.Txn.WriteTooOld is not set, like we do in sendLockedWithRefreshAttempts.
Even if we get rid of preemptiveWriteTooOldErr, I'd then still prefer to get rid of preemptiveSerializableErr too; I would break CanTransactionRetryAtRefreshedTimestamp() into two: an inner function that takes a retriable error and gives you back the timestamp that the error wants the client to retry at, and an outer function that takes an error and a timestamp and prepares the transaction for retrying at that given timestamp (i.e.

```go
func prepareForRetry(txn, timestamp) (bool, txn)
if txn.CommitTimestampFixed {
	return false
}
newTxn := txn.Clone()
newTxn.WriteTimestamp.Forward(timestamp)
newTxn.ReadTimestamp.Forward(newTxn.WriteTimestamp)
newTxn.WriteTooOld = false
```

maybeRefreshPreemptively() would then do

```go
txn, ok := prepareForRetry(txn, ba.Txn.WriteTime)
```

The handling of write timestamp is just too obfuscated through the pErr := roachpb.NewErrorWithTxn(retryErr, ba.Txn) line, for no good reason IMO.
Ok, I did pretty much exactly what you're suggesting here. I think you'll like the result. PTAL.
And if so, do you mind giving the third commit in #52885 a look while you have this paged in? That needs to land before the stability period.
Force-pushed from 92ee907 to dab1090
andreimatei
left a comment
LGTM thanks
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)
bors r+
Force-pushed from dab1090 to 1d461ba
Canceled.

bors r+
The field was replaced with the more general BatchRequest.CanForwardReadTimestamp in 972915d. This commit completes the migration to remove the old flag, which is possible now that we're building towards v20.2.
Relates to cockroachdb#51294.

This commit adds logic to the txnSpanRefresher to preemptively perform refreshes before issuing requests when doing so is guaranteed to be beneficial. We perform a preemptive refresh if either a) doing so would be free because we have not yet accumulated any refresh spans, or b) the batch contains a committing EndTxn request that we know will be rejected if issued.

The first case is straightforward. If the transaction has yet to perform any reads but has had its write timestamp bumped, refreshing is a trivial no-op. In this case, refreshing eagerly prevents the transaction from performing any future reads at its current read timestamp. Not doing so preemptively guarantees that we will need to perform a real refresh in the future if the transaction ever performs a read. At best, this would be wasted work. At worst, this could result in the future refresh failing. So we might as well refresh preemptively while doing so is free.

Note that this first case does NOT obviate the need for server-side refreshes. Notably, a transaction's write timestamp might be bumped in the same batch in which it performs its first read. In such cases, a preemptive refresh would not be needed but a reactive refresh would not be a trivial no-op. These situations are common for one-phase commit transactions.

The second case is more complex. If the batch contains a committing EndTxn request that we know will need a refresh, we don't want to bother issuing it just for it to be rejected. Instead, we preemptively refresh before issuing the EndTxn batch. If we view reads as acquiring a form of optimistic read locks under an optimistic concurrency control scheme (as is discussed in the comment on txnSpanRefresher), then this preemptive refresh immediately before the EndTxn is synonymous with the "validation" phase of a standard OCC transaction model. However, as an optimization compared to standard OCC, the validation phase is only performed when necessary in CockroachDB (i.e. if the transaction's writes have been pushed to higher timestamps).

This second case will play into the solution for cockroachdb#51294. Now that we perform these preemptive refreshes when possible, we know that it is always the right choice to split off the EndTxn from the rest of the batch during a txnSpanRefresher auto-retry. Without this change, it was unclear whether the first refresh of an EndTxn batch was caused by earlier requests in the transaction or by other requests in the current batch. Now, it is always caused by other requests in the same batch, so it is always clear that we should split the EndTxn from the rest of the batch immediately after the first refresh.

Release note (performance improvement): validation of optimistic reads is now performed earlier in transactions when doing so can save work. This eliminates certain types of transaction retry errors and avoids wasted RPC traffic.
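The two refresh conditions described above can be sketched as a toy decision function. All names and fields here are illustrative assumptions, not the actual txnSpanRefresher code, and simple int64 values stand in for HLC timestamps:

```go
package main

import "fmt"

// txnState is a toy model of the state the span refresher consults.
type txnState struct {
	readTS, writeTS int64 // simplified timestamps
	refreshSpans    int   // read spans accumulated so far
	batchCommits    bool  // does the batch contain a committing EndTxn?
}

// shouldRefreshPreemptively mirrors the two cases: refresh eagerly when it
// is free (no refresh spans to validate) or when the batch commits and
// would be rejected anyway because the write timestamp has been pushed.
func shouldRefreshPreemptively(s txnState) bool {
	if s.writeTS <= s.readTS {
		return false // not pushed; nothing to refresh away
	}
	refreshFree := s.refreshSpans == 0
	refreshInevitable := s.batchCommits
	return refreshFree || refreshInevitable
}

func main() {
	fmt.Println(shouldRefreshPreemptively(txnState{readTS: 10, writeTS: 20}))                                      // free: no spans yet
	fmt.Println(shouldRefreshPreemptively(txnState{readTS: 10, writeTS: 20, refreshSpans: 3, batchCommits: true})) // EndTxn is doomed
	fmt.Println(shouldRefreshPreemptively(txnState{readTS: 10, writeTS: 20, refreshSpans: 3}))                     // wait and see
}
```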
Force-pushed from 1d461ba to 3c2524f
Canceled.

api.pb.go keeps changing on me! bors r+

Build succeeded: