It appears we have a problem with retrying a parallel commit batch (i.e. a batch with `EndTxn` and other writes) after a successful refresh if the batch had been split by the `DistSender` on the previous attempt. This is the scenario (speculated, or even observed, by @nvanbenschoten):
1. txn1 sends `Put(a) + Put(b) + EndTxn`.
2. The `DistSender` splits the `Put(a)` from the rest.
3. `Put(a)` succeeds, but the rest catches some retriable error.
4. The `TxnCoordSender` gets the retriable error. The fact that a sub-batch succeeded is lost. We used to care about that fact, but we've successively gotten rid of that tracking across storage: employ transactional idempotency to refresh mixed-success batches #35140 and kv: refresh less and retry more #44661.
5. We refresh everything that came before this batch. The refresh succeeds.
6. We re-send the batch. It gets split again. The part with the `EndTxn` executes first. The transaction is now `STAGING`. More than that, the txn is in fact implicitly committed: the intent on `a` is already there since the previous attempt and, because it's at a lower timestamp than the txn record, it counts as golden for the purposes of verifying the implicit commit condition.
7. Some other transaction wanders in, sees that txn1 is in its way, and transitions it to explicitly committed.
8. The `Put(a)` now tries to evaluate. It gets really confused. I guess that different things can happen, none of them good. One thing that I believe we've observed in roachtest: kv/contention/nodes=4 failed #46299 is that, if there's another txn's intent there already, the `Put` will try to push it, enter the `txnWaitQueue`, eventually observe that its own txn is committed, and return an error. The client thus gets an error (and a non-ambiguous one to boot) although the txn is committed. Even worse, perhaps, I think it's possible for a request to return wrong results instead of an error.
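To make the failure mode concrete, here is a heavily simplified sketch of the parallel-commit implicit-commit check (all names and the integer timestamps are hypothetical stand-ins, not CockroachDB's actual code). The stale intent on `a` left over from the first attempt passes the check because its timestamp is below the second attempt's `STAGING` record timestamp:

```go
package main

import "fmt"

// Intent is a hypothetical, stripped-down provisional write.
type Intent struct {
	Key string
	TS  int64 // write timestamp (a stand-in for an HLC timestamp)
}

// implicitlyCommitted sketches the parallel-commit condition: a STAGING
// transaction is implicitly committed once every in-flight write has an
// intent at or below the txn record's timestamp.
func implicitlyCommitted(recordTS int64, inFlight []string, found map[string]Intent) bool {
	for _, key := range inFlight {
		intent, ok := found[key]
		if !ok || intent.TS > recordTS {
			return false
		}
	}
	return true
}

func main() {
	// The first attempt already laid down an intent on "a" at ts=10.
	found := map[string]Intent{"a": {Key: "a", TS: 10}}
	// The second attempt writes a STAGING record at ts=12 before Put(a)
	// re-evaluates; the stale intent satisfies the check.
	fmt.Println(implicitlyCommitted(12, []string{"a"}, found)) // true
}
```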
The problem here is that the original `Put(a)` satisfied the 2nd `EndTxn` attempt's implicit commit condition. That's not good because it can confuse the evaluation of the `Put` the 2nd time around.
The crux of the problem is that `Put(a)` is not idempotent across the txn committing. The `TxnCoordSender` was relying on it to be, though, when it sent it the 2nd time. It's hard to straight-up make it idempotent, since we lose the intent history when we resolve an intent.
I think, however, that since we've introduced savepoints into KV, we have an alternative to relying on idempotency of request evaluation for the purposes of being able to retry a batch after a successful request: we can "rollback" the effect of the first evaluation together with retrying the batch. I think something like the following scheme would work in principle:
When retrying a batch, we'd also ignore the range of sequence numbers corresponding to the previous evaluation. We'd assign new sequence numbers to the retrying requests.
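A minimal sketch of what that could look like on the client side, assuming a hypothetical ignored-range representation similar in spirit to what KV savepoint rollbacks record (`retryState`, `seqRange`, and `retryBatch` are illustrative names, not existing code):

```go
package main

import "fmt"

// seqRange is a hypothetical closed range of ignored sequence numbers,
// analogous to what a KV savepoint rollback records.
type seqRange struct{ start, end int32 }

// retryState tracks the txn's sequence counter and the ranges whose
// effects should be ignored after a rolled-back batch attempt.
type retryState struct {
	nextSeq int32
	ignored []seqRange
}

// retryBatch marks the previous attempt's seqnums as ignored and assigns
// fresh sequence numbers to the retried requests.
func (s *retryState) retryBatch(prevStart, prevEnd int32, numReqs int32) []int32 {
	s.ignored = append(s.ignored, seqRange{prevStart, prevEnd})
	seqs := make([]int32, numReqs)
	for i := range seqs {
		s.nextSeq++
		seqs[i] = s.nextSeq
	}
	return seqs
}

func main() {
	s := &retryState{nextSeq: 3}   // seqnums 1-3 were used by the first attempt
	fmt.Println(s.retryBatch(1, 3, 3)) // [4 5 6]
	fmt.Println(s.ignored)             // [{1 3}]
}
```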
Now, the re-evaluation still has the properties that we wanted from idempotency: requests should return the same results as the first time around. (Actually, we're evaluating at a different timestamp the 2nd time around, so we're not necessarily expecting the exact same results; the point is that the effects of the previous evaluation don't affect us.)
The problems described above go away because, when the staging transaction record is written a second time, it'll include the list of ignored sequence number ranges, and so previously-written intents will not satisfy the implicit commit condition (they'll be invisible to the `QueryIntent` requests).
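A sketch of the server-side consequence, assuming each intent carries the sequence number that wrote it (`intentVisible` is a hypothetical stand-in for the visibility rule a `QueryIntent`-style check would apply): with the ignored list attached to the second `STAGING` record, first-attempt intents are treated as missing.

```go
package main

import "fmt"

// seqRange is a hypothetical closed range of ignored sequence numbers.
type seqRange struct{ start, end int32 }

// intentVisible sketches the visibility rule: an intent written at a
// sequence number inside an ignored range does not count toward the
// implicit commit condition.
func intentVisible(intentSeq int32, ignored []seqRange) bool {
	for _, r := range ignored {
		if intentSeq >= r.start && intentSeq <= r.end {
			return false
		}
	}
	return true
}

func main() {
	ignored := []seqRange{{1, 3}}          // the first attempt's seqnums, rolled back
	fmt.Println(intentVisible(2, ignored)) // false: first-attempt intent is ignored
	fmt.Println(intentVisible(5, ignored)) // true: intent from the retried write
}
```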
The reverse of the problem described above (a newly-written intent satisfying a previously-written staging txn record) is addressed either by making sure that newly-written intents have a timestamp that's higher than the previously-written txn record's (and I think the timestamp from the retriable error causing the refresh is an upper bound on the timestamp at which the txn record could have been written, so I think maybe our code is already fine), or by making `QueryIntent` require an exact match on the expected intent sequence number.
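A sketch of the exact-match alternative, assuming the retried writes got fresh sequence numbers as in the scheme above (`queryIntentMatches` is a hypothetical stand-in, not `QueryIntent`'s actual matching rule):

```go
package main

import "fmt"

// queryIntentMatches sketches the stricter check: the found intent must
// have been written at exactly the expected sequence number. A stale
// intent left behind by a rolled-back first attempt then cannot satisfy
// a QueryIntent issued on behalf of the retried attempt.
func queryIntentMatches(foundSeq, expectedSeq int32) bool {
	return foundSeq == expectedSeq
}

func main() {
	// The first attempt wrote the intent at seq 2; the retry re-issued
	// the write at seq 5, and its QueryIntent expects seq 5.
	fmt.Println(queryIntentMatches(2, 5)) // false: stale intent rejected
	fmt.Println(queryIntentMatches(5, 5)) // true: fresh intent accepted
}
```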
The scheme above would require changing sequence numbers of the requests after a refresh, which might be hard? I guess we could move the `spanRefresherInterceptor` above the `txnSeqNumAllocator` in the interceptor stack.
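The reordering idea can be pictured as a reshuffle of the interceptor stack. This is only an illustration of the proposal (the two-element `stack` below is a hypothetical slice of the real `TxnCoordSender` stack, and `moveAbove` is not existing code); the point is that with the refresher above the allocator, a retried batch would flow through the allocator again and pick up fresh sequence numbers.

```go
package main

import "fmt"

// moveAbove returns a copy of stack with interceptor `name` placed
// immediately above (before) `target`. Purely illustrative.
func moveAbove(stack []string, name, target string) []string {
	out := make([]string, 0, len(stack))
	for _, s := range stack {
		if s == name {
			continue
		}
		if s == target {
			out = append(out, name)
		}
		out = append(out, s)
	}
	return out
}

func main() {
	// Hypothetical current ordering: seqnums are assigned before the
	// refresher sees the batch, so a refresh-triggered retry reuses
	// the old seqnums.
	stack := []string{"txnSeqNumAllocator", "spanRefresherInterceptor"}
	fmt.Println(moveAbove(stack, "spanRefresherInterceptor", "txnSeqNumAllocator"))
}
```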