It appears we have a problem with retrying a parallel commit batch (i.e. a batch with `EndTxn` and other writes) after a successful refresh if the batch had been split by the `DistSender` on the previous attempt. This is the scenario (speculated, or even observed, by @nvanbenschoten):
1. txn1 sends `Put(a) + Put(b) + EndTxn`.
2. The `DistSender` splits the `Put(a)` from the rest.
3. `Put(a)` succeeds, but the rest catches some retriable error.
4. The `TxnCoordSender` gets the retriable error. The fact that a sub-batch succeeded is lost. We used to care about that fact, but we've successively gotten rid of that tracking across storage: employ transactional idempotency to refresh mixed-success batches #35140 and kv: refresh less and retry more #44661.
5. We refresh everything that came before this batch. The refresh succeeds.
6. We re-send the batch. It gets split again. The part with the `EndTxn` executes first. The transaction is now `STAGING`. More than that, the txn is in fact implicitly committed: the intent on `a` is already there since the previous attempt and, because it's at a lower timestamp than the txn record, it counts as golden for the purposes of verifying the implicit commit condition.
7. Some other transaction wanders in, sees that txn1 is in its way, and transitions it to explicitly committed.
8. The `Put(a)` now tries to evaluate. It gets really confused. I guess that different things can happen, none of them good. One thing that I believe we've observed in roachtest: kv/contention/nodes=4 failed #46299 is that, if there's another txn's intent there already, the `Put` will try to push it, enter the `txnWaitQueue`, eventually observe that its own txn is committed, and return an error. The client thus gets an error (and a non-ambiguous one to boot) although the txn is committed. Even worse, perhaps, I think it's possible for a request to return wrong results instead of an error.
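To make the failure mode concrete, here is a heavily simplified sketch of the parallel-commit implicit-commit check (all names and the integer timestamps are hypothetical stand-ins, not CockroachDB's actual code). The stale intent on `a` left over from the first attempt passes the check because its timestamp is below the second attempt's `STAGING` record timestamp:

```go
package main

import "fmt"

// Intent is a hypothetical, stripped-down provisional write.
type Intent struct {
	Key string
	TS  int64 // write timestamp (a stand-in for an HLC timestamp)
}

// implicitlyCommitted sketches the parallel-commit condition: a STAGING
// transaction is implicitly committed once every in-flight write has an
// intent at or below the txn record's timestamp.
func implicitlyCommitted(recordTS int64, inFlight []string, found map[string]Intent) bool {
	for _, key := range inFlight {
		intent, ok := found[key]
		if !ok || intent.TS > recordTS {
			return false
		}
	}
	return true
}

func main() {
	// The first attempt already laid down an intent on "a" at ts=10.
	found := map[string]Intent{"a": {Key: "a", TS: 10}}
	// The second attempt writes a STAGING record at ts=12 before Put(a)
	// re-evaluates; the stale intent satisfies the check.
	fmt.Println(implicitlyCommitted(12, []string{"a"}, found)) // true
}
```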
The problem here is that the original `Put(a)` satisfied the 2nd `EndTxn` attempt's implicit commit condition. That's not good because it can confuse the evaluation of the `Put` the 2nd time around.
The crux of the problem is that `Put(a)` is not idempotent across the txn committing. The `TxnCoordSender` was relying on it to be, though, when it sent it the 2nd time. It's hard to straight-up make it idempotent, since we lose the intent history when we resolve an intent.
I think, however, that since we've introduced savepoints into KV, we have an alternative to relying on idempotency of request evaluation for the purposes of being able to retry a batch after a successful request: we can "rollback" the effect of the first evaluation together with retrying the batch. I think something like the following scheme would work in principle:
When retrying a batch, we'd also ignore the range of sequence numbers corresponding to the previous evaluation. We'd assign new sequence numbers to the retrying requests.
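A minimal sketch of what that could look like on the client side, assuming a hypothetical ignored-range representation similar in spirit to what KV savepoint rollbacks record (`retryState`, `seqRange`, and `retryBatch` are illustrative names, not existing code):

```go
package main

import "fmt"

// seqRange is a hypothetical closed range of ignored sequence numbers,
// analogous to what a KV savepoint rollback records.
type seqRange struct{ start, end int32 }

// retryState tracks the txn's sequence counter and the ranges whose
// effects should be ignored after a rolled-back batch attempt.
type retryState struct {
	nextSeq int32
	ignored []seqRange
}

// retryBatch marks the previous attempt's seqnums as ignored and assigns
// fresh sequence numbers to the retried requests.
func (s *retryState) retryBatch(prevStart, prevEnd int32, numReqs int32) []int32 {
	s.ignored = append(s.ignored, seqRange{prevStart, prevEnd})
	seqs := make([]int32, numReqs)
	for i := range seqs {
		s.nextSeq++
		seqs[i] = s.nextSeq
	}
	return seqs
}

func main() {
	s := &retryState{nextSeq: 3}   // seqnums 1-3 were used by the first attempt
	fmt.Println(s.retryBatch(1, 3, 3)) // [4 5 6]
	fmt.Println(s.ignored)             // [{1 3}]
}
```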
Now, the re-evaluation still has the properties that we wanted from idempotency: requests should return the same results as the first time around. (Actually, we're evaluating at a different timestamp the 2nd time around, so we're not necessarily expecting the exact same results; the point is that the effects of the previous evaluation don't affect us.)
The problems described above go away because, when the staging transaction record is written a second time, it'll include the list of ignored sequence number ranges, and so previously-written intents will not satisfy the implicit commit condition (they'll be invisible to the `QueryIntent` requests).
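A sketch of the server-side consequence, assuming each intent carries the sequence number that wrote it (`intentVisible` is a hypothetical stand-in for the visibility rule a `QueryIntent`-style check would apply): with the ignored list attached to the second `STAGING` record, first-attempt intents are treated as missing.

```go
package main

import "fmt"

// seqRange is a hypothetical closed range of ignored sequence numbers.
type seqRange struct{ start, end int32 }

// intentVisible sketches the visibility rule: an intent written at a
// sequence number inside an ignored range does not count toward the
// implicit commit condition.
func intentVisible(intentSeq int32, ignored []seqRange) bool {
	for _, r := range ignored {
		if intentSeq >= r.start && intentSeq <= r.end {
			return false
		}
	}
	return true
}

func main() {
	ignored := []seqRange{{1, 3}}          // the first attempt's seqnums, rolled back
	fmt.Println(intentVisible(2, ignored)) // false: first-attempt intent is ignored
	fmt.Println(intentVisible(5, ignored)) // true: intent from the retried write
}
```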
The reverse of the problem described above (a newly-written intent satisfying a previously-written staging txn record) is addressed either by making sure that newly-written intents have a timestamp that's higher than the previously-written txn record's (and I think the timestamp from the retriable error causing the refresh is an upper bound on the timestamp at which the txn record could have been written, so I think maybe our code is already fine), or by making `QueryIntent` require an exact match on the expected intent sequence number.
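A sketch of the exact-match alternative, assuming the retried writes got fresh sequence numbers as in the scheme above (`queryIntentMatches` is a hypothetical stand-in, not `QueryIntent`'s actual matching rule):

```go
package main

import "fmt"

// queryIntentMatches sketches the stricter check: the found intent must
// have been written at exactly the expected sequence number. A stale
// intent left behind by a rolled-back first attempt then cannot satisfy
// a QueryIntent issued on behalf of the retried attempt.
func queryIntentMatches(foundSeq, expectedSeq int32) bool {
	return foundSeq == expectedSeq
}

func main() {
	// The first attempt wrote the intent at seq 2; the retry re-issued
	// the write at seq 5, and its QueryIntent expects seq 5.
	fmt.Println(queryIntentMatches(2, 5)) // false: stale intent rejected
	fmt.Println(queryIntentMatches(5, 5)) // true: fresh intent accepted
}
```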
The scheme above would require changing sequence numbers of the requests after a refresh, which might be hard? I guess we could move the `spanRefresherInterceptor` above the `txnSeqNumAllocator` in the interceptor stack.
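The reordering idea can be pictured as a reshuffle of the interceptor stack. This is only an illustration of the proposal (the two-element `stack` below is a hypothetical slice of the real `TxnCoordSender` stack, and `moveAbove` is not existing code); the point is that with the refresher above the allocator, a retried batch would flow through the allocator again and pick up fresh sequence numbers.

```go
package main

import "fmt"

// moveAbove returns a copy of stack with interceptor `name` placed
// immediately above (before) `target`. Purely illustrative.
func moveAbove(stack []string, name, target string) []string {
	out := make([]string, 0, len(stack))
	for _, s := range stack {
		if s == name {
			continue
		}
		if s == target {
			out = append(out, name)
		}
		out = append(out, s)
	}
	return out
}

func main() {
	// Hypothetical current ordering: seqnums are assigned before the
	// refresher sees the batch, so a refresh-triggered retry reuses
	// the old seqnums.
	stack := []string{"txnSeqNumAllocator", "spanRefresherInterceptor"}
	fmt.Println(moveAbove(stack, "spanRefresherInterceptor", "txnSeqNumAllocator"))
}
```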