-
Notifications
You must be signed in to change notification settings - Fork 4.1k
rangefeed: Pushtxn in rangefeed returned abort, but txn may have been committed #104309
Description
I don't know if it's appropriate to send it here, but I'm really troubled.
The RangeFeed relies on the return result of PushTxns (task. go: pushOldTxns) ABORTED to remove the tracked txn in UnresolvedIntentQueue, which depends on the correctness of PushTxn.
cockroach/pkg/kv/kvserver/rangefeed/task.go
Lines 324 to 336 in 9c2f650
| case roachpb.ABORTED: | |
| // The transaction is aborted, so it doesn't need to be tracked | |
| // anymore nor does it need to prevent the resolved timestamp from | |
| // advancing. Inform the Processor that it can remove the txn from | |
| // its unresolvedIntentQueue. | |
| // | |
| // NOTE: the unresolvedIntentQueue will ignore MVCCAbortTxn operations | |
| // before it has been initialized. This is not a concern here though | |
| // because we never launch txnPushAttempt tasks before the queue has | |
| // been initialized. | |
| ops[i].SetValue(&enginepb.MVCCAbortTxnOp{ | |
| TxnID: txn.ID, | |
| }) |
But in cmd_push_txn.go The “case txnID” case of [PushTxn ->SynthesizeTxnFromMeta ->CanCreateTxnRecord(replica_tscache.go)] may return a transaction that has already been COMMITTED but has status=Abort, which may cause rts in the RangeFeed to advance incorrectly. Do I understand correctly?
cockroach/pkg/kv/kvserver/replica_tscache.go
Lines 529 to 543 in 9c2f650
| if txnMinTS.LessEq(tombstoneTimestamp) { | |
| switch tombstoneTxnID { | |
| case txnID: | |
| // If we find our own transaction ID then a transaction record has already | |
| // been written. We might be a replay (e.g. a DistSender retry), or we | |
| // raced with an asynchronous abort. Either way, return an error. | |
| // | |
| // TODO(andrei): We could keep a bit more info in the tscache to return a | |
| // different error for COMMITTED transactions. If the EndTxn(commit) was | |
| // the only request in the batch, this this would be sufficient for the | |
| // client to swallow the error and declare the transaction as committed. | |
| // If there were other requests in the EndTxn batch, then the client would | |
| // still have trouble reconstructing the result, but at least it could | |
| // provide a non-ambiguous error to the application. | |
| return false, kvpb.ABORT_REASON_RECORD_ALREADY_WRITTEN_POSSIBLE_REPLAY |
Jira issue: CRDB-28452
Epic CRDB-27235