
rangefeed: PushTxn in rangefeed returned ABORTED, but txn may have been committed #104309

@pand5a

Description


I'm not sure this is the right place to raise this, but it has been bothering me.

The RangeFeed relies on PushTxn returning ABORTED (task.go: pushOldTxns) to remove a tracked txn from the unresolvedIntentQueue, so its correctness depends on PushTxn reporting that status correctly.

```go
case roachpb.ABORTED:
	// The transaction is aborted, so it doesn't need to be tracked
	// anymore nor does it need to prevent the resolved timestamp from
	// advancing. Inform the Processor that it can remove the txn from
	// its unresolvedIntentQueue.
	//
	// NOTE: the unresolvedIntentQueue will ignore MVCCAbortTxn operations
	// before it has been initialized. This is not a concern here though
	// because we never launch txnPushAttempt tasks before the queue has
	// been initialized.
	ops[i].SetValue(&enginepb.MVCCAbortTxnOp{
		TxnID: txn.ID,
	})
```

But in cmd_push_txn.go, the “case txnID” branch reached via PushTxn -> SynthesizeTxnFromMeta -> CanCreateTxnRecord (replica_tscache.go) may return a transaction with status ABORTED even though it has already been COMMITTED. This could cause the resolved timestamp (rts) in the RangeFeed to advance incorrectly. Do I understand this correctly?

```go
if txnMinTS.LessEq(tombstoneTimestamp) {
	switch tombstoneTxnID {
	case txnID:
		// If we find our own transaction ID then a transaction record has already
		// been written. We might be a replay (e.g. a DistSender retry), or we
		// raced with an asynchronous abort. Either way, return an error.
		//
		// TODO(andrei): We could keep a bit more info in the tscache to return a
		// different error for COMMITTED transactions. If the EndTxn(commit) was
		// the only request in the batch, this would be sufficient for the
		// client to swallow the error and declare the transaction as committed.
		// If there were other requests in the EndTxn batch, then the client would
		// still have trouble reconstructing the result, but at least it could
		// provide a non-ambiguous error to the application.
		return false, kvpb.ABORT_REASON_RECORD_ALREADY_WRITTEN_POSSIBLE_REPLAY
	// ... other cases elided ...
	}
}
```
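The concern can be reduced to the following simplified model. It is a hypothetical sketch, not the real CanCreateTxnRecord or SynthesizeTxnFromMeta: `tombstone`, `canCreateTxnRecord`, and `synthesizeStatus` are illustrative names, and it assumes the timestamp cache keeps only a tombstone (txn ID and timestamp) after a finalized txn record is removed, with no record of whether the txn committed or aborted. Under that assumption, a push that finds no record and a matching tombstone can only synthesize ABORTED.

```go
package main

import "fmt"

type status string

const (
	pending   status = "PENDING"
	committed status = "COMMITTED"
	aborted   status = "ABORTED"
)

// tombstone is a hypothetical model of what the timestamp cache retains once
// a finalized txn record is gone: the txn ID and a timestamp, but no status.
type tombstone struct {
	txnID string
	ts    int64
}

// canCreateTxnRecord loosely mirrors the "case txnID" branch quoted above.
// Other branches of the real function are not modeled in this sketch.
func canCreateTxnRecord(txnID string, txnMinTS int64, tomb tombstone) (bool, string) {
	if txnMinTS <= tomb.ts {
		if tomb.txnID == txnID {
			return false, "ABORT_REASON_RECORD_ALREADY_WRITTEN_POSSIBLE_REPLAY"
		}
		return false, "other reasons not modeled in this sketch"
	}
	return true, ""
}

// synthesizeStatus models a push that finds no txn record: if a record could
// not be created, the txn is reported as ABORTED, otherwise as PENDING.
func synthesizeStatus(txnID string, txnMinTS int64, tomb tombstone) status {
	if ok, _ := canCreateTxnRecord(txnID, txnMinTS, tomb); !ok {
		return aborted
	}
	return pending
}

func main() {
	// txn1 actually committed, and its record was later removed, leaving
	// only a tombstone. Nothing in the tombstone says it committed.
	actual := committed
	tomb := tombstone{txnID: "txn1", ts: 10}
	seen := synthesizeStatus("txn1", 5, tomb)
	fmt.Println(actual, seen) // COMMITTED ABORTED
}
```

Because the tombstone carries no status, the ABORTED result is identical whether the txn committed or aborted; the TODO in the quoted comment acknowledges this gap for COMMITTED transactions.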

Jira issue: CRDB-28452

Epic CRDB-27235


Labels: A-kv-rangefeed, C-bug, C-technical-advisory, O-community, branch-master, branch-release-22.2, branch-release-23.1, branch-release-23.2, v23.1.15, v23.1.17, v23.2.1, v23.2.3
