BatchResolveLock doesn't handle stale pessimistic locks with invalidated primary and may break data consistency #43243
Closed
Labels
affects-4.0, affects-5.0, affects-5.1, affects-5.2, affects-5.3, affects-5.4, affects-6.0, affects-6.1, affects-6.2, affects-6.3, affects-6.4, affects-6.5, affects-6.6, affects-7.0, affects-7.1, severity/critical, sig/transaction, type/bug
Bug Report
* This was found by reviewing code and has not yet been confirmed by tests.
As we know, pessimistic transactions may switch their primary during execution, so a stale pessimistic lock's primary may point to a key that is no longer the primary of the transaction.
This case is quite tricky to handle when resolving locks. Historically, we've made several fixes trying to get it right, such as #14787, #21689 and many more. However, we have recently found more incorrectly handled corner cases. One is #42937, and here's another one.
Since a stale pessimistic lock may point to a key that is not the actual primary of its transaction, the status of the key it points to may not indicate the real state (committed or rolled back) of that transaction. We used to optimize lock resolution by caching the result of check_txn_status; that was later found to be problematic and was fixed in #21689.
However, we found that BatchResolveLocks (which is used in GC) misuses the wrong transaction status in a different way. It is passed an array of locks (usually collected by the ScanLock RPC), checks the status of each lock's primary, collects the results in a list, and finally sends them to TiKV in one ResolveLock RPC. There is no special handling for pessimistic locks, so it may send an incorrect transaction state to TiKV. If the transaction is committed and some of its keys are prewritten but not yet committed (secondaries of 2PC transactions, or any keys of async commit transactions), these keys may be incorrectly rolled back, leaving the transaction only partially committed and breaking data consistency.
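The failure mode can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual TiDB or TiKV code: the Lock class, the commit_records dictionary, the key names k1 to k4, and the rule of keeping one status per start_ts (taken from the first lock seen) are all hypothetical simplifications. The check_txn_status function here is a stand-in that only looks for a commit record on the given primary.

```python
from dataclasses import dataclass

ROLLED_BACK = 0  # toy convention: a commit_ts of 0 means "rolled back"


@dataclass(frozen=True)
class Lock:
    key: str
    primary: str          # primary recorded in the lock; may be stale
    start_ts: int
    is_pessimistic: bool


def check_txn_status(commit_records, primary, start_ts):
    """Toy stand-in: commit_ts if a commit record exists on `primary`, else rolled back."""
    return commit_records.get((primary, start_ts), ROLLED_BACK)


def batch_resolve_statuses(commit_records, locks):
    """Naive batching (assumed): one status per start_ts, from the first lock seen."""
    txn_status = {}
    for lock in locks:
        if lock.start_ts not in txn_status:
            txn_status[lock.start_ts] = check_txn_status(
                commit_records, lock.primary, lock.start_ts
            )
    return txn_status


# Transaction 10 switched its primary from k1 to k2 and committed on k2 at ts 20.
commit_records = {("k2", 10): 20}
locks = [
    # Stale pessimistic lock still pointing at the old primary k1.
    Lock("k3", primary="k1", start_ts=10, is_pessimistic=True),
    # Prewritten secondary that has not been committed yet.
    Lock("k4", primary="k2", start_ts=10, is_pessimistic=False),
]

statuses = batch_resolve_statuses(commit_records, locks)
print(statuses)  # {10: 0}
```

In this model the stale lock on k3 is checked first, so transaction 10 is recorded as rolled back even though its real primary k2 is committed. A resolve request built from that result would roll back the prewritten key k4 as well.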