-
Notifications
You must be signed in to change notification settings - Fork 4.1k
kv: some concurrent reads through a Txn can miss to see their own writes #25329
Copy link
Copy link
Open
Labels
A-kv-clientRelating to the KV client and the KV interface.Relating to the KV client and the KV interface.C-bugCode not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.S-3-erroneous-edge-caseDatabase produces or stores erroneous data without visible error/warning, in rare edge cases.Database produces or stores erroneous data without visible error/warning, in rare edge cases.T-kvKV TeamKV Team
Metadata
Metadata
Assignees
Labels
A-kv-clientRelating to the KV client and the KV interface.Relating to the KV client and the KV interface.C-bugCode not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.S-3-erroneous-edge-caseDatabase produces or stores erroneous data without visible error/warning, in rare edge cases.Database produces or stores erroneous data without visible error/warning, in rare edge cases.T-kvKV TeamKV Team
There's a problem with concurrent uses of a
client.Txn: if a request receives aTransactionAbortedError, the proto inside theTxnwill be transparently switched to a new one (new transaction ID). From this moment on, future operations on the transaction succeed. This makes sense for what we generally want to do on retriable errors - start over. But it is a problem for concurrent uses of the Txn - every user but the one that got the error can do reads and writes hunky-dory ignoring the fact that it operates in a new transaction. Presumably there's a synchronization point somewhere for all concurrent users of a Txn, and above that point the fact that there was a error will take effect, but in the meantime some reads might have missed to see their writes, which was generally considered a no-no in several other situations.This is a sister issue to #17197, which complains about possible committing of unintended data in the face of concurrent writes and retriable errors. This one is about reads through, which we do more of concurrently than writes.
One way to see this clearly in action is by putting the following patch at the end of TestTxnCoordSenderHeartbeatFailurePostSplit
@nvanbenschoten, @bdarnell what do you think we should do about this, if anything? I think we really need to prioritize rethinking the Txn API for concurrent use.
Jira issue: CRDB-5720