-
Notifications
You must be signed in to change notification settings - Fork 4.1k
kv: some concurrent reads through a Txn can miss to see their own writes #25329
Description
There's a problem with concurrent uses of a client.Txn: if a request receives a TransactionAbortedError, the proto inside the Txn will be transparently switched to a new one (new transaction ID). From this moment on, future operations on the transaction succeed. This makes sense for what we generally want to do on retriable errors - start over. But it is a problem for concurrent uses of the Txn - every user but the one that got the error can do reads and writes hunky-dory ignoring the fact that it operates in a new transaction. Presumably there's a synchronization point somewhere for all concurrent users of a Txn, and above that point the fact that there was a error will take effect, but in the meantime some reads might have missed to see their writes, which was generally considered a no-no in several other situations.
This is a sister issue to #17197, which complains about possible committing of unintended data in the face of concurrent writes and retriable errors. This one is about reads through, which we do more of concurrently than writes.
One way to see this clearly in action is by putting the following patch at the end of TestTxnCoordSenderHeartbeatFailurePostSplit
log.Infof(context.TODO(), "!!! test doing one more read")
if res, err := txn.Get(context.TODO(), keyA); err != nil {
t.Fatal(err)
} else {
log.Infof(context.TODO(), "!!! res: %+v", res)
}
@nvanbenschoten, @bdarnell what do you think we should do about this, if anything? I think we really need to prioritize rethinking the Txn API for concurrent use.
Jira issue: CRDB-5720