-
Notifications
You must be signed in to change notification settings - Fork 4.1k
kv: make BeginTxn optional proposal #25437
Description
If a transaction sends a batch that touches multiple ranges, and if the sub-batch for the anchor range succeeds, but another one fails, we are now in a situation where the txn record has been written, but the client.Txn is not aware of that fact so it will attempt to send another BeginTxn with the next batch.
So, if the client choses to ignore the error (say, it's an expected ConditionFailedError on an InitPut), and send another batch, that batch will have an BeginTxn which fails with RETRY_POSSIBLE_REPLAY because the txn record already exists. If you're inside a db.Txn function, for example, you'll retry endlessly.
On the other hand, if we don't send the BeginTxn again then, in case it does exist, it would fail (because of its replay protection via the timestamp cache).
Discussing this with @tschottdorf, this is the proposal. We're moving towards a world with best-effort BeginTxn and idempotent EndTxn.
- We'll have
client.Txnnever send more than oneBeginTxn. Once aBeginTxnhas been sent, regardless of whether it succeeded or not (or if it's unclear whether it did), there's no secondBeginTxn. This way there's no failure when continuing to write in a transaction whoseBeginTxnbatch has previously failed. (*) - We'll make
EndTxnsucceed if it doesn't find a transaction record. In this case, theEndTxnwill simply write the transaction record and declare success. The txn record might not exist because it was never written, or it might not exist because this is a replay and it had already been cleaned up. Either case is fine. - We'll make it so that the
TxnCoordSendernever sends more than oneEndTxn. This ensures that a commit doesn't race with a rollback such that the rollback succeeds and then the commit also succeeds. EndTxnwill continue to populate the timestamp cache, as it currently does, andBeginTxnwill continue to consult it. This way, we continue to be protected against a replayedBeginTxnrecreating a phantom txn record after the correspondingEndTxn.
(*) This also simplifies the logic in client.Txn a bunch, which currently uses a combination of TxnProto.Writing and writingTxnRecord to figure out whether a BeginTxn/EndTxn needs to be sent. Beside being complex, this logic is also broken because it's not in sync with the logic used by the TxnCoordSender when starting the hearbeat loop. It's currently possible for the TCS to think that there might be intents laid down and so there's a heartbeat loop, but for the client.Txn to think that the transaction is read-only and so an EndTxn does not need to be sent (in which case nobody stops the heartbeat loop). Attempting to fix these things led to finding this current bug.