Skip to content

kv: make BeginTxn optional proposal #25437

@andreimatei

Description

@andreimatei

If a transaction sends a batch that touches multiple ranges, and if the sub-batch for the anchor range succeeds, but another one fails, we are now in a situation where the txn record has been written, but the client.Txn is not aware of that fact so it will attempt to send another BeginTxn with the next batch.
So, if the client choses to ignore the error (say, it's an expected ConditionFailedError on an InitPut), and send another batch, that batch will have an BeginTxn which fails with RETRY_POSSIBLE_REPLAY because the txn record already exists. If you're inside a db.Txn function, for example, you'll retry endlessly.

On the other hand, if we don't send the BeginTxn again then, in case it does exist, it would fail (because of its replay protection via the timestamp cache).

Discussing this with @tschottdorf, this is the proposal. We're moving towards a world with best-effort BeginTxn and idempotent EndTxn.

  • We'll have client.Txn never send more than one BeginTxn. Once a BeginTxn has been sent, regardless of whether it succeeded or not (or if it's unclear whether it did), there's no second BeginTxn. This way there's no failure when continuing to write in a transaction whose BeginTxn batch has previously failed. (*)
  • We'll make EndTxn succeed if it doesn't find a transaction record. In this case, the EndTxn will simply write the transaction record and declare success. The txn record might not exist because it was never written, or it might not exist because this is a replay and it had already been cleaned up. Either case is fine.
  • We'll make it so that the TxnCoordSender never sends more than one EndTxn. This ensures that a commit doesn't race with a rollback such that the rollback succeeds and then the commit also succeeds.
  • EndTxn will continue to populate the timestamp cache, as it currently does, and BeginTxn will continue to consult it. This way, we continue to be protected against a replayed BeginTxn recreating a phantom txn record after the corresponding EndTxn.

(*) This also simplifies the logic in client.Txn a bunch, which currently uses a combination of TxnProto.Writing and writingTxnRecord to figure out whether a BeginTxn/EndTxn needs to be sent. Beside being complex, this logic is also broken because it's not in sync with the logic used by the TxnCoordSender when starting the hearbeat loop. It's currently possible for the TCS to think that there might be intents laid down and so there's a heartbeat loop, but for the client.Txn to think that the transaction is read-only and so an EndTxn does not need to be sent (in which case nobody stops the heartbeat loop). Attempting to fix these things led to finding this current bug.

cc @tschottdorf @nvanbenschoten @bdarnell

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-kv-clientRelating to the KV client and the KV interface.C-enhancementSolution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions