perf: pipeline transactional writes at the KV layer #16026
Description
Consider an application which executes the following statements one-by-one (yes, this could be done more efficiently, but an ORM is likely to produce SQL like this):
```sql
BEGIN;
SELECT balance FROM accounts WHERE id = 10;
UPDATE accounts SET balance = 101 WHERE id = 10;
SELECT balance FROM accounts WHERE id = 11;
UPDATE accounts SET balance = 99 WHERE id = 11;
COMMIT;
```
This transaction is moving $1 from account 11 to account 10. Our current SQL implementation imposes unnecessary latency on the UPDATE operations. Internally, each UPDATE is a select for the matching rows followed by a consistent (replicated) write of the new balance. After the write completes we return the number of rows updated. We suffer the latency of the consistent write even though lower layers of the system will serialize access to a particular key. Translating into KV operations this looks like:
```
Begin
Scan /accounts/10-/accounts/11
Put /accounts/10 101
Scan /accounts/11-/accounts/12
Put /accounts/11 99
Commit
```
Notice that we know the number of rows that will be updated after the Scan operation completes. We can return to the client at that point and asynchronously send the Put. When the next statement arrives, we don't have to wait for any pending mutations as the KV layer will serialize the operations [*] for a particular key. We would have to wait for the outstanding mutations to complete before performing the commit. The above would become:
```
Begin
Scan /accounts/10-/accounts/11
-> Go(Put /accounts/10 101)
Scan /accounts/11-/accounts/12
-> Go(Put /accounts/11 99)
WaitForMutations
Commit
```
There is a similar win to be had for DELETE operations, which involve a Scan followed by a DelRange. INSERT is somewhat more complicated, as it is translated into a ConditionalPut operation, which is internally a read (on the leader) followed by a write. We could potentially return from the ConditionalPut operation as soon as the write is proposed, but we'd have to leave the operation in the CommandQueue until the write is applied. There are likely lots of dragons here, though perhaps the approach of allowing a transactional write operation to return as soon as it is proposed could handle all of these cases. The TxnCoordSender would then need a facility for waiting for outstanding transactional writes to be applied before allowing the transaction to be committed.
[*] While a replica will serialize operations for a particular key, multiple operations sent via separate DistSenders will not. We'd need to make sure that the operations sent for a transaction are somehow pipelined within the DistSender/TxnCoordSender: we wouldn't want a Scan operation to be reordered ahead of an earlier Put operation.