storage: 1PC transactions can be applied twice #10023
Description
When a 1PC transaction is retried due to a network failure, the second attempt may return an error even if the first attempt succeeded (and the successful result was masked by the network failure). The most likely error to be seen in this case is WriteTooOldError (caused by the transaction seeing its own past write, which by now has been committed, resolved, and scrubbed of its transaction ID), which is a transactionRestartError. Upon restart, the transaction gets a new timestamp and may perform different writes, which may succeed on a subsequent attempt, leading to the same statement applying twice.
This is the more insidious cousin of #6053 and #7604. Those issues are about the failure leaking back to the client; this one is about the fact that if the operation is retried in some situations, it could succeed.
Here is one concrete case in which the error can occur:
- A SQL table has a primary key whose default value is auto-generated (e.g. the default `rowid` column) and has no secondary indexes
- An insert to this table is performed as a single auto-committed statement (allowing the 1PC optimization)
- The network fails after the write is proposed. DistSender gets an RPC error and retries.
- The initial write succeeds, but there is no one listening so its response goes nowhere. It's a 1PC transaction so it cleans up its intent synchronously.
- The retried write fails with a WriteTooOldError
- This error is raised to client.Txn, which is running in AutoRetry mode, so it retries on this transactionRestartError. (Note that AutoRetry is not the problem here: if the error continued on to the client, the client's proper response to a WriteTooOldError would be to retry.)
- This runs the SQL transaction closure, which generates a new row id, avoids the conflict, and commits.
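The sequence above can be condensed into a toy simulation (this is illustrative Go, not CockroachDB code; `kvStore`, `insert1PC`, and `runAutoCommitted` are invented stand-ins for the real machinery):

```go
// Toy model of the double-apply: a 1PC insert whose first attempt commits
// but whose response is lost, a DistSender-style retry of the identical
// request that hits WriteTooOldError, and a client-level auto-retry that
// generates a fresh rowid and commits a second row.
package main

import (
	"errors"
	"fmt"
)

// Stand-in for the real retryable WriteTooOldError.
var errWriteTooOld = errors.New("WriteTooOldError")

type kvStore struct {
	rows map[int64]string // committed rows keyed by rowid
}

// insert1PC models a one-phase-commit write: it commits synchronously and
// resolves its intent, so no transaction record is left behind.
func (s *kvStore) insert1PC(rowid int64, val string) error {
	if _, ok := s.rows[rowid]; ok {
		// The retry collides with its own committed write, which by now
		// has been scrubbed of its transaction ID.
		return errWriteTooOld
	}
	s.rows[rowid] = val
	return nil
}

// runAutoCommitted models client.Txn in AutoRetry mode: on a retryable
// error it reruns the SQL closure, which generates a new rowid.
func runAutoCommitted(s *kvStore, nextRowID func() int64, val string) {
	lostResponse := true // the network drops exactly one response
	for {
		rowid := nextRowID()
		if err := s.insert1PC(rowid, val); err == nil {
			if !lostResponse {
				return // success observed by the client
			}
			lostResponse = false
			// Response lost: the sender retries the identical request.
			if err2 := s.insert1PC(rowid, val); errors.Is(err2, errWriteTooOld) {
				continue // auto-retry the closure with a fresh rowid
			}
			return
		}
		return // non-retryable error (not reached in this sketch)
	}
}

func main() {
	s := &kvStore{rows: map[int64]string{}}
	id := int64(0)
	runAutoCommitted(s, func() int64 { id++; return id }, "hello")
	// The single logical INSERT has left two committed rows behind.
	fmt.Println(len(s.rows))
}
```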
The old response cache handled this by remembering the response so the retry would get the same answer. It was removed in #3077 because it was too expensive and retries of non-transactional requests were deemed less important; the 1PC optimization, however, makes its requests effectively non-transactional at the KV layer.
A few possible solutions that don't require removing the 1PC optimization or bringing back the response cache:
- Distinguish between RPC errors that definitely won't succeed (e.g. connection refused) and those where the outcome is uncertain. When a 1PC transaction gets the latter error, return it to the client as an ambiguous result (ambiguous results are always possible in the postgres protocol if a network failure occurs during a commit)
- Remember all committed transaction ids for a time in something like the abort cache. This would be lighter than the full response cache and would only allow us to tell that this scenario has occurred and return a distinct error code for it, but not recover what exactly happened on the earlier attempt.
- At the SQL layer, perform a separate query to figure out what happened. This requires specific knowledge of the behavior of each type of query that can benefit from the 1PC optimization.