-
Notifications
You must be signed in to change notification settings - Fork 4.1k
kv: divergent ideas about whether a txn is "read-only" between the Txn and the txnIntentCollector #28256
Description
The intent collector interceptor complains if it receives an EndTransaction without it having collected any intents prior. It returns an error saying that it's a "readonly txn" and the EndTxn should have been elided above it. That error is supposed to signal bugs.
| return nil, roachpb.NewErrorf("cannot commit a read-only transaction") |
Unfortunately, that assumption is not quite copacetic with what the layers above it do. There's currently two problems:
- If the txn ever encountered an error, the
Txnmoves to thetxnErrorerror, at which point it forgets if it ever performed any writes (i.e. if it came from statetxnReadonlyortxnWriting, etc..), and so it will never elide futureEndTransactions(rollbacks). - If the
BeginTxnbatch is rejected below theTxnlayer, in theTxnCoordSender, then theclient.Txnconsiders the transaction to be writing (cause it sent aBeginTxn), but the intent collector considers it read-only (cause it never saw saidBeginTxn). This happens for example if theStopperis stopped when theBeginTransactionis sent, and theTCSfails to start the heartbeat loop and rejects the batch.
The first problem goes away in #28185 because that PR brakes apart the different txn states and correctly tracks if a BeginTxn was ever sent.
The second problem is more fundamental, caused by the separate tracking of the BeginTxn done in both the TCS and the intent collector.
It's not very clear to me what to do about it. We could try to either share the "did we send a BeginTxn" state. This is a bit complicated by the fact that, in #28185, the BeginTxn tracking is not done by the TCS directly, but by the heartbeat interceptor. So we'd need to create a communication channel between two interceptors. I think what we'd want is for the txnLockGatekeeper to keep track of whether a BeginTxn is really sent to the server, and for the heartbeat interceptor to use that to dictate whether EndTransaction can be elided. Then any interceptor anywhere in the stack can retain the right to reject batches (as they tend to do already) and the intent collector can be left alone with the current assumptions - that if it sees and EndTxn it must have accumulated some intents.
Alternatively, the intent collector could get its own logic for eliding the end transaction, duplicating the existing one.
Yet alternatively, the intent collector could stop assuming anything and conservatively forward unnecessary EndTxns.
@nvanbenschoten for thoughts, if any.
Thanks @jordanlewis for seeing something and saying something.
The error can be demonstrated with
func TestXXX(t *testing.T) {
defer leaktest.AfterTest(t)()
s, _, db := serverutils.StartServer(t, base.TestServerArgs{})
ctx := context.Background()
s.Stopper().Stop(ctx)
err := db.Txn(ctx, func(ctx context.Context, txn *client.Txn) error {
key := roachpb.Key("a")
return txn.Put(ctx, key, "val")
})
if err != nil {
t.Fatal(err)
}
}