kvserver: avoid or at least efficiently handle intent buildup #60585
Description
Describe the problem
We have received numerous instances in which large numbers of intents (hundreds of thousands and more) were able to build up on a range. While this is always theoretically possible - intent cleanup is async and best-effort, so an ill-timed crash can temporarily leave the intents behind - it appears that this is happening in scenarios where it is unexpected.
See for example (internal link): https://github.com/cockroachlabs/support/issues/817
The issue often leads to stuck backups because of #59704 (which we should independently fix).
To Reproduce
Reproducing these issues is part of the problem. So far we only strongly suspect that something is wrong, but we have no concrete reproduction.
Expected behavior
Intents should only be able to build up when cleanup is legitimately prevented, for example by an ill-timed crash during the async, best-effort cleanup described above.
Additional data / screenshots
Recap of the intent resolution behavior:
Intents are generally resolved when EndTxn(commit=true) has finished applying. We will then hit this code path:
cockroach/pkg/kv/kvserver/replica_write.go
Lines 188 to 192 in 218a5a3
```go
if err := r.store.intentResolver.CleanupTxnIntentsAsync(
	ctx, r.RangeID, propResult.EndTxns, true, /* allowSync */
); err != nil {
	log.Warningf(ctx, "%v", err)
}
```
which calls into this:
cockroach/pkg/kv/kvserver/intentresolver/intent_resolver.go
Lines 545 to 559 in 23263a3
```go
for i := range endTxns {
	et := &endTxns[i] // copy for goroutine
	if err := ir.runAsyncTask(ctx, allowSyncProcessing, func(ctx context.Context) {
		locked, release := ir.lockInFlightTxnCleanup(ctx, et.Txn.ID)
		if !locked {
			return
		}
		defer release()
		if err := ir.cleanupFinishedTxnIntents(
			ctx, rangeID, et.Txn, et.Poison, nil, /* onComplete */
		); err != nil {
			if ir.every.ShouldLog() {
				log.Warningf(ctx, "failed to cleanup transaction intents: %v", err)
			}
		}
```
One possibility is that the intent resolver erroneously skips large batches of txn intents because it is already tracking some smaller amount of work on behalf of the same txn, in which case we'd somehow end up in the !locked code path. In practice, this seems very unlikely, since we only invoke this method from the GC queue (which doesn't touch txns younger than 1h) and the EndTxn(commit=true) proposal result itself, of which there is only ever one.
This raises the question of why lockInFlightTxnCleanup exists at all. It may be a premature optimization, or it may be tied to the GC queue code path; either way, the code has barely changed since 2017, i.e. it is very old.
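To make the `!locked` path concrete, here is a hypothetical, heavily simplified sketch of the per-transaction mutual exclusion that lockInFlightTxnCleanup provides (the names and types here are illustrative, not cockroachdb's actual implementation): a second cleanup attempt for the same txn ID gets `locked == false` and silently drops its work rather than queueing it.

```go
package main

import (
	"fmt"
	"sync"
)

// inFlightTxnCleanup is a hypothetical sketch of per-txn cleanup locking.
type inFlightTxnCleanup struct {
	mu       sync.Mutex
	inFlight map[string]struct{} // txn IDs with cleanup in progress
}

func newInFlightTxnCleanup() *inFlightTxnCleanup {
	return &inFlightTxnCleanup{inFlight: map[string]struct{}{}}
}

// tryLock returns locked=false if a cleanup task for this txn is already
// running; the caller is then expected to drop its work on the floor.
func (c *inFlightTxnCleanup) tryLock(txnID string) (locked bool, release func()) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.inFlight[txnID]; ok {
		return false, nil // another goroutine owns cleanup for this txn
	}
	c.inFlight[txnID] = struct{}{}
	return true, func() {
		c.mu.Lock()
		defer c.mu.Unlock()
		delete(c.inFlight, txnID)
	}
}

func main() {
	c := newInFlightTxnCleanup()
	locked, release := c.tryLock("txn-1")
	fmt.Println("first attempt locked:", locked)
	again, _ := c.tryLock("txn-1")
	fmt.Println("concurrent attempt locked:", again)
	release()
}
```

The danger described above is exactly this drop-on-the-floor behavior: if the holder of the lock is tracking a smaller set of intents than the skipped caller, those extra intents are never resolved.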
I did check that the cleanup tasks are properly scoped to a new background-derived context, i.e. we are not accidentally tying these cleanup attempts to the short-lived caller's context. We are also not introducing an overly aggressive timeout; to the best of my knowledge there is no timeout at all on this particular path, since runAsyncTask derives its context from context.Background() and we never add one.
Also, isn't this code wrong:
cockroach/pkg/internal/client/requestbatcher/batcher.go
Lines 355 to 366 in 8c5253b
```go
// Update the deadline for the batch if this requests's deadline is later
// than the current latest.
rDeadline, rHasDeadline := r.ctx.Deadline()
// If this is the first request or
if len(ba.reqs) == 0 ||
	// there are already requests and there is a deadline and
	(len(ba.reqs) > 0 && !ba.sendDeadline.IsZero() &&
		// this request either doesn't have a deadline or has a later deadline,
		(!rHasDeadline || rDeadline.After(ba.sendDeadline))) {
	// set the deadline to this request's deadline.
	ba.sendDeadline = rDeadline
}
```
It seems that it would erase ba.sendDeadline (assign it the zero time) whenever len(ba.reqs) > 0, ba.sendDeadline is non-zero, and !rHasDeadline, since rDeadline is then the zero value. However, if anything, this would swallow a context cancellation rather than cause a premature one.
Another thing worth noting is that parallel commits prevent clients from seeing the backpressure. When EndTxn(commit=true) runs, it may feel backpressure, but under parallel commits that request is completely disjoint from the SQL client, as it is fired-and-forgotten here:
cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_committer.go
Lines 431 to 455 in ab0b87a
```go
// makeTxnCommitExplicitAsync launches an async task that attempts to move the
// transaction from implicitly committed (STAGING status with all intents
// written) to explicitly committed (COMMITTED status). It does so by sending a
// second EndTxnRequest, this time with no InFlightWrites attached.
func (tc *txnCommitter) makeTxnCommitExplicitAsync(
	ctx context.Context, txn *roachpb.Transaction, lockSpans []roachpb.Span, canFwdRTS bool,
) {
	// TODO(nvanbenschoten): consider adding tracing for this request.
	// TODO(nvanbenschoten): add a timeout to this request.
	// TODO(nvanbenschoten): consider making this semi-synchronous to
	// backpressure client writes when these start to slow down. This
	// would be similar to what we do for intent resolution.
	log.VEventf(ctx, 2, "making txn commit explicit: %s", txn)
	if err := tc.stopper.RunAsyncTask(
		context.Background(), "txnCommitter: making txn commit explicit", func(ctx context.Context) {
			tc.mu.Lock()
			defer tc.mu.Unlock()
			if err := makeTxnCommitExplicitLocked(ctx, tc.wrapped, txn, lockSpans, canFwdRTS); err != nil {
				log.Errorf(ctx, "making txn commit explicit failed for %s: %v", txn, err)
			}
		},
	); err != nil {
		log.VErrEventf(ctx, 1, "failed to make txn commit explicit: %v", err)
	}
}
```
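The "semi-synchronous" idea in the TODO above could look something like the following hypothetical sketch (semiSyncRunner and its capacity semantics are assumptions for illustration, not cockroachdb's implementation): tasks are handed off asynchronously while capacity remains, but once the pool is saturated the caller executes the task inline and thereby feels the backpressure directly.

```go
package main

import (
	"fmt"
	"sync"
)

// semiSyncRunner is a hypothetical sketch of semi-synchronous task
// execution: async while under capacity, synchronous when saturated.
type semiSyncRunner struct {
	sem chan struct{} // tokens for concurrent async tasks
	wg  sync.WaitGroup
}

func newSemiSyncRunner(capacity int) *semiSyncRunner {
	return &semiSyncRunner{sem: make(chan struct{}, capacity)}
}

// run returns true if the task was handed off asynchronously, false if
// the caller had to execute it synchronously (i.e. it was backpressured).
func (r *semiSyncRunner) run(task func()) bool {
	select {
	case r.sem <- struct{}{}:
		r.wg.Add(1)
		go func() {
			defer r.wg.Done()
			defer func() { <-r.sem }()
			task()
		}()
		return true
	default:
		task() // pool saturated: run inline, backpressuring the caller
		return false
	}
}

// wait blocks until all async tasks have finished.
func (r *semiSyncRunner) wait() { r.wg.Wait() }

func main() {
	r := newSemiSyncRunner(0) // zero capacity forces synchronous execution
	async := r.run(func() { fmt.Println("task ran") })
	fmt.Println("handed off asynchronously:", async)
	r.wait()
}
```

With a pattern like this, a SQL client issuing writes faster than explicit-commit cleanup can keep up would start to observe the slowdown instead of silently accumulating intents.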
Environment:
- CockroachDB 20.1.8 and above (probably also below)
Additional context
Once we have the separated lock table, it might be much cheaper to be reactive to the problem.
Epic: CRDB-2554