kvserver: avoid or at least efficiently handle intent buildup #60585
Description
Describe the problem
We have received numerous instances in which large numbers of intents (hundreds of thousands and more) were able to build up on a range. While this is always theoretically possible - intent cleanup is async and best-effort, so an ill-timed crash can temporarily leave the intents behind - it appears that this is happening in scenarios where it is unexpected.
See for example (internal link): https://github.com/cockroachlabs/support/issues/817
The issue often leads to stuck backups because of #59704 (which we should independently fix).
To Reproduce
Reproducing these issues is part of the problem. So far we only strongly suspect that something is wrong, but we have no concrete reproduction.
Expected behavior
Intents should only be able to build up when cleanup is legitimately prevented, for example by an ill-timed crash during the async, best-effort cleanup described above.
Additional data / screenshots
Recap of the intent resolution behavior:
Intents are generally resolved when EndTxn(commit=true) has finished applying. We will then hit this code path:
cockroach/pkg/kv/kvserver/replica_write.go
Lines 188 to 192 in 218a5a3
```go
if err := r.store.intentResolver.CleanupTxnIntentsAsync(
	ctx, r.RangeID, propResult.EndTxns, true, /* allowSync */
); err != nil {
	log.Warningf(ctx, "%v", err)
}
```
which calls into this:
cockroach/pkg/kv/kvserver/intentresolver/intent_resolver.go
Lines 545 to 559 in 23263a3
```go
for i := range endTxns {
	et := &endTxns[i] // copy for goroutine
	if err := ir.runAsyncTask(ctx, allowSyncProcessing, func(ctx context.Context) {
		locked, release := ir.lockInFlightTxnCleanup(ctx, et.Txn.ID)
		if !locked {
			return
		}
		defer release()
		if err := ir.cleanupFinishedTxnIntents(
			ctx, rangeID, et.Txn, et.Poison, nil, /* onComplete */
		); err != nil {
			if ir.every.ShouldLog() {
				log.Warningf(ctx, "failed to cleanup transaction intents: %v", err)
			}
		}
```
One possibility is that the intent resolver erroneously skips large batches of txn intents because it is already tracking some smaller amount of work on behalf of the same txn, in which case we'd somehow end up in the !locked code path. In practice, this seems very unlikely, since we only invoke this method from the GC queue (which doesn't touch txns younger than 1h) and the EndTxn(commit=true) proposal result itself, of which there is only ever one.
This raises the question of why lockInFlightTxnCleanup exists at all. It may be a premature optimization, or it may be tied to the GC queue code path; either way, the code has barely changed since 2017, i.e. it is very old.
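To make the `!locked` path concrete, here is a hypothetical, heavily simplified sketch of the per-transaction mutual exclusion that lockInFlightTxnCleanup provides (the names and types here are illustrative, not cockroachdb's actual implementation): a second cleanup attempt for the same txn ID gets `locked == false` and silently drops its work rather than queueing it.

```go
package main

import (
	"fmt"
	"sync"
)

// inFlightTxnCleanup is a hypothetical sketch of per-txn cleanup locking.
type inFlightTxnCleanup struct {
	mu       sync.Mutex
	inFlight map[string]struct{} // txn IDs with cleanup in progress
}

func newInFlightTxnCleanup() *inFlightTxnCleanup {
	return &inFlightTxnCleanup{inFlight: map[string]struct{}{}}
}

// tryLock returns locked=false if a cleanup task for this txn is already
// running; the caller is then expected to drop its work on the floor.
func (c *inFlightTxnCleanup) tryLock(txnID string) (locked bool, release func()) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.inFlight[txnID]; ok {
		return false, nil // another goroutine owns cleanup for this txn
	}
	c.inFlight[txnID] = struct{}{}
	return true, func() {
		c.mu.Lock()
		defer c.mu.Unlock()
		delete(c.inFlight, txnID)
	}
}

func main() {
	c := newInFlightTxnCleanup()
	locked, release := c.tryLock("txn-1")
	fmt.Println("first attempt locked:", locked)
	again, _ := c.tryLock("txn-1")
	fmt.Println("concurrent attempt locked:", again)
	release()
}
```

The danger described above is exactly this drop-on-the-floor behavior: if the holder of the lock is tracking a smaller set of intents than the skipped caller, those extra intents are never resolved.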
I did check that the cleanup tasks are properly scoped to a new background-derived context, i.e. we are not accidentally tying these cleanup attempts to the short-lived caller's context. We are also not introducing an overly aggressive timeout; to the best of my knowledge there is no timeout at all on this particular path, since runAsyncTask derives its context from context.Background() and we never add one.
Also, isn't this code wrong:
cockroach/pkg/internal/client/requestbatcher/batcher.go
Lines 355 to 366 in 8c5253b
```go
// Update the deadline for the batch if this requests's deadline is later
// than the current latest.
rDeadline, rHasDeadline := r.ctx.Deadline()
// If this is the first request or
if len(ba.reqs) == 0 ||
	// there are already requests and there is a deadline and
	(len(ba.reqs) > 0 && !ba.sendDeadline.IsZero() &&
		// this request either doesn't have a deadline or has a later deadline,
		(!rHasDeadline || rDeadline.After(ba.sendDeadline))) {
	// set the deadline to this request's deadline.
	ba.sendDeadline = rDeadline
}
```
It seems that it would erase ba.sendDeadline (assign it the zero time) whenever len(ba.reqs) > 0, ba.sendDeadline is non-zero, and !rHasDeadline, since rDeadline is then the zero value. However, if anything, this would swallow a context cancellation rather than cause a premature one.
Another thing worth noting is that parallel commits prevent clients from seeing the backpressure. When EndTxn(commit=true) runs, it may feel backpressure, but under parallel commits that request is completely disjoint from the SQL client, as it is fired-and-forgotten here:
cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_committer.go
Lines 431 to 455 in ab0b87a
```go
// makeTxnCommitExplicitAsync launches an async task that attempts to move the
// transaction from implicitly committed (STAGING status with all intents
// written) to explicitly committed (COMMITTED status). It does so by sending a
// second EndTxnRequest, this time with no InFlightWrites attached.
func (tc *txnCommitter) makeTxnCommitExplicitAsync(
	ctx context.Context, txn *roachpb.Transaction, lockSpans []roachpb.Span, canFwdRTS bool,
) {
	// TODO(nvanbenschoten): consider adding tracing for this request.
	// TODO(nvanbenschoten): add a timeout to this request.
	// TODO(nvanbenschoten): consider making this semi-synchronous to
	// backpressure client writes when these start to slow down. This
	// would be similar to what we do for intent resolution.
	log.VEventf(ctx, 2, "making txn commit explicit: %s", txn)
	if err := tc.stopper.RunAsyncTask(
		context.Background(), "txnCommitter: making txn commit explicit", func(ctx context.Context) {
			tc.mu.Lock()
			defer tc.mu.Unlock()
			if err := makeTxnCommitExplicitLocked(ctx, tc.wrapped, txn, lockSpans, canFwdRTS); err != nil {
				log.Errorf(ctx, "making txn commit explicit failed for %s: %v", txn, err)
			}
		},
	); err != nil {
		log.VErrEventf(ctx, 1, "failed to make txn commit explicit: %v", err)
	}
}
```
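The "semi-synchronous" idea in the TODO above could look something like the following hypothetical sketch (semiSyncRunner and its capacity semantics are assumptions for illustration, not cockroachdb's implementation): tasks are handed off asynchronously while capacity remains, but once the pool is saturated the caller executes the task inline and thereby feels the backpressure directly.

```go
package main

import (
	"fmt"
	"sync"
)

// semiSyncRunner is a hypothetical sketch of semi-synchronous task
// execution: async while under capacity, synchronous when saturated.
type semiSyncRunner struct {
	sem chan struct{} // tokens for concurrent async tasks
	wg  sync.WaitGroup
}

func newSemiSyncRunner(capacity int) *semiSyncRunner {
	return &semiSyncRunner{sem: make(chan struct{}, capacity)}
}

// run returns true if the task was handed off asynchronously, false if
// the caller had to execute it synchronously (i.e. it was backpressured).
func (r *semiSyncRunner) run(task func()) bool {
	select {
	case r.sem <- struct{}{}:
		r.wg.Add(1)
		go func() {
			defer r.wg.Done()
			defer func() { <-r.sem }()
			task()
		}()
		return true
	default:
		task() // pool saturated: run inline, backpressuring the caller
		return false
	}
}

// wait blocks until all async tasks have finished.
func (r *semiSyncRunner) wait() { r.wg.Wait() }

func main() {
	r := newSemiSyncRunner(0) // zero capacity forces synchronous execution
	async := r.run(func() { fmt.Println("task ran") })
	fmt.Println("handed off asynchronously:", async)
	r.wait()
}
```

With a pattern like this, a SQL client issuing writes faster than explicit-commit cleanup can keep up would start to observe the slowdown instead of silently accumulating intents.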
Environment:
- CockroachDB 20.1.8 and above (probably also below)
Additional context
Once we have the separated lock table, it might be much cheaper to be reactive to the problem.
Epic: CRDB-2554