kv: split EndTxn into sub-batch on auto-retry after successful refresh by nvb · Pull Request #52885 · cockroachdb/cockroach

nvb · 2020-08-17T00:25:46Z

This commit updates the txnSpanRefresher to split off EndTxn requests into their own partial batches on auto-retries after successful refreshes as a means of preventing starvation. This avoids starvation in two ways. First, it helps ensure that we lay down intents if any of the other requests in the batch are writes. Second, it ensures that if any writes are getting pushed due to contention with reads or due to the closed timestamp, they will still succeed and allow the batch to make forward progress. Without this, each retry attempt may get pushed because of writes in the batch and then rejected wholesale when the EndTxn tries to evaluate the pushed batch. When split, the writes will be pushed but succeed, the transaction will be refreshed, and the EndTxn will succeed.
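As a rough illustration of the split described above, here is a minimal Go sketch. The `Request` and `Batch` types and the `splitEndTxn` helper are hypothetical stand-ins invented for this example, not the real `roachpb` types or the code in this PR. The sketch only shows how a trailing EndTxn can be sliced off into its own sub-batch; it does not model refreshes, retries, or timestamp pushes.

```go
package main

import "fmt"

// Request is a toy stand-in for a KV request (assumption: not the real
// roachpb request type).
type Request struct {
	Method string // e.g. "Put", "Delete", "EndTxn"
}

// Batch is a toy stand-in for a batch of requests.
type Batch struct {
	Requests []Request
}

// splitEndTxn is a hypothetical helper. If the batch ends in an EndTxn and
// contains at least one other request, it returns the requests before the
// EndTxn and the EndTxn itself as two separate sub-batches. Otherwise it
// reports ok == false and the caller would send the batch unsplit.
func splitEndTxn(ba Batch) (prefix, suffix Batch, ok bool) {
	if len(ba.Requests) < 2 {
		return Batch{}, Batch{}, false
	}
	// Index of the last request, which is where an EndTxn would sit.
	etIdx := len(ba.Requests) - 1
	if ba.Requests[etIdx].Method != "EndTxn" {
		return Batch{}, Batch{}, false
	}
	// Cap the prefix's capacity so appending to it cannot overwrite the
	// EndTxn in the shared backing array.
	prefix = Batch{Requests: ba.Requests[:etIdx:etIdx]}
	suffix = Batch{Requests: ba.Requests[etIdx:]}
	return prefix, suffix, true
}

func main() {
	ba := Batch{Requests: []Request{
		{Method: "Put"}, {Method: "Delete"}, {Method: "EndTxn"},
	}}
	if prefix, suffix, ok := splitEndTxn(ba); ok {
		// The writes go first, so they can be pushed and still lay down
		// intents; the EndTxn is then evaluated on its own.
		fmt.Println("first sub-batch:", len(prefix.Requests), "requests")
		fmt.Println("second sub-batch:", suffix.Requests[0].Method)
	}
}
```

In this toy model the caller would send `prefix` first and `suffix` second, which mirrors the ordering the description relies on: the writes succeed even if pushed, and only then is the EndTxn attempted.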

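I still need to confirm that this fixes this indefinite stall [here](https://github.com/cockroachlabs/misc_projects_glenn/tree/master/rw_blockage#implicit-query-hangs--explict-query-works), but I suspect that it will.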

Release note (bug fix): A change in v20.1 caused a certain class of bulk UPDATE and DELETE statements to hang indefinitely if run in an implicit transaction. We now break up these statements to avoid starvation and prevent them from hanging indefinitely.

nvb · 2020-08-17T02:12:07Z

> I still need to confirm that this fixes this indefinite stall here, but I suspect that it will.

I just confirmed that this resolves the hanging implicit txn in https://github.com/cockroachlabs/misc_projects_glenn/tree/master/rw_blockage#implicit-query-hangs--explict-query-works. Without this change, the query hangs indefinitely and I see txn.refresh.success grow without bound. I then kill the query (at 02:04:20), upgrade the binary to include this change, and try again. The implicit txn then completes in 8 seconds.

cc. @glennfawcett

andreimatei

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)

pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go, line 380 at r3 (raw file):

	}

	// We've refreshed all of the read spans successfully and bumped

move this comment up

pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go, line 405 at r3 (raw file):

	// Issue a batch up to but not including the EndTxn request.
	etIdx := len(ba.Requests) - 1

put a Eventf around here talking about the split pls

nvb

TFTR!

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei)

pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go, line 380 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

> move this comment up

Done.

pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go, line 405 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

> put a Eventf around here talking about the split pls

Good idea. Done.

Fixes cockroachdb#51294. First two commits from cockroachdb#52884.

nvb · 2020-08-21T04:48:39Z

bors r+

craig · 2020-08-21T06:20:11Z

Build succeeded:

GitHub CI (Cockroach)

Fixes cockroachdb#53249. The race was caused by the reintroduction of DeprecatedCanCommitAtHigherTimestamp in cockroachdb#53220, mixed with the new splitEndTxnAndRetrySend logic introduced in cockroachdb#52885.

53275: kv: avoid race in txnSpanRefresher.SendLocked r=nvanbenschoten a=nvanbenschoten

Fixes #53249. The race was caused by the reintroduction of DeprecatedCanCommitAtHigherTimestamp in #53220, mixed with the new splitEndTxnAndRetrySend logic introduced in #52885.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>

nvb requested a review from andreimatei August 17, 2020 00:25

nvb force-pushed the nvanbenschoten/refreshSplitEndTxn branch from 3c300ae to c373a77 Compare August 17, 2020 01:38

andreimatei approved these changes Aug 20, 2020

nvb force-pushed the nvanbenschoten/refreshSplitEndTxn branch from c373a77 to 37b5dd2 Compare August 21, 2020 01:50

nvb commented Aug 21, 2020

nvb force-pushed the nvanbenschoten/refreshSplitEndTxn branch from 37b5dd2 to cc233c5 Compare August 21, 2020 04:47

craig bot merged commit 95a13a8 into cockroachdb:master Aug 21, 2020

ajwerner mentioned this pull request Aug 21, 2020

roachtest: acceptance/version-upgrade failed #53198

Closed

nvb deleted the nvanbenschoten/refreshSplitEndTxn branch August 22, 2020 03:24

nvb mentioned this pull request Aug 22, 2020

kv: avoid race in txnSpanRefresher.SendLocked #53275

Merged

This was referenced Aug 27, 2020

release-20.1: kv: split EndTxn into sub-batch on auto-retry after successful refresh #53561

Merged

roachtest: tpccbench/nodes=6/cpu=16/multi-az failed #53469

Closed
