kv: avoid thrashing and livelocking on multiple concurrent Range merge txns #59026
Conversation
Force-pushed 48d6ac3 to 646f0d2
Force-pushed 646f0d2 to cc05807
aayushshah15
left a comment
thanks for updating the tech-note along with the change! I know this is blocked on the other PR, I'll get to that ASAP.
Reviewed 1 of 3 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
docs/tech-notes/range-merges.md, line 252 at r3 (raw file):
As of v21.1, the merge transaction uses a locking read on the LHS range's transaction record as its first course of action. This eliminates thrashing and livelock in the face of concurrent merge attempts. It also has the effect of
Do you think it's worth pointing out that we only expect these concurrent merge attempts in testing scenarios (right?)
pkg/kv/kvserver/replica_command.go, line 2273 at r3 (raw file):
// The method allows callers to specify whether a locking read should be used or
// not. A locking read can be used to manage contention and avoid transaction
// restarts in a read-modify-write operation (which all users of this method
Do you think the part in parentheses could easily rot? If not, we could consider making this method always issue locking gets and remove the forUpdate option.
Force-pushed cc05807 to c3f328e
nvb
left a comment
TFTR!
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aayushshah15)
docs/tech-notes/range-merges.md, line 252 at r3 (raw file):
Previously, aayushshah15 (Aayush Shah) wrote…
Do you think it's worth pointing out that we only expect these concurrent merge attempts in testing scenarios (right?)
Done.
pkg/kv/kvserver/replica_command.go, line 2273 at r3 (raw file):
Do you think the part in parentheses could easily rot?
This is a pretty specialized function, so I don't imagine it being used for other things.
If not, we could consider making this method always issue locking gets and remove the forUpdate option.
That's a good question. I actually had this written without the forUpdate option at first, but I decided not to change the behavior of all transactions that use it at the same time, especially when we're in the migration period where the recipient might not even respect the locking field in the GetRequest. I do think we will eventually want to switch over to that, though. I added a TODO.
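The forUpdate option discussed here can be sketched as follows. This is a hypothetical, self-contained shape (the interface and helper names are illustrative, not the real `conditionalGetDescValueFromDB` signature): callers opt in to a locking read, and the TODO marks the eventual migration to always locking.

```go
package main

import "fmt"

// txnGetter is a hypothetical stand-in for the slice of the kv.Txn API
// used here: GetForUpdate issues a locking read, Get an ordinary one.
type txnGetter interface {
	Get(key string) (string, error)
	GetForUpdate(key string) (string, error)
}

// getDescValue mirrors the shape under discussion: callers pass forUpdate
// to request a locking read during a read-modify-write.
// TODO: once all nodes respect the locking field in the request, always
// issue locking gets and drop the forUpdate parameter.
func getDescValue(txn txnGetter, key string, forUpdate bool) (string, error) {
	if forUpdate {
		return txn.GetForUpdate(key)
	}
	return txn.Get(key)
}

// mapTxn is a trivial in-memory implementation for demonstration.
type mapTxn map[string]string

func (m mapTxn) Get(k string) (string, error)          { return m[k], nil }
func (m mapTxn) GetForUpdate(k string) (string, error) { return m[k], nil }

func main() {
	txn := mapTxn{"/meta/range-desc": "lhs-desc"}
	v, _ := getDescValue(txn, "/meta/range-desc", true)
	fmt.Println(v) // lhs-desc
}
```

Keeping the flag during the migration window, as the author notes, avoids changing the behavior of every caller while older recipients may still ignore the locking field.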
bors r+

Build succeeded
Fixes #46639.
This commit updates the Range merge transaction to perform a locking read on the left-hand side's range descriptor. Locking the descriptor early (i.e. on the read instead of the write of the read-modify-write operation) helps prevent multiple concurrent Range merges from thrashing. Thrashing is especially detrimental for Range merges because any restart results in an abort, so thrashing can result in indefinite livelock.
Before this change, I was easily able to reproduce the livelock discussed in #46639 by stressing `TestKVNemesisMultiNode` with the merge probability increased by 10x. When stressing on a 20-node roachprod cluster and using a 60s timeout, the test would typically fail after about 6,000 iterations. With this change and under the same setup, I have yet to see the failure over 125,000 iterations.