
kv: ranged intent resolution is not batched, regresses performance on TPC-C #46752

@nvb


Over the past few weeks, we've been tracking a 4-5% performance regression in TPC-C that occurs when implicit SELECT FOR UPDATE is enabled (see enable_implicit_select_for_update).

One thing that became clear early in the investigation was that the total number of batches sent in the system rose by about 25% when running TPC-C with implicit SELECT FOR UPDATE enabled. This was surprising, because implicit SELECT FOR UPDATE is not supposed to cause the system to perform any extra work: we built it to activate only opportunistically, when a scan was already going to visit a leaseholder to perform the row-fetch for an upcoming mutation. Using #46747, I was able to trace this increase in RPCs to a switch from ResolveIntent requests to ResolveIntentRange requests when implicit SELECT FOR UPDATE is in use. Remember that because SQL currently issues only ranged Scan requests and never point Get requests, implicit SFU always acquires ranged upgrade locks that overlap the point exclusive locks acquired by the following mutation, so we must release these locks with ResolveIntentRange requests.
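To make the last point concrete, here is a minimal Go sketch, assuming (as a simplification) that a transaction's lock spans are merged before resolution, so that a point lock falling inside a ranged lock is absorbed by it. `Span`, `mergeSpans`, and `requestFor` are hypothetical names for illustration, not CockroachDB APIs.

```go
package main

import (
	"fmt"
	"sort"
)

// Span is a hypothetical key span (not CockroachDB's type). An empty EndKey
// marks a point span covering only Key; otherwise it covers [Key, EndKey).
type Span struct {
	Key, EndKey string
}

// end returns the exclusive end key of s. The immediate successor of a key
// k in byte order is k + "\x00", so a point span ends there.
func (s Span) end() string {
	if s.EndKey == "" {
		return s.Key + "\x00"
	}
	return s.EndKey
}

// mergeSpans (hypothetical) sorts the spans and merges any that overlap, so a
// point span inside a ranged span is absorbed into the ranged span.
func mergeSpans(spans []Span) []Span {
	sorted := append([]Span(nil), spans...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Key < sorted[j].Key })
	var out []Span
	for _, s := range sorted {
		if n := len(out); n > 0 && s.Key < out[n-1].end() {
			// Overlaps the previous span: widen it only if s reaches further.
			if s.end() > out[n-1].end() {
				out[n-1].EndKey = s.end()
			}
			continue
		}
		out = append(out, s)
	}
	return out
}

// requestFor names the request needed to release a lock span: point spans
// can use ResolveIntent, ranged spans need ResolveIntentRange.
func requestFor(s Span) string {
	if s.EndKey == "" {
		return "ResolveIntent"
	}
	return "ResolveIntentRange"
}

func main() {
	// Without implicit SFU: only the mutation's point locks on c and e.
	withoutSFU := []Span{{Key: "c"}, {Key: "e"}}
	// With implicit SFU: the Scan's ranged lock [a, f) overlaps them.
	withSFU := []Span{{Key: "a", EndKey: "f"}, {Key: "c"}, {Key: "e"}}

	for _, s := range mergeSpans(withoutSFU) {
		fmt.Println(requestFor(s), s.Key)
	}
	for _, s := range mergeSpans(withSFU) {
		fmt.Println(requestFor(s), s.Key, s.EndKey)
	}
}
```

Running it prints `ResolveIntent c`, `ResolveIntent e`, and then `ResolveIntentRange a f`: once the Scan's ranged lock is present, both point locks fold into a single span that only a ranged request can release.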

The following graph demonstrates this effect:

[Screenshot, 2020-03-30: batches sent (top) and ResolveIntent / ResolveIntentRange requests (bottom) across two TPC-C runs]

The graph shows two runs of TPC-C: the first with sql.defaults.implicit_select_for_update.enabled set to true (the default) and the second with it set to false. The top graph shows what we had already been seeing: with implicit SFU, we issue about 25% more batches (21k vs. 16k). The bottom graph is more interesting. It shows that with implicit SFU, we issue about 11k ResolveIntent requests and about 5k ResolveIntentRange requests. Without implicit SFU, we issue about 20k ResolveIntent requests and no ResolveIntentRange requests.

So since we issue more resolution requests without implicit SFU (about 20k vs. about 16k), shouldn't we also be issuing more batches? The answer is no, because ResolveIntent requests are batched together by the IntentResolver. So even though we issue more requests, they are sent in batches of up to 100. ResolveIntentRange requests, however, are not batched together (see below), so each one is sent as a separate batch. These 5k ResolveIntentRange requests are therefore exactly the 5k extra RPCs that we see with implicit SFU enabled.
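The batch arithmetic above can be checked with a small Go model. It is a sketch under stated assumptions: `Request`, `batch`, and `workload` are hypothetical, point requests are assumed to always fill batches of 100 (a best case, since a real batcher may also flush partial batches), and the request counts are the approximate figures quoted above.

```go
package main

import "fmt"

// Request is a hypothetical intent resolution request. Ranged is true for a
// ResolveIntentRange request and false for a point ResolveIntent request.
type Request struct {
	Ranged bool
}

// maxPointBatch mirrors the batch size of 100 cited in the issue.
const maxPointBatch = 100

// batch (hypothetical) groups requests as the issue describes: point requests
// share batches of up to maxPointBatch, while each ranged request goes alone.
func batch(reqs []Request) [][]Request {
	var batches [][]Request
	var cur []Request
	for _, r := range reqs {
		if r.Ranged {
			batches = append(batches, []Request{r})
			continue
		}
		cur = append(cur, r)
		if len(cur) == maxPointBatch {
			batches = append(batches, cur)
			cur = nil
		}
	}
	if len(cur) > 0 {
		batches = append(batches, cur) // partially filled final batch
	}
	return batches
}

// workload builds a slice with the given numbers of point and ranged requests.
func workload(points, ranged int) []Request {
	reqs := make([]Request, 0, points+ranged)
	for i := 0; i < points; i++ {
		reqs = append(reqs, Request{})
	}
	for i := 0; i < ranged; i++ {
		reqs = append(reqs, Request{Ranged: true})
	}
	return reqs
}

func main() {
	without := len(batch(workload(20000, 0)))
	with := len(batch(workload(11000, 5000)))
	fmt.Println("batches without implicit SFU:", without)
	fmt.Println("batches with implicit SFU:", with)
	fmt.Println("extra batches:", with-without)
}
```

Under these assumptions the model gives 200 batches without implicit SFU and 5,110 with it, a difference of about 4.9k that lines up with the roughly 5k extra batches in the top graph.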

// Resolve spans differently. We don't know how many intents will be
// swept up with each request, so we limit the spanning resolve
// requests to a maximum number of keys and resume as necessary.
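The approach described in the comment above (cap each ranged request at a maximum number of keys and resume where it stopped) can be sketched as follows. This is a toy model, not the IntentResolver's code: `intents`, `resolveRange`, and `resolveSpan` are hypothetical, intents are just sorted string keys, and nothing is actually resolved. It shows why the number of round trips for a span is not known until the requests run.

```go
package main

import (
	"fmt"
	"sort"
)

// intents is a hypothetical sorted set of keys that currently hold intents.
type intents []string

// resolveRange (hypothetical) "resolves" at most maxKeys intents in
// [start, end). If it stops early, more is true and resume is the first
// key left untouched.
func (in intents) resolveRange(start, end string, maxKeys int) (n int, resume string, more bool) {
	i := sort.SearchStrings(in, start)
	for ; i < len(in) && in[i] < end; i++ {
		if n == maxKeys {
			return n, in[i], true // key limit reached: resume from here
		}
		n++
	}
	return n, "", false
}

// resolveSpan keeps issuing key-limited ranged requests until the span is
// exhausted, counting one round trip per request.
func resolveSpan(in intents, start, end string, maxKeys int) (resolved, requests int) {
	if maxKeys <= 0 {
		panic("maxKeys must be positive") // otherwise no progress is made
	}
	for {
		n, resume, more := in.resolveRange(start, end, maxKeys)
		resolved += n
		requests++
		if !more {
			return resolved, requests
		}
		start = resume
	}
}

func main() {
	in := intents{"a", "b", "c", "d", "e", "x"}
	resolved, requests := resolveSpan(in, "a", "f", 2)
	fmt.Println(resolved, requests)
}
```

With a limit of 2 keys, resolving [a, f) over intents a through e takes three requests (2 + 2 + 1 keys) and never touches x, which lies outside the span, so the program prints `5 3`.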

This explains the modest regression in TPC-C performance and indicates that if we can lift this restriction on intent resolution batching, we should be able to close the gap.
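One way to picture lifting the restriction is to let ranged requests share batches the same way point requests do. The Go sketch below does that with a hypothetical `batcher` that keeps separate queues for point and ranged requests and sends a batch whenever a queue reaches an assumed cap of 100. It deliberately ignores the key limit and resume handling discussed above, as well as time-based flushing, so it only illustrates how the batch count could change, not how the IntentResolver should implement it.

```go
package main

import "fmt"

// maxBatch is an assumed per-batch request cap, borrowed from the batch size
// of 100 that the issue cites for point ResolveIntent requests.
const maxBatch = 100

// kind distinguishes the two hypothetical request kinds.
type kind int

const (
	point kind = iota
	ranged
)

// batcher is a hypothetical batcher that queues point and ranged requests
// separately and sends a batch whenever a queue fills up.
type batcher struct {
	queued [2]int // pending requests per kind
	sent   int    // batches sent so far
}

// add queues one request and sends its queue once it holds maxBatch requests.
func (b *batcher) add(k kind) {
	b.queued[k]++
	if b.queued[k] == maxBatch {
		b.sent++
		b.queued[k] = 0
	}
}

// flush sends any partially filled queues.
func (b *batcher) flush() {
	for k := range b.queued {
		if b.queued[k] > 0 {
			b.sent++
			b.queued[k] = 0
		}
	}
}

// run feeds nPoint point requests and nRanged ranged requests through a
// fresh batcher and returns the number of batches sent.
func run(nPoint, nRanged int) int {
	var b batcher
	for i := 0; i < nPoint; i++ {
		b.add(point)
	}
	for i := 0; i < nRanged; i++ {
		b.add(ranged)
	}
	b.flush()
	return b.sent
}

func main() {
	fmt.Println("without implicit SFU:", run(20000, 0))
	fmt.Println("with implicit SFU, ranged batched:", run(11000, 5000))
}
```

Under those idealized assumptions, the 5k ResolveIntentRange requests collapse into 50 batches, giving 160 batches with implicit SFU versus 200 without.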

Labels: A-kv-transactions (relating to MVCC and the transactional model), C-performance (perf of queries or internals; solution not expected to change functional behavior), regression (regression from a release)
