
kv: assorted client optimizations #30448

Merged
craig[bot] merged 3 commits into cockroachdb:master from andreimatei:elide-ro-commit
Sep 25, 2018

Conversation

@andreimatei
Contributor

See individual commits.

This PR increases throughput on a single-node kv-95 workload as follows (measured on a 96-core GCE machine, running the client with concurrency=48).

```
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  100.0s        0        1257869        12578.5      2.8      3.3      5.5      7.1     48.2  read
  100.0s        0          66469          664.7     19.5     19.9     31.5     35.7     60.8  write
```

to

```
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  100.0s        0        1369178        13691.4      2.5      2.8      5.2      6.8    503.3  read
  100.0s        0          71850          718.5     18.3     18.9     28.3     33.6    520.1  write
```

The PR also takes away about 3/4 of the performance regression introduced in #28185. See #30208 for more.

@andreimatei andreimatei requested a review from a team September 20, 2018 03:30
@cockroach-teamcity
Member

This change is Reviewable

@andreimatei
Contributor Author

cc @jordanlewis

@jordanlewis
Member

Nice! But where do you get the 3/4 number? The delta I observed before and after the linked PR was very high - the runs were variable, but up to about 50% (13k -> 20k). I can show you how to reproduce that if you like. It's true that it's hard to benchmark, but averages of repeated, interleaved runs don't lie.

@andreimatei
Contributor Author

Jordan, replied in #30208 (comment)

@andreimatei andreimatei force-pushed the elide-ro-commit branch 2 times, most recently from f8dfded to 4e441d4 Compare September 20, 2018 17:58
Contributor

@nvb nvb left a comment

Reviewed 2 of 2 files at r1, 1 of 1 files at r2, 3 of 3 files at r3, 4 of 4 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/internal/client/txn.go, line 520 at r2 (raw file):

```go
	// ba.Add(endTxnReq(true /* commit */, txn.deadline(), txn.systemConfigTrigger))
	_, pErr := txn.Send(ctx, *ba)
	commitReqPool.Put(ba)
```

Is this safe? We should be clearing the ba before adding it to the sync.Pool, not after. Will that cause issues with raceTransport?


pkg/kv/txn_coord_sender.go, line 585 at r1 (raw file):

```go
	defer tc.mu.Unlock()

	if ba.IsSingleEndTransactionRequest() && !tc.interceptorAlloc.txnHeartbeat.mu.everSentBeginTxn {
```

Does this need to be after maybeRejectClientLocked?

Contributor Author

@andreimatei andreimatei left a comment

Removed the commit about the reuse of EndTxn requests until #30485 settles.

PTAL

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/kv/txn_coord_sender.go, line 585 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Does this need to be after maybeRejectClientLocked?

I guess that's safer indeed. Done.


pkg/internal/client/txn.go, line 520 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is this safe? We should be clearing the ba before adding it to the sync.Pool, not after. Will that cause issues with raceTransport?

reworked

@andreimatei andreimatei force-pushed the elide-ro-commit branch 2 times, most recently from 49d2329 to 5f2700c Compare September 24, 2018 23:42
@andreimatei andreimatei requested a review from a team September 24, 2018 23:42
Contributor

@nvb nvb left a comment

Reviewed 1 of 9 files at r5, 3 of 3 files at r6, 3 of 3 files at r7, 2 of 4 files at r8.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/kv/txn_interceptor_metrics.go, line 99 at r7 (raw file):

```go
	// it artificially (in the parallel execution queue).
	if restarts > 0 {
		m.metrics.Restarts.RecordValue(restarts)
```

Won't this mess up the histogram by skewing its distribution away from 0? How much of an effect does this have on performance?

Contributor Author

@andreimatei andreimatei left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/kv/txn_interceptor_metrics.go, line 99 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Won't this mess up the histogram by skewing its distribution away from 0? How much of an effect does this have on performance?

Yes, it will skew the histogram. I've added a TODO to optimize the histogram code so we don't have to do this.

Here are some results with this PR backported on top of the commit that introduced the perf regression (69fa9f7). backport is the PR as-is; record_all_backports is with the restarts recorded for all txns. There appears to be a pretty significant difference.

Tuesday, 24 core
backport:
```
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  100.0s        0        1801235        18012.2      1.9      1.5      4.5      6.0     37.7  read
  100.0s        0          94877          948.8     14.4     13.6     25.2     29.4     48.2  write
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  100.0s        0        1862335        18623.2      1.8      1.4      4.5      5.8     35.7  read
  100.0s        0          97708          977.1     13.9     13.6     24.1     28.3     48.2  write
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  100.0s        0        1881041        18810.0      1.8      1.4      4.5      5.8     41.9  read
  100.0s        0          98603          986.0     13.8     13.6     24.1     28.3     48.2  write
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  100.0s        0        1859887        18598.7      1.8      1.4      4.5      5.8     33.6  read
  100.0s        0          97774          977.7     14.0     13.6     24.1     29.4     50.3  write
```

backport-record_all_backports:
```
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  100.0s        0        1818145        18181.3      1.9      1.5      4.5      5.8    436.2  read
  100.0s        0          95332          953.3     14.3     13.6     24.1     29.4    453.0  write
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  100.0s        0        1819280        18192.7      1.9      1.4      4.5      6.0     31.5  read
  100.0s        0          95799          958.0     14.5     14.2     25.2     30.4     50.3  write
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  100.0s        0        1791759        17917.4      1.9      1.5      4.5      6.0     48.2  read
  100.0s        0          94375          943.7     14.7     14.2     25.2     30.4     50.3  write
```

96 core
backport:
```
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  100.0s        0        1670523        16705.1      2.1      1.7      4.7      6.0    130.0  read
  100.0s        0          88173          881.7     15.4     14.7     26.2     30.4    234.9  write
```

backport-record_all_backports:
```
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  100.0s        0        1596959        15969.5      2.2      1.8      4.7      6.3    142.6  read
  100.0s        0          84326          843.3     16.0     15.7     27.3     31.5    226.5  write
```

Contributor

@nvb nvb left a comment

:lgtm:

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/kv/txn_interceptor_metrics.go, line 99 at r7 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Yes, it will skew the histogram. I've added a TODO to optimize the histogram code so we don't have to do this.

Here are some results with this PR backported on top of the commit that introduced the perf regression (69fa9f7). backport is the PR as-is; record_all_backports is with the restarts recorded for all txns. There appears to be a pretty significant difference.


Thanks for taking the measurements. I don't want to hold this PR up any longer, and I think I'm OK with an incorrect metric, but please open an issue to correct this histogram, and also to explore optimizing other histograms that have a very common bucket.

Collaborator

@petermattis petermattis left a comment

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/kv/txn_interceptor_metrics.go, line 99 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Thanks for taking the measurements. I don't want to hold this PR up any longer, and I think I'm OK with an incorrect metric, but please open an issue to correct this histogram, and also to explore optimizing other histograms that have a very common bucket.

Is the txn.restarts histogram even used?

Contributor

@nvb nvb left a comment

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/kv/txn_interceptor_metrics.go, line 99 at r7 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Is the txn.restarts histogram even used?

To my knowledge, it's not used by the admin ui in any of the dashboards. However, I have found it useful as a custom graph. I don't know any way to track the frequency of txn restarts with respect to a single txn, as opposed to aggregated across all txns.

This is a performance optimization that we used to have but that went away in
a refactor. Read-only txns don't need to send an EndTransaction to the
server, since there's no txn record; their EndTransaction was elided by
the txnHeartbeatInterceptor. This patch moves the short-circuiting
higher, to the TxnCoordSender. This avoids allocating a result and
running other bits of code.

This makes a difference on a kv 95% read workload. Depending on the
machine and the dice roll, it's about 10% throughput increase.

Touches cockroachdb#30208

Release note: None

Use Counter instead of CounterWithRate and such. Makes a difference on
a single-node read-heavy workload because some lock contention was
softened.
Also eliminates some uses of PhysicalTime and starts measuring durations
even for read-only txns.

Fixes cockroachdb#28424

Release note: None

It was unused since all of the users switched to Counter in a previous
commit. Even before that, the "rate" part was unused.

Release note: None
Contributor Author

@andreimatei andreimatei left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/kv/txn_interceptor_metrics.go, line 99 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

To my knowledge, it's not used by the admin ui in any of the dashboards. However, I have found it useful as a custom graph. I don't know any way to track the frequency of txn restarts with respect to a single txn, as opposed to aggregated across all txns.

I've opened #30644 and referenced it in the code.

Contributor Author

@andreimatei andreimatei left a comment

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)

@craig
Contributor

craig bot commented Sep 25, 2018

👎 Rejected by code reviews

@andreimatei
Contributor Author

bors r+

craig bot pushed a commit that referenced this pull request Sep 25, 2018
30448: kv: assorted client optimizations r=andreimatei a=andreimatei

See individual commits.

This PR increases throughput on a single node kv-95 workload as follows (measured on a 96-core GCE machine and running the client with concurrency=48).
```
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  100.0s        0        1257869        12578.5      2.8      3.3      5.5      7.1     48.2  read
  100.0s        0          66469          664.7     19.5     19.9     31.5     35.7     60.8  write
```
to
```
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  100.0s        0        1369178        13691.4      2.5      2.8      5.2      6.8    503.3  read
  100.0s        0          71850          718.5     18.3     18.9     28.3     33.6    520.1  write
```

The PR also takes away about 3/4 of the performance regression introduced in #28185. See #30208 for more.

Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
@craig
Contributor

craig bot commented Sep 25, 2018

Build succeeded

@craig craig bot merged commit feb135f into cockroachdb:master Sep 25, 2018
@andreimatei andreimatei deleted the elide-ro-commit branch September 25, 2018 20:00