
kv: assorted client optimizations #30448

Merged
craig[bot] merged 3 commits into cockroachdb:master from andreimatei:elide-ro-commit
Sep 25, 2018

Conversation

@andreimatei
Contributor

See individual commits.

This PR increases throughput on a single-node kv-95 workload as follows (measured on a 96-core GCE machine, running the client with concurrency=48).

```
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  100.0s        0        1257869        12578.5      2.8      3.3      5.5      7.1     48.2  read
  100.0s        0          66469          664.7     19.5     19.9     31.5     35.7     60.8  write
```

to

```
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  100.0s        0        1369178        13691.4      2.5      2.8      5.2      6.8    503.3  read
  100.0s        0          71850          718.5     18.3     18.9     28.3     33.6    520.1  write
```

The PR also takes away about 3/4 of the performance regression introduced in #28185. See #30208 for more.

@andreimatei andreimatei requested a review from a team September 20, 2018 03:30
@cockroach-teamcity
Member

This change is Reviewable

@andreimatei
Contributor Author

cc @jordanlewis

@jordanlewis
Member

Nice! But where do you get the 3/4 number? The delta I observed before and after the linked PR was very high - the runs were variable, but up to about 50% (13k -> 20k). I can show you how to reproduce that if you like. It's true that it's hard to benchmark, but averages of repeated, interleaved runs don't lie.

@andreimatei
Contributor Author

Jordan, replied in #30208 (comment)

@andreimatei andreimatei force-pushed the elide-ro-commit branch 2 times, most recently from f8dfded to 4e441d4 Compare September 20, 2018 17:58
Contributor

@nvb nvb left a comment

Reviewed 2 of 2 files at r1, 1 of 1 files at r2, 3 of 3 files at r3, 4 of 4 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/internal/client/txn.go, line 520 at r2 (raw file):

```go
	// ba.Add(endTxnReq(true /* commit */, txn.deadline(), txn.systemConfigTrigger))
	_, pErr := txn.Send(ctx, *ba)
	commitReqPool.Put(ba)
```

Is this safe? We should be clearing the ba before adding it to the sync.Pool, not after. Will that cause issues with raceTransport?


pkg/kv/txn_coord_sender.go, line 585 at r1 (raw file):

```go
	defer tc.mu.Unlock()

	if ba.IsSingleEndTransactionRequest() && !tc.interceptorAlloc.txnHeartbeat.mu.everSentBeginTxn {
```

Does this need to be after maybeRejectClientLocked?

Contributor Author

@andreimatei andreimatei left a comment

Removed the commit about the reuse of EndTxn requests until #30485 settles.

PTAL

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/kv/txn_coord_sender.go, line 585 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Does this need to be after maybeRejectClientLocked?

I guess that's safer indeed. Done.


pkg/internal/client/txn.go, line 520 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is this safe? We should be clearing the ba before adding it to the sync.Pool, not after. Will that cause issues with raceTransport?

reworked

@andreimatei andreimatei force-pushed the elide-ro-commit branch 2 times, most recently from 49d2329 to 5f2700c Compare September 24, 2018 23:42
@andreimatei andreimatei requested a review from a team September 24, 2018 23:42
Contributor

@nvb nvb left a comment

Reviewed 1 of 9 files at r5, 3 of 3 files at r6, 3 of 3 files at r7, 2 of 4 files at r8.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/kv/txn_interceptor_metrics.go, line 99 at r7 (raw file):

```go
	// it artificially (in the parallel execution queue).
	if restarts > 0 {
		m.metrics.Restarts.RecordValue(restarts)
```

Won't this mess up the histogram by skewing its distribution away from 0? How much of an effect does this have on performance?

Contributor Author

@andreimatei andreimatei left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained


pkg/kv/txn_interceptor_metrics.go, line 99 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Won't this mess up the histogram by skewing its distribution away from 0? How much of an effect does this have on performance?

Yes, it will skew the histogram. I've added a TODO to optimize the histogram code so we don't have to do this.

Here are some results with this PR backported on top of the commit that introduced the perf regression (69fa9f7). backport is the PR as-is; record_all_backports is with the restarts recorded for all txns. There appears to be a pretty significant difference.

Tuesday, 24 core
backport:
```
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  100.0s        0        1801235        18012.2      1.9      1.5      4.5      6.0     37.7  read
  100.0s        0          94877          948.8     14.4     13.6     25.2     29.4     48.2  write
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  100.0s        0        1862335        18623.2      1.8      1.4      4.5      5.8     35.7  read
  100.0s        0          97708          977.1     13.9     13.6     24.1     28.3     48.2  write
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  100.0s        0        1881041        18810.0      1.8      1.4      4.5      5.8     41.9  read
  100.0s        0          98603          986.0     13.8     13.6     24.1     28.3     48.2  write
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  100.0s        0        1859887        18598.7      1.8      1.4      4.5      5.8     33.6  read
  100.0s        0          97774          977.7     14.0     13.6     24.1     29.4     50.3  write
```

backport-record_all_backports:
```
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  100.0s        0        1818145        18181.3      1.9      1.5      4.5      5.8    436.2  read
  100.0s        0          95332          953.3     14.3     13.6     24.1     29.4    453.0  write
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  100.0s        0        1819280        18192.7      1.9      1.4      4.5      6.0     31.5  read
  100.0s        0          95799          958.0     14.5     14.2     25.2     30.4     50.3  write
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  100.0s        0        1791759        17917.4      1.9      1.5      4.5      6.0     48.2  read
  100.0s        0          94375          943.7     14.7     14.2     25.2     30.4     50.3  write
```

96 core
backport:
```
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  100.0s        0        1670523        16705.1      2.1      1.7      4.7      6.0    130.0  read
  100.0s        0          88173          881.7     15.4     14.7     26.2     30.4    234.9  write
```

backport-record_all_backports:
```
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  100.0s        0        1596959        15969.5      2.2      1.8      4.7      6.3    142.6  read
  100.0s        0          84326          843.3     16.0     15.7     27.3     31.5    226.5  write
```

Contributor

@nvb nvb left a comment

:lgtm:

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/kv/txn_interceptor_metrics.go, line 99 at r7 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

Yes, it will skew the histogram. I've added a TODO to optimize the histogram code so we don't have to do this.

Here are some results with this PR backported on top of the commit that introduced the perf regression (69fa9f7). backport is the PR as-is; record_all_backports is with the restarts recorded for all txns. There appears to be a pretty significant difference.


Thanks for taking the measurements. I don't want to hold this PR up any longer, and I think I'm OK with an incorrect metric, but please open an issue to correct this histogram, and also to explore optimizing other histograms that have a very common bucket.

Collaborator

@petermattis petermattis left a comment

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/kv/txn_interceptor_metrics.go, line 99 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Thanks for taking the measurements. I don't want to hold this PR up any longer, and I think I'm OK with an incorrect metric, but please open an issue to correct this histogram, and also to explore optimizing other histograms that have a very common bucket.

Is the txn.restarts histogram even used?

Contributor

@nvb nvb left a comment

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/kv/txn_interceptor_metrics.go, line 99 at r7 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Is the txn.restarts histogram even used?

To my knowledge, it's not used by the admin ui in any of the dashboards. However, I have found it useful as a custom graph. I don't know any way to track the frequency of txn restarts with respect to a single txn, as opposed to aggregated across all txns.

This is a performance optimization that we used to have but that went away in
a refactor. Read-only txns don't need to send an EndTransaction to the
server, since there's no txn record; their EndTransaction was elided by
the txnHeartbeatInterceptor. This patch moves the short-circuiting
higher, to the TxnCoordSender. This avoids allocating a result and
running other bits of code.

This makes a difference on a kv 95% read workload. Depending on the
machine and the dice roll, it's about 10% throughput increase.

Touches cockroachdb#30208

Release note: None

Use Counter instead of CounterWithRate and such. Makes a difference on
a single-node read-heavy workload because some lock contention was
softened.
Also eliminates some uses of PhysicalTime and starts measuring durations
even for read-only txns.

Fixes cockroachdb#28424

Release note: None

It was unused since all of the users switched to Counter in a previous
commit. Even before that, the "rate" part was unused.

Release note: None
Contributor Author

@andreimatei andreimatei left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)


pkg/kv/txn_interceptor_metrics.go, line 99 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

To my knowledge, it's not used by the admin ui in any of the dashboards. However, I have found it useful as a custom graph. I don't know any way to track the frequency of txn restarts with respect to a single txn, as opposed to aggregated across all txns.

I've opened #30644 and referenced it in the code.

Contributor Author

@andreimatei andreimatei left a comment

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)

@craig
Contributor

craig bot commented Sep 25, 2018

👎 Rejected by code reviews

@andreimatei
Contributor Author

bors r+

craig bot pushed a commit that referenced this pull request Sep 25, 2018
30448: kv: assorted client optimizations r=andreimatei a=andreimatei

See individual commits.

This PR increases throughput on a single node kv-95 workload as follows (measured on a 96-core GCE machine and running the client with concurrency=48).
```
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  100.0s        0        1257869        12578.5      2.8      3.3      5.5      7.1     48.2  read
  100.0s        0          66469          664.7     19.5     19.9     31.5     35.7     60.8  write
```
to
```
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  100.0s        0        1369178        13691.4      2.5      2.8      5.2      6.8    503.3  read
  100.0s        0          71850          718.5     18.3     18.9     28.3     33.6    520.1  write
```

The PR also takes away about 3/4 of the performance regression introduced in #28185. See #30208 for more.

Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
@craig
Contributor

craig bot commented Sep 25, 2018

Build succeeded

@craig craig bot merged commit feb135f into cockroachdb:master Sep 25, 2018
@andreimatei andreimatei deleted the elide-ro-commit branch September 25, 2018 20:00