kv: assorted client optimizations #30448
Conversation
cc @jordanlewis

Nice! But where do you get the 3/4 number? The delta I observed before and after the linked PR was very high - the runs were variable, but up to about 50% (13k -> 20k). I can show you how to reproduce that if you like. It's true that it's hard to benchmark, but averages of repeated, interleaved runs don't lie.

Jordan, replied in #30208 (comment)
Force-pushed f8dfded to 4e441d4
nvb
left a comment
Reviewed 2 of 2 files at r1, 1 of 1 files at r2, 3 of 3 files at r3, 4 of 4 files at r4.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/internal/client/txn.go, line 520 at r2 (raw file):
ba.Add(endTxnReq(true /* commit */, txn.deadline(), txn.systemConfigTrigger))
_, pErr := txn.Send(ctx, *ba)
commitReqPool.Put(ba)
Is this safe? We should be clearing the ba before adding it to the sync.Pool, not after. Will that cause issues with raceTransport?
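A minimal sketch of the clear-before-Put pattern the comment argues for, using hypothetical simplified types (this is not CockroachDB's actual `BatchRequest` code):

```go
package main

import (
	"fmt"
	"sync"
)

// batch stands in for roachpb.BatchRequest; a hypothetical simplified type.
type batch struct {
	Requests []string
}

var batchPool = sync.Pool{
	New: func() interface{} { return &batch{} },
}

// putBatch resets the batch *before* returning it to the pool. Clearing it
// after Put would race with another goroutine that Gets the same object, which
// is the kind of hazard a race-checking transport could flag.
func putBatch(ba *batch) {
	ba.Requests = ba.Requests[:0] // clear before pooling
	batchPool.Put(ba)
}

func getBatch() *batch {
	return batchPool.Get().(*batch)
}

func main() {
	ba := getBatch()
	ba.Requests = append(ba.Requests, "EndTransaction")
	putBatch(ba)
	reused := getBatch()
	fmt.Println(len(reused.Requests)) // a freshly obtained batch is always empty
}
```

Whether `Get` returns the pooled object or a new one, the caller always sees an empty batch, which is the invariant the ordering preserves.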
pkg/kv/txn_coord_sender.go, line 585 at r1 (raw file):
defer tc.mu.Unlock()
if ba.IsSingleEndTransactionRequest() && !tc.interceptorAlloc.txnHeartbeat.mu.everSentBeginTxn {
Does this need to be after maybeRejectClientLocked?
Force-pushed 3a91ce8 to 0df44d4
andreimatei
left a comment
Removed the commit about the reuse of EndTxn requests until #30485 settles.
PTAL
Reviewable status:
complete! 0 of 0 LGTMs obtained
pkg/kv/txn_coord_sender.go, line 585 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Does this need to be after maybeRejectClientLocked?
I guess that's safer indeed. done
pkg/internal/client/txn.go, line 520 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Is this safe? We should be clearing the ba before adding it to the sync.Pool, not after. Will that cause issues with raceTransport?
reworked
Force-pushed 49d2329 to 5f2700c
nvb
left a comment
Reviewed 1 of 9 files at r5, 3 of 3 files at r6, 3 of 3 files at r7, 2 of 4 files at r8.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/kv/txn_interceptor_metrics.go, line 99 at r7 (raw file):
// it artificially (in the parallel execution queue).
if restarts > 0 {
	m.metrics.Restarts.RecordValue(restarts)
Won't this mess up the histogram by skewing its distribution away from 0? How much of an effect does this have on performance?
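To make the skew concrete: if only nonzero restart counts are recorded, the histogram no longer reflects the per-txn distribution. A toy Go calculation (hypothetical helper, not the actual metrics code):

```go
package main

import "fmt"

// recordedMean returns the mean of samples, optionally dropping zeros first,
// mimicking the `if restarts > 0` guard discussed above.
func recordedMean(samples []int64, dropZeros bool) float64 {
	var sum, n int64
	for _, s := range samples {
		if dropZeros && s == 0 {
			continue // zero-restart txns never reach the histogram
		}
		sum += s
		n++
	}
	if n == 0 {
		return 0
	}
	return float64(sum) / float64(n)
}

func main() {
	// 8 txns with no restarts, 2 txns with 3 restarts each.
	samples := []int64{0, 0, 0, 0, 0, 0, 0, 0, 3, 3}
	fmt.Println(recordedMean(samples, false)) // 0.6: true per-txn average
	fmt.Println(recordedMean(samples, true))  // 3:   skewed away from 0
}
```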
Force-pushed 5f2700c to a98cfe8
andreimatei
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained
pkg/kv/txn_interceptor_metrics.go, line 99 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Won't this mess up the histogram by skewing its distribution away from 0? How much of an effect does this have on performance?
yes, it will skew the histogram. I've added a TODO to optimize the histogram code so we don't have to do this.
Here's some results with this PR backported on top of the commit that introduced the perf regression (69fa9f7). backport is the PR as is; record_all_backports records the restarts for all txns. There appears to be a pretty significant difference.
Tuesday, 24 core
backport:
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
100.0s 0 1801235 18012.2 1.9 1.5 4.5 6.0 37.7 read
100.0s 0 94877 948.8 14.4 13.6 25.2 29.4 48.2 write
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
100.0s 0 1862335 18623.2 1.8 1.4 4.5 5.8 35.7 read
100.0s 0 97708 977.1 13.9 13.6 24.1 28.3 48.2 write
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
100.0s 0 1881041 18810.0 1.8 1.4 4.5 5.8 41.9 read
100.0s 0 98603 986.0 13.8 13.6 24.1 28.3 48.2 write
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
100.0s 0 1859887 18598.7 1.8 1.4 4.5 5.8 33.6 read
100.0s 0 97774 977.7 14.0 13.6 24.1 29.4 50.3 write
backport-record_all_backports:
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
100.0s 0 1818145 18181.3 1.9 1.5 4.5 5.8 436.2 read
100.0s 0 95332 953.3 14.3 13.6 24.1 29.4 453.0 write
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
100.0s 0 1819280 18192.7 1.9 1.4 4.5 6.0 31.5 read
100.0s 0 95799 958.0 14.5 14.2 25.2 30.4 50.3 write
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
100.0s 0 1791759 17917.4 1.9 1.5 4.5 6.0 48.2 read
100.0s 0 94375 943.7 14.7 14.2 25.2 30.4 50.3 write
96 core
backport:
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
100.0s 0 1670523 16705.1 2.1 1.7 4.7 6.0 130.0 read
100.0s 0 88173 881.7 15.4 14.7 26.2 30.4 234.9 write
backport-record_all_backports:
_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
100.0s 0 1596959 15969.5 2.2 1.8 4.7 6.3 142.6 read
100.0s 0 84326 843.3 16.0 15.7 27.3 31.5 226.5 write
nvb
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/kv/txn_interceptor_metrics.go, line 99 at r7 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
yes, it will skew the histogram. I've added a TODO to optimize the histogram code so we don't have to do this.
Here's some results with this PR backported on top of the commit that introduced the perf regression (69fa9f7). backport is the PR as is; record_all_backports records the restarts for all txns. There appears to be a pretty significant difference. [benchmark tables elided; quoted in full in the previous comment]
Thanks for taking the measurements. I don't want to hold this PR up any longer and I think I'm OK with an incorrect metric, but please open an issue to correct this histogram and also to explore optimizing other histograms that have a very common bucket.
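One shape such a common-bucket optimization could take, as a hedged sketch (hypothetical type, not the real metric package): count the hot zero value with a cheap atomic and only pay the histogram cost for rare nonzero values, so the full distribution can still be reconstructed.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// zeroFastPathHist sketches a histogram whose most common value is 0:
// zeros are counted with an atomic add, and only nonzero values take the
// expensive (normally lock-protected) histogram path.
type zeroFastPathHist struct {
	zeros   int64
	nonzero []int64 // stand-in for a real histogram's buckets
}

func (h *zeroFastPathHist) Record(v int64) {
	if v == 0 {
		atomic.AddInt64(&h.zeros, 1) // fast path: no histogram lock
		return
	}
	h.nonzero = append(h.nonzero, v) // slow path for rare values
}

// Total reconstructs the true sample count, zeros included, so the
// distribution is not skewed away from 0.
func (h *zeroFastPathHist) Total() int64 {
	return atomic.LoadInt64(&h.zeros) + int64(len(h.nonzero))
}

func main() {
	var h zeroFastPathHist
	for _, v := range []int64{0, 0, 0, 2, 0, 1} {
		h.Record(v)
	}
	fmt.Println(h.Total(), h.zeros) // 6 samples total, 4 of them zero
}
```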
petermattis
left a comment
Reviewable status:
complete! 1 of 0 LGTMs obtained
pkg/kv/txn_interceptor_metrics.go, line 99 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Thanks for taking the measurements. I don't want to hold this PR up any longer and I think I'm OK with an incorrect metric, but please open an issue to correct this histogram and also to explore optimizing other histograms that have a very common bucket.
Is the txn.restarts histogram even used?
nvb
left a comment
Reviewable status:
complete! 1 of 0 LGTMs obtained
pkg/kv/txn_interceptor_metrics.go, line 99 at r7 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Is the txn.restarts histogram even used?
To my knowledge, it's not used by the admin ui in any of the dashboards. However, I have found it useful as a custom graph. I don't know any way to track the frequency of txn restarts with respect to a single txn, as opposed to aggregated across all txns.
This is a performance optimization that we used to have but that went away in a refactor. Read-only txns don't need to send an EndTransaction to the server since there's no txn record; their EndTransaction was previously elided by the txnHeartbeatInterceptor. This patch moves the short-circuiting higher, to the TxnCoordSender, which avoids allocating a result and other bits of work. This makes a difference on a kv 95% read workload: depending on the machine and the dice roll, it's about a 10% throughput increase. Touches cockroachdb#30208 Release note: None
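A hedged sketch of the short-circuit this commit describes, with toy types (the real check lives in TxnCoordSender and uses the actual BatchRequest API):

```go
package main

import "fmt"

// batchReq stands in for roachpb.BatchRequest; a hypothetical simplified type.
type batchReq struct{ reqs []string }

func (b batchReq) isSingleEndTransaction() bool {
	return len(b.reqs) == 1 && b.reqs[0] == "EndTransaction"
}

// send reports whether the batch actually needs a server round-trip. If the
// batch is just an EndTransaction and the txn never sent a BeginTransaction,
// there is no txn record to finalize, so the commit can be answered locally.
func send(b batchReq, everSentBeginTxn bool) bool {
	if b.isSingleEndTransaction() && !everSentBeginTxn {
		return false // elide: commit a read-only txn client-side
	}
	return true // writes (or mixed batches) still go to the server
}

func main() {
	fmt.Println(send(batchReq{[]string{"EndTransaction"}}, false))        // elided
	fmt.Println(send(batchReq{[]string{"Put", "EndTransaction"}}, true))  // sent
}
```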
Use Counter instead of CounterWithRate and the like. This makes a difference on a single-node read-heavy workload because it softens some lock contention. Also eliminates some uses of PhysicalTime and starts measuring durations even for read-only txns. Fixes cockroachdb#28424 Release note: None
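The contention point can be illustrated with a toy comparison (hypothetical counters, not the actual metric package types): a plain atomic counter scales across goroutines, while a counter that also maintains rate state under a mutex serializes every increment.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// atomicCounter is the cheap shape: a single lock-free atomic add.
type atomicCounter struct{ n int64 }

func (c *atomicCounter) Inc() { atomic.AddInt64(&c.n, 1) }

// ratedCounter also tracks rate-style state, forcing every increment
// through one mutex that all callers contend on.
type ratedCounter struct {
	mu     sync.Mutex
	n      int64
	recent []int64 // stand-in for rate-tracking bookkeeping
}

func (c *ratedCounter) Inc() {
	c.mu.Lock() // every caller serializes here
	c.n++
	c.recent = append(c.recent, c.n)
	c.mu.Unlock()
}

func main() {
	var a atomicCounter
	var r ratedCounter
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			a.Inc()
			r.Inc()
		}()
	}
	wg.Wait()
	fmt.Println(atomic.LoadInt64(&a.n), r.n) // both reach 100
}
```

Both counters give the same totals; the difference under load is that the atomic version never blocks, which is the kind of contention softening the commit message refers to.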
It was unused since all of the users switched to Counter in a previous commit. Even before that, the "rate" part was unused. Release note: None
Force-pushed a98cfe8 to feb135f
andreimatei
left a comment
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/kv/txn_interceptor_metrics.go, line 99 at r7 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
To my knowledge, it's not used by the admin ui in any of the dashboards. However, I have found it useful as a custom graph. I don't know any way to track the frequency of txn restarts with respect to a single txn, as opposed to aggregated across all txns.
I've opened #30644 and referenced it in the code.
andreimatei
left a comment
bors r+
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale)
👎 Rejected by code reviews

bors r+
30448: kv: assorted client optimizations r=andreimatei a=andreimatei See individual commits. This PR increases throughput on a single node kv-95 workload as follows (measured on a 96-core GCE machine and running the client with concurrency=48). ``` _elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total 100.0s 0 1257869 12578.5 2.8 3.3 5.5 7.1 48.2 read 100.0s 0 66469 664.7 19.5 19.9 31.5 35.7 60.8 write ``` to ``` _elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total 100.0s 0 1369178 13691.4 2.5 2.8 5.2 6.8 503.3 read 100.0s 0 71850 718.5 18.3 18.9 28.3 33.6 520.1 write ``` The PR also takes away about 3/4 of the performance regression introduced in #28185. See #30208 for more. Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Build succeeded