Batch (raw) get requests to same region across batch commands #5598
Little-Wallace merged 45 commits into tikv:master
Conversation
Signed-off-by: tabokie <xy.tao@outlook.com>
/run-all-tests
/bench

/bench tidb=pr/12569
src/storage/mod.rs (outdated)
        Err(e) => return future::err(e),
    };
    for get in gets {
        callback(
Can we only use one callback for these get requests to reduce the cost?
The update roughly achieves -50% gRPC poll CPU and +9% qps in oltp_point_select.
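The single-callback idea discussed above can be sketched as follows. This is a minimal, self-contained illustration (not TiKV's actual API; `KvResult` and `batch_get` are hypothetical names): instead of invoking a callback per get, which costs one channel send and one gRPC-poll wakeup each, all results for a batch are collected and handed to one callback.

```rust
use std::sync::mpsc;

// Hypothetical result type standing in for a per-key get result.
type KvResult = Result<Option<Vec<u8>>, String>;

// One callback invocation for the whole batch, rather than one per key.
// Fewer callback firings means fewer wakeups on the gRPC poll thread,
// which is the effect behind the "-50% gRPC poll CPU" observation.
fn batch_get(keys: &[&str], callback: impl FnOnce(Vec<KvResult>)) {
    let results: Vec<KvResult> = keys
        .iter()
        .map(|k| Ok(Some(format!("value-of-{}", k).into_bytes())))
        .collect();
    callback(results);
}

fn main() {
    let (tx, rx) = mpsc::channel();
    batch_get(&["a", "b", "c"], move |results| {
        tx.send(results.len()).unwrap();
    });
    // The receiver is woken exactly once, with all three results together.
    assert_eq!(rx.recv().unwrap(), 3);
    println!("ok");
}
```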
src/server/service/kv.rs (outdated)
    // Batcher that collects requests and submits them through `tx`.
    let minibatcher = MiniBatcher::new(tx.clone(), self.minibatch_wait_millis);
    // Flag used to stop the forced-submit timer on shutdown.
    let stopped = Arc::new(AtomicBool::new(false));
    let minibatcher = Arc::new(Mutex::new(minibatcher));
    // Only spawn the timer when a wait duration is configured.
    if self.minibatch_wait_millis > 0 {
Can you add some comments to help reviewers better understand the code?
Signed-off-by: tabokie <xy.tao@outlook.com>
BusyJay left a comment
Why do the batching inside the kv service instead of in storage?
/bench tidb=pr/12569

@BusyJay Currently the batch is collected within the lifetime of one
Signed-off-by: tabokie <xy.tao@outlook.com>
@@ Benchmark Diff @@
================================================================================
--- tidb: be6163c823285eb7bd048c8d0377c7b05dfd464e
+++ tidb: a387fbf23e26f8f0dc6b140489d04860bc7ba942
--- tikv: 00b0234fecbe97f53700b125ce1f5793ba88531b
+++ tikv: 4eb0c21f0c91f1eefad69bbcc6fc743e3c3bf7a3
pd: 4acaa8c715d40a07f521dd85ef7dcd4118064289
================================================================================
test-1: < oltp_point_select >
* QPS : 64453.41 ± 2.2211% (std=921.42) delta: -18.37% (p=0.000)
* AvgMs : 3.97 ± 2.2670% (std=0.06) delta: 22.46%
* PercentileMs99 : 8.28 ± 0.0000% (std=0.00) delta: 24.14%
test-2: < oltp_read_write >
* QPS : 25687.45 ± 0.8507% (std=150.98) delta: -32.83% (p=0.000)
* AvgMs : 200.13 ± 0.8644% (std=1.19) delta: 48.86%
* PercentileMs99 : 312.21 ± 1.0781% (std=2.75) delta: 21.03%
test-3: < oltp_insert >
* QPS : 9036.30 ± 1.8150% (std=119.24) delta: -59.13% (p=0.000)
* AvgMs : 28.33 ± 1.8002% (std=0.37) delta: 144.70%
* PercentileMs99 : 46.91 ± 1.1938% (std=0.40) delta: 97.28%
test-4: < oltp_update_index >
* QPS : 7239.76 ± 2.6838% (std=156.80) delta: -58.45% (p=0.000)
* AvgMs : 35.32 ± 2.8537% (std=0.72) delta: 143.50%
* PercentileMs99 : 48.78 ± 0.8919% (std=0.43) delta: 53.81%
test-5: < oltp_update_non_index >
* QPS : 12779.03 ± 1.0768% (std=100.24) delta: -57.16% (p=0.000)
* AvgMs : 20.03 ± 1.0815% (std=0.16) delta: 133.54%
* PercentileMs99 : 29.19 ± 0.0000% (std=0.00) delta: 38.28%
/run-all-tests
Oh, it seems the perf has regressed. @tabokie
Signed-off-by: tabokie <xy.tao@outlook.com>
@siddontang Don't sweat, this PR only batches get requests, so we should focus on oltp_point_select. For that test, the batch timeout was left at its default of 0ms in this run, and I don't expect a performance gain given that the gRPC connection count is set to 1.
/release make dist_release
/bench tidb=pr/12569
Signed-off-by: tabokie <xy.tao@outlook.com>
What have you changed?
To avoid request timeouts caused by a batch growing too large or the stream running short of timeslice, this PR adds an interval-timer thread that periodically issues a forced submit.
This change reduces readpool task pressure and, in turn, CPU usage. The effect depends on the gRPC connection count (which determines how much QPS flows through one stream) and the batch wait duration.
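The forced-submit mechanism described above can be sketched as below. This is an illustrative stand-in, not TiKV's actual `MiniBatcher` implementation; the `Batcher` type and its methods are hypothetical. Requests accumulate in a shared buffer, and a dedicated timer thread flushes it every wait interval, so no batch waits longer than that even if the gRPC stream yields its timeslice.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

// Hypothetical batcher: requests queue up in `pending` until submitted.
struct Batcher {
    pending: Mutex<Vec<String>>,
}

impl Batcher {
    fn push(&self, req: String) {
        self.pending.lock().unwrap().push(req);
    }

    // Submit whatever has accumulated; returns how many requests were flushed.
    fn force_submit(&self) -> usize {
        let mut pending = self.pending.lock().unwrap();
        let batch: Vec<String> = pending.drain(..).collect();
        batch.len() // a real implementation would dispatch `batch` to storage here
    }
}

fn main() {
    let batcher = Arc::new(Batcher { pending: Mutex::new(Vec::new()) });
    let stopped = Arc::new(AtomicBool::new(false));

    // One interval-timer thread periodically issues a forced submit,
    // bounding how long any request can sit in the batch.
    let (b, s) = (batcher.clone(), stopped.clone());
    let timer = thread::spawn(move || {
        while !s.load(Ordering::Relaxed) {
            thread::sleep(Duration::from_millis(1)); // request-batch-wait-duration
            b.force_submit();
        }
    });

    batcher.push("get k1".to_string());
    batcher.push("get k2".to_string());
    thread::sleep(Duration::from_millis(10));
    // The timer should have flushed everything by now.
    assert!(batcher.pending.lock().unwrap().is_empty());

    stopped.store(true, Ordering::Relaxed);
    timer.join().unwrap();
    println!("ok");
}
```

The trade-off visible in the benchmarks follows from this design: a longer wait duration amortizes more requests per submit (lower CPU), but adds up to that much latency to every batched request.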
What is the type of the changes?
How is the PR tested?
Does this PR affect documentation (docs) or should it be mentioned in the release notes?
No
Does this PR affect tidb-ansible?
Will do.
Refer to a related PR or issue link (optional)
This PR is split from #5382
Benchmark result if necessary (optional)
Improves QPS by 17% for the oltp_point_select bench (grpc-connection-count=3, request-batch-wait-duration="1ms") without compromising other workloads.
Improves QPS by 20% for oltp_point_select (grpc-connection-count=1 or 2, request-batch-wait-duration="1ms"), with a notable side effect on other workloads.
TiKV master has seen some major performance speed-ups since this PR was first opened (#5363, #5654). Request batching is now only beneficial under higher pressure.
Here is a result from sysbench oltp_point_select with 1024 threads, with 1 TiDB and 1 TiKV instance, each on a separate 40-core server (TiDB point-get gRPC connection count of 4):
Before batching: TiKV CPU ~900%
After batching: TiKV CPU ~550%
Any examples? (optional)