ReadMode: introduce AutoFollowerRead mechanism by adding a new `ReadMode == PreferLeader` by LykxSassinator · Pull Request #671 · tikv/client-go

LykxSassinator · 2023-01-17T08:06:48Z

Description

Introduce the AutoFollowerRead mechanism to improve read availability for read workloads when `ReadMode == PreferLeader` is set. A new read mode, PreferLeader, is introduced to enable this mechanism.
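
As a rough illustration of what a PreferLeader-style selection could look like, here is a minimal sketch. The `replica` type, its `slow` flag, and `pickPreferLeader` are hypothetical names made up for this example and are not the client-go API. The actual replica selector in this PR may differ, for instance in how it spreads reads across followers.

```go
package main

import "fmt"

// replica is a hypothetical, simplified view of one peer of a region.
type replica struct {
	storeID  uint64
	isLeader bool
	slow     bool // assumed to be maintained by a slow-store detector
}

// pickPreferLeader returns the index of the replica a read should go to:
// the leader if its store is not marked slow, otherwise the first follower
// whose store is not slow. If no such follower exists it falls back to the
// leader (or -1 if the slice contains no leader).
func pickPreferLeader(replicas []replica) int {
	leader := -1
	for i, r := range replicas {
		if r.isLeader {
			leader = i
			break
		}
	}
	if leader >= 0 && !replicas[leader].slow {
		return leader
	}
	for i, r := range replicas {
		if !r.isLeader && !r.slow {
			return i
		}
	}
	return leader
}

func main() {
	replicas := []replica{
		{storeID: 1, isLeader: true},
		{storeID: 2},
		{storeID: 3},
	}
	// Leader store is healthy, so the read stays on the leader.
	fmt.Println(replicas[pickPreferLeader(replicas)].storeID)

	// Leader store is marked slow, so the read is redirected to a follower.
	replicas[0].slow = true
	fmt.Println(replicas[pickPreferLeader(replicas)].storeID)
}
```

Picking the first healthy follower keeps the sketch short; a real selector would more likely choose among healthy followers at random so that redirected reads are spread evenly.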

Close #670.

Signed-off-by: lucasliang nkcs_lykx@hotmail.com

LykxSassinator · 2023-01-28T07:19:28Z

@tonyxuqqi PTAL,thx

disksing · 2023-01-28T08:08:33Z

I wonder whether the method of determining that a store is slow may be inaccurate if the load differs across stores (for example, some stores serve only point-get requests, while others serve full table scans).

LykxSassinator · 2023-01-29T04:05:38Z

> I wonder if the load of different stores is different (for example, some stores are all point get requests, while others are full table scans), then the method of determining store slow may not be accurate.

Thanks for your comments.
The influence of different workloads that you pointed out has been considered in the design of the detection algorithm.

Brief introduction to the design

A store in a TiKV cluster may be slow because of network delays or disk IO delays. Currently, we have no concise way to aggregate all of this information in the client and determine which store is slow.
So we use a measurement called the Slow Score to represent the severity of slowness on each store. Only when the Slow Score exceeds the threshold (80 in the current implementation) is the corresponding store marked as SLOW. The read flows are then redirected to other nodes.

The Slow Score is calculated from two dimensions measured on each store:

- Requests in one timing tick. This dimension can be regarded as the QPS of each store.
- Average time cost per request in one timing tick. This dimension is used to detect whether the corresponding store is busy processing requests.

As we know, if a store is slow, its Requests will keep decreasing gradually while its Average time cost keeps rising.

So we introduce a simple algorithm (similar to a sliding window) that combines these two factors and converts them into the Slow Score, which represents how busy each node is. That is:

- The Slow Score focuses on the severity of slowness of each store, and each store keeps its own counters for calculating it.
- The Slow Score of a store ascends gradually if the store gradually becomes slow at processing requests, and the store is marked as SLOW once its Slow Score exceeds the threshold.
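
To make the idea above concrete, here is a minimal sketch of a per-store score driven by per-tick request counts and average time cost. Only the threshold of 80 comes from the discussion; the `slowScore` type, the step sizes, the 2x baseline comparison, the score bounds, and the idle-tick handling are all assumptions chosen for illustration, and this is not the algorithm implemented in the PR.

```go
package main

import (
	"fmt"
	"time"
)

const (
	slowThreshold = 80  // threshold quoted in the discussion
	minScore      = 1   // assumed lower bound
	maxScore      = 100 // assumed upper bound
)

// slowScore is a hypothetical per-store tracker, not the client-go type.
type slowScore struct {
	requests  int           // requests seen in the current tick
	totalCost time.Duration // summed time cost in the current tick
	baseline  time.Duration // smoothed average cost from healthy ticks
	score     int
}

func newSlowScore() *slowScore { return &slowScore{score: minScore} }

func clamp(v int) int {
	if v < minScore {
		return minScore
	}
	if v > maxScore {
		return maxScore
	}
	return v
}

// record adds one finished request to the current tick.
func (s *slowScore) record(cost time.Duration) {
	s.requests++
	s.totalCost += cost
}

// tick closes the current window, updates the score, and resets the counters.
func (s *slowScore) tick() {
	switch {
	case s.requests == 0:
		// Assumption: with no traffic there is no evidence of slowness, so decay.
		s.score = clamp(s.score - 10)
	case s.baseline == 0:
		// First tick with traffic only establishes the baseline.
		s.baseline = s.totalCost / time.Duration(s.requests)
	default:
		avg := s.totalCost / time.Duration(s.requests)
		if avg > 2*s.baseline {
			// Average cost is well above the baseline: raise the score gradually.
			s.score = clamp(s.score + 20)
		} else {
			// Healthy tick: decay the score and fold the sample into the baseline.
			s.score = clamp(s.score - 10)
			s.baseline = (7*s.baseline + avg) / 8
		}
	}
	s.requests, s.totalCost = 0, 0
}

// isSlow reports whether the score exceeds the threshold.
func (s *slowScore) isSlow() bool { return s.score > slowThreshold }

func main() {
	s := newSlowScore()
	s.record(10 * time.Millisecond)
	s.tick() // establishes the baseline
	for i := 0; i < 4; i++ {
		s.record(100 * time.Millisecond)
		s.tick()
		fmt.Println(s.score, s.isSlow())
	}
}
```

Because the score rises in steps rather than jumping to the maximum, a single slow tick does not flip a store to SLOW; several consecutive slow ticks are needed, which matches the "gradually ascend" behaviour described above.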

Of course, regarding the point:

> store slow may not be accurate.

The Slow Score could be more accurate if the statistics were split by command type, which can be considered in a later iteration of this algorithm. For now, we just want to introduce a simple approach that costs few resources to optimize the replica_read_mode == leader_and_follower mode.

disksing · 2023-01-31T08:31:17Z

@LykxSassinator Thank you for the clarification.

internal/locate/region_cache.go

internal/locate/region_request.go

LykxSassinator · 2023-02-07T07:49:12Z

cc @disksing PTAL, thx.

LykxSassinator added 25 commits November 17, 2022 10:53

Prototype for automatical Follower Reads.

c620442

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>

Merge branch 'tikv:master' into auto_redirect_flows

b6246a0

Remove unnecessary modifications.

952999a

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>

Bugfix for integration tests.

ae43374

Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>

Merge branch 'tikv:master' into auto_redirect_flows

0aa0057

Refine the algorithm for detecting slowStores

b0e3aa2

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>

Bugfix when incorrectly redirecting WriteCmd to followers.

b3dbe02

Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>

Supply extra store_slow_store metrics.

c4c48d1

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>

Bugfix for getting the type of commands.

66b4224

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>

Bugfix and fine-tuning on algorithms

3bde4a1

Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>

Add configuration enable-auto-follower-read as a trigger to open/cl…

221fd10

…ose AufoFollowerRead feature. Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>

Refactor the configuration of AutoFollowerRead by making it contain…

1a4831b

…ed into "replica_mode=leader-and-follower". Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>

Bugfix on the detection algorithm.

8a149fc

Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>

Bugfix when resolving the store from unhealthy to healthy.

5b9f906

Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>

Bugfix for algorithm when no requests for updating.

c3109ad

Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>

Fix incorrect updating.

770b166

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>

Optimize the selection strategy of nodes when processing READ cmds un…

4f7fff5

…der ReplicaReadMixed mode. Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>

[Draft]Add extra metrics for monitoring Read flows.

0662579

Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>

Fine-tuning the algorithm of slow nodes detection.

1c70aa3

Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>

Refine the algorithm of choosing nodes when there exists abnormal nod…

8ad29d6

…es, making it meet the uniform distribution. Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>

Merge branch 'master' into auto_redirect_flows

e95ad29

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>

Merge branch 'master' into auto_redirect_flows

bcac417

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>

Merge branch 'tikv:master' into auto_redirect_flows

2d8a5cd

Polish codes.

b29d04c

Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>

Remove unnecessary metrics.

6444a28

Signed-off-by: lucasliang <nkcs_lykx@hotmail.com>

LykxSassinator marked this pull request as draft January 17, 2023 08:09

Revoke incorrect modifications on intergration tests and code format …

a16c415

…errors. Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>

LykxSassinator marked this pull request as ready for review January 17, 2023 08:59

LykxSassinator marked this pull request as draft January 18, 2023 09:05

Fine-tuning the detection algorithm.

85bcc91

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>

LykxSassinator added 2 commits January 28, 2023 10:50

Merge branch 'master' into auto_redirect_flows

4b583d6

Revoke incorrect modification on the detection algorithm.

ddb85a2

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>

LykxSassinator marked this pull request as ready for review January 28, 2023 03:26

disksing requested a review from tonyxuqqi January 28, 2023 07:20

Merge branch 'master' into auto_redirect_flows

13c7f40

Refactor the setting on enabling auto-follower-read feature by adding…

ada803e

… a new mode `prefer-leader`. Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>

LykxSassinator changed the title ~~ReadMode: introduce AutoFollowerRead mechanism to enhance availabilities on ReadMode == LeaderAndFollower~~ ReadMode: introduce AutoFollowerRead mechanism by adding a new ReadMode == PreferLeader Jan 31, 2023

LykxSassinator added 4 commits February 1, 2023 14:33

Merge branch 'master' into auto_redirect_flows

bad6a1f

Signed-off-by: Lucasliang <nkcs_lykx@hotmail.com>

Merge branch 'master' into auto_redirect_flows

2eb4265

Merge branch 'master' into auto_redirect_flows

69acc0b

Merge branch 'master' into auto_redirect_flows

3e54d22