Description
Development Task
Currently, RawRecords are aggregated by "sql+plan+table+region+key_range". We pick the top N RawRecords ordered by CPU time, and all RawRecords outside the top N are merged into a single "Other" record. Because N is a fixed limit, queries that access many different regions can produce far more than N RawRecords. Many records that belong to one of the top-N "sql+plan" pairs then end up merged into "Other", which can distort the reported data.
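A minimal Go sketch of the current selection step as described above. It assumes a simplified, hypothetical `RawRecord` type and helper `pickTopNWithOther`; neither is TiDB's actual code. Records are sorted by CPU time, the first N are kept, and everything else is summed into one "Other" record:

```go
package main

import (
	"fmt"
	"sort"
)

// RawRecord is a hypothetical, simplified stand-in for a Top SQL raw record.
// Only the grouping fields and the CPU time matter for this sketch.
type RawRecord struct {
	SQL, Plan, Table, Region, KeyRange string
	CPUTimeMs                          float64
}

// pickTopNWithOther keeps the n records with the highest CPU time and
// merges every remaining record into a single "Other" record.
func pickTopNWithOther(records []RawRecord, n int) (top []RawRecord, other RawRecord) {
	sorted := make([]RawRecord, len(records))
	copy(sorted, records)
	// Highest CPU time first; SliceStable keeps input order among ties.
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].CPUTimeMs > sorted[j].CPUTimeMs
	})
	if n > len(sorted) {
		n = len(sorted)
	}
	other = RawRecord{SQL: "Other"}
	for _, r := range sorted[n:] {
		other.CPUTimeMs += r.CPUTimeMs
	}
	return sorted[:n], other
}

func main() {
	// Each record is keyed per region, so one query can produce several records.
	records := []RawRecord{
		{SQL: "q1", Region: "r1", CPUTimeMs: 30},
		{SQL: "q1", Region: "r2", CPUTimeMs: 10},
		{SQL: "q2", Region: "r1", CPUTimeMs: 20},
	}
	top, other := pickTopNWithOther(records, 2)
	for _, r := range top {
		fmt.Println(r.SQL, r.Region, r.CPUTimeMs)
	}
	fmt.Println(other.SQL, other.CPUTimeMs)
}
```

With N = 2, q1 is the heaviest query (40 ms in total), yet its 10 ms record on r2 is folded into "Other" because each region's record is ranked on its own.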
For example:
Consider a cluster with a single TiKV node running QueryA and QueryB:
QueryA accesses 100 regions concurrently within one second, and each region request takes 100.5 ms of TiKV CPU time.
QueryB accesses 100 regions concurrently within one second, and the region requests take [150, 149, 148, ..., 51] ms of TiKV CPU time respectively.
We get 200 RawRecords for these 200 region requests. 49 of QueryA's records and 50 of QueryB's are picked, and the rest are merged into "Other". Judging from the picked records alone, we would conclude that QueryB uses much more CPU time than QueryA, even though the two queries actually consume the same total CPU time.
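The numbers in this example can be checked directly. The picked set consists of QueryB's 50 largest requests (150 ms down to 101 ms, all above 100.5 ms) plus 49 of QueryA's identical 100.5 ms requests:

$$
\begin{aligned}
\text{QueryA, total}  &= 100 \times 100.5 = 10050\ \text{ms} \\
\text{QueryB, total}  &= \sum_{t=51}^{150} t = \frac{100\,(51 + 150)}{2} = 10050\ \text{ms} \\
\text{QueryA, picked} &= 49 \times 100.5 = 4924.5\ \text{ms} \\
\text{QueryB, picked} &= \sum_{t=101}^{150} t = \frac{50\,(101 + 150)}{2} = 6275\ \text{ms}
\end{aligned}
$$

So the picked records make QueryB look about 27% heavier than QueryA (6275 ms vs. 4924.5 ms), while their real totals are identical.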
So we can aggregate RawRecords by "sql+plan+table" before picking the top N records, which reduces this kind of distortion.
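A minimal Go sketch of the proposed order of operations, assuming the same kind of hypothetical `RawRecord` type (not TiDB's real one) and a hypothetical helper `aggregateThenTopN`. CPU time is first summed per "sql+plan+table", dropping region and key range, and only then is the top N taken. The `main` function rebuilds the QueryA/QueryB example:

```go
package main

import (
	"fmt"
	"sort"
)

// RawRecord is a hypothetical, simplified stand-in for a Top SQL raw record.
type RawRecord struct {
	SQL, Plan, Table, Region, KeyRange string
	CPUTimeMs                          float64
}

// aggKey is the coarser grouping key: sql+plan+table (region and key range dropped).
type aggKey struct{ SQL, Plan, Table string }

// aggregateThenTopN sums CPU time per sql+plan+table, then keeps the n
// largest groups and merges the remaining groups into one "Other" record.
func aggregateThenTopN(records []RawRecord, n int) (top []RawRecord, other RawRecord) {
	sums := make(map[aggKey]float64)
	var order []aggKey // first-seen order keeps the output deterministic
	for _, r := range records {
		k := aggKey{r.SQL, r.Plan, r.Table}
		if _, ok := sums[k]; !ok {
			order = append(order, k)
		}
		sums[k] += r.CPUTimeMs
	}
	grouped := make([]RawRecord, 0, len(order))
	for _, k := range order {
		grouped = append(grouped, RawRecord{SQL: k.SQL, Plan: k.Plan, Table: k.Table, CPUTimeMs: sums[k]})
	}
	// Highest CPU time first; SliceStable keeps first-seen order among ties.
	sort.SliceStable(grouped, func(i, j int) bool {
		return grouped[i].CPUTimeMs > grouped[j].CPUTimeMs
	})
	if n > len(grouped) {
		n = len(grouped)
	}
	other = RawRecord{SQL: "Other"}
	for _, r := range grouped[n:] {
		other.CPUTimeMs += r.CPUTimeMs
	}
	return grouped[:n], other
}

func main() {
	// 100 region requests per query, as in the example above.
	var records []RawRecord
	for i := 0; i < 100; i++ {
		region := fmt.Sprintf("region-%d", i)
		records = append(records,
			RawRecord{SQL: "QueryA", Region: region, CPUTimeMs: 100.5},
			RawRecord{SQL: "QueryB", Region: region, CPUTimeMs: float64(150 - i)},
		)
	}
	top, other := aggregateThenTopN(records, 99)
	for _, r := range top {
		fmt.Println(r.SQL, r.CPUTimeMs)
	}
	fmt.Println(other.SQL, other.CPUTimeMs)
}
```

After aggregation the 200 records collapse into two groups, so both queries fit in the top N and each reports 10050 ms, with nothing left over for "Other".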