Skip to content

server: improve the addMetrics method performance#12454

Merged
lonng merged 5 commits intopingcap:masterfrom
lonng:simplify-metrics
Sep 29, 2019
Merged

server: improve the addMetrics method performance#12454
lonng merged 5 commits intopingcap:masterfrom
lonng:simplify-metrics

Conversation

@lonng
Copy link
Contributor

@lonng lonng commented Sep 28, 2019

Signed-off-by: Lonng heng@lonng.org

What problem does this PR solve?

  1. The addMetrics method is a hot path in TiDB and will be executed in every query. It uses switch .. case to find the Counter and add metrics to monitor system. It's not an efficient way.
  2. Every metric use two global variables (ok/error), which is global variable abuse.

See: https://godbolt.org/z/czr__4

What is changed and how it works?

This PR makes the following change.

  1. Group related metrics into two global variables by status (ok/error) and save them in an array. It's fast to get Counter by cmd (index).
  2. Use a hash map to save statement type related metrics. And find the Observer from map is fast than switch .. case in common case.

Check List

Tests

  • Unit test
  • Integration test

Release note

  • No need to add to release note.

Signed-off-by: Lonng <heng@lonng.org>
@lonng
Copy link
Contributor Author

lonng commented Sep 28, 2019

/bench

@ngaut
Copy link
Member

ngaut commented Sep 28, 2019

Do you have bench results?

server/conn.go Outdated
case metrics.LblGeneral:
queryDurationHistogramGeneral.Observe(time.Since(startTime).Seconds())
default:
observer, found := queryDurationHistogram[sqlType]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about refactor sqlType to be int first, then here also can use array :D

Signed-off-by: Lonng <heng@lonng.org>
@lonng
Copy link
Contributor Author

lonng commented Sep 28, 2019

@ngaut
I did a simple benchmark and got two conclusions.

  1. If you use hashmap, performance regression will occur

    BenchmarkSwitchCase-8         	287540623	         4.18 ns/op
    BenchmarkArrayAndHashMap-8    	70893422	        16.3 ns/op
    BenchmarkSwitchCase2-8        	330713746	         3.60 ns/op
    BenchmarkArrayAndHashMap2-8   	68912976	        16.1 ns/op
    BenchmarkSwitchCase3-8        	333756572	         3.56 ns/op
    BenchmarkArrayAndHashMap3-8   	70810513	        16.2 ns/op
    
  2. If you only use an array, there will be ~2x performance improvement

    BenchmarkSwitchCase-8         	161118134	         7.48 ns/op
    BenchmarkArrayAndHashMap-8    	566121494	         2.35 ns/op
    BenchmarkSwitchCase2-8        	198848967	         5.75 ns/op
    BenchmarkArrayAndHashMap2-8   	400275949	         2.64 ns/op
    BenchmarkSwitchCase3-8        	451752652	         2.51 ns/op
    BenchmarkArrayAndHashMap3-8   	666522830	         1.79 ns/op
    

See snippet: https://gist.github.com/lonng/339c17800a20fe48998d552d9ee1d535

So I have reverted the hashmap part.

@ngaut
Copy link
Member

ngaut commented Sep 28, 2019

/bench

@zyxbest
Copy link

zyxbest commented Sep 28, 2019

/run-all-tests
/rebuild

@codecov
Copy link

codecov bot commented Sep 28, 2019

Codecov Report

Merging #12454 into master will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff             @@
##             master     #12454   +/-   ##
===========================================
  Coverage   79.8785%   79.8785%           
===========================================
  Files           462        462           
  Lines        102622     102622           
===========================================
  Hits          81973      81973           
  Misses        14752      14752           
  Partials       5897       5897

@lonng
Copy link
Contributor Author

lonng commented Sep 29, 2019

@lysu @jackysp PTAL

@lonng
Copy link
Contributor Author

lonng commented Sep 29, 2019

/run-all-tests

Copy link
Contributor

@lysu lysu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lysu lysu added the status/LGT1 Indicates that a PR has LGTM 1. label Sep 29, 2019
@lysu lysu requested a review from cfzjywxk September 29, 2019 03:14
@cfzjywxk
Copy link
Contributor

LGTM

@lonng lonng added status/all tests passed status/LGT2 Indicates that a PR has LGTM 2. status/can-merge Indicates a PR has been approved by a committer. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Sep 29, 2019
@sre-bot
Copy link
Contributor

sre-bot commented Sep 29, 2019

/run-all-tests

@sre-bot
Copy link
Contributor

sre-bot commented Sep 29, 2019

@lonng merge failed.

@lonng
Copy link
Contributor Author

lonng commented Sep 29, 2019

/run-unit-test

@lonng lonng merged commit 044766f into pingcap:master Sep 29, 2019
@lonng lonng deleted the simplify-metrics branch September 29, 2019 04:40
lonng added a commit that referenced this pull request Sep 30, 2019
Signed-off-by: Lonng <heng@lonng.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component/server status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants