Add options to support Profile Guided Optimization #5160

Merged
ti-chi-bot merged 9 commits into pingcap:master from solotzg:pgo on Jul 15, 2022

solotzg commented Jun 17, 2022

What problem does this PR solve?

Issue Number: ref #4909

What is changed and how it works?

doc: https://solotzg.github.io/2022/06/01/system/cpp-compile-optimize/#PGO%EF%BC%88Profile-Guided-Optimization%EF%BC%89

Profile Guided Optimization (PGO), also known as Feedback Directed Optimization (FDO), lets the compiler optimize better by using profile information. This PR adds support for PGO with two kinds of profiling:

  • Using a sampling profiler:
    1. Add the cmake options ENABLE_LLVM_PGO and ENABLE_LLVM_PGO_USE_SAMPLE, then recompile.
    2. Run the executable under a sampling profiler such as Linux perf.
    3. Convert the collected profile data to LLVM's sample profile format with AutoFDO.
    4. Set the env var TIFLASH_LLVM_PROFDATA=xxxx and recompile.
  • Profiling with instrumentation:
    1. Add the cmake option ENABLE_LLVM_PROFILE_INSTR to generate instrumented code, then recompile.
    2. Set the env var LLVM_PROFILE_FILE=xxxx, then start the program and run the benchmark to collect profile data.
    3. Run llvm-profdata merge -output=xxxx xxxx.profraw
    4. Add the cmake option ENABLE_LLVM_PGO, set the env var TIFLASH_LLVM_PROFDATA=xxxx, and recompile.

Profiling with instrumentation hurts runtime performance and CANNOT collect real online profile information, so the sampling approach is recommended.
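The two workflows above can be sketched as a command plan. The snippet below only builds command lines and environment variables; it does not run anything. The cmake options and the TIFLASH_LLVM_PROFDATA variable come from this PR. The source directory, file names, process start command, and the exact perf and create_llvm_prof flags are assumptions for illustration and should be checked against the tools' own documentation.

```python
import shlex
from typing import Dict, List, Tuple

Step = Tuple[Dict[str, str], List[str]]  # (extra environment, argv)


def sampling_plan(src: str, binary: str, pid: int, prof: str) -> List[Step]:
    """Command plan for sampling-based PGO (perf + AutoFDO)."""
    pgo_flags = ["-DENABLE_LLVM_PGO=ON", "-DENABLE_LLVM_PGO_USE_SAMPLE=ON"]
    return [
        # 1. Configure with the sampling options (a build step follows).
        ({}, ["cmake", src, *pgo_flags]),
        # 2. Sample the running process with branch records (assumed flags).
        ({}, ["perf", "record", "-b", "-p", str(pid), "-o", "perf.data"]),
        # 3. Convert perf data to LLVM's sample profile format (assumed flags).
        ({}, ["create_llvm_prof", f"--binary={binary}",
              "--profile=perf.data", f"--out={prof}"]),
        # 4. Reconfigure with the profile passed through the environment.
        ({"TIFLASH_LLVM_PROFDATA": prof}, ["cmake", src, *pgo_flags]),
    ]


def instrumentation_plan(src: str, start_cmd: List[str],
                         raw: str, prof: str) -> List[Step]:
    """Command plan for instrumentation-based PGO."""
    return [
        # 1. Configure an instrumented build.
        ({}, ["cmake", src, "-DENABLE_LLVM_PROFILE_INSTR=ON"]),
        # 2. Run the instrumented program; raw profile goes to LLVM_PROFILE_FILE.
        ({"LLVM_PROFILE_FILE": raw}, list(start_cmd)),
        # 3. Merge raw profiles into an indexed profile.
        ({}, ["llvm-profdata", "merge", f"-output={prof}", raw]),
        # 4. Reconfigure with PGO enabled and the merged profile.
        ({"TIFLASH_LLVM_PROFDATA": prof}, ["cmake", src, "-DENABLE_LLVM_PGO=ON"]),
    ]


def render(step: Step) -> str:
    """Format one step as a shell-style line for review."""
    env, argv = step
    prefix = " ".join(f"{k}={shlex.quote(v)}" for k, v in env.items())
    return f"{prefix} {shlex.join(argv)}".strip()


if __name__ == "__main__":
    # Hypothetical paths and pid, for illustration only.
    for step in sampling_plan("..", "./tiflash", 1234, "tiflash.llvm.code.prof"):
        print(render(step))
```

Keeping the plan as data makes it easy to review each step before running it by hand or passing it to subprocess.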

Benchmark-10G

env:

  • 1 TiFlash, 3 PD, 3 TiKV, 1 TiDB
  • data: tpch-10
  • TiFlash CPU usage limited to 5 cores (500%) by cgroup
  • original commit: 5b61ae7
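The PR does not say which cgroup version or tool was used for the CPU limit. As a minimal sketch, assuming cgroup v2, a limit of N cores corresponds to a cpu.max value of "quota period" in microseconds; the helper name below is hypothetical.

```python
def cpu_max(cores: float, period_us: int = 100_000) -> str:
    """Return the cgroup v2 cpu.max value that caps usage at `cores` CPUs.

    cpu.max holds "<quota> <period>" in microseconds: the group may run
    for at most <quota> microseconds in every <period> microseconds.
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    quota_us = int(cores * period_us)
    return f"{quota_us} {period_us}"


if __name__ == "__main__":
    print(cpu_max(5))   # limit used for the 10G benchmark (5 cores)
    print(cpu_max(10))  # limit used for the 100G benchmark (10 cores)
```

The resulting string would be written to the cpu.max file of the cgroup that contains the TiFlash process.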

workload:

git clone git@github.com:pingcap/go-tpc.git
cd go-tpc
make build
bin/go-tpc tpch run --queries "q1,q2,q3,q4,q6,q7,q8,q9,q10,q11,q12,q13,q14,q15,q16,q17,q18,q19,q20,q21,q22" --host {} -P {} --db {}

steps:

  1. Compile TiFlash with the cmake options -DENABLE_LLVM_PGO=ON -DENABLE_LLVM_PGO_USE_SAMPLE=ON.
  2. Compile https://github.com/google/autofdo to get the create_llvm_prof binary, which converts perf data to LLVM profile data.
  3. Start the TiFlash process and get its <pid>.
  4. Prepare the workload scripts file.
  5. Run python3 perf-tpch.py --perf --pid <pid> --workload <workload-scripts-path> --convert-llvm --convert-tool <create_llvm_prof-path> --binary <tiflash-binary-path>
  6. Get the LLVM profile file (tiflash.llvm.code.prof by default).
  7. Compile TiFlash with env TIFLASH_LLVM_PROFDATA=<output-perf-file> and the cmake options -DENABLE_LLVM_PGO=ON -DENABLE_LLVM_PGO_USE_SAMPLE=ON.
  8. Re-run the workload and compare the results.
Time cost (s)
Query  original  LTO    FDO+LTO  FDO+LTO : LTO  FDO+LTO : original
------------------------------------------------------------------
Q1     6.32      6.07   5.57     8.98%          13.46%
Q2     2.99      2.99   2.92     2.40%          2.40%
Q3     2.79      2.65   2.65     0.00%          5.28%
Q4     1.91      2.05   1.85     10.81%         3.24%
Q6     0.91      0.91   0.84     8.33%          8.33%
Q7     2.38      2.32   2.32     0.00%          2.59%
Q8     4.8       4.73   4.73     0.00%          1.48%
Q9     16.81     16.54  16.48    0.36%          2.00%
Q10    3.72      3.72   3.66     1.64%          1.64%
Q11    0.5       0.5    0.5      0.00%          0.00%
Q12    1.98      1.91   1.85     3.24%          7.03%
Q13    4.66      4.6    4.33     6.24%          7.62%
Q14    1.04      1.11   0.97     14.43%         7.22%
Q15    2.05      1.98   2.11     -6.16%         -2.84%
Q16    1.04      1.04   0.97     7.22%          7.22%
Q17    5.67      5.8    5.67     2.29%          0.00%
Q18    8.62      8.41   7.99     5.26%          7.88%
Q19    3.12      3.05   3.05     0.00%          2.30%
Q20    1.58      1.58   1.64     -3.66%         -3.66%
Q21    2.99      2.85   2.89     -1.38%         3.46%
Q22    0.64      0.64   0.5      28.00%         28.00%
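
The PR does not state how the percentage columns are computed. The figures are consistent with baseline time divided by FDO+LTO time, minus one, so a positive value means FDO+LTO is faster. A minimal sketch of that inferred formula, with a hypothetical helper name:

```python
def improvement_pct(baseline_s: float, optimized_s: float) -> float:
    """Speedup of `optimized_s` over `baseline_s`, in percent.

    Inferred from the table, not stated in the PR:
    (baseline / optimized - 1) * 100, rounded to two decimals.
    """
    return round((baseline_s / optimized_s - 1) * 100, 2)


if __name__ == "__main__":
    print(improvement_pct(6.07, 5.57))  # Q1, FDO+LTO : LTO      -> 8.98
    print(improvement_pct(6.32, 5.57))  # Q1, FDO+LTO : original -> 13.46
    print(improvement_pct(1.98, 2.11))  # Q15, FDO+LTO : LTO     -> -6.16
```

Because the ratio is taken against the optimized time, a regression such as Q15 shows up as a negative percentage.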

Updated on 2022-07-08

  • Original commit changed to 97342db
Time cost (s)
Query  LTO    FDO+LTO  FDO+LTO : LTO
------------------------------------
Q1     5.47   5.07     7.89%
Q2     3.12   2.92     6.85%
Q3     2.72   2.58     5.43%
Q4     2.92   2.92     0.00%
Q6     0.97   0.84     15.48%
Q7     3.59   3.52     1.99%
Q8     5.34   5.67     -5.82%
Q9     15.8   15.2     3.95%
Q10    3.72   3.46     7.51%
Q11    0.5    0.44     13.64%
Q12    1.78   1.78     0.00%
Q13    4.4    4.19     5.01%
Q14    0.91   0.97     -6.19%
Q15    1.98   2.05     -3.41%
Q16    0.97   0.91     6.59%
Q17    5.54   5.27     5.12%
Q18    7.68   7.42     3.50%
Q19    2.72   2.58     5.43%
Q20    1.51   1.51     0.00%
Q21    3.39   3.32     2.11%
Q22    0.57   0.57     0.00%

Version info of a build with pgo-sample and fdo:

TiFlash
Release Version: v6.2.0-alpha-29-gec199a817-dirty
Edition:         Community
...
Enable Features: jemalloc avx avx512 unwind thinlto pgo-sample fdo
Profile:         RELWITHDEBINFO
...
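
To confirm that a binary was built with the profile, one can look for pgo-sample and fdo in the Enable Features line. A minimal sketch that parses version text in the format shown above; the helper name is hypothetical, and how the version text is obtained from the binary is left out.

```python
from typing import Set


def enabled_features(version_text: str) -> Set[str]:
    """Return the feature names listed on the 'Enable Features' line."""
    for line in version_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Enable Features":
            return set(value.split())
    return set()  # line not found


if __name__ == "__main__":
    sample = (
        "TiFlash\n"
        "Release Version: v6.2.0-alpha-29-gec199a817-dirty\n"
        "Edition:         Community\n"
        "Enable Features: jemalloc avx avx512 unwind thinlto pgo-sample fdo\n"
        "Profile:         RELWITHDEBINFO\n"
    )
    features = enabled_features(sample)
    print({"pgo-sample", "fdo"} <= features)
```

A check like this could run after the final rebuild to catch a build that silently skipped the profile.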

Benchmark-100G

env:

  • 3 TiFlash, 3 PD, 3 TiKV, 1 TiDB
  • data: tpch-100
  • TiFlash replicas: 3
  • TiFlash CPU usage limited to 10 cores (1000%) by cgroup
  • original commit: 97342db

Time cost (s)
Query  LTO    FDO+LTO  FDO+LTO : LTO
------------------------------------
Q1     12.08  11.04    9.42%
Q2     4.33   4.26     1.64%
Q3     8.22   8.09     1.61%
Q4     21.17  21.64    -2.17%
Q5     19.36  19.83    -2.37%
Q6     1.91   1.85     3.24%
Q7     9.83   9.97     -1.40%
Q8     11.98  11.58    3.45%
Q9     65.26  64.32    1.46%
Q10    10.57  10.37    1.93%
Q11    2.05   2.05     0.00%
Q12    5.2    5.27     -1.33%
Q13    12.72  12.11    5.04%
Q14    2.18   2.11     3.32%
Q15    4.13   4.06     1.72%
Q16    2.32   2.25     3.11%
Q17    18.29  18.22    0.38%
Q18    23.12  22.85    1.18%
Q19    5.87   5.74     2.26%
Q20    4.06   4.13     -1.69%
Q21    39.02  39.29    -0.69%
Q22    1.38   1.31     5.34%

Conclusion

Except for Q1, I/O, network, and scheduling waits account for much more of the latency in the other queries when running on multiple nodes with large data. We believe FDO brings a larger improvement when CPU usage is relatively high.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

solotzg added the type/enhancement label Jun 17, 2022

ti-chi-bot commented Jun 17, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • SchrodingerZhu
  • zanmato1984

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Details

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

ti-chi-bot added the release-note-none and size/M labels Jun 17, 2022
ti-chi-bot added the size/XL label and removed the size/M label Jun 30, 2022
@SchrodingerZhu

Cool, good AutoFDO results.

@SchrodingerZhu

/lgtm

ti-chi-bot added the status/LGT1 label Jul 1, 2022

solotzg commented Jul 11, 2022

Added a tpch-100 benchmark with 3 TiFlash nodes. PTAL @zanmato1984.

zanmato1984 left a comment

LGTM

ti-chi-bot added the status/LGT2 label and removed the status/LGT1 label Jul 14, 2022

solotzg commented Jul 14, 2022

/hold

ti-chi-bot added the do-not-merge/hold label Jul 14, 2022

solotzg commented Jul 14, 2022

/merge

@ti-chi-bot

@solotzg: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

You only need to trigger /merge once, and if the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

If you have any questions about the PR merge process, please refer to pr process.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot

This pull request has been accepted and is ready to merge.

Details

Commit hash: 92072a8

ti-chi-bot added the status/can-merge label Jul 14, 2022

sre-bot commented Jul 14, 2022

Coverage for changed files

Filename                                                                                            Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
/home/jenkins/agent/workspace/tiflash-build-common/tiflash/dbms/src/Common/TiFlashBuildInfo.cpp          18                 8    55.56%          10                 5    50.00%          55                40    27.27%           0                 0         -
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                                                    18                 8    55.56%          10                 5    50.00%          55                40    27.27%           0                 0         -

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
18528      9609             48.14%    208793  96774        53.65%

full coverage report (for internal network access only)


solotzg commented Jul 15, 2022

/unhold

ti-chi-bot removed the do-not-merge/hold label Jul 15, 2022
@ti-chi-bot

@solotzg: Your PR was out of date, I have automatically updated it for you.

At the same time I will also trigger all tests for you:

/run-all-tests

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.


sre-bot commented Jul 15, 2022

Coverage for changed files

Filename                                                                                            Regions    Missed Regions     Cover   Functions  Missed Functions  Executed       Lines      Missed Lines     Cover    Branches   Missed Branches     Cover
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
/home/jenkins/agent/workspace/tiflash-build-common/tiflash/dbms/src/Common/TiFlashBuildInfo.cpp          18                 8    55.56%          10                 5    50.00%          55                40    27.27%           0                 0         -
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
TOTAL                                                                                                    18                 8    55.56%          10                 5    50.00%          55                40    27.27%           0                 0         -

Coverage summary

Functions  MissedFunctions  Executed  Lines   MissedLines  Cover
18573      9609             48.26%    210626  96811        54.04%

full coverage report (for internal network access only)

ti-chi-bot merged commit 7b9a6ad into pingcap:master Jul 15, 2022
solotzg deleted the pgo branch July 15, 2022 02:09

Labels

  • release-note-none
  • size/XL
  • status/can-merge
  • status/LGT2
  • type/enhancement

5 participants