Column: optimize filter #9670

Merged
ti-chi-bot[bot] merged 7 commits into pingcap:master from Lloyd-Pottiger:optimize-filter
Dec 4, 2024

@Lloyd-Pottiger
Contributor

@Lloyd-Pottiger Lloyd-Pottiger commented Nov 25, 2024

What problem does this PR solve?

Issue Number: ref #9699

Problem Summary:

What is changed and how it works?

Rewrite the implementation of the `IColumn::filter` interface with AVX2, which improves performance by up to ~10x.
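For readers unfamiliar with the approach, here is a minimal sketch of a block-wise AVX2 filter. It is not the code from this PR: `blockMask32` and `filterColumn` are hypothetical names, the column is assumed to be a plain `std::vector<int64_t>`, and the filter is assumed to be one byte per row with non-zero meaning "keep". The idea is to turn 32 filter bytes into a 32-bit mask with one compare plus movemask, then take a fast path when the block is all-rejected or all-kept. Without `__AVX2__` the mask is built by a scalar loop so the sketch still compiles; `__builtin_ctz` assumes GCC or Clang.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

// Hypothetical helper: one bit per row for the next 32 filter bytes
// (bit j set => row j is kept).
inline uint32_t blockMask32(const uint8_t * filter)
{
#if defined(__AVX2__)
    const __m256i bytes = _mm256_loadu_si256(reinterpret_cast<const __m256i *>(filter));
    const __m256i is_zero = _mm256_cmpeq_epi8(bytes, _mm256_setzero_si256());
    // movemask collects the "is zero" bits, so invert them to get the "keep" bits.
    return ~static_cast<uint32_t>(_mm256_movemask_epi8(is_zero));
#else
    uint32_t mask = 0;
    for (unsigned j = 0; j < 32; ++j)
        mask |= static_cast<uint32_t>(filter[j] != 0) << j;
    return mask;
#endif
}

// Hypothetical stand-in for a column filter: keeps data[i] where filter[i] != 0.
std::vector<int64_t> filterColumn(const std::vector<int64_t> & data, const std::vector<uint8_t> & filter)
{
    assert(data.size() == filter.size());
    std::vector<int64_t> out;
    out.reserve(data.size());
    const size_t n = data.size();
    size_t i = 0;
    for (; i + 32 <= n; i += 32)
    {
        uint32_t mask = blockMask32(filter.data() + i);
        if (mask == 0)
            continue; // whole block rejected: nothing to copy
        if (mask == 0xFFFFFFFFu)
        {
            // whole block kept: one bulk copy of 32 values
            out.insert(out.end(), data.data() + i, data.data() + i + 32);
            continue;
        }
        // mixed block: visit the set bits from lowest to highest
        for (; mask != 0; mask &= mask - 1)
            out.push_back(data[i + static_cast<size_t>(__builtin_ctz(mask))]);
    }
    for (; i < n; ++i) // scalar tail for the last (n % 32) rows
        if (filter[i] != 0)
            out.push_back(data[i]);
    return out;
}
```

The two fast paths are where such a design would be expected to gain the most, since very low and very high selectivity produce many uniform blocks.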

Perf test result (lower time is better):

$ ./dbms/bench_dbms --benchmark_filter="columnFilter*"          
2024-11-27T15:16:36+08:00
Running ./dbms/bench_dbms
Run on (72 X 3299.98 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x36)
  L1 Instruction 32 KiB (x36)
  L2 Unified 1024 KiB (x36)
  L3 Unified 25344 KiB (x2)
Load Average: 11.32, 18.49, 38.43
----------------------------------------------------------------
Benchmark                      Time             CPU   Iterations
----------------------------------------------------------------
columnFilter/sse2_00        3866 ns         3849 ns       182665
columnFilter/avx2_00        2614 ns         2603 ns       270405
columnFilter/sse2_01       22352 ns        22252 ns        32578
columnFilter/avx2_01        4663 ns         4642 ns       159532
columnFilter/sse2_10      147419 ns       146777 ns         4694
columnFilter/avx2_10       15518 ns        15453 ns        45420
columnFilter/sse2_20      215868 ns       214854 ns         3274
columnFilter/avx2_20       25262 ns        25147 ns        29981
columnFilter/sse2_30      284431 ns       283065 ns         2463
columnFilter/avx2_30       30250 ns        30120 ns        24217
columnFilter/sse2_40      345727 ns       344208 ns         2040
columnFilter/avx2_40       35211 ns        35066 ns        19974
columnFilter/sse2_50      388276 ns       386500 ns         1850
columnFilter/avx2_50       41809 ns        41632 ns        17209
columnFilter/sse2_60      360463 ns       358898 ns         1962
columnFilter/avx2_60       47454 ns        47242 ns        14610
columnFilter/sse2_70      306273 ns       304934 ns         2284
columnFilter/avx2_70       52609 ns        52386 ns        13615
columnFilter/sse2_80      249978 ns       248908 ns         2825
columnFilter/avx2_80       61784 ns        61495 ns        12060
columnFilter/sse2_90      180275 ns       179494 ns         3867
columnFilter/avx2_90       70565 ns        70232 ns        11137
columnFilter/sse2_99       48001 ns        47784 ns        15408
columnFilter/avx2_99       47023 ns        46831 ns        15164
columnFilter/sse2_100     153413 ns       152821 ns         4436
columnFilter/avx2_100     148977 ns       148396 ns         4720
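
As a rough reading aid for the table above, the speedup for each case is the sse2 time divided by the avx2 time for the same suffix. The sketch below uses wall-clock times copied from that output; `Row`, `speedup`, and `printSpeedups` are hypothetical names, and treating the suffix as a selectivity percentage is an assumption. With these figures the ratio is about 9.8x for `_40` and close to 1x for `_99` and `_100`.

```cpp
#include <cassert>
#include <cstdio>

// One pair of rows from the benchmark output (times in nanoseconds).
struct Row
{
    const char * suffix;
    double sse2_ns;
    double avx2_ns;
};

// Speedup of the optimized variant over the baseline.
inline double speedup(double baseline_ns, double optimized_ns)
{
    assert(optimized_ns > 0);
    return baseline_ns / optimized_ns;
}

// Prints the ratio for a few rows taken from the table.
void printSpeedups()
{
    static const Row rows[] = {
        {"01", 22352, 4663},
        {"10", 147419, 15518},
        {"40", 345727, 35211},
        {"50", 388276, 41809},
        {"99", 48001, 47023},
        {"100", 153413, 148977},
    };
    for (const Row & r : rows)
        std::printf("columnFilter/*_%s: %.2fx\n", r.suffix, speedup(r.sse2_ns, r.avx2_ns));
}
```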

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

None

@ti-chi-bot ti-chi-bot bot added the release-note-none and size/XL labels Nov 25, 2024
@Lloyd-Pottiger Lloyd-Pottiger force-pushed the optimize-filter branch 2 times, most recently from 1e4d2a2 to 3a82055 Compare November 26, 2024 03:53
Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>
Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>
@purelind
Contributor

/retest

@JinheLin
Contributor

JinheLin commented Nov 27, 2024

Did you compare the performance at different filtration rates? For example, 1%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 99%.

And set the size of the column to DEFAULT_BLOCK_SIZE.

@Lloyd-Pottiger
Contributor Author

> Did you compare the performance at different filtration rates? For example, 1%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 99%.
>
> And set the size of the column to DEFAULT_BLOCK_SIZE.

$ ./dbms/src/Columns/tests/column_vector_perftest int64 filter 10000 100 30
Test int64-filter rows=10000 columns=100 seconds=30
FilterV1: 514449    
FilterV2: 1245839

100 / 10000 is the filtration rate.
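
To make the ratio concrete, the following sketch builds a byte filter in which a fixed number of rows is selected, for example 100 out of 10000. `makeFilter` is a hypothetical helper and is not part of `column_vector_perftest`; whether "filtration rate" counts the rows that pass or the rows that are dropped is not stated in this thread, so the sketch only fixes how many entries are 1 and leaves the interpretation to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: a filter of `rows` bytes with exactly `selected`
// entries set to 1, spread evenly over the column.
std::vector<uint8_t> makeFilter(size_t rows, size_t selected)
{
    assert(selected <= rows);
    std::vector<uint8_t> filter(rows, 0);
    // Row i is selected when the running quota floor((i + 1) * selected / rows)
    // grows by one compared with floor(i * selected / rows).
    for (size_t i = 0; i < rows; ++i)
        filter[i] = ((i + 1) * selected / rows != i * selected / rows) ? 1 : 0;
    return filter;
}
```

Spreading the selected rows evenly is only one choice; clustered or random placement would exercise the all-kept and all-rejected block cases differently, so results may vary with the distribution.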

@JinheLin
Contributor

Add micro-benchmark for column filter: Lloyd-Pottiger#18

./dbms/bench_dbms --benchmark_filter="columnFilter*"
2024-11-27T14:37:00+08:00
Running ./dbms/bench_dbms
Run on (72 X 3300.01 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x36)
  L1 Instruction 32 KiB (x36)
  L2 Unified 1024 KiB (x36)
  L3 Unified 25344 KiB (x2)
Load Average: 10.74, 19.38, 23.60
----------------------------------------------------------------
Benchmark                      Time             CPU   Iterations
----------------------------------------------------------------
columnFilter/sse2_00        2697 ns         2686 ns       246181
columnFilter/sse2_01       16080 ns        16017 ns        41333
columnFilter/sse2_10      127919 ns       127345 ns         5714
columnFilter/sse2_20      202684 ns       201722 ns         3510
columnFilter/sse2_30      258492 ns       257367 ns         2754
columnFilter/sse2_40      309530 ns       308144 ns         2289
columnFilter/sse2_50      331585 ns       330077 ns         2063
columnFilter/sse2_60      320293 ns       318848 ns         2250
columnFilter/sse2_70      271073 ns       269877 ns         2633
columnFilter/sse2_80      210172 ns       209248 ns         3346
columnFilter/sse2_90      151390 ns       150685 ns         4818
columnFilter/sse2_99       42376 ns        42196 ns        16193
columnFilter/sse2_100     158190 ns       157494 ns         4296
columnFilter/avx2_00        2585 ns         2574 ns       274291
columnFilter/avx2_01        4273 ns         4257 ns       162088
columnFilter/avx2_10       14650 ns        14586 ns        49541
columnFilter/avx2_20       21996 ns        21904 ns        32461
columnFilter/avx2_30       27256 ns        27141 ns        25264
columnFilter/avx2_40       33599 ns        33456 ns        20461
columnFilter/avx2_50       39686 ns        39513 ns        18076
columnFilter/avx2_60       45817 ns        45624 ns        15675
columnFilter/avx2_70       54466 ns        54213 ns        11688
columnFilter/avx2_80       59390 ns        59112 ns        12559
columnFilter/avx2_90       65427 ns        65135 ns        10859
columnFilter/avx2_99       52000 ns        51764 ns        10000
columnFilter/avx2_100     157735 ns       157017 ns         4695

@ti-chi-bot ti-chi-bot bot added the size/XXL label and removed the size/XL label Nov 27, 2024
Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>
Co-authored-by: jinhelin <linjinhe33@gmail.com>
Contributor

@gengliqi gengliqi left a comment

How about using _mm512(256)_maskz_compress_epi(width) to achieve more speedup when avx512 is enabled?
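
For context, a mask-compress instruction packs the selected lanes of a vector into its low positions and zeroes the rest, which is the core step of a filter. The sketch below illustrates this for sixteen 32-bit lanes using `_mm512_maskz_compress_epi32` (AVX-512F; the 256-bit variants additionally need AVX-512VL). It is only an illustration of the suggestion, not code from this PR or any follow-up: `compress16` is a hypothetical name, and a scalar emulation is used when `__AVX512F__` is not defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Hypothetical helper: packs the lanes of in[0..15] whose mask bit is set into
// the front of out[0..15], zeroes the remaining lanes, and returns the number
// of lanes kept.
size_t compress16(const int32_t * in, uint16_t mask, int32_t * out)
{
    assert(in != nullptr && out != nullptr);
#if defined(__AVX512F__)
    const __m512i lanes = _mm512_loadu_si512(in);
    const __m512i packed = _mm512_maskz_compress_epi32(static_cast<__mmask16>(mask), lanes);
    _mm512_storeu_si512(out, packed);
    size_t kept = 0;
    for (unsigned m = mask; m != 0; m &= m - 1) // count the set bits
        ++kept;
    return kept;
#else
    // Scalar emulation of the same semantics.
    size_t kept = 0;
    for (unsigned lane = 0; lane < 16; ++lane)
        if ((mask >> lane) & 1u)
            out[kept++] = in[lane];
    for (size_t rest = kept; rest < 16; ++rest)
        out[rest] = 0;
    return kept;
#endif
}
```

A caller would typically advance its output pointer by the returned count after each block, so the zeroed lanes are overwritten by the next block.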

@Lloyd-Pottiger
Contributor Author

> How about using _mm512(256)_maskz_compress_epi(width) to achieve more speedup when avx512 is enabled?

Good idea! But let's leave it for a later PR.

@gengliqi
Contributor

gengliqi commented Dec 3, 2024

> > How about using _mm512(256)_maskz_compress_epi(width) to achieve more speedup when avx512 is enabled?
>
> Good idea! But let's leave it for a later PR.

OK. I will give it a try then.

Lloyd-Pottiger and others added 2 commits December 4, 2024 14:41
Signed-off-by: Lloyd-Pottiger <yan1579196623@gmail.com>
@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm and approved labels Dec 4, 2024
@ti-chi-bot
Contributor

ti-chi-bot bot commented Dec 4, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JaySon-Huang, JinheLin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [JaySon-Huang,JinheLin]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the lgtm label and removed the needs-1-more-lgtm label Dec 4, 2024
@ti-chi-bot
Contributor

ti-chi-bot bot commented Dec 4, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-12-04 07:09:22.452845341 +0000 UTC m=+1225150.072499856: ☑️ agreed by JaySon-Huang.
  • 2024-12-04 07:49:22.317299507 +0000 UTC m=+1227549.936954023: ☑️ agreed by JinheLin.

@ti-chi-bot ti-chi-bot bot merged commit 7a32e7a into pingcap:master Dec 4, 2024
@Lloyd-Pottiger Lloyd-Pottiger deleted the optimize-filter branch December 4, 2024 07:53
yongman pushed a commit to yongman/tiflash that referenced this pull request Jun 18, 2025
pingcap#9649

Signed-off-by: Wish <breezewish@outlook.com>

Labels

approved, lgtm, release-note-none, size/XXL
