Skip to content

[NEW] Optimize HyperLogLog by SIMD acceleration #13551

@Nugine

Description

@Nugine

We found that the dense-encoding part of HyperLogLog can be significantly accelerated by SIMD instructions.

Our benchmark tests the performance of merging 3 dense hll structures.

pfcount key1 key2 key3
pfmerge keyall key1 key2 key3

The result shows a 12x speedup compared to the scalar version.

======================================================================================================
Type             Ops/sec    Avg. Latency     p50 Latency     p99 Latency   p99.9 Latency       KB/sec 
------------------------------------------------------------------------------------------------------
PFCOUNT-scalar    5280.76        37.93893        34.81500        69.63100        73.72700       283.63
PFCOUNT-avx2     69802.99         2.85844         2.75100         5.53500         6.97500      3749.18
------------------------------------------------------------------------------------------------------
PFMERGE-scalar    9445.56        21.17554        20.09500        38.91100        41.21500       590.35
PFMERGE-avx2    120642.02         1.65367         1.59100         3.21500         5.11900      7540.13
------------------------------------------------------------------------------------------------------

CPU:    13th Gen Intel® Core™ i9-13900H × 20
Memory: 32.0 GiB
OS:     Ubuntu 22.04.5 LTS

I'm not sure whether this change is acceptable to maintainers. I will create a PR if it is ok.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions