Auto-vectorize count_if, count #4653

@AlexGuteniev

This is a rephrasing of #4456 that incorporates all progress made so far.
 
count and count_if can be auto-vectorized as follows:

  • When sizeof(difference_type) == sizeof(T), they are already auto-vectorized.
  • When sizeof(difference_type) < sizeof(T), we can use an approach similar to "Help the compiler vectorize std::iota" #4627.
  • When sizeof(difference_type) > sizeof(T), we can also use an approach similar to #4627, but it will not cover some large array sizes. To cover them, we can additionally split the range into smaller subranges, so that within each subrange T is wide enough to represent the count.
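
The range-splitting idea in the last bullet might look roughly like the sketch below. It is only an illustration under stated assumptions: `chunked_count` is a hypothetical name, not an existing STL function, it handles only integral element types narrower than `std::ptrdiff_t`, and it does not reproduce the #4627 technique itself. Each chunk is short enough that a counter as wide as `T` cannot wrap, and the per-chunk results are added to a wide total.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <type_traits>

// Hypothetical helper (not part of the STL): counts elements equal to `value`,
// accumulating in a counter that has the same width as the element type.
template <class T>
std::ptrdiff_t chunked_count(const T* first, const T* last, const T value) {
    static_assert(std::is_integral_v<T> && !std::is_same_v<T, bool>,
                  "sketch covers non-bool integral element types only");
    static_assert(sizeof(T) < sizeof(std::ptrdiff_t),
                  "sketch targets the sizeof(difference_type) > sizeof(T) case");

    using Counter = std::make_unsigned_t<T>;
    // A chunk this long cannot overflow Counter even if every element matches.
    constexpr std::size_t max_chunk = std::numeric_limits<Counter>::max();

    std::ptrdiff_t total = 0;
    while (first != last) {
        const std::size_t remaining = static_cast<std::size_t>(last - first);
        const std::size_t n = std::min(remaining, max_chunk);

        // Inner loop: element and counter have equal width, which is the shape
        // the first bullet describes as already auto-vectorized.
        Counter partial = 0;
        for (std::size_t i = 0; i != n; ++i) {
            partial += static_cast<Counter>(first[i] == value);
        }

        total += static_cast<std::ptrdiff_t>(partial);
        first += n;
    }
    return total;
}
```

For 1-byte elements the chunks are only 255 elements long, so whether this shape actually pays off for small T would need to be measured; a count_if variant would replace `first[i] == value` with a call to the predicate.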

For count_if, this would be the only feasible way to vectorize: predicates cannot be used in a separately compiled implementation, and we don't want complex manual vectorization with intrinsics in headers, for throughput reasons.

For count, this can still be an alternative to manual vectorization. When compiling with /arch:AVX2, auto-vectorization seems to perform not much worse than the existing manual vectorization for large ranges, albeit significantly worse for small ranges with large tails (auto-vectorization does not handle the tail with masked operations). So we can either:

  • Add auto-vectorization as a fallback for when manual vectorization is not available
    (ARM64, or opting out of _USE_STD_VECTOR_ALGORITHMS)
  • Use auto-vectorization as the only implementation (losing some performance for tails, but gaining a unified vectorization implementation)
