Closed as not planned
Labels
performance (Must go faster)
Description
This is a rephrasing of #4456 that incorporates all progress made so far.
count and count_if can be auto-vectorized as follows:
- For sizeof(difference_type) == sizeof(T), they are already auto-vectorized.
- For sizeof(difference_type) < sizeof(T), they can use an approach similar to "Help the compiler vectorize std::iota" (#4627).
- For sizeof(difference_type) > sizeof(T), they can also use an approach similar to #4627, but it will not cover some large array sizes. To cover large array sizes, the range can additionally be split into smaller ranges, so that within each smaller range T is wide enough to represent the count.
For count_if, this would be the only feasible way to vectorize: predicates cannot be used in the separately compiled implementation, and we don't want complex manual vectorization with intrinsics in headers, for throughput reasons.
For count, this can still be an alternative to manual vectorization. When compiling with /arch:AVX2, auto-vectorization seems to be not much slower than the existing manual vectorization for large ranges, albeit significantly slower for small ranges with large tails (auto-vectorization does not handle the tail with masked operations). So we can:
- Add auto-vectorization as an alternative to manual vectorization, when the latter is not available (ARM64, or opt-out from _USE_STD_VECTOR_ALGORITHMS)
- Use auto-vectorization as the only one (lose some perf for tails, but have a unified vectorization implementation)