Vectorize reverse_copy()#804
Conversation
Fixed function signature and conditions for vectorization.
Line 11 in 88f8f44 Line 16 in 88f8f44 Lines 24 to 28 in 88f8f44 |
Is already defined in xutility
Rename __std_reverse_trivially_copyable_X to __std_reverse_copy_trivially_copyable_X
|
Any chance you would be willing to provide some performance numbers for before/after this change? Edit: And ideally the source used to acquire the numbers. |
Sure, I did some basic micro benchmarking of the implementation with google/benchmark on my x64 machine. A summary of the results can be found on google docs, the code is available on gist. I’d be happy to do more benchmarking if required. |
|
That google doc isn't public. |
BillyONeal
left a comment
There was a problem hiding this comment.
I think this needs testing exercising each of the newly added special paths.
|
Let us know if you need any help adding test coverage, or if there are any questions we can answer. 😺 |
|
@pawREP, just a friendly ping. Are you available to add testing for this? Otherwise it will have to wait until another contributor or maintainer adds testing. |
I think the benefits we saw for reverse are sufficient to motivate this. All this PR needs to land is some tests. |
|
Here's the benchmark I wrote way back when I added the vector reverse: Here are results from my 3970X; first, the important ones like vector: by the time you get to 64 elements the wins are huge. There's no reason this wouldn't apply just as much to reverse_copy (although absolute wins might be lower because memory bandwidth consumption is higher for that algorithm) The full list: |
|
Here would be a good place to add the tests: https://github.com/microsoft/STL/blob/master/tests/std/tests/VSO_0000000_vector_algorithms/test.cpp |
Remove an unnecessary semicolon after the function definition. Use the modern ranges::reverse pattern. Remove unnecessary bool_constant. Add test coverage for varying containers.
|
I pushed a significant simplification. Originally, we had both Additionally, @CaseyCarter used a simpler technique that we developed for vectorizing I removed an unnecessary semicolon after the function definition. I also removed an unnecessary use of I added test coverage for what happens when the containers/iterators vary in strength. This was prompted by my concern that |
Also remove an unnecessary std:: in the test.
|
I noticed the comment banner said:
So I changed the separately compiled functions accordingly, synthesizing the return value in the header. Thanks to @BillyONeal for writing the original extremely detailed comment banner; otherwise this would have appeared as a regression in optimized code that would have taken forever to diagnose. |
The 1, 4, and 8-byte reverse_copy implementations all start by testing for 32 bytes of AVX2 work. This is exactly half of what the 1/2/4/8-byte reverse implementations test for; I believe they need 64 bytes because they swap the front and the back of the range (so, 32 bytes on each side). Uniquely, 2-byte reverse_copy said 64. I believe that this is a copy-paste error, and that it should say 32 like all other reverse_copy implementations. This couldn't have been caught by testing, because the effect was to disable the AVX2 optimization for 2-byte elements, where the source range was between [32, 64) bytes, resulting in reduced performance only.
The 1, 4, and 8-byte Uniquely, 2-byte This couldn't have been caught by testing, because the effect was to disable the AVX2 optimization for 2-byte elements, where the source range was between [32, 64) bytes, resulting in reduced performance only. |
StephanTLavavej
left a comment
There was a problem hiding this comment.
I think this is good to go! 🌔 🚀 🪐
I think I just copy pasta'd an email from Neeraj about it :) |
[microsoftGH-804 squashed and rebased]
|
Thanks for your contribution! I was getting tired of always forward copying my vector eyes. |
Co-authored-by: Billy Robert O'Neal III <bion@microsoft.com> Co-authored-by: Stephan T. Lavavej <stl@microsoft.com>
Resolves #181
The implementation closely follows the one for vectorized
std::reverse.This currently includes the define of _USE_STD_VECTOR_ALGORITHMS in<algorithm>, which presumably should be changed. What's the best solution here?