[Microbenchmarks] Add benchmark for conditional scalar assignment autovec#295
Conversation
…ovec Benchmarks with vs. without autovec for a loop containing conditional scalar assignment (plus a little extra arithmetic as a 'work payload').
|
Microbenchmark for FindLast/CSA autovec, as requested on llvm/llvm-project#158088 With just the conditional assignment in the loop, there was no noticeable performance difference. However, when I added a small arithmetic payload I saw a noticeable difference, especially for uint8t. |
MacDue
left a comment
There was a problem hiding this comment.
Generally seems reasonable to me (bar a few nits), but I've not added a benchmark before, so wait and see if there's any more comments.
MicroBenchmarks/LoopVectorization/ConditionalScalarAssignment.cpp
Outdated
Show resolved
Hide resolved
| @@ -0,0 +1,118 @@ | |||
| #include <iostream> | |||
There was a problem hiding this comment.
Was going to comment about the license header, but it seems that's not done here (looking at other files).
There was a problem hiding this comment.
Yeah, I wondered about that too.
MicroBenchmarks/LoopVectorization/ConditionalScalarAssignment.cpp
Outdated
Show resolved
Hide resolved
| // for 'A' in init_data below. | ||
| T Result = 101; | ||
| for (unsigned i = 0; i < ITERATIONS; i++) { | ||
| // Do some work to make the difference noticeable |
There was a problem hiding this comment.
could you add a few more variations, like the minimal case with just a CAS and multiple independent CAS?
| } | ||
| } | ||
|
|
||
| // Add add auto-vectorized and disabled vectorization benchmarks for math |
There was a problem hiding this comment.
The comment needs updating, currently passes only ty and Threshold, but it might be helpful to also pass a function if it helps to reduce the duplication for additional patterns
| #endif | ||
|
|
||
| for (auto _ : state) { | ||
| VecFn(&A[0], &B[0], &C[0], Threshold); |
There was a problem hiding this comment.
I am not sure this is working as expected. I think we need something like below to make sure the CAS result is used:
- NoVecFn(&A[0], &B[0], &C[0], Threshold);
+ auto Res = NoVecFn(&A[0], &B[0], &C[0], Threshold);
+ benchmark::DoNotOptimize(Res);
Without a use of the result, compiler is probably able completely remove the variantst that don't have stores in the loop and also remove the unused CAS chain after inlining?
Benchmarks with vs. without autovec for a loop containing conditional
scalar assignment (plus a little extra arithmetic as a 'work payload').