Use intrinsics for cvRound on x86_64 __GNUC__ (clang/gcc linux) too. #24001
@legrosbuffle Thanks a lot for the contribution. I ran a quick benchmark of the imgproc module with and without the patch, and the outcome is the following:
@legrosbuffle could you describe your use case?
We're mostly interested in Diff between base and patch on machines we care about:
@legrosbuffle, thanks for the PR. We just want to give the compiler more chances to vectorize the code. I believe that with higher-level intrinsics like __builtin_lrintf() the chances are higher than with manually embedded scalar SSE2 intrinsics. Another reason is that the code may be compiled with AVX2 or AVX512, and then those SSE2 intrinsics would look weird. May we get more detailed information about your test environment:
Do you have an example where the builtin version vectorizes? Neither GCC nor clang seem to vectorize any version of
When building with AVX2, the compiler will select the proper encoding (see godbolt link).
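The two scalar variants under discussion can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the function names `round_sse2`, `round_builtin`, and `round_array` are hypothetical, and the non-SSE2 fallback to `std::lrintf` is an assumption added only so the sketch compiles on other targets. Both variants round according to the current rounding mode (round-to-nearest-even by default).

```cpp
#include <cmath>
#if defined(__SSE2__)
#include <emmintrin.h>  // also pulls in the SSE header that declares _mm_cvtss_si32
#endif

// Hypothetical helper: scalar SSE conversion (cvtss2si).
// When built with AVX enabled, compilers typically emit the VEX-encoded form.
inline int round_sse2(float x) {
#if defined(__SSE2__)
    return _mm_cvtss_si32(_mm_set_ss(x));
#else
    return static_cast<int>(std::lrintf(x));  // assumption: portable fallback
#endif
}

// Hypothetical helper: the higher-level builtin suggested in the review.
inline int round_builtin(float x) {
#if defined(__GNUC__)
    return static_cast<int>(__builtin_lrintf(x));
#else
    return static_cast<int>(std::lrintf(x));
#endif
}

// The kind of loop whose auto-vectorization is being debated.
void round_array(const float* src, int* dst, int n) {
    for (int i = 0; i < n; ++i) {
        dst[i] = round_sse2(src[i]);
    }
}
```

Compiling `round_array` once with each helper and comparing the generated assembly is one way to check whether either version vectorizes with a given compiler and flag set.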
Google's Linux distribution. Results are similar on gLinux and prod.
We're using
Skylake
HEAD
The only relevant one is
…linux) too. We've measured a 7x improvement in speed for `cvRound` using the intrinsic.
23118c9 to 3cce299
@legrosbuffle Thanks a lot for the patch. We discussed the solution at the OpenCV core team meeting and decided to merge it. I removed the x86_64 check to cover the x86 32-bit configuration too. I'll merge it as soon as CI passes.
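A guard without the x86_64 requirement might look roughly like the sketch below. This is an assumption about the general shape, not the condition actually merged into OpenCV: the macro `SKETCH_USE_SSE2_ROUND` and the function `sketch_round` are hypothetical names. Keying on `__SSE2__` rather than `__x86_64__` lets a 32-bit x86 build use the intrinsic too, provided SSE2 is enabled (for example with `-msse2`).

```cpp
#include <cmath>

// Hypothetical guard: any GCC-compatible compiler with SSE2 enabled,
// with no check for __x86_64__.
#if defined(__GNUC__) && defined(__SSE2__)
#  include <emmintrin.h>
#  define SKETCH_USE_SSE2_ROUND 1
#else
#  define SKETCH_USE_SSE2_ROUND 0
#endif

// Hypothetical rounding helper for double inputs.
inline int sketch_round(double x) {
#if SKETCH_USE_SSE2_ROUND
    // cvtsd2si: converts using the current rounding mode.
    return _mm_cvtsd_si32(_mm_set_sd(x));
#else
    // Assumption: portable fallback with the same default rounding behavior.
    return static_cast<int>(std::lrint(x));
#endif
}
```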
There is no reason to enable this only on Windows. We've measured a 7x improvement in speed for `cvRound` using the intrinsic.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.