Add a 512 bit codepath to the AVX512 fastConv function #10468
Merged
alalek merged 4 commits into opencv:master on Jan 31, 2018
[Performance table not preserved; only its "Geometric mean" row label survives.]
Performance is slightly mixed, so feedback is definitely needed.
alalek (Member) reviewed on Dec 30, 2017 and left a comment:
Thank you!
Please rebase your patch on the latest master (CV_AVX512_SKX is not available at the commit this patch is based on).
/* only use AVX512 for multiple-of-16 vectors */
if ((vecsize & 15) == 0) {
__m512 vs00_5 = _mm512_setzero_ps(), vs01_5 = _mm512_setzero_ps(),
Please fix indentation here (4 spaces).
__m512 w0 = _mm512_load_ps(wptr0 + k);
__m512 w1 = _mm512_load_ps(wptr1 + k);
__m512 w2 = _mm512_load_ps(wptr2 + k);
__m512 r0 = _mm512_load_ps(rptr);
_mm512_load_ps() is an aligned load. Either check that the pointers are aligned to 64 bytes, or change this to _mm512_loadu_ps().
Currently these pointers are only 32-byte aligned (based on the alignment requirement of OpenCV's memory allocator), so the AVX/AVX2 code is fine here.
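To make the alignment concern concrete, here is a minimal sketch of a runtime check that could guard an aligned 512-bit load. The helper names `isAligned` and `canUseAlignedLoad512` are hypothetical and are not part of OpenCV or of this patch. The sketch shows only the pointer arithmetic, not the intrinsics.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not an OpenCV API): returns true if p is aligned
// to `alignment` bytes. `alignment` is assumed to be a power of two.
inline bool isAligned(const void* p, std::size_t alignment)
{
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Hypothetical helper: an aligned 512-bit load such as _mm512_load_ps()
// requires a 64-byte aligned address. This checks the address ptr + k.
inline bool canUseAlignedLoad512(const float* ptr, std::size_t k)
{
    return isAligned(ptr + k, 64);
}
```

A pointer that is only 32-byte aligned passes `isAligned(p, 32)` but can fail `isAligned(p, 64)`. That is the case in which `_mm512_loadu_ps()` would be the safe choice.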
This patch adds a 512-bit-wide codepath to the fastConv() function for AVX512 use. The basic idea is to process the first N * 16 elements of the vector with AVX512, and then run the rest of the vector through the traditional AVX2 codepath.
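The split described above can be sketched in portable C++ as follows. This is an illustration only, not the patch's code: `dotSplit` is a hypothetical name, plain inner loops stand in for the 16-lane (AVX512) and 8-lane (AVX2) registers so the example runs without SIMD hardware, and the scalar tail is an added assumption for sizes that are not a multiple of 8.

```cpp
#include <cstddef>

// Hypothetical illustration of the "wide head, narrower tail" split.
// Computes the dot product of w and r over vecsize elements.
float dotSplit(const float* w, const float* r, std::size_t vecsize)
{
    // Largest multiple of 16 not exceeding vecsize (the "N * 16" part).
    const std::size_t wide = vecsize & ~static_cast<std::size_t>(15);

    // 16 partial sums, standing in for one 512-bit accumulator register.
    float acc16[16] = {};
    std::size_t k = 0;
    for (; k < wide; k += 16)
        for (std::size_t j = 0; j < 16; ++j)
            acc16[j] += w[k + j] * r[k + j];

    // 8 partial sums, standing in for one 256-bit accumulator register.
    float acc8[8] = {};
    for (; k + 8 <= vecsize; k += 8)
        for (std::size_t j = 0; j < 8; ++j)
            acc8[j] += w[k + j] * r[k + j];

    // Horizontal reduction of both accumulators.
    float sum = 0.0f;
    for (std::size_t j = 0; j < 16; ++j) sum += acc16[j];
    for (std::size_t j = 0; j < 8; ++j) sum += acc8[j];

    // Scalar tail (assumption: added here for sizes not divisible by 8).
    for (; k < vecsize; ++k)
        sum += w[k] * r[k];
    return sum;
}
```

With `vecsize = 24`, for example, the first 16 elements go through the wide loop and the remaining 8 through the narrower one.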
alalek approved these changes on Jan 31, 2018