DNN: Try to be compatible with win32 #22454
Hi @alalek, can you test whether this patch fixes the issue on Win32?
 #define CONV_NR 24

-#ifdef CV_AVX2
+#if CV_AVX2
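For readers following the one-line change: `#ifdef` only tests whether a macro is defined, while `#if` tests its value. The sketch below illustrates the difference under the assumption that the feature macro is always defined and expands to 0 when the feature is disabled. `DEMO_AVX2` is a hypothetical stand-in, not an OpenCV macro.

```cpp
// DEMO_AVX2 is a hypothetical stand-in for a feature macro that is always
// defined and expands to 0 when the feature is disabled.
#define DEMO_AVX2 0

// #ifdef is true for any defined macro, even one that expands to 0.
#ifdef DEMO_AVX2
constexpr bool takenWithIfdef = true;
#else
constexpr bool takenWithIfdef = false;
#endif

// #if evaluates the value, so a macro set to 0 disables the branch.
#if DEMO_AVX2
constexpr bool takenWithIf = true;
#else
constexpr bool takenWithIf = false;
#endif

static_assert(takenWithIfdef, "#ifdef selects the branch although the value is 0");
static_assert(!takenWithIf, "#if skips the branch when the value is 0");
```

If that assumption holds for `CV_AVX2`, the `#ifdef` form would compile the AVX2 path even in builds where AVX2 is disabled.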
ISA-specific checks should generally be avoided (the API is called "universal intrinsics", after all); they should be used for fine-tuning only.
CV_SIMD_WIDTH should be used instead to detect SIMD128 and other widths.
Thanks for the reminder. Which macros should we use to distinguish the function calls? How about #if __AVX2__?
According to the // SIMD 128 comment in the else branch, you don't want to handle only AVX2 here (because the code is a priori broken for other SIMD256 ISAs).
I believe you want to check for SIMD256 / 128.
So you should do that through a CV_SIMD_WIDTH check in that case.
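A width-based check of the kind suggested here could look roughly like the following. This is only a sketch: it assumes `CV_SIMD_WIDTH` is the register width in bytes as provided by OpenCV's universal intrinsics header, and it adds a fallback definition so the snippet compiles without OpenCV. `FAST_VEC_NLANES` is the name used in this PR; the values are illustrative.

```cpp
// Fallback so the sketch compiles without OpenCV. With OpenCV, CV_SIMD_WIDTH
// is expected to come from <opencv2/core/hal/intrin.hpp>.
#ifndef CV_SIMD_WIDTH
#define CV_SIMD_WIDTH 16  // assumption for this sketch: 128-bit registers
#endif

// Number of 32-bit float lanes per register, chosen by register width
// rather than by a specific ISA such as AVX2.
#if CV_SIMD_WIDTH == 32
constexpr int FAST_VEC_NLANES = 8;  // SIMD256 path
#else
constexpr int FAST_VEC_NLANES = 4;  // SIMD128 path (also the fallback here)
#endif
```

With this shape, any 256-bit ISA selects the 8-lane path, not just AVX2, which is why the 256-bit branch would need a universal intrinsics implementation.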
I see your point now. Currently we have an AVX2 branch, a NEON branch, and a universal intrinsics (SIMD 128) branch.
If we simply set FAST_VEC_NLANES to 8 when SIMD256 is available, we also need to add a universal intrinsics (SIMD 256) implementation to prevent code errors.
I will reconsider how to better support more platforms.
Hi @alalek, the current AVX implementation is compatible with AVX and AVX2. How can the implementation of a function (like convBlock_AVX2) exist in two namespaces (opt_AVX and opt_AVX2) at the same time?
Update: The current workaround is that AVX2 is computed in the AVX2 branch, and AVX in the SIMD256 branch.
Thank you for the update!
I would propose keeping the original one-line "quick fix" to unblock win32/linux32 builds and then moving the refactoring into a separate PR (as there are still open questions).
> How can the implementation of a function (like convBlock_AVX2) exist in two namespaces (opt_AVX and opt_AVX2) at the same time?
This should be implemented with "runtime dispatching" through .simd.hpp files (eventually we should not have SIMD code in .cpp files). See also this wiki page: https://github.com/opencv/opencv/wiki/CPU-optimizations-build-options
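To illustrate the general idea behind the answer above, here is a simplified, self-contained sketch of one kernel body existing in several namespaces with a runtime dispatcher choosing between them. It is not OpenCV's actual dispatch machinery: `DEFINE_SUM_KERNEL`, `sumKernel`, `sumDispatch`, `CpuFeatures`, and the `baseline` namespace are hypothetical names, and only `opt_AVX` and `opt_AVX2` are taken from the discussion. In a real build each variant would be compiled with different ISA flags, which this sketch does not do.

```cpp
#include <cstddef>

// One shared kernel body, stamped into a namespace per target. This macro is
// a hypothetical stand-in for compiling the same source once per ISA.
#define DEFINE_SUM_KERNEL(NS, TAG)                                   \
    namespace NS {                                                   \
    inline int sumKernel(const int* data, std::size_t n, int* tag) { \
        int s = 0;                                                   \
        for (std::size_t i = 0; i < n; ++i) s += data[i];            \
        *tag = TAG; /* records which variant ran */                  \
        return s;                                                    \
    }                                                                \
    }

DEFINE_SUM_KERNEL(baseline, 0)
DEFINE_SUM_KERNEL(opt_AVX, 1)
DEFINE_SUM_KERNEL(opt_AVX2, 2)

// Hypothetical runtime feature flags; a real build would query the CPU.
struct CpuFeatures {
    bool avx;
    bool avx2;
};

// Picks the most capable variant that the CPU supports at runtime.
inline int sumDispatch(const int* data, std::size_t n,
                       const CpuFeatures& cpu, int* tag) {
    if (cpu.avx2) return opt_AVX2::sumKernel(data, n, tag);
    if (cpu.avx)  return opt_AVX::sumKernel(data, n, tag);
    return baseline::sumKernel(data, n, tag);
}
```

Calling `sumDispatch` with `CpuFeatures{true, false}` would route to `opt_AVX::sumKernel`, while `CpuFeatures{true, true}` would route to `opt_AVX2::sumKernel`, even though both come from the same source text.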
Fixes #22450
Relates #22401
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.