Conversation
opencv-alalek
left a comment
There was a problem hiding this comment.
- Crashes of DNN test binary are resolved
| char *c_, int ldc, float alpha) { | ||
| // NEON (AARCH64: 32 x 128-bit registers, armv7: 16 x 128-bit registers) | ||
| #if CV_NEON && CV_NEON_AARCH64 | ||
| static inline void fast_gemm8x12_f32(int k, const char *a_, const char *b_, |
There was a problem hiding this comment.
It is better to keep this code in .simd.hpp (that file is processed anyway including "baseline" pass).
Example: fastAtan64f
There was a problem hiding this comment.
I dont quite understand. If these default kernels are put in .simd.hpp, how to make them define only when other sets are not availble (AVX, AVX2, LASX)? For example, we can control this using #if !defined(CV_CPU_OPTIMIZATION_DECLARATIONS_ONLY) && CV_AVX for AVX / AVX2; what about defaults?
There was a problem hiding this comment.
"Defaults" are CPU_BASELINE.
other sets are not availble (AVX, AVX2, LASX)
CPU_BASELINE could be AVX2 or even AVX512_SKX.
All optimization code paths should be properly handled through the single file (.simd.hpp). Baseline code path is generated from this file too.
Follow the provided example (it is simple enough).
Use the same function name and same parameters for the entry point.
There was a problem hiding this comment.
Do you mean something like this in .simd.hpp:
// ...
CV_CPU_OPTIMIZATION_NAMESPACE_BEGIN
/*
fastGemm signatures
*/
#if !defined(CV_CPU_OPTIMIZATION_DECLARATIONS_ONLY) && CV_AVX
// ...
#endif
#if !defined(CV_CPU_OPTIMIZATION_DECLARATIONS_ONLY) && CV_LASX
// ...
#endif
#if !defined(CV_CPU_OPTIMIZATION_DECLARATIONS_ONLY)
// content of fast_gemm_kernels.default.hpp
#endif
CV_CPU_OPTIMIZATION_NAMESPACE_END
// ...This can lead to redefinitions.
There was a problem hiding this comment.
This scheme doesn't violate ODR rule:
// .simd.hpp
CV_CPU_OPTIMIZATION_NAMESPACE_BEGIN
// forward declaration
void my_dispatched_function(...)
#ifndef CV_CPU_OPTIMIZATION_DECLARATIONS_ONLY
... optional helper functions (see below)...
void my_dispatched_function(...)
{
CV_TRACE_FUNCTION();
#if CV_AVX
// CV_TRACE_REGION("AVX");
... AVX instructions (rewrites successors like AVX2, AVX512 too) ...
// CV_TRACE_REGION_NEXT("TAIL");
#elif CV_NEON
// CV_TRACE_REGION("NEON");
... NEON instructions ...
// CV_TRACE_REGION_NEXT("TAIL");
#elif CV_LASX
// CV_TRACE_REGION("LASX");
... LASX instructions ...
// CV_TRACE_REGION_NEXT("TAIL");
#elif CV_SIMD
// CV_TRACE_REGION("SIMD");
... other universal SIMD ...
// CV_TRACE_REGION_NEXT("TAIL");
#endif
... generic C++ (scalar) tail processing ...
}
#endif // CV_CPU_OPTIMIZATION_DECLARATIONS_ONLY
CV_CPU_OPTIMIZATION_NAMESPACE_END
If we need helper functions, then make them static inline right after #ifndef CV_CPU_OPTIMIZATION_DECLARATIONS_ONLY:
#if CV_AVX
static inline my_avx_helper(...) {}
#elif CV_NEON
static inline my_neon_helper(...) {}
#elif CV_LASX
static inline my_lasx_helper(...) {}
#elif CV_SIMD
static inline my_simd_helper(...) {}
#endif
static inline my_generic_helper(...) {}
(structures should go in anonynous namespace to avoid ODR violation)
|
@fengyuentau, I discussed this with @opencv-alalek for another time. If I understand correctly, the main complain is that inside fast_gemm_kernels.default.hpp there is NEON-specific code, which should be moved to fast_gemm_kernels.simd.hpp. In fast_gemm_kernels.default.hpp there should be only pure C++ code and universal intrinsics. I agree with that. As soon as you make this change, the PR could be merged. |
|
@vpisarev Thank you for the updates. If we put the NEON-specific code back to fast_gemm_kernels.simd.hpp, we have to restore the original implementation, which uses dispatcher for NEON. This goes against #23897 (comment). |
|
|
19c319c to
e042c16
Compare
* remove Conformance from test names * integrate neon optimization into default * quick fix: define CV_NEON_AARCH64 0 for non NEON platforms * remove var batch that leads to memory leak * put neon code back to fast_gemm_kernels.simd * reorganize code to reduce duplicate code
* remove Conformance from test names * integrate neon optimization into default * quick fix: define CV_NEON_AARCH64 0 for non NEON platforms * remove var batch that leads to memory leak * put neon code back to fast_gemm_kernels.simd * reorganize code to reduce duplicate code
Resolves #24312
Resolves #23897 (review)
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.