Skip to content

dnn: hotfixes for fast gemm#24315

Merged
vpisarev merged 6 commits intoopencv:4.xfrom
fengyuentau:gemm_fixes
Oct 7, 2023
Merged

dnn: hotfixes for fast gemm#24315
vpisarev merged 6 commits intoopencv:4.xfrom
fengyuentau:gemm_fixes

Conversation

@fengyuentau
Copy link
Copy Markdown
Member

@fengyuentau fengyuentau commented Sep 25, 2023

Resolves #24312
Resolves #23897 (review)

  • Rename tests
  • Remove neon from dispatcher
  • Fix opencl build

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake
force_builders=Linux OpenCL,Win64 OpenCL,Custom
buildworker:Custom=linux-4
build_image:Custom=ubuntu:18.04
CPU_BASELINE:Custom=AVX512_SKX
modules_filter:Custom=none
disable_ipp:Custom=ON

Copy link
Copy Markdown
Contributor

@opencv-alalek opencv-alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Crashes of DNN test binary are resolved

char *c_, int ldc, float alpha) {
// NEON (AARCH64: 32 x 128-bit registers, armv7: 16 x 128-bit registers)
#if CV_NEON && CV_NEON_AARCH64
static inline void fast_gemm8x12_f32(int k, const char *a_, const char *b_,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to keep this code in .simd.hpp (that file is processed anyway including "baseline" pass).

Example: fastAtan64f

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont quite understand. If these default kernels are put in .simd.hpp, how to make them define only when other sets are not availble (AVX, AVX2, LASX)? For example, we can control this using #if !defined(CV_CPU_OPTIMIZATION_DECLARATIONS_ONLY) && CV_AVX for AVX / AVX2; what about defaults?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Defaults" are CPU_BASELINE.


other sets are not availble (AVX, AVX2, LASX)

CPU_BASELINE could be AVX2 or even AVX512_SKX.

All optimization code paths should be properly handled through the single file (.simd.hpp). Baseline code path is generated from this file too.
Follow the provided example (it is simple enough).
Use the same function name and same parameters for the entry point.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean something like this in .simd.hpp:

// ...

CV_CPU_OPTIMIZATION_NAMESPACE_BEGIN

/*
fastGemm signatures
*/

#if !defined(CV_CPU_OPTIMIZATION_DECLARATIONS_ONLY) && CV_AVX
// ...
#endif

#if !defined(CV_CPU_OPTIMIZATION_DECLARATIONS_ONLY) && CV_LASX
// ...
#endif

#if !defined(CV_CPU_OPTIMIZATION_DECLARATIONS_ONLY)
// content of fast_gemm_kernels.default.hpp
#endif

CV_CPU_OPTIMIZATION_NAMESPACE_END

// ...

This can lead to redefinitions.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This scheme doesn't violate ODR rule:

// .simd.hpp

CV_CPU_OPTIMIZATION_NAMESPACE_BEGIN

// forward declaration
void my_dispatched_function(...)

#ifndef CV_CPU_OPTIMIZATION_DECLARATIONS_ONLY

... optional helper functions (see below)...

void my_dispatched_function(...)
{
    CV_TRACE_FUNCTION();

#if CV_AVX
    // CV_TRACE_REGION("AVX");
    ... AVX instructions (rewrites successors like AVX2, AVX512 too) ...
    // CV_TRACE_REGION_NEXT("TAIL");
#elif CV_NEON
    // CV_TRACE_REGION("NEON");
    ... NEON instructions ...
    // CV_TRACE_REGION_NEXT("TAIL");
#elif CV_LASX
    // CV_TRACE_REGION("LASX");
    ... LASX instructions ...
    // CV_TRACE_REGION_NEXT("TAIL");
#elif CV_SIMD
    // CV_TRACE_REGION("SIMD");
    ... other universal SIMD ...
    // CV_TRACE_REGION_NEXT("TAIL");
#endif

    ... generic C++ (scalar) tail processing ...
}

#endif // CV_CPU_OPTIMIZATION_DECLARATIONS_ONLY

CV_CPU_OPTIMIZATION_NAMESPACE_END

If we need helper functions, then make them static inline right after #ifndef CV_CPU_OPTIMIZATION_DECLARATIONS_ONLY:

#if CV_AVX
static inline my_avx_helper(...) {}
#elif CV_NEON
static inline my_neon_helper(...) {}
#elif CV_LASX
static inline my_lasx_helper(...) {}
#elif CV_SIMD
static inline my_simd_helper(...) {}
#endif
static inline my_generic_helper(...) {}

(structures should go in anonynous namespace to avoid ODR violation)

@vpisarev
Copy link
Copy Markdown
Contributor

@fengyuentau, I discussed this with @opencv-alalek for another time. If I understand correctly, the main complain is that inside fast_gemm_kernels.default.hpp there is NEON-specific code, which should be moved to fast_gemm_kernels.simd.hpp. In fast_gemm_kernels.default.hpp there should be only pure C++ code and universal intrinsics. I agree with that. As soon as you make this change, the PR could be merged.

@fengyuentau
Copy link
Copy Markdown
Member Author

@vpisarev Thank you for the updates. If we put the NEON-specific code back to fast_gemm_kernels.simd.hpp, we have to restore the original implementation, which uses dispatcher for NEON. This goes against #23897 (comment).

@opencv-alalek
Copy link
Copy Markdown
Contributor

.simp.hpp should handle all optimization variants including BASELINE (which could be any).
Just use the same entry point (name and parameters).

Copy link
Copy Markdown
Contributor

@opencv-alalek opencv-alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done 👍

@vpisarev vpisarev merged commit 590f150 into opencv:4.x Oct 7, 2023
@fengyuentau fengyuentau deleted the gemm_fixes branch October 8, 2023 01:19
@asmorkalov asmorkalov mentioned this pull request Oct 17, 2023
thewoz pushed a commit to thewoz/opencv that referenced this pull request Jan 4, 2024
* remove Conformance from test names

* integrate neon optimization into default

* quick fix: define CV_NEON_AARCH64 0 for non NEON platforms

* remove var batch that leads to memory leak

* put neon code back to fast_gemm_kernels.simd

* reorganize code to reduce duplicate code
thewoz pushed a commit to thewoz/opencv that referenced this pull request May 29, 2024
* remove Conformance from test names

* integrate neon optimization into default

* quick fix: define CV_NEON_AARCH64 0 for non NEON platforms

* remove var batch that leads to memory leak

* put neon code back to fast_gemm_kernels.simd

* reorganize code to reduce duplicate code
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DNN: test binary crashed after gemm refactoring in build with OpenCL (2023-09-20)

3 participants