dnn: hotfixes for fast gemm by fengyuentau · Pull Request #24315 · opencv/opencv

fengyuentau · 2023-09-25T03:18:12Z

Resolves #24312
Resolves #23897 (review)

Rename tests
Remove neon from dispatcher
Fix opencl build

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

force_builders=Linux OpenCL,Win64 OpenCL,Custom
buildworker:Custom=linux-4
build_image:Custom=ubuntu:18.04
CPU_BASELINE:Custom=AVX512_SKX
modules_filter:Custom=none
disable_ipp:Custom=ON

opencv-alalek

Crashes of the DNN test binary are resolved.

opencv-alalek · 2023-09-25T13:42:15Z

modules/dnn/src/layers/cpu_kernels/fast_gemm_kernels.default.hpp

-                       char *c_, int ldc, float alpha) {
+// NEON (AARCH64: 32 x 128-bit registers, armv7: 16 x 128-bit registers)
+#if CV_NEON && CV_NEON_AARCH64
+static inline void fast_gemm8x12_f32(int k, const char *a_, const char *b_,


It is better to keep this code in .simd.hpp (that file is processed anyway, including in the "baseline" pass).

Example: fastAtan64f

I don't quite understand. If these default kernels are put in .simd.hpp, how do we make them defined only when the other instruction sets (AVX, AVX2, LASX) are not available? For example, for AVX / AVX2 we can control this with #if !defined(CV_CPU_OPTIMIZATION_DECLARATIONS_ONLY) && CV_AVX; what about the defaults?

"Defaults" are CPU_BASELINE.

other sets are not available (AVX, AVX2, LASX)

CPU_BASELINE could be AVX2 or even AVX512_SKX.

All optimization code paths should be properly handled through the single file (.simd.hpp). The baseline code path is generated from this file too.
Follow the provided example (it is simple enough).
Use the same function name and same parameters for the entry point.
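The following is a standalone illustration of the "same entry point" rule, not OpenCV code. In OpenCV the build compiles one `.simd.hpp` several times: once for the baseline (namespace `cpu_baseline`) and once per dispatched instruction set (for example `opt_AVX2`), and the caller selects a variant at run time. Here the two namespaces are written out by hand; the function name `fastGemmKernel`, its parameters and the `cpuHasAVX2` flag are hypothetical, and both bodies are plain scalar C++ so the example builds on any compiler.

```cpp
// Hypothetical entry point, identical signature in every compiled variant:
// C[M x N] = alpha * A[M x K] * B[K x N], all matrices row-major.
namespace cpu_baseline {
// Baseline pass: whatever CPU_BASELINE allows (here: plain scalar C++).
inline void fastGemmKernel(int M, int N, int K, float alpha,
                           const float* A, const float* B, float* C) {
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j) {
            float acc = 0.f;
            for (int k = 0; k < K; ++k)
                acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = alpha * acc;
        }
}
} // namespace cpu_baseline

namespace opt_AVX2 {
// Dispatched pass: in OpenCV this body would come from the same .simd.hpp
// compiled with AVX2 enabled. Here it is still scalar, just with an i-k-j
// loop order so the two variants are visibly separate functions.
inline void fastGemmKernel(int M, int N, int K, float alpha,
                           const float* A, const float* B, float* C) {
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) C[i * N + j] = 0.f;
        for (int k = 0; k < K; ++k) {
            const float a = alpha * A[i * K + k];
            for (int j = 0; j < N; ++j) C[i * N + j] += a * B[k * N + j];
        }
    }
}
} // namespace opt_AVX2

// Runtime dispatcher: callers only ever see this function.
// `cpuHasAVX2` is a hypothetical stand-in for a real CPU feature check.
inline void fastGemm(bool cpuHasAVX2, int M, int N, int K, float alpha,
                     const float* A, const float* B, float* C) {
    if (cpuHasAVX2)
        opt_AVX2::fastGemmKernel(M, N, K, alpha, A, B, C);
    else
        cpu_baseline::fastGemmKernel(M, N, K, alpha, A, B, C);
}
```

Because every variant has the identical signature, dispatching is a single branch; supporting another instruction set means adding one more compiled variant, not changing any caller.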

Do you mean something like this in .simd.hpp:

    // ...
    CV_CPU_OPTIMIZATION_NAMESPACE_BEGIN
    /* fastGemm signatures */
    #if !defined(CV_CPU_OPTIMIZATION_DECLARATIONS_ONLY) && CV_AVX
    // ...
    #endif
    #if !defined(CV_CPU_OPTIMIZATION_DECLARATIONS_ONLY) && CV_LASX
    // ...
    #endif
    #if !defined(CV_CPU_OPTIMIZATION_DECLARATIONS_ONLY)
    // content of fast_gemm_kernels.default.hpp
    #endif
    CV_CPU_OPTIMIZATION_NAMESPACE_END
    // ...

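This can lead to redefinitions: in an AVX or LASX pass, both the ISA-specific block and the default block are compiled, so the same functions would be defined twice.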

This scheme doesn't violate the ODR (One Definition Rule):

    // .simd.hpp
    CV_CPU_OPTIMIZATION_NAMESPACE_BEGIN

    // forward declaration
    void my_dispatched_function(...);

    #ifndef CV_CPU_OPTIMIZATION_DECLARATIONS_ONLY

    ... optional helper functions (see below) ...

    void my_dispatched_function(...)
    {
        CV_TRACE_FUNCTION();
    #if CV_AVX
        // CV_TRACE_REGION("AVX");
        ... AVX instructions (this branch is also taken in AVX2/AVX512 builds) ...
        // CV_TRACE_REGION_NEXT("TAIL");
    #elif CV_NEON
        // CV_TRACE_REGION("NEON");
        ... NEON instructions ...
        // CV_TRACE_REGION_NEXT("TAIL");
    #elif CV_LASX
        // CV_TRACE_REGION("LASX");
        ... LASX instructions ...
        // CV_TRACE_REGION_NEXT("TAIL");
    #elif CV_SIMD
        // CV_TRACE_REGION("SIMD");
        ... other universal SIMD ...
        // CV_TRACE_REGION_NEXT("TAIL");
    #endif
        ... generic C++ (scalar) tail processing ...
    }

    #endif // CV_CPU_OPTIMIZATION_DECLARATIONS_ONLY

    CV_CPU_OPTIMIZATION_NAMESPACE_END
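A compilable miniature of this scheme, assuming hypothetical macros `SKETCH_AVX` and `SKETCH_NEON` in place of `CV_AVX` and `CV_NEON` (both default to 0, so a plain build takes the scalar path). The point is structural: the preprocessor keeps at most one SIMD branch per pass and the scalar tail is shared, so the entry point `scaleAdd` (also hypothetical, computing `c[i] += alpha * a[i]`) has exactly one definition whatever the target.

```cpp
// Hypothetical stand-ins for CV_AVX / CV_NEON; default to 0 so this builds anywhere.
#ifndef SKETCH_AVX
#define SKETCH_AVX 0
#endif
#ifndef SKETCH_NEON
#define SKETCH_NEON 0
#endif

#if SKETCH_AVX
#include <immintrin.h>
#elif SKETCH_NEON
#include <arm_neon.h>
#endif

// Single entry point: c[i] += alpha * a[i] for i in [0, n).
void scaleAdd(const float* a, float alpha, float* c, int n) {
    int i = 0;
#if SKETCH_AVX
    // AVX branch: 8 floats per iteration.
    const __m256 valpha = _mm256_set1_ps(alpha);
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vc = _mm256_loadu_ps(c + i);
        vc = _mm256_add_ps(vc, _mm256_mul_ps(valpha, va));
        _mm256_storeu_ps(c + i, vc);
    }
#elif SKETCH_NEON
    // NEON branch: 4 floats per iteration.
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vc = vld1q_f32(c + i);
        vc = vaddq_f32(vc, vmulq_n_f32(va, alpha));
        vst1q_f32(c + i, vc);
    }
#endif
    // Generic C++ tail: leftovers after the vector loop, or everything
    // when no SIMD branch was compiled in.
    for (; i < n; ++i)
        c[i] += alpha * a[i];
}
```

Building with `-DSKETCH_AVX=1 -mavx` on x86 exercises the vector loop; the scalar tail then only handles the last `n % 8` elements.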

If we need helper functions, then make them static inline right after #ifndef CV_CPU_OPTIMIZATION_DECLARATIONS_ONLY:

    #if CV_AVX
    static inline void my_avx_helper(...) {}
    #elif CV_NEON
    static inline void my_neon_helper(...) {}
    #elif CV_LASX
    static inline void my_lasx_helper(...) {}
    #elif CV_SIMD
    static inline void my_simd_helper(...) {}
    #endif
    static inline void my_generic_helper(...) {}

(structures should go in an anonymous namespace to avoid ODR violations)
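One way to read this linkage advice, sketched in standalone C++ (our interpretation, not spelled out in the thread): each pass compiles the `.simd.hpp` body in its own translation unit with different compiler flags, so a helper with external linkage whose mangled name is the same in every pass (for instance, one declared outside the optimization namespace) could exist as several differently compiled copies that the linker merges under one symbol. `static inline` functions and types in an anonymous namespace have internal linkage, so each translation unit keeps its own copy. `GemmBlocking` and `clampBlock` are hypothetical names.

```cpp
#include <algorithm>

namespace {
// Hypothetical blocking parameters. The anonymous namespace gives the type
// (and its implicitly inline member functions) internal linkage, so every
// translation unit that compiles this file gets its own private copy.
struct GemmBlocking {
    int mc, nc, kc;
    // Number of mc-row blocks needed to cover m rows.
    int numBlocks(int m) const { return (m + mc - 1) / mc; }
};
} // namespace

// Hypothetical helper: `static inline` likewise keeps it local to the
// translation unit instead of sharing one external symbol across passes.
static inline int clampBlock(int size, int block) {
    return std::min(size, block);
}
```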

vpisarev · 2023-09-29T07:21:50Z

@fengyuentau, I discussed this with @opencv-alalek once more. If I understand correctly, the main complaint is that fast_gemm_kernels.default.hpp contains NEON-specific code, which should be moved to fast_gemm_kernels.simd.hpp. fast_gemm_kernels.default.hpp should contain only pure C++ code and universal intrinsics. I agree with that. As soon as you make this change, the PR can be merged.

fengyuentau · 2023-10-05T14:27:13Z

@vpisarev Thank you for the update. If we put the NEON-specific code back into fast_gemm_kernels.simd.hpp, we have to restore the original implementation, which uses the dispatcher for NEON. This goes against #23897 (comment).

opencv-alalek · 2023-10-06T01:15:16Z

The .simd.hpp file should handle all optimization variants, including BASELINE (which could be any instruction set).
Just use the same entry point (name and parameters).

opencv-alalek

Well done 👍

* remove Conformance from test names
* integrate neon optimization into default
* quick fix: define CV_NEON_AARCH64 0 for non NEON platforms
* remove var batch that leads to memory leak
* put neon code back to fast_gemm_kernels.simd
* reorganize code to reduce duplicate code

fengyuentau mentioned this pull request Sep 25, 2023

dnn: add gemm_layer in place of fully_connected_layer for onnx models #23897

Merged


fengyuentau requested a review from opencv-alalek September 25, 2023 09:02

fengyuentau assigned opencv-alalek Sep 25, 2023

fengyuentau added bug category: dnn labels Sep 25, 2023

fengyuentau added this to the 4.9.0 milestone Sep 25, 2023

fengyuentau marked this pull request as ready for review September 25, 2023 09:02

opencv-alalek reviewed Sep 25, 2023


fengyuentau mentioned this pull request Sep 27, 2023

Update OpenVINO init of new GEMM layer #24309

Merged


opencv-alalek mentioned this pull request Sep 28, 2023

(5.x) Merge 4.x #24338

Merged

fengyuentau and others added 5 commits October 6, 2023 17:22

remove Conformance from test names

2521490

integrate neon optimization into default

6f196fc

quick fix: define CV_NEON_AARCH64 0 for non NEON platforms

59a0003

remove var batch that leads to memory leak

2383f92

put neon code back to fast_gemm_kernels.simd

e042c16

fengyuentau force-pushed the gemm_fixes branch from 19c319c to e042c16 October 6, 2023 09:24

reorganize code to reduce duplicate code

e1ab141

opencv-alalek approved these changes Oct 7, 2023


vpisarev merged commit 590f150 into opencv:4.x Oct 7, 2023

fengyuentau deleted the gemm_fixes branch October 8, 2023 01:19

asmorkalov mentioned this pull request Oct 17, 2023

(5.x) Merge 4.x #24416

Merged

vpisarev merged 6 commits into opencv:4.x from fengyuentau:gemm_fixes

fengyuentau commented Sep 25, 2023 (edited by opencv-alalek)

opencv-alalek left a comment

opencv-alalek Sep 25, 2023

fengyuentau Sep 26, 2023

opencv-alalek Sep 26, 2023

fengyuentau Sep 26, 2023

opencv-alalek Sep 26, 2023

vpisarev commented Sep 29, 2023

fengyuentau commented Oct 5, 2023

opencv-alalek commented Oct 6, 2023

opencv-alalek left a comment


3 participants
