Skip to content

neon: add dotprod dispatch implementation#22271

Merged
opencv-pushbot merged 1 commit intoopencv:3.4from
tomoaki0705:dotprod_neon
Jul 25, 2022
Merged

neon: add dotprod dispatch implementation#22271
opencv-pushbot merged 1 commit intoopencv:3.4from
tomoaki0705:dotprod_neon

Conversation

@tomoaki0705
Copy link
Copy Markdown
Contributor

  • read vector at runtime
  • add enum

dot product instruction is an extended instruction of NEON on Aarch64.
I've used dispatch framework so it won't harm the other platforms.
Performance are reasonable

Topics I'd like to ask specifically for review

  • The way to extract dot product instruction
    • i.e.: it could be possible that other architecture hires dot product.
    • Is it OK to define DOTPROD for Armv8?
  • The name, especially the enum, and the DOTPROD which gets shown for the config
    • I reused the keyword CV_NEON_DOT but will be there a better way?
  • Support in Clang and MSVC (i.e.: I only checked on GCC)

Here's the performance summary
Jetson Orin and RK3588 support this dot product instructions.
Other platforms don't, but it's not making a harm.

Board Name of Test before after after vs before (x-factor)
Jetson Orin dot::MatType_Length::(8UC1, 1024) 0.235 ms 0.073 ms 3.20
Jetson Orin dot::MatType_Length::(8SC1, 1024) 0.170 ms 0.073 ms 2.33
RK3588 dot::MatType_Length::(8UC1, 1024) 0.269 ms 0.084 ms 3.22
RK3588 dot::MatType_Length::(8SC1, 1024) 0.196 ms 0.069 ms 2.85
Raspberry Pi 4 dot::MatType_Length::(8UC1, 1024) 0.581 ms 0.583 ms 1.00
Raspberry Pi 4 dot::MatType_Length::(8SC1, 1024) 0.571 ms 0.569 ms 1.00
Jetson Nano dot::MatType_Length::(8UC1, 1024) 0.743 ms 0.738 ms 1.01
Jetson Nano dot::MatType_Length::(8SC1, 1024) 0.675 ms 0.666 ms 1.01
Jetson Xavier dot::MatType_Length::(8UC1, 1024) 0.492 ms 0.528 ms 0.93
Jetson Xavier dot::MatType_Length::(8SC1, 1024) 0.389 ms 0.423 ms 0.92

I also confirmed that this PR doesn't harm the Arm 32bit build.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

Copy link
Copy Markdown
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for contribution!

Please take a look on PRs which add .simd.hpp files for dispatching.
Please read this Wiki page: https://github.com/opencv/opencv/wiki/CPU-optimizations-build-options

set(CPU_ALL_OPTIMIZATIONS "SSE;SSE2;SSE3;SSSE3;SSE4_1;SSE4_2;POPCNT;AVX;FP16;AVX2;FMA3;AVX_512F")
list(APPEND CPU_ALL_OPTIMIZATIONS "AVX512_COMMON;AVX512_KNL;AVX512_KNM;AVX512_SKX;AVX512_CNL;AVX512_CLX;AVX512_ICL")
list(APPEND CPU_ALL_OPTIMIZATIONS NEON VFPV3 FP16)
list(APPEND CPU_ALL_OPTIMIZATIONS NEON VFPV3 FP16 DOTPROD)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DOTPROD

such common name may confuse.

NEON_DOTPROD or ARM_DOTPROD or AARCH64_DOT should be more clear.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you.
I used NEON_DOTPROD as a keyword.

@@ -0,0 +1,168 @@
/*M///////////////////////////////////////////////////////////////////////////////////////
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use short license header: https://github.com/opencv/opencv/wiki/Coding_Style_Guide#file-structure

// This file is part of OpenCV project.
// It is subject to the license terms in the LICENSE file found in the top-level directory
// of this distribution and at http://opencv.org/license.html.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally misunderstood the dispatch framework.
This file is not used so I removed from the PR.




#ifndef CV_CPU_OPTIMIZATION_DECLARATIONS_ONLY
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be used with .simd.hpp files only.

modules/core/src/matmul.dotprod.hpp

     * read vector at runtime
     * add enum
Copy link
Copy Markdown
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@opencv-pushbot opencv-pushbot merged commit 0862d69 into opencv:3.4 Jul 25, 2022
@tomoaki0705 tomoaki0705 deleted the dotprod_neon branch July 25, 2022 20:42
@alalek alalek mentioned this pull request Jul 26, 2022
This was referenced Aug 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

category: core optimization platform: arm ARM boards related issues: RPi, NVIDIA TK/TX, etc

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants