Optimize int8 layers in DNN modules by using RISC-V Vector intrinsic.#25230
asmorkalov merged 5 commits into opencv:4.x from
Which compiler did you use? Currently OpenCV uses v0.10 of the RVV intrinsics plus compatibility layers for v0.11 and v0.12. It seems you've used the latest intrinsics version. Perhaps these code blocks should be guarded:
|
I'm using clang 16.0.6. This code works when the RVV intrinsics version is 0.11 or 0.12, so clang 16.x, 17.x, and trunk should all work. Clang 15.x supports only version 0.10, so this code will not compile with it. Since v1.0-rc1 of the RVV intrinsics has been released and looks close to stable, I think it is reasonable to upgrade all our intrinsics to v1.0 once it is officially released. And for now, we may use
|
Compiler compatibility: https://godbolt.org/z/ns9afhTae |
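A minimal sketch of the kind of version guard discussed above, not the guard actually used in the patch. It relies on the compiler-provided `__riscv_v_intrinsic` macro, whose value encodes the intrinsics version as major * 1,000,000 + minor * 1,000 + patch (so v0.11 is 11000 and v1.0 is 1000000). The `SKETCH_*` macros, `RvvVersion`, and `decodeRvvVersion` are hypothetical names introduced only for illustration.

```cpp
// Hypothetical helper names; only __riscv_v_intrinsic is a real compiler macro.
// Encoding: major * 1000000 + minor * 1000 + patch.
#if defined(__riscv_v_intrinsic)
#  define SKETCH_RVV_INTRINSIC_VERSION __riscv_v_intrinsic
#else
#  define SKETCH_RVV_INTRINSIC_VERSION 0   // macro not defined by this toolchain
#endif

// True when the __riscv_-prefixed API (v0.11 and later) is available.
#define SKETCH_HAS_RVV_011 (SKETCH_RVV_INTRINSIC_VERSION >= 11000)

struct RvvVersion {
    int majorVer;
    int minorVer;
    int patchVer;
};

// Split an encoded version value into its three components.
constexpr RvvVersion decodeRvvVersion(long v) {
    return { static_cast<int>(v / 1000000),
             static_cast<int>((v / 1000) % 1000),
             static_cast<int>(v % 1000) };
}

#if SKETCH_HAS_RVV_011
// Code using __riscv_* intrinsics would go here.
#else
// Scalar or v0.10 fallback would go here.
#endif
```

Code written against v0.11+ would sit inside the `#if SKETCH_HAS_RVV_011` branch, so toolchains that only provide v0.10 take the fallback instead of failing to compile.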
|
@mshabunin Could you take a look? Let's merge it. |
I'll take a closer look today. |
    vint32m1_t zero = __riscv_vmv_v_x_i32m1(0, e8m1);
    int sum0[FASCONV_BASE_VECSZ], sum1[FASCONV_BASE_VECSZ], sum2[FASCONV_BASE_VECSZ];
    int vs[16] = {0};
    __riscv_vse32(vs, vs00, e8m1);
I have a problem here when compiling with GCC 13.2+ (13.x release branch somewhere after 13.2.0):
/work/opencv/modules/dnn/src/int8layers/layers_common.simd.hpp:1370:26: error: no matching function for call to '__riscv_vse32(int [16], vint32m2_t&, const size_t&)'
1370 | __riscv_vse32(vs, vs00, e8m1);
| ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~
GCC version:
./riscv64-unknown-linux-gnu-g++ --version
riscv64-unknown-linux-gnu-g++ (g128d9cc0599) 13.2.1 20240220
I'm using this GCC because 13.2.0 cannot build OpenCV due to an error that was fixed after that release, both on the releases/gcc-13 branch and on trunk.
This overloaded intrinsic can be replaced with __riscv_vse32_v_i32m2.
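To illustrate the suggested replacement, here is a small sketch that uses the explicit (non-overloaded) store `__riscv_vse32_v_i32m2`, where the element type and LMUL are part of the intrinsic name. The function `storeSums` is a hypothetical helper, not code from the patch, and the scalar branch exists only so the sketch also builds on non-RVV hosts.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__riscv_v_intrinsic) && __riscv_v_intrinsic >= 11000
#include <riscv_vector.h>
#endif

// Hypothetical helper: copy n 32-bit sums from src to dst.
inline void storeSums(std::int32_t* dst, const std::int32_t* src, std::size_t n) {
#if defined(__riscv_v_intrinsic) && __riscv_v_intrinsic >= 11000
    for (std::size_t i = 0; i < n; ) {
        // Ask the hardware how many 32-bit elements fit in this pass (LMUL=2).
        std::size_t vl = __riscv_vsetvl_e32m2(n - i);
        vint32m2_t v = __riscv_vle32_v_i32m2(src + i, vl);
        // Explicit store: no overload resolution on the vector type is needed.
        __riscv_vse32_v_i32m2(dst + i, v, vl);
        i += vl;
    }
#else
    // Scalar fallback for toolchains without RVV intrinsics v0.11+.
    for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
#endif
}
```

The explicit name costs some verbosity, but it does not depend on how a particular compiler resolves the overloaded `__riscv_vse32`.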
BTW, is the `int vs[16]` array actually used anywhere?
Sorry, it looks like the code I used for debugging was not removed. I have deleted it now.
|
For some reason some tests fail when built with GCC 13.2+ (qemu 8.2.1): Examples of failures: With clang 17, these tests pass. I'm not sure whether the problem is with GCC or OpenCV. Perhaps we can merge it as-is and try to find the problem later. |
|
I can also reproduce those test failures with GCC 14.0.1 (trunk). Working on it.
|
Fixed. The tests now pass with both clang and GCC. Thanks for testing on GCC, @mshabunin! GCC is right; the tests passed on clang only by luck. When accumulating vectors, we should use "tail undisturbed" (tu) to make sure that the elements after
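As a rough illustration of the tail-undisturbed point (a sketch, not the code from this patch): in a strip-mined accumulation loop the final pass usually runs with a `vl` smaller than the maximum. With the default tail-agnostic policy, accumulator lanes at index `vl` and above may be overwritten, which discards partial sums from earlier passes. The `_tu` variant keeps those lanes unchanged. `sumInt32` is a hypothetical helper, and the scalar branch exists only so the sketch builds on non-RVV hosts.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__riscv_v_intrinsic) && __riscv_v_intrinsic >= 11000
#include <riscv_vector.h>
#endif

// Hypothetical helper: sum n int32 values with a vector accumulator.
inline std::int32_t sumInt32(const std::int32_t* data, std::size_t n) {
#if defined(__riscv_v_intrinsic) && __riscv_v_intrinsic >= 11000
    const std::size_t vlmax = __riscv_vsetvlmax_e32m1();
    vint32m1_t acc = __riscv_vmv_v_x_i32m1(0, vlmax);
    for (std::size_t i = 0; i < n; ) {
        std::size_t vl = __riscv_vsetvl_e32m1(n - i);
        vint32m1_t v = __riscv_vle32_v_i32m1(data + i, vl);
        // _tu: lanes [vl, vlmax) of acc keep their partial sums when the
        // last pass is shorter than vlmax.
        acc = __riscv_vadd_vv_i32m1_tu(acc, acc, v, vl);
        i += vl;
    }
    // Reduce all vlmax lanes of the accumulator to one scalar.
    vint32m1_t zero = __riscv_vmv_v_x_i32m1(0, vlmax);
    vint32m1_t red = __riscv_vredsum_vs_i32m1_i32m1(acc, zero, vlmax);
    return __riscv_vmv_x_s_i32m1_i32(red);
#else
    // Scalar fallback for toolchains without RVV intrinsics v0.11+.
    std::int32_t s = 0;
    for (std::size_t i = 0; i < n; ++i) s += data[i];
    return s;
#endif
}
```

A tail-agnostic add can still produce the right answer on some compilers or emulators, which would explain a test that passes with one toolchain and fails with another.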
Optimize int8 layers in DNN modules by using RISC-V Vector intrinsic. opencv#25230

This patch optimizes 3 functions in the int8 layers by using RVV native intrinsics.

This patch was tested on QEMU with VLEN=128 and VLEN=256 using `./bin/opencv_test_dnn --gtest_filter="*Int8*"`. On a real device (k230, VLEN=128), `EfficientDet_int8` in `opencv_perf_dnn` showed a performance improvement of 1.46x.

| Name of Test | Original | Optimized | Speed-up |
| ------------------------------------------ | -------- | ---------- | -------- |
| EfficientDet_int8::DNNTestNetwork::OCV/CPU | 2843.467 | 1947.013 | 1.46 |

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake