Optimize int8 layers in DNN modules by using RISC-V Vector intrinsic. #25230

Merged
asmorkalov merged 5 commits into opencv:4.x from hanliutong:rvv-conv
Mar 31, 2024

@hanliutong
Contributor

This patch optimizes three functions in the int8 layers by using native RVV intrinsics.

This patch was tested on QEMU with VLEN=128 and VLEN=256 by running ./bin/opencv_test_dnn --gtest_filter="*Int8*".
On a real device (k230, VLEN=128), EfficientDet_int8 in opencv_perf_dnn showed a performance improvement of 1.46x.

| Name of Test | Original | Optimized | Speed-up |
| --- | --- | --- | --- |
| EfficientDet_int8::DNNTestNetwork::OCV/CPU | 2843.467 | 1947.013 | 1.46 |

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@mshabunin mshabunin self-assigned this Mar 19, 2024
@mshabunin
Contributor

Which compiler did you use? Currently OpenCV uses v0.10 of the RVV intrinsics plus compatibility layers for v0.11 and v0.12. It seems you've used the latest intrinsics version. Perhaps these code blocks should be guarded with #if defined(__riscv_v_intrinsic) && __riscv_v_intrinsic>=12000. Alternatively, we could consider upgrading all our intrinsics to the latest version and dropping support for older compilers (an intrinsics version check would be added to the current compile check).

@hanliutong
Contributor Author

I'm using clang 16.0.6. This code works when the RVV intrinsics version is 0.11 or 0.12, so clang 16.x, 17.x, and trunk should all work. With clang 15.x, which only supports version 0.10, this code will not compile.

Considering that v1.0-rc1 of the RVV intrinsics has been released and looks very close to stable, I think it is reasonable to upgrade all our intrinsics to v1.0 once it is officially released.

For now, we can use #if defined(__riscv_v_intrinsic) && __riscv_v_intrinsic>=11000 to temporarily guard this code.
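
The guard discussed above can be sketched as follows. This is a minimal illustration, not the code from the patch: the macro `DEMO_HAVE_RVV_011` and the helper names are hypothetical, and the decoding assumes the `__riscv_v_intrinsic` value is encoded as major * 1,000,000 + minor * 1,000 + revision, which matches the values 11000 (v0.11) and 12000 (v0.12) mentioned in this thread.

```cpp
#include <cassert>

// Hypothetical illustration names; only __riscv_v_intrinsic is provided by the toolchain.
#if defined(__riscv_v_intrinsic) && __riscv_v_intrinsic >= 11000
#define DEMO_HAVE_RVV_011 1   // intrinsics v0.11 or newer: the RVV code path may be compiled
#else
#define DEMO_HAVE_RVV_011 0   // older compiler or non-RISC-V target: use the generic path
#endif

struct RvvIntrinsicVersion {
    int major_ver;
    int minor_ver;
    int revision;
};

// Decodes a version value, assuming major * 1000000 + minor * 1000 + revision.
inline RvvIntrinsicVersion decode_rvv_intrinsic_version(long value) {
    assert(value >= 0);
    RvvIntrinsicVersion v;
    v.major_ver = static_cast<int>(value / 1000000);
    v.minor_ver = static_cast<int>((value / 1000) % 1000);
    v.revision  = static_cast<int>(value % 1000);
    return v;
}

// Reports which code path this translation unit was built with.
inline const char* demo_code_path() {
    return DEMO_HAVE_RVV_011 ? "rvv (intrinsics >= 0.11)" : "generic fallback";
}
```

On a compiler that only ships v0.10 intrinsics, or on a non-RISC-V host, the guard evaluates to false and only the fallback path is built.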

@hanliutong
Contributor Author

Compiler compatibility: https://godbolt.org/z/ns9afhTae

@asmorkalov asmorkalov requested a review from mshabunin March 23, 2024 11:00
@asmorkalov
Contributor

@mshabunin Could you take a look? Let's merge it.

@mshabunin
Contributor

> @mshabunin Could you take a look? Let's merge it.

I'll take a closer look today.

vint32m1_t zero = __riscv_vmv_v_x_i32m1(0, e8m1);
int sum0[FASCONV_BASE_VECSZ], sum1[FASCONV_BASE_VECSZ], sum2[FASCONV_BASE_VECSZ];
int vs[16] = {0};
__riscv_vse32(vs, vs00, e8m1);
Contributor

I have a problem here when compiling with GCC 13.2+ (13.x release branch somewhere after 13.2.0):

/work/opencv/modules/dnn/src/int8layers/layers_common.simd.hpp:1370:26: error: no matching function for call to '__riscv_vse32(int [16], vint32m2_t&, const size_t&)'
 1370 |             __riscv_vse32(vs, vs00, e8m1);
      |             ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~

GCC version:

./riscv64-unknown-linux-gnu-g++ --version
riscv64-unknown-linux-gnu-g++ (g128d9cc0599) 13.2.1 20240220

I'm using this GCC build because 13.2.0 cannot build OpenCV due to an error that was fixed after that release, both on the releases/gcc-13 branch and on trunk.

This overloaded intrinsic can be replaced with __riscv_vse32_v_i32m2.
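
The suggested replacement can be sketched in isolation like this. It is a minimal example rather than the patch code: the helper `copy_i32` and the macro `DEMO_USE_RVV` are hypothetical, and the RVV branch is only compiled when the intrinsics version is 0.11 or newer; elsewhere a scalar loop is used.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__riscv_v_intrinsic) && __riscv_v_intrinsic >= 11000
#include <riscv_vector.h>
#define DEMO_USE_RVV 1
#else
#define DEMO_USE_RVV 0
#endif

// Hypothetical helper: copies n int32 values from src to dst.
inline void copy_i32(const int32_t* src, int32_t* dst, std::size_t n) {
#if DEMO_USE_RVV
    for (std::size_t i = 0; i < n; ) {
        // Number of 32-bit elements processed in this iteration at LMUL=2.
        std::size_t vl = __riscv_vsetvl_e32m2(n - i);
        vint32m2_t v = __riscv_vle32_v_i32m2(src + i, vl);
        // Explicit (non-overloaded) store: element width and LMUL are part of
        // the name, so no overload resolution on the argument types is needed.
        __riscv_vse32_v_i32m2(dst + i, v, vl);
        i += vl;
    }
#else
    // Scalar fallback for compilers or targets without the RVV intrinsics.
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
#endif
}
```

Spelling out the type suffix trades brevity for portability across compilers whose overloaded intrinsic sets differ.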

Contributor

BTW, is int vs[16] array actually used anywhere?

Contributor Author

Sorry, it looks like the code I used for debugging was not removed. I have deleted it now.

@mshabunin
Contributor

For some reason some tests fail when built with GCC 13.2+ (qemu 8.2.1):

qemu-riscv64 \
	-L ${sysroot} \
	-cpu rv64,v=true,vext_spec=v1.0 \
./bin/opencv_test_$t --gtest_filter="*Int8_layers*"
...
[  FAILED  ] 11 tests, listed below:
[  FAILED  ] Test_Int8_layers.Convolution2D/0, where GetParam() = OCV/CPU
[  FAILED  ] Test_Int8_layers.Padding/0, where GetParam() = OCV/CPU
[  FAILED  ] Test_Int8_layers.AvePooling/0, where GetParam() = OCV/CPU
[  FAILED  ] Test_Int8_layers.MaxPooling/0, where GetParam() = OCV/CPU
[  FAILED  ] Test_Int8_layers.Softmax_slim_TF/0, where GetParam() = OCV/CPU
[  FAILED  ] Test_Int8_layers.Concat/0, where GetParam() = OCV/CPU
[  FAILED  ] Test_Int8_layers.InnerProduct/0, where GetParam() = OCV/CPU
[  FAILED  ] Test_Int8_layers.Reshape/0, where GetParam() = OCV/CPU
[  FAILED  ] Test_Int8_layers.Slice_4d_tf/0, where GetParam() = OCV/CPU
[  FAILED  ] Test_Int8_layers.Slice_strided_tf/0, where GetParam() = OCV/CPU
[  FAILED  ] Test_Int8_layers.Eltwise/0, where GetParam() = OCV/CPU

Examples of failures:

[ RUN      ] Test_Int8_layers.Convolution2D/0, where GetParam() = OCV/CPU
/work/opencv/modules/dnn/test/test_common.impl.hpp:76: Failure
Expected: (normL1) <= (l1), actual: 0.49239 vs 0.00413
single_conv  |ref| = 4.2324190139770508
/work/opencv/modules/dnn/test/test_common.impl.hpp:79: Failure
Expected: (normInf) <= (lInf), actual: 3.21996 vs 0.02201
single_conv  |ref| = 4.2324190139770508
/work/opencv/modules/dnn/test/test_common.impl.hpp:76: Failure
Expected: (normL1) <= (l1), actual: 1.6915 vs 0.0193
atrous_conv2d_valid  |ref| = 9.5424537658691406

With clang 17, these tests pass. I'm not sure whether the problem is with GCC or OpenCV. Perhaps we can merge it as-is and try to find the problem later.
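
For readers unfamiliar with the failure messages above, here is a rough sketch of the kind of check they report. It assumes normL1 is the mean absolute difference and normInf is the maximum absolute difference between the reference and the computed output; that is an assumption about the test helper, not something stated in this thread, and all names below are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type holding both error measures.
struct NormResult {
    double l1;   // mean absolute difference (assumed meaning of normL1)
    double inf;  // maximum absolute difference (assumed meaning of normInf)
};

inline NormResult compare_outputs(const std::vector<float>& ref,
                                  const std::vector<float>& out) {
    assert(ref.size() == out.size() && !ref.empty());
    double sum = 0.0;
    double max_diff = 0.0;
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double d = std::fabs(static_cast<double>(ref[i]) -
                                   static_cast<double>(out[i]));
        sum += d;
        max_diff = std::max(max_diff, d);
    }
    NormResult r;
    r.l1 = sum / static_cast<double>(ref.size());
    r.inf = max_diff;
    return r;
}

// Passes only if both measures are within their thresholds.
inline bool within_tolerance(const NormResult& r, double l1, double l_inf) {
    return r.l1 <= l1 && r.inf <= l_inf;
}
```

Under this reading, the Convolution2D log line reports an average error of 0.49239 against a limit of 0.00413, far outside rounding noise, which points to wrong results rather than a precision difference.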

@hanliutong
Contributor Author

I can also reproduce these test failures with GCC 14.0.1 (trunk). Working on it.

@hanliutong
Contributor Author

Fixed. The tests now pass with both clang and GCC. Thanks for testing with GCC, @mshabunin!

GCC is right; the tests passing with clang was luck. When accumulating into a vector, we should use the "tail undisturbed" (tu) policy so that the elements past vl do not change (keeping the previous accumulation result). I had incorrectly used "tail agnostic" (ta).
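
The effect of the two tail policies on an accumulation loop can be illustrated with a scalar model. This sketch does not use RVV intrinsics and is not the patch code: the lane count `kVlMax` and all function names are hypothetical, and the agnostic branch writes all-ones into the tail, which is one of the outcomes the vector specification permits (the other being that the old value is kept).

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kVlMax = 4;  // hypothetical number of 32-bit lanes
using Vec = std::array<int32_t, kVlMax>;

enum class TailPolicy { Undisturbed, Agnostic };

// Scalar model of acc + x on the first vl lanes; lanes >= vl are the "tail".
inline Vec model_vadd(const Vec& acc, const Vec& x, std::size_t vl, TailPolicy policy) {
    assert(vl <= kVlMax);
    Vec out{};
    for (std::size_t i = 0; i < kVlMax; ++i) {
        if (i < vl) {
            out[i] = acc[i] + x[i];
        } else if (policy == TailPolicy::Undisturbed) {
            out[i] = acc[i];   // tu: the previous accumulator value survives
        } else {
            out[i] = -1;       // ta: modeled as all-ones, a permitted outcome
        }
    }
    return out;
}

// Accumulates n values lane-wise, then reduces the lanes to a single sum.
inline int32_t accumulate_model(const int32_t* data, std::size_t n, TailPolicy policy) {
    Vec acc{};
    for (std::size_t i = 0; i < n; i += kVlMax) {
        const std::size_t vl = std::min(kVlMax, n - i);
        Vec x{};
        for (std::size_t j = 0; j < vl; ++j) x[j] = data[i + j];
        acc = model_vadd(acc, x, vl, policy);
    }
    int32_t sum = 0;
    for (int32_t v : acc) sum += v;
    return sum;
}
```

In this model the difference only shows up when the last iteration is partial (vl smaller than the lane count): the undisturbed accumulator keeps the earlier partial sums in its upper lanes, while the agnostic one may lose them. When an implementation happens to keep the old tail values, the agnostic version still produces the right answer, which is consistent with the tests having passed on one compiler.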

Contributor

@mshabunin mshabunin left a comment

Awesome! 👍

@asmorkalov asmorkalov merged commit eba158f into opencv:4.x Mar 31, 2024
@asmorkalov asmorkalov mentioned this pull request Apr 1, 2024
@hanliutong hanliutong deleted the rvv-conv branch April 7, 2024 03:27
klatism pushed a commit to klatism/opencv that referenced this pull request May 17, 2024