Improve and refactor softmax layer #24466
The performance test results were updated; the speedup is significant. BTW, I am not sure why the Windows CI failed; it does not seem related to this PR.
Please take a look at the failed log from
@asmorkalov This build actually failed, but somehow the workflow did not catch the failure signal and continued: https://github.com/opencv/opencv/actions/runs/6682987045/job/18158738007?pr=24466. It seems
Windows:
Thanks @asmorkalov. I found the code will throw
I just tried the ARMv7 configuration locally. It produces the following warning (Ubuntu 16.04):
@asmorkalov That's because the operators
ARMv7 (Jetson TK1) perf results with and without NEON:
Jetson TK1 with 2 GB of RAM:
The performance test has a large input with |
The error on Windows occurs because a macro was defined as
@WanliZhong, excellent job, great acceleration numbers! As we discussed, please refactor the code to reduce duplication. Then we will gladly merge it.
Branch force-pushed from cbf0474 to 790da1b.
Update: As discussed with Vadim, I now use only universal intrinsics to accelerate the softmax layer. The results show it is even faster than implementing it separately for each platform. Note: I added performance tests for different axes. The results show that some cases are slower than before, especially small softmax inputs with axis 0 or 1.
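To make the axis dependence concrete, here is a plain scalar C++ sketch of softmax along an arbitrary axis. It is not the PR's kernel (which uses OpenCV universal intrinsics); it assumes a dense, contiguous float tensor viewed as `[outer, axisSize, inner]`, and the function name `softmaxAlongAxis` is hypothetical. The innermost loops run over the `inner` contiguous elements, which is the dimension a SIMD loop would typically consume. For the last axis `inner` is 1, so this particular loop order degenerates to scalar work and an implementation would need a different strategy for that case.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: numerically stable softmax of a dense float tensor
// along `axis`. The tensor is viewed as [outer, axisSize, inner]:
//   outer = product of dims before `axis`,
//   inner = product of dims after `axis` (stride between axis elements).
std::vector<float> softmaxAlongAxis(const std::vector<float>& src,
                                    const std::vector<int>& shape, int axis) {
    assert(axis >= 0 && axis < static_cast<int>(shape.size()));
    std::size_t outer = 1, inner = 1;
    for (int d = 0; d < axis; ++d) outer *= shape[d];
    for (std::size_t d = axis + 1; d < shape.size(); ++d) inner *= shape[d];
    const std::size_t axisSize = shape[axis];
    std::vector<float> dst(src.size());
    std::vector<float> maxv(inner), sum(inner);

    for (std::size_t o = 0; o < outer; ++o) {
        const float* in = src.data() + o * axisSize * inner;
        float* out = dst.data() + o * axisSize * inner;

        // Pass 1: per-column max over the axis (guards exp() against overflow).
        std::copy(in, in + inner, maxv.begin());
        for (std::size_t k = 1; k < axisSize; ++k)
            for (std::size_t i = 0; i < inner; ++i)  // contiguous, SIMD-friendly
                maxv[i] = std::max(maxv[i], in[k * inner + i]);

        // Pass 2: exponentiate the shifted values and accumulate denominators.
        std::fill(sum.begin(), sum.end(), 0.f);
        for (std::size_t k = 0; k < axisSize; ++k)
            for (std::size_t i = 0; i < inner; ++i) {
                const float e = std::exp(in[k * inner + i] - maxv[i]);
                out[k * inner + i] = e;
                sum[i] += e;
            }

        // Pass 3: normalize each column so it sums to 1 along the axis.
        for (std::size_t k = 0; k < axisSize; ++k)
            for (std::size_t i = 0; i < inner; ++i)
                out[k * inner + i] /= sum[i];
    }
    return dst;
}
```

With `shape = {2, 3}`, `axis = 1` normalizes each row of three values, while `axis = 0` normalizes each column of two values, touching memory with a stride of 3.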
I have no idea why this error occurs on some platforms:
/home/ci/opencv/modules/dnn/src/layers/cpu_kernels/softmax.cpp:78:32: error: 'cv::hal_baseline::v_float32x4::<unnamed enum> cv::hal_baseline::v_float32x4::nlanes' is private within this context
78 | size_t nlanes = v_float32::nlanes;
| ^~~~~~
In file included from /home/ci/opencv/modules/core/include/opencv2/core/hal/intrin.hpp:221,
from /home/ci/opencv/modules/dnn/src/layers/cpu_kernels/softmax.hpp:15,
from /home/ci/opencv/modules/dnn/src/layers/cpu_kernels/softmax.cpp:13:
/home/ci/opencv/modules/core/include/opencv2/core/hal/intrin_neon.hpp:301:12: note: declared private here
301 | enum { nlanes = 4 };
| ^~~~~~
OpenCV migrated to a new Universal Intrinsics approach to support scalable vector architectures such as RISC-V RVV. The vector size is not defined at compile time and may differ at runtime. You need to replace:
Enable softmax layer vectorization on RISC-V RVV #24510
Related: #24466
### Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
* improve and refactor softmax layer
* fix building error
* compatible region layer
* fix axisStep when disable SIMD
* fix dynamic array
* try to fix error
* use nlanes from VTraits
* move axisBias to srcOffset
* fix bug caused by axisBias
* remove macro
* replace #ifdef with #if for CV_SIMD
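The reviewer's note that the vector size is not known at compile time, together with the commit items "use nlanes from VTraits" and "fix dynamic array", point toward a common pattern: step through the data by a lane count obtained at runtime, and size any temporary buffer by a compile-time upper bound rather than declaring a variable-length array. The plain C++ sketch below imitates that pattern without OpenCV. `kMaxLanes` stands in for `VTraits<v_float32>::max_nlanes` and the `vl` parameter for `VTraits<v_float32>::vlanes()` (both as found in recent universal-intrinsics headers; verify against the OpenCV version in use). `reduceMax` is a hypothetical name, and the inner `j` loops only emulate what a single vector instruction would do.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Compile-time upper bound on lanes (stand-in for VTraits<v_float32>::max_nlanes).
constexpr int kMaxLanes = 16;

// Max reduction in "scalable vector" style. `vl` is the lane count, known only
// at runtime (stand-in for VTraits<v_float32>::vlanes()).
float reduceMax(const float* src, std::size_t len, int vl) {
    assert(len > 0);
    assert(vl > 0 && vl <= kMaxLanes);
    float acc[kMaxLanes];  // fixed size: a buffer like float acc[vl] would be a VLA
    std::size_t i = 0;
    float m = src[0];
    if (len >= static_cast<std::size_t>(vl)) {
        // Lane j keeps the running max of src[j], src[j + vl], src[j + 2*vl], ...
        for (int j = 0; j < vl; ++j) acc[j] = src[j];
        for (i = vl; i + vl <= len; i += vl)
            for (int j = 0; j < vl; ++j) acc[j] = std::max(acc[j], src[i + j]);
        // Horizontal reduction across the lanes.
        m = acc[0];
        for (int j = 1; j < vl; ++j) m = std::max(m, acc[j]);
    }
    // Scalar tail for the remaining len % vl elements (or all of them if len < vl).
    for (; i < len; ++i) m = std::max(m, src[i]);
    return m;
}
```

Calling it with `vl = 4` or `vl = 16` yields the same result, which is the property scalable code has to preserve regardless of the hardware vector length.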
This PR improves the softmax layer based on the implementation in ficus nn.
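For reference, the standard softmax definition, together with the max-shifted form that is mathematically identical and keeps the exponentials from overflowing. This is a general identity; the thread does not state which formulation the ficus nn kernel follows.

$$
\operatorname{softmax}(\mathbf{x})_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}} = \frac{e^{x_i - m}}{\sum_{j=1}^{n} e^{x_j - m}}, \qquad m = \max_{1 \le j \le n} x_j
$$

Multiplying numerator and denominator by $e^{-m}$ gives the second form; since every exponent is then $\le 0$, each term $e^{x_j - m}$ lies in $(0, 1]$.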
Performance test results (min value, multi-threaded):
macOS M2
Ubuntu Intel Core i7-12700K: 8 Performance-cores (3.60 GHz, turbo up to 4.90 GHz), 4 Efficient-cores (2.70 GHz, turbo up to 3.80 GHz), 20 threads.
Ubuntu Loongnix