Speed up line merging in INTER_AREA #24412
|
Running: Here are the results ("real" INTER_AREA is only used for non-integer scale division): |
|
My results for AMD Ryzen 7 2700X: |
|
Less obvious result for Jetson TK1 (Armv7 with NEON): |
asmorkalov left a comment:
The OpenCV team is migrating from static-size vector code to scalable vector code to support RISC-V RVV and other platforms. Could you adapt the patch accordingly? Example: #24166
|
Thx, I added taskset -c 33 python3 .//modules/ts/misc/run.py ...... |
modules/imgproc/src/resize.cpp
Outdated
const v_int32 tmp0 = v_round(vx_load(src + 0 * v_float32::nlanes));
const v_int32 tmp1 = v_round(vx_load(src + 1 * v_float32::nlanes));
const v_int32 tmp2 = v_round(vx_load(src + 2 * v_float32::nlanes));
const v_int32 tmp3 = v_round(vx_load(src + 3 * v_float32::nlanes));
v_float32::nlanes -> VTraits<v_float32>::vlanes()
Thx, done. BTW, it would be nice to change that API to make it constexpr.
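To illustrate the nlanes/vlanes() exchange above, here is a minimal, self-contained sketch that does not use OpenCV. All names in it (`FixedF32x4`, `ScalableF32`, `LaneTraits`, `vectorizedElements`) are hypothetical stand-ins. It assumes that on scalable-vector hardware the lane count is only known at run time, which is one likely reason a traits function cannot simply be `constexpr` on every backend.

```cpp
#include <cstddef>

// Hypothetical stand-in for a fixed-width SIMD type:
// the lane count is a compile-time constant.
struct FixedF32x4 {
    static constexpr int nlanes = 4;
};

// Hypothetical stand-in for a scalable vector type: the lane count
// depends on the hardware and is only known at run time.
struct ScalableF32 {
    static int hw_lanes;  // would be queried from the CPU in practice
};
int ScalableF32::hw_lanes = 8;  // assumed value, for illustration only

// Traits that expose the lane count through a function, so the same
// loop code works for both kinds of vector type.
template <typename V> struct LaneTraits;
template <> struct LaneTraits<FixedF32x4> {
    static int vlanes() { return FixedF32x4::nlanes; }
};
template <> struct LaneTraits<ScalableF32> {
    static int vlanes() { return ScalableF32::hw_lanes; }
};

// Number of elements a vectorized loop would cover; the remainder
// would be handled by a scalar tail loop.
template <typename V>
std::size_t vectorizedElements(std::size_t n) {
    const std::size_t step = static_cast<std::size_t>(LaneTraits<V>::vlanes());
    return (n / step) * step;
}
```

Code written against the traits function keeps working when the lane count changes between machines, whereas code that uses the constant as an array size or template argument does not.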
|
@mshabunin, it seems the CI is hanging for RISC-V. |
|
@vrabaud, thank you for the patch! I wonder why double-precision floating-point arithmetic is used in several places of the patch. Isn't FP32 enough for this algorithm (unless resize is applied to FP64 images)? I mean, it's fine to use double precision to construct the interpolation tables, but when we do the actual interpolation and accumulation, FP32 should probably be enough, right? |
|
@vpisarev, indeed float32 is used for all integer types and float32. Double is only used for doubles (opencv/modules/imgproc/src/resize.cpp, line 3797 in 2f1d529). This is the current behavior, so I kept it. BTW, any idea why int and schar are disabled? |
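The split discussed here (double precision for building the interpolation table, FP32 for the accumulation) can be sketched on a single row as follows. This is a simplified, self-contained illustration and not the code from resize.cpp. `AreaTap`, `buildAreaTable` and `resizeRowArea` are hypothetical names, and the weight formula is the plain overlap-fraction definition of area averaging.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// One entry of a hypothetical 1-D area-interpolation table.
struct AreaTap {
    int src;       // source index
    int dst;       // destination index
    float weight;  // share of the destination pixel covered by this source pixel
};

// Builds the table in double precision and stores the weights as float.
std::vector<AreaTap> buildAreaTable(int srcSize, int dstSize) {
    std::vector<AreaTap> table;
    const double scale = static_cast<double>(srcSize) / dstSize;
    for (int d = 0; d < dstSize; ++d) {
        const double begin = d * scale;
        const double end = std::min(begin + scale, static_cast<double>(srcSize));
        const int first = static_cast<int>(std::floor(begin));
        const int last = std::min(static_cast<int>(std::ceil(end)), srcSize);
        for (int s = first; s < last; ++s) {
            // Length of the overlap between source pixel [s, s + 1) and [begin, end).
            const double overlap =
                std::min(end, s + 1.0) - std::max(begin, static_cast<double>(s));
            if (overlap > 1e-9)
                table.push_back({s, d, static_cast<float>(overlap / scale)});
        }
    }
    return table;
}

// Applies the table with float accumulation only.
std::vector<float> resizeRowArea(const std::vector<float>& src, int dstSize) {
    std::vector<float> dst(dstSize, 0.0f);
    for (const AreaTap& tap : buildAreaTable(static_cast<int>(src.size()), dstSize))
        dst[tap.dst] += src[tap.src] * tap.weight;
    return dst;
}
```

For example, shrinking the row {1, 2, 3} to two pixels gives a scale of 1.5, so each destination pixel takes two thirds of one source pixel and one third of its neighbor.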
|
RISC-V builds are broken in CI now. Manual build error messages: |
|
@vrabaud you need to use v_add, v_mul and other functions instead of overloaded +, * and other operators. |
|
Thx @asmorkalov , I believe I fixed it using |
asmorkalov left a comment:
👍
Tested RISC-V config manually.
|
In summary: |
|
0.1 and 0.25 are integer ratios (a tenth and a quarter of the original image) for images whose dimensions are divisible by 10 or 4. In that case, a different algorithm (areafast) is used, which this pull request does not speed up, so it is normal to see no change there. 0.81 is a non-integer scale: the normal INTER_AREA path is used there, and that is where the speed-up shows. |
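The distinction between the two paths can be sketched as a check on the source-to-destination ratio. This is a minimal illustration under the assumption that the fast path is selected when both ratios are whole numbers. `usesIntegerScale` is a hypothetical helper, and the exact test in resize.cpp may differ.

```cpp
#include <cfloat>
#include <cmath>

// Returns true when both source/destination ratios are whole numbers,
// i.e. the case described above as being handled by the "areafast" path.
bool usesIntegerScale(int srcW, int srcH, int dstW, int dstH) {
    const double scaleX = static_cast<double>(srcW) / dstW;
    const double scaleY = static_cast<double>(srcH) / dstH;
    return std::abs(scaleX - std::round(scaleX)) < DBL_EPSILON &&
           std::abs(scaleY - std::round(scaleY)) < DBL_EPSILON;
}
```

With a 1000x1000 source, a scale of 0.1 (100x100) or 0.25 (250x250) yields ratios of 10 and 4, while 0.81 (810x810) yields about 1.2346 and so falls through to the general path.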
Speed up line merging in INTER_AREA opencv#24412

This provides a 10 to 20% speed-up.
Related perf test fix: opencv#24417
This is a split of opencv#23525 that will be updated to only deal with column merging.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake