[hal_rvv] Add cv::integral implementation and more types of input for test#27060
[hal_rvv] Add cv::integral implementation and more types of input for test#27060asmorkalov merged 7 commits intoopencv:4.xfrom
Conversation
|
I'm sorry that there is a few problems with this PR, I modified the wrong perf test which belongs to OpenCL, and I will cover more data types, so I turned this into a draft 🙇 TODO List:
|
| template <typename T> struct rvv; | ||
|
|
||
| // Vector operations wrapper | ||
| template<> struct rvv<uint8_t> { |
There was a problem hiding this comment.
Do not expose the struct rvv into cv_hal_rvv namespace, see #26865 (comment).
|
I still observe several test failures in opencv_perf_imgproc: Example: |
Thank you! I will investigate and fix it |
|
I'm sorry that I can't reproduce the failed test using both GCC 14.2.0 (from riscv-gnu-toolchain) and GCC 14.2.1 (from SpacemiT toolchain) 😔 |
Hi Alex @asmorkalov , could you please take a look at this PR when you have time? |
| return __riscv_vget_v_##TYPE##TWO_LMUL##_##TYPE##ONE_LMUL(v, idx); \ | ||
| } \ | ||
| inline TWO::VecType vcreate(ONE::VecType v0, ONE::VecType v1) { \ | ||
| return __riscv_vcreate_v_##TYPE##ONE_LMUL##_##TYPE##TWO_LMUL(v0, v1); \ |
There was a problem hiding this comment.
vcreate is not supported in clang 17. Use this instead.
vuint8m2_t v{};
v = __riscv_vset_v_u8m1_u8m2(v, 0, x);
v = __riscv_vset_v_u8m1_u8m2(v, 1, y);There was a problem hiding this comment.
Thank you for pointing out!
Should I change the implementation of vcreate and fall back to vset when clang version is under 18, or just add vset for types.hpp?
There was a problem hiding this comment.
I prefer both. Note that __riscv_vset needs an immediate number for the second argument.
There was a problem hiding this comment.
I prefer both. Note that
__riscv_vsetneeds an immediate number for the second argument.
Thanks, I will update this patch later
|
There are a lot of build issues. Please take a look. |
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
…pe conversions for rvv hal macros Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
|
Oops, sorry I just rebased this patch to the latest commit of |
Current patch passed all build checks, could you please have a look? Thanks! : ) |
|
Regressions with this patch on K1: |
Sorry about that 🥹 |
It is not about performance but accuracy. We have sanity checks, which works with data in opencv_extra, in most of the performance testings. For example, opencv/modules/imgproc/perf/perf_integral.cpp Lines 82 to 83 in 5d3a978 |
Oh now I understand, thank you for explanation! I remembered #27060 (comment) pointed out the same failed tests. I will try to reproduce the failed tests and investigate about that |
This patch introduces an RVV-optimized implementation of
cv::integral()in hal_rvv, along with performance and accuracy tests for all valid input/output type combinations specified inmodules/imgproc/src/hal_replacement.hpp:opencv/modules/imgproc/src/hal_replacement.hpp
Lines 960 to 974 in 2a8d4b8
The vectorized prefix sum algorithm follows the approach described in Prefix Sum with SIMD - Algorithmica.
I intentionally omitted support for the following cases by returning
CV_HAL_ERROR_NOT_IMPLEMENTED, as they are harder to implement or show limited performance gains:cn == 3): Current implementation requiresVLEN/SEW(a.k.a. number of elements in a vector register) to be a multiple of channel count, which 3-channel formats typically cannot satisfy.!(width >> 8 || height >> 8)): The scalar implementation demonstrates better performance for images with limited dimensions.3rdparty/ndsrvp/src/integral.cppopencv/3rdparty/ndsrvp/src/integral.cpp
Lines 24 to 26 in 09c71ae
Test configuration:
integral_sqsum_fulltest is disabled by default, so--gtest_also_run_disabled_testsis neededTest results:
The vectorized implementation shows progressively better acceleration for larger image sizes and higher channel counts, achieving up to 6.21× speedup for 64FC4 (1920×1080) inputs with
DEPTH_64F_64Fconfiguration.This is my first time proposing patch for the OpenCV Project 🥹, if there's anything that can be improved, please tell me.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.