Skip to content

[Refactor](HAL RVV): Consolidate Helpers for Code Reusability#26977

Merged
asmorkalov merged 5 commits intoopencv:4.xfrom
GenshinImpactStarts:helper_hal_rvv
Mar 10, 2025
Merged

[Refactor](HAL RVV): Consolidate Helpers for Code Reusability#26977
asmorkalov merged 5 commits intoopencv:4.xfrom
GenshinImpactStarts:helper_hal_rvv

Conversation

@GenshinImpactStarts
Copy link
Copy Markdown
Contributor

This PR introduces a new helper file with utility types and templates to standardize function interfaces. This refactor allows us to avoid duplicate code when types differ but logic remains the same.

The flip and minmax implementations have been updated to use the new generic helpers, replacing the previously defined, redundant classes.

Due to the large number of functions, not all interfaces are unified yet. Future development can extend the types as needed. While the usage of function templates is currently limited, this will ease future development.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@asmorkalov asmorkalov added platform: riscv cleanup Code cleanup (e.g, drop legacy C-API, legacy unmaintained code) labels Feb 25, 2025
@asmorkalov asmorkalov added this to the 4.12.0 milestone Feb 25, 2025
@GenshinImpactStarts
Copy link
Copy Markdown
Contributor Author

@asmorkalov I believe this PR is simple and doesn’t introduce any issues. Could you review and merge it quickly? I have several other PRs waiting on this one. Thanks a lot!

@mshabunin
Copy link
Copy Markdown
Contributor

Did you consider using overloaded intrinsics variants where possible? Sections 6.4 - 6.6 and Appendices C and D of the spec.
I'm asking because we are trying to reduce number of complex multiline macros for better readability and debugging support. It would be better to use templated traits classes instead, see examples in #26892 and #26865. If it is not possible, then OK.

@GenshinImpactStarts
Copy link
Copy Markdown
Contributor Author

I think I have use overloaded intrinsics in all possible places. The helper class in this PR is designed to solve the macro problem, which provides unified interfaces.

The remaining parts in the helper class that use macros instead of overloaded intrinsics are mainly due to the following two reasons:

  1. For operations like load and cast, overloaded intrinsics require different function names to distinguish between return values.
  2. For many operations, although the meaning is the same, the function names differ based on whether they are signed integers, unsigned integers, or floating-point numbers (e.g., add, addu, fadd).

For the two examples you mentioned, they do exactly the same thing as this PR, and parts of this patch are also inspired by them. They use macros to define the helper class and use the helper class for templates. And this PR is to introduce a unified helper class.

@mshabunin
Copy link
Copy Markdown
Contributor

For the two examples you mentioned, they do exactly the same thing as this PR, and parts of this patch are also inspired by them. They use macros to define the helper class and use the helper class for templates.

Those PRs don't use macros, only templates:

template<typename T> struct rvv;

template<> struct rvv<float>
{
    static inline size_t vsetvlmax() { return __riscv_vsetvlmax_e32m4(); }
    static inline size_t vsetvl(size_t a) { return __riscv_vsetvl_e32m4(a); }
    static inline vfloat32m4_t vfmv_v_f(float a, size_t b) { return __riscv_vfmv_v_f_f32m4(a, b); }
    static inline vfloat32m1_t vfmv_s_f(float a, size_t b) { return __riscv_vfmv_s_f_f32m1(a, b); }
    static inline vfloat32m4_t vle(const float* a, size_t b) { return __riscv_vle32_v_f32m4(a, b); }
    static inline vfloat32m4_t vlse(const float* a, ptrdiff_t b, size_t c) { return __riscv_vlse32_v_f32m4(a, b, c); }
    static inline void vse(float* a, vfloat32m4_t b, size_t c) { __riscv_vse32(a, b, c); }
};

Other PRs use macros, yes.
If I understand correctly, the code without macros will become larger, but it might be benefitial for readablility and debugging. And splitting these trait classes into separate ones might make the implementation less complex.
However if it is not possible, then I'm fine with current approach.

@GenshinImpactStarts
Copy link
Copy Markdown
Contributor Author

Sorry, I didn’t look closely. I thought you were referring to the minmax.hpp in the latter PR (#26865) that I referenced.

If we fully expand those macros, the file would end up being several thousand lines long. Another option is to use inheritance, but even then, it would add nearly a thousand lines. If you think that’s better, I can use inheritance to remove all the macros. Alternatively, if we go with inheritance, perhaps it would be better to split the signed integers, unsigned integers, and floating-point numbers into three separate files?

@mshabunin
Copy link
Copy Markdown
Contributor

No, let's leave macros then.

@GenshinImpactStarts
Copy link
Copy Markdown
Contributor Author

Since each unified function corresponds to a single intrinsic, I believe it’s acceptable to use macros to define these "function aliases." Developers can then use them just like the original intrinsics.

Copy link
Copy Markdown
Contributor

@horror-proton horror-proton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also looking for a proper way to collect rvv intrinsics into C++ functions.

For the functions in HAL_RVV_SIZE_UNRELATED, my current solution is to extract the scalar type from a rvv type first, then dispatch them according to the result of std::is_floating_point_v, std::is_unsigned_v statically, e.g.

template <typename Vtype, typename Enabled = void> struct rvv_scalar
{
    using type = decltype(__riscv_vmv_x(std::declval<Vtype>()));
};

template <typename Vtype>
struct rvv_scalar<Vtype,
                  std::void_t<decltype(__riscv_vfmv_f(std::declval<Vtype>()))>>
{
    using type = decltype(__riscv_vfmv_f(std::declval<Vtype>()));
};

template <typename Vtype> using rvv_scalar_t = typename rvv_scalar<Vtype>::type;

static_assert(std::is_same_v<rvv_scalar_t<vint32m1_t>, int>);
template<typename V, typename T>
auto rvv_min(V a, T b, size_t vl) {
    using scalar = rvv_scalar_t<V>;
    if constexpr (std::is_floating_point_v<scalar>)
        return __riscv_vfmin(a, b, vl);
    else if constexpr (std::is_unsigned_v<scalar>)
        return __riscv_vminu(a, b, vl);
    else
        return __riscv_vmin(a, b, vl);
}
...

So that there's no need to define vmin in every struct RVV.
And users may write rvv_min(v1, v1, vl) directly instead of figuring out what VEC_T is then do VEC_T::vmin(....
🤔

The code above requires C++17, but I guess there is also C++11 equivalence.

Copy link
Copy Markdown
Contributor

@mshabunin mshabunin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is potential conflict with #26865
cc @amane-ame

@horror-proton , yes it might be useful to implement switching between i, u and f variants. Perhaps regular if might work too or we would need to use std::enable_if maybe. On 4.x branch we are limited by C++11.

@GenshinImpactStarts
Copy link
Copy Markdown
Contributor Author

@horror-proton Indeed, in C++11, template specialization can be used for dispatching, but this would result in each function generating an additional three functions each with more than three lines of code. Also, if we don’t use macros, I believe using inheritance for categorization would look better. If macros are used, I think this is the shortest approach for now. As for the functions you mentioned that don't need to worry about VEC_T, first, I believe there's no case where VEC_T is unknown. Second, with the current implementation, it's easy to add a template function to call VEC_T::vmin.

@mshabunin You can merge that PR first. I will resolve conflict and try to make that PR use this PR's helper class.

@asmorkalov
Copy link
Copy Markdown
Contributor

The related patch has been merged. Please rebase and fix conflicts.

GenshinImpactStarts and others added 4 commits March 7, 2025 14:44
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
@GenshinImpactStarts
Copy link
Copy Markdown
Contributor Author

Since the last commit make a great change, I have tested testcases all mentioned in corresponding PR (#26865 #26892 #26958). The list is too long so a file is attached. solve_dxt_pyramids_report.txt

Through some factor <0.9 and >1.1 appears, when rerun these testcases they have similar performance again. So it looks like this commit doesn't break anything.

@asmorkalov asmorkalov assigned asmorkalov and mshabunin and unassigned asmorkalov Mar 10, 2025
@asmorkalov asmorkalov merged commit 830d031 into opencv:4.x Mar 10, 2025
28 checks passed
@asmorkalov asmorkalov mentioned this pull request Mar 11, 2025
@GenshinImpactStarts GenshinImpactStarts deleted the helper_hal_rvv branch March 12, 2025 07:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cleanup Code cleanup (e.g, drop legacy C-API, legacy unmaintained code) platform: riscv

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants