Optimize the v_lut* functions for RISC-V Vector(RVV).#24582

Merged
asmorkalov merged 1 commit into opencv:4.x from hanliutong:rvv-lut on Nov 30, 2023

@hanliutong
Contributor

This patch optimizes the implementation of the Universal Intrinsic functions v_lut_pairs and v_lut_quads on the RVV backend: index generation now uses vector instructions instead of the loops and std::vector used in the existing implementation.
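To illustrate the idea of replacing an index-building loop with per-lane arithmetic, here is a minimal portable C++ sketch. It is not the code from this patch: the function names `expand_indices_loop` and `expand_indices_lanewise` are hypothetical, and the sketch assumes that v_lut_pairs and v_lut_quads read 2 or 4 consecutive table elements starting at each base index. The second function computes every output lane independently from its lane number, which is the form a vector ISA can evaluate without a scalar loop or a temporary std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Loop-based reference: for each base index, append `group` consecutive
// table positions (group == 2 models pairs, group == 4 models quads).
std::vector<int> expand_indices_loop(const std::vector<int>& idx, int group) {
    assert(group == 2 || group == 4);
    std::vector<int> out;
    out.reserve(idx.size() * group);
    for (int base : idx)
        for (int k = 0; k < group; ++k)
            out.push_back(base + k);
    return out;
}

// Lane-wise formulation: output lane j depends only on j, so all lanes can
// be computed in parallel as idx[j / group] + (j % group).
// The division and modulo reduce to a shift and a mask for group 2 or 4.
std::vector<int> expand_indices_lanewise(const std::vector<int>& idx, int group) {
    assert(group == 2 || group == 4);
    const int shift = (group == 2) ? 1 : 2;
    const std::size_t mask = static_cast<std::size_t>(group - 1);
    std::vector<int> out(idx.size() * group);
    for (std::size_t j = 0; j < out.size(); ++j)
        out[j] = idx[j >> shift] + static_cast<int>(j & mask);
    return out;
}
```

With base indices {0, 8, 4} and group 2, both functions produce {0, 1, 8, 9, 4, 5}. On a vector backend the per-lane expression could plausibly be built from a lane-id vector, a shift, a mask and a register gather, but which instructions the patch actually uses is not stated here.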

In the core module, v_lut_quads is used by transform_32f in matmul.simd.hpp. According to experimental results on the K230, this patch improves performance by nearly 10x, although the vectorized path is still slower than the scalar version (this may improve with a longer VLEN).

| Name of Test | scalar | vector | vector_opt | vector vs scalar | vector_opt vs scalar |
|---|---|---|---|---|---|
| Mat_Transform::Size_MatType::(127x61,_32FC3) | 0.268 | 6.146 | 0.633 | 0.04 | 0.42 |
| Mat_Transform::Size_MatType::(640x480,_32FC3) | 11.642 | 246.761 | 25.622 | 0.05 | 0.45 |
| Mat_Transform::Size_MatType::(1920x1080,_32FC3) | 76.516 | 1654.286 | 172.701 | 0.05 | 0.44 |
| Mat_Transform::Size_MatType::(1280x720,_32FC3) | 35.173 | 735.625 | 76.856 | 0.05 | 0.46 |
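The two ratio columns appear to be the scalar time divided by the corresponding vector time, so values below 1 mean the vector path is slower than scalar. The sketch below recomputes them under that assumption; the helper names `ratio_vs_scalar` and `speedup` are hypothetical, and the table does not state the time units.

```cpp
#include <cmath>

// Ratio as it seems to be reported in the last two table columns:
// scalar time / other time, rounded to two decimals.
double ratio_vs_scalar(double scalar_time, double other_time) {
    return std::round(scalar_time / other_time * 100.0) / 100.0;
}

// Speedup of the optimized vector path over the original vector path.
double speedup(double vector_time, double vector_opt_time) {
    return vector_time / vector_opt_time;
}
```

For the first row, `ratio_vs_scalar(0.268, 6.146)` and `ratio_vs_scalar(0.268, 0.633)` reproduce the 0.04 and 0.42 entries, and `speedup(6.146, 0.633)` is roughly 9.7, consistent with the "nearly 10x" figure.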

Full results are here: core.zip

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV.
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@asmorkalov
Contributor

@mshabunin Friendly reminder.

Contributor

@mshabunin left a comment

Looks good to me.

Perhaps this optimization (transform_32f) should be reimplemented in a different way or disabled for RISC-V in the future. As far as I can see, the vectorized part is currently disabled for ARM platforms.

@asmorkalov merged commit e202501 into opencv:4.x on Nov 30, 2023
@asmorkalov mentioned this pull request on Jan 19, 2024