Skip to content

Add RISC-V HAL implementation for cv::dft and cv::dct#26865

Merged
asmorkalov merged 12 commits intoopencv:4.xfrom
amane-ame:dxt_hal_rvv
Mar 7, 2025
Merged

Add RISC-V HAL implementation for cv::dft and cv::dct#26865
asmorkalov merged 12 commits intoopencv:4.xfrom
amane-ame:dxt_hal_rvv

Conversation

@amane-ame
Copy link
Copy Markdown
Contributor

This patch implements static cv::DFT function in RVV_HAL using native intrinsic, optimizing the performance for cv::dft and cv::dct with data types 32FC1/64FC1/32FC2/64FC2.

The reason I chose to create a new cv_hal_dftOcv interface is that if I were to use the existing interfaces (cv_hal_dftInit1D and cv_hal_dft1D), it would require handling and parsing the dft flags within HAL, as well as performing preprocessing operations such as handling unit roots. Since these operations are not performance hotspots and do not require optimization, reusing the existing interfaces would result in copying approximately 300 lines of code from core/src/dxt.cpp into HAL, which I believe is unnecessary.

Moreover, if I insert the new interface into static cv::DFT, both static cv::RealDFT and static cv::DCT can be optimized as well. The processing performed before and after calling static cv::DFT in these functions is also not a performance hotspot.

Tested on MUSE-PI (Spacemit X60) for both gcc 14.2 and clang 20.0.

$ opencv_test_core --gtest_filter="*DFT*"
$ opencv_perf_core --gtest_filter="*dft*:*dct*" --perf_min_samples=30 --perf_force_samples=30

The head of the perf table is shown below since the table is too long.

View the full perf table here: hal_rvv_dxt.pdf

Untitled

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

amane-ame and others added 4 commits January 29, 2025 02:24
…calar.

Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
@amane-ame amane-ame requested a review from asmorkalov February 3, 2025 15:11
@asmorkalov asmorkalov self-assigned this Feb 5, 2025
@vpisarev vpisarev requested a review from fengyuentau February 5, 2025 07:27
@vpisarev
Copy link
Copy Markdown
Contributor

vpisarev commented Feb 5, 2025

@fengyuentau, can you please check it, measure performance against the current scalar implementation?

@amane-ame
Copy link
Copy Markdown
Contributor Author

cc @mshabunin @fengyuentau
Could anyone please review this?

Copy link
Copy Markdown
Contributor

@mshabunin mshabunin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me overall.

amane-ame and others added 2 commits February 19, 2025 12:04
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
@fengyuentau
Copy link
Copy Markdown
Member

@fengyuentau, can you please check it, measure performance against the current scalar implementation?

This patch generally makes sense with some speedup (tested on K1).

dxt-perf.zip

@amane-ame
Copy link
Copy Markdown
Contributor Author

cc @asmorkalov Ready to be merged.

Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
@amane-ame
Copy link
Copy Markdown
Contributor Author

amane-ame commented Feb 24, 2025

Slightly optimized the performance further.

This optimize ran into the same problem in #26923 (comment). I strongly recommend that update the clang to at least 18.1.0 because vcreate series function would be very important in the optimizing of algorithms with multi channels images.

hal_rvv_dxt.pdf

Untitled

@amane-ame
Copy link
Copy Markdown
Contributor Author

Committed to make clang 17 happy. This should be reverted once clang is updated.

Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
@asmorkalov asmorkalov merged commit bb525fe into opencv:4.x Mar 7, 2025
27 of 28 checks passed
@asmorkalov asmorkalov mentioned this pull request Mar 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants