Add RISC-V HAL implementation for cv::dft and cv::dct#26865
asmorkalov merged 12 commits into opencv:4.x
…calar. Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
@fengyuentau, can you please check it and measure performance against the current scalar implementation?
cc @mshabunin @fengyuentau
mshabunin left a comment
Looks good to me overall.
This patch generally makes sense with some speedup (tested on K1).
cc @asmorkalov Ready to be merged.
35ce923 to ef84bd6
Slightly optimized the performance further. This optimization ran into the same problem as in #26923 (comment). I strongly recommend updating clang to at least 18.1.0 because

Committed to make clang 17 happy. This should be reverted once clang is updated.
f14401d to 9c598af

This patch implements the `static cv::DFT` function in RVV_HAL using native intrinsics, optimizing the performance of `cv::dft` and `cv::dct` for the data types `32FC1`/`64FC1`/`32FC2`/`64FC2`.

The reason I chose to create a new `cv_hal_dftOcv` interface is that using the existing interfaces (`cv_hal_dftInit1D` and `cv_hal_dft1D`) would require handling and parsing the DFT flags within HAL, as well as performing preprocessing operations such as handling unit roots. Since these operations are not performance hotspots and do not require optimization, reusing the existing interfaces would mean copying approximately 300 lines of code from `core/src/dxt.cpp` into HAL, which I believe is unnecessary.

Moreover, if I insert the new interface into `static cv::DFT`, both `static cv::RealDFT` and `static cv::DCT` can be optimized as well. The processing performed before and after calling `static cv::DFT` in these functions is also not a performance hotspot.

Tested on MUSE-PI (Spacemit X60) with both gcc 14.2 and clang 20.0.
Only the head of the perf table is shown below because the full table is too long.
View the full perf table here: hal_rvv_dxt.pdf
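
To illustrate the interface placement described above, here is a minimal standalone sketch, not the code from this patch. It assumes a hook that either handles the core transform or reports "not implemented", in which case the scalar path runs. All names (`HalStatus`, `halDftCore`, `scalarDftCore`, `dftCore`, `realDft`, `g_halHits`) are hypothetical, and a naive O(n^2) loop stands in for both the RVV kernel and the scalar butterfly code. Unit roots are prepared outside the hook, and the real-input wrapper reaches the hook only through the shared core.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical status codes following the usual "OK / not implemented" HAL convention.
enum HalStatus { HAL_OK = 0, HAL_NOT_IMPLEMENTED = 1 };

using Cplx = std::complex<double>;

// Counts how often the (hypothetical) accelerated path handled a call.
static int g_halHits = 0;

// Preprocessing that stays outside the hook: unit roots w[k] = exp(-2*pi*i*k/n).
std::vector<Cplx> makeUnitRoots(std::size_t n) {
    const double pi = std::acos(-1.0);
    std::vector<Cplx> w(n);
    for (std::size_t k = 0; k < n; ++k)
        w[k] = std::polar(1.0, -2.0 * pi * static_cast<double>(k) / static_cast<double>(n));
    return w;
}

// Scalar reference path: naive DFT using the precomputed unit roots.
void scalarDftCore(const Cplx* src, Cplx* dst, std::size_t n, const Cplx* w) {
    for (std::size_t k = 0; k < n; ++k) {
        Cplx sum(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j)
            sum += src[j] * w[(j * k) % n];
        dst[k] = sum;
    }
}

// Stand-in for an accelerated kernel. As an assumption for this sketch, it only
// accepts power-of-two sizes and declines everything else.
HalStatus halDftCore(const Cplx* src, Cplx* dst, std::size_t n, const Cplx* w) {
    if (n == 0 || (n & (n - 1)) != 0)
        return HAL_NOT_IMPLEMENTED;
    scalarDftCore(src, dst, n, w);  // a real backend would use vector intrinsics here
    ++g_halHits;
    return HAL_OK;
}

// Shared core: try the hook first, fall back to the scalar path otherwise.
void dftCore(const Cplx* src, Cplx* dst, std::size_t n, const Cplx* w) {
    assert(src != nullptr || n == 0);
    if (halDftCore(src, dst, n, w) == HAL_OK)
        return;
    scalarDftCore(src, dst, n, w);
}

// Real-input wrapper: cheap pre-processing, then the shared core.
// It benefits from the hook without knowing that the hook exists.
std::vector<Cplx> realDft(const std::vector<double>& x) {
    std::vector<Cplx> in(x.begin(), x.end());
    std::vector<Cplx> out(x.size());
    const std::vector<Cplx> w = makeUnitRoots(x.size());
    dftCore(in.data(), out.data(), x.size(), w.data());
    return out;
}
```

Calling `realDft` on a 4-element input goes through the hook, while a 3-element input is declined by the hook and takes the scalar path. Both produce the same kind of result, which is the property that makes it safe to place the hook at the shared core.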
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.