[HAL RVV] impl magnitude | add perf test #27002
Performance for Muse Pi v30 (GCC 14.2):
@fengyuentau Could you take a look too?
@GenshinImpactStarts Could you rebase and fix conflicts?
378d719 to bcd8ff6
@fengyuentau please take a look too.
I noticed again that magnitude32f and magnitude64f in mathfuncs_core.simd.hpp are implemented with Universal Intrinsics (UI), but not enabled for scalable intrinsics. It makes sense to add that option there and benchmark it too.
auto vx = __riscv_vle32_v_f32m4(x, vl);
auto vy = __riscv_vle32_v_f32m4(y, vl);

auto vmag = __riscv_vfsqrt(__riscv_vfmadd(vx, vx, __riscv_vfmul(vy, vy, vl), vl), vl);
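For readers without an RVV toolchain, the suggested snippet can be read as a strip-mined loop: load `vl` elements of `x` and `y`, compute `x*x + y*y` with one multiply and one fused multiply-add, then take the square root. The following is only a portable scalar sketch of that data flow, not the PR's code. The chunk size `kChunk` and the name `magnitude_sketch` are hypothetical; on real hardware `vl` would come from the vector-length setting instruction rather than a constant.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for the hardware vector length (elements per pass).
constexpr std::size_t kChunk = 8;

// Computes dst[i] = sqrt(x[i]^2 + y[i]^2) chunk by chunk, mirroring the
// structure of a vector-length-agnostic loop.
void magnitude_sketch(const float* x, const float* y, float* dst, std::size_t len)
{
    for (std::size_t i = 0; i < len; ) {
        // How many elements this pass handles; the last pass may be shorter.
        const std::size_t vl = std::min(kChunk, len - i);
        for (std::size_t j = 0; j < vl; ++j) {
            const float yy  = y[i + j] * y[i + j];               // role of vfmul(vy, vy)
            const float sum = std::fma(x[i + j], x[i + j], yy);  // role of vfmadd(vx, vx, yy)
            dst[i + j] = std::sqrt(sum);                         // role of vfsqrt
        }
        i += vl;
    }
}
```

Because the tail is handled by shrinking `vl` instead of a separate scalar loop, any `len` works, including lengths that are not a multiple of the chunk size.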
Please resolve conflicts. My performance results (K1 vs RK3568):
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
bcd8ff6 to 75a95d0
UI is enabled. Now the only difference between UI and HAL is that HAL uses an approximation of sqrt. If HAL used __riscv_vfsqrt, their performance would be the same. The perf test is updated in the first comment.
My performance results (K1 vs. RK3568): Nice 👍
Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>
The modification of
I apologize for the repeated CI failures on the same test case. Some tests require reading local files, but I didn’t know where to get the files used in CI, so I had to find corner cases myself, which wasn’t always comprehensive. As a result, the tests kept failing. I’m sorry for wasting CI resources. Now that I know where to get the files, I’ve tested the two failing projects locally, and this shouldn’t happen again.
Merge pull request opencv#27002 from GenshinImpactStarts:magnitude
Implement through the existing cv_hal_magnitude32f and cv_hal_magnitude64f interfaces.
UPDATE: UI is enabled. The only difference between UI and HAL now is that HAL uses an approximate sqrt.
Perf test done on MUSE-PI.
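As a rough illustration of what implementing through a HAL hook looks like, here is a minimal sketch. It assumes the magnitude hooks take two input arrays, an output array, and an element count, and return a status code telling the caller whether to fall back to the default path. The constants and function names below are local, hypothetical stand-ins, not OpenCV's actual declarations, and the loop body is plain scalar code rather than the PR's RVV implementation.

```cpp
#include <cmath>

// Local stand-ins for HAL status codes (assumed values, for illustration).
constexpr int kHalOk = 0;
constexpr int kHalNotImplemented = 1;

// Hypothetical 32-bit hook: returns kHalOk when it handled the call,
// otherwise declines so the caller can use its default implementation.
int magnitude32f_sketch(const float* x, const float* y, float* dst, int len)
{
    if (!x || !y || !dst || len < 0)
        return kHalNotImplemented;
    for (int i = 0; i < len; ++i)
        dst[i] = std::sqrt(x[i] * x[i] + y[i] * y[i]);
    return kHalOk;
}

// Hypothetical 64-bit counterpart with the same shape.
int magnitude64f_sketch(const double* x, const double* y, double* dst, int len)
{
    if (!x || !y || !dst || len < 0)
        return kHalNotImplemented;
    for (int i = 0; i < len; ++i)
        dst[i] = std::sqrt(x[i] * x[i] + y[i] * y[i]);
    return kHalOk;
}
```

Keeping the hook's signature identical to the existing interface lets a platform-specific implementation be swapped in without touching the callers.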
Test result between enabled UI and HAL:
Test result before and after enabling UI:
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.