HAL: implemented cv_hal_dotProduct in hal_rvv #27201

Merged
asmorkalov merged 8 commits into opencv:4.x from fengyuentau:4x/hal_rvv/dotprod
Apr 21, 2025

@fengyuentau
Member

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@fengyuentau
Member Author

Performance results on K1:

K1, GCC

           Name of Test            base-gcc patch-gcc patch-gcc vs base-gcc (x-factor)
dot::MatType_Length::(8UC1, 32)     0.001     0.001      1.41   
dot::MatType_Length::(8UC1, 64)     0.002     0.001      1.70   
dot::MatType_Length::(8UC1, 128)    0.008     0.005      1.64   
dot::MatType_Length::(8UC1, 256)    0.029     0.024      1.22   
dot::MatType_Length::(8UC1, 512)    0.116     0.117      1.00   
dot::MatType_Length::(8UC1, 1024)   0.459     0.395      1.16   
dot::MatType_Length::(8SC1, 32)     0.001     0.001      1.42   
dot::MatType_Length::(8SC1, 64)     0.002     0.001      1.67   
dot::MatType_Length::(8SC1, 128)    0.008     0.005      1.64   
dot::MatType_Length::(8SC1, 256)    0.029     0.024      1.23   
dot::MatType_Length::(8SC1, 512)    0.119     0.108      1.10   
dot::MatType_Length::(8SC1, 1024)   0.458     0.393      1.16   
dot::MatType_Length::(16UC1, 32)    0.001     0.001      1.54   
dot::MatType_Length::(16UC1, 64)    0.004     0.002      1.74   
dot::MatType_Length::(16UC1, 128)   0.015     0.012      1.24   
dot::MatType_Length::(16UC1, 256)   0.057     0.047      1.21   
dot::MatType_Length::(16UC1, 512)   0.230     0.202      1.14   
dot::MatType_Length::(16UC1, 1024)  0.957     0.855      1.12   
dot::MatType_Length::(16SC1, 32)    0.001     0.001      1.55   
dot::MatType_Length::(16SC1, 64)    0.004     0.002      1.73   
dot::MatType_Length::(16SC1, 128)   0.016     0.012      1.25   
dot::MatType_Length::(16SC1, 256)   0.058     0.047      1.22   
dot::MatType_Length::(16SC1, 512)   0.231     0.204      1.13   
dot::MatType_Length::(16SC1, 1024)  1.023     1.328      0.77   
dot::MatType_Length::(32SC1, 32)    0.003     0.001      2.00   
dot::MatType_Length::(32SC1, 64)    0.011     0.005      2.00   
dot::MatType_Length::(32SC1, 128)   0.036     0.024      1.51   
dot::MatType_Length::(32SC1, 256)   0.148     0.109      1.35   
dot::MatType_Length::(32SC1, 512)   0.587     0.410      1.43   
dot::MatType_Length::(32SC1, 1024)  2.537     2.329      1.09   
dot::MatType_Length::(32FC1, 32)    0.001     0.001      1.12   
dot::MatType_Length::(32FC1, 64)    0.004     0.004      1.11   
dot::MatType_Length::(32FC1, 128)   0.025     0.024      1.05   
dot::MatType_Length::(32FC1, 256)   0.117     0.108      1.08   
dot::MatType_Length::(32FC1, 512)   0.405     0.399      1.01   
dot::MatType_Length::(32FC1, 1024)  1.687     1.847      0.91

K1 vs RK3568

           Name of Test             rk   patch-gcc patch-clang patch-gcc vs rk (x-factor) patch-clang vs rk (x-factor)
dot::MatType_Length::(8UC1, 32)    0.001   0.001      0.001       0.96       0.97    
dot::MatType_Length::(8UC1, 64)    0.001   0.001      0.001       1.19       1.21    
dot::MatType_Length::(8UC1, 128)   0.007   0.005      0.005       1.35       1.31    
dot::MatType_Length::(8UC1, 256)   0.025   0.024      0.017       1.06       1.53    
dot::MatType_Length::(8UC1, 512)   0.113   0.117      0.116       0.97       0.98    
dot::MatType_Length::(8UC1, 1024)  0.380   0.395      0.392       0.96       0.97    
dot::MatType_Length::(8SC1, 32)    0.001   0.001      0.001       0.94       0.96    
dot::MatType_Length::(8SC1, 64)    0.001   0.001      0.001       1.17       1.20    
dot::MatType_Length::(8SC1, 128)   0.007   0.005      0.005       1.34       1.34    
dot::MatType_Length::(8SC1, 256)   0.026   0.024      0.017       1.07       1.52    
dot::MatType_Length::(8SC1, 512)   0.112   0.108      0.107       1.04       1.05    
dot::MatType_Length::(8SC1, 1024)  0.380   0.393      0.408       0.97       0.93    
dot::MatType_Length::(16UC1, 32)   0.002   0.001      0.001       2.26       2.28    
dot::MatType_Length::(16UC1, 64)   0.006   0.002      0.002       3.11       3.12    
dot::MatType_Length::(16UC1, 128)  0.025   0.012      0.009       2.03       2.84    
dot::MatType_Length::(16UC1, 256)  0.100   0.047      0.032       2.12       3.09    
dot::MatType_Length::(16UC1, 512)  0.406   0.202      0.213       2.01       1.91    
dot::MatType_Length::(16UC1, 1024) 2.324   0.855      0.870       2.72       2.67    
dot::MatType_Length::(16SC1, 32)   0.002   0.001      0.001       1.83       1.87    
dot::MatType_Length::(16SC1, 64)   0.005   0.002      0.002       2.57       2.58    
dot::MatType_Length::(16SC1, 128)  0.028   0.012      0.013       2.26       2.23    
dot::MatType_Length::(16SC1, 256)  0.085   0.047      0.047       1.80       1.80    
dot::MatType_Length::(16SC1, 512)  0.346   0.204      0.203       1.70       1.70    
dot::MatType_Length::(16SC1, 1024) 2.056   1.328      1.338       1.55       1.54    
dot::MatType_Length::(32SC1, 32)   0.002   0.001      0.001       1.78       1.76    
dot::MatType_Length::(32SC1, 64)   0.010   0.005      0.006       1.85       1.81    
dot::MatType_Length::(32SC1, 128)  0.033   0.024      0.024       1.39       1.37    
dot::MatType_Length::(32SC1, 256)  0.153   0.109      0.107       1.40       1.43    
dot::MatType_Length::(32SC1, 512)  0.537   0.410      0.398       1.31       1.35    
dot::MatType_Length::(32SC1, 1024) 3.015   2.329      2.332       1.29       1.29    
dot::MatType_Length::(32FC1, 32)   0.001   0.001      0.001       1.28       1.25    
dot::MatType_Length::(32FC1, 64)   0.006   0.004      0.004       1.60       1.59    
dot::MatType_Length::(32FC1, 128)  0.023   0.024      0.025       0.96       0.94    
dot::MatType_Length::(32FC1, 256)  0.098   0.108      0.106       0.90       0.93    
dot::MatType_Length::(32FC1, 512)  0.380   0.399      0.396       0.95       0.96    
dot::MatType_Length::(32FC1, 1024) 1.882   1.847      1.885       1.02       1.00 

More: perf-dotprod.zip
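
For readers of the tables above: the x-factor column appears to be the baseline time divided by the patched time, so values above 1.00 mean the patch is faster and values below 1.00 indicate a slowdown. This reading is an assumption checked against rows such as (32SC1, 1024), where 2.537 / 2.329 ≈ 1.09. Rows whose displayed times are identical (0.001 vs 0.001) still show factors like 1.41, so the factors were evidently computed from unrounded timings. The helper names below are hypothetical, not part of OpenCV's perf tooling.

```cpp
#include <cassert>
#include <cmath>

// Ratio of baseline time to patched time; > 1.0 means the patch is faster.
// Assumption: this matches how the x-factor column in the tables was derived.
double xFactor(double baselineTime, double patchedTime) {
    assert(patchedTime > 0.0);  // a zero or negative timing would be meaningless
    return baselineTime / patchedTime;
}

// Round to two decimals, the precision the tables display.
double roundTo2(double value) {
    return std::round(value * 100.0) / 100.0;
}
```

With the (16SC1, 1024) row, `xFactor(1.023, 1.328)` rounds to 0.77, which is the one clear regression in the K1 GCC table.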

@asmorkalov
Contributor

Accuracy test failure on RISC-V (CI):

[ RUN      ] Core_DotProduct.accuracy
/home/ci/opencv/modules/ts/src/ts.cpp:612: Failure
Failed

	failure reason: Bad accuracy
	test case #2
	seed: 00000000000c5a5e
-----------------------------------
	LOG:
output: Too big difference (=5.2399 > 2.5e-05) at element 0
input array 0 type=32fC3, size=(191, 42)
input array 1 type=32fC3, size=(191, 42)
ref output array 0 type=64fC1, size=(1, 4)
test_case_idx = 2

-----------------------------------

[  FAILED  ] Core_DotProduct.accuracy (8 ms)
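
The log shows 32fC3 inputs checked against a 64fC1 reference, so the reference value is held in double precision. The thread does not state what caused the mismatch. The sketch below only illustrates one generic way a float dot product can drift from a double reference: accumulating in float. It is scalar code with hypothetical function names, not the hal_rvv implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Accumulates in float: once the running sum is large, small products
// fall below the accumulator's precision and are rounded away.
float dotFloatAccum(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i)
        acc += a[i] * b[i];
    return acc;
}

// Widens each operand to double before multiplying and accumulating.
double dotDoubleAccum(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    double acc = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        acc += static_cast<double>(a[i]) * static_cast<double>(b[i]);
    return acc;
}
```

As an illustration, take two vectors of 101 ones with the first element of each set to 4096. On a typical IEEE 754 target without fast-math, the float accumulator stays at 16777216 (2^24) because adding 1 to it is not representable in float, while the double accumulator reaches 16777316.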

@asmorkalov
Contributor

Please rebase and fix conflicts.

@fengyuentau
Member Author

Accuracy issues are fixed. Updated performance results:

K1, GCC

           Name of Test            base-gcc patch-gcc patch-gcc vs base-gcc (x-factor)
dot::MatType_Length::(8UC1, 32)     0.001     0.001      1.42   
dot::MatType_Length::(8UC1, 64)     0.002     0.001      1.69   
dot::MatType_Length::(8UC1, 128)    0.008     0.005      1.64   
dot::MatType_Length::(8UC1, 256)    0.029     0.017      1.74   
dot::MatType_Length::(8UC1, 512)    0.116     0.122      0.96   
dot::MatType_Length::(8UC1, 1024)   0.459     0.395      1.16   
dot::MatType_Length::(8SC1, 32)     0.001     0.001      1.52   
dot::MatType_Length::(8SC1, 64)     0.002     0.001      1.68   
dot::MatType_Length::(8SC1, 128)    0.008     0.005      1.62   
dot::MatType_Length::(8SC1, 256)    0.029     0.017      1.77   
dot::MatType_Length::(8SC1, 512)    0.119     0.119      1.01   
dot::MatType_Length::(8SC1, 1024)   0.458     0.416      1.10   
dot::MatType_Length::(16UC1, 32)    0.001     0.001      1.56   
dot::MatType_Length::(16UC1, 64)    0.004     0.002      1.76   
dot::MatType_Length::(16UC1, 128)   0.015     0.009      1.68   
dot::MatType_Length::(16UC1, 256)   0.057     0.032      1.78   
dot::MatType_Length::(16UC1, 512)   0.230     0.212      1.08   
dot::MatType_Length::(16UC1, 1024)  0.957     0.886      1.08   
dot::MatType_Length::(16SC1, 32)    0.001     0.001      1.57   
dot::MatType_Length::(16SC1, 64)    0.004     0.002      1.78   
dot::MatType_Length::(16SC1, 128)   0.016     0.013      1.22   
dot::MatType_Length::(16SC1, 256)   0.058     0.047      1.22   
dot::MatType_Length::(16SC1, 512)   0.231     0.211      1.09   
dot::MatType_Length::(16SC1, 1024)  1.023     1.292      0.79   
dot::MatType_Length::(32SC1, 32)    0.003     0.001      2.01   
dot::MatType_Length::(32SC1, 64)    0.011     0.006      1.94   
dot::MatType_Length::(32SC1, 128)   0.036     0.024      1.51   
dot::MatType_Length::(32SC1, 256)   0.148     0.116      1.27   
dot::MatType_Length::(32SC1, 512)   0.587     0.396      1.48   
dot::MatType_Length::(32SC1, 1024)  2.537     2.397      1.06   
dot::MatType_Length::(32FC1, 32)    0.001     0.001      1.14   
dot::MatType_Length::(32FC1, 64)    0.004     0.004      1.10   
dot::MatType_Length::(32FC1, 128)   0.025     0.026      0.98   
dot::MatType_Length::(32FC1, 256)   0.117     0.113      1.04   
dot::MatType_Length::(32FC1, 512)   0.405     0.395      1.02   
dot::MatType_Length::(32FC1, 1024)  1.687     1.899      0.89

K1 vs RK3568

           Name of Test             rk   patch-gcc patch-clang patch-gcc vs rk (x-factor) patch-clang vs rk (x-factor)
dot::MatType_Length::(8UC1, 32)    0.001   0.001      0.001       0.97       1.01    
dot::MatType_Length::(8UC1, 64)    0.001   0.001      0.001       1.18       1.20    
dot::MatType_Length::(8UC1, 128)   0.007   0.005      0.005       1.36       1.37    
dot::MatType_Length::(8UC1, 256)   0.025   0.017      0.024       1.52       1.06    
dot::MatType_Length::(8UC1, 512)   0.113   0.122      0.110       0.93       1.03    
dot::MatType_Length::(8UC1, 1024)  0.380   0.395      0.426       0.96       0.89    
dot::MatType_Length::(8SC1, 32)    0.001   0.001      0.001       1.00       0.98    
dot::MatType_Length::(8SC1, 64)    0.001   0.001      0.001       1.17       1.17    
dot::MatType_Length::(8SC1, 128)   0.007   0.005      0.005       1.33       1.33    
dot::MatType_Length::(8SC1, 256)   0.026   0.017      0.024       1.53       1.06    
dot::MatType_Length::(8SC1, 512)   0.112   0.119      0.113       0.95       1.00    
dot::MatType_Length::(8SC1, 1024)  0.380   0.416      0.427       0.91       0.89    
dot::MatType_Length::(16UC1, 32)   0.002   0.001      0.001       2.30       2.32    
dot::MatType_Length::(16UC1, 64)   0.006   0.002      0.002       3.15       3.14    
dot::MatType_Length::(16UC1, 128)  0.025   0.009      0.012       2.75       2.03    
dot::MatType_Length::(16UC1, 256)  0.100   0.032      0.047       3.10       2.10    
dot::MatType_Length::(16UC1, 512)  0.406   0.212      0.206       1.91       1.98    
dot::MatType_Length::(16UC1, 1024) 2.324   0.886      0.849       2.62       2.74    
dot::MatType_Length::(16SC1, 32)   0.002   0.001      0.001       1.86       1.85    
dot::MatType_Length::(16SC1, 64)   0.005   0.002      0.002       2.65       2.63    
dot::MatType_Length::(16SC1, 128)  0.028   0.013      0.011       2.21       2.58    
dot::MatType_Length::(16SC1, 256)  0.085   0.047      0.052       1.80       1.65    
dot::MatType_Length::(16SC1, 512)  0.346   0.211      0.215       1.64       1.61    
dot::MatType_Length::(16SC1, 1024) 2.056   1.292      1.158       1.59       1.78    
dot::MatType_Length::(32SC1, 32)   0.002   0.001      0.001       1.79       1.81    
dot::MatType_Length::(32SC1, 64)   0.010   0.006      0.005       1.80       1.86    
dot::MatType_Length::(32SC1, 128)  0.033   0.024      0.018       1.39       1.88    
dot::MatType_Length::(32SC1, 256)  0.153   0.116      0.109       1.32       1.40    
dot::MatType_Length::(32SC1, 512)  0.537   0.396      0.397       1.35       1.35    
dot::MatType_Length::(32SC1, 1024) 3.015   2.397      2.306       1.26       1.31    
dot::MatType_Length::(32FC1, 32)   0.001   0.001      0.001       1.29       1.29    
dot::MatType_Length::(32FC1, 64)   0.006   0.004      0.004       1.59       1.57    
dot::MatType_Length::(32FC1, 128)  0.023   0.026      0.025       0.89       0.94    
dot::MatType_Length::(32FC1, 256)  0.098   0.113      0.115       0.87       0.85    
dot::MatType_Length::(32FC1, 512)  0.380   0.395      0.396       0.96       0.96    
dot::MatType_Length::(32FC1, 1024) 1.882   1.899      1.955       0.99       0.96

More: perf-dotprod.zip

asmorkalov self-assigned this Apr 10, 2025
asmorkalov merged commit 11e46cd into opencv:4.x Apr 21, 2025
53 of 55 checks passed
asmorkalov added the "backport is needed" label Apr 21, 2025
fengyuentau deleted the 4x/hal_rvv/dotprod branch April 22, 2025 08:04
asmorkalov mentioned this pull request Apr 29, 2025
Labels

backport is needed (label for maintainers; authors of PR can ignore this)
category: core
optimization
platform: riscv

Projects

Status: Done

2 participants