Backport to 4.x: patchNaNs() SIMD acceleration #24480

Merged
asmorkalov merged 7 commits into opencv:4.x from savuor:backport_patch_nans
Nov 3, 2023

@savuor
Contributor

@savuor savuor commented Nov 2, 2023

backport from #23098
connected PR in extra: #1118@extra

This PR contains:

  • new SIMD code for patchNaNs()
  • CPU perf test
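
For orientation, here is a minimal scalar sketch of the operation being accelerated. It is not the code from this PR: `patchNaNsScalar` is a hypothetical helper name, and the bit test is the standard IEEE 754 single-precision NaN check (exponent bits all ones, non-zero mantissa). A SIMD version would apply the same comparison to several lanes per iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Hypothetical helper (not OpenCV's API): replaces every NaN in buf with val.
// A 32-bit float is NaN when, ignoring the sign bit, its bit pattern is
// strictly greater than that of +infinity (0x7f800000).
inline void patchNaNsScalar(float* buf, std::size_t len, float val)
{
    for (std::size_t i = 0; i < len; ++i)
    {
        std::uint32_t bits;
        std::memcpy(&bits, &buf[i], sizeof(bits)); // well-defined way to read the bits
        if ((bits & 0x7fffffffu) > 0x7f800000u)
            buf[i] = val;
    }
}
```

Infinities are left untouched by this test, because their masked bit pattern equals `0x7f800000` rather than exceeding it.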
Performance comparison

Geometric mean (ms)

|Name of Test|noopt|sse2|avx2|sse2 vs noopt (x-factor)|avx2 vs noopt (x-factor)|
|---|:-:|:-:|:-:|:-:|:-:|
|PatchNaNs::OCL_PatchNaNsFixture::(640x480, 32FC1)|0.019|0.017|0.018|1.11|1.07|
|PatchNaNs::OCL_PatchNaNsFixture::(640x480, 32FC4)|0.037|0.037|0.033|1.00|1.10|
|PatchNaNs::OCL_PatchNaNsFixture::(1280x720, 32FC1)|0.032|0.032|0.033|0.99|0.98|
|PatchNaNs::OCL_PatchNaNsFixture::(1280x720, 32FC4)|0.072|0.072|0.070|1.00|1.03|
|PatchNaNs::OCL_PatchNaNsFixture::(1920x1080, 32FC1)|0.051|0.051|0.050|1.00|1.01|
|PatchNaNs::OCL_PatchNaNsFixture::(1920x1080, 32FC4)|0.137|0.138|0.128|0.99|1.06|
|PatchNaNs::OCL_PatchNaNsFixture::(3840x2160, 32FC1)|0.137|0.128|0.129|1.07|1.06|
|PatchNaNs::OCL_PatchNaNsFixture::(3840x2160, 32FC4)|0.450|0.450|0.448|1.00|1.01|
|PatchNaNs::PatchNaNsFixture::(640x480, 32FC1)|0.149|0.029|0.020|5.13|7.44|
|PatchNaNs::PatchNaNsFixture::(640x480, 32FC2)|0.304|0.058|0.040|5.25|7.65|
|PatchNaNs::PatchNaNsFixture::(640x480, 32FC3)|0.448|0.086|0.059|5.22|7.55|
|PatchNaNs::PatchNaNsFixture::(640x480, 32FC4)|0.601|0.133|0.083|4.51|7.23|
|PatchNaNs::PatchNaNsFixture::(1280x720, 32FC1)|0.451|0.093|0.060|4.83|7.52|
|PatchNaNs::PatchNaNsFixture::(1280x720, 32FC2)|0.892|0.184|0.126|4.85|7.06|
|PatchNaNs::PatchNaNsFixture::(1280x720, 32FC3)|1.345|0.311|0.230|4.32|5.84|
|PatchNaNs::PatchNaNsFixture::(1280x720, 32FC4)|1.831|0.546|0.436|3.35|4.20|
|PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC1)|1.017|0.250|0.160|4.06|6.35|
|PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC2)|2.077|0.646|0.605|3.21|3.43|
|PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC3)|3.134|1.053|0.961|2.97|3.26|
|PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC4)|4.222|1.436|1.288|2.94|3.28|
|PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC1)|4.225|1.401|1.277|3.01|3.31|
|PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC2)|8.310|2.953|2.635|2.81|3.15|
|PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC3)|12.396|4.455|4.252|2.78|2.92|
|PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC4)|17.174|5.831|5.824|2.95|2.95|
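
To make the columns easier to read: the x-factor values appear to be the baseline time divided by the optimized time (for the last row, 17.174 / 5.831 ≈ 2.95), and each cell is a geometric mean of timing samples. The sketch below illustrates both computations. The function names are hypothetical and this is not the OpenCV perf-test tooling, only the arithmetic behind it.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of positive timing samples, computed through logarithms
// so that the product of many samples cannot overflow.
// Assumes samplesMs is non-empty and every sample is > 0.
inline double geometricMean(const std::vector<double>& samplesMs)
{
    double logSum = 0.0;
    for (double s : samplesMs)
        logSum += std::log(s);
    return std::exp(logSum / static_cast<double>(samplesMs.size()));
}

// Speed-up factor: values above 1 mean the optimized build is faster.
inline double xFactor(double baselineMs, double optimizedMs)
{
    return baselineMs / optimizedMs;
}
```

Small discrepancies against the table (for example 0.149 / 0.029 gives about 5.14 while the table shows 5.13) are consistent with the factors having been computed from unrounded timings.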

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@asmorkalov
Contributor

C:\build\precommit_windows64\4.x\opencv\modules\core\src\mathfuncs.cpp(1613): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\build\precommit_windows64\build\modules\core\opencv_core.vcxproj]
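
MSVC emits C4267 when a `size_t` value is implicitly narrowed to `int`. One common way to silence it without hiding a real overflow is an explicit, range-checked conversion. The sketch below shows that pattern; `checkedSizeToInt` is a hypothetical helper, and this is not necessarily how the warning was resolved in this PR.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: converts size_t to int explicitly, so the compiler
// no longer warns, and fails loudly if the value does not fit.
inline int checkedSizeToInt(std::size_t n)
{
    if (n > static_cast<std::size_t>(INT_MAX))
        throw std::overflow_error("size does not fit in int");
    return static_cast<int>(n);
}
```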

@asmorkalov
Contributor

armv7 NEON performance numbers (jetson-tk1) look very good:

Geometric mean (ms)

|Name of Test|baseline-2|NEON-2|NEON-2 vs baseline-2 (x-factor)|
|---|:-:|:-:|:-:|
|PatchNaNs::OCL_PatchNaNsFixture::(640x480, 32FC1)|1.237|0.313|3.96|
|PatchNaNs::OCL_PatchNaNsFixture::(640x480, 32FC4)|5.097|1.524|3.34|
|PatchNaNs::OCL_PatchNaNsFixture::(1280x720, 32FC1)|3.819|1.124|3.40|
|PatchNaNs::OCL_PatchNaNsFixture::(1280x720, 32FC4)|15.205|4.619|3.29|
|PatchNaNs::OCL_PatchNaNsFixture::(1920x1080, 32FC1)|8.574|2.599|3.30|
|PatchNaNs::OCL_PatchNaNsFixture::(1920x1080, 32FC4)|34.248|10.403|3.29|
|PatchNaNs::OCL_PatchNaNsFixture::(3840x2160, 32FC1)|34.562|10.377|3.33|
|PatchNaNs::OCL_PatchNaNsFixture::(3840x2160, 32FC4)|136.766|41.367|3.31|
|PatchNaNs::PatchNaNsFixture::(640x480, 32FC1)|1.232|0.310|3.98|
|PatchNaNs::PatchNaNsFixture::(640x480, 32FC2)|2.541|0.710|3.58|
|PatchNaNs::PatchNaNsFixture::(640x480, 32FC3)|3.818|1.111|3.44|
|PatchNaNs::PatchNaNsFixture::(640x480, 32FC4)|5.081|1.520|3.34|
|PatchNaNs::PatchNaNsFixture::(1280x720, 32FC1)|3.798|1.108|3.43|
|PatchNaNs::PatchNaNsFixture::(1280x720, 32FC2)|7.601|2.318|3.28|
|PatchNaNs::PatchNaNsFixture::(1280x720, 32FC3)|11.416|3.465|3.29|
|PatchNaNs::PatchNaNsFixture::(1280x720, 32FC4)|15.216|4.607|3.30|
|PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC1)|8.569|2.586|3.31|
|PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC2)|17.104|5.200|3.29|
|PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC3)|25.602|7.817|3.28|
|PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC4)|34.205|10.369|3.30|
|PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC1)|34.198|10.350|3.30|
|PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC2)|68.387|20.706|3.30|
|PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC3)|102.523|31.008|3.31|
|PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC4)|136.734|41.357|3.31|

Contributor

@opencv-alalek opencv-alalek left a comment

LGTM 👍 Thank you!

Comment on lines -1611 to -1613
int* ptrs[1] = {};
int32_t* ptrs[1] = {};
NAryMatIterator it(arrays, (uchar**)ptrs);
size_t len = it.size*a.channels();

Contributor

It's not relevant. I propose to revert.

Contributor Author

  1. OK about pointers
  2. The loop condition should have the form j < loop_invariant, and that bound has to be computed in signed integers to avoid unsigned wraparound below zero, so I propose making len and j ints rather than size_t
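
The wraparound concern in point 2 can be illustrated with a small sketch. This is not the code from `mathfuncs.cpp`: `lanes` stands in for the vector width, and both function names are hypothetical. With signed arithmetic the precomputed bound goes negative for short inputs and the loop body never runs; with `size_t` the same subtraction wraps to a huge value.

```cpp
#include <cassert>
#include <cstddef>

// Counts full vector-width steps with a loop of the form j < loop_invariant,
// where the invariant is computed once in signed arithmetic.
inline int fullStepsSigned(int len, int lanes)
{
    const int end = len - lanes + 1; // negative or zero when len < lanes
    int steps = 0;
    for (int j = 0; j < end; j += lanes)
        ++steps;
    return steps;
}

// The same subtraction in size_t: when len < lanes the result wraps around
// to a value near SIZE_MAX instead of going negative, so a loop bounded by
// it would run far past the end of the buffer.
inline std::size_t unsignedBound(std::size_t len, std::size_t lanes)
{
    return len - lanes;
}
```

For example, `fullStepsSigned(3, 8)` performs no iterations, while `unsignedBound(3, 8)` yields a value far larger than the buffer length.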

@asmorkalov
Contributor

C:\build\precommit_windows64\4.x\opencv\modules\core\src\mathfuncs.cpp(1613): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data [C:\build\precommit_windows64\build\modules\core\opencv_core.vcxproj]

Rostislav Vasilikhin added 2 commits November 2, 2023 17:37
Contributor

@asmorkalov asmorkalov left a comment

👍

@asmorkalov asmorkalov merged commit ea47cb3 into opencv:4.x Nov 3, 2023
@savuor savuor deleted the backport_patch_nans branch November 3, 2023 15:46
IskXCr pushed a commit to Haosonn/opencv that referenced this pull request Dec 20, 2023
thewoz pushed a commit to thewoz/opencv that referenced this pull request Jan 4, 2024
@asmorkalov asmorkalov mentioned this pull request Jan 19, 2024
thewoz pushed a commit to thewoz/opencv that referenced this pull request May 29, 2024