finiteMask() and doubles for patchNaNs() #23098

Merged
asmorkalov merged 100 commits into opencv:5.x from savuor:nanMask
Nov 9, 2023
Contributor

@savuor savuor commented Jan 5, 2023

Related to #22826
Connected PR in extra: #1037@extra

TODOs:

  • Vectorize finiteMask() for 64FC3 and 64FC4

Changes

This PR:

  • adds a new function finiteMask()
  • extends patchNaNs() with CV_64F support
  • moves patchNaNs() and finiteMask() to a separate file

NOTE: the function is now called finiteMask(), as discussed with the OpenCV core team.
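For readers unfamiliar with the two functions, here is a minimal scalar sketch of the behaviour described above, assuming that the mask holds one byte per pixel set to 255 when every channel is finite, and that patching replaces NaNs only. The names `finiteMaskRef` and `patchNaNsRef` are hypothetical and use plain `std::vector` buffers instead of `cv::Mat`; this is not the code from the PR.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Scalar reference (hypothetical helper): one mask byte per pixel,
// 255 when every channel of that pixel is finite, 0 otherwise.
template <typename T>
std::vector<std::uint8_t> finiteMaskRef(const std::vector<T>& src, std::size_t cn)
{
    const std::size_t pixels = src.size() / cn;
    std::vector<std::uint8_t> mask(pixels);
    for (std::size_t i = 0; i < pixels; ++i)
    {
        bool finite = true;
        for (std::size_t c = 0; c < cn; ++c)
            finite = finite && std::isfinite(src[i * cn + c]);
        mask[i] = finite ? 255 : 0;
    }
    return mask;
}

// Scalar reference (hypothetical helper): replace every NaN in place.
// Infinite values are left untouched.
template <typename T>
void patchNaNsRef(std::vector<T>& data, T val)
{
    for (T& x : data)
        if (std::isnan(x))
            x = val;
}
```

Because both helpers are templates over the element type, the same sketch covers the float case and the double case that this PR adds to patchNaNs().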

Performance comparison

Geometric mean (ms)

|Name of Test|noopt|sse2|default|sse2 vs noopt (x-factor)|default vs noopt (x-factor)|
|---|:-:|:-:|:-:|:-:|:-:|
|FiniteMask::FiniteMaskFixture::(640x480, 32FC1)|0.066|0.021|0.025|3.09|2.67|
|FiniteMask::FiniteMaskFixture::(640x480, 64FC1)|0.189|0.065|0.047|2.89|4.04|
|FiniteMask::FiniteMaskFixture::(640x480, 32FC2)|0.278|0.052|0.058|5.36|4.82|
|FiniteMask::FiniteMaskFixture::(640x480, 64FC2)|0.279|0.157|0.100|1.78|2.79|
|FiniteMask::FiniteMaskFixture::(640x480, 32FC3)|0.284|0.189|0.244|1.50|1.16|
|FiniteMask::FiniteMaskFixture::(640x480, 64FC3)|0.298|0.174|0.266|1.71|1.12|
|FiniteMask::FiniteMaskFixture::(640x480, 32FC4)|0.290|0.092|0.097|3.14|2.99|
|FiniteMask::FiniteMaskFixture::(640x480, 64FC4)|0.343|0.317|0.242|1.08|1.42|
|FiniteMask::FiniteMaskFixture::(1280x720, 32FC1)|0.201|0.066|0.068|3.04|2.98|
|FiniteMask::FiniteMaskFixture::(1280x720, 64FC1)|0.914|0.224|0.165|4.07|5.54|
|FiniteMask::FiniteMaskFixture::(1280x720, 32FC2)|0.843|0.196|0.176|4.30|4.78|
|FiniteMask::FiniteMaskFixture::(1280x720, 64FC2)|0.936|0.676|0.566|1.38|1.66|
|FiniteMask::FiniteMaskFixture::(1280x720, 32FC3)|0.901|0.637|0.780|1.42|1.16|
|FiniteMask::FiniteMaskFixture::(1280x720, 64FC3)|1.118|0.923|1.147|1.21|0.97|
|FiniteMask::FiniteMaskFixture::(1280x720, 32FC4)|0.982|0.536|0.529|1.83|1.86|
|FiniteMask::FiniteMaskFixture::(1280x720, 64FC4)|1.396|1.352|1.304|1.03|1.07|
|FiniteMask::FiniteMaskFixture::(1920x1080, 32FC1)|0.477|0.246|0.206|1.94|2.32|
|FiniteMask::FiniteMaskFixture::(1920x1080, 64FC1)|1.660|0.745|0.683|2.23|2.43|
|FiniteMask::FiniteMaskFixture::(1920x1080, 32FC2)|1.938|0.707|1.092|2.74|1.77|
|FiniteMask::FiniteMaskFixture::(1920x1080, 64FC2)|2.202|1.658|1.612|1.33|1.37|
|FiniteMask::FiniteMaskFixture::(1920x1080, 32FC3)|2.117|1.521|1.786|1.39|1.19|
|FiniteMask::FiniteMaskFixture::(1920x1080, 64FC3)|2.603|2.277|2.622|1.14|0.99|
|FiniteMask::FiniteMaskFixture::(1920x1080, 32FC4)|2.282|1.487|1.496|1.53|1.52|
|FiniteMask::FiniteMaskFixture::(1920x1080, 64FC4)|3.247|3.142|2.866|1.03|1.13|
|FiniteMask::FiniteMaskFixture::(3840x2160, 32FC1)|2.397|2.387|2.132|1.00|1.12|
|FiniteMask::FiniteMaskFixture::(3840x2160, 64FC1)|10.340|3.801|3.422|2.72|3.02|
|FiniteMask::FiniteMaskFixture::(3840x2160, 32FC2)|7.811|3.759|3.421|2.08|2.28|
|FiniteMask::FiniteMaskFixture::(3840x2160, 64FC2)|8.708|7.136|6.361|1.22|1.37|
|FiniteMask::FiniteMaskFixture::(3840x2160, 32FC3)|8.577|6.366|7.692|1.35|1.12|
|FiniteMask::FiniteMaskFixture::(3840x2160, 64FC3)|11.015|9.593|11.396|1.15|0.97|
|FiniteMask::FiniteMaskFixture::(3840x2160, 32FC4)|9.330|6.539|6.451|1.43|1.45|
|FiniteMask::FiniteMaskFixture::(3840x2160, 64FC4)|13.350|12.691|12.341|1.05|1.08|
|FiniteMask::OCL_FiniteMaskFixture::(640x480, 32FC1)|0.017|0.016|0.016|1.04|1.02|
|FiniteMask::OCL_FiniteMaskFixture::(640x480, 64FC1)|0.016|0.022|0.017|0.73|0.92|
|FiniteMask::OCL_FiniteMaskFixture::(640x480, 32FC3)|0.025|0.025|0.027|1.00|0.93|
|FiniteMask::OCL_FiniteMaskFixture::(640x480, 64FC3)|0.039|0.036|0.045|1.08|0.87|
|FiniteMask::OCL_FiniteMaskFixture::(640x480, 32FC4)|0.030|0.029|0.029|1.05|1.02|
|FiniteMask::OCL_FiniteMaskFixture::(640x480, 64FC4)|0.045|0.051|0.047|0.88|0.96|
|FiniteMask::OCL_FiniteMaskFixture::(1280x720, 32FC1)|0.033|0.033|0.033|1.01|0.99|
|FiniteMask::OCL_FiniteMaskFixture::(1280x720, 64FC1)|0.044|0.045|0.043|0.98|1.02|
|FiniteMask::OCL_FiniteMaskFixture::(1280x720, 32FC3)|0.056|0.057|0.054|0.98|1.02|
|FiniteMask::OCL_FiniteMaskFixture::(1280x720, 64FC3)|0.090|0.091|0.092|0.99|0.98|
|FiniteMask::OCL_FiniteMaskFixture::(1280x720, 32FC4)|0.067|0.066|0.068|1.01|0.99|
|FiniteMask::OCL_FiniteMaskFixture::(1280x720, 64FC4)|0.113|0.115|0.114|0.98|0.99|
|FiniteMask::OCL_FiniteMaskFixture::(1920x1080, 32FC1)|0.052|0.048|0.053|1.10|0.99|
|FiniteMask::OCL_FiniteMaskFixture::(1920x1080, 64FC1)|0.077|0.078|0.076|0.98|1.01|
|FiniteMask::OCL_FiniteMaskFixture::(1920x1080, 32FC3)|0.101|0.101|0.101|1.00|1.00|
|FiniteMask::OCL_FiniteMaskFixture::(1920x1080, 64FC3)|0.182|0.181|0.182|1.01|1.00|
|FiniteMask::OCL_FiniteMaskFixture::(1920x1080, 32FC4)|0.129|0.127|0.129|1.02|1.00|
|FiniteMask::OCL_FiniteMaskFixture::(1920x1080, 64FC4)|0.231|0.231|0.232|1.00|0.99|
|FiniteMask::OCL_FiniteMaskFixture::(3840x2160, 32FC1)|0.152|0.154|0.154|0.99|0.99|
|FiniteMask::OCL_FiniteMaskFixture::(3840x2160, 64FC1)|0.250|0.250|0.251|1.00|1.00|
|FiniteMask::OCL_FiniteMaskFixture::(3840x2160, 32FC3)|0.355|0.353|0.354|1.00|1.00|
|FiniteMask::OCL_FiniteMaskFixture::(3840x2160, 64FC3)|0.661|0.661|0.660|1.00|1.00|
|FiniteMask::OCL_FiniteMaskFixture::(3840x2160, 32FC4)|0.455|0.455|0.456|1.00|1.00|
|FiniteMask::OCL_FiniteMaskFixture::(3840x2160, 64FC4)|0.867|0.866|0.866|1.00|1.00|
|PatchNaNs::OCL_PatchNaNsFixture::(640x480, 32FC1)|0.018|0.018|0.019|1.01|0.95|
|PatchNaNs::OCL_PatchNaNsFixture::(640x480, 64FC1)|0.029|0.026|0.027|1.10|1.06|
|PatchNaNs::OCL_PatchNaNsFixture::(640x480, 32FC3)|0.032|0.034|0.032|0.96|1.01|
|PatchNaNs::OCL_PatchNaNsFixture::(640x480, 64FC3)|0.041|0.041|0.041|1.00|0.99|
|PatchNaNs::OCL_PatchNaNsFixture::(640x480, 32FC4)|0.035|0.035|0.032|0.99|1.11|
|PatchNaNs::OCL_PatchNaNsFixture::(640x480, 64FC4)|0.049|0.048|0.047|1.03|1.04|
|PatchNaNs::OCL_PatchNaNsFixture::(1280x720, 32FC1)|0.032|0.032|0.030|1.00|1.08|
|PatchNaNs::OCL_PatchNaNsFixture::(1280x720, 64FC1)|0.043|0.042|0.043|1.02|0.98|
|PatchNaNs::OCL_PatchNaNsFixture::(1280x720, 32FC3)|0.059|0.054|0.059|1.08|0.99|
|PatchNaNs::OCL_PatchNaNsFixture::(1280x720, 64FC3)|0.087|0.086|0.085|1.01|1.02|
|PatchNaNs::OCL_PatchNaNsFixture::(1280x720, 32FC4)|0.072|0.066|0.071|1.08|1.01|
|PatchNaNs::OCL_PatchNaNsFixture::(1280x720, 64FC4)|0.110|0.108|0.110|1.02|1.00|
|PatchNaNs::OCL_PatchNaNsFixture::(1920x1080, 32FC1)|0.047|0.047|0.047|1.00|1.01|
|PatchNaNs::OCL_PatchNaNsFixture::(1920x1080, 64FC1)|0.069|0.070|0.070|1.00|1.00|
|PatchNaNs::OCL_PatchNaNsFixture::(1920x1080, 32FC3)|0.103|0.103|0.103|1.00|0.99|
|PatchNaNs::OCL_PatchNaNsFixture::(1920x1080, 64FC3)|0.171|0.168|0.171|1.02|1.00|
|PatchNaNs::OCL_PatchNaNsFixture::(1920x1080, 32FC4)|0.128|0.129|0.128|0.99|1.00|
|PatchNaNs::OCL_PatchNaNsFixture::(1920x1080, 64FC4)|0.220|0.221|0.223|1.00|0.99|
|PatchNaNs::OCL_PatchNaNsFixture::(3840x2160, 32FC1)|0.128|0.127|0.128|1.01|1.00|
|PatchNaNs::OCL_PatchNaNsFixture::(3840x2160, 64FC1)|0.221|0.222|0.222|0.99|0.99|
|PatchNaNs::OCL_PatchNaNsFixture::(3840x2160, 32FC3)|0.343|0.341|0.346|1.01|0.99|
|PatchNaNs::OCL_PatchNaNsFixture::(3840x2160, 64FC3)|0.626|0.626|0.625|1.00|1.00|
|PatchNaNs::OCL_PatchNaNsFixture::(3840x2160, 32FC4)|0.452|0.452|0.454|1.00|0.99|
|PatchNaNs::OCL_PatchNaNsFixture::(3840x2160, 64FC4)|0.826|0.826|0.827|1.00|1.00|
|PatchNaNs::PatchNaNsFixture::(640x480, 32FC1)|0.152|0.028|0.017|5.33|9.03|
|PatchNaNs::PatchNaNsFixture::(640x480, 64FC1)|0.226|0.079|0.043|2.85|5.28|
|PatchNaNs::PatchNaNsFixture::(640x480, 32FC2)|0.305|0.058|0.033|5.28|9.13|
|PatchNaNs::PatchNaNsFixture::(640x480, 64FC2)|0.456|0.158|0.086|2.89|5.30|
|PatchNaNs::PatchNaNsFixture::(640x480, 32FC3)|0.454|0.087|0.050|5.19|9.06|
|PatchNaNs::PatchNaNsFixture::(640x480, 64FC3)|0.697|0.250|0.131|2.79|5.31|
|PatchNaNs::PatchNaNsFixture::(640x480, 32FC4)|0.603|0.119|0.068|5.08|8.85|
|PatchNaNs::PatchNaNsFixture::(640x480, 64FC4)|0.934|0.347|0.205|2.69|4.55|
|PatchNaNs::PatchNaNsFixture::(1280x720, 32FC1)|0.458|0.088|0.050|5.21|9.19|
|PatchNaNs::PatchNaNsFixture::(1280x720, 64FC1)|0.702|0.253|0.130|2.78|5.40|
|PatchNaNs::PatchNaNsFixture::(1280x720, 32FC2)|0.915|0.187|0.136|4.90|6.72|
|PatchNaNs::PatchNaNsFixture::(1280x720, 64FC2)|1.435|0.637|0.493|2.25|2.91|
|PatchNaNs::PatchNaNsFixture::(1280x720, 32FC3)|1.377|0.330|0.206|4.17|6.68|
|PatchNaNs::PatchNaNsFixture::(1280x720, 64FC3)|2.164|1.006|0.809|2.15|2.67|
|PatchNaNs::PatchNaNsFixture::(1280x720, 32FC4)|1.940|0.545|0.452|3.56|4.29|
|PatchNaNs::PatchNaNsFixture::(1280x720, 64FC4)|2.856|1.365|1.094|2.09|2.61|
|PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC1)|1.091|0.237|0.123|4.61|8.88|
|PatchNaNs::PatchNaNsFixture::(1920x1080, 64FC1)|1.638|1.062|0.562|1.54|2.91|
|PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC2)|2.155|0.648|0.563|3.32|3.83|
|PatchNaNs::PatchNaNsFixture::(1920x1080, 64FC2)|3.276|1.531|1.252|2.14|2.62|
|PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC3)|3.229|1.021|0.893|3.16|3.62|
|PatchNaNs::PatchNaNsFixture::(1920x1080, 64FC3)|4.851|2.313|1.891|2.10|2.57|
|PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC4)|4.264|1.343|1.238|3.17|3.45|
|PatchNaNs::PatchNaNsFixture::(1920x1080, 64FC4)|6.450|3.054|2.546|2.11|2.53|
|PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC1)|4.224|1.340|1.205|3.15|3.51|
|PatchNaNs::PatchNaNsFixture::(3840x2160, 64FC1)|6.409|3.092|2.549|2.07|2.51|
|PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC2)|8.320|2.762|2.511|3.01|3.31|
|PatchNaNs::PatchNaNsFixture::(3840x2160, 64FC2)|12.777|6.285|5.283|2.03|2.42|
|PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC3)|12.697|4.281|3.833|2.97|3.31|
|PatchNaNs::PatchNaNsFixture::(3840x2160, 64FC3)|19.309|9.636|7.945|2.00|2.43|
|PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC4)|16.848|5.745|5.197|2.93|3.24|
|PatchNaNs::PatchNaNsFixture::(3840x2160, 64FC4)|25.701|12.955|10.637|1.98|2.42|

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@savuor savuor marked this pull request as ready for review January 13, 2023 01:24
@asmorkalov asmorkalov self-requested a review January 13, 2023 07:23
Contributor Author

savuor commented Jan 17, 2023

@alalek @vpisarev It looks like there is a bug in cvIsInf(double): it sometimes reports NaNs as Inf.

The Inf bug was introduced in PR #15370.

This PR provides a fix & regression test.
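The comment does not spell out the failure mode. One way a bit-level Inf test can mistake a NaN for Inf is by ignoring part of the mantissa; the sketch below illustrates that class of bug on raw IEEE 754 bit patterns. The function names are hypothetical, and `isInfUpperWordOnly` is an assumed faulty check written for illustration, not a quote from the OpenCV sources.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Copy the raw IEEE 754 bit pattern of a double into a 64-bit integer.
inline std::uint64_t bitsOf(double v)
{
    std::uint64_t u;
    std::memcpy(&u, &v, sizeof u);
    return u;
}

// Inf: exponent bits all ones and the full 52-bit mantissa zero (sign ignored).
inline bool isInfBits(std::uint64_t bits)
{
    return (bits & 0x7fffffffffffffffULL) == 0x7ff0000000000000ULL;
}

// NaN: exponent bits all ones and a non-zero mantissa (sign ignored).
inline bool isNaNBits(std::uint64_t bits)
{
    return (bits & 0x7fffffffffffffffULL) > 0x7ff0000000000000ULL;
}

// Hypothetical faulty check: it inspects only the upper 32 bits, so a NaN
// whose mantissa bits all lie in the lower 32 bits looks like Inf.
inline bool isInfUpperWordOnly(std::uint64_t bits)
{
    return (bits & 0x7fffffff00000000ULL) == 0x7ff0000000000000ULL;
}

// Bit pattern of a NaN whose only set mantissa bit is the lowest one.
const std::uint64_t kNaNLowPayload = 0x7ff0000000000001ULL;
```

A regression test for this kind of bug would need NaN inputs with payloads in the low mantissa bits, since the default quiet NaN sets a high mantissa bit and passes both checks.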

@savuor savuor mentioned this pull request Jan 17, 2023
6 tasks
Contributor

@asmorkalov asmorkalov left a comment


👍

Contributor Author

savuor commented Feb 3, 2023

After discussion with the OpenCV core team, we decided to provide finiteMask() instead of nanMask() and to cut the other features.

@vpisarev vpisarev requested a review from asmorkalov February 10, 2023 08:42
@savuor savuor changed the title nanMask() and doubles for patchNaNs() finiteMask() and doubles for patchNaNs() Feb 10, 2023
Contributor

@asmorkalov asmorkalov left a comment


👍

{
CV_INSTRUMENT_REGION();

int channels = _img.channels();
Member


channels=5 doesn't throw any exception and does nothing.

Contributor Author


Fixed

Comment on lines +1893 to +1854
switch (channels)
{
    case 1: finiteMask_<float, 1>((const float*)sptr, dptr, total); break;
    case 2: finiteMask_<float, 2>((const float*)sptr, dptr, total); break;
    case 3: finiteMask_<float, 3>((const float*)sptr, dptr, total); break;
    case 4: finiteMask_<float, 4>((const float*)sptr, dptr, total); break;
}
Member


The channels value is not validated at all.
For channels=5 the function does nothing and doesn't throw any exception.

Contributor Author


Fixed
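The thread does not show how the fix was made. A minimal sketch of one way to close the gap, assuming a `default:` branch that rejects unsupported channel counts, is shown below. `finiteMaskKernel` and `finiteMaskDispatch` are hypothetical names, and `std::invalid_argument` stands in for whatever error mechanism the real code uses.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Illustrative kernel with a compile-time channel count:
// writes 255 for a pixel whose channels are all finite, 0 otherwise.
template <typename T, int cn>
void finiteMaskKernel(const T* src, std::uint8_t* dst, std::size_t total)
{
    for (std::size_t i = 0; i < total; ++i)
    {
        bool finite = true;
        for (int c = 0; c < cn; ++c)
            finite = finite && std::isfinite(src[i * cn + c]);
        dst[i] = finite ? 255 : 0;
    }
}

// Runtime dispatch: unsupported channel counts raise an error
// instead of silently leaving dst untouched.
inline void finiteMaskDispatch(const float* src, std::uint8_t* dst,
                               std::size_t total, int channels)
{
    switch (channels)
    {
    case 1: finiteMaskKernel<float, 1>(src, dst, total); break;
    case 2: finiteMaskKernel<float, 2>(src, dst, total); break;
    case 3: finiteMaskKernel<float, 3>(src, dst, total); break;
    case 4: finiteMaskKernel<float, 4>(src, dst, total); break;
    default:
        throw std::invalid_argument("finiteMaskDispatch: channels must be in [1, 4]");
    }
}
```

Validating once at the public entry point, before the type switch, would serve the same purpose and avoid repeating the check per element type.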

}
}

#if CV_SIMD
Member


SIMD optimizations in core module should go to .simd.hpp.

Contributor Author


Should I add it through HAL and the full SIMD dispatching mechanism, as is done for the other functions in mathfuncs_core.simd.hpp, or is there an easier way?

Member


Other functions should be handled separately (do not touch them in this PR).

First, we need to collect performance data for the added code under different ISA optimizations: https://github.com/opencv/opencv/wiki/CPU-optimizations-build-options#optimization-developer-guide

Comment on lines +1662 to +1663
#if !CV_SIMD128_64F
v_int64 mask10 = vx_setall_s64(0xffffffff00000000);
Member


CV_SIMD128_64F

64F refers to the double (float64) type.
Using it to guard int64 processing is wrong.

Contributor Author


Sure, but this is how int64 comparison is enabled in the NEON universal intrinsics: intrin_neon.hpp

Member


Currently, comparison of 64-bit integer SIMD vectors is declared as unsupported:

https://github.com/opencv/opencv/blame/4.7.0/modules/core/include/opencv2/core/hal/intrin_cpp.hpp#L885

"For all types except 64-bit integer values."

No idea why NEON hijacks that and provides an implementation (only for v_uint64x2, but not for signed v_int64x2).
It was probably added by mistake here: #7175 (the patch should target 64F only).
There is also a contributed test for 64-bit eq/ne here: #15738 (with a discussion of the misused macro).

Perhaps we need to allow and implement this support, at least for eq/ne (==/!=) comparisons, in all SIMD backends.

/cc @mshabunin @vpisarev

Contributor Author


Since this now works without workarounds, I've rewritten it in a more convenient way.
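For context on the workaround discussed in this thread: when a backend offers no 64-bit integer equality compare, one common approach is to compare the two 32-bit halves and combine the results. The scalar model below sketches that idea with hypothetical helper names; it is an illustration of the technique, not the code that was in the PR.

```cpp
#include <cassert>
#include <cstdint>

// All-ones or all-zeros mask from a 32-bit equality test,
// mimicking what a SIMD lane compare returns.
inline std::uint32_t eqMask32(std::uint32_t a, std::uint32_t b)
{
    return a == b ? 0xffffffffu : 0u;
}

// 64-bit equality built from two 32-bit lane compares:
// the result is all ones only when both halves match.
inline std::uint64_t eqMask64From32(std::uint64_t a, std::uint64_t b)
{
    const std::uint32_t lo = eqMask32(static_cast<std::uint32_t>(a),
                                      static_cast<std::uint32_t>(b));
    const std::uint32_t hi = eqMask32(static_cast<std::uint32_t>(a >> 32),
                                      static_cast<std::uint32_t>(b >> 32));
    const std::uint32_t both = lo & hi;
    // Broadcast the combined 32-bit mask to the full 64-bit lane.
    return (static_cast<std::uint64_t>(both) << 32) | both;
}
```

In a real SIMD version the "combine the halves" step costs an extra shuffle and AND per vector, which is part of why native 64-bit eq/ne support in every backend is attractive.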

Contributor

@asmorkalov asmorkalov left a comment


👍 Tested manually with ARMv7, x86_64 desktop and RISC-V RVV.

@asmorkalov
Contributor

@savuor Please rebase and fix the conflict.

{
// v_select is not available for v_int64, emulating it
v_int64 v_dst0 = v_or(v_and(v_cmp_mask0, v_val), v_and(v_not(v_cmp_mask0), v_src0));
v_int64 v_dst1 = v_or(v_and(v_cmp_mask1, v_val), v_and(v_not(v_cmp_mask1), v_src1));
Contributor


// v_select is not available for v_int64, emulating it

reinterpret + v_select should be faster than the provided emulation.

BTW, it makes sense to provide such an implementation in a single place (HAL) /cc @vpisarev

Contributor Author


It really gives a +10% to +30% performance gain, thanks!
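The emulation quoted in the snippet above is a bitwise select: keep the bits of one operand where the mask is set and the bits of the other where it is clear. A scalar model of that formula, applied to NaN patching on raw double bit patterns, might look like the sketch below. The helper names are hypothetical, and the NaN test on bits is an assumption made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Bitwise select: bits of a where mask is 1, bits of b where mask is 0.
// Same shape as v_or(v_and(mask, a), v_and(v_not(mask), b)).
inline std::uint64_t selectBits(std::uint64_t mask, std::uint64_t a, std::uint64_t b)
{
    return (mask & a) | (~mask & b);
}

// Replace the bit pattern of a double with valBits when it encodes a NaN.
inline std::uint64_t patchIfNaN(std::uint64_t srcBits, std::uint64_t valBits)
{
    // NaN: exponent all ones and a non-zero mantissa (sign ignored).
    const bool isNaN = (srcBits & 0x7fffffffffffffffULL) > 0x7ff0000000000000ULL;
    // A lane compare yields all ones or all zeros; model that here.
    const std::uint64_t mask = isNaN ? ~std::uint64_t(0) : std::uint64_t(0);
    return selectBits(mask, valBits, srcBits);
}
```

A dedicated select instruction does the same work in one operation instead of three, which is consistent with the speedup reported in the reply.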

template <typename _Tp, int cn>
void finiteMask_(const uchar *src, uchar *dst, size_t total)
{
size_t i = 0;
Contributor


This is an externally exposed function (through getFiniteMaskFunc).
CV_INSTRUMENT_REGION() is required here to inject vzeroupper.

Contributor Author


Fixed

asmorkalov pushed a commit that referenced this pull request Nov 3, 2023
Backport to 4.x: patchNaNs() SIMD acceleration #24480

backport from #23098
connected PR in extra: [#1118@extra](opencv/opencv_extra#1118)

### This PR contains:
* new SIMD code for `patchNaNs()`
* CPU perf test

<details>
<summary>Performance comparison</summary>

Geometric mean (ms)

|Name of Test|noopt|sse2|avx2|sse2 vs noopt (x-factor)|avx2 vs noopt (x-factor)|
|---|:-:|:-:|:-:|:-:|:-:|
|PatchNaNs::OCL_PatchNaNsFixture::(640x480, 32FC1)|0.019|0.017|0.018|1.11|1.07|
|PatchNaNs::OCL_PatchNaNsFixture::(640x480, 32FC4)|0.037|0.037|0.033|1.00|1.10|
|PatchNaNs::OCL_PatchNaNsFixture::(1280x720, 32FC1)|0.032|0.032|0.033|0.99|0.98|
|PatchNaNs::OCL_PatchNaNsFixture::(1280x720, 32FC4)|0.072|0.072|0.070|1.00|1.03|
|PatchNaNs::OCL_PatchNaNsFixture::(1920x1080, 32FC1)|0.051|0.051|0.050|1.00|1.01|
|PatchNaNs::OCL_PatchNaNsFixture::(1920x1080, 32FC4)|0.137|0.138|0.128|0.99|1.06|
|PatchNaNs::OCL_PatchNaNsFixture::(3840x2160, 32FC1)|0.137|0.128|0.129|1.07|1.06|
|PatchNaNs::OCL_PatchNaNsFixture::(3840x2160, 32FC4)|0.450|0.450|0.448|1.00|1.01|
|PatchNaNs::PatchNaNsFixture::(640x480, 32FC1)|0.149|0.029|0.020|5.13|7.44|
|PatchNaNs::PatchNaNsFixture::(640x480, 32FC2)|0.304|0.058|0.040|5.25|7.65|
|PatchNaNs::PatchNaNsFixture::(640x480, 32FC3)|0.448|0.086|0.059|5.22|7.55|
|PatchNaNs::PatchNaNsFixture::(640x480, 32FC4)|0.601|0.133|0.083|4.51|7.23|
|PatchNaNs::PatchNaNsFixture::(1280x720, 32FC1)|0.451|0.093|0.060|4.83|7.52|
|PatchNaNs::PatchNaNsFixture::(1280x720, 32FC2)|0.892|0.184|0.126|4.85|7.06|
|PatchNaNs::PatchNaNsFixture::(1280x720, 32FC3)|1.345|0.311|0.230|4.32|5.84|
|PatchNaNs::PatchNaNsFixture::(1280x720, 32FC4)|1.831|0.546|0.436|3.35|4.20|
|PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC1)|1.017|0.250|0.160|4.06|6.35|
|PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC2)|2.077|0.646|0.605|3.21|3.43|
|PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC3)|3.134|1.053|0.961|2.97|3.26|
|PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC4)|4.222|1.436|1.288|2.94|3.28|
|PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC1)|4.225|1.401|1.277|3.01|3.31|
|PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC2)|8.310|2.953|2.635|2.81|3.15|
|PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC3)|12.396|4.455|4.252|2.78|2.92|
|PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC4)|17.174|5.831|5.824|2.95|2.95|

</details>

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
Contributor Author

savuor commented Nov 7, 2023

Switched 64FC3 and 64FC4 back from vectorized to unrolled scalar code, since vectorization gave no actual acceleration.

Contributor Author

savuor commented Nov 8, 2023

64FC4 is vectorized again; it now gives +20% to +60% depending on image size and SSE2/AVX2.

@asmorkalov
Contributor

OpenCL issue on Intel integrated GPU:

[ RUN      ] OCL_FiniteMaskFixture_FiniteMask.FiniteMask/3, where GetParam() = (640x480, 64FC1)
OpenCL program build log: core/finitemask
Status -11: CL_BUILD_PROGRAM_FAILURE
-D srcT=double -D cn=1 -D rowsPerWI=4 -D INTEL_DEVICE
1:9:57: error: use of type 'double' requires cl_khr_fp64 extension to be enabled
int src_index = mad24(y0, srcstep, mad24(x, (int)sizeof(srcT) * cn, srcoffset));
                                                        ^
<command line>:1:15: note: expanded from here
#define  srcT double
              ^
1:16:1: error: use of type 'double' requires cl_khr_fp64 extension to be enabled
srcT val = *(__global srcT *)(srcptr + src_index + c * (int)sizeof(srcT));
^
<command line>:1:15: note: expanded from here
#define  srcT double
              ^
1:16:23: error: use of type 'double' requires cl_khr_fp64 extension to be enabled
srcT val = *(__global srcT *)(srcptr + src_index + c * (int)sizeof(srcT));
                      ^
<command line>:1:15: note: expanded from here
#define  srcT double
              ^

[ PERFSTAT ]    (samples=100   mean=0.08   median=0.07   min=0.07   stddev=0.00 (4.4%))

asmorkalov pushed a commit to opencv/opencv_extra that referenced this pull request Nov 9, 2023
Perf sanity data for NaN functions #1037
Connected PR: [#23098@main](opencv/opencv#23098)
@asmorkalov asmorkalov merged commit 53aad98 into opencv:5.x Nov 9, 2023
@savuor savuor deleted the nanMask branch November 9, 2023 07:43
@mshabunin mshabunin mentioned this pull request Nov 22, 2023
2 tasks
IskXCr pushed a commit to Haosonn/opencv that referenced this pull request Dec 20, 2023
Backport to 4.x: patchNaNs() SIMD acceleration opencv#24480

thewoz pushed a commit to thewoz/opencv that referenced this pull request Jan 4, 2024
Backport to 4.x: patchNaNs() SIMD acceleration opencv#24480

thewoz pushed a commit to thewoz/opencv that referenced this pull request May 29, 2024
Backport to 4.x: patchNaNs() SIMD acceleration opencv#24480
