finiteMask() and doubles for patchNaNs() by savuor · Pull Request #23098 · opencv/opencv

savuor · 2023-01-05T08:55:37Z

Related to #22826
Connected PR in extra: #1037@extra

TODOs:

Vectorize finiteMask() for 64FC3 and 64FC4

Changes

This PR:

adds a new function finiteMask()
extends patchNaNs() with CV_64F support
moves patchNaNs() and finiteMask() to a separate file

NOTE: the function is now called finiteMask(), as agreed with the OpenCV core team.
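For readers new to these two functions, here is a minimal scalar sketch of the semantics described in this PR. It assumes, as the OpenCV documentation for finiteMask() describes, that a pixel is marked 255 only when all of its channels are finite. It also assumes that patchNaNs() replaces NaNs but leaves +/-Inf untouched. The names finiteMaskRef and patchNaNsRef are hypothetical. This is not the PR's implementation, which is vectorized and dispatched per ISA.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference for finiteMask(), not the PR's code.
// src holds `total` pixels with `cn` interleaved channels each.
// dst[i] is 255 if every channel of pixel i is finite (not NaN, not +/-Inf), else 0.
template <typename T, int cn>
void finiteMaskRef(const T* src, std::uint8_t* dst, std::size_t total)
{
    for (std::size_t i = 0; i < total; i++)
    {
        bool finite = true;
        for (int c = 0; c < cn; c++)
            finite = finite && std::isfinite(src[i * cn + c]);
        dst[i] = finite ? 255 : 0;
    }
}

// Hypothetical scalar reference for patchNaNs(): each NaN element becomes `val`,
// while +/-Inf and ordinary values are left untouched. Works for float and double.
template <typename T>
void patchNaNsRef(T* data, std::size_t count, T val)
{
    for (std::size_t i = 0; i < count; i++)
        if (std::isnan(data[i]))
            data[i] = val;
}
```

The channel count is a template parameter, so the inner loop has a fixed trip count. This mirrors the per-channel switch that appears later in the review thread.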

Performance comparison

Geometric mean (ms)

Name of Test	noopt	sse2	default	sse2 vs noopt (x-factor)	default vs noopt (x-factor)
FiniteMask::FiniteMaskFixture::(640x480, 32FC1)	0.066	0.021	0.025	3.09	2.67
FiniteMask::FiniteMaskFixture::(640x480, 64FC1)	0.189	0.065	0.047	2.89	4.04
FiniteMask::FiniteMaskFixture::(640x480, 32FC2)	0.278	0.052	0.058	5.36	4.82
FiniteMask::FiniteMaskFixture::(640x480, 64FC2)	0.279	0.157	0.100	1.78	2.79
FiniteMask::FiniteMaskFixture::(640x480, 32FC3)	0.284	0.189	0.244	1.50	1.16
FiniteMask::FiniteMaskFixture::(640x480, 64FC3)	0.298	0.174	0.266	1.71	1.12
FiniteMask::FiniteMaskFixture::(640x480, 32FC4)	0.290	0.092	0.097	3.14	2.99
FiniteMask::FiniteMaskFixture::(640x480, 64FC4)	0.343	0.317	0.242	1.08	1.42
FiniteMask::FiniteMaskFixture::(1280x720, 32FC1)	0.201	0.066	0.068	3.04	2.98
FiniteMask::FiniteMaskFixture::(1280x720, 64FC1)	0.914	0.224	0.165	4.07	5.54
FiniteMask::FiniteMaskFixture::(1280x720, 32FC2)	0.843	0.196	0.176	4.30	4.78
FiniteMask::FiniteMaskFixture::(1280x720, 64FC2)	0.936	0.676	0.566	1.38	1.66
FiniteMask::FiniteMaskFixture::(1280x720, 32FC3)	0.901	0.637	0.780	1.42	1.16
FiniteMask::FiniteMaskFixture::(1280x720, 64FC3)	1.118	0.923	1.147	1.21	0.97
FiniteMask::FiniteMaskFixture::(1280x720, 32FC4)	0.982	0.536	0.529	1.83	1.86
FiniteMask::FiniteMaskFixture::(1280x720, 64FC4)	1.396	1.352	1.304	1.03	1.07
FiniteMask::FiniteMaskFixture::(1920x1080, 32FC1)	0.477	0.246	0.206	1.94	2.32
FiniteMask::FiniteMaskFixture::(1920x1080, 64FC1)	1.660	0.745	0.683	2.23	2.43
FiniteMask::FiniteMaskFixture::(1920x1080, 32FC2)	1.938	0.707	1.092	2.74	1.77
FiniteMask::FiniteMaskFixture::(1920x1080, 64FC2)	2.202	1.658	1.612	1.33	1.37
FiniteMask::FiniteMaskFixture::(1920x1080, 32FC3)	2.117	1.521	1.786	1.39	1.19
FiniteMask::FiniteMaskFixture::(1920x1080, 64FC3)	2.603	2.277	2.622	1.14	0.99
FiniteMask::FiniteMaskFixture::(1920x1080, 32FC4)	2.282	1.487	1.496	1.53	1.52
FiniteMask::FiniteMaskFixture::(1920x1080, 64FC4)	3.247	3.142	2.866	1.03	1.13
FiniteMask::FiniteMaskFixture::(3840x2160, 32FC1)	2.397	2.387	2.132	1.00	1.12
FiniteMask::FiniteMaskFixture::(3840x2160, 64FC1)	10.340	3.801	3.422	2.72	3.02
FiniteMask::FiniteMaskFixture::(3840x2160, 32FC2)	7.811	3.759	3.421	2.08	2.28
FiniteMask::FiniteMaskFixture::(3840x2160, 64FC2)	8.708	7.136	6.361	1.22	1.37
FiniteMask::FiniteMaskFixture::(3840x2160, 32FC3)	8.577	6.366	7.692	1.35	1.12
FiniteMask::FiniteMaskFixture::(3840x2160, 64FC3)	11.015	9.593	11.396	1.15	0.97
FiniteMask::FiniteMaskFixture::(3840x2160, 32FC4)	9.330	6.539	6.451	1.43	1.45
FiniteMask::FiniteMaskFixture::(3840x2160, 64FC4)	13.350	12.691	12.341	1.05	1.08
FiniteMask::OCL_FiniteMaskFixture::(640x480, 32FC1)	0.017	0.016	0.016	1.04	1.02
FiniteMask::OCL_FiniteMaskFixture::(640x480, 64FC1)	0.016	0.022	0.017	0.73	0.92
FiniteMask::OCL_FiniteMaskFixture::(640x480, 32FC3)	0.025	0.025	0.027	1.00	0.93
FiniteMask::OCL_FiniteMaskFixture::(640x480, 64FC3)	0.039	0.036	0.045	1.08	0.87
FiniteMask::OCL_FiniteMaskFixture::(640x480, 32FC4)	0.030	0.029	0.029	1.05	1.02
FiniteMask::OCL_FiniteMaskFixture::(640x480, 64FC4)	0.045	0.051	0.047	0.88	0.96
FiniteMask::OCL_FiniteMaskFixture::(1280x720, 32FC1)	0.033	0.033	0.033	1.01	0.99
FiniteMask::OCL_FiniteMaskFixture::(1280x720, 64FC1)	0.044	0.045	0.043	0.98	1.02
FiniteMask::OCL_FiniteMaskFixture::(1280x720, 32FC3)	0.056	0.057	0.054	0.98	1.02
FiniteMask::OCL_FiniteMaskFixture::(1280x720, 64FC3)	0.090	0.091	0.092	0.99	0.98
FiniteMask::OCL_FiniteMaskFixture::(1280x720, 32FC4)	0.067	0.066	0.068	1.01	0.99
FiniteMask::OCL_FiniteMaskFixture::(1280x720, 64FC4)	0.113	0.115	0.114	0.98	0.99
FiniteMask::OCL_FiniteMaskFixture::(1920x1080, 32FC1)	0.052	0.048	0.053	1.10	0.99
FiniteMask::OCL_FiniteMaskFixture::(1920x1080, 64FC1)	0.077	0.078	0.076	0.98	1.01
FiniteMask::OCL_FiniteMaskFixture::(1920x1080, 32FC3)	0.101	0.101	0.101	1.00	1.00
FiniteMask::OCL_FiniteMaskFixture::(1920x1080, 64FC3)	0.182	0.181	0.182	1.01	1.00
FiniteMask::OCL_FiniteMaskFixture::(1920x1080, 32FC4)	0.129	0.127	0.129	1.02	1.00
FiniteMask::OCL_FiniteMaskFixture::(1920x1080, 64FC4)	0.231	0.231	0.232	1.00	0.99
FiniteMask::OCL_FiniteMaskFixture::(3840x2160, 32FC1)	0.152	0.154	0.154	0.99	0.99
FiniteMask::OCL_FiniteMaskFixture::(3840x2160, 64FC1)	0.250	0.250	0.251	1.00	1.00
FiniteMask::OCL_FiniteMaskFixture::(3840x2160, 32FC3)	0.355	0.353	0.354	1.00	1.00
FiniteMask::OCL_FiniteMaskFixture::(3840x2160, 64FC3)	0.661	0.661	0.660	1.00	1.00
FiniteMask::OCL_FiniteMaskFixture::(3840x2160, 32FC4)	0.455	0.455	0.456	1.00	1.00
FiniteMask::OCL_FiniteMaskFixture::(3840x2160, 64FC4)	0.867	0.866	0.866	1.00	1.00
PatchNaNs::OCL_PatchNaNsFixture::(640x480, 32FC1)	0.018	0.018	0.019	1.01	0.95
PatchNaNs::OCL_PatchNaNsFixture::(640x480, 64FC1)	0.029	0.026	0.027	1.10	1.06
PatchNaNs::OCL_PatchNaNsFixture::(640x480, 32FC3)	0.032	0.034	0.032	0.96	1.01
PatchNaNs::OCL_PatchNaNsFixture::(640x480, 64FC3)	0.041	0.041	0.041	1.00	0.99
PatchNaNs::OCL_PatchNaNsFixture::(640x480, 32FC4)	0.035	0.035	0.032	0.99	1.11
PatchNaNs::OCL_PatchNaNsFixture::(640x480, 64FC4)	0.049	0.048	0.047	1.03	1.04
PatchNaNs::OCL_PatchNaNsFixture::(1280x720, 32FC1)	0.032	0.032	0.030	1.00	1.08
PatchNaNs::OCL_PatchNaNsFixture::(1280x720, 64FC1)	0.043	0.042	0.043	1.02	0.98
PatchNaNs::OCL_PatchNaNsFixture::(1280x720, 32FC3)	0.059	0.054	0.059	1.08	0.99
PatchNaNs::OCL_PatchNaNsFixture::(1280x720, 64FC3)	0.087	0.086	0.085	1.01	1.02
PatchNaNs::OCL_PatchNaNsFixture::(1280x720, 32FC4)	0.072	0.066	0.071	1.08	1.01
PatchNaNs::OCL_PatchNaNsFixture::(1280x720, 64FC4)	0.110	0.108	0.110	1.02	1.00
PatchNaNs::OCL_PatchNaNsFixture::(1920x1080, 32FC1)	0.047	0.047	0.047	1.00	1.01
PatchNaNs::OCL_PatchNaNsFixture::(1920x1080, 64FC1)	0.069	0.070	0.070	1.00	1.00
PatchNaNs::OCL_PatchNaNsFixture::(1920x1080, 32FC3)	0.103	0.103	0.103	1.00	0.99
PatchNaNs::OCL_PatchNaNsFixture::(1920x1080, 64FC3)	0.171	0.168	0.171	1.02	1.00
PatchNaNs::OCL_PatchNaNsFixture::(1920x1080, 32FC4)	0.128	0.129	0.128	0.99	1.00
PatchNaNs::OCL_PatchNaNsFixture::(1920x1080, 64FC4)	0.220	0.221	0.223	1.00	0.99
PatchNaNs::OCL_PatchNaNsFixture::(3840x2160, 32FC1)	0.128	0.127	0.128	1.01	1.00
PatchNaNs::OCL_PatchNaNsFixture::(3840x2160, 64FC1)	0.221	0.222	0.222	0.99	0.99
PatchNaNs::OCL_PatchNaNsFixture::(3840x2160, 32FC3)	0.343	0.341	0.346	1.01	0.99
PatchNaNs::OCL_PatchNaNsFixture::(3840x2160, 64FC3)	0.626	0.626	0.625	1.00	1.00
PatchNaNs::OCL_PatchNaNsFixture::(3840x2160, 32FC4)	0.452	0.452	0.454	1.00	0.99
PatchNaNs::OCL_PatchNaNsFixture::(3840x2160, 64FC4)	0.826	0.826	0.827	1.00	1.00
PatchNaNs::PatchNaNsFixture::(640x480, 32FC1)	0.152	0.028	0.017	5.33	9.03
PatchNaNs::PatchNaNsFixture::(640x480, 64FC1)	0.226	0.079	0.043	2.85	5.28
PatchNaNs::PatchNaNsFixture::(640x480, 32FC2)	0.305	0.058	0.033	5.28	9.13
PatchNaNs::PatchNaNsFixture::(640x480, 64FC2)	0.456	0.158	0.086	2.89	5.30
PatchNaNs::PatchNaNsFixture::(640x480, 32FC3)	0.454	0.087	0.050	5.19	9.06
PatchNaNs::PatchNaNsFixture::(640x480, 64FC3)	0.697	0.250	0.131	2.79	5.31
PatchNaNs::PatchNaNsFixture::(640x480, 32FC4)	0.603	0.119	0.068	5.08	8.85
PatchNaNs::PatchNaNsFixture::(640x480, 64FC4)	0.934	0.347	0.205	2.69	4.55
PatchNaNs::PatchNaNsFixture::(1280x720, 32FC1)	0.458	0.088	0.050	5.21	9.19
PatchNaNs::PatchNaNsFixture::(1280x720, 64FC1)	0.702	0.253	0.130	2.78	5.40
PatchNaNs::PatchNaNsFixture::(1280x720, 32FC2)	0.915	0.187	0.136	4.90	6.72
PatchNaNs::PatchNaNsFixture::(1280x720, 64FC2)	1.435	0.637	0.493	2.25	2.91
PatchNaNs::PatchNaNsFixture::(1280x720, 32FC3)	1.377	0.330	0.206	4.17	6.68
PatchNaNs::PatchNaNsFixture::(1280x720, 64FC3)	2.164	1.006	0.809	2.15	2.67
PatchNaNs::PatchNaNsFixture::(1280x720, 32FC4)	1.940	0.545	0.452	3.56	4.29
PatchNaNs::PatchNaNsFixture::(1280x720, 64FC4)	2.856	1.365	1.094	2.09	2.61
PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC1)	1.091	0.237	0.123	4.61	8.88
PatchNaNs::PatchNaNsFixture::(1920x1080, 64FC1)	1.638	1.062	0.562	1.54	2.91
PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC2)	2.155	0.648	0.563	3.32	3.83
PatchNaNs::PatchNaNsFixture::(1920x1080, 64FC2)	3.276	1.531	1.252	2.14	2.62
PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC3)	3.229	1.021	0.893	3.16	3.62
PatchNaNs::PatchNaNsFixture::(1920x1080, 64FC3)	4.851	2.313	1.891	2.10	2.57
PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC4)	4.264	1.343	1.238	3.17	3.45
PatchNaNs::PatchNaNsFixture::(1920x1080, 64FC4)	6.450	3.054	2.546	2.11	2.53
PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC1)	4.224	1.340	1.205	3.15	3.51
PatchNaNs::PatchNaNsFixture::(3840x2160, 64FC1)	6.409	3.092	2.549	2.07	2.51
PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC2)	8.320	2.762	2.511	3.01	3.31
PatchNaNs::PatchNaNsFixture::(3840x2160, 64FC2)	12.777	6.285	5.283	2.03	2.42
PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC3)	12.697	4.281	3.833	2.97	3.31
PatchNaNs::PatchNaNsFixture::(3840x2160, 64FC3)	19.309	9.636	7.945	2.00	2.43
PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC4)	16.848	5.745	5.197	2.93	3.24
PatchNaNs::PatchNaNsFixture::(3840x2160, 64FC4)	25.701	12.955	10.637	1.98	2.42
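The two right-hand columns are speed-up ratios. This assumes each x-factor is simply the noopt time divided by the optimized time. The rows are consistent with that: for example, 0.279 / 0.157 gives about 1.78 for 640x480 64FC2. Because the table prints times rounded to three decimals, ratios recomputed from the smallest timings can differ in the second decimal. For instance, 0.066 / 0.021 gives 3.14, while the table lists 3.09. The geometric-mean helper is a generic illustration and an assumption, not a description of how the OpenCV perf summary script aggregates samples.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Speed-up of an optimized build relative to the baseline (noopt):
// a value > 1 means the optimized build is faster.
double xFactor(double baselineMs, double optimizedMs)
{
    return baselineMs / optimizedMs;
}

// Geometric mean of positive timings, computed in log space to avoid overflow.
double geometricMean(const std::vector<double>& ms)
{
    assert(!ms.empty());
    double logSum = 0.0;
    for (double t : ms)
    {
        assert(t > 0.0);
        logSum += std::log(t);
    }
    return std::exp(logSum / static_cast<double>(ms.size()));
}
```

A value below 1.00 means the optimized path was slower than noopt. An example is the 0.97 for 1280x720 64FC3 with the default build.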

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

modules/core/include/opencv2/core.hpp

modules/core/perf/opencl/perf_arithm.cpp

modules/core/src/mathfuncs.cpp

savuor · 2023-01-17T03:44:36Z

@alalek @vpisarev It looks like there is a bug in cvIsInf(double): it sometimes reports NaNs as Inf.

The Inf bug was introduced in PR #15370.

This PR provides a fix and a regression test.
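The thread does not show the buggy expression. The following therefore only illustrates what a correct bit-level test looks like for an IEEE 754 double, and how a check that ignores the low 32-bit word can confuse a NaN with Inf. The helpers bitsOf, isInfBits, isNaNBits and isInfHighWordOnly are hypothetical, not OpenCV's cvIsInf()/cvIsNaN().

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Illustrative bit-level classification of an IEEE 754 binary64 value
// (1 sign bit, 11 exponent bits, 52 mantissa bits). Hypothetical helpers.
static std::uint64_t bitsOf(double x)
{
    std::uint64_t u;
    std::memcpy(&u, &x, sizeof u); // well-defined type punning
    return u;
}

const std::uint64_t kExpMask = 0x7ff0000000000000ULL; // exponent all ones, mantissa zero
const std::uint64_t kAbsMask = 0x7fffffffffffffffULL; // clears the sign bit

// Inf: exponent all ones AND the whole 52-bit mantissa is zero.
bool isInfBits(std::uint64_t u) { return (u & kAbsMask) == kExpMask; }

// NaN: exponent all ones AND mantissa non-zero (anywhere in its 52 bits).
bool isNaNBits(std::uint64_t u) { return (u & kAbsMask) > kExpMask; }

// Deliberately incomplete check, for illustration only: it inspects just the
// high 32-bit word, so a NaN whose payload lies entirely in the low word
// is wrongly reported as Inf.
bool isInfHighWordOnly(std::uint64_t u)
{
    return static_cast<std::uint32_t>((u >> 32) & 0x7fffffffu) == 0x7ff00000u;
}
```

The helpers take raw bits rather than double. This keeps the example independent of how a particular FPU handles signaling NaNs when loading them.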

modules/core/include/opencv2/core/fast_math.hpp

asmorkalov

👍

modules/core/include/opencv2/core/base.hpp

savuor · 2023-02-03T09:17:26Z

After discussion with the OpenCV core team, we decided to implement finiteMask() instead of nanMask() and to trim the other features.

modules/3d/perf/perf_tsdf.cpp

modules/3d/test/test_tsdf.cpp

modules/core/include/opencv2/core.hpp

asmorkalov

👍

modules/core/src/opencl/finitemask.cl

alalek · 2023-02-14T05:37:10Z

modules/core/src/mathfuncs.cpp

+{
+    CV_INSTRUMENT_REGION();
+
+    int channels = _img.channels();


channels=5 doesn't throw any exception and does nothing.

alalek · 2023-02-14T05:38:15Z

modules/core/src/mathfuncs.cpp

+            switch (channels)
+            {
+            case 1: finiteMask_<float, 1>((const float*)sptr, dptr, total); break;
+            case 2: finiteMask_<float, 2>((const float*)sptr, dptr, total); break;
+            case 3: finiteMask_<float, 3>((const float*)sptr, dptr, total); break;
+            case 4: finiteMask_<float, 4>((const float*)sptr, dptr, total); break;
+            }


The channels value is not validated at all.
For channels=5 the function does nothing and doesn't throw any exception.
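One way to address this review comment is to reject unsupported channel counts explicitly. The sketch below uses std::invalid_argument to stay self-contained. Inside OpenCV, the natural equivalent would be a CV_Assert on the channel count or a CV_Error in the default branch. finiteMaskKernel and finiteMaskDispatch are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical scalar kernel standing in for the PR's finiteMask_<T, cn>.
template <typename T, int cn>
void finiteMaskKernel(const T* src, std::uint8_t* dst, std::size_t total)
{
    for (std::size_t i = 0; i < total; i++)
    {
        bool ok = true;
        for (int c = 0; c < cn; c++)
            ok = ok && std::isfinite(src[i * cn + c]);
        dst[i] = ok ? 255 : 0;
    }
}

// Dispatch on the runtime channel count. Unsupported counts are rejected
// instead of silently leaving dst untouched.
template <typename T>
void finiteMaskDispatch(const T* src, std::uint8_t* dst, std::size_t total, int channels)
{
    switch (channels)
    {
    case 1: finiteMaskKernel<T, 1>(src, dst, total); break;
    case 2: finiteMaskKernel<T, 2>(src, dst, total); break;
    case 3: finiteMaskKernel<T, 3>(src, dst, total); break;
    case 4: finiteMaskKernel<T, 4>(src, dst, total); break;
    default:
        throw std::invalid_argument("finiteMask: unsupported channel count " +
                                    std::to_string(channels));
    }
}
```

Validating before any work is done also means the output mask is never left half-written for bad input.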

alalek · 2023-02-14T05:42:43Z

modules/core/src/mathfuncs.cpp

+    }
+}
+
+#if CV_SIMD


SIMD optimizations in the core module should go to .simd.hpp files.

Should I add it through the HAL and all the SIMD dispatching mechanisms, the way the other functions in mathfuncs_core.simd.hpp are done, or are there easier ways?

Other functions should be handled separately (do not touch them in this PR).

First, we need to collect performance numbers for the different ISA optimizations of the added code: https://github.com/opencv/opencv/wiki/CPU-optimizations-build-options#optimization-developer-guide

modules/core/src/mathfuncs.cpp

modules/core/include/opencv2/core.hpp

modules/core/perf/perf_precomp.hpp

modules/core/src/mathfuncs.cpp

alalek · 2023-02-14T05:52:48Z

modules/core/src/mathfuncs.cpp

+#if !CV_SIMD128_64F
+        v_int64 mask10 = vx_setall_s64(0xffffffff00000000);


CV_SIMD128_64F

64F refers to the double (float64) type.
Using it to guard int64 processing is wrong.

Sure, but this is how int64 comparison is enabled in the NEON universal intrinsics: intrin_neon.hpp

Currently, comparison of 64-bit integer SIMD vectors is declared as unsupported:

https://github.com/opencv/opencv/blame/4.7.0/modules/core/include/opencv2/core/hal/intrin_cpp.hpp#L885

"For all types except 64-bit integer values."

No idea why NEON hijacks that and provides some implementation (only for v_uint64x2, but not for the signed v_int64x2).
It was probably added by mistake here: #7175 (the patch should target 64F only).
There is also a contributed test for 64-bit eq/ne here: #15738 (with a discussion of the misused macro).

Perhaps we should allow and implement at least eq/ne (==/!=) comparisons for all SIMD backends.

/cc @mshabunin @vpisarev

Since this now works without workarounds, I've rewritten it in a more convenient way.
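Some background for the CV_SIMD128_64F discussion: testing a double for NaN/Inf in the integer domain needs a 64-bit lane compare, e.g. (bits & 0x7fffffffffffffff) against 0x7ff0000000000000. When a backend only offers 32-bit compares, 64-bit equality can be emulated by comparing both halves and combining the results. That is presumably what the mask10 constant above was for. The scalar sketch below shows the general technique. It is not a reconstruction of the PR's workaround, and eq32/eq64Via32 are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// SIMD-style compare for one 32-bit lane: all ones if equal, else zero.
std::uint32_t eq32(std::uint32_t a, std::uint32_t b)
{
    return a == b ? 0xffffffffu : 0u;
}

// Emulate a 64-bit lane "==" (result: all-ones or all-zeros mask) using only
// 32-bit compares, as one would on an ISA lacking 64-bit integer compare.
std::uint64_t eq64Via32(std::uint64_t a, std::uint64_t b)
{
    std::uint32_t lo = eq32(static_cast<std::uint32_t>(a), static_cast<std::uint32_t>(b));
    std::uint32_t hi = eq32(static_cast<std::uint32_t>(a >> 32), static_cast<std::uint32_t>(b >> 32));
    // Both halves must match; broadcast the combined result to the whole 64-bit lane.
    std::uint32_t both = lo & hi;
    return (static_cast<std::uint64_t>(both) << 32) | both;
}
```

In real SIMD code, the "broadcast to both halves" step is a shuffle or swap of the 32-bit lanes followed by an AND, rather than a shift.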

modules/core/src/nan_mask.simd.hpp

asmorkalov

👍 Tested manually with ARMv7, x86_64 desktop and RISC-V RVV.

asmorkalov · 2023-10-24T14:08:35Z

@savuor Please rebase and fix the conflict.

opencv-alalek · 2023-11-01T09:37:16Z

modules/core/src/nan_mask.simd.hpp

+        {
+            // v_select is not available for v_int64, emulating it
+            v_int64 v_dst0 = v_or(v_and(v_cmp_mask0, v_val), v_and(v_not(v_cmp_mask0), v_src0));
+            v_int64 v_dst1 = v_or(v_and(v_cmp_mask1, v_val), v_and(v_not(v_cmp_mask1), v_src1));


// v_select is not available for v_int64, emulating it

reinterpret + v_select should work faster than the provided emulation.

BTW, it makes sense to provide such an implementation in a single place (HAL) /cc @vpisarev

It really gives +10%...+30% in performance, thanks!
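For context, here is a scalar model of one 64-bit lane of the quoted patchNaNs() kernel. A NaN mask is built from the raw bits, and then (mask & val) | (~mask & src) performs the select. That and/or/not sequence is the emulation the reviewer refers to. With a native v_select on reinterpreted 64F lanes, it can become a single blend on backends that provide one, which is consistent with the reported speed-up. patchNaNLane and patchNaNScalar are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Scalar model of one 64-bit lane of the patchNaNs() kernel quoted above.
// Works on raw bits, like the SIMD code working on v_int64 lanes.
std::uint64_t patchNaNLane(std::uint64_t src, std::uint64_t val)
{
    const std::uint64_t absMask = 0x7fffffffffffffffULL;
    const std::uint64_t expMask = 0x7ff0000000000000ULL;
    // All ones where the lane holds a NaN (exponent all ones, mantissa non-zero).
    std::uint64_t cmpMask = ((src & absMask) > expMask) ? ~0ULL : 0ULL;
    // Bitwise select: take `val` where cmpMask is set, keep `src` elsewhere.
    return (cmpMask & val) | (~cmpMask & src);
}

// Convenience wrapper on doubles, moving bits in and out with memcpy.
double patchNaNScalar(double x, double val)
{
    std::uint64_t s, v;
    std::memcpy(&s, &x, sizeof s);
    std::memcpy(&v, &val, sizeof v);
    std::uint64_t r = patchNaNLane(s, v);
    double out;
    std::memcpy(&out, &r, sizeof out);
    return out;
}
```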

opencv-alalek · 2023-11-01T09:40:12Z

modules/core/src/nan_mask.simd.hpp

+template <typename _Tp, int cn>
+void finiteMask_(const uchar *src, uchar *dst, size_t total)
+{
+    size_t i = 0;


This is an externally exposed function (through getFiniteMaskFunc).
CV_INSTRUMENT_REGION() is required here to inject vzeroupper.

Backport to 4.x: patchNaNs() SIMD acceleration #24480

backport from #23098
connected PR in extra: #1118@extra

This PR contains:

new SIMD code for patchNaNs()
CPU perf test

Performance comparison

Geometric mean (ms)

Name of Test	noopt	sse2	avx2	sse2 vs noopt (x-factor)	avx2 vs noopt (x-factor)
PatchNaNs::OCL_PatchNaNsFixture::(640x480, 32FC1)	0.019	0.017	0.018	1.11	1.07
PatchNaNs::OCL_PatchNaNsFixture::(640x480, 32FC4)	0.037	0.037	0.033	1.00	1.10
PatchNaNs::OCL_PatchNaNsFixture::(1280x720, 32FC1)	0.032	0.032	0.033	0.99	0.98
PatchNaNs::OCL_PatchNaNsFixture::(1280x720, 32FC4)	0.072	0.072	0.070	1.00	1.03
PatchNaNs::OCL_PatchNaNsFixture::(1920x1080, 32FC1)	0.051	0.051	0.050	1.00	1.01
PatchNaNs::OCL_PatchNaNsFixture::(1920x1080, 32FC4)	0.137	0.138	0.128	0.99	1.06
PatchNaNs::OCL_PatchNaNsFixture::(3840x2160, 32FC1)	0.137	0.128	0.129	1.07	1.06
PatchNaNs::OCL_PatchNaNsFixture::(3840x2160, 32FC4)	0.450	0.450	0.448	1.00	1.01
PatchNaNs::PatchNaNsFixture::(640x480, 32FC1)	0.149	0.029	0.020	5.13	7.44
PatchNaNs::PatchNaNsFixture::(640x480, 32FC2)	0.304	0.058	0.040	5.25	7.65
PatchNaNs::PatchNaNsFixture::(640x480, 32FC3)	0.448	0.086	0.059	5.22	7.55
PatchNaNs::PatchNaNsFixture::(640x480, 32FC4)	0.601	0.133	0.083	4.51	7.23
PatchNaNs::PatchNaNsFixture::(1280x720, 32FC1)	0.451	0.093	0.060	4.83	7.52
PatchNaNs::PatchNaNsFixture::(1280x720, 32FC2)	0.892	0.184	0.126	4.85	7.06
PatchNaNs::PatchNaNsFixture::(1280x720, 32FC3)	1.345	0.311	0.230	4.32	5.84
PatchNaNs::PatchNaNsFixture::(1280x720, 32FC4)	1.831	0.546	0.436	3.35	4.20
PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC1)	1.017	0.250	0.160	4.06	6.35
PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC2)	2.077	0.646	0.605	3.21	3.43
PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC3)	3.134	1.053	0.961	2.97	3.26
PatchNaNs::PatchNaNsFixture::(1920x1080, 32FC4)	4.222	1.436	1.288	2.94	3.28
PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC1)	4.225	1.401	1.277	3.01	3.31
PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC2)	8.310	2.953	2.635	2.81	3.15
PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC3)	12.396	4.455	4.252	2.78	2.92
PatchNaNs::PatchNaNsFixture::(3840x2160, 32FC4)	17.174	5.831	5.824	2.95	2.95

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

savuor · 2023-11-07T03:31:07Z

Replaced the vectorized 64FC3 and 64FC4 code with unrolled scalar code again, since vectorization gave no actual speed-up.
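As an illustration of "unrolled scalar code" for the 64FC3 case, the sketch below tests the three channels explicitly and processes two pixels per iteration, with a scalar tail. The function name and the unroll factor of two are assumptions, not taken from the PR.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical unrolled scalar loop for a 3-channel double image (64FC3).
// Channels are tested explicitly and two pixels are handled per iteration;
// a tail loop handles an odd pixel count.
void finiteMask64FC3Unrolled(const double* src, std::uint8_t* dst, std::size_t total)
{
    std::size_t i = 0;
    for (; i + 2 <= total; i += 2)
    {
        const double* p = src + i * 3;
        // Bitwise & on the bool results avoids short-circuit branches.
        bool f0 = std::isfinite(p[0]) & std::isfinite(p[1]) & std::isfinite(p[2]);
        bool f1 = std::isfinite(p[3]) & std::isfinite(p[4]) & std::isfinite(p[5]);
        dst[i]     = f0 ? 255 : 0;
        dst[i + 1] = f1 ? 255 : 0;
    }
    for (; i < total; i++)
    {
        const double* p = src + i * 3;
        dst[i] = (std::isfinite(p[0]) && std::isfinite(p[1]) && std::isfinite(p[2])) ? 255 : 0;
    }
}
```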

savuor · 2023-11-08T01:25:34Z

64FC4 is vectorized again; it now gives +20%...+60%, depending on image size and on SSE2/AVX2.

asmorkalov · 2023-11-08T07:27:00Z

OpenCL issue on Intel integrated GPU:

[ RUN      ] OCL_FiniteMaskFixture_FiniteMask.FiniteMask/3, where GetParam() = (640x480, 64FC1)
OpenCL program build log: core/finitemask
Status -11: CL_BUILD_PROGRAM_FAILURE
-D srcT=double -D cn=1 -D rowsPerWI=4 -D INTEL_DEVICE
1:9:57: error: use of type 'double' requires cl_khr_fp64 extension to be enabled
int src_index = mad24(y0, srcstep, mad24(x, (int)sizeof(srcT) * cn, srcoffset));
                                                        ^
<command line>:1:15: note: expanded from here
#define  srcT double
              ^
1:16:1: error: use of type 'double' requires cl_khr_fp64 extension to be enabled
srcT val = *(__global srcT *)(srcptr + src_index + c * (int)sizeof(srcT));
^
<command line>:1:15: note: expanded from here
#define  srcT double
              ^
1:16:23: error: use of type 'double' requires cl_khr_fp64 extension to be enabled
srcT val = *(__global srcT *)(srcptr + src_index + c * (int)sizeof(srcT));
                      ^
<command line>:1:15: note: expanded from here
#define  srcT double
              ^

[ PERFSTAT ]    (samples=100   mean=0.08   median=0.07   min=0.07   stddev=0.00 (4.4%))
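The log shows the kernel being compiled with -D srcT=double on a device where the fp64 extension is not enabled. OpenCV's own .cl files commonly guard double code with a DOUBLE_SUPPORT define that enables cl_khr_fp64 (or cl_amd_fp64). On the host side, they check ocl::Device::doubleFPConfig() > 0 before taking the OpenCL path for 64F data. The helper below only assembles the build-option string and holds the kernel preamble as text. Its name is hypothetical, and it is not the actual "64F OCL fix" commit, whose contents are not shown here.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not OpenCV's code) that assembles build options for a
// kernel like core/finitemask. For 64-bit input it also defines DOUBLE_SUPPORT,
// which the kernel source is expected to turn into the fp64 pragma below.
std::string finiteMaskBuildOptions(bool isDouble, int cn, int rowsPerWI)
{
    std::string opts = std::string("-D srcT=") + (isDouble ? "double" : "float") +
                       " -D cn=" + std::to_string(cn) +
                       " -D rowsPerWI=" + std::to_string(rowsPerWI);
    if (isDouble)
        opts += " -D DOUBLE_SUPPORT";
    return opts;
}

// Kernel-side guard, following the pattern used in other OpenCV .cl files.
const char* const kFp64Preamble =
    "#ifdef DOUBLE_SUPPORT\n"
    "#ifdef cl_amd_fp64\n"
    "#pragma OPENCL EXTENSION cl_amd_fp64:enable\n"
    "#elif defined (cl_khr_fp64)\n"
    "#pragma OPENCL EXTENSION cl_khr_fp64:enable\n"
    "#endif\n"
    "#endif\n";
```

The host-side capability check matters as much as the pragma. A device without fp64 support cannot build the double variant at all, so such devices need to fall back to the CPU path.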

Perf sanity data for NaN functions #1037
Connected PR: #23098@main


savuor mentioned this pull request Jan 10, 2023

Perf sanity data for NaN functions opencv/opencv_extra#1037

Merged

savuor marked this pull request as ready for review January 13, 2023 01:24

asmorkalov self-requested a review January 13, 2023 07:23

asmorkalov reviewed Jan 13, 2023

alalek reviewed Jan 17, 2023

modules/core/include/opencv2/core/fast_math.hpp (resolved)

savuor mentioned this pull request Jan 17, 2023

Backport to 3.4: cvIsInf() fix #23145

Merged

6 tasks

asmorkalov approved these changes Jan 18, 2023

asmorkalov added bug feature category: core labels Jan 18, 2023

asmorkalov added this to the 5.0 milestone Jan 18, 2023

alalek assigned asmorkalov Jan 18, 2023

savuor force-pushed the nanMask branch from def3675 to 8aae828 January 24, 2023 02:20

asmorkalov requested a review from alalek January 27, 2023 07:19

alalek reviewed Jan 27, 2023

modules/core/include/opencv2/core/base.hpp (outdated, resolved)

savuor force-pushed the nanMask branch from c5b95e5 to bff6544 February 6, 2023 23:37

asmorkalov reviewed Feb 7, 2023

vpisarev requested a review from asmorkalov February 10, 2023 08:42

savuor changed the title from "nanMask() and doubles for patchNaNs()" to "finiteMask() and doubles for patchNaNs()" Feb 10, 2023

asmorkalov approved these changes Feb 13, 2023

alalek reviewed Feb 14, 2023

This was referenced Feb 22, 2023

add multiview calibration [GSOC 2022] #22363

Merged

core(simd): 64-bit integer EQ/NE without misused 64F guard #23307

Merged

savuor force-pushed the nanMask branch from 52d70b7 to 51a104d October 20, 2023 00:34

asmorkalov reviewed Oct 24, 2023

modules/core/src/nan_mask.simd.hpp (outdated, resolved)

asmorkalov approved these changes Oct 24, 2023

patchNaNs: adding CV_64F support (draft)

f7b37d4

Rostislav Vasilikhin added 4 commits November 1, 2023 00:17

RNG fix

7a8f6e6

v_check_any() improvement 32f

f46c84f

loop conditions, size_t -> int

9ca84ee

vx_cleanup() removed

0fb307f

opencv-alalek reviewed Nov 1, 2023

Rostislav Vasilikhin added 4 commits November 1, 2023 15:00

theRNG()

50e398d

v_select for 64f

4ce7bb9

vzeroupper added

b919e04

theRNG() fix

3261488

savuor mentioned this pull request Nov 2, 2023

Backport to 4.x: patchNaNs() SIMD acceleration #24480

Merged

6 tasks

Rostislav Vasilikhin added 4 commits November 7, 2023 02:01

64fc2 accelerated

ee6260c

minor

3f6ef22

remove bad vectorization for 64fc4

35a16cb

64FC3 vectorization reverted

f8a144e

Rostislav Vasilikhin added 2 commits November 7, 2023 21:52

32fc4 accelerated

82fe313

64fc4 vectorized again

94c0c56

64F OCL fix

7139dca

asmorkalov pushed a commit to opencv/opencv_extra that referenced this pull request Nov 9, 2023

Merge pull request #1037 from savuor:nanMask

df83fe2


asmorkalov merged commit 53aad98 into opencv:5.x Nov 9, 2023

savuor deleted the nanMask branch November 9, 2023 07:43

savuor mentioned this pull request Nov 9, 2023

nanMask call for in OpenCV core #22826

Closed

mshabunin mentioned this pull request Nov 22, 2023

cv::magnitudeSqr() #15683

Open

2 tasks
