Vectorize calculating integral for line for single and multiple channels by ChipKerchner · Pull Request #16556 · opencv/opencv

ChipKerchner · 2020-02-11T12:48:13Z

Vectorize calculating integral for line for single and multiple channels - up to 2.75x faster.

force_builders=Linux AVX2,Custom
buildworker:Custom=linux-3
build_image:Custom=ubuntu:18.04
CPU_BASELINE:Custom=AVX512_SKX
disable_ipp=ON

alalek · 2020-02-12T16:27:55Z

modules/imgproc/src/sumpixels.simd.hpp

+                    prev = vx_setall_f64(v_extract_n<v_float64::nlanes - 1>(el4hh));
+//                    prev = v_broadcast_element<v_float64::nlanes - 1>(el4hh);


Why was v_broadcast_element() removed?

v_broadcast_element for v_float64 is not available on all platforms. I left the commented-out line in for when support is added.

alalek · 2020-02-12T16:31:06Z

modules/imgproc/src/sumpixels.simd.hpp

+    }
+};
+
+#if CV_SIMD128_64F && !CV_AVX512_SKX


Why is CV_AVX512_SKX excluded?
Do we want CV_SIMD_WIDTH <= 32 here instead?

There is already an AVX512 version for doubles. See the code above.

terfendail · 2020-02-18T13:13:11Z

modules/imgproc/src/sumpixels.simd.hpp

+                v_int32 prev_1 = vx_setzero_s32(), prev_2 = vx_setzero_s32(),
+                        prev_3 = vx_setzero_s32(), prev_4 = vx_setzero_s32();
+                int j = 0;
+                for ( ; j + v_uint16::nlanes * cn <= width; j += v_uint16::nlanes * cn)


The code looks over-complicated to me. IMO it would be better to process one vector at a time and reduce the number of shifts and additions by starting with the addition of element quads.
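
For reference, the per-row computation that the vector code has to reproduce can be written as a plain scalar loop. This is a minimal sketch and not the code from sumpixels.simd.hpp: the function name `integral_row_scalar` and its parameters are hypothetical, channels are assumed to be interleaved, and the leading zero column of OpenCV's integral image is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar reference for one row of a multi-channel integral image
// (hypothetical helper, for illustration only).
// src_row      : width * cn interleaved 8-bit samples
// prev_sum_row : the integral row directly above, same length
// Returns, for every sample, the running per-channel sum along the row
// plus the value stored directly above it.
std::vector<int> integral_row_scalar(const std::vector<std::uint8_t>& src_row,
                                     const std::vector<int>& prev_sum_row,
                                     int cn)
{
    assert(cn > 0);
    assert(src_row.size() % static_cast<std::size_t>(cn) == 0);
    assert(prev_sum_row.size() == src_row.size());

    std::vector<int> sum_row(src_row.size());
    std::vector<int> running(static_cast<std::size_t>(cn), 0); // per-channel carry

    for (std::size_t j = 0; j < src_row.size(); ++j)
    {
        std::size_t c = j % static_cast<std::size_t>(cn); // channel of sample j
        running[c] += src_row[j];
        sum_row[j] = running[c] + prev_sum_row[j];
    }
    return sum_row;
}
```

With cn = 4, the samples {1, 2, 3, 4, 5, 6, 7, 8} and a row of 10s above, this returns {11, 12, 13, 14, 16, 18, 20, 22}. A "quad" in the comment above corresponds to the four channel values of one pixel.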

terfendail · 2020-02-18T13:18:03Z

I've collected performance results for the existing change on my setup.

Performance for SSE2 baseline

Performance test	Reference time	PR time	Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F)	0.003	0.003	0.99
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S)	0.002	0.002	1.00
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F)	0.009	0.004	2.02
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F)	0.017	0.005	3.53
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S)	0.010	0.004	2.42
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F)	0.018	0.008	2.13
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F)	0.026	0.017	1.56
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S)	0.015	0.016	0.92
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F)	0.026	0.019	1.33
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F)	0.034	0.012	2.85
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S)	0.020	0.010	1.96
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F)	0.034	0.021	1.61
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F)	0.087	0.089	0.97
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S)	0.062	0.066	0.93
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F)	0.324	0.154	2.10
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F)	0.661	0.155	4.26
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S)	0.348	0.153	2.28
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F)	0.650	0.300	2.17
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F)	0.993	0.642	1.55
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S)	0.520	0.617	0.84
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F)	0.985	0.755	1.31
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F)	1.312	0.444	2.95
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S)	0.727	0.384	1.89
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F)	1.426	0.802	1.78
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F)	0.263	0.249	1.06
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S)	0.183	0.193	0.95
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F)	0.971	0.440	2.21
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F)	1.980	0.478	4.14
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S)	1.021	0.461	2.21
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F)	2.002	1.008	1.99
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F)	2.950	1.850	1.59
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S)	1.635	1.852	0.88
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F)	3.281	2.246	1.46
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F)	4.114	1.403	2.93
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S)	2.776	1.274	2.18
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F)	5.018	2.699	1.86
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F)	0.605	0.588	1.03
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S)	0.447	0.462	0.97
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F)	2.307	1.156	2.00
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F)	4.670	1.256	3.72
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S)	2.477	1.218	2.03
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F)	4.790	2.587	1.85
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F)	7.003	4.342	1.61
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S)	4.255	4.356	0.98
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F)	8.015	5.343	1.50
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F)	9.576	3.276	2.92
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S)	6.253	3.006	2.08
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F)	11.487	6.133	1.87
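
The Speedup column appears to be the reference time divided by the PR time, so values above 1.00 mean the PR is faster. A minimal sketch of that arithmetic follows; the helper names are hypothetical and the time unit is not stated in the tables.

```cpp
#include <cassert>
#include <cmath>

// Speedup as it seems to be reported in the tables:
// reference time divided by PR time (hypothetical helper).
double speedup(double reference_time, double pr_time)
{
    assert(pr_time > 0.0);
    return reference_time / pr_time;
}

// Rounds to two decimals, the precision shown in the Speedup column.
double round2(double x)
{
    return std::round(x * 100.0) / 100.0;
}
```

For the (640x480, 8UC2, CV_32F) row, 0.661 / 0.155 rounds to 4.26, which matches the table. Rows where both displayed times are equal but the speedup is not 1.00 suggest that the column was computed from unrounded timings.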

Performance for SSE3 baseline

Performance test	Reference time	PR time	Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F)	0.003	0.003	0.99
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S)	0.002	0.002	0.98
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F)	0.009	0.004	2.13
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F)	0.017	0.005	3.52
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S)	0.010	0.004	2.37
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F)	0.018	0.008	2.16
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F)	0.025	0.017	1.53
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S)	0.015	0.016	0.92
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F)	0.027	0.019	1.44
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F)	0.034	0.012	2.81
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S)	0.020	0.010	1.94
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F)	0.036	0.020	1.75
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F)	0.089	0.089	1.00
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S)	0.066	0.064	1.03
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F)	0.359	0.146	2.46
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F)	0.651	0.155	4.20
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S)	0.357	0.147	2.44
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F)	0.696	0.307	2.27
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F)	0.982	0.641	1.53
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S)	0.526	0.615	0.86
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F)	1.062	0.692	1.54
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F)	1.304	0.421	3.10
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S)	0.746	0.373	2.00
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F)	1.489	0.809	1.84
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F)	0.250	0.250	1.00
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S)	0.192	0.192	1.00
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F)	1.037	0.445	2.33
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F)	1.977	0.480	4.12
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S)	1.033	0.451	2.29
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F)	2.106	1.009	2.09
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F)	2.938	1.848	1.59
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S)	1.667	1.851	0.90
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F)	3.438	2.161	1.59
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F)	4.207	1.352	3.11
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S)	2.916	1.230	2.37
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F)	5.246	2.597	2.02
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F)	0.594	0.580	1.02
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S)	0.473	0.460	1.03
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F)	2.454	1.141	2.15
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F)	4.691	1.238	3.79
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S)	2.487	1.193	2.08
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F)	4.974	2.554	1.95
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F)	6.955	4.257	1.63
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S)	4.295	4.332	0.99
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F)	8.487	5.008	1.69
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F)	9.514	3.176	3.00
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S)	6.302	2.953	2.13
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F)	12.035	5.964	2.02

Performance for SSE4_2 baseline

Performance test	Reference time	PR time	Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F)	0.003	0.003	0.99
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S)	0.002	0.002	1.01
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F)	0.009	0.005	1.88
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F)	0.017	0.005	3.67
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S)	0.011	0.004	2.55
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F)	0.017	0.008	2.05
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F)	0.025	0.011	2.28
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S)	0.015	0.012	1.32
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F)	0.025	0.014	1.76
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F)	0.034	0.012	2.89
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S)	0.020	0.010	1.98
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F)	0.034	0.020	1.65
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F)	0.089	0.087	1.02
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S)	0.071	0.068	1.04
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F)	0.340	0.158	2.15
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F)	0.685	0.149	4.60
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S)	0.357	0.137	2.60
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F)	0.673	0.301	2.23
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F)	1.011	0.404	2.50
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S)	0.551	0.439	1.26
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F)	1.032	0.527	1.96
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F)	1.355	0.413	3.28
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S)	0.769	0.372	2.07
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F)	1.472	0.797	1.85
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F)	0.259	0.251	1.03
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S)	0.216	0.205	1.06
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F)	1.025	0.477	2.15
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F)	2.083	0.462	4.51
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S)	1.077	0.427	2.52
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F)	2.091	0.995	2.10
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F)	3.087	1.207	2.56
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S)	1.731	1.288	1.34
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F)	3.420	1.766	1.94
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F)	4.190	1.307	3.21
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S)	2.902	1.215	2.39
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F)	5.128	2.551	2.01
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F)	0.612	0.571	1.07
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S)	0.499	0.487	1.02
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F)	2.336	1.162	2.01
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F)	4.718	1.206	3.91
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S)	2.525	1.164	2.17
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F)	4.921	2.534	1.94
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F)	7.053	2.881	2.45
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S)	4.398	2.955	1.49
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F)	8.256	4.204	1.96
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F)	9.619	3.114	3.09
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S)	6.353	2.928	2.17
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F)	11.662	5.916	1.97

Performance for AVX2 baseline

Performance test	Reference time	PR time	Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F)	0.002	0.003	0.98
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S)	0.002	0.002	0.98
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F)	0.009	0.003	2.60
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F)	0.017	0.005	3.52
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S)	0.010	0.004	2.65
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F)	0.017	0.007	2.59
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F)	0.025	0.008	3.23
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S)	0.015	0.007	2.16
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F)	0.025	0.011	2.27
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F)	0.034	0.010	3.22
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S)	0.020	0.010	2.07
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F)	0.034	0.017	1.96
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F)	0.069	0.067	1.03
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S)	0.055	0.056	0.98
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F)	0.344	0.106	3.24
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F)	0.678	0.149	4.54
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S)	0.356	0.126	2.83
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F)	0.680	0.227	3.00
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F)	1.013	0.260	3.89
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S)	0.541	0.246	2.20
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F)	1.031	0.405	2.54
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F)	1.344	0.356	3.78
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S)	0.747	0.354	2.11
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F)	1.486	0.794	1.87
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F)	0.195	0.193	1.01
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S)	0.174	0.171	1.02
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F)	0.991	0.318	3.11
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F)	1.959	0.426	4.60
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S)	1.019	0.392	2.60
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F)	2.022	0.871	2.32
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F)	2.925	0.794	3.69
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S)	1.646	0.761	2.16
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F)	3.268	1.523	2.15
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F)	4.049	1.193	3.39
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S)	2.818	1.180	2.39
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F)	4.835	2.465	1.96
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F)	0.458	0.451	1.02
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S)	0.408	0.413	0.99
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F)	2.262	0.945	2.39
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F)	4.441	1.150	3.86
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S)	2.404	1.093	2.20
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F)	4.834	2.291	2.11
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F)	6.938	2.018	3.44
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S)	4.265	1.993	2.14
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F)	8.018	3.729	2.15
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F)	9.524	2.879	3.31
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S)	6.241	2.854	2.19
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F)	11.551	5.547	2.08

Performance for AVX512 baseline

Performance test	Reference time	PR time	Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F)	0.003	0.003	0.96
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S)	0.002	0.002	0.97
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F)	0.005	0.006	0.96
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F)	0.017	0.005	3.65
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S)	0.010	0.004	2.59
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F)	0.005	0.005	1.00
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F)	0.025	0.007	3.43
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S)	0.015	0.006	2.34
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F)	0.006	0.006	1.00
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F)	0.034	0.008	4.08
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S)	0.020	0.007	2.65
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F)	0.007	0.007	0.97
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F)	0.055	0.052	1.04
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S)	0.055	0.052	1.06
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F)	0.175	0.186	0.94
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F)	0.655	0.118	5.56
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S)	0.338	0.116	2.90
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F)	0.212	0.205	1.03
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F)	0.978	0.204	4.79
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S)	0.523	0.189	2.77
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F)	0.329	0.316	1.04
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F)	1.297	0.240	5.40
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S)	0.739	0.240	3.08
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F)	0.493	0.468	1.06
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F)	0.180	0.174	1.04
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S)	0.181	0.172	1.05
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F)	0.521	0.531	0.98
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F)	1.984	0.370	5.36
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S)	1.018	0.358	2.85
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F)	0.851	0.822	1.04
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F)	3.049	0.694	4.39
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S)	1.732	0.656	2.64
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F)	1.449	1.387	1.04
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F)	4.225	1.002	4.22
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S)	2.956	1.028	2.88
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F)	2.102	2.035	1.03
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F)	0.435	0.423	1.03
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S)	0.436	0.406	1.07
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F)	1.304	1.305	1.00
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F)	4.656	1.093	4.26
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S)	2.501	1.076	2.32
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F)	2.369	2.285	1.04
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F)	6.922	1.923	3.60
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S)	4.311	1.896	2.27
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F)	3.722	3.583	1.04
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F)	9.539	2.585	3.69
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S)	6.316	2.631	2.40
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F)	5.232	5.068	1.03

terfendail · 2020-02-18T14:25:11Z

I've tested single-vector processing for the 4-channel to 32S case:

                for ( ; j + v_uint16::nlanes <= width; j += v_uint16::nlanes)
                {
                    v_int16 el8 = v_reinterpret_as_s16(vx_load_expand(src_row + j));
                    v_int32 el4l, el4h;
#if CV_AVX2 && CV_SIMD_WIDTH == 32
                    __m256i vsum = _mm256_add_epi16(el8.val, _mm256_slli_si256(el8.val, 8));
                    __m256i shmask = _mm256_set1_epi32(7);
                    el4l.val = _mm256_add_epi32(_mm256_cvtepi16_epi32(_v256_extract_low(vsum)), prev.val);
                    el4h.val = _mm256_add_epi32(_mm256_cvtepi16_epi32(_v256_extract_high(vsum)), _mm256_permute2x128_si256(el4l.val, el4l.val, 0x31));
                    prev.val = _mm256_permute2x128_si256(el4h.val, el4h.val, 0x31);
#else
#if CV_SIMD_WIDTH >= 32
                    el8 += v_rotate_left<4>(el8);
#if CV_SIMD_WIDTH == 64
                    el8 += v_rotate_left<8>(el8);
#endif
#endif
                    v_expand(el8, el4l, el4h);
                    el4l += prev;
                    el4h += el4l;
#if CV_SIMD_WIDTH == 16
                    prev = el4h;
#elif CV_SIMD_WIDTH == 32
                    prev = v_combine_high(el4h, el4h);
#else
                    v_int32 t0, t1; v_zip(el4h, el4h, t0, t1);
                    prev = v_combine_high(t1, t1);
#endif
#endif
                    v_store(sum_row + j                  , el4l + vx_load(prev_sum_row + j                  ));
                    v_store(sum_row + j + v_int32::nlanes, el4h + vx_load(prev_sum_row + j + v_int32::nlanes));
                }
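
To check the lane arithmetic of the CV_SIMD_WIDTH == 32 branch above, one iteration can be emulated with plain arrays. This is only a sketch under my reading of the universal intrinsics (v_rotate_left shifts lanes toward higher indices and fills with zeros, v_expand splits into low and high halves, v_combine_high repeats the high half); the types `V16` and `V8` and the function `step` are hypothetical stand-ins, not OpenCV API.

```cpp
#include <array>
#include <cstddef>

using V16 = std::array<int, 16>; // stands in for a 16-lane v_int16
using V8  = std::array<int, 8>;  // stands in for an 8-lane v_int32

// Emulates v_rotate_left<4>: lane i takes lane i - 4, zeros enter at the bottom.
V16 rotate_left4(const V16& a)
{
    V16 r{};
    for (std::size_t i = 4; i < a.size(); ++i) r[i] = a[i - 4];
    return r;
}

// One loop iteration over 4 pixels of 4 channels each.
// prev holds the running per-channel sums of the last pixel, stored twice.
void step(const V16& pixels, V8& prev, V8& el4l, V8& el4h)
{
    // el8 += v_rotate_left<4>(el8): every pixel gets its left neighbour added.
    V16 el8 = pixels;
    V16 rot = rotate_left4(el8);
    for (std::size_t i = 0; i < 16; ++i) el8[i] += rot[i];

    // v_expand(el8, el4l, el4h): low 8 lanes and high 8 lanes.
    for (std::size_t i = 0; i < 8; ++i) { el4l[i] = el8[i]; el4h[i] = el8[i + 8]; }

    // el4l += prev; el4h += el4l;
    for (std::size_t i = 0; i < 8; ++i) el4l[i] += prev[i];
    for (std::size_t i = 0; i < 8; ++i) el4h[i] += el4l[i];

    // prev = v_combine_high(el4h, el4h): last pixel's sums in both halves.
    for (std::size_t i = 0; i < 4; ++i) prev[i] = prev[i + 4] = el4h[i + 4];
}
```

With pixels p0..p3 and carry P, the lanes end up as el4l = [P+p0, P+p0+p1] and el4h = [P+p0+p1+p2, P+p0+p1+p2+p3], which is the per-channel prefix sum that the scalar code would produce before the row above is added.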

Performance is a bit better on my setup

Performance for SSE2 baseline

Performance test	Reference time	PR time	Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S)	0.020	0.005	4.10
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S)	0.727	0.239	3.04
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S)	2.776	1.016	2.73
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S)	6.253	2.625	2.38

Performance for SSE3 baseline

Performance test	Reference time	PR time	Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S)	0.020	0.005	4.08
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S)	0.746	0.229	3.26
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S)	2.916	0.988	2.95
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S)	6.302	2.562	2.46

Performance for SSE4_2 baseline

Performance test	Reference time	PR time	Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S)	0.020	0.005	4.13
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S)	0.769	0.238	3.23
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S)	2.902	1.024	2.83
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S)	6.353	2.627	2.42

Performance for AVX2 baseline

Performance test	Reference time	PR time	Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S)	0.020	0.005	4.37
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S)	0.747	0.227	3.29
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S)	2.818	0.993	2.84
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S)	6.241	2.568	2.43

Performance for AVX512 baseline

Performance test	Reference time	PR time	Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S)	0.020	0.004	4.85
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S)	0.739	0.235	3.15
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S)	2.956	0.979	3.02
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S)	6.316	2.562	2.46

terfendail · 2020-02-18T15:17:54Z

Looks like the new way to vectorize 8UC1 to 64FC1 works better than the existing AVX512 implementation.

Performance for AVX512 baseline

Performance test	Reference time	PR time	Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F)	0.005	0.004	1.49
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F)	0.005	0.006	0.96
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F)	0.006	0.009	0.72
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F)	0.007	0.010	0.68
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F)	0.175	0.107	1.64
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F)	0.212	0.214	0.99
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F)	0.329	0.316	1.04
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F)	0.493	0.472	1.04
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F)	0.521	0.320	1.63
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F)	0.851	0.828	1.03
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F)	1.449	1.415	1.02
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F)	2.102	2.056	1.02
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F)	1.304	0.959	1.36
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F)	2.369	2.286	1.04
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F)	3.722	3.676	1.01
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F)	5.232	5.141	1.02

ChipKerchner · 2020-02-18T15:53:28Z

Looks like the new way to vectorize 8UC1 to 64FC1 works better than the existing AVX512 implementation.

Good find! I've implemented 8UC4->32SC4 and 8UC4->32FC4 so far and am seeing an additional 25-30% improvement.

Let me know your ideas for 8UC1->64FC1 or if you'd just like to update with your ideas for the AVX512 version. I don't really have a way to test AVX512 currently.

terfendail · 2020-02-18T16:19:18Z

Regarding AVX512, I meant that I tested the generic version, which is currently disabled for AVX512, instead of the specialized calculate_integral_avx512 implementation. So it probably makes sense to use calculate_integral_avx512 for multichannel images only.
By the way, have you tried single vector processing for 8UC2 images?

ChipKerchner · 2020-02-19T16:28:42Z

I committed the changes for single-vector processing of 4-channel images (8UC4->32SC4/32FC4/64FC4). I will look at similar changes for 2 channels when I have time (early testing shows the speed to be similar to my version). If the 64FC1 and/or 64FC4 changes are faster than the AVX512 version, I will try to activate this version instead.

Please make sure the AVX512 code (CV_SIMD_WIDTH > 32) is correct. Also if you can rerun the timings including AVX512, that would be useful.

@terfendail, I think this smoke test is failing because of AVX512 (please suggest a fix since it is your code) -
[ FAILED ] Imgproc_Integral.accuracy (19 ms)

terfendail · 2020-02-20T18:01:44Z

Sorry, that was my fault. I missed the fact that v_zip interleaves channels.
The correct version of the prev broadcast code should be:

#if CV_SIMD_WIDTH == 16
                    prev = el4h;
#elif CV_SIMD_WIDTH == 32
                    prev = v_combine_high(el4h, el4h);
#else
                    v_int32 t = v_rotate_right<12>(el4h);
                    t |= v_rotate_left<4>(t);
                    prev = v_combine_low(t, t);
#endif
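
The difference between the two CV_SIMD_WIDTH == 64 variants can be checked with a small array emulation. This is a sketch under my understanding of the lane semantics of v_rotate_right, v_rotate_left, v_zip, v_combine_low and v_combine_high; the type `V16` and all function names are hypothetical stand-ins, not OpenCV API.

```cpp
#include <array>
#include <cstddef>

using V16 = std::array<int, 16>; // stands in for a 16-lane v_int32

// v_rotate_right<N>: lane i takes lane i + N, zeros fill the top.
template <std::size_t N> V16 rotate_right(const V16& a)
{
    V16 r{};
    for (std::size_t i = 0; i + N < a.size(); ++i) r[i] = a[i + N];
    return r;
}

// v_rotate_left<N>: lane i takes lane i - N, zeros fill the bottom.
template <std::size_t N> V16 rotate_left(const V16& a)
{
    V16 r{};
    for (std::size_t i = N; i < a.size(); ++i) r[i] = a[i - N];
    return r;
}

// v_combine_low(a, b): low half of a followed by low half of b.
V16 combine_low(const V16& a, const V16& b)
{
    V16 r{};
    for (std::size_t i = 0; i < 8; ++i) { r[i] = a[i]; r[i + 8] = b[i]; }
    return r;
}

// v_combine_high(a, b): high half of a followed by high half of b.
V16 combine_high(const V16& a, const V16& b)
{
    V16 r{};
    for (std::size_t i = 0; i < 8; ++i) { r[i] = a[i + 8]; r[i + 8] = b[i + 8]; }
    return r;
}

// Second output of v_zip(a, b, t0, t1): high halves interleaved lane by lane.
V16 zip_high(const V16& a, const V16& b)
{
    V16 r{};
    for (std::size_t i = 0; i < 8; ++i) { r[2 * i] = a[i + 8]; r[2 * i + 1] = b[i + 8]; }
    return r;
}

// Corrected broadcast: the last pixel (lanes 12..15) repeated four times.
V16 prev_fixed(const V16& el4h)
{
    V16 t = rotate_right<12>(el4h);       // last pixel moved to lanes 0..3
    V16 s = rotate_left<4>(t);            // copy of it in lanes 4..7
    for (std::size_t i = 0; i < 16; ++i) t[i] |= s[i];
    return combine_low(t, t);             // lanes 0..7 duplicated into 8..15
}

// Earlier v_zip-based variant, kept here only for comparison.
V16 prev_zip(const V16& el4h)
{
    V16 t1 = zip_high(el4h, el4h);
    return combine_high(t1, t1);
}
```

For el4h = {0, 1, ..., 15}, prev_fixed yields 12, 13, 14, 15 repeated four times, while prev_zip yields 12, 12, 13, 13, 14, 14, 15, 15 repeated twice, so each channel would receive a neighbouring channel's carry.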

Performance for this version is almost the same

Performance for AVX512 baseline

Performance test	Reference time	PR time	Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S)	0.020	0.004	4.79
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S)	0.739	0.237	3.11
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S)	2.956	0.972	3.04
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S)	6.316	2.550	2.48

terfendail · 2020-02-21T18:33:24Z

Performance for SSE2 baseline

Performance test	Reference time	PR time	Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F)	0.003	0.003	0.99
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S)	0.002	0.002	1.00
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F)	0.009	0.004	2.00
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F)	0.017	0.005	3.54
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S)	0.010	0.004	2.46
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F)	0.018	0.008	2.12
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F)	0.026	0.017	1.55
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S)	0.015	0.016	0.92
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F)	0.026	0.019	1.32
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F)	0.034	0.008	4.07
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S)	0.020	0.005	4.10
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F)	0.034	0.010	3.27
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F)	0.087	0.084	1.04
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S)	0.062	0.063	0.98
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F)	0.324	0.147	2.20
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F)	0.661	0.153	4.31
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S)	0.348	0.142	2.45
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F)	0.650	0.294	2.21
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F)	0.993	0.614	1.62
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S)	0.520	0.626	0.83
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F)	0.985	0.730	1.35
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F)	1.312	0.320	4.10
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S)	0.727	0.222	3.28
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F)	1.426	0.478	2.98
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F)	0.263	0.251	1.05
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S)	0.183	0.184	0.99
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F)	0.971	0.445	2.18
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F)	1.980	0.473	4.18
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S)	1.021	0.433	2.36
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F)	2.002	1.015	1.97
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F)	2.950	1.845	1.60
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S)	1.635	1.867	0.88
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F)	3.281	2.225	1.47
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F)	4.114	1.084	3.79
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S)	2.776	1.018	2.73
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F)	5.018	2.044	2.45
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F)	0.605	0.589	1.03
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S)	0.447	0.443	1.01
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F)	2.307	1.139	2.03
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F)	4.670	1.234	3.79
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S)	2.477	1.195	2.07
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F)	4.790	2.535	1.89
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F)	7.003	4.326	1.62
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S)	4.255	4.342	0.98
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F)	8.015	5.284	1.52
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F)	9.576	2.689	3.56
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S)	6.253	2.574	2.43
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F)	11.487	4.981	2.31

Performance for SSE3 baseline

Performance test	Reference time	PR time	Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F)	0.003	0.003	1.03
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S)	0.002	0.002	0.99
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F)	0.009	0.004	2.19
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F)	0.017	0.005	3.62
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S)	0.010	0.004	2.39
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F)	0.018	0.008	2.13
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F)	0.025	0.017	1.53
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S)	0.015	0.016	0.91
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F)	0.027	0.019	1.44
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F)	0.034	0.008	4.03
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S)	0.020	0.005	4.08
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F)	0.036	0.010	3.46
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F)	0.089	0.087	1.02
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S)	0.066	0.064	1.03
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F)	0.359	0.146	2.45
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F)	0.651	0.154	4.22
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S)	0.357	0.143	2.50
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F)	0.696	0.299	2.33
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F)	0.982	0.622	1.58
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S)	0.526	0.616	0.85
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F)	1.062	0.692	1.54
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F)	1.304	0.312	4.18
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S)	0.746	0.229	3.26
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F)	1.489	0.476	3.13
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F)	0.250	0.249	1.00
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S)	0.192	0.182	1.05
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F)	1.037	0.436	2.38
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F)	1.977	0.465	4.25
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S)	1.033	0.442	2.34
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F)	2.106	0.995	2.12
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F)	2.938	1.847	1.59
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S)	1.667	1.860	0.90
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F)	3.438	2.164	1.59
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F)	4.207	1.067	3.94
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S)	2.916	1.004	2.90
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F)	5.246	2.000	2.62
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F)	0.594	0.564	1.05
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S)	0.473	0.440	1.07
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F)	2.454	1.112	2.21
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F)	4.691	1.215	3.86
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S)	2.487	1.187	2.10
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F)	4.974	2.483	2.00
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F)	6.955	4.158	1.67
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S)	4.295	4.192	1.02
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F)	8.487	4.973	1.71
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F)	9.514	2.633	3.61
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S)	6.302	2.534	2.49
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F)	12.035	4.882	2.47

Performance for SSE4_2 baseline

Performance test	Reference time	PR time	Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F)	0.003	0.003	0.99
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S)	0.002	0.002	0.98
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F)	0.009	0.004	1.93
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F)	0.017	0.005	3.66
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S)	0.011	0.004	2.70
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F)	0.017	0.008	2.05
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F)	0.025	0.011	2.28
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S)	0.015	0.012	1.32
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F)	0.025	0.014	1.75
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F)	0.034	0.008	4.08
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S)	0.020	0.005	4.09
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F)	0.034	0.011	3.17
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F)	0.089	0.089	1.00
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S)	0.071	0.068	1.03
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F)	0.340	0.157	2.16
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F)	0.685	0.149	4.61
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S)	0.357	0.138	2.58
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F)	0.673	0.310	2.17
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F)	1.011	0.423	2.39
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S)	0.551	0.449	1.23
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F)	1.032	0.548	1.88
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F)	1.355	0.312	4.34
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S)	0.769	0.243	3.16
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F)	1.472	0.508	2.90
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F)	0.259	0.262	0.99
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S)	0.216	0.204	1.06
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F)	1.025	0.468	2.19
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F)	2.083	0.457	4.56
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S)	1.077	0.439	2.45
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F)	2.091	0.997	2.10
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F)	3.087	1.218	2.53
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S)	1.731	1.303	1.33
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F)	3.420	1.787	1.91
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F)	4.190	1.091	3.84
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S)	2.902	1.082	2.68
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F)	5.128	2.057	2.49
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F)	0.612	0.599	1.02
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S)	0.499	0.492	1.01
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F)	2.336	1.196	1.95
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F)	4.718	1.202	3.92
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S)	2.525	1.157	2.18
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F)	4.921	2.524	1.95
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F)	7.053	2.828	2.49
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S)	4.398	2.968	1.48
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F)	8.256	4.198	1.97
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F)	9.619	2.695	3.57
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S)	6.353	2.613	2.43
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F)	11.662	5.051	2.31

Performance for AVX2 baseline

Performance test	Reference time	PR time	Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F)	0.002	0.003	0.94
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S)	0.002	0.002	1.00
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F)	0.009	0.003	2.51
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F)	0.017	0.005	3.50
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S)	0.010	0.004	2.63
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F)	0.017	0.007	2.55
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F)	0.025	0.008	3.19
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S)	0.015	0.007	2.14
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F)	0.025	0.011	2.24
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F)	0.034	0.007	4.59
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S)	0.020	0.005	4.33
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F)	0.034	0.009	3.85
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F)	0.069	0.067	1.04
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S)	0.055	0.055	1.00
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F)	0.344	0.107	3.22
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F)	0.678	0.149	4.56
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S)	0.356	0.128	2.79
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F)	0.680	0.234	2.91
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F)	1.013	0.261	3.87
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S)	0.541	0.246	2.20
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F)	1.031	0.421	2.45
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F)	1.344	0.271	4.96
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S)	0.747	0.231	3.23
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F)	1.486	0.481	3.09
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F)	0.195	0.201	0.97
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S)	0.174	0.173	1.00
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F)	0.991	0.329	3.01
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F)	1.959	0.451	4.35
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S)	1.019	0.400	2.55
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F)	2.022	0.899	2.25
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F)	2.925	0.827	3.54
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S)	1.646	0.790	2.08
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F)	3.268	1.570	2.08
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F)	4.049	1.006	4.03
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S)	2.818	1.007	2.80
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F)	4.835	1.995	2.42
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F)	0.458	0.474	0.97
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S)	0.408	0.422	0.97
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F)	2.262	0.969	2.33
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F)	4.441	1.228	3.62
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S)	2.404	1.118	2.15
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F)	4.834	2.325	2.08
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F)	6.938	2.089	3.32
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S)	4.265	2.044	2.09
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F)	8.018	3.824	2.10
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F)	9.524	2.564	3.71
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S)	6.241	2.551	2.45
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F)	11.551	4.889	2.36

Performance for AVX512 baseline

Performance test	Reference time	PR time	Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F)	0.003	0.003	0.99
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S)	0.002	0.002	0.97
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F)	0.005	0.006	0.92
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F)	0.017	0.005	3.63
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S)	0.010	0.004	2.57
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F)	0.005	0.005	1.00
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F)	0.025	0.007	3.43
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S)	0.015	0.006	2.33
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F)	0.006	0.006	0.98
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F)	0.034	0.006	5.79
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S)	0.020	0.004	4.69
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F)	0.007	0.007	0.99
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F)	0.055	0.052	1.05
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S)	0.055	0.052	1.06
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F)	0.175	0.185	0.95
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F)	0.655	0.119	5.52
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S)	0.338	0.116	2.92
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F)	0.212	0.206	1.03
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F)	0.978	0.194	5.03
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S)	0.523	0.181	2.89
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F)	0.329	0.314	1.05
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F)	1.297	0.234	5.55
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S)	0.739	0.234	3.15
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F)	0.493	0.462	1.07
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F)	0.180	0.176	1.03
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S)	0.181	0.174	1.04
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F)	0.521	0.529	0.98
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F)	1.984	0.369	5.38
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S)	1.018	0.355	2.87
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F)	0.851	0.806	1.06
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F)	3.049	0.666	4.58
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S)	1.732	0.635	2.73
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F)	1.449	1.373	1.06
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F)	4.225	0.948	4.46
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S)	2.956	0.991	2.98
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F)	2.102	2.006	1.05
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F)	0.435	0.409	1.06
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S)	0.436	0.412	1.06
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F)	1.304	1.254	1.04
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F)	4.656	1.108	4.20
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S)	2.501	1.087	2.30
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F)	2.369	2.294	1.03
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F)	6.922	1.905	3.63
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S)	4.311	1.882	2.29
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F)	3.722	3.588	1.04
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F)	9.539	2.522	3.78
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S)	6.316	2.552	2.47
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F)	5.232	5.044	1.04

It looks like there is a small performance degradation for 8UC3->32S on SSE2 and SSE3.

ChipKerchner · 2020-02-21T20:09:27Z

It looks like there is a small performance degradation for 8UC3->32S on SSE2 and SSE3.

I'll have to think a little more about whether there is a better way to do 8UC3->32S. For non-Intel platforms, this algorithm is much better than the scalar version.
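
For readers following the scalar-versus-vector comparison, here is a minimal portable sketch of the two approaches to one integral row. It is not the code from this PR: it emulates a SIMD register with a plain array, and the names `integral_row_scalar`, `integral_row_blocked` and the constant `LANES` are assumptions made for illustration. The blocked version computes an in-block prefix sum with shift-and-add steps (shifting by 1, 2, ... pixels), then adds the running total carried over from the previous block.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Scalar reference: one row of a multi-channel integral image.
// src holds width * cn interleaved 8-bit samples. prev_sum and sum each hold
// (width + 1) * cn entries; the first cn entries of sum are set to zero.
void integral_row_scalar(const std::uint8_t* src, const std::int32_t* prev_sum,
                         std::int32_t* sum, int width, int cn)
{
    assert(cn >= 1 && width >= 0);
    for (int k = 0; k < cn; ++k)
        sum[k] = 0;
    std::vector<std::int32_t> run(cn, 0);  // running per-channel row sums
    for (int x = 0; x < width; ++x)
        for (int k = 0; k < cn; ++k)
        {
            run[k] += src[x * cn + k];
            sum[(x + 1) * cn + k] = prev_sum[(x + 1) * cn + k] + run[k];
        }
}

// Pixels per emulated SIMD block (an assumption, must be a power of two).
constexpr int LANES = 4;

// Blocked variant: the inner loops over `block` elements stand in for
// whole-register operations on real SIMD hardware.
void integral_row_blocked(const std::uint8_t* src, const std::int32_t* prev_sum,
                          std::int32_t* sum, int width, int cn)
{
    assert(cn >= 1 && width >= 0);
    for (int k = 0; k < cn; ++k)
        sum[k] = 0;
    const int block = LANES * cn;
    std::vector<std::int32_t> carry(cn, 0), v(block), t(block);
    int x = 0;
    for (; x + LANES <= width; x += LANES)
    {
        for (int i = 0; i < block; ++i)
            v[i] = src[x * cn + i];
        // Log-step inclusive scan: shift by cn, 2*cn, ... elements and add.
        for (int shift = cn; shift < block; shift *= 2)
        {
            for (int i = 0; i < block; ++i)
                t[i] = v[i] + (i >= shift ? v[i - shift] : 0);
            v.swap(t);
        }
        // Add the totals carried from earlier blocks, then the row above.
        for (int i = 0; i < block; ++i)
        {
            v[i] += carry[i % cn];
            sum[(x + 1) * cn + i] = prev_sum[(x + 1) * cn + i] + v[i];
        }
        for (int k = 0; k < cn; ++k)
            carry[k] = v[block - cn + k];
    }
    // Scalar tail for the pixels that do not fill a whole block.
    for (; x < width; ++x)
        for (int k = 0; k < cn; ++k)
        {
            carry[k] += src[x * cn + k];
            sum[(x + 1) * cn + k] = prev_sum[(x + 1) * cn + k] + carry[k];
        }
}
```

Both functions should produce identical output for any `width` and `cn`; the blocked form only changes how the per-channel running sum is built, which is what lets real hardware process several pixels per instruction.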

Could you measure the performance of my version of 8UC[1-4]->64F versus the current (old) version for AVX512? I want to know if it is worth calling the current old version at all.

terfendail · 2020-02-25T12:42:49Z

Performance is better for 8UC1->64F, while it is almost the same for 8UC[2-4]. (I manually disabled the existing AVX512 code dispatching and enabled the new code for the AVX512 platform as well.)

Performance for AVX512 baseline

Performance test	Reference time	PR time	Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F)	0.005	0.004	1.50
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F)	0.005	0.006	0.96
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F)	0.006	0.009	0.71
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F)	0.007	0.007	1.01
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F)	0.175	0.115	1.53
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F)	0.212	0.230	0.92
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F)	0.329	0.348	0.95
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F)	0.493	0.504	0.98
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F)	0.521	0.338	1.54
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F)	0.851	0.856	0.99
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F)	1.449	1.444	1.00
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F)	2.102	2.048	1.03
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F)	1.304	0.988	1.32
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F)	2.369	2.314	1.02
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F)	3.722	3.713	1.00
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F)	5.232	5.165	1.01
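
As a reading aid for the tables in this thread, the Speedup column matches the ratio of reference time to PR time. The sketch below shows that arithmetic and a simple regression check. The function names and the tolerance parameter are assumptions for illustration, not part of the OpenCV perf tooling, and ratios recomputed from the displayed (rounded) timings can differ slightly from the printed column.

```cpp
#include <cassert>

// Speedup as reference time divided by PR time; values above 1.0 mean
// the PR build is faster than the reference build.
double speedup(double reference_time, double pr_time)
{
    assert(reference_time > 0.0 && pr_time > 0.0);
    return reference_time / pr_time;
}

// True when the PR is slower than the reference by more than `tolerance`
// (for example 0.05 for a 5% noise band; the threshold is an assumption).
bool is_regression(double reference_time, double pr_time, double tolerance)
{
    return speedup(reference_time, pr_time) < 1.0 - tolerance;
}
```

For the 1280x720 8UC1 CV_64F row above, `speedup(0.521, 0.338)` is about 1.54, in line with the table. On very small inputs such as 127x61 the timings have few significant digits, so a check like `is_regression` is more trustworthy on the larger sizes.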

ChipKerchner · 2020-02-26T15:03:15Z

What would be the best way to enable my 8UC1->64F for AVX512 but use the old code for 8UC[2-4]->64F?

terfendail · 2020-02-27T16:54:14Z

modules/imgproc/src/sumpixels.simd.hpp

+        double * sqsum, size_t,
+        double * tilted, size_t,
+        int width, int height, int cn) const
+    {


I think the call to the specific AVX512 implementation could be moved to the beginning of this implementation, with a proper check for the requested mode:

#if CV_AVX512_SKX
    if (!tilted && cn <= 4 && (cn > 1 || sqsum))
    {
        calculate_integral_avx512(src, _srcstep, sum, _sumstep, sqsum, _sqsumstep, width, height, cn);
        return true;
    }
#endif
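
To make the effect of that condition easier to see, here is a small standalone sketch of the same predicate. The helper name `use_existing_avx512_path` and its boolean parameters are hypothetical; in the suggestion above, `tilted` and `sqsum` are output pointers, modeled here as "was this output requested".

```cpp
#include <cassert>

// Hypothetical helper mirroring the condition in the suggestion above:
// returns true when the request would be handed to the existing AVX512
// routine, false when it would fall through to the new vectorized code.
bool use_existing_avx512_path(bool has_tilted, bool has_sqsum, int cn)
{
    assert(cn >= 1);
    return !has_tilted && cn <= 4 && (cn > 1 || has_sqsum);
}
```

With this predicate, a plain 8UC1->64F sum (`cn == 1`, no `sqsum`, no `tilted`) returns false and so reaches the new code, while 8UC[2-4]->64F and any `sqsum` request with up to four channels stay on the existing AVX512 path. That matches the split asked about earlier in the thread.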

…sion of 8UC1 to 64F for AVX512.

terfendail · 2020-02-28T14:27:26Z

Performance for SSE2 baseline

Performance test	Reference time	PR time	Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F)	0.003	0.003	0.99
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S)	0.002	0.002	0.99
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F)	0.009	0.004	2.01
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F)	0.017	0.005	3.55
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S)	0.010	0.004	2.57
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F)	0.018	0.008	2.13
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F)	0.026	0.017	1.56
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S)	0.015	0.015	0.97
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F)	0.026	0.019	1.33
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F)	0.034	0.008	4.08
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S)	0.020	0.005	3.91
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F)	0.034	0.010	3.27
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F)	0.087	0.084	1.04
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S)	0.062	0.063	0.99
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F)	0.324	0.147	2.20
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F)	0.661	0.152	4.35
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S)	0.348	0.136	2.57
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F)	0.650	0.298	2.18
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F)	0.993	0.618	1.61
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S)	0.520	0.513	1.01
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F)	0.985	0.725	1.36
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F)	1.312	0.310	4.23
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S)	0.727	0.223	3.26
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F)	1.426	0.479	2.97
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F)	0.263	0.248	1.06
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S)	0.183	0.184	0.99
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F)	0.971	0.440	2.21
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F)	1.980	0.464	4.27
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S)	1.021	0.425	2.40
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F)	2.002	0.996	2.01
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F)	2.950	1.856	1.59
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S)	1.635	1.641	1.00
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F)	3.281	2.254	1.46
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F)	4.114	1.045	3.94
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S)	2.776	0.993	2.80
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F)	5.018	2.042	2.46
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F)	0.605	0.589	1.03
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S)	0.447	0.458	0.98
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F)	2.307	1.094	2.11
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F)	4.670	1.239	3.77
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S)	2.477	1.139	2.17
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F)	4.790	2.586	1.85
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F)	7.003	4.340	1.61
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S)	4.255	4.245	1.00
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F)	8.015	5.336	1.50
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F)	9.576	2.670	3.59
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S)	6.253	2.548	2.45
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F)	11.487	4.940	2.33
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F)	0.010	0.011	0.99
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S)	0.009	0.009	1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F)	0.010	0.010	1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F)	0.021	0.021	0.99
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S)	0.019	0.019	0.99
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F)	0.020	0.020	1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F)	0.031	0.031	1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S)	0.028	0.028	1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F)	0.029	0.029	1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F)	0.042	0.041	1.01
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S)	0.038	0.038	1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F)	0.044	0.047	0.93
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F)	0.391	0.391	1.00
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S)	0.355	0.355	1.00
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F)	0.359	0.357	1.01
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F)	0.784	0.785	1.00
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S)	0.732	0.723	1.01
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F)	0.816	0.813	1.00
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F)	1.327	1.318	1.01
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S)	1.334	1.352	0.99
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F)	1.737	1.708	1.02
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F)	2.523	2.394	1.05
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S)	2.456	2.457	1.00
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F)	3.159	3.121	1.01
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F)	1.161	1.157	1.00
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S)	1.053	1.052	1.00
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F)	1.116	1.127	0.99
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F)	2.621	2.614	1.00
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S)	2.573	2.528	1.02
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F)	3.099	3.088	1.00
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F)	4.383	4.365	1.00
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S)	4.466	4.433	1.01
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F)	5.662	5.648	1.00
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F)	8.039	7.825	1.03
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S)	8.136	8.005	1.02
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F)	9.650	9.510	1.01
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F)	2.617	2.498	1.05
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S)	2.382	2.309	1.03
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F)	2.629	2.616	1.01
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F)	5.957	5.783	1.03
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S)	5.692	5.559	1.02
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F)	6.611	6.585	1.00
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F)	10.290	10.379	0.99
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S)	10.285	10.406	0.99
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F)	14.734	14.812	0.99
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F)	18.505	18.012	1.03
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S)	18.087	18.279	0.99
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F)	26.655	26.815	0.99

Performance for SSE3 baseline

Performance test	Reference time	PR time	Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F)	0.003	0.003	0.99
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S)	0.002	0.002	0.85
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F)	0.009	0.004	2.13
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F)	0.017	0.005	3.51
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S)	0.010	0.004	2.50
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F)	0.018	0.008	2.16
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F)	0.025	0.017	1.53
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S)	0.015	0.015	0.99
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F)	0.027	0.019	1.44
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F)	0.034	0.008	4.03
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S)	0.020	0.005	3.86
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F)	0.036	0.010	3.47
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F)	0.089	0.087	1.02
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S)	0.066	0.064	1.03
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F)	0.359	0.157	2.29
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F)	0.651	0.163	4.00
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S)	0.357	0.143	2.51
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F)	0.696	0.316	2.20
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F)	0.982	0.643	1.53
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S)	0.526	0.533	0.99
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F)	1.062	0.725	1.46
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F)	1.304	0.331	3.94
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S)	0.746	0.231	3.23
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F)	1.489	0.490	3.04
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F)	0.250	0.261	0.96
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S)	0.192	0.192	1.00
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F)	1.037	0.453	2.29
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F)	1.977	0.479	4.13
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S)	1.033	0.433	2.39
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F)	2.106	1.025	2.05
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F)	2.938	1.933	1.52
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S)	1.667	1.698	0.98
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F)	3.438	2.268	1.52
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F)	4.207	1.075	3.92
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S)	2.916	0.976	2.99
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F)	5.246	2.026	2.59
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F)	0.594	0.595	1.00
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S)	0.473	0.455	1.04
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F)	2.454	1.141	2.15
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F)	4.691	1.237	3.79
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S)	2.487	1.169	2.13
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F)	4.974	2.587	1.92
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F)	6.955	4.337	1.60
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S)	4.295	4.253	1.01
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F)	8.487	5.197	1.63
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F)	9.514	2.681	3.55
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S)	6.302	2.568	2.45
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F)	12.035	4.986	2.41
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F)	0.010	0.011	0.99
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S)	0.009	0.009	1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F)	0.010	0.010	1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F)	0.021	0.021	0.99
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S)	0.019	0.019	1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F)	0.020	0.019	1.02
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F)	0.031	0.031	0.99
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S)	0.028	0.028	1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F)	0.029	0.029	1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F)	0.041	0.041	1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S)	0.038	0.038	1.01
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F)	0.044	0.046	0.95
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F)	0.385	0.389	0.99
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S)	0.349	0.350	1.00
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F)	0.361	0.364	0.99
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F)	0.791	0.787	1.01
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S)	0.742	0.724	1.02
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F)	0.841	0.813	1.03
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F)	1.362	1.321	1.03
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S)	1.391	1.347	1.03
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F)	1.754	1.687	1.04
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F)	2.421	2.430	1.00
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S)	2.433	2.460	0.99
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F)	2.991	3.048	0.98
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F)	1.157	1.175	0.98
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S)	1.049	1.052	1.00
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F)	1.131	1.126	1.00
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F)	2.693	2.617	1.03
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S)	2.613	2.553	1.02
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F)	3.196	3.094	1.03
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F)	4.458	4.385	1.02
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S)	4.569	4.448	1.03
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F)	5.665	5.634	1.01
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F)	8.012	7.659	1.05
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S)	8.173	7.871	1.04
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F)	9.465	9.229	1.03
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F)	2.611	2.525	1.03
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S)	2.376	2.307	1.03
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F)	2.658	2.553	1.04
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F)	5.976	5.783	1.03
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S)	5.758	5.550	1.04
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F)	6.778	6.510	1.04
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F)	10.553	10.273	1.03
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S)	10.598	10.106	1.05
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F)	14.643	14.732	0.99
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F)	18.124	18.257	0.99
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S)	17.959	18.373	0.98
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F)	26.009	26.812	0.97

Performance for SSE4_2 baseline

Performance test	Reference time	PR time	Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F)	0.003	0.003	1.02
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S)	0.002	0.002	1.00
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F)	0.009	0.005	1.87
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F)	0.017	0.005	3.75
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S)	0.011	0.004	2.76
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F)	0.017	0.008	2.09
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F)	0.025	0.011	2.34
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S)	0.015	0.011	1.35
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F)	0.025	0.014	1.80
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F)	0.034	0.008	4.17
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S)	0.020	0.005	4.23
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F)	0.034	0.010	3.24
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F)	0.089	0.087	1.02
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S)	0.071	0.070	1.00
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F)	0.340	0.163	2.09
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F)	0.685	0.152	4.51
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S)	0.357	0.139	2.57
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F)	0.673	0.309	2.18
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F)	1.011	0.401	2.52
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S)	0.551	0.439	1.25
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F)	1.032	0.521	1.98
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F)	1.355	0.310	4.38
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S)	0.769	0.241	3.19
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F)	1.472	0.506	2.91
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F)	0.259	0.258	1.00
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S)	0.216	0.215	1.01
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F)	1.025	0.488	2.10
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F)	2.083	0.468	4.45
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S)	1.077	0.441	2.44
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F)	2.091	1.028	2.03
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F)	3.087	1.216	2.54
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S)	1.731	1.299	1.33
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F)	3.420	1.809	1.89
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F)	4.190	1.052	3.98
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S)	2.902	1.005	2.89
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F)	5.128	2.048	2.50
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F)	0.612	0.565	1.08
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S)	0.499	0.488	1.02
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F)	2.336	1.158	2.02
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F)	4.718	1.227	3.84
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S)	2.525	1.175	2.15
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F)	4.921	2.572	1.91
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F)	7.053	2.782	2.54
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S)	4.398	2.963	1.48
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F)	8.256	4.223	1.95
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F)	9.619	2.689	3.58
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S)	6.353	2.613	2.43
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F)	11.662	5.035	2.32
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F)	0.010	0.010	1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S)	0.009	0.009	1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F)	0.010	0.010	1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F)	0.021	0.021	0.98
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S)	0.019	0.019	0.99
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F)	0.019	0.019	1.01
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F)	0.031	0.031	1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S)	0.027	0.028	0.99
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F)	0.029	0.029	1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F)	0.041	0.042	0.99
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S)	0.038	0.038	1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F)	0.047	0.043	1.08
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F)	0.386	0.387	1.00
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S)	0.357	0.354	1.01
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F)	0.362	0.358	1.01
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F)	0.789	0.791	1.00
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S)	0.737	0.744	0.99
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F)	0.832	0.816	1.02
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F)	1.343	1.312	1.02
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S)	1.362	1.346	1.01
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F)	1.732	1.671	1.04
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F)	2.425	2.316	1.05
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S)	2.407	2.349	1.02
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F)	3.072	2.973	1.03
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F)	1.140	1.125	1.01
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S)	1.014	1.004	1.01
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F)	1.119	1.105	1.01
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F)	2.648	2.569	1.03
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S)	2.571	2.534	1.01
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F)	3.133	3.111	1.01
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F)	4.424	4.449	0.99
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S)	4.507	4.499	1.00
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F)	5.725	5.508	1.04
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F)	8.123	7.734	1.05
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S)	8.248	7.795	1.06
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F)	9.749	9.090	1.07
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F)	2.610	2.520	1.04
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S)	2.383	2.297	1.04
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F)	2.637	2.708	0.97
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F)	5.934	5.962	1.00
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S)	5.796	5.786	1.00
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F)	6.653	6.739	0.99
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F)	10.303	10.210	1.01
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S)	10.453	10.616	0.98
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F)	14.791	14.297	1.03
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F)	18.669	17.500	1.07
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S)	18.299	17.622	1.04
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F)	27.205	25.768	1.06

Performance for AVX2 baseline

Performance test	Reference time	PR time	Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F)	0.002	0.002	0.99
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S)	0.002	0.002	1.03
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F)	0.009	0.003	2.59
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F)	0.017	0.005	3.54
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S)	0.010	0.004	2.65
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F)	0.017	0.007	2.58
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F)	0.025	0.008	3.21
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S)	0.015	0.007	2.15
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F)	0.025	0.011	2.28
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F)	0.034	0.007	4.64
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S)	0.020	0.005	4.37
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F)	0.034	0.009	3.94
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F)	0.069	0.066	1.04
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S)	0.055	0.054	1.01
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F)	0.344	0.104	3.30
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F)	0.678	0.139	4.87
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S)	0.356	0.122	2.93
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F)	0.680	0.220	3.09
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F)	1.013	0.249	4.07
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S)	0.541	0.232	2.33
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F)	1.031	0.392	2.63
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F)	1.344	0.257	5.23
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S)	0.747	0.226	3.31
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F)	1.486	0.461	3.22
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F)	0.195	0.194	1.01
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S)	0.174	0.170	1.03
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F)	0.991	0.317	3.13
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F)	1.959	0.427	4.59
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S)	1.019	0.375	2.71
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F)	2.022	0.871	2.32
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F)	2.925	0.785	3.73
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S)	1.646	0.753	2.19
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F)	3.268	1.572	2.08
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F)	4.049	0.965	4.19
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S)	2.818	0.962	2.93
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F)	4.835	1.982	2.44
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F)	0.458	0.450	1.02
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S)	0.408	0.412	0.99
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F)	2.262	0.960	2.36
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F)	4.441	1.159	3.83
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S)	2.404	1.096	2.19
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F)	4.834	2.319	2.08
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F)	6.938	2.044	3.39
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S)	4.265	1.989	2.14
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F)	8.018	3.795	2.11
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F)	9.524	2.578	3.69
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S)	6.241	2.537	2.46
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F)	11.551	4.905	2.36
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F)	0.010	0.010	1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S)	0.009	0.009	0.97
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F)	0.009	0.009	1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F)	0.021	0.021	1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S)	0.018	0.018	1.01
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F)	0.019	0.019	1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F)	0.031	0.031	1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S)	0.027	0.027	1.01
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F)	0.028	0.028	0.99
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F)	0.041	0.041	1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S)	0.037	0.037	1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F)	0.043	0.045	0.98
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F)	0.387	0.384	1.01
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S)	0.347	0.349	1.00
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F)	0.341	0.324	1.06
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F)	0.781	0.744	1.05
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S)	0.718	0.680	1.06
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F)	0.832	0.811	1.02
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F)	1.344	1.352	0.99
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S)	1.394	1.392	1.00
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F)	1.751	1.748	1.00
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F)	2.362	2.350	1.01
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S)	2.465	2.464	1.00
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F)	3.186	3.028	1.05
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F)	1.154	1.150	1.00
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S)	1.027	1.039	0.99
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F)	1.067	1.077	0.99
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F)	2.617	2.633	0.99
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S)	2.653	2.672	0.99
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F)	3.210	3.169	1.01
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F)	4.430	4.325	1.02
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S)	4.614	4.436	1.04
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F)	5.648	5.503	1.03
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F)	7.835	7.783	1.01
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S)	8.088	7.804	1.04
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F)	9.602	9.462	1.01
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F)	2.526	2.599	0.97
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S)	2.357	2.341	1.01
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F)	2.460	2.605	0.94
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F)	5.726	5.881	0.97
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S)	5.527	5.708	0.97
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F)	6.529	6.686	0.98
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F)	10.133	10.438	0.97
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S)	10.412	10.601	0.98
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F)	14.218	14.803	0.96
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F)	17.647	17.891	0.99
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S)	17.977	18.400	0.98
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F)	26.532	26.297	1.01

Performance for AVX512 baseline

Performance test	Reference time	PR time	Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F)	0.003	0.003	0.96
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S)	0.002	0.002	1.02
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F)	0.006	0.004	1.54
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F)	0.017	0.005	3.61
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S)	0.010	0.004	2.60
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F)	0.005	0.005	1.00
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F)	0.025	0.007	3.43
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S)	0.015	0.006	2.38
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F)	0.006	0.006	1.00
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F)	0.034	0.006	5.82
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S)	0.020	0.004	4.67
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F)	0.007	0.007	0.97
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F)	0.056	0.053	1.06
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S)	0.055	0.052	1.06
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F)	0.180	0.105	1.72
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F)	0.681	0.116	5.87
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S)	0.342	0.114	3.00
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F)	0.215	0.200	1.07
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F)	1.010	0.203	4.99
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S)	0.546	0.184	2.97
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F)	0.336	0.311	1.08
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F)	1.349	0.230	5.87
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S)	0.765	0.228	3.35
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F)	0.503	0.463	1.09
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F)	0.180	0.171	1.06
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S)	0.184	0.170	1.08
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F)	0.552	0.317	1.74
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F)	2.030	0.365	5.56
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S)	1.066	0.352	3.03
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F)	0.855	0.807	1.06
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F)	3.060	0.666	4.59
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S)	1.728	0.643	2.69
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F)	1.455	1.378	1.06
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F)	4.022	0.933	4.31
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S)	2.836	0.969	2.93
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F)	2.117	2.011	1.05
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F)	0.434	0.402	1.08
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S)	0.438	0.405	1.08
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F)	1.310	0.939	1.39
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F)	4.377	1.087	4.03
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S)	2.399	1.068	2.25
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F)	2.373	2.276	1.04
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F)	6.644	1.935	3.43
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S)	4.197	1.900	2.21
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F)	3.724	3.601	1.03
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F)	9.245	2.517	3.67
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S)	6.152	2.543	2.42
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F)	5.243	5.071	1.03
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F)	0.010	0.010	1.01
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S)	0.009	0.009	1.03
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F)	0.012	0.012	1.01
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F)	0.021	0.022	0.97
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S)	0.018	0.018	1.01
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F)	0.010	0.010	1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F)	0.031	0.031	1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S)	0.027	0.027	1.02
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F)	0.012	0.012	1.01
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F)	0.041	0.041	1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S)	0.038	0.037	1.03
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F)	0.016	0.017	0.96
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F)	0.384	0.381	1.01
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S)	0.343	0.343	1.00
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F)	0.429	0.425	1.01
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F)	0.777	0.758	1.03
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S)	0.736	0.690	1.07
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F)	0.434	0.417	1.04
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F)	1.282	1.281	1.00
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S)	1.331	1.344	0.99
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F)	0.789	0.749	1.05
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F)	2.426	2.346	1.03
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S)	2.866	2.482	1.15
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F)	1.246	1.171	1.06
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F)	1.108	1.143	0.97
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S)	0.985	1.009	0.98
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F)	1.308	1.255	1.04
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F)	2.646	2.526	1.05
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S)	2.645	2.531	1.05
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F)	2.018	1.927	1.05
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F)	4.430	4.370	1.01
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S)	4.541	4.566	0.99
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F)	3.178	3.035	1.05
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F)	7.691	7.637	1.01
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S)	8.730	7.895	1.11
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F)	4.303	4.174	1.03
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F)	2.599	2.489	1.04
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S)	2.370	2.260	1.05
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F)	2.945	2.832	1.04
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F)	5.885	5.827	1.01
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S)	5.766	5.595	1.03
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F)	4.617	4.536	1.02
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F)	10.065	10.016	1.00
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S)	10.303	10.151	1.01
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F)	7.903	7.657	1.03
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F)	17.838	18.010	0.99
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S)	21.291	18.668	1.14
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F)	10.686	10.284	1.04
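
For anyone re-deriving the Speedup column: it appears to be the reference time divided by the PR time, rounded to two decimals. The helper below is only a sketch of that arithmetic; the names `speedup` and `round2` are made up for illustration and are not part of the OpenCV perf tooling. Rows with larger times reproduce the reported value (9.245 / 2.517 gives 3.67 for 1920x1080 8UC4 CV_32F in the AVX512 table), while rows with very small times do not (0.017 / 0.005 gives 3.40 against a reported 3.61), presumably because the displayed times are rounded to three decimals before printing.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: speedup as reference time over PR time.
// Both arguments must be in the same (unspecified) time unit.
double speedup(double reference_time, double pr_time)
{
    assert(pr_time > 0.0);
    return reference_time / pr_time;
}

// Hypothetical helper: round to two decimals, like the Speedup column.
double round2(double x)
{
    return std::round(x * 100.0) / 100.0;
}
```

Calling `round2(speedup(9.245, 2.517))` reproduces the 3.67 in the table, so the small-image rows are better read from the Speedup column than recomputed from the displayed times.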

alalek · 2020-03-01T09:59:35Z

OOB access issue: #16708

Vectorize calculating integral for line for single and multiple channels

f60f67d
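
For orientation, the per-line operation this commit vectorizes can be written as a scalar reference. The sketch below is an assumption-laden illustration, not the code in sumpixels.simd.hpp: the function name `integral_scalar` and the `std::vector` layout are invented here, and only the 8-bit input to 32-bit integer output case is shown. It follows the usual integral-image convention in which the output has one extra row and one extra column of zeros.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar reference: integral image of an interleaved
// 8-bit image with `cn` channels. Output is (height+1) x (width+1) pixels,
// with a zero first row and a zero first column.
std::vector<int> integral_scalar(const std::vector<unsigned char>& src,
                                 std::size_t width, std::size_t height,
                                 std::size_t cn)
{
    assert(src.size() == width * height * cn);
    const std::size_t sstep = width * cn;        // source elements per row
    const std::size_t dstep = (width + 1) * cn;  // output elements per row
    std::vector<int> sum((height + 1) * dstep, 0);

    for (std::size_t y = 0; y < height; ++y)
    {
        const unsigned char* s = src.data() + y * sstep;
        const int* above = sum.data() + y * dstep + cn;   // row above, past the zero column
        int* cur = sum.data() + (y + 1) * dstep + cn;     // current row, past the zero column
        for (std::size_t k = 0; k < cn; ++k)
        {
            int run = 0;  // running sum along the line for channel k
            for (std::size_t x = k; x < sstep; x += cn)
            {
                run += s[x];
                cur[x] = above[x] + run;
            }
        }
    }
    return sum;
}
```

The inner loop carries a dependency from one pixel to the next (`run`), which is what makes the line pass awkward to vectorize and why the channel count matters for the speedups in the tables.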

alalek reviewed Feb 12, 2020

terfendail reviewed Feb 18, 2020

ChipKerchner added 3 commits February 19, 2020 09:36

Single vector processing for 4-channels - 25-30% faster

40fd839

Single vector processing for 4-channels - 25-30% faster

0b160cf

Merge branch '3.4' of https://github.com/ChipKerchner/opencv into vectorizeIntegralSumPixels

5721460

Fixed AVX512 code for 4 channels

9df7839
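
One way to picture processing a single vector at a time is a shift-and-add scan inside one register: lanes are shifted up by one pixel (`cn` lanes), added, then shifted by two pixels, and so on. The sketch below emulates that with `std::array` rather than OpenCV's universal intrinsics, so `Vec`, `shift_up`, `add` and `scan_interleaved` are all hypothetical names, and the 8-lane width is an arbitrary choice. It is not the implementation merged in this PR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 8;     // hypothetical register: 8 x 32-bit lanes
using Vec = std::array<int, kLanes>;

// Move every lane up by n positions, filling the low lanes with zeros.
Vec shift_up(const Vec& v, std::size_t n)
{
    Vec r{};
    for (std::size_t i = n; i < kLanes; ++i)
        r[i] = v[i - n];
    return r;
}

// Lane-wise addition.
Vec add(const Vec& a, const Vec& b)
{
    Vec r{};
    for (std::size_t i = 0; i < kLanes; ++i)
        r[i] = a[i] + b[i];
    return r;
}

// Inclusive prefix sum per channel over interleaved pixels held in one vector.
// The shift distance starts at one pixel (cn lanes) and doubles each step.
Vec scan_interleaved(Vec v, std::size_t cn)
{
    assert(cn > 0);
    for (std::size_t step = cn; step < kLanes; step *= 2)
        v = add(v, shift_up(v, step));
    return v;
}
```

With 4 channels and 8 lanes the loop runs once, while a single channel needs three shift-and-add steps, which fits the observation that fewer shifts are needed as the channel count grows. A full line pass would also have to add the totals carried over from the previous vector; that part is omitted here.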

terfendail reviewed Feb 27, 2020

Disable 3 channel 8UC1 to 32S for SSE2 and SSE3 (slower). Use new version of 8UC1 to 64F for AVX512.

1051c09

terfendail approved these changes Feb 28, 2020

alalek assigned terfendail Feb 28, 2020

alalek merged commit 8c24af6 into opencv:3.4 Feb 28, 2020

This was referenced Feb 28, 2020

Merge 3.4 #16698

Merged

Imgproc: integral OOB access (2020-03-01) #16708

Closed
