
Vectorize calculating integral for line for single and multiple channels #16556

Merged: alalek merged 6 commits into opencv:3.4 from ChipKerchner:vectorizeIntegralSumPixels on Feb 28, 2020

@ChipKerchner
Contributor

@ChipKerchner ChipKerchner commented Feb 11, 2020

Vectorize the per-line integral calculation for single- and multi-channel images; up to 2.75x faster.

force_builders=Linux AVX2,Custom
buildworker:Custom=linux-3
build_image:Custom=ubuntu:18.04
CPU_BASELINE:Custom=AVX512_SKX
disable_ipp=ON
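
For readers following the thread, the per-row computation being vectorized can be sketched in scalar form. This is a minimal illustration, not the PR's code: the name `integral_row_scalar` and its parameters are assumptions made here, and the zero border row and column of a real integral image are left to the caller.

```cpp
#include <cstdint>

// Hypothetical scalar reference (not from the PR): computes one output row of an
// interleaved integral image with `cn` channels.
//   src_row      : pixels * cn input bytes
//   prev_sum_row : pixels * cn sums from the output row above
//   sum_row      : pixels * cn outputs
inline void integral_row_scalar(const std::uint8_t* src_row, const int* prev_sum_row,
                                int* sum_row, int pixels, int cn)
{
    for (int c = 0; c < cn; ++c)
    {
        int running = 0;  // sum of this channel along the current row so far
        for (int j = c; j < pixels * cn; j += cn)
        {
            running += src_row[j];
            sum_row[j] = running + prev_sum_row[j];
        }
    }
}
```

The inner dependency on `running` is what makes the row hard to vectorize: each output needs the sum of everything to its left in the same channel.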

Comment on lines +908 to +909
prev = vx_setall_f64(v_extract_n<v_float64::nlanes - 1>(el4hh));
// prev = v_broadcast_element<v_float64::nlanes - 1>(el4hh);
Member

Why was v_broadcast_element() removed?

Contributor Author

@ChipKerchner ChipKerchner Feb 12, 2020

v_broadcast_element for v_float64 is not available on all platforms. I left the commented-out call in for when support is added.
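
As a rough model of why the fallback is equivalent, reading one lane as a scalar and splatting it gives the same register contents as broadcasting that lane directly. The sketch below uses a plain array as a stand-in for a vector register; `DoubleLanes` and `splat_lane` are names invented for this illustration, not OpenCV API.

```cpp
#include <array>
#include <cstddef>

// Illustrative stand-in for a SIMD register of N doubles (not an OpenCV type).
template <std::size_t N>
using DoubleLanes = std::array<double, N>;

// Models the fallback in the diff: extract lane I as a scalar, then set all lanes to it.
// A lane broadcast would produce the same result in a single operation.
template <std::size_t I, std::size_t N>
DoubleLanes<N> splat_lane(const DoubleLanes<N>& v)
{
    static_assert(I < N, "lane index out of range");
    DoubleLanes<N> out;
    out.fill(v[I]);
    return out;
}
```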

}
};

#if CV_SIMD128_64F && !CV_AVX512_SKX
Member

Why is CV_AVX512_SKX excluded?
Do we want CV_SIMD_WIDTH <= 32 here instead?

Contributor Author

There is already an AVX512 version for doubles; see the code above.

v_int32 prev_1 = vx_setzero_s32(), prev_2 = vx_setzero_s32(),
prev_3 = vx_setzero_s32(), prev_4 = vx_setzero_s32();
int j = 0;
for ( ; j + v_uint16::nlanes * cn <= width; j += v_uint16::nlanes * cn)
Contributor

The code looks over-complicated to me. IMO it would be better to process one vector at a time and reduce the number of shifts and additions by starting with the addition of element quads.
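
One way to read the "element quads" suggestion for 4-channel data is to treat each pixel as a unit of four lanes and carry a single running quad between iterations. The sketch below models that on plain arrays, two pixels per step as a 128-bit register would hold after widening. `Quad`, `add` and `row_prefix_quads` are hypothetical names, and adding the previous output row is omitted for brevity.

```cpp
#include <array>
#include <cstdint>

using Quad = std::array<int, 4>;  // one 4-channel pixel widened to int

inline Quad add(const Quad& a, const Quad& b)
{
    return {a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]};
}

// Hypothetical model: per-channel running sums along one row of 4-channel pixels.
// `out` receives pixels * 4 values.
inline void row_prefix_quads(const std::uint8_t* src, int* out, int pixels)
{
    Quad prev = {0, 0, 0, 0};
    int p = 0;
    for (; p + 2 <= pixels; p += 2)  // two pixels per "vector"
    {
        const std::uint8_t* s = src + p * 4;
        Quad lo = {s[0], s[1], s[2], s[3]};
        Quad hi = {s[4], s[5], s[6], s[7]};
        lo = add(lo, prev);  // first pixel continues the running sum
        hi = add(hi, lo);    // second pixel builds on the first
        prev = hi;           // carry into the next iteration
        for (int c = 0; c < 4; ++c)
        {
            out[p * 4 + c] = lo[c];
            out[(p + 1) * 4 + c] = hi[c];
        }
    }
    for (; p < pixels; ++p)  // scalar tail for an odd pixel count
    {
        for (int c = 0; c < 4; ++c)
        {
            prev[c] += src[p * 4 + c];
            out[p * 4 + c] = prev[c];
        }
    }
}
```

With this layout no lane shuffling inside a pixel is needed; only whole-quad additions remain.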

@terfendail
Contributor

I've collected performance for the existing change on my setup

Performance for SSE2 baseline

| Performance test | Reference time | PR time | Speedup |
|---|---|---|---|
| integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F) | 0.003 | 0.003 | 0.99 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S) | 0.002 | 0.002 | 1.00 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F) | 0.009 | 0.004 | 2.02 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F) | 0.017 | 0.005 | 3.53 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S) | 0.010 | 0.004 | 2.42 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F) | 0.018 | 0.008 | 2.13 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F) | 0.026 | 0.017 | 1.56 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S) | 0.015 | 0.016 | 0.92 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F) | 0.026 | 0.019 | 1.33 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F) | 0.034 | 0.012 | 2.85 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S) | 0.020 | 0.010 | 1.96 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F) | 0.034 | 0.021 | 1.61 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F) | 0.087 | 0.089 | 0.97 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S) | 0.062 | 0.066 | 0.93 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F) | 0.324 | 0.154 | 2.10 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F) | 0.661 | 0.155 | 4.26 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S) | 0.348 | 0.153 | 2.28 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F) | 0.650 | 0.300 | 2.17 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F) | 0.993 | 0.642 | 1.55 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S) | 0.520 | 0.617 | 0.84 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F) | 0.985 | 0.755 | 1.31 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F) | 1.312 | 0.444 | 2.95 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S) | 0.727 | 0.384 | 1.89 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F) | 1.426 | 0.802 | 1.78 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F) | 0.263 | 0.249 | 1.06 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S) | 0.183 | 0.193 | 0.95 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F) | 0.971 | 0.440 | 2.21 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F) | 1.980 | 0.478 | 4.14 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S) | 1.021 | 0.461 | 2.21 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F) | 2.002 | 1.008 | 1.99 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F) | 2.950 | 1.850 | 1.59 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S) | 1.635 | 1.852 | 0.88 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F) | 3.281 | 2.246 | 1.46 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F) | 4.114 | 1.403 | 2.93 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S) | 2.776 | 1.274 | 2.18 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F) | 5.018 | 2.699 | 1.86 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F) | 0.605 | 0.588 | 1.03 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S) | 0.447 | 0.462 | 0.97 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F) | 2.307 | 1.156 | 2.00 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F) | 4.670 | 1.256 | 3.72 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S) | 2.477 | 1.218 | 2.03 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F) | 4.790 | 2.587 | 1.85 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F) | 7.003 | 4.342 | 1.61 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S) | 4.255 | 4.356 | 0.98 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F) | 8.015 | 5.343 | 1.50 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F) | 9.576 | 3.276 | 2.92 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S) | 6.253 | 3.006 | 2.08 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F) | 11.487 | 6.133 | 1.87 |

Performance for SSE3 baseline

| Performance test | Reference time | PR time | Speedup |
|---|---|---|---|
| integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F) | 0.003 | 0.003 | 0.99 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S) | 0.002 | 0.002 | 0.98 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F) | 0.009 | 0.004 | 2.13 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F) | 0.017 | 0.005 | 3.52 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S) | 0.010 | 0.004 | 2.37 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F) | 0.018 | 0.008 | 2.16 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F) | 0.025 | 0.017 | 1.53 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S) | 0.015 | 0.016 | 0.92 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F) | 0.027 | 0.019 | 1.44 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F) | 0.034 | 0.012 | 2.81 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S) | 0.020 | 0.010 | 1.94 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F) | 0.036 | 0.020 | 1.75 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F) | 0.089 | 0.089 | 1.00 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S) | 0.066 | 0.064 | 1.03 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F) | 0.359 | 0.146 | 2.46 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F) | 0.651 | 0.155 | 4.20 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S) | 0.357 | 0.147 | 2.44 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F) | 0.696 | 0.307 | 2.27 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F) | 0.982 | 0.641 | 1.53 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S) | 0.526 | 0.615 | 0.86 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F) | 1.062 | 0.692 | 1.54 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F) | 1.304 | 0.421 | 3.10 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S) | 0.746 | 0.373 | 2.00 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F) | 1.489 | 0.809 | 1.84 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F) | 0.250 | 0.250 | 1.00 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S) | 0.192 | 0.192 | 1.00 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F) | 1.037 | 0.445 | 2.33 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F) | 1.977 | 0.480 | 4.12 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S) | 1.033 | 0.451 | 2.29 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F) | 2.106 | 1.009 | 2.09 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F) | 2.938 | 1.848 | 1.59 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S) | 1.667 | 1.851 | 0.90 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F) | 3.438 | 2.161 | 1.59 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F) | 4.207 | 1.352 | 3.11 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S) | 2.916 | 1.230 | 2.37 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F) | 5.246 | 2.597 | 2.02 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F) | 0.594 | 0.580 | 1.02 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S) | 0.473 | 0.460 | 1.03 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F) | 2.454 | 1.141 | 2.15 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F) | 4.691 | 1.238 | 3.79 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S) | 2.487 | 1.193 | 2.08 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F) | 4.974 | 2.554 | 1.95 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F) | 6.955 | 4.257 | 1.63 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S) | 4.295 | 4.332 | 0.99 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F) | 8.487 | 5.008 | 1.69 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F) | 9.514 | 3.176 | 3.00 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S) | 6.302 | 2.953 | 2.13 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F) | 12.035 | 5.964 | 2.02 |

Performance for SSE4_2 baseline

| Performance test | Reference time | PR time | Speedup |
|---|---|---|---|
| integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F) | 0.003 | 0.003 | 0.99 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S) | 0.002 | 0.002 | 1.01 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F) | 0.009 | 0.005 | 1.88 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F) | 0.017 | 0.005 | 3.67 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S) | 0.011 | 0.004 | 2.55 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F) | 0.017 | 0.008 | 2.05 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F) | 0.025 | 0.011 | 2.28 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S) | 0.015 | 0.012 | 1.32 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F) | 0.025 | 0.014 | 1.76 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F) | 0.034 | 0.012 | 2.89 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S) | 0.020 | 0.010 | 1.98 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F) | 0.034 | 0.020 | 1.65 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F) | 0.089 | 0.087 | 1.02 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S) | 0.071 | 0.068 | 1.04 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F) | 0.340 | 0.158 | 2.15 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F) | 0.685 | 0.149 | 4.60 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S) | 0.357 | 0.137 | 2.60 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F) | 0.673 | 0.301 | 2.23 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F) | 1.011 | 0.404 | 2.50 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S) | 0.551 | 0.439 | 1.26 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F) | 1.032 | 0.527 | 1.96 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F) | 1.355 | 0.413 | 3.28 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S) | 0.769 | 0.372 | 2.07 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F) | 1.472 | 0.797 | 1.85 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F) | 0.259 | 0.251 | 1.03 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S) | 0.216 | 0.205 | 1.06 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F) | 1.025 | 0.477 | 2.15 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F) | 2.083 | 0.462 | 4.51 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S) | 1.077 | 0.427 | 2.52 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F) | 2.091 | 0.995 | 2.10 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F) | 3.087 | 1.207 | 2.56 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S) | 1.731 | 1.288 | 1.34 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F) | 3.420 | 1.766 | 1.94 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F) | 4.190 | 1.307 | 3.21 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S) | 2.902 | 1.215 | 2.39 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F) | 5.128 | 2.551 | 2.01 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F) | 0.612 | 0.571 | 1.07 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S) | 0.499 | 0.487 | 1.02 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F) | 2.336 | 1.162 | 2.01 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F) | 4.718 | 1.206 | 3.91 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S) | 2.525 | 1.164 | 2.17 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F) | 4.921 | 2.534 | 1.94 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F) | 7.053 | 2.881 | 2.45 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S) | 4.398 | 2.955 | 1.49 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F) | 8.256 | 4.204 | 1.96 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F) | 9.619 | 3.114 | 3.09 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S) | 6.353 | 2.928 | 2.17 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F) | 11.662 | 5.916 | 1.97 |

Performance for AVX2 baseline

| Performance test | Reference time | PR time | Speedup |
|---|---|---|---|
| integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F) | 0.002 | 0.003 | 0.98 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S) | 0.002 | 0.002 | 0.98 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F) | 0.009 | 0.003 | 2.60 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F) | 0.017 | 0.005 | 3.52 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S) | 0.010 | 0.004 | 2.65 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F) | 0.017 | 0.007 | 2.59 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F) | 0.025 | 0.008 | 3.23 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S) | 0.015 | 0.007 | 2.16 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F) | 0.025 | 0.011 | 2.27 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F) | 0.034 | 0.010 | 3.22 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S) | 0.020 | 0.010 | 2.07 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F) | 0.034 | 0.017 | 1.96 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F) | 0.069 | 0.067 | 1.03 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S) | 0.055 | 0.056 | 0.98 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F) | 0.344 | 0.106 | 3.24 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F) | 0.678 | 0.149 | 4.54 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S) | 0.356 | 0.126 | 2.83 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F) | 0.680 | 0.227 | 3.00 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F) | 1.013 | 0.260 | 3.89 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S) | 0.541 | 0.246 | 2.20 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F) | 1.031 | 0.405 | 2.54 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F) | 1.344 | 0.356 | 3.78 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S) | 0.747 | 0.354 | 2.11 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F) | 1.486 | 0.794 | 1.87 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F) | 0.195 | 0.193 | 1.01 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S) | 0.174 | 0.171 | 1.02 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F) | 0.991 | 0.318 | 3.11 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F) | 1.959 | 0.426 | 4.60 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S) | 1.019 | 0.392 | 2.60 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F) | 2.022 | 0.871 | 2.32 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F) | 2.925 | 0.794 | 3.69 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S) | 1.646 | 0.761 | 2.16 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F) | 3.268 | 1.523 | 2.15 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F) | 4.049 | 1.193 | 3.39 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S) | 2.818 | 1.180 | 2.39 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F) | 4.835 | 2.465 | 1.96 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F) | 0.458 | 0.451 | 1.02 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S) | 0.408 | 0.413 | 0.99 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F) | 2.262 | 0.945 | 2.39 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F) | 4.441 | 1.150 | 3.86 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S) | 2.404 | 1.093 | 2.20 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F) | 4.834 | 2.291 | 2.11 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F) | 6.938 | 2.018 | 3.44 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S) | 4.265 | 1.993 | 2.14 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F) | 8.018 | 3.729 | 2.15 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F) | 9.524 | 2.879 | 3.31 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S) | 6.241 | 2.854 | 2.19 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F) | 11.551 | 5.547 | 2.08 |

Performance for AVX512 baseline

| Performance test | Reference time | PR time | Speedup |
|---|---|---|---|
| integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F) | 0.003 | 0.003 | 0.96 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S) | 0.002 | 0.002 | 0.97 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F) | 0.005 | 0.006 | 0.96 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F) | 0.017 | 0.005 | 3.65 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S) | 0.010 | 0.004 | 2.59 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F) | 0.005 | 0.005 | 1.00 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F) | 0.025 | 0.007 | 3.43 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S) | 0.015 | 0.006 | 2.34 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F) | 0.006 | 0.006 | 1.00 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F) | 0.034 | 0.008 | 4.08 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S) | 0.020 | 0.007 | 2.65 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F) | 0.007 | 0.007 | 0.97 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F) | 0.055 | 0.052 | 1.04 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S) | 0.055 | 0.052 | 1.06 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F) | 0.175 | 0.186 | 0.94 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F) | 0.655 | 0.118 | 5.56 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S) | 0.338 | 0.116 | 2.90 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F) | 0.212 | 0.205 | 1.03 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F) | 0.978 | 0.204 | 4.79 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S) | 0.523 | 0.189 | 2.77 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F) | 0.329 | 0.316 | 1.04 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F) | 1.297 | 0.240 | 5.40 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S) | 0.739 | 0.240 | 3.08 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F) | 0.493 | 0.468 | 1.06 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F) | 0.180 | 0.174 | 1.04 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S) | 0.181 | 0.172 | 1.05 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F) | 0.521 | 0.531 | 0.98 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F) | 1.984 | 0.370 | 5.36 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S) | 1.018 | 0.358 | 2.85 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F) | 0.851 | 0.822 | 1.04 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F) | 3.049 | 0.694 | 4.39 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S) | 1.732 | 0.656 | 2.64 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F) | 1.449 | 1.387 | 1.04 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F) | 4.225 | 1.002 | 4.22 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S) | 2.956 | 1.028 | 2.88 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F) | 2.102 | 2.035 | 1.03 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F) | 0.435 | 0.423 | 1.03 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S) | 0.436 | 0.406 | 1.07 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F) | 1.304 | 1.305 | 1.00 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F) | 4.656 | 1.093 | 4.26 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S) | 2.501 | 1.076 | 2.32 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F) | 2.369 | 2.285 | 1.04 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F) | 6.922 | 1.923 | 3.60 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S) | 4.311 | 1.896 | 2.27 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F) | 3.722 | 3.583 | 1.04 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F) | 9.539 | 2.585 | 3.69 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S) | 6.316 | 2.631 | 2.40 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F) | 5.232 | 5.068 | 1.03 |

@terfendail
Contributor

I've tested single-vector processing for the 4-channel to 32S case:

                for ( ; j + v_uint16::nlanes <= width; j += v_uint16::nlanes)
                {
                    v_int16 el8 = v_reinterpret_as_s16(vx_load_expand(src_row + j));
                    v_int32 el4l, el4h;
#if CV_AVX2 && CV_SIMD_WIDTH == 32
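                    // Within each 128-bit lane, add the first pixel to the second (8 bytes = one 4-channel int16 pixel).
                    __m256i vsum = _mm256_add_epi16(el8.val, _mm256_slli_si256(el8.val, 8));
                    // Widen the low two pixels to int32 and add the running sum carried in prev.
                    el4l.val = _mm256_add_epi32(_mm256_cvtepi16_epi32(_v256_extract_low(vsum)), prev.val);
                    // Widen the high two pixels and add the last running sum of el4l (its upper half, duplicated).
                    el4h.val = _mm256_add_epi32(_mm256_cvtepi16_epi32(_v256_extract_high(vsum)), _mm256_permute2x128_si256(el4l.val, el4l.val, 0x31));
                    // Carry the last pixel's running sum, duplicated in both halves, into the next iteration.
                    prev.val = _mm256_permute2x128_si256(el4h.val, el4h.val, 0x31);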
#else
#if CV_SIMD_WIDTH >= 32
                    el8 += v_rotate_left<4>(el8);
#if CV_SIMD_WIDTH == 64
                    el8 += v_rotate_left<8>(el8);
#endif
#endif
                    v_expand(el8, el4l, el4h);
                    el4l += prev;
                    el4h += el4l;
#if CV_SIMD_WIDTH == 16
                    prev = el4h;
#elif CV_SIMD_WIDTH == 32
                    prev = v_combine_high(el4h, el4h);
#else
                    v_int32 t0, t1; v_zip(el4h, el4h, t0, t1);
                    prev = v_combine_high(t1, t1);
#endif
#endif
                    v_store(sum_row + j                  , el4l + vx_load(prev_sum_row + j                  ));
                    v_store(sum_row + j + v_int32::nlanes, el4h + vx_load(prev_sum_row + j + v_int32::nlanes));
                }

Performance is a bit better on my setup

Performance for SSE2 baseline

| Performance test | Reference time | PR time | Speedup |
|---|---|---|---|
| integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S) | 0.020 | 0.005 | 4.10 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S) | 0.727 | 0.239 | 3.04 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S) | 2.776 | 1.016 | 2.73 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S) | 6.253 | 2.625 | 2.38 |

Performance for SSE3 baseline

| Performance test | Reference time | PR time | Speedup |
|---|---|---|---|
| integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S) | 0.020 | 0.005 | 4.08 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S) | 0.746 | 0.229 | 3.26 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S) | 2.916 | 0.988 | 2.95 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S) | 6.302 | 2.562 | 2.46 |

Performance for SSE4_2 baseline

| Performance test | Reference time | PR time | Speedup |
|---|---|---|---|
| integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S) | 0.020 | 0.005 | 4.13 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S) | 0.769 | 0.238 | 3.23 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S) | 2.902 | 1.024 | 2.83 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S) | 6.353 | 2.627 | 2.42 |

Performance for AVX2 baseline

| Performance test | Reference time | PR time | Speedup |
|---|---|---|---|
| integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S) | 0.020 | 0.005 | 4.37 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S) | 0.747 | 0.227 | 3.29 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S) | 2.818 | 0.993 | 2.84 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S) | 6.241 | 2.568 | 2.43 |

Performance for AVX512 baseline

| Performance test | Reference time | PR time | Speedup |
|---|---|---|---|
| integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S) | 0.020 | 0.004 | 4.85 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S) | 0.739 | 0.235 | 3.15 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S) | 2.956 | 0.979 | 3.02 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S) | 6.316 | 2.562 | 2.46 |
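
The generic branch of the single-vector loop quoted in this comment can be modeled on plain arrays to see why two rotate-and-add steps are enough for a 512-bit register. This is a sketch under stated assumptions: `IntLanes`, `rotate_left`, `add` and `prefix_step_512` are illustrative names written for this note, with `rotate_left` modeled as a lane shift toward higher indices that fills vacated lanes with zero.

```cpp
#include <array>
#include <cstddef>

// Illustrative stand-in for a SIMD register with N integer lanes (not an OpenCV type).
template <std::size_t N>
using IntLanes = std::array<int, N>;

// Models a lane rotate-left by Shift: lanes move toward higher indices, vacated lanes are zero.
template <std::size_t Shift, std::size_t N>
IntLanes<N> rotate_left(const IntLanes<N>& v)
{
    IntLanes<N> out{};
    for (std::size_t i = Shift; i < N; ++i)
        out[i] = v[i - Shift];
    return out;
}

template <std::size_t N>
IntLanes<N> add(const IntLanes<N>& a, const IntLanes<N>& b)
{
    IntLanes<N> out{};
    for (std::size_t i = 0; i < N; ++i)
        out[i] = a[i] + b[i];
    return out;
}

// One loop iteration for the 64-byte case: 32 lanes = 8 four-channel pixels.
// `prev` holds the last running sum of the previous iteration, repeated for 4 pixel slots.
inline IntLanes<32> prefix_step_512(const IntLanes<32>& pixels, const IntLanes<16>& prev)
{
    IntLanes<32> s = add(pixels, rotate_left<4>(pixels));  // pixel k += pixel k-1
    s = add(s, rotate_left<8>(s));                         // pixel k now sums pixels max(0,k-3)..k

    IntLanes<32> out{};
    for (std::size_t i = 0; i < 16; ++i)
        out[i] = s[i] + prev[i];           // el4l += prev
    for (std::size_t i = 0; i < 16; ++i)
        out[16 + i] = s[16 + i] + out[i];  // el4h += el4l
    return out;
}
```

After the two rotate steps each half already holds a four-pixel partial prefix, so adding the low half to the high half completes the running sum for all eight pixels.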

@terfendail
Contributor

It looks like the new way to vectorize 8UC1 to 64FC1 works better than the existing AVX512 implementation.

Performance for AVX512 baseline

| Performance test | Reference time | PR time | Speedup |
|---|---|---|---|
| integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F) | 0.005 | 0.004 | 1.49 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F) | 0.005 | 0.006 | 0.96 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F) | 0.006 | 0.009 | 0.72 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F) | 0.007 | 0.010 | 0.68 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F) | 0.175 | 0.107 | 1.64 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F) | 0.212 | 0.214 | 0.99 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F) | 0.329 | 0.316 | 1.04 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F) | 0.493 | 0.472 | 1.04 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F) | 0.521 | 0.320 | 1.63 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F) | 0.851 | 0.828 | 1.03 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F) | 1.449 | 1.415 | 1.02 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F) | 2.102 | 2.056 | 1.02 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F) | 1.304 | 0.959 | 1.36 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F) | 2.369 | 2.286 | 1.04 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F) | 3.722 | 3.676 | 1.01 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F) | 5.232 | 5.141 | 1.02 |

@ChipKerchner
Contributor Author

ChipKerchner commented Feb 18, 2020

> It looks like the new way to vectorize 8UC1 to 64FC1 works better than the existing AVX512 implementation.

Good find! I've implemented 8UC4->32SC4 and 8UC4->32FC4 so far and am seeing an additional 25-30% improvement.

Let me know your ideas for 8UC1->64FC1 or if you'd just like to update with your ideas for the AVX512 version. I don't really have a way to test AVX512 currently.

@terfendail
Contributor

Regarding AVX512, I meant that I tested the generic version, which is currently disabled for AVX512, instead of the specialized calculate_integral_avx512 implementation. So it probably makes sense to use calculate_integral_avx512 for multichannel images only.
By the way, have you tried single vector processing for 8UC2 images?

@ChipKerchner
Contributor Author

ChipKerchner commented Feb 19, 2020

I committed the changes for single-vector processing of 4-channel images (8UC4->32SC4/32FC4/64FC4). I will look at similar changes for 2-channel images when I have time (early testing shows the speed to be similar to my version). If the 64FC1 and/or 64FC4 changes are faster than the AVX512 version, I will try to activate this version instead.

Please make sure the AVX512 code (CV_SIMD_WIDTH > 32) is correct. Also if you can rerun the timings including AVX512, that would be useful.

@terfendail, I think this smoke test is failing because of the AVX512 code (please suggest a fix, since it is your code):
[ FAILED ] Imgproc_Integral.accuracy (19 ms)

@terfendail
Contributor

Sorry, that was my fault: I missed the fact that v_zip interleaves channels.
The correct version of the prev broadcast code should be:

#if CV_SIMD_WIDTH == 16
                    prev = el4h;
#elif CV_SIMD_WIDTH == 32
                    prev = v_combine_high(el4h, el4h);
#else
                    v_int32 t = v_rotate_right<12>(el4h);
                    t |= v_rotate_left<4>(t);
                    prev = v_combine_low(t, t);
#endif

Performance for this version is almost the same

Performance for AVX512 baseline

| Performance test | Reference time | PR time | Speedup |
|---|---|---|---|
| integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S) | 0.020 | 0.004 | 4.79 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S) | 0.739 | 0.237 | 3.11 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S) | 2.956 | 0.972 | 3.04 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S) | 6.316 | 2.550 | 2.48 |
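
To see what the corrected 64-byte branch in this comment does, the three operations can be modeled on a 16-lane array. The sketch assumes rotate-right moves lanes toward lower indices with zero fill, rotate-left does the opposite, and combine-low concatenates the low halves of its two arguments. `Lanes16` and the helper names are illustrative, not OpenCV API.

```cpp
#include <array>
#include <cstddef>

using Lanes16 = std::array<int, 16>;  // models a 512-bit register of int32 (4 pixels x 4 channels)

// Lanes move toward lower indices; vacated high lanes become zero.
template <std::size_t Shift>
Lanes16 rotate_right(const Lanes16& v)
{
    Lanes16 out{};
    for (std::size_t i = 0; i + Shift < 16; ++i)
        out[i] = v[i + Shift];
    return out;
}

// Lanes move toward higher indices; vacated low lanes become zero.
template <std::size_t Shift>
Lanes16 rotate_left(const Lanes16& v)
{
    Lanes16 out{};
    for (std::size_t i = Shift; i < 16; ++i)
        out[i] = v[i - Shift];
    return out;
}

// Low half of a followed by low half of b.
inline Lanes16 combine_low(const Lanes16& a, const Lanes16& b)
{
    Lanes16 out{};
    for (std::size_t i = 0; i < 8; ++i)
    {
        out[i] = a[i];
        out[8 + i] = b[i];
    }
    return out;
}

// Copies the last pixel (lanes 12..15) into all four pixel slots, keeping channel order.
inline Lanes16 broadcast_last_quad(const Lanes16& el4h)
{
    Lanes16 t = rotate_right<12>(el4h);         // last quad now in lanes 0..3, rest zero
    const Lanes16 shifted = rotate_left<4>(t);  // same quad in lanes 4..7
    for (std::size_t i = 0; i < 16; ++i)
        t[i] |= shifted[i];                     // lanes 0..7 hold the quad twice
    return combine_low(t, t);                   // repeat the low half: quad in all four slots
}
```

An interleaving zip would instead pair each lane with itself, which would mix the channel order within the carried quad; that appears to be the issue behind the earlier test failure.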

@terfendail
Contributor

Performance for SSE2 baseline

| Performance test | Reference time | PR time | Speedup |
|---|---|---|---|
| integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F) | 0.003 | 0.003 | 0.99 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S) | 0.002 | 0.002 | 1.00 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F) | 0.009 | 0.004 | 2.00 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F) | 0.017 | 0.005 | 3.54 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S) | 0.010 | 0.004 | 2.46 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F) | 0.018 | 0.008 | 2.12 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F) | 0.026 | 0.017 | 1.55 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S) | 0.015 | 0.016 | 0.92 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F) | 0.026 | 0.019 | 1.32 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F) | 0.034 | 0.008 | 4.07 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S) | 0.020 | 0.005 | 4.10 |
| integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F) | 0.034 | 0.010 | 3.27 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F) | 0.087 | 0.084 | 1.04 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S) | 0.062 | 0.063 | 0.98 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F) | 0.324 | 0.147 | 2.20 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F) | 0.661 | 0.153 | 4.31 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S) | 0.348 | 0.142 | 2.45 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F) | 0.650 | 0.294 | 2.21 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F) | 0.993 | 0.614 | 1.62 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S) | 0.520 | 0.626 | 0.83 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F) | 0.985 | 0.730 | 1.35 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F) | 1.312 | 0.320 | 4.10 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S) | 0.727 | 0.222 | 3.28 |
| integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F) | 1.426 | 0.478 | 2.98 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F) | 0.263 | 0.251 | 1.05 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S) | 0.183 | 0.184 | 0.99 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F) | 0.971 | 0.445 | 2.18 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F) | 1.980 | 0.473 | 4.18 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S) | 1.021 | 0.433 | 2.36 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F) | 2.002 | 1.015 | 1.97 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F) | 2.950 | 1.845 | 1.60 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S) | 1.635 | 1.867 | 0.88 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F) | 3.281 | 2.225 | 1.47 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F) | 4.114 | 1.084 | 3.79 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S) | 2.776 | 1.018 | 2.73 |
| integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F) | 5.018 | 2.044 | 2.45 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F) | 0.605 | 0.589 | 1.03 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S) | 0.447 | 0.443 | 1.01 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F) | 2.307 | 1.139 | 2.03 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F) | 4.670 | 1.234 | 3.79 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S) | 2.477 | 1.195 | 2.07 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F) | 4.790 | 2.535 | 1.89 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F) | 7.003 | 4.326 | 1.62 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S) | 4.255 | 4.342 | 0.98 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F) | 8.015 | 5.284 | 1.52 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F) | 9.576 | 2.689 | 3.56 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S) | 6.253 | 2.574 | 2.43 |
| integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F) | 11.487 | 4.981 | 2.31 |

Performance for SSE3 baseline
Performance test Reference time PR time Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F) 0.003 0.003 1.03
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S) 0.002 0.002 0.99
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F) 0.009 0.004 2.19
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F) 0.017 0.005 3.62
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S) 0.010 0.004 2.39
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F) 0.018 0.008 2.13
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F) 0.025 0.017 1.53
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S) 0.015 0.016 0.91
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F) 0.027 0.019 1.44
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F) 0.034 0.008 4.03
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S) 0.020 0.005 4.08
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F) 0.036 0.010 3.46
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F) 0.089 0.087 1.02
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S) 0.066 0.064 1.03
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F) 0.359 0.146 2.45
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F) 0.651 0.154 4.22
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S) 0.357 0.143 2.50
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F) 0.696 0.299 2.33
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F) 0.982 0.622 1.58
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S) 0.526 0.616 0.85
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F) 1.062 0.692 1.54
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F) 1.304 0.312 4.18
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S) 0.746 0.229 3.26
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F) 1.489 0.476 3.13
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F) 0.250 0.249 1.00
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S) 0.192 0.182 1.05
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F) 1.037 0.436 2.38
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F) 1.977 0.465 4.25
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S) 1.033 0.442 2.34
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F) 2.106 0.995 2.12
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F) 2.938 1.847 1.59
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S) 1.667 1.860 0.90
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F) 3.438 2.164 1.59
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F) 4.207 1.067 3.94
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S) 2.916 1.004 2.90
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F) 5.246 2.000 2.62
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F) 0.594 0.564 1.05
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S) 0.473 0.440 1.07
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F) 2.454 1.112 2.21
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F) 4.691 1.215 3.86
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S) 2.487 1.187 2.10
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F) 4.974 2.483 2.00
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F) 6.955 4.158 1.67
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S) 4.295 4.192 1.02
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F) 8.487 4.973 1.71
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F) 9.514 2.633 3.61
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S) 6.302 2.534 2.49
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F) 12.035 4.882 2.47
Performance for SSE4_2 baseline
Performance test Reference time PR time Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F) 0.003 0.003 0.99
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S) 0.002 0.002 0.98
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F) 0.009 0.004 1.93
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F) 0.017 0.005 3.66
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S) 0.011 0.004 2.70
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F) 0.017 0.008 2.05
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F) 0.025 0.011 2.28
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S) 0.015 0.012 1.32
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F) 0.025 0.014 1.75
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F) 0.034 0.008 4.08
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S) 0.020 0.005 4.09
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F) 0.034 0.011 3.17
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F) 0.089 0.089 1.00
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S) 0.071 0.068 1.03
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F) 0.340 0.157 2.16
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F) 0.685 0.149 4.61
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S) 0.357 0.138 2.58
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F) 0.673 0.310 2.17
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F) 1.011 0.423 2.39
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S) 0.551 0.449 1.23
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F) 1.032 0.548 1.88
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F) 1.355 0.312 4.34
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S) 0.769 0.243 3.16
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F) 1.472 0.508 2.90
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F) 0.259 0.262 0.99
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S) 0.216 0.204 1.06
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F) 1.025 0.468 2.19
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F) 2.083 0.457 4.56
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S) 1.077 0.439 2.45
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F) 2.091 0.997 2.10
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F) 3.087 1.218 2.53
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S) 1.731 1.303 1.33
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F) 3.420 1.787 1.91
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F) 4.190 1.091 3.84
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S) 2.902 1.082 2.68
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F) 5.128 2.057 2.49
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F) 0.612 0.599 1.02
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S) 0.499 0.492 1.01
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F) 2.336 1.196 1.95
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F) 4.718 1.202 3.92
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S) 2.525 1.157 2.18
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F) 4.921 2.524 1.95
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F) 7.053 2.828 2.49
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S) 4.398 2.968 1.48
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F) 8.256 4.198 1.97
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F) 9.619 2.695 3.57
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S) 6.353 2.613 2.43
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F) 11.662 5.051 2.31
Performance for AVX2 baseline
Performance test Reference time PR time Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F) 0.002 0.003 0.94
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S) 0.002 0.002 1.00
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F) 0.009 0.003 2.51
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F) 0.017 0.005 3.50
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S) 0.010 0.004 2.63
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F) 0.017 0.007 2.55
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F) 0.025 0.008 3.19
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S) 0.015 0.007 2.14
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F) 0.025 0.011 2.24
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F) 0.034 0.007 4.59
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S) 0.020 0.005 4.33
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F) 0.034 0.009 3.85
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F) 0.069 0.067 1.04
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S) 0.055 0.055 1.00
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F) 0.344 0.107 3.22
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F) 0.678 0.149 4.56
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S) 0.356 0.128 2.79
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F) 0.680 0.234 2.91
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F) 1.013 0.261 3.87
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S) 0.541 0.246 2.20
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F) 1.031 0.421 2.45
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F) 1.344 0.271 4.96
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S) 0.747 0.231 3.23
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F) 1.486 0.481 3.09
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F) 0.195 0.201 0.97
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S) 0.174 0.173 1.00
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F) 0.991 0.329 3.01
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F) 1.959 0.451 4.35
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S) 1.019 0.400 2.55
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F) 2.022 0.899 2.25
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F) 2.925 0.827 3.54
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S) 1.646 0.790 2.08
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F) 3.268 1.570 2.08
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F) 4.049 1.006 4.03
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S) 2.818 1.007 2.80
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F) 4.835 1.995 2.42
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F) 0.458 0.474 0.97
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S) 0.408 0.422 0.97
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F) 2.262 0.969 2.33
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F) 4.441 1.228 3.62
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S) 2.404 1.118 2.15
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F) 4.834 2.325 2.08
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F) 6.938 2.089 3.32
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S) 4.265 2.044 2.09
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F) 8.018 3.824 2.10
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F) 9.524 2.564 3.71
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S) 6.241 2.551 2.45
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F) 11.551 4.889 2.36
Performance for AVX512 baseline
Performance test Reference time PR time Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F) 0.003 0.003 0.99
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S) 0.002 0.002 0.97
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F) 0.005 0.006 0.92
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F) 0.017 0.005 3.63
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S) 0.010 0.004 2.57
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F) 0.005 0.005 1.00
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F) 0.025 0.007 3.43
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S) 0.015 0.006 2.33
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F) 0.006 0.006 0.98
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F) 0.034 0.006 5.79
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S) 0.020 0.004 4.69
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F) 0.007 0.007 0.99
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F) 0.055 0.052 1.05
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S) 0.055 0.052 1.06
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F) 0.175 0.185 0.95
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F) 0.655 0.119 5.52
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S) 0.338 0.116 2.92
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F) 0.212 0.206 1.03
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F) 0.978 0.194 5.03
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S) 0.523 0.181 2.89
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F) 0.329 0.314 1.05
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F) 1.297 0.234 5.55
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S) 0.739 0.234 3.15
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F) 0.493 0.462 1.07
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F) 0.180 0.176 1.03
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S) 0.181 0.174 1.04
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F) 0.521 0.529 0.98
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F) 1.984 0.369 5.38
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S) 1.018 0.355 2.87
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F) 0.851 0.806 1.06
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F) 3.049 0.666 4.58
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S) 1.732 0.635 2.73
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F) 1.449 1.373 1.06
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F) 4.225 0.948 4.46
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S) 2.956 0.991 2.98
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F) 2.102 2.006 1.05
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F) 0.435 0.409 1.06
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S) 0.436 0.412 1.06
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F) 1.304 1.254 1.04
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F) 4.656 1.108 4.20
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S) 2.501 1.087 2.30
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F) 2.369 2.294 1.03
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F) 6.922 1.905 3.63
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S) 4.311 1.882 2.29
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F) 3.722 3.588 1.04
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F) 9.539 2.522 3.78
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S) 6.316 2.552 2.47
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F) 5.232 5.044 1.04

It looks like there is a small performance degradation for 8UC3->32S on SSE2 and SSE3.

@ChipKerchner
Contributor Author

> It looks like there is a small performance degradation for 8UC3->32S on SSE2 and SSE3

I'll have to think a little more about whether there is a better way to do 8UC3->32S. For non-Intel platforms, this algorithm is much better than the scalar version.

Could you measure the performance of my version of 8UC[1-4]->64F versus the current (old) version for AVX512? I want to know whether it is worth calling the current old version at all.
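For readers following the scalar-versus-vector comparison, here is a minimal sketch of what a scalar integral over one image line could look like for interleaved channels. It is not taken from the PR: the function name `integralLineScalar` and its parameter layout are assumptions made for illustration, following the usual convention that the integral buffer has one extra leading column of zeros.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the PR): scalar integral of one image row.
// src:     width * cn interleaved 8-bit samples of the current row
// prevSum: (width + 1) * cn integral values of the row above
// sum:     (width + 1) * cn integral values written for the current row
void integralLineScalar(const uint8_t* src, const int32_t* prevSum,
                        int32_t* sum, int width, int cn)
{
    for (int k = 0; k < cn; ++k)
    {
        sum[k] = 0;          // left border column is always zero
        int32_t rowAcc = 0;  // running sum along the row for channel k
        for (int x = 0; x < width; ++x)
        {
            rowAcc += src[x * cn + k];
            // integral = integral of the row above + running sum of this row
            sum[(x + 1) * cn + k] = prevSum[(x + 1) * cn + k] + rowAcc;
        }
    }
}
```

The running accumulator `rowAcc` carries a dependency from one pixel to the next, which is the part a vectorized version has to restructure as a prefix sum inside each vector.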

@terfendail
Contributor

terfendail commented Feb 25, 2020

Performance is better for 8UC1->64F, while it is almost the same for 8UC[2-4]. (I manually disabled the existing AVX512 code dispatching and enabled the new code for the AVX512 platform as well.)

Performance for AVX512 baseline
Performance test Reference time PR time Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F) 0.005 0.004 1.50
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F) 0.005 0.006 0.96
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F) 0.006 0.009 0.71
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F) 0.007 0.007 1.01
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F) 0.175 0.115 1.53
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F) 0.212 0.230 0.92
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F) 0.329 0.348 0.95
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F) 0.493 0.504 0.98
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F) 0.521 0.338 1.54
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F) 0.851 0.856 0.99
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F) 1.449 1.444 1.00
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F) 2.102 2.048 1.03
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F) 1.304 0.988 1.32
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F) 2.369 2.314 1.02
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F) 3.722 3.713 1.00
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F) 5.232 5.165 1.01
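As a reading aid for the tables, the Speedup column appears to be the reference time divided by the PR time, so values above 1.00 mean the PR is faster. The sketch below rests on that assumption, and the helper name `speedup` is hypothetical. Recomputing from the rounded times shown here can differ from the printed value in the last digit, presumably because the report used unrounded timings.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: speedup as reference time over PR time.
// A result above 1.0 means the PR version ran faster than the reference.
double speedup(double referenceTime, double prTime)
{
    return referenceTime / prTime;
}
```

For example, the 1280x720 8UC1 CV_64F row above lists 0.521 and 0.338, which gives roughly 1.54.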

@ChipKerchner
Contributor Author

What would be the best way to enable my 8UC1->64F for AVX512 but use the old code for 8UC[2-4]->64F?

double * sqsum, size_t,
double * tilted, size_t,
int width, int height, int cn) const
{
Contributor


I think the call to the specific AVX512 implementation could be moved to the beginning of this implementation, with a proper check for the requested mode:

#if CV_AVX512_SKX
if (!tilted && cn <= 4 && (cn > 1 || sqsum))
{
    calculate_integral_avx512(src, _srcstep, sum, _sumstep, sqsum, _sqsumstep, width, height, cn);
    return true;
}
#endif
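To make the effect of the suggested condition easier to see, the following sketch isolates it as a standalone predicate. The function name `useExistingAvx512Path` and the boolean parameters are assumptions for illustration only; the expression itself mirrors the condition in the snippet above.

```cpp
#include <cassert>

// Hypothetical predicate mirroring the suggested check:
//   !tilted && cn <= 4 && (cn > 1 || sqsum)
// true  -> dispatch to the existing AVX512 implementation
// false -> fall through to the code that follows the check
bool useExistingAvx512Path(bool hasTilted, bool hasSqsum, int cn)
{
    return !hasTilted && cn <= 4 && (cn > 1 || hasSqsum);
}
```

Under this check, a single-channel request without `sqsum` is the only case with `cn <= 4` and no `tilted` output that skips the existing AVX512 path, which matches the goal of using the new code for 8UC1->64F and the old code for 8UC[2-4]->64F.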

@terfendail
Contributor

Performance for SSE2 baseline
Performance test Reference time PR time Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F) 0.003 0.003 0.99
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S) 0.002 0.002 0.99
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F) 0.009 0.004 2.01
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F) 0.017 0.005 3.55
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S) 0.010 0.004 2.57
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F) 0.018 0.008 2.13
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F) 0.026 0.017 1.56
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S) 0.015 0.015 0.97
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F) 0.026 0.019 1.33
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F) 0.034 0.008 4.08
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S) 0.020 0.005 3.91
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F) 0.034 0.010 3.27
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F) 0.087 0.084 1.04
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S) 0.062 0.063 0.99
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F) 0.324 0.147 2.20
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F) 0.661 0.152 4.35
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S) 0.348 0.136 2.57
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F) 0.650 0.298 2.18
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F) 0.993 0.618 1.61
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S) 0.520 0.513 1.01
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F) 0.985 0.725 1.36
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F) 1.312 0.310 4.23
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S) 0.727 0.223 3.26
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F) 1.426 0.479 2.97
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F) 0.263 0.248 1.06
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S) 0.183 0.184 0.99
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F) 0.971 0.440 2.21
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F) 1.980 0.464 4.27
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S) 1.021 0.425 2.40
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F) 2.002 0.996 2.01
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F) 2.950 1.856 1.59
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S) 1.635 1.641 1.00
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F) 3.281 2.254 1.46
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F) 4.114 1.045 3.94
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S) 2.776 0.993 2.80
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F) 5.018 2.042 2.46
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F) 0.605 0.589 1.03
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S) 0.447 0.458 0.98
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F) 2.307 1.094 2.11
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F) 4.670 1.239 3.77
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S) 2.477 1.139 2.17
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F) 4.790 2.586 1.85
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F) 7.003 4.340 1.61
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S) 4.255 4.245 1.00
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F) 8.015 5.336 1.50
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F) 9.576 2.670 3.59
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S) 6.253 2.548 2.45
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F) 11.487 4.940 2.33
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F) 0.010 0.011 0.99
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S) 0.009 0.009 1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F) 0.010 0.010 1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F) 0.021 0.021 0.99
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S) 0.019 0.019 0.99
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F) 0.020 0.020 1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F) 0.031 0.031 1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S) 0.028 0.028 1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F) 0.029 0.029 1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F) 0.042 0.041 1.01
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S) 0.038 0.038 1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F) 0.044 0.047 0.93
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F) 0.391 0.391 1.00
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S) 0.355 0.355 1.00
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F) 0.359 0.357 1.01
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F) 0.784 0.785 1.00
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S) 0.732 0.723 1.01
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F) 0.816 0.813 1.00
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F) 1.327 1.318 1.01
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S) 1.334 1.352 0.99
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F) 1.737 1.708 1.02
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F) 2.523 2.394 1.05
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S) 2.456 2.457 1.00
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F) 3.159 3.121 1.01
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F) 1.161 1.157 1.00
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S) 1.053 1.052 1.00
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F) 1.116 1.127 0.99
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F) 2.621 2.614 1.00
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S) 2.573 2.528 1.02
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F) 3.099 3.088 1.00
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F) 4.383 4.365 1.00
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S) 4.466 4.433 1.01
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F) 5.662 5.648 1.00
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F) 8.039 7.825 1.03
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S) 8.136 8.005 1.02
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F) 9.650 9.510 1.01
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F) 2.617 2.498 1.05
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S) 2.382 2.309 1.03
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F) 2.629 2.616 1.01
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F) 5.957 5.783 1.03
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S) 5.692 5.559 1.02
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F) 6.611 6.585 1.00
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F) 10.290 10.379 0.99
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S) 10.285 10.406 0.99
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F) 14.734 14.812 0.99
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F) 18.505 18.012 1.03
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S) 18.087 18.279 0.99
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F) 26.655 26.815 0.99
Performance for SSE3 baseline
Performance test Reference time PR time Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F) 0.003 0.003 0.99
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S) 0.002 0.002 0.85
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F) 0.009 0.004 2.13
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F) 0.017 0.005 3.51
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S) 0.010 0.004 2.50
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F) 0.018 0.008 2.16
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F) 0.025 0.017 1.53
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S) 0.015 0.015 0.99
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F) 0.027 0.019 1.44
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F) 0.034 0.008 4.03
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S) 0.020 0.005 3.86
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F) 0.036 0.010 3.47
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F) 0.089 0.087 1.02
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S) 0.066 0.064 1.03
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F) 0.359 0.157 2.29
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F) 0.651 0.163 4.00
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S) 0.357 0.143 2.51
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F) 0.696 0.316 2.20
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F) 0.982 0.643 1.53
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S) 0.526 0.533 0.99
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F) 1.062 0.725 1.46
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F) 1.304 0.331 3.94
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S) 0.746 0.231 3.23
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F) 1.489 0.490 3.04
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F) 0.250 0.261 0.96
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S) 0.192 0.192 1.00
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F) 1.037 0.453 2.29
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F) 1.977 0.479 4.13
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S) 1.033 0.433 2.39
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F) 2.106 1.025 2.05
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F) 2.938 1.933 1.52
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S) 1.667 1.698 0.98
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F) 3.438 2.268 1.52
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F) 4.207 1.075 3.92
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S) 2.916 0.976 2.99
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F) 5.246 2.026 2.59
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F) 0.594 0.595 1.00
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S) 0.473 0.455 1.04
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F) 2.454 1.141 2.15
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F) 4.691 1.237 3.79
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S) 2.487 1.169 2.13
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F) 4.974 2.587 1.92
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F) 6.955 4.337 1.60
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S) 4.295 4.253 1.01
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F) 8.487 5.197 1.63
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F) 9.514 2.681 3.55
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S) 6.302 2.568 2.45
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F) 12.035 4.986 2.41
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F) 0.010 0.011 0.99
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S) 0.009 0.009 1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F) 0.010 0.010 1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F) 0.021 0.021 0.99
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S) 0.019 0.019 1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F) 0.020 0.019 1.02
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F) 0.031 0.031 0.99
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S) 0.028 0.028 1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F) 0.029 0.029 1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F) 0.041 0.041 1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S) 0.038 0.038 1.01
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F) 0.044 0.046 0.95
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F) 0.385 0.389 0.99
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S) 0.349 0.350 1.00
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F) 0.361 0.364 0.99
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F) 0.791 0.787 1.01
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S) 0.742 0.724 1.02
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F) 0.841 0.813 1.03
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F) 1.362 1.321 1.03
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S) 1.391 1.347 1.03
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F) 1.754 1.687 1.04
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F) 2.421 2.430 1.00
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S) 2.433 2.460 0.99
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F) 2.991 3.048 0.98
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F) 1.157 1.175 0.98
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S) 1.049 1.052 1.00
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F) 1.131 1.126 1.00
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F) 2.693 2.617 1.03
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S) 2.613 2.553 1.02
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F) 3.196 3.094 1.03
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F) 4.458 4.385 1.02
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S) 4.569 4.448 1.03
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F) 5.665 5.634 1.01
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F) 8.012 7.659 1.05
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S) 8.173 7.871 1.04
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F) 9.465 9.229 1.03
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F) 2.611 2.525 1.03
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S) 2.376 2.307 1.03
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F) 2.658 2.553 1.04
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F) 5.976 5.783 1.03
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S) 5.758 5.550 1.04
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F) 6.778 6.510 1.04
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F) 10.553 10.273 1.03
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S) 10.598 10.106 1.05
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F) 14.643 14.732 0.99
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F) 18.124 18.257 0.99
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S) 17.959 18.373 0.98
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F) 26.009 26.812 0.97
Performance for SSE4_2 baseline
Performance test Reference time PR time Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F) 0.003 0.003 1.02
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S) 0.002 0.002 1.00
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F) 0.009 0.005 1.87
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F) 0.017 0.005 3.75
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S) 0.011 0.004 2.76
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F) 0.017 0.008 2.09
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F) 0.025 0.011 2.34
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S) 0.015 0.011 1.35
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F) 0.025 0.014 1.80
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F) 0.034 0.008 4.17
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S) 0.020 0.005 4.23
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F) 0.034 0.010 3.24
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F) 0.089 0.087 1.02
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S) 0.071 0.070 1.00
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F) 0.340 0.163 2.09
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F) 0.685 0.152 4.51
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S) 0.357 0.139 2.57
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F) 0.673 0.309 2.18
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F) 1.011 0.401 2.52
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S) 0.551 0.439 1.25
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F) 1.032 0.521 1.98
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F) 1.355 0.310 4.38
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S) 0.769 0.241 3.19
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F) 1.472 0.506 2.91
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F) 0.259 0.258 1.00
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S) 0.216 0.215 1.01
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F) 1.025 0.488 2.10
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F) 2.083 0.468 4.45
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S) 1.077 0.441 2.44
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F) 2.091 1.028 2.03
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F) 3.087 1.216 2.54
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S) 1.731 1.299 1.33
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F) 3.420 1.809 1.89
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F) 4.190 1.052 3.98
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S) 2.902 1.005 2.89
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F) 5.128 2.048 2.50
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F) 0.612 0.565 1.08
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S) 0.499 0.488 1.02
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F) 2.336 1.158 2.02
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F) 4.718 1.227 3.84
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S) 2.525 1.175 2.15
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F) 4.921 2.572 1.91
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F) 7.053 2.782 2.54
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S) 4.398 2.963 1.48
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F) 8.256 4.223 1.95
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F) 9.619 2.689 3.58
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S) 6.353 2.613 2.43
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F) 11.662 5.035 2.32
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F) 0.010 0.010 1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S) 0.009 0.009 1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F) 0.010 0.010 1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F) 0.021 0.021 0.98
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S) 0.019 0.019 0.99
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F) 0.019 0.019 1.01
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F) 0.031 0.031 1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S) 0.027 0.028 0.99
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F) 0.029 0.029 1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F) 0.041 0.042 0.99
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S) 0.038 0.038 1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F) 0.047 0.043 1.08
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F) 0.386 0.387 1.00
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S) 0.357 0.354 1.01
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F) 0.362 0.358 1.01
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F) 0.789 0.791 1.00
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S) 0.737 0.744 0.99
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F) 0.832 0.816 1.02
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F) 1.343 1.312 1.02
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S) 1.362 1.346 1.01
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F) 1.732 1.671 1.04
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F) 2.425 2.316 1.05
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S) 2.407 2.349 1.02
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F) 3.072 2.973 1.03
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F) 1.140 1.125 1.01
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S) 1.014 1.004 1.01
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F) 1.119 1.105 1.01
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F) 2.648 2.569 1.03
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S) 2.571 2.534 1.01
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F) 3.133 3.111 1.01
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F) 4.424 4.449 0.99
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S) 4.507 4.499 1.00
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F) 5.725 5.508 1.04
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F) 8.123 7.734 1.05
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S) 8.248 7.795 1.06
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F) 9.749 9.090 1.07
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F) 2.610 2.520 1.04
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S) 2.383 2.297 1.04
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F) 2.637 2.708 0.97
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F) 5.934 5.962 1.00
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S) 5.796 5.786 1.00
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F) 6.653 6.739 0.99
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F) 10.303 10.210 1.01
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S) 10.453 10.616 0.98
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F) 14.791 14.297 1.03
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F) 18.669 17.500 1.07
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S) 18.299 17.622 1.04
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F) 27.205 25.768 1.06
Performance for AVX2 baseline
Performance test Reference time PR time Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F) 0.002 0.002 0.99
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S) 0.002 0.002 1.03
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F) 0.009 0.003 2.59
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F) 0.017 0.005 3.54
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S) 0.010 0.004 2.65
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F) 0.017 0.007 2.58
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F) 0.025 0.008 3.21
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S) 0.015 0.007 2.15
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F) 0.025 0.011 2.28
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F) 0.034 0.007 4.64
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S) 0.020 0.005 4.37
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F) 0.034 0.009 3.94
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F) 0.069 0.066 1.04
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S) 0.055 0.054 1.01
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F) 0.344 0.104 3.30
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F) 0.678 0.139 4.87
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S) 0.356 0.122 2.93
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F) 0.680 0.220 3.09
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F) 1.013 0.249 4.07
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S) 0.541 0.232 2.33
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F) 1.031 0.392 2.63
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F) 1.344 0.257 5.23
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S) 0.747 0.226 3.31
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F) 1.486 0.461 3.22
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F) 0.195 0.194 1.01
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S) 0.174 0.170 1.03
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F) 0.991 0.317 3.13
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F) 1.959 0.427 4.59
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S) 1.019 0.375 2.71
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F) 2.022 0.871 2.32
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F) 2.925 0.785 3.73
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S) 1.646 0.753 2.19
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F) 3.268 1.572 2.08
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F) 4.049 0.965 4.19
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S) 2.818 0.962 2.93
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F) 4.835 1.982 2.44
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F) 0.458 0.450 1.02
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S) 0.408 0.412 0.99
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F) 2.262 0.960 2.36
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F) 4.441 1.159 3.83
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S) 2.404 1.096 2.19
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F) 4.834 2.319 2.08
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F) 6.938 2.044 3.39
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S) 4.265 1.989 2.14
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F) 8.018 3.795 2.11
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F) 9.524 2.578 3.69
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S) 6.241 2.537 2.46
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F) 11.551 4.905 2.36
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F) 0.010 0.010 1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S) 0.009 0.009 0.97
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F) 0.009 0.009 1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F) 0.021 0.021 1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S) 0.018 0.018 1.01
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F) 0.019 0.019 1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F) 0.031 0.031 1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S) 0.027 0.027 1.01
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F) 0.028 0.028 0.99
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F) 0.041 0.041 1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S) 0.037 0.037 1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F) 0.043 0.045 0.98
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F) 0.387 0.384 1.01
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S) 0.347 0.349 1.00
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F) 0.341 0.324 1.06
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F) 0.781 0.744 1.05
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S) 0.718 0.680 1.06
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F) 0.832 0.811 1.02
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F) 1.344 1.352 0.99
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S) 1.394 1.392 1.00
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F) 1.751 1.748 1.00
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F) 2.362 2.350 1.01
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S) 2.465 2.464 1.00
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F) 3.186 3.028 1.05
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F) 1.154 1.150 1.00
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S) 1.027 1.039 0.99
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F) 1.067 1.077 0.99
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F) 2.617 2.633 0.99
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S) 2.653 2.672 0.99
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F) 3.210 3.169 1.01
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F) 4.430 4.325 1.02
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S) 4.614 4.436 1.04
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F) 5.648 5.503 1.03
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F) 7.835 7.783 1.01
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S) 8.088 7.804 1.04
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F) 9.602 9.462 1.01
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F) 2.526 2.599 0.97
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S) 2.357 2.341 1.01
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F) 2.460 2.605 0.94
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F) 5.726 5.881 0.97
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S) 5.527 5.708 0.97
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F) 6.529 6.686 0.98
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F) 10.133 10.438 0.97
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S) 10.412 10.601 0.98
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F) 14.218 14.803 0.96
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F) 17.647 17.891 0.99
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S) 17.977 18.400 0.98
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F) 26.532 26.297 1.01
Performance for AVX512 baseline
Performance test Reference time PR time Speedup
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F) 0.003 0.003 0.96
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S) 0.002 0.002 1.02
integral::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F) 0.006 0.004 1.54
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F) 0.017 0.005 3.61
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S) 0.010 0.004 2.60
integral::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F) 0.005 0.005 1.00
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F) 0.025 0.007 3.43
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S) 0.015 0.006 2.38
integral::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F) 0.006 0.006 1.00
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F) 0.034 0.006 5.82
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S) 0.020 0.004 4.67
integral::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F) 0.007 0.007 0.97
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F) 0.056 0.053 1.06
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S) 0.055 0.052 1.06
integral::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F) 0.180 0.105 1.72
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F) 0.681 0.116 5.87
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S) 0.342 0.114 3.00
integral::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F) 0.215 0.200 1.07
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F) 1.010 0.203 4.99
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S) 0.546 0.184 2.97
integral::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F) 0.336 0.311 1.08
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F) 1.349 0.230 5.87
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S) 0.765 0.228 3.35
integral::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F) 0.503 0.463 1.09
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F) 0.180 0.171 1.06
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S) 0.184 0.170 1.08
integral::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F) 0.552 0.317 1.74
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F) 2.030 0.365 5.56
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S) 1.066 0.352 3.03
integral::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F) 0.855 0.807 1.06
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F) 3.060 0.666 4.59
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S) 1.728 0.643 2.69
integral::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F) 1.455 1.378 1.06
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F) 4.022 0.933 4.31
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S) 2.836 0.969 2.93
integral::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F) 2.117 2.011 1.05
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F) 0.434 0.402 1.08
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S) 0.438 0.405 1.08
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F) 1.310 0.939 1.39
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F) 4.377 1.087 4.03
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S) 2.399 1.068 2.25
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F) 2.373 2.276 1.04
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F) 6.644 1.935 3.43
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S) 4.197 1.900 2.21
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F) 3.724 3.601 1.03
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F) 9.245 2.517 3.67
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S) 6.152 2.543 2.42
integral::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F) 5.243 5.071 1.03
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32F) 0.010 0.010 1.01
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_32S) 0.009 0.009 1.03
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC1, CV_64F) 0.012 0.012 1.01
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32F) 0.021 0.022 0.97
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_32S) 0.018 0.018 1.01
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC2, CV_64F) 0.010 0.010 1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32F) 0.031 0.031 1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_32S) 0.027 0.027 1.02
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC3, CV_64F) 0.012 0.012 1.01
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32F) 0.041 0.041 1.00
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_32S) 0.038 0.037 1.03
integral_sqsum::Size_MatType_OutMatDepth::(127x61, 8UC4, CV_64F) 0.016 0.017 0.96
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32F) 0.384 0.381 1.01
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_32S) 0.343 0.343 1.00
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC1, CV_64F) 0.429 0.425 1.01
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32F) 0.777 0.758 1.03
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_32S) 0.736 0.690 1.07
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC2, CV_64F) 0.434 0.417 1.04
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32F) 1.282 1.281 1.00
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_32S) 1.331 1.344 0.99
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC3, CV_64F) 0.789 0.749 1.05
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32F) 2.426 2.346 1.03
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_32S) 2.866 2.482 1.15
integral_sqsum::Size_MatType_OutMatDepth::(640x480, 8UC4, CV_64F) 1.246 1.171 1.06
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32F) 1.108 1.143 0.97
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_32S) 0.985 1.009 0.98
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC1, CV_64F) 1.308 1.255 1.04
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32F) 2.646 2.526 1.05
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_32S) 2.645 2.531 1.05
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC2, CV_64F) 2.018 1.927 1.05
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32F) 4.430 4.370 1.01
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_32S) 4.541 4.566 0.99
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC3, CV_64F) 3.178 3.035 1.05
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32F) 7.691 7.637 1.01
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_32S) 8.730 7.895 1.11
integral_sqsum::Size_MatType_OutMatDepth::(1280x720, 8UC4, CV_64F) 4.303 4.174 1.03
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32F) 2.599 2.489 1.04
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_32S) 2.370 2.260 1.05
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC1, CV_64F) 2.945 2.832 1.04
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32F) 5.885 5.827 1.01
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_32S) 5.766 5.595 1.03
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC2, CV_64F) 4.617 4.536 1.02
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32F) 10.065 10.016 1.00
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_32S) 10.303 10.151 1.01
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC3, CV_64F) 7.903 7.657 1.03
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32F) 17.838 18.010 0.99
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_32S) 21.291 18.668 1.14
integral_sqsum::Size_MatType_OutMatDepth::(1920x1080, 8UC4, CV_64F) 10.686 10.284 1.04
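
The Speedup column in the tables above appears to be the reference time divided by the PR time, rounded to two decimals. Here is a minimal sketch of that calculation; the helper name `speedup` is hypothetical and is not part of OpenCV or its perf tooling. Because the listed times are themselves rounded to three decimals, rows with very small times (for example the 127x61 cases) may not reproduce the listed speedup exactly from the displayed values.

```cpp
#include <cmath>

// Hypothetical helper (not an OpenCV API): ratio of reference time to PR time,
// rounded to two decimal places as in the tables above.
double speedup(double reference_time, double pr_time)
{
    return std::round(reference_time / pr_time * 100.0) / 100.0;
}
```

For example, `speedup(10.686, 10.284)` uses the times from the last AVX512 row and can be compared against the value listed there.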

alalek merged commit 8c24af6 into opencv:3.4 on Feb 28, 2020
alalek (Member) commented Mar 1, 2020

OOB access issue: #16708
