
Convert ImgWarp from SSE SIMD to HAL - 2.8x faster on Power (VSX) and… #15358

Merged

alalek merged 5 commits into opencv:3.4 from ChipKerchner:imgwarpToHal on Aug 31, 2019


@ChipKerchner
Contributor

Convert ImgWarp from SSE SIMD to HAL - 2.8x faster on Power (VSX) and 15% speedup on x86.

@alalek
Member

alalek commented Aug 21, 2019

@terfendail Could you please check performance on x86 (SSE2, SSE4.2, AVX2 CPU baselines)?

@terfendail
Contributor

terfendail commented Aug 21, 2019

Performance for SSE2 baseline

| Performance test | Reference time | PR time | Speedup |
|---|---|---|---|
| WarpPerspective::TestWarpPerspective::(640x480, INTER_LINEAR, BORDER_CONSTANT) | 3.031 | 2.125 | 1.43 |
| WarpPerspective::TestWarpPerspective::(640x480, INTER_LINEAR, BORDER_REPLICATE) | 3.390 | 2.411 | 1.41 |
| WarpPerspective::TestWarpPerspective::(640x480, INTER_NEAREST, BORDER_CONSTANT) | 2.003 | 1.276 | 1.57 |
| WarpPerspective::TestWarpPerspective::(640x480, INTER_NEAREST, BORDER_REPLICATE) | 2.019 | 1.296 | 1.56 |
| WarpPerspective::TestWarpPerspective::(1280x720, INTER_LINEAR, BORDER_CONSTANT) | 9.240 | 6.259 | 1.48 |
| WarpPerspective::TestWarpPerspective::(1280x720, INTER_LINEAR, BORDER_REPLICATE) | 13.188 | 10.471 | 1.26 |
| WarpPerspective::TestWarpPerspective::(1280x720, INTER_NEAREST, BORDER_CONSTANT) | 6.326 | 4.183 | 1.51 |
| WarpPerspective::TestWarpPerspective::(1280x720, INTER_NEAREST, BORDER_REPLICATE) | 6.767 | 4.806 | 1.41 |
| WarpPerspective::TestWarpPerspective::(1920x1080, INTER_LINEAR, BORDER_CONSTANT) | 20.361 | 13.854 | 1.47 |
| WarpPerspective::TestWarpPerspective::(1920x1080, INTER_LINEAR, BORDER_REPLICATE) | 34.396 | 27.214 | 1.26 |
| WarpPerspective::TestWarpPerspective::(1920x1080, INTER_NEAREST, BORDER_CONSTANT) | 14.344 | 9.621 | 1.49 |
| WarpPerspective::TestWarpPerspective::(1920x1080, INTER_NEAREST, BORDER_REPLICATE) | 15.718 | 12.073 | 1.30 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_LINEAR, BORDER_CONSTANT, 8UC1) | 3.309 | 2.318 | 1.43 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_LINEAR, BORDER_CONSTANT, 8UC4) | 3.063 | 2.086 | 1.47 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_LINEAR, BORDER_REPLICATE, 8UC1) | 3.357 | 2.316 | 1.45 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_LINEAR, BORDER_REPLICATE, 8UC4) | 3.305 | 2.283 | 1.45 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_NEAREST, BORDER_CONSTANT, 8UC1) | 1.665 | 0.915 | 1.82 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_NEAREST, BORDER_CONSTANT, 8UC4) | 1.932 | 1.195 | 1.62 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_NEAREST, BORDER_REPLICATE, 8UC1) | 1.711 | 0.946 | 1.81 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_NEAREST, BORDER_REPLICATE, 8UC4) | 1.949 | 1.222 | 1.59 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 8UC1) | 21.766 | 15.183 | 1.43 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 8UC4) | 20.553 | 14.369 | 1.43 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 8UC1) | 22.257 | 16.063 | 1.39 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 8UC4) | 22.522 | 16.187 | 1.39 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_NEAREST, BORDER_CONSTANT, 8UC1) | 11.356 | 6.145 | 1.85 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_NEAREST, BORDER_CONSTANT, 8UC4) | 13.446 | 7.998 | 1.68 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_NEAREST, BORDER_REPLICATE, 8UC1) | 11.375 | 6.365 | 1.79 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_NEAREST, BORDER_REPLICATE, 8UC4) | 13.262 | 8.544 | 1.55 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_LINEAR, BORDER_CONSTANT, 8UC1) | 54.484 | 37.296 | 1.46 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_LINEAR, BORDER_CONSTANT, 8UC4) | 51.908 | 35.636 | 1.46 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_LINEAR, BORDER_REPLICATE, 8UC1) | 55.525 | 38.363 | 1.45 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_LINEAR, BORDER_REPLICATE, 8UC4) | 55.978 | 39.360 | 1.42 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_NEAREST, BORDER_CONSTANT, 8UC1) | 27.704 | 15.008 | 1.85 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_NEAREST, BORDER_CONSTANT, 8UC4) | 33.206 | 21.149 | 1.57 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_NEAREST, BORDER_REPLICATE, 8UC1) | 28.299 | 15.596 | 1.81 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_NEAREST, BORDER_REPLICATE, 8UC4) | 33.768 | 21.775 | 1.55 |
Performance for SSE3 baseline

| Performance test | Reference time | PR time | Speedup |
|---|---|---|---|
| WarpPerspective::TestWarpPerspective::(640x480, INTER_LINEAR, BORDER_CONSTANT) | 3.099 | 1.978 | 1.57 |
| WarpPerspective::TestWarpPerspective::(640x480, INTER_LINEAR, BORDER_REPLICATE) | 3.371 | 2.248 | 1.50 |
| WarpPerspective::TestWarpPerspective::(640x480, INTER_NEAREST, BORDER_CONSTANT) | 2.015 | 1.192 | 1.69 |
| WarpPerspective::TestWarpPerspective::(640x480, INTER_NEAREST, BORDER_REPLICATE) | 2.050 | 1.230 | 1.67 |
| WarpPerspective::TestWarpPerspective::(1280x720, INTER_LINEAR, BORDER_CONSTANT) | 9.043 | 5.867 | 1.54 |
| WarpPerspective::TestWarpPerspective::(1280x720, INTER_LINEAR, BORDER_REPLICATE) | 12.911 | 9.990 | 1.29 |
| WarpPerspective::TestWarpPerspective::(1280x720, INTER_NEAREST, BORDER_CONSTANT) | 6.595 | 4.105 | 1.61 |
| WarpPerspective::TestWarpPerspective::(1280x720, INTER_NEAREST, BORDER_REPLICATE) | 7.345 | 4.812 | 1.53 |
| WarpPerspective::TestWarpPerspective::(1920x1080, INTER_LINEAR, BORDER_CONSTANT) | 19.390 | 13.723 | 1.41 |
| WarpPerspective::TestWarpPerspective::(1920x1080, INTER_LINEAR, BORDER_REPLICATE) | 32.112 | 26.954 | 1.19 |
| WarpPerspective::TestWarpPerspective::(1920x1080, INTER_NEAREST, BORDER_CONSTANT) | 15.255 | 10.309 | 1.48 |
| WarpPerspective::TestWarpPerspective::(1920x1080, INTER_NEAREST, BORDER_REPLICATE) | 17.301 | 12.755 | 1.36 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_LINEAR, BORDER_CONSTANT, 8UC1) | 3.310 | 2.268 | 1.46 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_LINEAR, BORDER_CONSTANT, 8UC4) | 3.091 | 2.046 | 1.51 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_LINEAR, BORDER_REPLICATE, 8UC1) | 3.355 | 2.314 | 1.45 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_LINEAR, BORDER_REPLICATE, 8UC4) | 3.308 | 2.259 | 1.46 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_NEAREST, BORDER_CONSTANT, 8UC1) | 1.546 | 0.890 | 1.74 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_NEAREST, BORDER_CONSTANT, 8UC4) | 1.813 | 1.164 | 1.56 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_NEAREST, BORDER_REPLICATE, 8UC1) | 1.572 | 0.918 | 1.71 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_NEAREST, BORDER_REPLICATE, 8UC4) | 1.958 | 1.194 | 1.64 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 8UC1) | 21.750 | 15.226 | 1.43 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 8UC4) | 20.907 | 13.629 | 1.53 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 8UC1) | 22.333 | 15.229 | 1.47 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 8UC4) | 22.628 | 15.860 | 1.43 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_NEAREST, BORDER_CONSTANT, 8UC1) | 11.224 | 6.041 | 1.86 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_NEAREST, BORDER_CONSTANT, 8UC4) | 13.646 | 8.328 | 1.64 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_NEAREST, BORDER_REPLICATE, 8UC1) | 11.257 | 6.018 | 1.87 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_NEAREST, BORDER_REPLICATE, 8UC4) | 13.737 | 8.626 | 1.59 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_LINEAR, BORDER_CONSTANT, 8UC1) | 54.212 | 35.620 | 1.52 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_LINEAR, BORDER_CONSTANT, 8UC4) | 51.013 | 34.853 | 1.46 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_LINEAR, BORDER_REPLICATE, 8UC1) | 54.262 | 38.269 | 1.42 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_LINEAR, BORDER_REPLICATE, 8UC4) | 55.094 | 39.326 | 1.40 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_NEAREST, BORDER_CONSTANT, 8UC1) | 26.909 | 14.796 | 1.82 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_NEAREST, BORDER_CONSTANT, 8UC4) | 33.618 | 21.130 | 1.59 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_NEAREST, BORDER_REPLICATE, 8UC1) | 28.036 | 15.471 | 1.81 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_NEAREST, BORDER_REPLICATE, 8UC4) | 33.788 | 21.736 | 1.55 |
Performance for SSE4_2 baseline

| Performance test | Reference time | PR time | Speedup |
|---|---|---|---|
| WarpPerspective::TestWarpPerspective::(640x480, INTER_LINEAR, BORDER_CONSTANT) | 2.258 | 2.119 | 1.07 |
| WarpPerspective::TestWarpPerspective::(640x480, INTER_LINEAR, BORDER_REPLICATE) | 2.564 | 2.419 | 1.06 |
| WarpPerspective::TestWarpPerspective::(640x480, INTER_NEAREST, BORDER_CONSTANT) | 1.439 | 1.277 | 1.13 |
| WarpPerspective::TestWarpPerspective::(640x480, INTER_NEAREST, BORDER_REPLICATE) | 1.457 | 1.301 | 1.12 |
| WarpPerspective::TestWarpPerspective::(1280x720, INTER_LINEAR, BORDER_CONSTANT) | 6.745 | 6.425 | 1.05 |
| WarpPerspective::TestWarpPerspective::(1280x720, INTER_LINEAR, BORDER_REPLICATE) | 11.272 | 11.001 | 1.02 |
| WarpPerspective::TestWarpPerspective::(1280x720, INTER_NEAREST, BORDER_CONSTANT) | 4.677 | 4.186 | 1.12 |
| WarpPerspective::TestWarpPerspective::(1280x720, INTER_NEAREST, BORDER_REPLICATE) | 5.275 | 4.746 | 1.11 |
| WarpPerspective::TestWarpPerspective::(1920x1080, INTER_LINEAR, BORDER_CONSTANT) | 15.125 | 14.536 | 1.04 |
| WarpPerspective::TestWarpPerspective::(1920x1080, INTER_LINEAR, BORDER_REPLICATE) | 28.352 | 28.668 | 0.99 |
| WarpPerspective::TestWarpPerspective::(1920x1080, INTER_NEAREST, BORDER_CONSTANT) | 10.766 | 9.832 | 1.09 |
| WarpPerspective::TestWarpPerspective::(1920x1080, INTER_NEAREST, BORDER_REPLICATE) | 12.776 | 11.792 | 1.08 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_LINEAR, BORDER_CONSTANT, 8UC1) | 1.859 | 1.856 | 1.00 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_LINEAR, BORDER_CONSTANT, 8UC4) | 2.095 | 2.095 | 1.00 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_LINEAR, BORDER_REPLICATE, 8UC1) | 1.903 | 1.901 | 1.00 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_LINEAR, BORDER_REPLICATE, 8UC4) | 2.307 | 2.327 | 0.99 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_NEAREST, BORDER_CONSTANT, 8UC1) | 1.034 | 0.917 | 1.13 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_NEAREST, BORDER_CONSTANT, 8UC4) | 1.270 | 1.203 | 1.06 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_NEAREST, BORDER_REPLICATE, 8UC1) | 1.048 | 0.947 | 1.11 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_NEAREST, BORDER_REPLICATE, 8UC4) | 1.295 | 1.228 | 1.05 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 8UC1) | 12.439 | 12.476 | 1.00 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 8UC4) | 14.961 | 14.338 | 1.04 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 8UC1) | 13.521 | 12.978 | 1.04 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 8UC4) | 17.057 | 16.477 | 1.04 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_NEAREST, BORDER_CONSTANT, 8UC1) | 6.892 | 6.236 | 1.11 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_NEAREST, BORDER_CONSTANT, 8UC4) | 9.125 | 8.609 | 1.06 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_NEAREST, BORDER_REPLICATE, 8UC1) | 7.451 | 6.509 | 1.14 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_NEAREST, BORDER_REPLICATE, 8UC4) | 9.295 | 8.871 | 1.05 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_LINEAR, BORDER_CONSTANT, 8UC1) | 31.860 | 31.185 | 1.02 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_LINEAR, BORDER_CONSTANT, 8UC4) | 37.040 | 35.710 | 1.04 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_LINEAR, BORDER_REPLICATE, 8UC1) | 32.850 | 31.591 | 1.04 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_LINEAR, BORDER_REPLICATE, 8UC4) | 41.386 | 39.669 | 1.04 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_NEAREST, BORDER_CONSTANT, 8UC1) | 17.715 | 15.389 | 1.15 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_NEAREST, BORDER_CONSTANT, 8UC4) | 24.170 | 21.811 | 1.11 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_NEAREST, BORDER_REPLICATE, 8UC1) | 18.245 | 15.958 | 1.14 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_NEAREST, BORDER_REPLICATE, 8UC4) | 24.262 | 22.502 | 1.08 |
Performance for AVX2 baseline

| Performance test | Reference time | PR time | Speedup |
|---|---|---|---|
| WarpPerspective::TestWarpPerspective::(640x480, INTER_LINEAR, BORDER_CONSTANT) | 2.098 | 2.058 | 1.02 |
| WarpPerspective::TestWarpPerspective::(640x480, INTER_LINEAR, BORDER_REPLICATE) | 2.395 | 2.339 | 1.02 |
| WarpPerspective::TestWarpPerspective::(640x480, INTER_NEAREST, BORDER_CONSTANT) | 1.346 | 1.204 | 1.12 |
| WarpPerspective::TestWarpPerspective::(640x480, INTER_NEAREST, BORDER_REPLICATE) | 1.376 | 1.232 | 1.12 |
| WarpPerspective::TestWarpPerspective::(1280x720, INTER_LINEAR, BORDER_CONSTANT) | 6.390 | 6.236 | 1.02 |
| WarpPerspective::TestWarpPerspective::(1280x720, INTER_LINEAR, BORDER_REPLICATE) | 10.966 | 10.850 | 1.01 |
| WarpPerspective::TestWarpPerspective::(1280x720, INTER_NEAREST, BORDER_CONSTANT) | 4.584 | 4.061 | 1.13 |
| WarpPerspective::TestWarpPerspective::(1280x720, INTER_NEAREST, BORDER_REPLICATE) | 5.274 | 4.793 | 1.10 |
| WarpPerspective::TestWarpPerspective::(1920x1080, INTER_LINEAR, BORDER_CONSTANT) | 14.594 | 14.171 | 1.03 |
| WarpPerspective::TestWarpPerspective::(1920x1080, INTER_LINEAR, BORDER_REPLICATE) | 28.781 | 28.291 | 1.02 |
| WarpPerspective::TestWarpPerspective::(1920x1080, INTER_NEAREST, BORDER_CONSTANT) | 11.047 | 9.837 | 1.12 |
| WarpPerspective::TestWarpPerspective::(1920x1080, INTER_NEAREST, BORDER_REPLICATE) | 13.251 | 12.256 | 1.08 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_LINEAR, BORDER_CONSTANT, 8UC1) | 1.899 | 1.733 | 1.10 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_LINEAR, BORDER_CONSTANT, 8UC4) | 2.146 | 1.944 | 1.10 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_LINEAR, BORDER_REPLICATE, 8UC1) | 1.943 | 1.756 | 1.11 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_LINEAR, BORDER_REPLICATE, 8UC4) | 2.382 | 2.210 | 1.08 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_NEAREST, BORDER_CONSTANT, 8UC1) | 1.005 | 0.844 | 1.19 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_NEAREST, BORDER_CONSTANT, 8UC4) | 1.291 | 1.124 | 1.15 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_NEAREST, BORDER_REPLICATE, 8UC1) | 1.029 | 0.864 | 1.19 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(640x480, INTER_NEAREST, BORDER_REPLICATE, 8UC4) | 1.326 | 1.155 | 1.15 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 8UC1) | 12.448 | 11.464 | 1.09 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_LINEAR, BORDER_CONSTANT, 8UC4) | 14.304 | 13.616 | 1.05 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 8UC1) | 12.940 | 12.495 | 1.04 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_LINEAR, BORDER_REPLICATE, 8UC4) | 16.316 | 15.222 | 1.07 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_NEAREST, BORDER_CONSTANT, 8UC1) | 6.849 | 5.657 | 1.21 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_NEAREST, BORDER_CONSTANT, 8UC4) | 9.121 | 8.112 | 1.12 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_NEAREST, BORDER_REPLICATE, 8UC1) | 6.882 | 5.847 | 1.18 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(1920x1080, INTER_NEAREST, BORDER_REPLICATE, 8UC4) | 9.258 | 7.909 | 1.17 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_LINEAR, BORDER_CONSTANT, 8UC1) | 30.597 | 29.284 | 1.04 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_LINEAR, BORDER_CONSTANT, 8UC4) | 35.566 | 34.929 | 1.02 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_LINEAR, BORDER_REPLICATE, 8UC1) | 31.650 | 30.573 | 1.04 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_LINEAR, BORDER_REPLICATE, 8UC4) | 40.847 | 39.315 | 1.04 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_NEAREST, BORDER_CONSTANT, 8UC1) | 16.318 | 13.424 | 1.22 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_NEAREST, BORDER_CONSTANT, 8UC4) | 22.184 | 20.287 | 1.09 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_NEAREST, BORDER_REPLICATE, 8UC1) | 16.912 | 13.895 | 1.22 |
| WarpPerspectiveNear::TestWarpPerspectiveNear_t::(2592x1944, INTER_NEAREST, BORDER_REPLICATE, 8UC4) | 23.328 | 20.587 | 1.13 |
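The Speedup column is consistent with the ratio of the two timings, so values above 1 mean the PR is faster. For example, checking the first SSE2 row and the 0.99 row from the SSE4_2 run:

$$
\text{Speedup} = \frac{t_{\text{ref}}}{t_{\text{PR}}}, \qquad
\frac{3.031}{2.125} \approx 1.43, \qquad
\frac{28.352}{28.668} \approx 0.99
$$

so the 0.99 entries correspond to the PR being about 1% slower on those cases.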


```cpp
#if CV_TRY_SSE4_1
Ptr<opt_SSE4_1::WarpPerspectiveLine_SSE4> pwarp_impl_sse4;
if(CV_CPU_HAS_SUPPORT_SSE4_1)
```
Contributor

This part is needed to dynamically dispatch to the SSE4_1-specific implementation when SSE4_1 instructions are available.
It looks like the existing SSE4 performance is better than the universal intrinsics (UI) performance with the SSE2 baseline for linear interpolation (at least in some cases), so IMO it makes sense to retain dynamic dispatching of opt_SSE4_1::WarpPerspectiveLine_SSE4 here.
@alalek What's your opinion on this?

Member

Let's keep the SSE4_1 optimization here.

What about the performance difference with the AVX2 baseline?

Contributor

The AVX2 baseline is faster than the SSE4_2 baseline both before and after the UI implementation, and it also gains more from this PR. However, the AVX2 baseline before the UI implementation performs worse than the SSE4_2 baseline after it. So IMO it makes sense to implement UI-based dynamic dispatching for this module.
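As a worked instance of the last comparison, take one row from the tables (WarpPerspective, 640x480, INTER_NEAREST, BORDER_CONSTANT) and divide the AVX2-baseline reference time by the SSE4_2-baseline PR time:

$$
\frac{t_{\text{AVX2, ref}}}{t_{\text{SSE4\_2, PR}}} = \frac{1.346}{1.277} \approx 1.054
$$

so in that case the SSE4_2 build with this PR runs about 5% faster than the AVX2 build without it. Not every row follows this pattern; for the corresponding INTER_LINEAR row the AVX2 reference (2.098) is still slightly faster than the SSE4_2 PR time (2.119).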

Contributor Author

@ChipKerchner Aug 28, 2019

I changed it from class member functions and dispatching to static functions. Hopefully this is what you were suggesting.

Do you want me to re-add the `if(CV_CPU_HAS_SUPPORT_SSE4_1)` line(s)?

Contributor

Yes. Checking CV_CPU_HAS_SUPPORT_SSE4_1 lets the code detect SSE4 availability at runtime and dispatch execution accordingly. The preprocessor check for CV_TRY_SSE4_1 decides whether dynamic dispatching of this code is possible and necessary (AFAIK this feature is implemented for x86/x86-64 only, and the user can disable it completely).
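A minimal sketch of the same two-level pattern outside OpenCV, assuming GCC or Clang on x86: a preprocessor check decides whether an SSE4.1 variant can be built at all (the role CV_TRY_SSE4_1 plays here), and `__builtin_cpu_supports("sse4.1")` makes the runtime decision (the role CV_CPU_HAS_SUPPORT_SSE4_1 plays). The function names are hypothetical, and the SSE4.1 variant is plain C++ compiled with `__attribute__((target("sse4.1")))` rather than hand-written intrinsics.

```cpp
#include <cassert>

// Hypothetical example; these names are not OpenCV API.
// Baseline path: always compiled and always safe to call.
static void scaleLineBaseline(const float* src, float* dst, int n, float k)
{
    for (int i = 0; i < n; ++i)
        dst[i] = src[i] * k;
}

// Compile-time gate: only build the SSE4.1 variant where the compiler
// supports per-function targets and runtime CPU detection.
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#define SKETCH_TRY_SSE41 1
// The compiler may emit SSE4.1 instructions in this function even if the rest
// of the file targets an older baseline, so call it only after a runtime check.
__attribute__((target("sse4.1")))
static void scaleLineSSE41(const float* src, float* dst, int n, float k)
{
    for (int i = 0; i < n; ++i)
        dst[i] = src[i] * k;
}
#else
#define SKETCH_TRY_SSE41 0
#endif

// Runtime gate: pick the best variant available on the current CPU.
// Returns the name of the path taken so callers can inspect it.
static const char* scaleLine(const float* src, float* dst, int n, float k)
{
    assert(n >= 0);
#if SKETCH_TRY_SSE41
    if (__builtin_cpu_supports("sse4.1"))
    {
        scaleLineSSE41(src, dst, n, k);
        return "sse4.1";
    }
#endif
    scaleLineBaseline(src, dst, n, k);
    return "baseline";
}
```

Both variants must produce identical results; only the instructions the compiler is allowed to use differ, which is why the runtime check has to guard the call.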

Contributor

I think it would be better to retain the SSE4_1 implementation the way it was, to keep this PR cleaner.

Contributor Author

Re-added the runtime check and dynamic dispatch.

```cpp
if (pwarp_impl_sse4)
    pwarp_impl_sse4->processNN(M, xy, X0, Y0, W0, bw);
#if CV_SIMD128_64F
if (pwarp_impl_CV_SIMD)
```
Contributor

I think this check is redundant because the pointer can't be empty. Creating the class is also redundant, because it stores no additional state. IMO it would be better to just call the static functions processNN and process.
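To illustrate the suggestion, here is a minimal scalar sketch of a free-standing processNN-style function. Its parameter list mirrors the WarpPerspectiveLine_ProcessNN_SSE41 declaration in this PR; it assumes M is the row-major 3x3 (inverse) perspective matrix and that X0, Y0 and W0 are the transformed x, y and w values at the first pixel of the row. The body and the helpers saturateToShort and roundToInt are hypothetical and are not the PR's SIMD implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cmath>

// Hypothetical helper: clamp an int into the range of short.
static inline short saturateToShort(int v)
{
    return (short)std::min(std::max(v, (int)SHRT_MIN), (int)SHRT_MAX);
}

// Hypothetical helper: clamp to the int range first so the conversion is
// always defined, then round to nearest.
static inline int roundToInt(double v)
{
    v = std::max((double)INT_MIN, std::min((double)INT_MAX, v));
    return (int)std::lrint(v);
}

// Stateless, so a plain static function is enough; no class is needed.
// For each of the bw pixels in the row, writes the nearest source (X, Y)
// coordinates as interleaved shorts: xy[2*x] = X, xy[2*x + 1] = Y.
static void processNN(const double* M, short* xy,
                      double X0, double Y0, double W0, int bw)
{
    assert(M != nullptr && xy != nullptr && bw >= 0);
    for (int x = 0; x < bw; ++x)
    {
        // Projective denominator for this pixel; map w == 0 to 0.
        double W = W0 + M[6] * x;
        W = W != 0.0 ? 1.0 / W : 0.0;
        int X = roundToInt((X0 + M[0] * x) * W);
        int Y = roundToInt((Y0 + M[3] * x) * W);
        xy[x * 2]     = saturateToShort(X);
        xy[x * 2 + 1] = saturateToShort(Y);
    }
}
```

With an identity matrix, row y = 5 gives X0 = 0, Y0 = 5, W0 = 1, and the output is simply (0, 5), (1, 5), (2, 5), and so on.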

```cpp
}

#if CV_TRY_SSE4_1
void WarpPerspectiveLine_ProcessNN_SSE41(const double *M, short* xy, double X0, double Y0, double W0, int bw);
```
Contributor

I think it would be better to keep the SSE4_1 optimizations inside the opt_SSE4_1 namespace to avoid possible name conflicts.
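A hypothetical illustration of the name-conflict point: two functions with the same unqualified name can coexist in one binary when each lives in its own per-ISA namespace, and the caller picks one by qualified name. `opt_SSE4_1` mirrors the namespace named in the review; `cpu_baseline`, `clampRow` and `clampRowDispatch` are illustrative names for this sketch, and both bodies are plain C++.

```cpp
#include <cassert>

// Without these namespaces, the two clampRow definitions below would
// violate the one-definition rule.
namespace cpu_baseline {
inline void clampRow(const int* src, short* dst, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        dst[i] = (short)(src[i] < -32768 ? -32768 : src[i] > 32767 ? 32767 : src[i]);
}
} // namespace cpu_baseline

namespace opt_SSE4_1 {
// In a real build this copy would sit in a translation unit compiled with
// SSE4.1 enabled; here it is plain C++ so the sketch runs anywhere.
inline void clampRow(const int* src, short* dst, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        dst[i] = (short)(src[i] < -32768 ? -32768 : src[i] > 32767 ? 32767 : src[i]);
}
} // namespace opt_SSE4_1

// The dispatcher names each variant explicitly; in real code cpuHasSSE41
// would come from a runtime CPU check.
inline void clampRowDispatch(const int* src, short* dst, int n, bool cpuHasSSE41)
{
    if (cpuHasSSE41)
        opt_SSE4_1::clampRow(src, dst, n);
    else
        cpu_baseline::clampRow(src, dst, n);
}
```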

Contributor Author

Restored

@alalek merged commit 26228e6 into opencv:3.4 on Aug 31, 2019
@ChipKerchner deleted the imgwarpToHal branch on September 3, 2019 13:47
@alalek mentioned this pull request on Sep 5, 2019
3 participants