HResizeLinear reduce duplicate work and add partial VSX optimization ... and vectorize more copyMask variants#15257

Merged
alalek merged 5 commits into opencv:3.4 from pmur:resize
Dec 9, 2019

pmur commented Aug 7, 2019

Fix a typo in the unrolling of HResizeLinear when k > 1.

Next, add a VSX optimization for the 8u -> 32s operation. This provides a 1.4x speedup on the P9 baseline and 1.1x on the P8 baseline.
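
For readers unfamiliar with the routine, the following is a minimal scalar sketch of the per-row computation a horizontal linear resize performs. It is not the PR's code: `hresize_linear_row` is a hypothetical name, and the interleaved (left, right) weight layout is an assumption based on the `&alpha[dx*2]` loads that appear in the patch.

```cpp
#include <cassert>

// Scalar sketch of a horizontal linear pass over one row (hypothetical helper).
// xofs[dx] is the left source index for output dx; alpha is assumed to hold
// interleaved (left, right) weights, so output dx uses alpha[2*dx] and alpha[2*dx + 1].
inline void hresize_linear_row(const unsigned char* S, int* D, const int* xofs,
                               const short* alpha, int cn, int xmax)
{
    for (int dx = 0; dx < xmax; ++dx)
    {
        const int sx = xofs[dx];
        // Weighted sum of the two neighbouring source samples of the same channel.
        D[dx] = S[sx] * alpha[dx * 2] + S[sx + cn] * alpha[dx * 2 + 1];
    }
}
```

A vectorized version computes several `dx` values per iteration, which is why the weights have to be split into even (left) and odd (right) lanes first.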

force_builders=Linux AVX2,Custom
buildworker:Custom=linux-3
build_image:Custom=ubuntu:18.04
CPU_BASELINE:Custom=AVX512_SKX
disable_ipp=ON

@pmur pmur changed the title from "HResizeLinear reduce duplicate work" to "HResizeLinear reduce duplicate work and add partial VSX optimization" Aug 8, 2019
A1 ... A4
B1 ... B4
*/
void inline v_load_expand_deinterlace(const short int*_in, v_int32x4 &out_even, v_int32x4 &out_odd)
Contributor

You should use `const short* ptr` as the first parameter, for consistency.

@pmur pmur force-pushed the resize branch 3 times, most recently from f3189f9 to efa9dd7 August 20, 2019 14:12
const ST *S1 = src[k+1];
DT *D1 = dst[k+1];

for( dx = 0; dx < (xmax - nlanes); dx+=nlanes )
Contributor

It would be better to use dx <= (xmax - nlanes) here to process one more vector.
The same at L#1558
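
The off-by-one can be illustrated with a small standalone sketch. This is not the PR's loop: `count_vector_steps` is a hypothetical helper that only counts how many full vectors each loop condition would process.

```cpp
#include <cassert>

// Hypothetical helper: counts full-vector iterations for a given bound style.
// xmax is the number of elements eligible for the vector path,
// nlanes is the number of elements handled per vector.
inline int count_vector_steps(int xmax, int nlanes, bool inclusive)
{
    int steps = 0;
    for (int dx = 0; inclusive ? dx <= xmax - nlanes : dx < xmax - nlanes; dx += nlanes)
        ++steps;  // the vector body would touch elements [dx, dx + nlanes)
    return steps;
}
```

With `xmax = 16` and four lanes, the `<` form stops after three vectors and leaves elements 12..15 to the scalar tail, while `<=` handles them as a fourth vector. When `xmax` is not a multiple of `nlanes`, the two conditions process the same number of vectors.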

@pmur pmur changed the title from "HResizeLinear reduce duplicate work and add partial VSX optimization" to "HResizeLinear reduce duplicate work and add partial VSX optimization ... and vectorize more copyMask variants" Sep 13, 2019
#else
/* This is compiler-agnostic, but will introduce an unneeded splat on the critical path. */
#define _LXSIWZX(out, ptr, T) out = (T)vec_udword2_sp(*(uint32_t*)(ptr));
#endif
Contributor Author

@seiko2plus thank you for helping fix the LLVM HAL bugs I created. Would you be able to review these changes?

@terfendail
Contributor

@pmur Could you please move the changes related to copyMask to a separate PR?

@pmur pmur force-pushed the resize branch 2 times, most recently from 22fb1b5 to 5908f17 September 20, 2019 22:58

inline void v_cleanup() {}

#if CV_SIM128_PERMUTE
Contributor

Typo: CV_SIM128_PERMUTE is missing the "D" (compare CV_SIMD128_PERMUTE).

CV_CPU_OPTIMIZATION_HAL_NAMESPACE_END
/* Permute 32B into a 16B vector using a control vector. Any control byte value greater than the number of
bytes in the aggregate vector results in undefined behavior (e.g. PPC will ignore those bits) */
inline v_uint8x16 v_permute_2(const v_uint8x16 &ctrl, const v_uint8x16 &in_lo, const v_uint8x16 &in_hi)
Contributor

I would prefer a uniform v_permute name for the intrinsic. The number of permuted vectors can be determined from the parameter list.
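
As a rough illustration of the naming suggestion, the sketch below overloads a single `v_permute` name on the number of source vectors and implements both forms as plain scalar loops. It is an assumption-laden model, not OpenCV code: `vec16` is a hypothetical stand-in for a 16-byte vector type, and masking out-of-range control bits is a choice made here for the sketch (the quoted comment says such values are undefined in general).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Stand-in for a 16-byte vector register (hypothetical; not OpenCV's v_uint8x16).
using vec16 = std::array<std::uint8_t, 16>;

// One-source permute: out[i] = in[ctrl[i]].
// Assumption for this sketch: out-of-range control bits are masked off.
inline vec16 v_permute(const vec16& ctrl, const vec16& in)
{
    vec16 out{};
    for (int i = 0; i < 16; ++i)
        out[i] = in[ctrl[i] & 15];
    return out;
}

// Two-source permute: indices 0..15 select from in_lo, 16..31 from in_hi.
inline vec16 v_permute(const vec16& ctrl, const vec16& in_lo, const vec16& in_hi)
{
    vec16 out{};
    for (int i = 0; i < 16; ++i)
    {
        const int idx = ctrl[i] & 31;
        out[i] = idx < 16 ? in_lo[idx] : in_hi[idx - 16];
    }
    return out;
}
```

Overload resolution picks the variant from the argument count, so call sites read `v_permute(ctrl, a)` or `v_permute(ctrl, a, b)` without a numeric suffix.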

OPENCV_HAL_IMPL_VSX_TRANSPOSE4x4(v_float32x4, vec_float4)

CV_CPU_OPTIMIZATION_HAL_NAMESPACE_END
/* Permute 32B into a 16B vector using a control vector. Any control byte value greater than the number of
Contributor

There should be a plain scalar implementation of every intrinsic in the intrin_cpp.hpp file. The intrinsic's documentation should be added to that file as well.

Contributor Author

I think I may be overlooking it, but how would I build to include the simulated intrinsics?

Contributor

The simulated-intrinsics file is used to produce the doxygen documentation (the BUILD_DOCS cmake flag leads to CV_DOXYGEN being defined as 1 and to the non-simulated intrinsics being excluded). Another way to use the simulated intrinsics is the CPU_BASELINE cmake flag: setting it to an empty value prohibits the use of any platform-specific features, which disables the non-simulated intrinsics on every platform as well.

void inline v_load_expand_deinterlace(const short *ptr, v_int32x4 &out_even, v_int32x4 &out_odd)
{
static const v_uint8x16 perm(0,1,4,5,8,9,12,13,2,3,6,7,10,11,14,15);
v_int16x8 in = v_reinterpret_as_s16(v_permute_1(perm, v_reinterpret_as_u8(v_load(ptr))));
Contributor

I think it makes sense to guard only this part with CV_SIMD128_PERMUTE. The permute and expand can be replaced with

v_int32x4 v_src = v_reinterpret_as_s32(v_load(ptr));
out_even = (v_src << 16) >> 16;
out_odd = v_src >> 16;

BTW, the substitution uses the same number of vector operations, so it could probably give a similar performance gain. @pmur, could you compare the performance of both implementations for VSX?
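
The shift-based substitution can be checked on a single 32-bit lane with a scalar model. The helper names below are hypothetical, and the sketch assumes little-endian packing (the even element is the low half) and an arithmetic right shift on signed integers, which C++20 guarantees and mainstream compilers already provide.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one 32-bit lane holding two packed int16 values.
// Assumption: little-endian packing, so the even element is the low half.
inline std::int32_t pack_pair(std::int16_t even, std::int16_t odd)
{
    const std::uint32_t lo = static_cast<std::uint16_t>(even);
    const std::uint32_t hi = static_cast<std::uint16_t>(odd);
    return static_cast<std::int32_t>(lo | (hi << 16));
}

// (v << 16) >> 16: move the low half to the top, then shift back to sign-extend it.
inline std::int32_t even_of(std::int32_t v)
{
    // The left shift is done on an unsigned value to avoid signed-overflow issues.
    const std::uint32_t shifted = static_cast<std::uint32_t>(v) << 16;
    return static_cast<std::int32_t>(shifted) >> 16;
}

// v >> 16: the arithmetic shift leaves the sign-extended high half.
inline std::int32_t odd_of(std::int32_t v)
{
    return v >> 16;
}
```

Applied lane-wise to a vector of four such 32-bit values, the same two expressions yield the even and odd 16-bit elements already widened to 32 bits, which is what the permute-and-expand sequence produces.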

Contributor Author

Ah, neat trick. That should perform nearly identically.

template<typename ST, typename DT, typename AT, typename DVT>
struct HResizeLinearVec_X4
{
int operator()(const uchar** _src, uchar** _dst, int count, const int* xofs,
Contributor

Wouldn't it be simpler to define the operator as

int operator()(const ST** src, DT** dst, int count, const int* xofs, const AT* alpha, int, int, int cn, int, int xmax) const

DVT a_odd;

v_load_expand_deinterlace(&alpha[dx*2], a_even, a_odd);
DVT s0(S0[sx0], S0[sx1], S0[sx2], S0[sx3]);
Contributor

The v_lut intrinsic could be used here: DVT s0 = v_lut(S0, xofs + dx);

Contributor Author

Almost. There is also an expanding type conversion happening here between ST and DT in most instantiations of the template.
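
The difference can be seen in a scalar sketch that contrasts a same-type gather with one that also widens each element. Both helpers (`gather4`, `gather4_expand`) are hypothetical names used only for illustration and do not model any particular OpenCV intrinsic.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Plain gather: the element type stays ST (what a same-type lookup would return).
template <typename ST>
std::array<ST, 4> gather4(const ST* src, const int* ofs)
{
    return { src[ofs[0]], src[ofs[1]], src[ofs[2]], src[ofs[3]] };
}

// Gather with widening: each ST element is converted to the wider DT lane type,
// analogous to building a DT vector from S0[sx0..sx3] one element at a time.
template <typename ST, typename DT>
std::array<DT, 4> gather4_expand(const ST* src, const int* ofs)
{
    return { static_cast<DT>(src[ofs[0]]), static_cast<DT>(src[ofs[1]]),
             static_cast<DT>(src[ofs[2]]), static_cast<DT>(src[ofs[3]]) };
}
```

With 8-bit sources and 32-bit destination lanes, only the widened form leaves enough headroom for the subsequent multiplication by fixed-point weights.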

Contributor

Missed that. Sorry

terfendail commented Sep 27, 2019

Performance for SSE2 baseline
Performance test Reference time PR time Speedup
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (640x480, 320x240)) 0.014 0.012 1.08
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (960x540, 640x480)) 0.354 0.476 0.74
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (1280x720, 213x120)) 0.042 0.061 0.69
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (1280x720, 320x240)) 0.117 0.174 0.67
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (1280x720, 640x480)) 0.396 0.568 0.70
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (640x480, 320x240)) 0.029 0.027 1.05
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (960x540, 640x480)) 0.481 0.438 1.10
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (1280x720, 213x120)) 0.065 0.060 1.07
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (1280x720, 320x240)) 0.190 0.179 1.06
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (1280x720, 640x480)) 0.587 0.541 1.09
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (640x480, 320x240)) 0.051 0.052 0.98
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (960x540, 640x480)) 0.317 0.287 1.10
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (1280x720, 213x120)) 0.057 0.053 1.07
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (1280x720, 320x240)) 0.135 0.134 1.01
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (1280x720, 640x480)) 0.361 0.336 1.08
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (640x480, 320x240)) 0.480 0.476 1.01
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (960x540, 640x480)) 0.685 0.943 0.73
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (1280x720, 213x120)) 0.081 0.117 0.69
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (1280x720, 320x240)) 0.240 0.343 0.70
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (1280x720, 640x480)) 0.782 1.111 0.70
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (640x480, 320x240)) 0.434 0.442 0.98
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (960x540, 640x480)) 0.937 0.872 1.08
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (1280x720, 213x120)) 0.128 0.117 1.09
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (1280x720, 320x240)) 0.372 0.344 1.08
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (1280x720, 640x480)) 1.165 1.072 1.09
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (640x480, 320x240)) 0.337 0.337 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (960x540, 640x480)) 0.644 0.571 1.13
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (1280x720, 213x120)) 0.114 0.114 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (1280x720, 320x240)) 0.262 0.249 1.05
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (1280x720, 640x480)) 0.791 0.705 1.12
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (640x480, 320x240)) 0.118 0.118 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (960x540, 640x480)) 1.017 1.430 0.71
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (1280x720, 213x120)) 0.117 0.173 0.67
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (1280x720, 320x240)) 0.336 0.508 0.66
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (1280x720, 640x480)) 1.166 1.656 0.70
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (640x480, 320x240)) 0.121 0.123 0.98
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (960x540, 640x480)) 1.392 1.295 1.08
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (1280x720, 213x120)) 0.186 0.173 1.08
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (1280x720, 320x240)) 0.549 0.498 1.10
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (1280x720, 640x480)) 1.730 1.562 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (640x480, 320x240)) 0.443 0.445 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (960x540, 640x480)) 1.005 0.886 1.13
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (1280x720, 213x120)) 0.170 0.167 1.02
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (1280x720, 320x240)) 0.406 0.400 1.01
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (1280x720, 640x480)) 1.258 1.161 1.08
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (640x480, 320x240)) 0.087 0.087 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (960x540, 640x480)) 1.354 1.879 0.72
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (1280x720, 213x120)) 0.152 0.234 0.65
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (1280x720, 320x240)) 0.448 0.687 0.65
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (1280x720, 640x480)) 1.538 2.252 0.68
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (640x480, 320x240)) 0.131 0.132 0.99
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (960x540, 640x480)) 1.843 1.680 1.10
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (1280x720, 213x120)) 0.246 0.223 1.10
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (1280x720, 320x240)) 0.729 0.659 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (1280x720, 640x480)) 2.297 2.071 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (640x480, 320x240)) 0.231 0.232 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (960x540, 640x480)) 1.368 1.250 1.09
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (1280x720, 213x120)) 0.244 0.226 1.08
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (1280x720, 320x240)) 0.639 0.624 1.02
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (1280x720, 640x480)) 1.794 1.620 1.11
resizeUpLinearNonExact::MatInfo_SizePair::(8UC1, (640x480, 960x540)) 0.504 0.654 0.77
resizeUpLinearNonExact::MatInfo_SizePair::(8UC1, (640x480, 1280x720)) 0.680 0.896 0.76
resizeUpLinearNonExact::MatInfo_SizePair::(16UC1, (640x480, 960x540)) 0.679 0.623 1.09
resizeUpLinearNonExact::MatInfo_SizePair::(16UC1, (640x480, 1280x720)) 0.936 0.868 1.08
resizeUpLinearNonExact::MatInfo_SizePair::(32FC1, (640x480, 960x540)) 0.456 0.414 1.10
resizeUpLinearNonExact::MatInfo_SizePair::(32FC1, (640x480, 1280x720)) 0.650 0.589 1.10
resizeUpLinearNonExact::MatInfo_SizePair::(8UC2, (640x480, 960x540)) 0.946 1.274 0.74
resizeUpLinearNonExact::MatInfo_SizePair::(8UC2, (640x480, 1280x720)) 1.335 1.831 0.73
resizeUpLinearNonExact::MatInfo_SizePair::(16UC2, (640x480, 960x540)) 1.315 1.198 1.10
resizeUpLinearNonExact::MatInfo_SizePair::(16UC2, (640x480, 1280x720)) 1.850 1.694 1.09
resizeUpLinearNonExact::MatInfo_SizePair::(32FC2, (640x480, 960x540)) 0.934 0.836 1.12
resizeUpLinearNonExact::MatInfo_SizePair::(32FC2, (640x480, 1280x720)) 1.306 1.214 1.08
resizeUpLinearNonExact::MatInfo_SizePair::(8UC3, (640x480, 960x540)) 1.424 1.982 0.72
resizeUpLinearNonExact::MatInfo_SizePair::(8UC3, (640x480, 1280x720)) 2.101 2.772 0.76
resizeUpLinearNonExact::MatInfo_SizePair::(16UC3, (640x480, 960x540)) 1.955 1.704 1.15
resizeUpLinearNonExact::MatInfo_SizePair::(16UC3, (640x480, 1280x720)) 2.832 2.568 1.10
resizeUpLinearNonExact::MatInfo_SizePair::(32FC3, (640x480, 960x540)) 1.362 1.287 1.06
resizeUpLinearNonExact::MatInfo_SizePair::(32FC3, (640x480, 1280x720)) 2.043 1.926 1.06
resizeUpLinearNonExact::MatInfo_SizePair::(8UC4, (640x480, 960x540)) 1.982 2.647 0.75
resizeUpLinearNonExact::MatInfo_SizePair::(8UC4, (640x480, 1280x720)) 2.884 3.836 0.75
resizeUpLinearNonExact::MatInfo_SizePair::(16UC4, (640x480, 960x540)) 2.681 2.442 1.10
resizeUpLinearNonExact::MatInfo_SizePair::(16UC4, (640x480, 1280x720)) 3.788 3.517 1.08
resizeUpLinearNonExact::MatInfo_SizePair::(32FC4, (640x480, 960x540)) 1.943 1.756 1.11
resizeUpLinearNonExact::MatInfo_SizePair::(32FC4, (640x480, 1280x720)) 2.926 2.686 1.09

Performance for SSE3 baseline
Performance test Reference time PR time Speedup
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (640x480, 320x240)) 0.014 0.013 1.07
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (960x540, 640x480)) 0.349 0.457 0.76
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (1280x720, 213x120)) 0.042 0.059 0.71
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (1280x720, 320x240)) 0.117 0.166 0.71
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (1280x720, 640x480)) 0.395 0.544 0.73
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (640x480, 320x240)) 0.029 0.028 1.03
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (960x540, 640x480)) 0.480 0.432 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (1280x720, 213x120)) 0.064 0.058 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (1280x720, 320x240)) 0.190 0.171 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (1280x720, 640x480)) 0.589 0.533 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (640x480, 320x240)) 0.052 0.054 0.98
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (960x540, 640x480)) 0.318 0.286 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (1280x720, 213x120)) 0.055 0.053 1.02
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (1280x720, 320x240)) 0.135 0.135 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (1280x720, 640x480)) 0.362 0.340 1.06
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (640x480, 320x240)) 0.473 0.475 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (960x540, 640x480)) 0.686 0.941 0.73
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (1280x720, 213x120)) 0.081 0.120 0.67
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (1280x720, 320x240)) 0.244 0.351 0.70
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (1280x720, 640x480)) 0.778 1.137 0.68
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (640x480, 320x240)) 0.436 0.430 1.01
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (960x540, 640x480)) 0.944 0.852 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (1280x720, 213x120)) 0.129 0.115 1.12
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (1280x720, 320x240)) 0.372 0.334 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (1280x720, 640x480)) 1.171 1.044 1.12
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (640x480, 320x240)) 0.332 0.333 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (960x540, 640x480)) 0.629 0.570 1.10
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (1280x720, 213x120)) 0.115 0.115 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (1280x720, 320x240)) 0.256 0.252 1.01
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (1280x720, 640x480)) 0.783 0.704 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (640x480, 320x240)) 0.121 0.121 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (960x540, 640x480)) 1.039 1.423 0.73
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (1280x720, 213x120)) 0.121 0.178 0.68
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (1280x720, 320x240)) 0.336 0.519 0.65
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (1280x720, 640x480)) 1.165 1.662 0.70
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (640x480, 320x240)) 0.121 0.121 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (960x540, 640x480)) 1.394 1.264 1.10
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (1280x720, 213x120)) 0.189 0.169 1.12
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (1280x720, 320x240)) 0.550 0.490 1.12
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (1280x720, 640x480)) 1.731 1.558 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (640x480, 320x240)) 0.448 0.445 1.01
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (960x540, 640x480)) 0.974 0.890 1.09
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (1280x720, 213x120)) 0.170 0.168 1.01
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (1280x720, 320x240)) 0.402 0.399 1.01
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (1280x720, 640x480)) 1.241 1.178 1.05
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (640x480, 320x240)) 0.087 0.087 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (960x540, 640x480)) 1.351 1.853 0.73
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (1280x720, 213x120)) 0.154 0.228 0.67
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (1280x720, 320x240)) 0.445 0.668 0.67
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (1280x720, 640x480)) 1.536 2.203 0.70
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (640x480, 320x240)) 0.129 0.130 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (960x540, 640x480)) 1.850 1.610 1.15
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (1280x720, 213x120)) 0.247 0.213 1.16
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (1280x720, 320x240)) 0.730 0.640 1.14
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (1280x720, 640x480)) 2.308 1.984 1.16
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (640x480, 320x240)) 0.228 0.234 0.97
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (960x540, 640x480)) 1.357 1.250 1.09
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (1280x720, 213x120)) 0.241 0.224 1.08
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (1280x720, 320x240)) 0.635 0.609 1.04
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (1280x720, 640x480)) 1.782 1.614 1.10
resizeUpLinearNonExact::MatInfo_SizePair::(8UC1, (640x480, 960x540)) 0.484 0.653 0.74
resizeUpLinearNonExact::MatInfo_SizePair::(8UC1, (640x480, 1280x720)) 0.688 0.894 0.77
resizeUpLinearNonExact::MatInfo_SizePair::(16UC1, (640x480, 960x540)) 0.678 0.582 1.17
resizeUpLinearNonExact::MatInfo_SizePair::(16UC1, (640x480, 1280x720)) 0.931 0.858 1.08
resizeUpLinearNonExact::MatInfo_SizePair::(32FC1, (640x480, 960x540)) 0.459 0.397 1.16
resizeUpLinearNonExact::MatInfo_SizePair::(32FC1, (640x480, 1280x720)) 0.655 0.562 1.17
resizeUpLinearNonExact::MatInfo_SizePair::(8UC2, (640x480, 960x540)) 0.948 1.291 0.73
resizeUpLinearNonExact::MatInfo_SizePair::(8UC2, (640x480, 1280x720)) 1.340 1.762 0.76
resizeUpLinearNonExact::MatInfo_SizePair::(16UC2, (640x480, 960x540)) 1.319 1.199 1.10
resizeUpLinearNonExact::MatInfo_SizePair::(16UC2, (640x480, 1280x720)) 1.865 1.696 1.10
resizeUpLinearNonExact::MatInfo_SizePair::(32FC2, (640x480, 960x540)) 0.949 0.804 1.18
resizeUpLinearNonExact::MatInfo_SizePair::(32FC2, (640x480, 1280x720)) 1.350 1.193 1.13
resizeUpLinearNonExact::MatInfo_SizePair::(8UC3, (640x480, 960x540)) 1.425 1.899 0.75
resizeUpLinearNonExact::MatInfo_SizePair::(8UC3, (640x480, 1280x720)) 2.019 2.747 0.73
resizeUpLinearNonExact::MatInfo_SizePair::(16UC3, (640x480, 960x540)) 1.945 1.780 1.09
resizeUpLinearNonExact::MatInfo_SizePair::(16UC3, (640x480, 1280x720)) 2.873 2.492 1.15
resizeUpLinearNonExact::MatInfo_SizePair::(32FC3, (640x480, 960x540)) 1.443 1.276 1.13
resizeUpLinearNonExact::MatInfo_SizePair::(32FC3, (640x480, 1280x720)) 2.147 1.863 1.15
resizeUpLinearNonExact::MatInfo_SizePair::(8UC4, (640x480, 960x540)) 1.931 2.644 0.73
resizeUpLinearNonExact::MatInfo_SizePair::(8UC4, (640x480, 1280x720)) 2.930 3.668 0.80
resizeUpLinearNonExact::MatInfo_SizePair::(16UC4, (640x480, 960x540)) 2.724 2.385 1.14
resizeUpLinearNonExact::MatInfo_SizePair::(16UC4, (640x480, 1280x720)) 3.878 3.466 1.12
resizeUpLinearNonExact::MatInfo_SizePair::(32FC4, (640x480, 960x540)) 1.969 1.730 1.14
resizeUpLinearNonExact::MatInfo_SizePair::(32FC4, (640x480, 1280x720)) 2.955 2.591 1.14

Performance for SSE4_2 baseline
Performance test Reference time PR time Speedup
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (640x480, 320x240)) 0.014 0.014 1.01
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (960x540, 640x480)) 0.347 0.354 0.98
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (1280x720, 213x120)) 0.042 0.048 0.89
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (1280x720, 320x240)) 0.118 0.134 0.88
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (1280x720, 640x480)) 0.406 0.436 0.93
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (640x480, 320x240)) 0.028 0.028 0.99
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (960x540, 640x480)) 0.453 0.412 1.10
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (1280x720, 213x120)) 0.064 0.057 1.12
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (1280x720, 320x240)) 0.188 0.168 1.12
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (1280x720, 640x480)) 0.570 0.515 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (640x480, 320x240)) 0.053 0.051 1.04
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (960x540, 640x480)) 0.318 0.277 1.15
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (1280x720, 213x120)) 0.055 0.050 1.10
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (1280x720, 320x240)) 0.134 0.127 1.06
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (1280x720, 640x480)) 0.359 0.321 1.12
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (640x480, 320x240)) 0.477 0.484 0.99
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (960x540, 640x480)) 0.681 0.714 0.95
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (1280x720, 213x120)) 0.081 0.090 0.90
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (1280x720, 320x240)) 0.242 0.263 0.92
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (1280x720, 640x480)) 0.780 0.847 0.92
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (640x480, 320x240)) 0.434 0.440 0.99
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (960x540, 640x480)) 0.887 0.816 1.09
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (1280x720, 213x120)) 0.127 0.113 1.13
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (1280x720, 320x240)) 0.366 0.319 1.15
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (1280x720, 640x480)) 1.130 0.990 1.14
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (640x480, 320x240)) 0.335 0.338 0.99
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (960x540, 640x480)) 0.630 0.550 1.15
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (1280x720, 213x120)) 0.113 0.109 1.04
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (1280x720, 320x240)) 0.256 0.242 1.06
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (1280x720, 640x480)) 0.788 0.681 1.16
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (640x480, 320x240)) 0.110 0.110 0.99
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (960x540, 640x480)) 1.016 1.030 0.99
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (1280x720, 213x120)) 0.117 0.130 0.90
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (1280x720, 320x240)) 0.337 0.381 0.88
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (1280x720, 640x480)) 1.146 1.228 0.93
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (640x480, 320x240)) 0.131 0.131 0.99
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (960x540, 640x480)) 1.322 1.188 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (1280x720, 213x120)) 0.186 0.161 1.15
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (1280x720, 320x240)) 0.545 0.473 1.15
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (1280x720, 640x480)) 1.683 1.488 1.13
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (640x480, 320x240)) 0.450 0.456 0.99
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (960x540, 640x480)) 0.981 0.859 1.14
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (1280x720, 213x120)) 0.170 0.161 1.05
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (1280x720, 320x240)) 0.407 0.380 1.07
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (1280x720, 640x480)) 1.240 1.135 1.09
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (640x480, 320x240)) 0.087 0.086 1.01
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (960x540, 640x480)) 1.348 1.373 0.98
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (1280x720, 213x120)) 0.153 0.177 0.87
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (1280x720, 320x240)) 0.469 0.598 0.78
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (1280x720, 640x480)) 1.534 1.672 0.92
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (640x480, 320x240)) 0.130 0.126 1.03
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (960x540, 640x480)) 1.756 1.619 1.08
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (1280x720, 213x120)) 0.245 0.219 1.12
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (1280x720, 320x240)) 0.723 0.643 1.12
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (1280x720, 640x480)) 2.243 2.032 1.10
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (640x480, 320x240)) 0.229 0.231 0.99
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (960x540, 640x480)) 1.354 1.221 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (1280x720, 213x120)) 0.244 0.207 1.18
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (1280x720, 320x240)) 0.633 0.577 1.10
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (1280x720, 640x480)) 1.769 1.553 1.14
resizeUpLinearNonExact::MatInfo_SizePair::(8UC1, (640x480, 960x540)) 0.501 0.518 0.97
resizeUpLinearNonExact::MatInfo_SizePair::(8UC1, (640x480, 1280x720)) 0.705 0.708 1.00
resizeUpLinearNonExact::MatInfo_SizePair::(16UC1, (640x480, 960x540)) 0.621 0.562 1.10
resizeUpLinearNonExact::MatInfo_SizePair::(16UC1, (640x480, 1280x720)) 0.867 0.783 1.11
resizeUpLinearNonExact::MatInfo_SizePair::(32FC1, (640x480, 960x540)) 0.456 0.401 1.14
resizeUpLinearNonExact::MatInfo_SizePair::(32FC1, (640x480, 1280x720)) 0.645 0.564 1.14
resizeUpLinearNonExact::MatInfo_SizePair::(8UC2, (640x480, 960x540)) 0.987 0.987 1.00
resizeUpLinearNonExact::MatInfo_SizePair::(8UC2, (640x480, 1280x720)) 1.394 1.393 1.00
resizeUpLinearNonExact::MatInfo_SizePair::(16UC2, (640x480, 960x540)) 1.229 1.123 1.09
resizeUpLinearNonExact::MatInfo_SizePair::(16UC2, (640x480, 1280x720)) 1.692 1.574 1.07
resizeUpLinearNonExact::MatInfo_SizePair::(32FC2, (640x480, 960x540)) 0.934 0.812 1.15
resizeUpLinearNonExact::MatInfo_SizePair::(32FC2, (640x480, 1280x720)) 1.329 1.172 1.13
resizeUpLinearNonExact::MatInfo_SizePair::(8UC3, (640x480, 960x540)) 1.481 1.473 1.01
resizeUpLinearNonExact::MatInfo_SizePair::(8UC3, (640x480, 1280x720)) 2.093 2.086 1.00
resizeUpLinearNonExact::MatInfo_SizePair::(16UC3, (640x480, 960x540)) 1.830 1.693 1.08
resizeUpLinearNonExact::MatInfo_SizePair::(16UC3, (640x480, 1280x720)) 2.562 2.384 1.07
resizeUpLinearNonExact::MatInfo_SizePair::(32FC3, (640x480, 960x540)) 1.410 1.245 1.13
resizeUpLinearNonExact::MatInfo_SizePair::(32FC3, (640x480, 1280x720)) 2.104 1.861 1.13
resizeUpLinearNonExact::MatInfo_SizePair::(8UC4, (640x480, 960x540)) 1.975 1.969 1.00
resizeUpLinearNonExact::MatInfo_SizePair::(8UC4, (640x480, 1280x720)) 2.896 2.885 1.00
resizeUpLinearNonExact::MatInfo_SizePair::(16UC4, (640x480, 960x540)) 2.458 2.261 1.09
resizeUpLinearNonExact::MatInfo_SizePair::(16UC4, (640x480, 1280x720)) 3.497 3.238 1.08
resizeUpLinearNonExact::MatInfo_SizePair::(32FC4, (640x480, 960x540)) 1.940 1.707 1.14
resizeUpLinearNonExact::MatInfo_SizePair::(32FC4, (640x480, 1280x720)) 2.931 2.592 1.13

Performance for AVX2 baseline
Performance test Reference time PR time Speedup
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (640x480, 320x240)) 0.010 0.010 0.97
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (960x540, 640x480)) 0.329 0.315 1.04
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (1280x720, 213x120)) 0.043 0.049 0.88
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (1280x720, 320x240)) 0.117 0.130 0.90
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (1280x720, 640x480)) 0.375 0.408 0.92
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (640x480, 320x240)) 0.017 0.019 0.88
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (960x540, 640x480)) 0.454 0.371 1.22
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (1280x720, 213x120)) 0.055 0.054 1.02
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (1280x720, 320x240)) 0.162 0.163 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (1280x720, 640x480)) 0.521 0.480 1.09
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (640x480, 320x240)) 0.055 0.058 0.96
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (960x540, 640x480)) 0.342 0.271 1.26
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (1280x720, 213x120)) 0.055 0.054 1.03
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (1280x720, 320x240)) 0.134 0.131 1.02
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (1280x720, 640x480)) 0.371 0.315 1.18
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (640x480, 320x240)) 0.453 0.463 0.98
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (960x540, 640x480)) 0.674 0.655 1.03
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (1280x720, 213x120)) 0.082 0.084 0.98
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (1280x720, 320x240)) 0.255 0.242 1.05
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (1280x720, 640x480)) 0.797 0.776 1.03
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (640x480, 320x240)) 0.429 0.429 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (960x540, 640x480)) 0.922 0.784 1.18
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (1280x720, 213x120)) 0.110 0.105 1.05
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (1280x720, 320x240)) 0.314 0.297 1.06
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (1280x720, 640x480)) 1.068 0.935 1.14
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (640x480, 320x240)) 0.365 0.364 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (960x540, 640x480)) 0.692 0.543 1.28
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (1280x720, 213x120)) 0.115 0.111 1.04
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (1280x720, 320x240)) 0.260 0.244 1.07
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (1280x720, 640x480)) 0.801 0.670 1.20
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (640x480, 320x240)) 0.079 0.074 1.07
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (960x540, 640x480)) 0.983 0.938 1.05
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (1280x720, 213x120)) 0.122 0.127 0.96
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (1280x720, 320x240)) 0.337 0.374 0.90
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (1280x720, 640x480)) 1.138 1.169 0.97
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (640x480, 320x240)) 0.124 0.120 1.03
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (960x540, 640x480)) 1.347 1.102 1.22
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (1280x720, 213x120)) 0.158 0.152 1.04
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (1280x720, 320x240)) 0.467 0.467 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (1280x720, 640x480)) 1.559 1.420 1.10
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (640x480, 320x240)) 0.495 0.494 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (960x540, 640x480)) 1.073 0.855 1.26
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (1280x720, 213x120)) 0.173 0.165 1.05
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (1280x720, 320x240)) 0.411 0.399 1.03
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (1280x720, 640x480)) 1.262 1.115 1.13
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (640x480, 320x240)) 0.073 0.074 0.99
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (960x540, 640x480)) 1.316 1.214 1.08
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (1280x720, 213x120)) 0.155 0.167 0.93
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (1280x720, 320x240)) 0.441 0.484 0.91
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (1280x720, 640x480)) 1.520 1.505 1.01
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (640x480, 320x240)) 0.122 0.120 1.02
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (960x540, 640x480)) 1.746 1.566 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (1280x720, 213x120)) 0.205 0.212 0.97
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (1280x720, 320x240)) 0.612 0.613 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (1280x720, 640x480)) 2.020 1.966 1.03
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (640x480, 320x240)) 0.237 0.229 1.03
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (960x540, 640x480)) 1.459 1.233 1.18
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (1280x720, 213x120)) 0.242 0.223 1.08
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (1280x720, 320x240)) 0.630 0.603 1.05
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (1280x720, 640x480)) 1.803 1.591 1.13
resizeUpLinearNonExact::MatInfo_SizePair::(8UC1, (640x480, 960x540)) 0.488 0.477 1.02
resizeUpLinearNonExact::MatInfo_SizePair::(8UC1, (640x480, 1280x720)) 0.652 0.667 0.98
resizeUpLinearNonExact::MatInfo_SizePair::(16UC1, (640x480, 960x540)) 0.647 0.546 1.19
resizeUpLinearNonExact::MatInfo_SizePair::(16UC1, (640x480, 1280x720)) 0.917 0.762 1.20
resizeUpLinearNonExact::MatInfo_SizePair::(32FC1, (640x480, 960x540)) 0.508 0.393 1.30
resizeUpLinearNonExact::MatInfo_SizePair::(32FC1, (640x480, 1280x720)) 0.710 0.552 1.29
resizeUpLinearNonExact::MatInfo_SizePair::(8UC2, (640x480, 960x540)) 0.917 0.928 0.99
resizeUpLinearNonExact::MatInfo_SizePair::(8UC2, (640x480, 1280x720)) 1.277 1.176 1.09
resizeUpLinearNonExact::MatInfo_SizePair::(16UC2, (640x480, 960x540)) 1.322 1.095 1.21
resizeUpLinearNonExact::MatInfo_SizePair::(16UC2, (640x480, 1280x720)) 1.767 1.499 1.18
resizeUpLinearNonExact::MatInfo_SizePair::(32FC2, (640x480, 960x540)) 1.019 0.768 1.33
resizeUpLinearNonExact::MatInfo_SizePair::(32FC2, (640x480, 1280x720)) 1.457 1.136 1.28
resizeUpLinearNonExact::MatInfo_SizePair::(8UC3, (640x480, 960x540)) 1.431 1.321 1.08
resizeUpLinearNonExact::MatInfo_SizePair::(8UC3, (640x480, 1280x720)) 1.935 1.866 1.04
resizeUpLinearNonExact::MatInfo_SizePair::(16UC3, (640x480, 960x540)) 1.952 1.596 1.22
resizeUpLinearNonExact::MatInfo_SizePair::(16UC3, (640x480, 1280x720)) 2.826 2.303 1.23
resizeUpLinearNonExact::MatInfo_SizePair::(32FC3, (640x480, 960x540)) 1.555 1.195 1.30
resizeUpLinearNonExact::MatInfo_SizePair::(32FC3, (640x480, 1280x720)) 2.298 1.798 1.28
resizeUpLinearNonExact::MatInfo_SizePair::(8UC4, (640x480, 960x540)) 1.947 1.852 1.05
resizeUpLinearNonExact::MatInfo_SizePair::(8UC4, (640x480, 1280x720)) 2.647 2.608 1.01
resizeUpLinearNonExact::MatInfo_SizePair::(16UC4, (640x480, 960x540)) 2.644 2.230 1.19
resizeUpLinearNonExact::MatInfo_SizePair::(16UC4, (640x480, 1280x720)) 3.720 3.138 1.19
resizeUpLinearNonExact::MatInfo_SizePair::(32FC4, (640x480, 960x540)) 2.126 1.664 1.28
resizeUpLinearNonExact::MatInfo_SizePair::(32FC4, (640x480, 1280x720)) 3.160 2.519 1.25

It looks like the change causes a performance regression for U8 on the SSE2 and SSE3 baselines, and on the SSE42 baseline for downscaling. I'm going to investigate it further.
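
For reading the tables, the Speedup column appears to be the reference time divided by the PR time (for example 0.043 / 0.049 ≈ 0.88), so values below 1.00 are regressions. A minimal sketch of that arithmetic follows; `speedup` and `isRegression` are hypothetical helper names, not part of OpenCV's perf tooling.

```cpp
#include <cassert>

// Hypothetical helper (not an OpenCV API): speedup as the tables seem to
// define it, i.e. reference time divided by PR time.
inline double speedup(double referenceTime, double prTime)
{
    assert(prTime > 0.0);  // avoid dividing by zero
    return referenceTime / prTime;
}

// A row counts as a regression when the PR build is slower than the reference.
inline bool isRegression(double referenceTime, double prTime)
{
    return speedup(referenceTime, prTime) < 1.0;
}
```

The times in the tables are rounded to three decimals, so recomputing a ratio from the printed values can differ slightly from the printed speedup (0.010 / 0.010 is listed as 0.97).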

@pmur
Contributor Author

pmur commented Oct 21, 2019

@terfendail have you had a chance to investigate? Would it be acceptable to guard these optimizations with !CV_SSE? I think these are architectural differences showing themselves here. Maybe someone with ARM hardware can benchmark?

There appears to be a 2x unroll of the HResizeLinear against k,
however the k value is only incremented by 1 during the unroll. This
results in k - 1 duplicate passes when k > 1.

Likewise, the final pass may not respect the work done by the vector
loop. Start it with the offset returned by the vector op if
implemented. Note, no vector ops are implemented today.

The performance difference is most noticeable on a linear downscale. A set
of performance tests is added to characterize this. The performance
improvement is 10-50% depending on the scaling.
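
The duplicate work described in this commit message can be illustrated with a small model. This is a sketch under assumptions, not the OpenCV source: `rowVisits` is a hypothetical function that only counts how often each row is processed by a loop that handles rows k and k+1 per iteration, followed by a remainder loop.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of a 2x-unrolled row loop (not the OpenCV code).
// Each iteration handles rows k and k+1. rowStep == 1 models the typo,
// rowStep == 2 models the fix. Returns the number of visits per row.
std::vector<int> rowVisits(int count, int rowStep)
{
    assert(count >= 0);
    assert(rowStep == 1 || rowStep == 2);
    std::vector<int> visits(count, 0);

    int k = 0;
    for (; k <= count - 2; k += rowStep)
    {
        ++visits[k];      // row k
        ++visits[k + 1];  // row k + 1
    }
    for (; k < count; ++k)  // remainder pass for a leftover row
        ++visits[k];

    return visits;
}
```

With `count = 4`, a step of 1 visits the rows 1, 2, 2 and 2 times (three duplicate passes), while a step of 2 visits every row exactly once.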
@terfendail
Contributor

I've prepared a PR as a result of the performance investigation

@pmur
Contributor Author

pmur commented Dec 2, 2019

@terfendail thanks, I've cherry-picked your patch without modification, and cherry-picked the changes to avoid the regression caused by vsx v_load_expand_q.

IPP by default. For now, disable it on these targets until a more
exhaustive performance analysis can be done.
*/
#if CV_SIMD128 && !CV_SSE
Contributor

!CV_SSE guard isn't necessary now

pmur and others added 3 commits December 5, 2019 08:26
Performance is mostly gated by the gather operations
for x inputs.

Likewise, provide a 2x unroll against k; this halves the
number of alpha gathers for larger k.

While not a 4x improvement, it still performs substantially
better under P9 for a 1.4x improvement. P8 baseline is
1.05-1.10x due to reduced VSX instruction set.

For float types, this results in a more modest
1.2x improvement.
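
Why unrolling against k halves the alpha gathers can be seen in a scalar model of the horizontal pass. This is a sketch under assumptions rather than the vectorized code in this PR: `hresizeTwoRows` is a hypothetical name, and plain `int` stands in for the real source and destination types.

```cpp
#include <cassert>

// Hypothetical scalar model (not the VSX code): interpolate two source rows
// in one pass so that xofs[dx] and the alpha pair are fetched once and
// reused for both rows.
void hresizeTwoRows(const int* S0, const int* S1, int* D0, int* D1,
                    const int* xofs, const int* alpha, int cn, int xmax)
{
    assert(cn > 0);
    for (int dx = 0; dx < xmax; ++dx)
    {
        const int sx = xofs[dx];          // source offset, shared by both rows
        const int a0 = alpha[dx * 2];     // weight of the left sample
        const int a1 = alpha[dx * 2 + 1]; // weight of the right sample
        D0[dx] = S0[sx] * a0 + S0[sx + cn] * a1;
        D1[dx] = S1[sx] * a0 + S1[sx + cn] * a1;
    }
}
```

Processing one row per pass would fetch the same `xofs` and `alpha` values once per row; pairing rows fetches them once per two rows.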
With a little help, we can do this quickly without gprs on
all VSX enabled targets.
@pmur
Contributor Author

pmur commented Dec 5, 2019

Is it possible to increase the timeout for the videoio tests?

@alalek
Member

alalek commented Dec 6, 2019

Thank you for updates!

Tests from videoio hang sporadically on the "custom" builder (Ubuntu 18.04 + GStreamer), so a timeout will not help there. Just ignore the videoio test.
Need to check other failures first (like test_dnn).

else if(cn == 3)
{
const int step = 4;
const int len0 = xmax & -step;
Contributor

That's my fault. Here it should be const int len0 = xmax - step; and the loop condition should be dx <= len0.
Otherwise an overflow is possible because the actual loop step isn't a multiple of 4.

Per feedback, ensure we don't overrun. This was caught via the
failure observed in Test_TensorFlow.inception_accuracy.
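
The overrun can be reproduced with a small model of the loop bounds. This is a sketch under assumptions, not the OpenCV code: `highestWriteEnd` is a hypothetical function, and the advance of 3 elements per iteration used in the note below is an assumption based on the review comment that the loop step is not a multiple of 4.

```cpp
#include <cassert>

// Hypothetical model (not the OpenCV code): each iteration writes `step`
// elements starting at dx, but dx only advances by `advance` elements.
// Returns one past the highest index written, or 0 if nothing is written.
int highestWriteEnd(int xmax, int step, int advance, bool useSafeBound)
{
    assert(step > 0 && advance > 0 && advance <= step);
    int end = 0;
    if (useSafeBound)
    {
        const int len0 = xmax - step;     // last dx where a full write fits
        for (int dx = 0; dx <= len0; dx += advance)
            end = dx + step;
    }
    else
    {
        const int len0 = xmax & -step;    // round down; step must be a power of two
        for (int dx = 0; dx < len0; dx += advance)
            end = dx + step;
    }
    return end;
}
```

With `xmax = 16`, `step = 4` and an assumed advance of 3, the rounded-down bound writes up to index 18 (end 19), past the 16 valid elements, while the `xmax - step` bound stops at end 16.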
@terfendail
Contributor

Performance for SSE2 baseline
Performance test Reference time PR time Speedup
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (640x480, 320x240)) 0.014 0.014 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (960x540, 640x480)) 0.354 0.212 1.67
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (1280x720, 213x120)) 0.042 0.031 1.36
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (1280x720, 320x240)) 0.117 0.082 1.43
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (1280x720, 640x480)) 0.396 0.249 1.59
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (640x480, 320x240)) 0.029 0.027 1.06
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (960x540, 640x480)) 0.481 0.430 1.12
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (1280x720, 213x120)) 0.065 0.059 1.10
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (1280x720, 320x240)) 0.190 0.171 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (1280x720, 640x480)) 0.587 0.530 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (640x480, 320x240)) 0.051 0.049 1.03
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (960x540, 640x480)) 0.317 0.287 1.10
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (1280x720, 213x120)) 0.057 0.049 1.18
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (1280x720, 320x240)) 0.135 0.126 1.08
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (1280x720, 640x480)) 0.361 0.337 1.07
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (640x480, 320x240)) 0.480 0.483 0.99
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (960x540, 640x480)) 0.685 0.355 1.93
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (1280x720, 213x120)) 0.081 0.049 1.64
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (1280x720, 320x240)) 0.240 0.141 1.70
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (1280x720, 640x480)) 0.782 0.429 1.82
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (640x480, 320x240)) 0.434 0.434 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (960x540, 640x480)) 0.937 0.852 1.10
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (1280x720, 213x120)) 0.128 0.117 1.09
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (1280x720, 320x240)) 0.372 0.338 1.10
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (1280x720, 640x480)) 1.165 1.070 1.09
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (640x480, 320x240)) 0.337 0.339 0.99
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (960x540, 640x480)) 0.644 0.572 1.13
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (1280x720, 213x120)) 0.114 0.108 1.06
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (1280x720, 320x240)) 0.262 0.239 1.10
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (1280x720, 640x480)) 0.791 0.699 1.13
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (640x480, 320x240)) 0.118 0.118 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (960x540, 640x480)) 1.017 0.548 1.85
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (1280x720, 213x120)) 0.117 0.073 1.61
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (1280x720, 320x240)) 0.336 0.203 1.65
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (1280x720, 640x480)) 1.166 0.651 1.79
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (640x480, 320x240)) 0.121 0.122 0.99
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (960x540, 640x480)) 1.392 1.295 1.08
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (1280x720, 213x120)) 0.186 0.172 1.08
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (1280x720, 320x240)) 0.549 0.494 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (1280x720, 640x480)) 1.730 1.589 1.09
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (640x480, 320x240)) 0.443 0.455 0.97
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (960x540, 640x480)) 1.005 0.887 1.13
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (1280x720, 213x120)) 0.170 0.158 1.07
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (1280x720, 320x240)) 0.406 0.379 1.07
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (1280x720, 640x480)) 1.258 1.161 1.08
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (640x480, 320x240)) 0.087 0.086 1.01
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (960x540, 640x480)) 1.354 0.515 2.63
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (1280x720, 213x120)) 0.152 0.069 2.20
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (1280x720, 320x240)) 0.448 0.183 2.45
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (1280x720, 640x480)) 1.538 0.597 2.58
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (640x480, 320x240)) 0.131 0.122 1.08
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (960x540, 640x480)) 1.843 1.674 1.10
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (1280x720, 213x120)) 0.246 0.221 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (1280x720, 320x240)) 0.729 0.669 1.09
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (1280x720, 640x480)) 2.297 2.081 1.10
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (640x480, 320x240)) 0.231 0.224 1.03
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (960x540, 640x480)) 1.368 1.220 1.12
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (1280x720, 213x120)) 0.244 0.206 1.19
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (1280x720, 320x240)) 0.639 0.580 1.10
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (1280x720, 640x480)) 1.794 1.596 1.12
resizeUpLinearNonExact::MatInfo_SizePair::(8UC1, (640x480, 960x540)) 0.504 0.298 1.69
resizeUpLinearNonExact::MatInfo_SizePair::(8UC1, (640x480, 1280x720)) 0.680 0.425 1.60
resizeUpLinearNonExact::MatInfo_SizePair::(16UC1, (640x480, 960x540)) 0.679 0.608 1.12
resizeUpLinearNonExact::MatInfo_SizePair::(16UC1, (640x480, 1280x720)) 0.936 0.865 1.08
resizeUpLinearNonExact::MatInfo_SizePair::(32FC1, (640x480, 960x540)) 0.456 0.405 1.13
resizeUpLinearNonExact::MatInfo_SizePair::(32FC1, (640x480, 1280x720)) 0.650 0.571 1.14
resizeUpLinearNonExact::MatInfo_SizePair::(8UC2, (640x480, 960x540)) 0.946 0.482 1.96
resizeUpLinearNonExact::MatInfo_SizePair::(8UC2, (640x480, 1280x720)) 1.335 0.716 1.86
resizeUpLinearNonExact::MatInfo_SizePair::(16UC2, (640x480, 960x540)) 1.315 1.202 1.09
resizeUpLinearNonExact::MatInfo_SizePair::(16UC2, (640x480, 1280x720)) 1.850 1.688 1.10
resizeUpLinearNonExact::MatInfo_SizePair::(32FC2, (640x480, 960x540)) 0.934 0.823 1.13
resizeUpLinearNonExact::MatInfo_SizePair::(32FC2, (640x480, 1280x720)) 1.306 1.184 1.10
resizeUpLinearNonExact::MatInfo_SizePair::(8UC3, (640x480, 960x540)) 1.424 0.784 1.82
resizeUpLinearNonExact::MatInfo_SizePair::(8UC3, (640x480, 1280x720)) 2.101 1.156 1.82
resizeUpLinearNonExact::MatInfo_SizePair::(16UC3, (640x480, 960x540)) 1.955 1.759 1.11
resizeUpLinearNonExact::MatInfo_SizePair::(16UC3, (640x480, 1280x720)) 2.832 2.530 1.12
resizeUpLinearNonExact::MatInfo_SizePair::(32FC3, (640x480, 960x540)) 1.362 1.273 1.07
resizeUpLinearNonExact::MatInfo_SizePair::(32FC3, (640x480, 1280x720)) 2.043 1.875 1.09
resizeUpLinearNonExact::MatInfo_SizePair::(8UC4, (640x480, 960x540)) 1.982 0.811 2.44
resizeUpLinearNonExact::MatInfo_SizePair::(8UC4, (640x480, 1280x720)) 2.884 1.349 2.14
resizeUpLinearNonExact::MatInfo_SizePair::(16UC4, (640x480, 960x540)) 2.681 2.382 1.13
resizeUpLinearNonExact::MatInfo_SizePair::(16UC4, (640x480, 1280x720)) 3.788 3.478 1.09
resizeUpLinearNonExact::MatInfo_SizePair::(32FC4, (640x480, 960x540)) 1.943 1.729 1.12
resizeUpLinearNonExact::MatInfo_SizePair::(32FC4, (640x480, 1280x720)) 2.926 2.591 1.13
Performance for SSE3 baseline
Performance test Reference time PR time Speedup
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (640x480, 320x240)) 0.014 0.013 1.01
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (960x540, 640x480)) 0.349 0.206 1.69
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (1280x720, 213x120)) 0.042 0.030 1.39
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (1280x720, 320x240)) 0.117 0.079 1.48
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (1280x720, 640x480)) 0.395 0.243 1.63
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (640x480, 320x240)) 0.029 0.027 1.05
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (960x540, 640x480)) 0.480 0.417 1.15
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (1280x720, 213x120)) 0.064 0.057 1.14
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (1280x720, 320x240)) 0.190 0.171 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (1280x720, 640x480)) 0.589 0.526 1.12
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (640x480, 320x240)) 0.052 0.051 1.02
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (960x540, 640x480)) 0.318 0.286 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (1280x720, 213x120)) 0.055 0.052 1.04
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (1280x720, 320x240)) 0.135 0.133 1.01
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (1280x720, 640x480)) 0.362 0.339 1.07
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (640x480, 320x240)) 0.473 0.474 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (960x540, 640x480)) 0.686 0.347 1.98
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (1280x720, 213x120)) 0.081 0.048 1.67
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (1280x720, 320x240)) 0.244 0.140 1.75
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (1280x720, 640x480)) 0.778 0.417 1.86
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (640x480, 320x240)) 0.436 0.436 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (960x540, 640x480)) 0.944 0.854 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (1280x720, 213x120)) 0.129 0.115 1.12
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (1280x720, 320x240)) 0.372 0.335 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (1280x720, 640x480)) 1.171 1.050 1.12
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (640x480, 320x240)) 0.332 0.335 0.99
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (960x540, 640x480)) 0.629 0.570 1.10
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (1280x720, 213x120)) 0.115 0.110 1.04
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (1280x720, 320x240)) 0.256 0.245 1.04
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (1280x720, 640x480)) 0.783 0.698 1.12
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (640x480, 320x240)) 0.121 0.118 1.02
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (960x540, 640x480)) 1.039 0.551 1.89
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (1280x720, 213x120)) 0.121 0.073 1.65
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (1280x720, 320x240)) 0.336 0.204 1.65
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (1280x720, 640x480)) 1.165 0.654 1.78
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (640x480, 320x240)) 0.121 0.122 0.99
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (960x540, 640x480)) 1.394 1.268 1.10
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (1280x720, 213x120)) 0.189 0.169 1.12
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (1280x720, 320x240)) 0.550 0.494 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (1280x720, 640x480)) 1.731 1.558 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (640x480, 320x240)) 0.448 0.447 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (960x540, 640x480)) 0.974 0.885 1.10
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (1280x720, 213x120)) 0.170 0.163 1.04
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (1280x720, 320x240)) 0.402 0.396 1.01
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (1280x720, 640x480)) 1.241 1.148 1.08
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (640x480, 320x240)) 0.087 0.087 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (960x540, 640x480)) 1.351 0.514 2.63
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (1280x720, 213x120)) 0.154 0.071 2.18
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (1280x720, 320x240)) 0.445 0.179 2.49
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (1280x720, 640x480)) 1.536 0.569 2.70
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (640x480, 320x240)) 0.129 0.127 1.02
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (960x540, 640x480)) 1.850 1.680 1.10
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (1280x720, 213x120)) 0.247 0.223 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (1280x720, 320x240)) 0.730 0.659 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (1280x720, 640x480)) 2.308 2.070 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (640x480, 320x240)) 0.228 0.226 1.01
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (960x540, 640x480)) 1.357 1.234 1.10
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (1280x720, 213x120)) 0.241 0.223 1.08
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (1280x720, 320x240)) 0.635 0.605 1.05
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (1280x720, 640x480)) 1.782 1.623 1.10
resizeUpLinearNonExact::MatInfo_SizePair::(8UC1, (640x480, 960x540)) 0.484 0.287 1.68
resizeUpLinearNonExact::MatInfo_SizePair::(8UC1, (640x480, 1280x720)) 0.688 0.406 1.70
resizeUpLinearNonExact::MatInfo_SizePair::(16UC1, (640x480, 960x540)) 0.678 0.584 1.16
resizeUpLinearNonExact::MatInfo_SizePair::(16UC1, (640x480, 1280x720)) 0.931 0.831 1.12
resizeUpLinearNonExact::MatInfo_SizePair::(32FC1, (640x480, 960x540)) 0.459 0.414 1.11
resizeUpLinearNonExact::MatInfo_SizePair::(32FC1, (640x480, 1280x720)) 0.655 0.585 1.12
resizeUpLinearNonExact::MatInfo_SizePair::(8UC2, (640x480, 960x540)) 0.948 0.483 1.96
resizeUpLinearNonExact::MatInfo_SizePair::(8UC2, (640x480, 1280x720)) 1.340 0.728 1.84
resizeUpLinearNonExact::MatInfo_SizePair::(16UC2, (640x480, 960x540)) 1.319 1.195 1.10
resizeUpLinearNonExact::MatInfo_SizePair::(16UC2, (640x480, 1280x720)) 1.865 1.642 1.14
resizeUpLinearNonExact::MatInfo_SizePair::(32FC2, (640x480, 960x540)) 0.949 0.837 1.13
resizeUpLinearNonExact::MatInfo_SizePair::(32FC2, (640x480, 1280x720)) 1.350 1.209 1.12
resizeUpLinearNonExact::MatInfo_SizePair::(8UC3, (640x480, 960x540)) 1.425 0.768 1.86
resizeUpLinearNonExact::MatInfo_SizePair::(8UC3, (640x480, 1280x720)) 2.019 1.203 1.68
resizeUpLinearNonExact::MatInfo_SizePair::(16UC3, (640x480, 960x540)) 1.945 1.699 1.15
resizeUpLinearNonExact::MatInfo_SizePair::(16UC3, (640x480, 1280x720)) 2.873 2.529 1.14
resizeUpLinearNonExact::MatInfo_SizePair::(32FC3, (640x480, 960x540)) 1.443 1.272 1.13
resizeUpLinearNonExact::MatInfo_SizePair::(32FC3, (640x480, 1280x720)) 2.147 1.906 1.13
resizeUpLinearNonExact::MatInfo_SizePair::(8UC4, (640x480, 960x540)) 1.931 0.802 2.41
resizeUpLinearNonExact::MatInfo_SizePair::(8UC4, (640x480, 1280x720)) 2.930 1.320 2.22
resizeUpLinearNonExact::MatInfo_SizePair::(16UC4, (640x480, 960x540)) 2.724 2.398 1.14
resizeUpLinearNonExact::MatInfo_SizePair::(16UC4, (640x480, 1280x720)) 3.878 3.493 1.11
resizeUpLinearNonExact::MatInfo_SizePair::(32FC4, (640x480, 960x540)) 1.969 1.787 1.10
resizeUpLinearNonExact::MatInfo_SizePair::(32FC4, (640x480, 1280x720)) 2.955 2.666 1.11
Performance for SSE4_2 baseline
Performance test Reference time PR time Speedup
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (640x480, 320x240)) 0.014 0.014 1.01
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (960x540, 640x480)) 0.347 0.202 1.72
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (1280x720, 213x120)) 0.042 0.030 1.44
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (1280x720, 320x240)) 0.118 0.076 1.55
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (1280x720, 640x480)) 0.406 0.237 1.72
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (640x480, 320x240)) 0.028 0.027 1.03
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (960x540, 640x480)) 0.453 0.403 1.12
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (1280x720, 213x120)) 0.064 0.056 1.13
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (1280x720, 320x240)) 0.188 0.165 1.14
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (1280x720, 640x480)) 0.570 0.506 1.13
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (640x480, 320x240)) 0.053 0.052 1.02
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (960x540, 640x480)) 0.318 0.279 1.14
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (1280x720, 213x120)) 0.055 0.053 1.04
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (1280x720, 320x240)) 0.134 0.134 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (1280x720, 640x480)) 0.359 0.326 1.10
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (640x480, 320x240)) 0.477 0.482 0.99
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (960x540, 640x480)) 0.681 0.285 2.39
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (1280x720, 213x120)) 0.081 0.045 1.80
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (1280x720, 320x240)) 0.242 0.129 1.88
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (1280x720, 640x480)) 0.780 0.365 2.14
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (640x480, 320x240)) 0.434 0.430 1.01
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (960x540, 640x480)) 0.887 0.797 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (1280x720, 213x120)) 0.127 0.110 1.16
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (1280x720, 320x240)) 0.366 0.321 1.14
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (1280x720, 640x480)) 1.130 1.014 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (640x480, 320x240)) 0.335 0.340 0.98
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (960x540, 640x480)) 0.630 0.556 1.13
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (1280x720, 213x120)) 0.113 0.112 1.01
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (1280x720, 320x240)) 0.256 0.245 1.04
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (1280x720, 640x480)) 0.788 0.670 1.18
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (640x480, 320x240)) 0.110 0.112 0.98
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (960x540, 640x480)) 1.016 0.504 2.02
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (1280x720, 213x120)) 0.117 0.063 1.86
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (1280x720, 320x240)) 0.337 0.171 1.97
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (1280x720, 640x480)) 1.146 0.573 2.00
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (640x480, 320x240)) 0.131 0.132 0.99
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (960x540, 640x480)) 1.322 1.199 1.10
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (1280x720, 213x120)) 0.186 0.163 1.14
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (1280x720, 320x240)) 0.545 0.480 1.14
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (1280x720, 640x480)) 1.683 1.516 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (640x480, 320x240)) 0.450 0.454 0.99
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (960x540, 640x480)) 0.981 0.861 1.14
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (1280x720, 213x120)) 0.170 0.164 1.04
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (1280x720, 320x240)) 0.407 0.392 1.04
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (1280x720, 640x480)) 1.240 1.133 1.09
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (640x480, 320x240)) 0.087 0.088 0.98
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (960x540, 640x480)) 1.348 0.521 2.58
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (1280x720, 213x120)) 0.153 0.071 2.16
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (1280x720, 320x240)) 0.469 0.185 2.53
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (1280x720, 640x480)) 1.534 0.614 2.50
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (640x480, 320x240)) 0.130 0.130 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (960x540, 640x480)) 1.756 1.610 1.09
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (1280x720, 213x120)) 0.245 0.219 1.12
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (1280x720, 320x240)) 0.723 0.647 1.12
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (1280x720, 640x480)) 2.243 2.018 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (640x480, 320x240)) 0.229 0.227 1.01
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (960x540, 640x480)) 1.354 1.217 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (1280x720, 213x120)) 0.244 0.221 1.11
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (1280x720, 320x240)) 0.633 0.614 1.03
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (1280x720, 640x480)) 1.769 1.601 1.11
resizeUpLinearNonExact::MatInfo_SizePair::(8UC1, (640x480, 960x540)) 0.501 0.280 1.79
resizeUpLinearNonExact::MatInfo_SizePair::(8UC1, (640x480, 1280x720)) 0.705 0.394 1.79
resizeUpLinearNonExact::MatInfo_SizePair::(16UC1, (640x480, 960x540)) 0.621 0.563 1.10
resizeUpLinearNonExact::MatInfo_SizePair::(16UC1, (640x480, 1280x720)) 0.867 0.782 1.11
resizeUpLinearNonExact::MatInfo_SizePair::(32FC1, (640x480, 960x540)) 0.456 0.398 1.15
resizeUpLinearNonExact::MatInfo_SizePair::(32FC1, (640x480, 1280x720)) 0.645 0.566 1.14
resizeUpLinearNonExact::MatInfo_SizePair::(8UC2, (640x480, 960x540)) 0.987 0.376 2.63
resizeUpLinearNonExact::MatInfo_SizePair::(8UC2, (640x480, 1280x720)) 1.394 0.572 2.44
resizeUpLinearNonExact::MatInfo_SizePair::(16UC2, (640x480, 960x540)) 1.229 1.111 1.11
resizeUpLinearNonExact::MatInfo_SizePair::(16UC2, (640x480, 1280x720)) 1.692 1.558 1.09
resizeUpLinearNonExact::MatInfo_SizePair::(32FC2, (640x480, 960x540)) 0.934 0.805 1.16
resizeUpLinearNonExact::MatInfo_SizePair::(32FC2, (640x480, 1280x720)) 1.329 1.161 1.15
resizeUpLinearNonExact::MatInfo_SizePair::(8UC3, (640x480, 960x540)) 1.481 0.700 2.12
resizeUpLinearNonExact::MatInfo_SizePair::(8UC3, (640x480, 1280x720)) 2.093 1.074 1.95
resizeUpLinearNonExact::MatInfo_SizePair::(16UC3, (640x480, 960x540)) 1.830 1.676 1.09
resizeUpLinearNonExact::MatInfo_SizePair::(16UC3, (640x480, 1280x720)) 2.562 2.334 1.10
resizeUpLinearNonExact::MatInfo_SizePair::(32FC3, (640x480, 960x540)) 1.410 1.235 1.14
resizeUpLinearNonExact::MatInfo_SizePair::(32FC3, (640x480, 1280x720)) 2.104 1.874 1.12
resizeUpLinearNonExact::MatInfo_SizePair::(8UC4, (640x480, 960x540)) 1.975 0.776 2.55
resizeUpLinearNonExact::MatInfo_SizePair::(8UC4, (640x480, 1280x720)) 2.896 1.298 2.23
resizeUpLinearNonExact::MatInfo_SizePair::(16UC4, (640x480, 960x540)) 2.458 2.220 1.11
resizeUpLinearNonExact::MatInfo_SizePair::(16UC4, (640x480, 1280x720)) 3.497 3.180 1.10
resizeUpLinearNonExact::MatInfo_SizePair::(32FC4, (640x480, 960x540)) 1.940 1.734 1.12
resizeUpLinearNonExact::MatInfo_SizePair::(32FC4, (640x480, 1280x720)) 2.931 2.580 1.14
Performance for AVX2 baseline
Performance test Reference time PR time Speedup
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (640x480, 320x240)) 0.010 0.010 0.98
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (960x540, 640x480)) 0.329 0.184 1.79
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (1280x720, 213x120)) 0.043 0.029 1.46
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (1280x720, 320x240)) 0.117 0.075 1.56
resizeDownLinearNonExact::MatInfo_SizePair::(8UC1, (1280x720, 640x480)) 0.375 0.220 1.70
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (640x480, 320x240)) 0.017 0.019 0.90
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (960x540, 640x480)) 0.454 0.387 1.17
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (1280x720, 213x120)) 0.055 0.055 1.01
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (1280x720, 320x240)) 0.162 0.161 1.01
resizeDownLinearNonExact::MatInfo_SizePair::(16UC1, (1280x720, 640x480)) 0.521 0.482 1.08
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (640x480, 320x240)) 0.055 0.056 0.99
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (960x540, 640x480)) 0.342 0.264 1.29
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (1280x720, 213x120)) 0.055 0.053 1.05
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (1280x720, 320x240)) 0.134 0.132 1.01
resizeDownLinearNonExact::MatInfo_SizePair::(32FC1, (1280x720, 640x480)) 0.371 0.299 1.24
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (640x480, 320x240)) 0.453 0.456 0.99
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (960x540, 640x480)) 0.674 0.268 2.51
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (1280x720, 213x120)) 0.082 0.043 1.89
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (1280x720, 320x240)) 0.255 0.121 2.11
resizeDownLinearNonExact::MatInfo_SizePair::(8UC2, (1280x720, 640x480)) 0.797 0.348 2.29
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (640x480, 320x240)) 0.429 0.429 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (960x540, 640x480)) 0.922 0.791 1.16
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (1280x720, 213x120)) 0.110 0.109 1.01
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (1280x720, 320x240)) 0.314 0.312 1.01
resizeDownLinearNonExact::MatInfo_SizePair::(16UC2, (1280x720, 640x480)) 1.068 0.979 1.09
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (640x480, 320x240)) 0.365 0.366 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (960x540, 640x480)) 0.692 0.544 1.27
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (1280x720, 213x120)) 0.115 0.113 1.02
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (1280x720, 320x240)) 0.260 0.245 1.06
resizeDownLinearNonExact::MatInfo_SizePair::(32FC2, (1280x720, 640x480)) 0.801 0.653 1.23
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (640x480, 320x240)) 0.079 0.078 1.02
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (960x540, 640x480)) 0.983 0.441 2.23
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (1280x720, 213x120)) 0.122 0.061 2.00
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (1280x720, 320x240)) 0.337 0.162 2.08
resizeDownLinearNonExact::MatInfo_SizePair::(8UC3, (1280x720, 640x480)) 1.138 0.488 2.33
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (640x480, 320x240)) 0.124 0.126 0.98
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (960x540, 640x480)) 1.347 1.176 1.14
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (1280x720, 213x120)) 0.158 0.164 0.97
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (1280x720, 320x240)) 0.467 0.467 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(16UC3, (1280x720, 640x480)) 1.559 1.458 1.07
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (640x480, 320x240)) 0.495 0.493 1.00
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (960x540, 640x480)) 1.073 0.844 1.27
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (1280x720, 213x120)) 0.173 0.167 1.04
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (1280x720, 320x240)) 0.411 0.390 1.05
resizeDownLinearNonExact::MatInfo_SizePair::(32FC3, (1280x720, 640x480)) 1.262 1.104 1.14
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (640x480, 320x240)) 0.073 0.074 0.98
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (960x540, 640x480)) 1.316 0.493 2.67
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (1280x720, 213x120)) 0.155 0.070 2.21
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (1280x720, 320x240)) 0.441 0.168 2.62
resizeDownLinearNonExact::MatInfo_SizePair::(8UC4, (1280x720, 640x480)) 1.520 0.567 2.68
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (640x480, 320x240)) 0.122 0.123 0.99
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (960x540, 640x480)) 1.746 1.562 1.12
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (1280x720, 213x120)) 0.205 0.211 0.98
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (1280x720, 320x240)) 0.612 0.623 0.98
resizeDownLinearNonExact::MatInfo_SizePair::(16UC4, (1280x720, 640x480)) 2.020 2.004 1.01
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (640x480, 320x240)) 0.237 0.231 1.03
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (960x540, 640x480)) 1.459 1.217 1.20
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (1280x720, 213x120)) 0.242 0.224 1.08
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (1280x720, 320x240)) 0.630 0.600 1.05
resizeDownLinearNonExact::MatInfo_SizePair::(32FC4, (1280x720, 640x480)) 1.803 1.538 1.17
resizeUpLinearNonExact::MatInfo_SizePair::(8UC1, (640x480, 960x540)) 0.488 0.269 1.81
resizeUpLinearNonExact::MatInfo_SizePair::(8UC1, (640x480, 1280x720)) 0.652 0.346 1.89
resizeUpLinearNonExact::MatInfo_SizePair::(16UC1, (640x480, 960x540)) 0.647 0.533 1.21
resizeUpLinearNonExact::MatInfo_SizePair::(16UC1, (640x480, 1280x720)) 0.917 0.743 1.24
resizeUpLinearNonExact::MatInfo_SizePair::(32FC1, (640x480, 960x540)) 0.508 0.403 1.26
resizeUpLinearNonExact::MatInfo_SizePair::(32FC1, (640x480, 1280x720)) 0.710 0.567 1.25
resizeUpLinearNonExact::MatInfo_SizePair::(8UC2, (640x480, 960x540)) 0.917 0.324 2.83
resizeUpLinearNonExact::MatInfo_SizePair::(8UC2, (640x480, 1280x720)) 1.277 0.500 2.56
resizeUpLinearNonExact::MatInfo_SizePair::(16UC2, (640x480, 960x540)) 1.322 1.071 1.23
resizeUpLinearNonExact::MatInfo_SizePair::(16UC2, (640x480, 1280x720)) 1.767 1.459 1.21
resizeUpLinearNonExact::MatInfo_SizePair::(32FC2, (640x480, 960x540)) 1.019 0.804 1.27
resizeUpLinearNonExact::MatInfo_SizePair::(32FC2, (640x480, 1280x720)) 1.457 1.173 1.24
resizeUpLinearNonExact::MatInfo_SizePair::(8UC3, (640x480, 960x540)) 1.431 0.617 2.32
resizeUpLinearNonExact::MatInfo_SizePair::(8UC3, (640x480, 1280x720)) 1.935 0.925 2.09
resizeUpLinearNonExact::MatInfo_SizePair::(16UC3, (640x480, 960x540)) 1.952 1.594 1.22
resizeUpLinearNonExact::MatInfo_SizePair::(16UC3, (640x480, 1280x720)) 2.826 2.270 1.25
resizeUpLinearNonExact::MatInfo_SizePair::(32FC3, (640x480, 960x540)) 1.555 1.262 1.23
resizeUpLinearNonExact::MatInfo_SizePair::(32FC3, (640x480, 1280x720)) 2.298 1.874 1.23
resizeUpLinearNonExact::MatInfo_SizePair::(8UC4, (640x480, 960x540)) 1.947 0.670 2.91
resizeUpLinearNonExact::MatInfo_SizePair::(8UC4, (640x480, 1280x720)) 2.647 1.097 2.41
resizeUpLinearNonExact::MatInfo_SizePair::(16UC4, (640x480, 960x540)) 2.644 2.202 1.20
resizeUpLinearNonExact::MatInfo_SizePair::(16UC4, (640x480, 1280x720)) 3.720 3.059 1.22
resizeUpLinearNonExact::MatInfo_SizePair::(32FC4, (640x480, 960x540)) 2.126 1.757 1.21
resizeUpLinearNonExact::MatInfo_SizePair::(32FC4, (640x480, 1280x720)) 3.160 2.608 1.21
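
For reading the table above: the Speedup column appears to be the reference time divided by the PR time, so values above 1 mean the PR is faster. The sketch below recomputes two rows under that assumption. The helper names `speedup` and `round2` are illustrative and not part of the perf tooling. Rows with very small times (for example 0.010 vs 0.010 reported as 0.98) will not reproduce exactly, presumably because the reported speedup is computed from unrounded timings.

```cpp
#include <cmath>

// Speedup as it appears to be reported in the table:
// reference time divided by PR time (assumption based on the rows shown).
static double speedup(double referenceTime, double prTime)
{
    return referenceTime / prTime;
}

// Round to two decimals to match the precision of the Speedup column.
static double round2(double v)
{
    return std::round(v * 100.0) / 100.0;
}

// Example rows taken from the AVX2 table:
//   resizeUp   8UC4, 640x480 -> 960x540 : round2(speedup(1.947, 0.670))
//   resizeDown 8UC1, 960x540 -> 640x480 : round2(speedup(0.329, 0.184))
```

Both example rows agree with the printed Speedup values (2.91 and 1.79) when rounded to two decimals.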

alalek merged commit a011035 into opencv:3.4 on Dec 9, 2019
for( dx = 0; dx < len0; dx += 3*step/4 )
{
    v_int16x8 a = v_load(alpha+dx*2);
    v_store(&D[dx], v_dotprod(v_reinterpret_as_s16(v_load_expand_q(S+xofs[dx]) | (v_load_expand_q(S+xofs[dx]+cn)<<16)), a));

(v_load_expand_q(S+xofs[dx]+cn)<<16)

Out of buffer access: #16137
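
The concern raised here can be illustrated with a scalar sketch. It assumes the quoted load reads 4 consecutive source elements starting at `S + xofs[dx] + cn`, so the load stays inside a row of `rowLen` elements only when `xofs[dx] + cn + 4 <= rowLen`. The helper `safeVectorLen` and its parameters are hypothetical, not OpenCV API, and this is not necessarily how the issue was resolved in #16137.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of OpenCV: returns how many leading
// positions dx can be handled by a loop that reads `loadWidth` consecutive
// elements starting at S + xofs[dx] + cn without leaving a source row of
// `rowLen` elements. Assumes xofs is non-decreasing.
static int safeVectorLen(const std::vector<int>& xofs, int cn, int rowLen, int loadWidth)
{
    int n = 0;
    for (std::size_t i = 0; i < xofs.size(); ++i)
    {
        // The furthest load starts at xofs[i] + cn and spans loadWidth elements.
        if (xofs[i] + cn + loadWidth > rowLen)
            break;
        n = static_cast<int>(i) + 1;
    }
    return n;
}
```

Under these assumptions, positions at or beyond the returned count would need a scalar tail loop or a narrower load instead of the 4-wide one.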

alalek commented Dec 18, 2019

One more bug was introduced by this patch: #16189
