Adding support for vectorized masking for uchar/ushort. #15527

Merged
alalek merged 5 commits into opencv:3.4 from everton1984:faster_acc
Oct 11, 2019

Conversation

@everton1984
Contributor

Currently, whenever a mask is passed to the accumulate function, OpenCV falls back to the generic non-vectorized implementation. This PR adds masking support to the vectorized uchar and ushort implementations. Performance improvements on ppc64le are up to 89%.

@everton1984
Contributor Author

Just found a bug here: I was under the impression that masked-out elements should be set to 0, but masking should just keep the value of dst as it is. I also need to make some further improvements.
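For reference, the intended mask semantics can be written as a scalar loop. This is only a minimal sketch for single-channel 8-bit input, not the code from this PR: the function name `accumulateWeightedMasked` and the use of `std::vector` are assumptions made for illustration. The point it shows is that elements with a zero mask keep their previous dst value instead of being reset to 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar reference for a masked weighted accumulate
// (single channel, 8-bit source, float accumulator).
// Where mask[i] == 0, dst[i] is left untouched; it is NOT set to zero.
void accumulateWeightedMasked(const std::vector<std::uint8_t>& src,
                              std::vector<float>& dst,
                              const std::vector<std::uint8_t>& mask,
                              float alpha)
{
    assert(src.size() == dst.size() && src.size() == mask.size());
    const float beta = 1.0f - alpha;
    for (std::size_t i = 0; i < src.size(); ++i)
    {
        if (mask[i] != 0)
            dst[i] = dst[i] * beta + static_cast<float>(src[i]) * alpha;
        // else: keep dst[i] exactly as it was
    }
}
```

A vectorized path has to reproduce this behavior lane by lane, which is why a plain multiply by the mask is not enough.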

the mask and tweaked for further performance improvements.
@everton1984
Contributor Author

Fixed the problem and got even further performance improvements, up to almost 4x.

int size = len * cn;
int y = -cVectorWidth;
int cc = 0;
for (; x <= size - cVectorWidth; x += cVectorWidth, cc = (cc+1) % cn)
Contributor

cc will always be equal to 0 since cn is 1 in this branch

v_store(dst + x + step * 3, v_dst11);
}
} else if ( cn == 1 ){
//#include <stdio.h>
Contributor

Please remove it

v_float32 v_dst10 = vx_load(dst + x + step * 2);
v_float32 v_dst11 = vx_load(dst + x + step * 3);

v_dst00 = v_fma(v_fma(v_dst00, v_beta, v_cvt_f32(v_reinterpret_as_s32(v_src00)) * v_alpha), d0, (~d0)*v_mf00);
Contributor

It looks like for a zero mask this line will produce a result equal to (~d0)*v_mf00, which is definitely unrelated to the expected initial dst value.

Contributor Author

Yeah I'm fixing that in a bit, thanks for the heads up.

Contributor

It's possible to use the v_select(d0, v_fma(...), v_dst00) intrinsic to choose between the updated and original destination values.
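To illustrate the select-based approach without depending on OpenCV headers, here is a plain C++ emulation of a lane-wise select. It is a sketch under stated assumptions: `Lanes`, `MaskLanes`, `selectLanes`, `accumulateStep` and the lane count of 4 are hypothetical names and values chosen for this example, and each mask lane is assumed to be either all ones or all zeros. The blend is done on the raw bits, so unmasked lanes come back bit-for-bit identical to the original destination.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kLanes = 4;                     // hypothetical lane count
using Lanes = std::array<float, kLanes>;
using MaskLanes = std::array<std::uint32_t, kLanes>;  // each lane: 0xFFFFFFFF or 0

static_assert(sizeof(float) == sizeof(std::uint32_t), "bitwise blend needs 32-bit float");

// Emulates a per-lane select: out[i] = m[i] ? a[i] : b[i], done bitwise.
Lanes selectLanes(const MaskLanes& m, const Lanes& a, const Lanes& b)
{
    Lanes out{};
    for (std::size_t i = 0; i < kLanes; ++i)
    {
        std::uint32_t abits, bbits;
        std::memcpy(&abits, &a[i], sizeof(abits));
        std::memcpy(&bbits, &b[i], sizeof(bbits));
        const std::uint32_t r = (m[i] & abits) | (~m[i] & bbits);
        std::memcpy(&out[i], &r, sizeof(r));
    }
    return out;
}

// One weighted-accumulate step: compute the update for every lane,
// then keep it only where the mask is set.
Lanes accumulateStep(const MaskLanes& m, const Lanes& src, const Lanes& dst, float alpha)
{
    Lanes updated{};
    for (std::size_t i = 0; i < kLanes; ++i)
        updated[i] = dst[i] * (1.0f - alpha) + src[i] * alpha;
    return selectLanes(m, updated, dst);
}
```

With this shape, a zero mask lane returns the original dst value unchanged, which is the behavior the review comments above ask for.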

@terfendail
Contributor

Performance for SSE2 baseline
Performance test Reference time PR time Speedup
Weighted::Accumulate::(127x61, 8UC1) 0.001 0.001 1.02
Weighted::Accumulate::(127x61, 16UC1) 0.001 0.001 1.01
Weighted::Accumulate::(127x61, 32FC1) 0.001 0.001 1.01
Weighted::Accumulate::(320x240, 8UC1) 0.012 0.012 1.00
Weighted::Accumulate::(320x240, 16UC1) 0.012 0.012 0.99
Weighted::Accumulate::(320x240, 32FC1) 0.011 0.012 0.89
Weighted::Accumulate::(640x480, 8UC1) 0.057 0.057 1.01
Weighted::Accumulate::(640x480, 16UC1) 0.068 0.070 0.97
Weighted::Accumulate::(640x480, 32FC1) 0.090 0.090 1.00
Weighted::Accumulate::(1280x720, 8UC1) 0.177 0.181 0.98
Weighted::Accumulate::(1280x720, 16UC1) 0.207 0.211 0.98
Weighted::Accumulate::(1280x720, 32FC1) 0.282 0.281 1.01
Weighted::Accumulate::(1920x1080, 8UC1) 0.474 0.491 0.97
Weighted::Accumulate::(1920x1080, 16UC1) 0.600 0.600 1.00
Weighted::Accumulate::(1920x1080, 32FC1) 0.839 0.848 0.99
WeightedDouble::Accumulate::(127x61, 8UC1) 0.003 0.003 1.00
WeightedDouble::Accumulate::(127x61, 16UC1) 0.003 0.003 0.99
WeightedDouble::Accumulate::(127x61, 32FC1) 0.003 0.003 1.01
WeightedDouble::Accumulate::(127x61, 64FC1) 0.002 0.002 1.00
WeightedDouble::Accumulate::(320x240, 8UC1) 0.025 0.025 0.99
WeightedDouble::Accumulate::(320x240, 16UC1) 0.025 0.025 0.99
WeightedDouble::Accumulate::(320x240, 32FC1) 0.026 0.026 0.99
WeightedDouble::Accumulate::(320x240, 64FC1) 0.033 0.033 0.98
WeightedDouble::Accumulate::(640x480, 8UC1) 0.115 0.113 1.02
WeightedDouble::Accumulate::(640x480, 16UC1) 0.120 0.120 1.00
WeightedDouble::Accumulate::(640x480, 32FC1) 0.137 0.138 0.99
WeightedDouble::Accumulate::(640x480, 64FC1) 0.184 0.184 1.00
WeightedDouble::Accumulate::(1280x720, 8UC1) 0.368 0.373 0.99
WeightedDouble::Accumulate::(1280x720, 16UC1) 0.394 0.401 0.98
WeightedDouble::Accumulate::(1280x720, 32FC1) 0.504 0.508 0.99
WeightedDouble::Accumulate::(1280x720, 64FC1) 0.698 0.719 0.97
WeightedDouble::Accumulate::(1920x1080, 8UC1) 1.343 1.368 0.98
WeightedDouble::Accumulate::(1920x1080, 16UC1) 1.362 1.350 1.01
WeightedDouble::Accumulate::(1920x1080, 32FC1) 1.606 1.586 1.01
WeightedDouble::Accumulate::(1920x1080, 64FC1) 2.095 2.083 1.01
WeightedDoubleMask::Accumulate::(127x61, 8UC1) 0.016 0.016 0.99
WeightedDoubleMask::Accumulate::(127x61, 16UC1) 0.016 0.016 1.00
WeightedDoubleMask::Accumulate::(127x61, 32FC1) 0.014 0.014 1.00
WeightedDoubleMask::Accumulate::(127x61, 64FC1) 0.014 0.014 0.99
WeightedDoubleMask::Accumulate::(127x61, 8UC3) 0.027 0.027 0.99
WeightedDoubleMask::Accumulate::(127x61, 16UC3) 0.029 0.029 1.00
WeightedDoubleMask::Accumulate::(127x61, 32FC3) 0.026 0.026 0.99
WeightedDoubleMask::Accumulate::(320x240, 8UC1) 0.147 0.155 0.95
WeightedDoubleMask::Accumulate::(320x240, 16UC1) 0.146 0.154 0.95
WeightedDoubleMask::Accumulate::(320x240, 32FC1) 0.129 0.137 0.94
WeightedDoubleMask::Accumulate::(320x240, 64FC1) 0.129 0.137 0.95
WeightedDoubleMask::Accumulate::(320x240, 8UC3) 0.248 0.258 0.96
WeightedDoubleMask::Accumulate::(320x240, 16UC3) 0.272 0.284 0.96
WeightedDoubleMask::Accumulate::(320x240, 32FC3) 0.247 0.248 1.00
WeightedDoubleMask::Accumulate::(640x480, 8UC1) 0.585 0.610 0.96
WeightedDoubleMask::Accumulate::(640x480, 16UC1) 0.584 0.603 0.97
WeightedDoubleMask::Accumulate::(640x480, 32FC1) 0.508 0.533 0.95
WeightedDoubleMask::Accumulate::(640x480, 64FC1) 0.515 0.535 0.96
WeightedDoubleMask::Accumulate::(640x480, 8UC3) 1.001 1.045 0.96
WeightedDoubleMask::Accumulate::(640x480, 16UC3) 1.096 1.145 0.96
WeightedDoubleMask::Accumulate::(640x480, 32FC3) 0.997 1.034 0.96
WeightedDoubleMask::Accumulate::(1280x720, 8UC1) 1.839 1.841 1.00
WeightedDoubleMask::Accumulate::(1280x720, 16UC1) 1.817 1.820 1.00
WeightedDoubleMask::Accumulate::(1280x720, 32FC1) 1.635 1.630 1.00
WeightedDoubleMask::Accumulate::(1280x720, 64FC1) 1.609 1.680 0.96
WeightedDoubleMask::Accumulate::(1280x720, 8UC3) 3.259 3.366 0.97
WeightedDoubleMask::Accumulate::(1280x720, 16UC3) 3.524 3.676 0.96
WeightedDoubleMask::Accumulate::(1280x720, 32FC3) 3.302 3.424 0.96
WeightedDoubleMask::Accumulate::(1920x1080, 8UC1) 4.233 4.313 0.98
WeightedDoubleMask::Accumulate::(1920x1080, 16UC1) 4.275 4.279 1.00
WeightedDoubleMask::Accumulate::(1920x1080, 32FC1) 3.771 3.773 1.00
WeightedDoubleMask::Accumulate::(1920x1080, 64FC1) 3.891 3.910 1.00
WeightedDoubleMask::Accumulate::(1920x1080, 8UC3) 7.887 7.815 1.01
WeightedDoubleMask::Accumulate::(1920x1080, 16UC3) 8.637 8.519 1.01
WeightedDoubleMask::Accumulate::(1920x1080, 32FC3) 7.845 7.870 1.00
WeightedMask::Accumulate::(127x61, 8UC1) 0.016 0.002 6.38
WeightedMask::Accumulate::(127x61, 16UC1) 0.015 0.002 6.06
WeightedMask::Accumulate::(127x61, 32FC1) 0.014 0.014 1.00
WeightedMask::Accumulate::(127x61, 8UC3) 0.027 0.017 1.53
WeightedMask::Accumulate::(127x61, 16UC3) 0.028 0.016 1.73
WeightedMask::Accumulate::(127x61, 32FC3) 0.022 0.022 0.99
WeightedMask::Accumulate::(320x240, 8UC1) 0.156 0.024 6.59
WeightedMask::Accumulate::(320x240, 16UC1) 0.145 0.023 6.26
WeightedMask::Accumulate::(320x240, 32FC1) 0.137 0.136 1.00
WeightedMask::Accumulate::(320x240, 8UC3) 0.258 0.169 1.53
WeightedMask::Accumulate::(320x240, 16UC3) 0.274 0.159 1.73
WeightedMask::Accumulate::(320x240, 32FC3) 0.210 0.210 1.00
WeightedMask::Accumulate::(640x480, 8UC1) 0.610 0.093 6.53
WeightedMask::Accumulate::(640x480, 16UC1) 0.565 0.094 6.00
WeightedMask::Accumulate::(640x480, 32FC1) 0.532 0.533 1.00
WeightedMask::Accumulate::(640x480, 8UC3) 1.021 0.659 1.55
WeightedMask::Accumulate::(640x480, 16UC3) 1.090 0.629 1.73
WeightedMask::Accumulate::(640x480, 32FC3) 0.841 0.839 1.00
WeightedMask::Accumulate::(1280x720, 8UC1) 1.822 0.268 6.79
WeightedMask::Accumulate::(1280x720, 16UC1) 1.696 0.279 6.08
WeightedMask::Accumulate::(1280x720, 32FC1) 1.603 1.556 1.03
WeightedMask::Accumulate::(1280x720, 8UC3) 3.109 1.960 1.59
WeightedMask::Accumulate::(1280x720, 16UC3) 3.320 1.920 1.73
WeightedMask::Accumulate::(1280x720, 32FC3) 2.652 2.648 1.00
WeightedMask::Accumulate::(1920x1080, 8UC1) 4.215 0.774 5.44
WeightedMask::Accumulate::(1920x1080, 16UC1) 3.836 0.808 4.75
WeightedMask::Accumulate::(1920x1080, 32FC1) 3.576 3.708 0.96
WeightedMask::Accumulate::(1920x1080, 8UC3) 6.721 4.640 1.45
WeightedMask::Accumulate::(1920x1080, 16UC3) 7.181 4.584 1.57
WeightedMask::Accumulate::(1920x1080, 32FC3) 5.951 5.815 1.02
Performance for SSE3 baseline
Performance test Reference time PR time Speedup
Weighted::Accumulate::(127x61, 8UC1) 0.001 0.001 1.02
Weighted::Accumulate::(127x61, 16UC1) 0.001 0.001 1.00
Weighted::Accumulate::(127x61, 32FC1) 0.001 0.001 1.01
Weighted::Accumulate::(320x240, 8UC1) 0.012 0.012 1.02
Weighted::Accumulate::(320x240, 16UC1) 0.012 0.011 1.02
Weighted::Accumulate::(320x240, 32FC1) 0.012 0.010 1.15
Weighted::Accumulate::(640x480, 8UC1) 0.058 0.058 1.01
Weighted::Accumulate::(640x480, 16UC1) 0.068 0.068 1.01
Weighted::Accumulate::(640x480, 32FC1) 0.091 0.089 1.03
Weighted::Accumulate::(1280x720, 8UC1) 0.175 0.176 1.00
Weighted::Accumulate::(1280x720, 16UC1) 0.207 0.205 1.01
Weighted::Accumulate::(1280x720, 32FC1) 0.285 0.278 1.02
Weighted::Accumulate::(1920x1080, 8UC1) 0.483 0.477 1.01
Weighted::Accumulate::(1920x1080, 16UC1) 0.592 0.591 1.00
Weighted::Accumulate::(1920x1080, 32FC1) 0.841 0.833 1.01
WeightedDouble::Accumulate::(127x61, 8UC1) 0.003 0.003 1.00
WeightedDouble::Accumulate::(127x61, 16UC1) 0.003 0.003 0.99
WeightedDouble::Accumulate::(127x61, 32FC1) 0.003 0.003 1.00
WeightedDouble::Accumulate::(127x61, 64FC1) 0.002 0.002 1.01
WeightedDouble::Accumulate::(320x240, 8UC1) 0.025 0.025 1.00
WeightedDouble::Accumulate::(320x240, 16UC1) 0.025 0.025 1.02
WeightedDouble::Accumulate::(320x240, 32FC1) 0.027 0.025 1.06
WeightedDouble::Accumulate::(320x240, 64FC1) 0.031 0.032 0.96
WeightedDouble::Accumulate::(640x480, 8UC1) 0.110 0.111 0.99
WeightedDouble::Accumulate::(640x480, 16UC1) 0.119 0.121 0.99
WeightedDouble::Accumulate::(640x480, 32FC1) 0.141 0.138 1.02
WeightedDouble::Accumulate::(640x480, 64FC1) 0.186 0.184 1.01
WeightedDouble::Accumulate::(1280x720, 8UC1) 0.364 0.364 1.00
WeightedDouble::Accumulate::(1280x720, 16UC1) 0.407 0.393 1.04
WeightedDouble::Accumulate::(1280x720, 32FC1) 0.488 0.509 0.96
WeightedDouble::Accumulate::(1280x720, 64FC1) 0.708 0.701 1.01
WeightedDouble::Accumulate::(1920x1080, 8UC1) 1.354 1.347 1.01
WeightedDouble::Accumulate::(1920x1080, 16UC1) 1.368 1.371 1.00
WeightedDouble::Accumulate::(1920x1080, 32FC1) 1.604 1.595 1.01
WeightedDouble::Accumulate::(1920x1080, 64FC1) 2.088 2.090 1.00
WeightedDoubleMask::Accumulate::(127x61, 8UC1) 0.016 0.016 1.00
WeightedDoubleMask::Accumulate::(127x61, 16UC1) 0.016 0.016 1.00
WeightedDoubleMask::Accumulate::(127x61, 32FC1) 0.014 0.014 1.00
WeightedDoubleMask::Accumulate::(127x61, 64FC1) 0.014 0.014 1.00
WeightedDoubleMask::Accumulate::(127x61, 8UC3) 0.027 0.027 0.99
WeightedDoubleMask::Accumulate::(127x61, 16UC3) 0.029 0.029 1.00
WeightedDoubleMask::Accumulate::(127x61, 32FC3) 0.026 0.026 1.00
WeightedDoubleMask::Accumulate::(320x240, 8UC1) 0.156 0.157 0.99
WeightedDoubleMask::Accumulate::(320x240, 16UC1) 0.154 0.154 1.00
WeightedDoubleMask::Accumulate::(320x240, 32FC1) 0.137 0.137 1.00
WeightedDoubleMask::Accumulate::(320x240, 64FC1) 0.137 0.137 1.00
WeightedDoubleMask::Accumulate::(320x240, 8UC3) 0.258 0.262 0.99
WeightedDoubleMask::Accumulate::(320x240, 16UC3) 0.284 0.285 1.00
WeightedDoubleMask::Accumulate::(320x240, 32FC3) 0.248 0.253 0.98
WeightedDoubleMask::Accumulate::(640x480, 8UC1) 0.610 0.616 0.99
WeightedDoubleMask::Accumulate::(640x480, 16UC1) 0.603 0.608 0.99
WeightedDoubleMask::Accumulate::(640x480, 32FC1) 0.533 0.533 1.00
WeightedDoubleMask::Accumulate::(640x480, 64FC1) 0.535 0.535 1.00
WeightedDoubleMask::Accumulate::(640x480, 8UC3) 1.042 1.045 1.00
WeightedDoubleMask::Accumulate::(640x480, 16UC3) 1.145 1.148 1.00
WeightedDoubleMask::Accumulate::(640x480, 32FC3) 1.020 1.041 0.98
WeightedDoubleMask::Accumulate::(1280x720, 8UC1) 1.802 1.838 0.98
WeightedDoubleMask::Accumulate::(1280x720, 16UC1) 1.742 1.816 0.96
WeightedDoubleMask::Accumulate::(1280x720, 32FC1) 1.550 1.641 0.94
WeightedDoubleMask::Accumulate::(1280x720, 64FC1) 1.679 1.691 0.99
WeightedDoubleMask::Accumulate::(1280x720, 8UC3) 3.267 3.513 0.93
WeightedDoubleMask::Accumulate::(1280x720, 16UC3) 3.593 3.736 0.96
WeightedDoubleMask::Accumulate::(1280x720, 32FC3) 3.406 3.438 0.99
WeightedDoubleMask::Accumulate::(1920x1080, 8UC1) 4.252 4.316 0.99
WeightedDoubleMask::Accumulate::(1920x1080, 16UC1) 4.180 4.186 1.00
WeightedDoubleMask::Accumulate::(1920x1080, 32FC1) 3.761 3.778 1.00
WeightedDoubleMask::Accumulate::(1920x1080, 64FC1) 3.904 3.885 1.01
WeightedDoubleMask::Accumulate::(1920x1080, 8UC3) 7.861 7.834 1.00
WeightedDoubleMask::Accumulate::(1920x1080, 16UC3) 8.465 8.464 1.00
WeightedDoubleMask::Accumulate::(1920x1080, 32FC3) 7.756 7.771 1.00
WeightedMask::Accumulate::(127x61, 8UC1) 0.016 0.002 6.55
WeightedMask::Accumulate::(127x61, 16UC1) 0.015 0.002 5.98
WeightedMask::Accumulate::(127x61, 32FC1) 0.014 0.014 1.00
WeightedMask::Accumulate::(127x61, 8UC3) 0.027 0.017 1.55
WeightedMask::Accumulate::(127x61, 16UC3) 0.028 0.016 1.74
WeightedMask::Accumulate::(127x61, 32FC3) 0.022 0.022 1.00
WeightedMask::Accumulate::(320x240, 8UC1) 0.156 0.024 6.61
WeightedMask::Accumulate::(320x240, 16UC1) 0.145 0.023 6.29
WeightedMask::Accumulate::(320x240, 32FC1) 0.137 0.136 1.00
WeightedMask::Accumulate::(320x240, 8UC3) 0.258 0.170 1.52
WeightedMask::Accumulate::(320x240, 16UC3) 0.280 0.159 1.76
WeightedMask::Accumulate::(320x240, 32FC3) 0.210 0.211 0.99
WeightedMask::Accumulate::(640x480, 8UC1) 0.613 0.094 6.52
WeightedMask::Accumulate::(640x480, 16UC1) 0.572 0.094 6.11
WeightedMask::Accumulate::(640x480, 32FC1) 0.546 0.532 1.02
WeightedMask::Accumulate::(640x480, 8UC3) 1.046 0.659 1.59
WeightedMask::Accumulate::(640x480, 16UC3) 1.090 0.629 1.73
WeightedMask::Accumulate::(640x480, 32FC3) 0.845 0.840 1.01
WeightedMask::Accumulate::(1280x720, 8UC1) 1.827 0.266 6.86
WeightedMask::Accumulate::(1280x720, 16UC1) 1.697 0.267 6.35
WeightedMask::Accumulate::(1280x720, 32FC1) 1.602 1.529 1.05
WeightedMask::Accumulate::(1280x720, 8UC3) 3.110 2.009 1.55
WeightedMask::Accumulate::(1280x720, 16UC3) 3.328 1.981 1.68
WeightedMask::Accumulate::(1280x720, 32FC3) 2.653 2.651 1.00
WeightedMask::Accumulate::(1920x1080, 8UC1) 4.217 0.764 5.52
WeightedMask::Accumulate::(1920x1080, 16UC1) 3.839 0.802 4.79
WeightedMask::Accumulate::(1920x1080, 32FC1) 3.633 3.638 1.00
WeightedMask::Accumulate::(1920x1080, 8UC3) 7.045 4.662 1.51
WeightedMask::Accumulate::(1920x1080, 16UC3) 7.508 4.469 1.68
WeightedMask::Accumulate::(1920x1080, 32FC3) 5.974 5.871 1.02
Performance for SSE4_2 baseline
Performance test Reference time PR time Speedup
Weighted::Accumulate::(127x61, 8UC1) 0.001 0.001 1.02
Weighted::Accumulate::(127x61, 16UC1) 0.001 0.001 0.98
Weighted::Accumulate::(127x61, 32FC1) 0.001 0.001 1.04
Weighted::Accumulate::(320x240, 8UC1) 0.011 0.012 0.98
Weighted::Accumulate::(320x240, 16UC1) 0.011 0.011 0.98
Weighted::Accumulate::(320x240, 32FC1) 0.010 0.012 0.87
Weighted::Accumulate::(640x480, 8UC1) 0.055 0.057 0.97
Weighted::Accumulate::(640x480, 16UC1) 0.069 0.069 1.00
Weighted::Accumulate::(640x480, 32FC1) 0.091 0.091 1.00
Weighted::Accumulate::(1280x720, 8UC1) 0.172 0.179 0.96
Weighted::Accumulate::(1280x720, 16UC1) 0.201 0.211 0.95
Weighted::Accumulate::(1280x720, 32FC1) 0.277 0.283 0.98
Weighted::Accumulate::(1920x1080, 8UC1) 0.470 0.484 0.97
Weighted::Accumulate::(1920x1080, 16UC1) 0.595 0.603 0.99
Weighted::Accumulate::(1920x1080, 32FC1) 0.838 0.842 1.00
WeightedDouble::Accumulate::(127x61, 8UC1) 0.003 0.003 1.00
WeightedDouble::Accumulate::(127x61, 16UC1) 0.003 0.003 0.99
WeightedDouble::Accumulate::(127x61, 32FC1) 0.003 0.003 1.00
WeightedDouble::Accumulate::(127x61, 64FC1) 0.002 0.002 1.00
WeightedDouble::Accumulate::(320x240, 8UC1) 0.026 0.026 0.99
WeightedDouble::Accumulate::(320x240, 16UC1) 0.024 0.025 0.99
WeightedDouble::Accumulate::(320x240, 32FC1) 0.026 0.027 0.96
WeightedDouble::Accumulate::(320x240, 64FC1) 0.032 0.033 0.97
WeightedDouble::Accumulate::(640x480, 8UC1) 0.115 0.117 0.98
WeightedDouble::Accumulate::(640x480, 16UC1) 0.121 0.122 0.99
WeightedDouble::Accumulate::(640x480, 32FC1) 0.141 0.145 0.97
WeightedDouble::Accumulate::(640x480, 64FC1) 0.184 0.182 1.01
WeightedDouble::Accumulate::(1280x720, 8UC1) 0.375 0.379 0.99
WeightedDouble::Accumulate::(1280x720, 16UC1) 0.395 0.398 0.99
WeightedDouble::Accumulate::(1280x720, 32FC1) 0.505 0.490 1.03
WeightedDouble::Accumulate::(1280x720, 64FC1) 0.689 0.721 0.96
WeightedDouble::Accumulate::(1920x1080, 8UC1) 1.393 1.366 1.02
WeightedDouble::Accumulate::(1920x1080, 16UC1) 1.368 1.351 1.01
WeightedDouble::Accumulate::(1920x1080, 32FC1) 1.595 1.601 1.00
WeightedDouble::Accumulate::(1920x1080, 64FC1) 2.088 2.087 1.00
WeightedDoubleMask::Accumulate::(127x61, 8UC1) 0.016 0.016 1.00
WeightedDoubleMask::Accumulate::(127x61, 16UC1) 0.016 0.016 1.00
WeightedDoubleMask::Accumulate::(127x61, 32FC1) 0.014 0.014 1.00
WeightedDoubleMask::Accumulate::(127x61, 64FC1) 0.014 0.014 1.00
WeightedDoubleMask::Accumulate::(127x61, 8UC3) 0.027 0.027 1.00
WeightedDoubleMask::Accumulate::(127x61, 16UC3) 0.029 0.029 1.00
WeightedDoubleMask::Accumulate::(127x61, 32FC3) 0.026 0.026 1.00
WeightedDoubleMask::Accumulate::(320x240, 8UC1) 0.156 0.156 1.00
WeightedDoubleMask::Accumulate::(320x240, 16UC1) 0.154 0.154 1.00
WeightedDoubleMask::Accumulate::(320x240, 32FC1) 0.135 0.137 0.99
WeightedDoubleMask::Accumulate::(320x240, 64FC1) 0.137 0.137 1.00
WeightedDoubleMask::Accumulate::(320x240, 8UC3) 0.259 0.259 1.00
WeightedDoubleMask::Accumulate::(320x240, 16UC3) 0.284 0.285 1.00
WeightedDoubleMask::Accumulate::(320x240, 32FC3) 0.248 0.248 1.00
WeightedDoubleMask::Accumulate::(640x480, 8UC1) 0.611 0.611 1.00
WeightedDoubleMask::Accumulate::(640x480, 16UC1) 0.602 0.602 1.00
WeightedDoubleMask::Accumulate::(640x480, 32FC1) 0.535 0.535 1.00
WeightedDoubleMask::Accumulate::(640x480, 64FC1) 0.536 0.535 1.00
WeightedDoubleMask::Accumulate::(640x480, 8UC3) 1.039 1.043 1.00
WeightedDoubleMask::Accumulate::(640x480, 16UC3) 1.145 1.142 1.00
WeightedDoubleMask::Accumulate::(640x480, 32FC3) 1.037 1.040 1.00
WeightedDoubleMask::Accumulate::(1280x720, 8UC1) 1.852 1.838 1.01
WeightedDoubleMask::Accumulate::(1280x720, 16UC1) 1.861 1.818 1.02
WeightedDoubleMask::Accumulate::(1280x720, 32FC1) 1.649 1.625 1.01
WeightedDoubleMask::Accumulate::(1280x720, 64FC1) 1.678 1.681 1.00
WeightedDoubleMask::Accumulate::(1280x720, 8UC3) 3.523 3.479 1.01
WeightedDoubleMask::Accumulate::(1280x720, 16UC3) 3.693 3.693 1.00
WeightedDoubleMask::Accumulate::(1280x720, 32FC3) 3.453 3.453 1.00
WeightedDoubleMask::Accumulate::(1920x1080, 8UC1) 4.318 4.313 1.00
WeightedDoubleMask::Accumulate::(1920x1080, 16UC1) 4.244 4.272 0.99
WeightedDoubleMask::Accumulate::(1920x1080, 32FC1) 3.766 3.855 0.98
WeightedDoubleMask::Accumulate::(1920x1080, 64FC1) 3.754 3.885 0.97
WeightedDoubleMask::Accumulate::(1920x1080, 8UC3) 7.802 7.856 0.99
WeightedDoubleMask::Accumulate::(1920x1080, 16UC3) 8.384 8.518 0.98
WeightedDoubleMask::Accumulate::(1920x1080, 32FC3) 7.569 7.908 0.96
WeightedMask::Accumulate::(127x61, 8UC1) 0.016 0.002 7.91
WeightedMask::Accumulate::(127x61, 16UC1) 0.015 0.002 6.87
WeightedMask::Accumulate::(127x61, 32FC1) 0.014 0.014 1.00
WeightedMask::Accumulate::(127x61, 8UC3) 0.027 0.013 2.03
WeightedMask::Accumulate::(127x61, 16UC3) 0.028 0.013 2.16
WeightedMask::Accumulate::(127x61, 32FC3) 0.022 0.022 1.00
WeightedMask::Accumulate::(320x240, 8UC1) 0.156 0.019 8.09
WeightedMask::Accumulate::(320x240, 16UC1) 0.145 0.020 7.18
WeightedMask::Accumulate::(320x240, 32FC1) 0.137 0.137 1.00
WeightedMask::Accumulate::(320x240, 8UC3) 0.258 0.128 2.01
WeightedMask::Accumulate::(320x240, 16UC3) 0.275 0.129 2.13
WeightedMask::Accumulate::(320x240, 32FC3) 0.211 0.211 1.00
WeightedMask::Accumulate::(640x480, 8UC1) 0.611 0.080 7.63
WeightedMask::Accumulate::(640x480, 16UC1) 0.567 0.088 6.47
WeightedMask::Accumulate::(640x480, 32FC1) 0.535 0.535 1.00
WeightedMask::Accumulate::(640x480, 8UC3) 1.020 0.506 2.02
WeightedMask::Accumulate::(640x480, 16UC3) 1.090 0.511 2.13
WeightedMask::Accumulate::(640x480, 32FC3) 0.840 0.839 1.00
WeightedMask::Accumulate::(1280x720, 8UC1) 1.868 0.235 7.95
WeightedMask::Accumulate::(1280x720, 16UC1) 1.701 0.259 6.56
WeightedMask::Accumulate::(1280x720, 32FC1) 1.613 1.604 1.01
WeightedMask::Accumulate::(1280x720, 8UC3) 3.113 1.607 1.94
WeightedMask::Accumulate::(1280x720, 16UC3) 3.320 1.640 2.02
WeightedMask::Accumulate::(1280x720, 32FC3) 2.654 2.655 1.00
WeightedMask::Accumulate::(1920x1080, 8UC1) 4.110 0.677 6.07
WeightedMask::Accumulate::(1920x1080, 16UC1) 3.831 0.767 4.99
WeightedMask::Accumulate::(1920x1080, 32FC1) 3.632 3.629 1.00
WeightedMask::Accumulate::(1920x1080, 8UC3) 6.737 3.782 1.78
WeightedMask::Accumulate::(1920x1080, 16UC3) 7.188 3.862 1.86
WeightedMask::Accumulate::(1920x1080, 32FC3) 5.958 5.973 1.00
Performance for AVX2 baseline
Performance test Reference time PR time Speedup
Weighted::Accumulate::(127x61, 8UC1) 0.001 0.001 1.00
Weighted::Accumulate::(127x61, 16UC1) 0.001 0.001 0.99
Weighted::Accumulate::(127x61, 32FC1) 0.001 0.001 1.00
Weighted::Accumulate::(320x240, 8UC1) 0.009 0.009 1.00
Weighted::Accumulate::(320x240, 16UC1) 0.009 0.009 1.00
Weighted::Accumulate::(320x240, 32FC1) 0.012 0.012 1.00
Weighted::Accumulate::(640x480, 8UC1) 0.055 0.052 1.04
Weighted::Accumulate::(640x480, 16UC1) 0.066 0.066 1.01
Weighted::Accumulate::(640x480, 32FC1) 0.092 0.091 1.02
Weighted::Accumulate::(1280x720, 8UC1) 0.171 0.170 1.01
Weighted::Accumulate::(1280x720, 16UC1) 0.204 0.202 1.01
Weighted::Accumulate::(1280x720, 32FC1) 0.289 0.282 1.03
Weighted::Accumulate::(1920x1080, 8UC1) 0.450 0.441 1.02
Weighted::Accumulate::(1920x1080, 16UC1) 0.571 0.566 1.01
Weighted::Accumulate::(1920x1080, 32FC1) 0.814 0.829 0.98
WeightedDouble::Accumulate::(127x61, 8UC1) 0.002 0.002 1.00
WeightedDouble::Accumulate::(127x61, 16UC1) 0.002 0.002 1.01
WeightedDouble::Accumulate::(127x61, 32FC1) 0.002 0.002 1.00
WeightedDouble::Accumulate::(127x61, 64FC1) 0.002 0.002 1.00
WeightedDouble::Accumulate::(320x240, 8UC1) 0.018 0.018 0.99
WeightedDouble::Accumulate::(320x240, 16UC1) 0.018 0.019 0.95
WeightedDouble::Accumulate::(320x240, 32FC1) 0.020 0.021 0.95
WeightedDouble::Accumulate::(320x240, 64FC1) 0.034 0.032 1.06
WeightedDouble::Accumulate::(640x480, 8UC1) 0.105 0.104 1.01
WeightedDouble::Accumulate::(640x480, 16UC1) 0.117 0.115 1.01
WeightedDouble::Accumulate::(640x480, 32FC1) 0.142 0.137 1.04
WeightedDouble::Accumulate::(640x480, 64FC1) 0.185 0.183 1.01
WeightedDouble::Accumulate::(1280x720, 8UC1) 0.334 0.330 1.01
WeightedDouble::Accumulate::(1280x720, 16UC1) 0.391 0.373 1.05
WeightedDouble::Accumulate::(1280x720, 32FC1) 0.487 0.485 1.00
WeightedDouble::Accumulate::(1280x720, 64FC1) 0.702 0.694 1.01
WeightedDouble::Accumulate::(1920x1080, 8UC1) 1.120 1.200 0.93
WeightedDouble::Accumulate::(1920x1080, 16UC1) 1.263 1.260 1.00
WeightedDouble::Accumulate::(1920x1080, 32FC1) 1.523 1.517 1.00
WeightedDouble::Accumulate::(1920x1080, 64FC1) 2.078 2.063 1.01
WeightedDoubleMask::Accumulate::(127x61, 8UC1) 0.016 0.016 1.00
WeightedDoubleMask::Accumulate::(127x61, 16UC1) 0.014 0.014 1.00
WeightedDoubleMask::Accumulate::(127x61, 32FC1) 0.013 0.013 1.00
WeightedDoubleMask::Accumulate::(127x61, 64FC1) 0.013 0.013 1.00
WeightedDoubleMask::Accumulate::(127x61, 8UC3) 0.026 0.026 1.00
WeightedDoubleMask::Accumulate::(127x61, 16UC3) 0.023 0.023 1.00
WeightedDoubleMask::Accumulate::(127x61, 32FC3) 0.024 0.024 1.00
WeightedDoubleMask::Accumulate::(320x240, 8UC1) 0.156 0.156 1.00
WeightedDoubleMask::Accumulate::(320x240, 16UC1) 0.136 0.135 1.00
WeightedDoubleMask::Accumulate::(320x240, 32FC1) 0.126 0.126 1.00
WeightedDoubleMask::Accumulate::(320x240, 64FC1) 0.128 0.127 1.00
WeightedDoubleMask::Accumulate::(320x240, 8UC3) 0.255 0.253 1.00
WeightedDoubleMask::Accumulate::(320x240, 16UC3) 0.228 0.231 0.99
WeightedDoubleMask::Accumulate::(320x240, 32FC3) 0.235 0.233 1.01
WeightedDoubleMask::Accumulate::(640x480, 8UC1) 0.610 0.624 0.98
WeightedDoubleMask::Accumulate::(640x480, 16UC1) 0.531 0.539 0.99
WeightedDoubleMask::Accumulate::(640x480, 32FC1) 0.490 0.502 0.98
WeightedDoubleMask::Accumulate::(640x480, 64FC1) 0.505 0.508 0.99
WeightedDoubleMask::Accumulate::(640x480, 8UC3) 1.004 1.022 0.98
WeightedDoubleMask::Accumulate::(640x480, 16UC3) 0.925 0.941 0.98
WeightedDoubleMask::Accumulate::(640x480, 32FC3) 0.970 0.990 0.98
WeightedDoubleMask::Accumulate::(1280x720, 8UC1) 1.838 1.835 1.00
WeightedDoubleMask::Accumulate::(1280x720, 16UC1) 1.606 1.591 1.01
WeightedDoubleMask::Accumulate::(1280x720, 32FC1) 1.541 1.463 1.05
WeightedDoubleMask::Accumulate::(1280x720, 64FC1) 1.594 1.577 1.01
WeightedDoubleMask::Accumulate::(1280x720, 8UC3) 3.335 3.286 1.02
WeightedDoubleMask::Accumulate::(1280x720, 16UC3) 3.116 2.997 1.04
WeightedDoubleMask::Accumulate::(1280x720, 32FC3) 3.268 3.103 1.05
WeightedDoubleMask::Accumulate::(1920x1080, 8UC1) 4.298 4.311 1.00
WeightedDoubleMask::Accumulate::(1920x1080, 16UC1) 3.776 3.781 1.00
WeightedDoubleMask::Accumulate::(1920x1080, 32FC1) 3.501 3.565 0.98
WeightedDoubleMask::Accumulate::(1920x1080, 64FC1) 3.662 3.545 1.03
WeightedDoubleMask::Accumulate::(1920x1080, 8UC3) 7.764 7.747 1.00
WeightedDoubleMask::Accumulate::(1920x1080, 16UC3) 7.217 7.198 1.00
WeightedDoubleMask::Accumulate::(1920x1080, 32FC3) 7.384 7.492 0.99
WeightedMask::Accumulate::(127x61, 8UC1) 0.016 0.002 9.73
WeightedMask::Accumulate::(127x61, 16UC1) 0.014 0.002 8.88
WeightedMask::Accumulate::(127x61, 32FC1) 0.013 0.013 1.00
WeightedMask::Accumulate::(127x61, 8UC3) 0.026 0.007 3.63
WeightedMask::Accumulate::(127x61, 16UC3) 0.024 0.007 3.56
WeightedMask::Accumulate::(127x61, 32FC3) 0.021 0.021 1.00
WeightedMask::Accumulate::(320x240, 8UC1) 0.156 0.015 10.63
WeightedMask::Accumulate::(320x240, 16UC1) 0.137 0.015 9.42
WeightedMask::Accumulate::(320x240, 32FC1) 0.128 0.127 1.01
WeightedMask::Accumulate::(320x240, 8UC3) 0.253 0.071 3.56
WeightedMask::Accumulate::(320x240, 16UC3) 0.234 0.066 3.56
WeightedMask::Accumulate::(320x240, 32FC3) 0.202 0.204 0.99
WeightedMask::Accumulate::(640x480, 8UC1) 0.624 0.070 8.91
WeightedMask::Accumulate::(640x480, 16UC1) 0.540 0.080 6.75
WeightedMask::Accumulate::(640x480, 32FC1) 0.498 0.499 1.00
WeightedMask::Accumulate::(640x480, 8UC3) 0.991 0.270 3.66
WeightedMask::Accumulate::(640x480, 16UC3) 0.921 0.258 3.57
WeightedMask::Accumulate::(640x480, 32FC3) 0.820 0.805 1.02
WeightedMask::Accumulate::(1280x720, 8UC1) 1.843 0.216 8.53
WeightedMask::Accumulate::(1280x720, 16UC1) 1.599 0.241 6.63
WeightedMask::Accumulate::(1280x720, 32FC1) 1.492 1.495 1.00
WeightedMask::Accumulate::(1280x720, 8UC3) 2.995 1.023 2.93
WeightedMask::Accumulate::(1280x720, 16UC3) 2.853 1.070 2.67
WeightedMask::Accumulate::(1280x720, 32FC3) 2.611 2.563 1.02
WeightedMask::Accumulate::(1920x1080, 8UC1) 4.112 0.604 6.80
WeightedMask::Accumulate::(1920x1080, 16UC1) 3.614 0.700 5.16
WeightedMask::Accumulate::(1920x1080, 32FC1) 3.383 3.386 1.00
WeightedMask::Accumulate::(1920x1080, 8UC3) 6.804 2.610 2.61
WeightedMask::Accumulate::(1920x1080, 16UC3) 6.369 2.799 2.28
WeightedMask::Accumulate::(1920x1080, 32FC3) 5.855 5.828 1.00
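
As a reading aid for the tables above: the Speedup column appears to be the reference time divided by the PR time, so values above 1 mean the PR is faster. This is an assumption inferred from the printed rows (for example 4.215 / 0.774 ≈ 5.446 against a listed 5.44), and the helper name `speedup` is hypothetical. The times are printed with three decimals only, so rows with very small times cannot be reproduced exactly from the printed values.

```cpp
#include <cassert>

// Speedup as (reference time) / (PR time).
// A result above 1.0 means the PR build ran faster than the reference build.
double speedup(double referenceTime, double prTime)
{
    assert(prTime > 0.0);
    return referenceTime / prTime;
}
```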

@alalek alalek merged commit 9ca9249 into opencv:3.4 Oct 11, 2019
@alalek alalek mentioned this pull request Oct 24, 2019