RGB to/from Gray rewritten to wide intrinsics #13379

Merged
alalek merged 16 commits into opencv:3.4 from savuor:color_5x5 on Dec 14, 2018

Contributor

@savuor savuor commented Dec 6, 2018

Merge with extra: opencv/opencv_extra#560

This pull request changes:

  • All conversions in color_rgb.cpp rewritten to wide universal intrinsics
  • Platform-specific code removed
  • mRGBA2RGBA: saturation added
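
For context on the last bullet: converting premultiplied RGBA back to straight RGBA divides each color channel by alpha, and the quotient can exceed 255 when a channel value is larger than its alpha. A minimal scalar sketch of a saturating version follows. The function name, the rounding scheme, and the zero-alpha handling are assumptions for illustration, not code from this PR.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: undo alpha premultiplication for one 8-bit channel.
inline uint8_t unpremultiply_saturated(uint8_t v, uint8_t alpha)
{
    if (alpha == 0)
        return 0;                                   // assumption: fully transparent maps to 0
    const int r = (v * 255 + alpha / 2) / alpha;    // rounded division, may exceed 255
    return static_cast<uint8_t>(std::min(r, 255));  // clamp instead of wrapping around
}
```

With `v = 200` and `alpha = 100` the unclamped quotient is 510; a plain cast to 8 bits would wrap it to 254, while the clamped version returns 255.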

The master branch has different RGB2Gray coefficients, so there will be a separate PR for it.
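
As a reminder of what such coefficients look like: an 8-bit RGB to gray conversion is typically a fixed-point weighted sum of the three channels. The sketch below derives 14-bit weights from the BT.601 luma factors 0.299, 0.587 and 0.114. The shift, the constants, and the function name are assumptions made here; the exact values used on either branch are not stated in this thread.

```cpp
#include <cstdint>

constexpr int kShift = 14;   // assumed fixed-point precision
constexpr int kR2Y = 4899;   // round(0.299 * 2^14)
constexpr int kG2Y = 9617;   // round(0.587 * 2^14)
constexpr int kB2Y = 1868;   // round(0.114 * 2^14)

// The weights sum to exactly 2^14, so pure white stays at 255.
static_assert(kR2Y + kG2Y + kB2Y == (1 << kShift), "weights must sum to one");

// Hypothetical scalar reference: weighted sum with rounding, then shift back.
inline uint8_t rgb_to_gray(uint8_t r, uint8_t g, uint8_t b)
{
    const int sum = r * kR2Y + g * kG2Y + b * kB2Y;
    return static_cast<uint8_t>((sum + (1 << (kShift - 1))) >> kShift);
}
```

Changing any of the three constants changes the output for some inputs, which is why a branch with different coefficients needs its own reference data.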

Performance

Baseline AVX2
Columns: Name of Test | rgb initial avx2 | rgb wide avx2 | rgb wide avx2 vs rgb initial avx2 (x-factor)
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR2BGR555) 0.011 0.004 3.09
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR2BGR565) 0.011 0.003 3.19
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR2GRAY) 0.006 0.004 1.55
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5552BGR) 0.007 0.003 2.20
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5552BGRA) 0.008 0.003 2.48
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5552GRAY) 0.003 0.003 0.86
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5552RGB) 0.007 0.003 2.11
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5552RGBA) 0.008 0.003 2.41
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5652BGR) 0.007 0.003 2.31
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5652BGRA) 0.008 0.003 2.63
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5652GRAY) 0.003 0.003 0.84
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5652RGB) 0.007 0.003 2.24
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5652RGBA) 0.008 0.003 2.40
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGRA2BGR555) 0.005 0.004 1.05
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGRA2BGR565) 0.004 0.004 1.10
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGRA2GRAY) 0.006 0.004 1.55
cvtColor8u::Size_CvtMode::(127x61, COLOR_GRAY2BGR555) 0.002 0.001 1.37
cvtColor8u::Size_CvtMode::(127x61, COLOR_GRAY2BGR565) 0.002 0.001 1.17
cvtColor8u::Size_CvtMode::(127x61, COLOR_GRAY2BGR) 0.002 0.002 0.93
cvtColor8u::Size_CvtMode::(127x61, COLOR_GRAY2BGRA) 0.003 0.003 1.05
cvtColor8u::Size_CvtMode::(127x61, COLOR_RGB2BGR555) 0.011 0.003 3.11
cvtColor8u::Size_CvtMode::(127x61, COLOR_RGB2BGR565) 0.011 0.004 2.94
cvtColor8u::Size_CvtMode::(127x61, COLOR_RGB2GRAY) 0.006 0.004 1.50
cvtColor8u::Size_CvtMode::(127x61, COLOR_RGBA2BGR555) 0.004 0.004 1.10
cvtColor8u::Size_CvtMode::(127x61, COLOR_RGBA2BGR565) 0.004 0.004 1.15
cvtColor8u::Size_CvtMode::(127x61, COLOR_RGBA2GRAY) 0.006 0.004 1.51
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR2BGR555) 0.452 0.067 6.76
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR2BGR565) 0.453 0.058 7.87
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR2GRAY) 0.229 0.094 2.42
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5552BGR) 0.259 0.068 3.83
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5552BGRA) 0.327 0.068 4.80
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5552GRAY) 0.079 0.074 1.06
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5552RGB) 0.285 0.066 4.29
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5552RGBA) 0.329 0.068 4.81
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5652BGR) 0.259 0.061 4.26
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5652BGRA) 0.328 0.069 4.78
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5652GRAY) 0.074 0.071 1.05
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5652RGB) 0.288 0.059 4.87
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5652RGBA) 0.324 0.067 4.84
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGRA2BGR555) 0.116 0.079 1.47
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGRA2BGR565) 0.104 0.074 1.41
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGRA2GRAY) 0.227 0.100 2.27
cvtColor8u::Size_CvtMode::(640x480, COLOR_GRAY2BGR555) 0.049 0.021 2.39
cvtColor8u::Size_CvtMode::(640x480, COLOR_GRAY2BGR565) 0.032 0.019 1.65
cvtColor8u::Size_CvtMode::(640x480, COLOR_GRAY2BGR) 0.039 0.037 1.05
cvtColor8u::Size_CvtMode::(640x480, COLOR_GRAY2BGRA) 0.053 0.052 1.03
cvtColor8u::Size_CvtMode::(640x480, COLOR_RGB2BGR555) 0.453 0.062 7.36
cvtColor8u::Size_CvtMode::(640x480, COLOR_RGB2BGR565) 0.455 0.061 7.47
cvtColor8u::Size_CvtMode::(640x480, COLOR_RGB2GRAY) 0.229 0.096 2.39
cvtColor8u::Size_CvtMode::(640x480, COLOR_RGBA2BGR555) 0.119 0.077 1.53
cvtColor8u::Size_CvtMode::(640x480, COLOR_RGBA2BGR565) 0.109 0.076 1.42
cvtColor8u::Size_CvtMode::(640x480, COLOR_RGBA2GRAY) 0.229 0.106 2.16
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR2BGR555) 3.027 0.491 6.16
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR2BGR565) 3.029 0.473 6.40
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR2GRAY) 1.561 0.633 2.47
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5552BGR) 1.777 0.486 3.66
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5552BGRA) 2.219 0.577 3.84
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5552GRAY) 0.531 0.520 1.02
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5552RGB) 1.906 0.476 4.00
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5552RGBA) 2.218 0.542 4.10
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5652BGR) 1.752 0.446 3.93
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5652BGRA) 2.215 0.541 4.09
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5652GRAY) 0.492 0.486 1.01
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5652RGB) 1.914 0.454 4.22
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5652RGBA) 2.215 0.521 4.25
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGRA2BGR555) 0.864 0.616 1.40
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGRA2BGR565) 0.747 0.597 1.25
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGRA2GRAY) 1.572 0.752 2.09
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_GRAY2BGR555) 0.323 0.231 1.40
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_GRAY2BGR565) 0.249 0.232 1.07
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_GRAY2BGR) 0.322 0.322 1.00
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_GRAY2BGRA) 0.424 0.405 1.05
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_RGB2BGR555) 3.050 0.464 6.58
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_RGB2BGR565) 3.059 0.467 6.55
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_RGB2GRAY) 1.530 0.622 2.46
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_RGBA2BGR555) 0.876 0.644 1.36
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_RGBA2BGR565) 0.797 0.600 1.33
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_RGBA2GRAY) 1.572 0.746 2.11
Baseline AVX512SKX
Columns: Name of Test | rgb initial avx512skx | rgb wide avx512skx | rgb wide avx512skx vs rgb initial avx512skx (x-factor)
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR2BGR555) 0.011 0.004 2.93
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR2BGR565) 0.011 0.003 3.17
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR2GRAY) 0.007 0.004 1.58
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5552BGR) 0.008 0.004 2.04
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5552BGRA) 0.009 0.004 2.27
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5552GRAY) 0.003 0.004 0.88
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5552RGB) 0.008 0.004 2.21
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5552RGBA) 0.009 0.004 2.17
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5652BGR) 0.008 0.003 2.30
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5652BGRA) 0.009 0.004 2.62
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5652GRAY) 0.003 0.004 0.86
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5652RGB) 0.008 0.003 2.36
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5652RGBA) 0.009 0.004 2.67
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGRA2BGR555) 0.006 0.004 1.38
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGRA2BGR565) 0.005 0.004 1.33
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGRA2GRAY) 0.007 0.005 1.50
cvtColor8u::Size_CvtMode::(127x61, COLOR_GRAY2BGR555) 0.002 0.002 1.37
cvtColor8u::Size_CvtMode::(127x61, COLOR_GRAY2BGR565) 0.002 0.002 1.10
cvtColor8u::Size_CvtMode::(127x61, COLOR_GRAY2BGR) 0.004 0.003 1.58
cvtColor8u::Size_CvtMode::(127x61, COLOR_GRAY2BGRA) 0.005 0.003 1.69
cvtColor8u::Size_CvtMode::(127x61, COLOR_RGB2BGR555) 0.011 0.004 2.88
cvtColor8u::Size_CvtMode::(127x61, COLOR_RGB2BGR565) 0.011 0.004 2.90
cvtColor8u::Size_CvtMode::(127x61, COLOR_RGB2GRAY) 0.007 0.004 1.54
cvtColor8u::Size_CvtMode::(127x61, COLOR_RGBA2BGR555) 0.006 0.004 1.40
cvtColor8u::Size_CvtMode::(127x61, COLOR_RGBA2BGR565) 0.005 0.004 1.30
cvtColor8u::Size_CvtMode::(127x61, COLOR_RGBA2GRAY) 0.007 0.005 1.47
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR2BGR555) 0.493 0.074 6.69
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR2BGR565) 0.491 0.065 7.53
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR2GRAY) 0.256 0.108 2.37
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5552BGR) 0.275 0.075 3.68
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5552BGRA) 0.344 0.080 4.28
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5552GRAY) 0.092 0.087 1.06
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5552RGB) 0.286 0.074 3.86
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5552RGBA) 0.342 0.079 4.31
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5652BGR) 0.259 0.070 3.71
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5652BGRA) 0.342 0.076 4.48
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5652GRAY) 0.088 0.083 1.05
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5652RGB) 0.285 0.066 4.30
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5652RGBA) 0.331 0.076 4.37
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGRA2BGR555) 0.114 0.084 1.35
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGRA2BGR565) 0.096 0.084 1.15
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGRA2GRAY) 0.257 0.116 2.21
cvtColor8u::Size_CvtMode::(640x480, COLOR_GRAY2BGR555) 0.056 0.022 2.49
cvtColor8u::Size_CvtMode::(640x480, COLOR_GRAY2BGR565) 0.035 0.025 1.38
cvtColor8u::Size_CvtMode::(640x480, COLOR_GRAY2BGR) 0.050 0.039 1.27
cvtColor8u::Size_CvtMode::(640x480, COLOR_GRAY2BGRA) 0.065 0.054 1.20
cvtColor8u::Size_CvtMode::(640x480, COLOR_RGB2BGR555) 0.496 0.072 6.89
cvtColor8u::Size_CvtMode::(640x480, COLOR_RGB2BGR565) 0.493 0.065 7.55
cvtColor8u::Size_CvtMode::(640x480, COLOR_RGB2GRAY) 0.255 0.109 2.34
cvtColor8u::Size_CvtMode::(640x480, COLOR_RGBA2BGR555) 0.113 0.087 1.30
cvtColor8u::Size_CvtMode::(640x480, COLOR_RGBA2BGR565) 0.098 0.081 1.22
cvtColor8u::Size_CvtMode::(640x480, COLOR_RGBA2GRAY) 0.256 0.117 2.19
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR2BGR555) 3.316 0.552 6.01
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR2BGR565) 3.318 0.509 6.52
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR2GRAY) 1.638 0.659 2.49
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5552BGR) 1.821 0.533 3.41
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5552BGRA) 2.234 0.612 3.65
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5552GRAY) 0.615 0.600 1.03
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5552RGB) 1.942 0.550 3.53
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5552RGBA) 2.265 0.614 3.69
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5652BGR) 1.765 0.516 3.42
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5652BGRA) 2.233 0.584 3.82
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5652GRAY) 0.587 0.582 1.01
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5652RGB) 1.933 0.509 3.80
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5652RGBA) 2.259 0.598 3.78
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGRA2BGR555) 0.746 0.709 1.05
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGRA2BGR565) 0.670 0.691 0.97
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGRA2GRAY) 1.650 0.762 2.17
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_GRAY2BGR555) 0.357 0.241 1.48
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_GRAY2BGR565) 0.265 0.242 1.10
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_GRAY2BGR) 0.370 0.333 1.11
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_GRAY2BGRA) 0.483 0.435 1.11
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_RGB2BGR555) 3.332 0.550 6.06
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_RGB2BGR565) 3.329 0.512 6.51
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_RGB2GRAY) 1.624 0.680 2.39
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_RGBA2BGR555) 0.751 0.715 1.05
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_RGBA2BGR565) 0.678 0.706 0.96
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_RGBA2GRAY) 1.647 0.785 2.10
Baseline SSE2
Columns: Name of Test | rgb initial sse2 | rgb wide sse2 | rgb wide sse2 vs rgb initial sse2 (x-factor)
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR2BGR555) 0.011 0.004 2.60
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR2BGR565) 0.011 0.004 2.69
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR2GRAY) 0.006 0.006 1.01
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5552BGR) 0.007 0.005 1.51
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5552BGRA) 0.008 0.003 2.34
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5552GRAY) 0.003 0.004 0.69
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5552RGB) 0.007 0.005 1.53
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5552RGBA) 0.008 0.003 2.41
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5652BGR) 0.007 0.004 1.62
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5652BGRA) 0.008 0.003 2.97
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5652GRAY) 0.003 0.004 0.67
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5652RGB) 0.007 0.004 1.59
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5652RGBA) 0.008 0.003 2.88
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGRA2BGR555) 0.004 0.004 1.20
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGRA2BGR565) 0.003 0.004 0.88
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGRA2GRAY) 0.006 0.005 1.13
cvtColor8u::Size_CvtMode::(127x61, COLOR_GRAY2BGR555) 0.002 0.001 1.79
cvtColor8u::Size_CvtMode::(127x61, COLOR_GRAY2BGR565) 0.003 0.001 2.04
cvtColor8u::Size_CvtMode::(127x61, COLOR_GRAY2BGR) 0.007 0.004 1.85
cvtColor8u::Size_CvtMode::(127x61, COLOR_GRAY2BGRA) 0.002 0.002 1.09
cvtColor8u::Size_CvtMode::(127x61, COLOR_RGB2BGR555) 0.011 0.004 2.71
cvtColor8u::Size_CvtMode::(127x61, COLOR_RGB2BGR565) 0.012 0.004 2.88
cvtColor8u::Size_CvtMode::(127x61, COLOR_RGB2GRAY) 0.006 0.006 1.03
cvtColor8u::Size_CvtMode::(127x61, COLOR_RGBA2BGR555) 0.004 0.004 1.20
cvtColor8u::Size_CvtMode::(127x61, COLOR_RGBA2BGR565) 0.003 0.004 0.92
cvtColor8u::Size_CvtMode::(127x61, COLOR_RGBA2GRAY) 0.006 0.006 1.13
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR2BGR555) 0.468 0.141 3.32
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR2BGR565) 0.466 0.143 3.25
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR2GRAY) 0.236 0.208 1.13
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5552BGR) 0.271 0.165 1.64
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5552BGRA) 0.332 0.101 3.29
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5552GRAY) 0.088 0.150 0.58
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5552RGB) 0.281 0.166 1.69
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5552RGBA) 0.331 0.105 3.16
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5652BGR) 0.260 0.171 1.52
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5652BGRA) 0.329 0.090 3.65
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5652GRAY) 0.085 0.153 0.56
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5652RGB) 0.284 0.172 1.65
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5652RGBA) 0.330 0.089 3.72
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGRA2BGR555) 0.141 0.112 1.25
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGRA2BGR565) 0.107 0.113 0.94
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGRA2GRAY) 0.232 0.190 1.22
cvtColor8u::Size_CvtMode::(640x480, COLOR_GRAY2BGR555) 0.059 0.037 1.60
cvtColor8u::Size_CvtMode::(640x480, COLOR_GRAY2BGR565) 0.038 0.037 1.03
cvtColor8u::Size_CvtMode::(640x480, COLOR_GRAY2BGR) 0.279 0.152 1.84
cvtColor8u::Size_CvtMode::(640x480, COLOR_GRAY2BGRA) 0.051 0.049 1.05
cvtColor8u::Size_CvtMode::(640x480, COLOR_RGB2BGR555) 0.454 0.150 3.02
cvtColor8u::Size_CvtMode::(640x480, COLOR_RGB2BGR565) 0.458 0.148 3.09
cvtColor8u::Size_CvtMode::(640x480, COLOR_RGB2GRAY) 0.227 0.226 1.00
cvtColor8u::Size_CvtMode::(640x480, COLOR_RGBA2BGR555) 0.148 0.120 1.24
cvtColor8u::Size_CvtMode::(640x480, COLOR_RGBA2BGR565) 0.113 0.114 0.99
cvtColor8u::Size_CvtMode::(640x480, COLOR_RGBA2GRAY) 0.229 0.195 1.17
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR2BGR555) 3.046 1.055 2.89
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR2BGR565) 3.076 1.034 2.97
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR2GRAY) 1.561 1.458 1.07
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5552BGR) 1.784 1.153 1.55
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5552BGRA) 2.282 0.741 3.08
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5552GRAY) 0.607 1.033 0.59
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5552RGB) 1.960 1.167 1.68
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5552RGBA) 2.283 0.748 3.05
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5652BGR) 1.807 1.122 1.61
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5652BGRA) 2.318 0.632 3.67
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5652GRAY) 0.574 0.965 0.60
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5652RGB) 1.907 1.070 1.78
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5652RGBA) 2.267 0.646 3.51
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGRA2BGR555) 1.064 0.827 1.29
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGRA2BGR565) 0.866 0.821 1.05
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGRA2GRAY) 1.564 1.295 1.21
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_GRAY2BGR555) 0.377 0.240 1.57
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_GRAY2BGR565) 0.255 0.249 1.02
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_GRAY2BGR) 1.917 0.968 1.98
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_GRAY2BGRA) 0.417 0.415 1.00
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_RGB2BGR555) 3.061 1.002 3.05
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_RGB2BGR565) 3.053 0.996 3.06
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_RGB2GRAY) 1.575 1.413 1.11
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_RGBA2BGR555) 1.110 0.830 1.34
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_RGBA2BGR565) 0.858 0.819 1.05
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_RGBA2GRAY) 1.593 1.341 1.19
Baseline SSE4.1
Columns: Name of Test | rgb initial sse41 | rgb wide sse41 | rgb wide sse41 vs rgb initial sse41 (x-factor)
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR2BGR555) 0.009 0.003 2.88
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR2BGR565) 0.009 0.003 3.25
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR2GRAY) 0.006 0.004 1.61
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5552BGR) 0.007 0.003 1.92
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5552BGRA) 0.008 0.003 2.33
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5552GRAY) 0.003 0.004 0.75
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5552RGB) 0.007 0.004 1.98
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5552RGBA) 0.008 0.004 2.26
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5652BGR) 0.007 0.003 2.26
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5652BGRA) 0.008 0.003 2.71
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5652GRAY) 0.003 0.003 0.75
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5652RGB) 0.007 0.003 2.20
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGR5652RGBA) 0.008 0.003 2.62
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGRA2BGR555) 0.004 0.004 1.16
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGRA2BGR565) 0.003 0.004 0.95
cvtColor8u::Size_CvtMode::(127x61, COLOR_BGRA2GRAY) 0.006 0.004 1.42
cvtColor8u::Size_CvtMode::(127x61, COLOR_GRAY2BGR555) 0.002 0.001 1.68
cvtColor8u::Size_CvtMode::(127x61, COLOR_GRAY2BGR565) 0.002 0.001 1.25
cvtColor8u::Size_CvtMode::(127x61, COLOR_GRAY2BGR) 0.001 0.001 1.00
cvtColor8u::Size_CvtMode::(127x61, COLOR_GRAY2BGRA) 0.002 0.002 1.22
cvtColor8u::Size_CvtMode::(127x61, COLOR_RGB2BGR555) 0.010 0.003 3.14
cvtColor8u::Size_CvtMode::(127x61, COLOR_RGB2BGR565) 0.009 0.003 3.29
cvtColor8u::Size_CvtMode::(127x61, COLOR_RGB2GRAY) 0.006 0.004 1.68
cvtColor8u::Size_CvtMode::(127x61, COLOR_RGBA2BGR555) 0.004 0.004 1.16
cvtColor8u::Size_CvtMode::(127x61, COLOR_RGBA2BGR565) 0.003 0.004 0.90
cvtColor8u::Size_CvtMode::(127x61, COLOR_RGBA2GRAY) 0.006 0.004 1.35
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR2BGR555) 0.353 0.092 3.82
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR2BGR565) 0.353 0.076 4.66
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR2GRAY) 0.220 0.109 2.01
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5552BGR) 0.253 0.110 2.30
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5552BGRA) 0.311 0.101 3.09
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5552GRAY) 0.085 0.115 0.74
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5552RGB) 0.272 0.116 2.34
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5552RGBA) 0.311 0.102 3.04
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5652BGR) 0.253 0.100 2.54
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5652BGRA) 0.323 0.088 3.68
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5652GRAY) 0.083 0.106 0.78
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5652RGB) 0.283 0.098 2.88
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGR5652RGBA) 0.343 0.090 3.80
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGRA2BGR555) 0.138 0.112 1.23
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGRA2BGR565) 0.114 0.114 1.00
cvtColor8u::Size_CvtMode::(640x480, COLOR_BGRA2GRAY) 0.222 0.158 1.40
cvtColor8u::Size_CvtMode::(640x480, COLOR_GRAY2BGR555) 0.058 0.040 1.46
cvtColor8u::Size_CvtMode::(640x480, COLOR_GRAY2BGR565) 0.036 0.042 0.87
cvtColor8u::Size_CvtMode::(640x480, COLOR_GRAY2BGR) 0.035 0.036 0.96
cvtColor8u::Size_CvtMode::(640x480, COLOR_GRAY2BGRA) 0.048 0.050 0.95
cvtColor8u::Size_CvtMode::(640x480, COLOR_RGB2BGR555) 0.351 0.092 3.83
cvtColor8u::Size_CvtMode::(640x480, COLOR_RGB2BGR565) 0.355 0.083 4.27
cvtColor8u::Size_CvtMode::(640x480, COLOR_RGB2GRAY) 0.217 0.111 1.96
cvtColor8u::Size_CvtMode::(640x480, COLOR_RGBA2BGR555) 0.140 0.112 1.25
cvtColor8u::Size_CvtMode::(640x480, COLOR_RGBA2BGR565) 0.117 0.116 1.01
cvtColor8u::Size_CvtMode::(640x480, COLOR_RGBA2GRAY) 0.221 0.158 1.40
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR2BGR555) 2.414 0.652 3.70
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR2BGR565) 2.428 0.576 4.21
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR2GRAY) 1.514 0.761 1.99
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5552BGR) 1.729 0.768 2.25
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5552BGRA) 2.202 0.734 3.00
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5552GRAY) 0.556 0.792 0.70
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5552RGB) 1.874 0.781 2.40
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5552RGBA) 2.204 0.757 2.91
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5652BGR) 1.708 0.673 2.54
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5652BGRA) 2.164 0.648 3.34
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5652GRAY) 0.531 0.719 0.74
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5652RGB) 1.846 0.668 2.76
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGR5652RGBA) 2.137 0.655 3.26
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGRA2BGR555) 0.962 0.877 1.10
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGRA2BGR565) 0.825 0.876 0.94
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_BGRA2GRAY) 1.498 1.062 1.41
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_GRAY2BGR555) 0.359 0.267 1.34
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_GRAY2BGR565) 0.243 0.279 0.87
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_GRAY2BGR) 0.318 0.317 1.00
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_GRAY2BGRA) 0.404 0.407 0.99
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_RGB2BGR555) 2.392 0.658 3.64
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_RGB2BGR565) 2.395 0.563 4.25
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_RGB2GRAY) 1.503 0.740 2.03
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_RGBA2BGR555) 1.026 0.866 1.18
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_RGBA2BGR565) 0.856 0.867 0.99
cvtColor8u::Size_CvtMode::(1920x1080, COLOR_RGBA2GRAY) 1.516 1.062 1.43
Not covered

Several conversions are not covered by perf tests, so their performance was estimated with a custom test: a random 511x511 image, time measured over a loop of 20000 identical calls, averaged over 5 runs.
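
The custom test itself is not included in the PR. A harness of the kind described could look roughly like the sketch below, which uses only the standard library. `make_random_image` and `mean_run_seconds` are hypothetical names, and the conversion under test is passed in as a callable.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>
#include <vector>

// Fill a width x height x channels buffer with pseudo-random 8-bit values.
std::vector<uint8_t> make_random_image(int width, int height, int channels, unsigned seed = 0)
{
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> dist(0, 255);
    std::vector<uint8_t> img(static_cast<std::size_t>(width) * height * channels);
    for (auto& px : img)
        px = static_cast<uint8_t>(dist(rng));
    return img;
}

// Time `calls_per_run` identical calls, repeat `runs` times, return the mean run time in seconds.
double mean_run_seconds(const std::function<void()>& call, int calls_per_run, int runs)
{
    using clock = std::chrono::steady_clock;
    double total = 0.0;
    for (int r = 0; r < runs; ++r)
    {
        const auto start = clock::now();
        for (int i = 0; i < calls_per_run; ++i)
            call();
        total += std::chrono::duration<double>(clock::now() - start).count();
    }
    return total / runs;
}
```

With the parameters quoted above, one would build the input with `make_random_image(511, 511, 3)` and call `mean_run_seconds(conversion, 20000, 5)` once per build, then compare the two means.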

The resulting improvement depends on the baseline and the number of channels but doesn't differ significantly:

  • RGB2Gray<float>:
    • 1.1x to 1.2x
  • RGB5x52Gray:
    • unchanged, except on SSE2, where it is 0.7x
  • RGB2Gray<ushort>:
    • 1.0x to 1.2x, except on SSE2, where it is 1.3x to 1.8x
  • RGBA2mRGBA<uchar>:
    • from 2x on AVX512SKX to 3.7x on SSE2
  • mRGBA2RGBA<uchar>:
    • from 2.7x on SSE2 to 5.3x on AVX2
#disable_ipp=ON
force_builders=Linux AVX2

#if CV_SIMD
const int vsize = v_uint8::nlanes;
v_uint8 vz = vx_setzero_u8(), vn0 = vx_setall_u8(255);
for(; i < n-vsize+1;
Member


i < n-vsize+1

Other code uses this form for vectorized loops:
i <= n - vsize
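
For signed `int` indices the two bounds visit the same blocks, so the remark is about consistency rather than behavior. The sketch below illustrates that, together with the usual main-loop-plus-scalar-tail shape. It is plain C++ with no OpenCV calls: `kBlock` is a hypothetical stand-in for `v_uint8::nlanes`, and the inner loop stands in for one vector load/store.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kBlock = 16;  // hypothetical stand-in for v_uint8::nlanes

// Count full blocks using the bound as written in the reviewed snippet.
int blocks_lt_form(int n)
{
    int count = 0;
    for (int i = 0; i < n - kBlock + 1; i += kBlock)
        ++count;
    return count;
}

// Count full blocks using the form the reviewer points to.
int blocks_le_form(int n)
{
    int count = 0;
    for (int i = 0; i <= n - kBlock; i += kBlock)
        ++count;
    return count;
}

// Gray to 3-channel copy: full blocks first, then a scalar tail for the remainder.
std::vector<uint8_t> gray_to_bgr(const std::vector<uint8_t>& gray)
{
    const int n = static_cast<int>(gray.size());
    std::vector<uint8_t> bgr(gray.size() * 3);
    int i = 0;
    for (; i <= n - kBlock; i += kBlock)   // main loop: only complete blocks
        for (int k = 0; k < kBlock; ++k)   // stands in for one SIMD load/store
        {
            const std::size_t p = static_cast<std::size_t>(i + k);
            bgr[3 * p] = bgr[3 * p + 1] = bgr[3 * p + 2] = gray[p];
        }
    for (; i < n; ++i)                     // scalar tail: fewer than kBlock pixels left
    {
        const std::size_t p = static_cast<std::size_t>(i);
        bgr[3 * p] = bgr[3 * p + 1] = bgr[3 * p + 2] = gray[p];
    }
    return bgr;
}
```

Either bound relies on `n` being signed: with an unsigned `n` smaller than the block size, `n - vsize` would wrap around to a huge value.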

Member

@alalek alalek left a comment


Need to fix ARM builds.


if( gb == 6 )
{
g0 = ((t0 >> 5) << 10) >> 8;
Contributor


Wouldn't one bit-mask be faster than two shifts?

Contributor Author

@savuor savuor Dec 11, 2018


I did exactly that in the beginning, then decided to use shifts instead, since masking requires one extra register for the mask.
I can't tell which version is faster: their performance differs by less than the variation between two runs of identical code.
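
To make the two alternatives concrete, here is a scalar sketch of extracting the 6-bit green field of a 565 pixel and placing it in bits 7..2 of the result. It assumes the snippet above operates on 16-bit lanes, so each intermediate is truncated to 16 bits here. The function names are hypothetical, and the masked form is one possible rewrite rather than code taken from the PR.

```cpp
#include <cstdint>

// Shift-only form, mirroring ((t0 >> 5) << 10) >> 8 on a 16-bit lane.
inline uint16_t green565_shifts(uint16_t t0)
{
    const uint16_t a = static_cast<uint16_t>(t0 >> 5);   // drop the 5 low (blue) bits
    const uint16_t b = static_cast<uint16_t>(a << 10);   // green moves to bits 15..10, red falls off
    return static_cast<uint16_t>(b >> 8);                // green ends up in bits 7..2
}

// One shift plus one mask; the constant 0xFC is what costs an extra register in SIMD code.
inline uint16_t green565_mask(uint16_t t0)
{
    return static_cast<uint16_t>((t0 >> 3) & 0xFC);
}

// Exhaustive check over all 16-bit inputs.
inline bool green565_forms_agree()
{
    for (uint32_t v = 0; v <= 0xFFFFu; ++v)
        if (green565_shifts(static_cast<uint16_t>(v)) != green565_mask(static_cast<uint16_t>(v)))
            return false;
    return true;
}
```

Both forms give the same result for every input, so the choice comes down to instruction count versus register pressure, which matches the observation that the measured difference is within run-to-run noise.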

Contributor Author

savuor commented Dec 13, 2018

@alalek ARM builds fixed

@alalek alalek merged commit d99a4af into opencv:3.4 Dec 14, 2018
@savuor savuor deleted the color_5x5 branch December 14, 2018 14:34
@alalek alalek mentioned this pull request Dec 18, 2018