support SIMD for larger symmetric Bit-exact 16U gaussian blur #18983

Merged
alalek merged 2 commits into opencv:3.4 from Yosshi999:bitexact-gaussian-16U-faster
Dec 11, 2020

@Yosshi999
Contributor

@Yosshi999 Yosshi999 commented Dec 1, 2020

perf test

env:
Parallel framework: pthreads (nthreads=8)
CPU features: SSE SSE2 SSE3 *SSE4.1 *SSE4.2 *FP16 *AVX *AVX2 *AVX512-SKX?

Measured performance of GaussianBlur(src, dst, Size(7,7), 0, 0, btype).
The existing perf tests (3x3 and 5x5) show no improvement because those kernel sizes were already optimized in 698b2bf.

| name | image size, type, btype | mean (usec) in 3.4 | mean (usec) in this PR |
|---|---|---|---|
| gaussianBlur7x7/0 | (640x480, 16UC1, BORDER_REPLICATE) | 358 | 100 |
| gaussianBlur7x7/1 | (640x480, 16UC1, BORDER_CONSTANT) | 351 | 97 |
| gaussianBlur7x7/2 | (640x480, 16UC1, BORDER_REFLECT) | 605 | 100 |
| gaussianBlur7x7/3 | (640x480, 16UC1, BORDER_REFLECT101) | 694 | 100 |
| gaussianBlur7x7/4 | (1280x720, 16UC1, BORDER_REPLICATE) | 2081 | 343 |
| gaussianBlur7x7/5 | (1280x720, 16UC1, BORDER_CONSTANT) | 1219 | 489 |
| gaussianBlur7x7/6 | (1280x720, 16UC1, BORDER_REFLECT) | 1221 | 549 |
| gaussianBlur7x7/7 | (1280x720, 16UC1, BORDER_REFLECT101) | 1453 | 422 |

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
  • The PR is proposed to proper branch
  • There is reference to original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake
force_builders=Linux AVX2,Custom
buildworker:Custom=linux-3
build_image:Custom=ubuntu:18.04
CPU_BASELINE:Custom=AVX512_SKX
disable_ipp=ON

@asmorkalov
Contributor

@terfendail Could you take a look at the solution?

@terfendail
Contributor

@Yosshi999 Could you please extend accuracy tests to 16U gaussian blur?

@Yosshi999
Contributor Author

@asmorkalov
Contributor

@terfendail Friendly reminder.

@alalek
Member

alalek commented Dec 10, 2020

Results from DISABLED_FULL/OCL_GaussianBlurFixture.GaussianBlur* perf tests.

Configuration: i5-6600, 1 thread, no IPP
Parameters: --gtest_also_run_disabled_tests --gtest_filter=*FULL*auss* --perf_threads=1

| Name of Test | base | patch | speedup |
|---|---|---|---|
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC1, 3) | 7.498 | 7.071 | 1.06 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC1, 5) | 12.050 | 13.087 | 0.92 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC1, 7) | 24.956 | 4.365 | 5.72 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC1, 9) | 31.764 | 5.215 | 6.09 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC1, 11) | 38.540 | 6.356 | 6.06 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC1, 13) | 44.926 | 7.654 | 5.87 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC1, 15) | 51.635 | 8.848 | 5.84 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC1, 17) | 58.032 | 9.443 | 6.15 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC1, 19) | 64.759 | 10.835 | 5.98 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC1, 21) | 71.598 | 12.189 | 5.87 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC1, 23) | 78.125 | 13.729 | 5.69 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC1, 25) | 85.296 | 15.249 | 5.59 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC1, 27) | 92.261 | 15.528 | 5.94 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC1, 29) | 99.694 | 16.858 | 5.91 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC2, 3) | 15.323 | 14.170 | 1.08 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC2, 5) | 23.935 | 26.356 | 0.91 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC2, 7) | 49.970 | 8.383 | 5.96 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC2, 9) | 63.157 | 10.710 | 5.90 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC2, 11) | 76.411 | 12.999 | 5.88 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC2, 13) | 89.464 | 15.747 | 5.68 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC2, 15) | 103.414 | 16.913 | 6.11 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC2, 17) | 116.021 | 19.377 | 5.99 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC2, 19) | 129.263 | 22.736 | 5.69 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC2, 21) | 143.478 | 25.887 | 5.54 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC2, 23) | 156.401 | 26.601 | 5.88 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC2, 25) | 170.235 | 29.110 | 5.85 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC2, 27) | 184.335 | 32.559 | 5.66 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC2, 29) | 197.766 | 35.903 | 5.51 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC3, 3) | 23.365 | 21.230 | 1.10 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC3, 5) | 35.730 | 39.332 | 0.91 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC3, 7) | 75.842 | 13.147 | 5.77 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC3, 9) | 95.262 | 16.609 | 5.74 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC3, 11) | 115.406 | 18.734 | 6.16 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC3, 13) | 135.004 | 22.408 | 6.02 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC3, 15) | 155.203 | 26.811 | 5.79 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC3, 17) | 175.163 | 30.711 | 5.70 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC3, 19) | 195.481 | 32.315 | 6.05 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC3, 21) | 215.616 | 35.873 | 6.01 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC3, 23) | 235.929 | 41.340 | 5.71 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC3, 25) | 255.522 | 45.972 | 5.56 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC3, 27) | 276.735 | 46.550 | 5.94 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC3, 29) | 297.379 | 50.300 | 5.91 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC4, 3) | 31.000 | 28.539 | 1.09 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC4, 5) | 47.669 | 52.243 | 0.91 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC4, 7) | 100.822 | 16.707 | 6.03 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC4, 9) | 126.625 | 21.203 | 5.97 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC4, 11) | 153.166 | 26.250 | 5.83 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC4, 13) | 180.047 | 31.758 | 5.67 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC4, 15) | 207.392 | 34.135 | 6.08 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC4, 17) | 237.175 | 40.819 | 5.81 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC4, 19) | 270.267 | 48.091 | 5.62 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC4, 21) | 302.194 | 53.799 | 5.62 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC4, 23) | 335.328 | 61.909 | 5.42 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC4, 25) | 368.538 | 66.779 | 5.52 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC4, 27) | 401.667 | 70.240 | 5.72 |
| GaussianBlur::DISABLED_FULL/OCL_GaussianBlurFixture::(1920x1080, 16UC4, 29) | 427.746 | 77.757 | 5.50 |

Contributor

@terfendail terfendail left a comment

The change looks fine to me.

@alalek
Member

alalek commented Dec 10, 2020

Any thoughts about 8-9% regressions with 5x5 kernel?

@Yosshi999
Contributor Author

> Any thoughts about 8-9% regressions with 5x5 kernel?

I have no idea. However, implementing a SIMD version of 5Nabcba for ufixedpoint32 may improve its speed.
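
As background, "5Nabcba" presumably refers to a 5-tap row smoothing routine whose coefficients are symmetric (a, b, c, b, a). The following is a minimal scalar sketch of that idea, not OpenCV's implementation: the function name, the Q16.16 weight format, and the replicate-border handling are all assumptions made for illustration. It shows how symmetry lets mirrored pixels be added before multiplying, so each output needs three multiplications instead of five, which is the structure a SIMD version would vectorize.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical Q16.16 fixed-point weight: real value = raw / 65536.
// This is an illustrative stand-in, not OpenCV's ufixedpoint32 type.
using FixedWeight = std::uint32_t;
constexpr int kFracBits = 16;

// Apply a symmetric 5-tap kernel (a, b, c, b, a) along one row of 16-bit
// pixels. Out-of-range neighbours are replicated from the nearest edge pixel.
std::vector<std::uint16_t> smoothRow5Symmetric(const std::vector<std::uint16_t>& src,
                                               FixedWeight a, FixedWeight b, FixedWeight c)
{
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(src.size());
    std::vector<std::uint16_t> dst(src.size());

    // Clamp the index into [0, n-1] and widen the pixel to 64 bits.
    auto at = [&](std::ptrdiff_t i) -> std::uint64_t {
        i = std::max<std::ptrdiff_t>(0, std::min(i, n - 1));
        return src[static_cast<std::size_t>(i)];
    };

    for (std::ptrdiff_t i = 0; i < n; ++i)
    {
        // Add the mirrored neighbours first, then multiply once per weight.
        std::uint64_t acc = c * at(i)
                          + b * (at(i - 1) + at(i + 1))
                          + a * (at(i - 2) + at(i + 2));
        acc += std::uint64_t(1) << (kFracBits - 1);        // round to nearest
        const std::uint64_t value = acc >> kFracBits;      // drop fractional bits
        dst[static_cast<std::size_t>(i)] =
            static_cast<std::uint16_t>(std::min<std::uint64_t>(value, 65535));
    }
    return dst;
}
```

For example, with the binomial kernel 1-4-6-4-1 divided by 16 (a = 4096, b = 16384, c = 24576 in this Q16.16 format), a constant row is returned unchanged and a single impulse of 1600 spreads to 100, 400, 600, 400, 100.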

@alalek merged commit fdeac73 into opencv:3.4 on Dec 11, 2020