imgproc: medianblur: Performance improvement #27299

Merged: asmorkalov merged 3 commits into opencv:4.x from amd:fast_medianblur_simd on May 19, 2025

@madanm3 madanm3 commented May 12, 2025

  • Bottleneck in non-vectorized path reduced.
  • AVX512 dispatch added for medianblur.
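
For readers unfamiliar with why median filtering vectorizes well, here is a minimal sketch of a 3x3 median built only from `min`/`max` operations, which have direct SIMD equivalents. It is not the code from this patch: the helper names `med3` and `median9` are hypothetical, and the column-sort formulation is just one well-known way to compute a median of nine.

```cpp
#include <algorithm>
#include <array>

// Median of three values using only min/max (no data-dependent branches).
template <typename T>
T med3(T a, T b, T c) {
    return std::max(std::min(a, b), std::min(std::max(a, b), c));
}

// Median of a 3x3 window stored row-major in w[0..8].
// Hypothetical helper for illustration; not taken from the OpenCV sources.
template <typename T>
T median9(const std::array<T, 9>& w) {
    T lo[3], mid[3], hi[3];
    // Sort each column of the window into (lo, mid, hi).
    for (int c = 0; c < 3; ++c) {
        T a = w[c], b = w[3 + c], d = w[6 + c];
        lo[c] = std::min(a, std::min(b, d));
        hi[c] = std::max(a, std::max(b, d));
        mid[c] = med3(a, b, d);
    }
    // The median of all nine values is the median of:
    // the largest column minimum, the median of column medians,
    // and the smallest column maximum.
    T max_lo = std::max(lo[0], std::max(lo[1], lo[2]));
    T min_hi = std::min(hi[0], std::min(hi[1], hi[2]));
    T med_mid = med3(mid[0], mid[1], mid[2]);
    return med3(max_lo, med_mid, min_hi);
}
```

Because every step is a `min` or `max`, the same sequence can in principle be applied to a whole vector register of pixels at once, so wider registers process more pixels per instruction.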

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@asmorkalov
Contributor

@fengyuentau Could you take a look?

@asmorkalov asmorkalov added this to the 4.12.0 milestone May 13, 2025

madanm3 commented May 13, 2025

Details on performance gain with AVX512:

Name of test                                          base       patch       base/patch
medianBlur::Size_MatType_kSize::(127x61, 16SC1, 3)    0.01 ms    0.00 ms     INF x
medianBlur::Size_MatType_kSize::(127x61, 16SC1, 5)    0.04 ms    0.02 ms     2.0 x
medianBlur::Size_MatType_kSize::(127x61, 16UC1, 3)    0.01 ms    0.00 ms     INF x
medianBlur::Size_MatType_kSize::(127x61, 16UC1, 5)    0.04 ms    0.02 ms     2.0 x
medianBlur::Size_MatType_kSize::(127x61, 32FC1, 3)    0.00 ms    0.00 ms     INF x
medianBlur::Size_MatType_kSize::(127x61, 32FC1, 5)    0.04 ms    0.03 ms     1.3 x
medianBlur::Size_MatType_kSize::(127x61, 8UC1, 3)     0.02 ms    0.08 ms     0.3 x
medianBlur::Size_MatType_kSize::(127x61, 8UC1, 5)     0.15 ms    0.60 ms     0.3 x
medianBlur::Size_MatType_kSize::(127x61, 8UC4, 3)     0.02 ms    0.34 ms     0.1 x
medianBlur::Size_MatType_kSize::(127x61, 8UC4, 5)     0.14 ms    2.39 ms     0.1 x
medianBlur::Size_MatType_kSize::(1280x720, 16SC1, 3)  0.41 ms    0.15 ms     2.7 x
medianBlur::Size_MatType_kSize::(1280x720, 16SC1, 5)  2.80 ms    1.20 ms     2.3 x
medianBlur::Size_MatType_kSize::(1280x720, 16UC1, 3)  0.37 ms    0.16 ms     2.3 x
medianBlur::Size_MatType_kSize::(1280x720, 16UC1, 5)  2.78 ms    1.20 ms     2.3 x
medianBlur::Size_MatType_kSize::(1280x720, 32FC1, 3)  0.45 ms    0.44 ms     1.0 x
medianBlur::Size_MatType_kSize::(1280x720, 32FC1, 5)  4.14 ms    3.25 ms     1.3 x
medianBlur::Size_MatType_kSize::(1280x720, 8UC1, 3)   0.34 ms    0.09 ms     3.8 x
medianBlur::Size_MatType_kSize::(1280x720, 8UC1, 5)   2.09 ms    0.61 ms     3.4 x
medianBlur::Size_MatType_kSize::(1280x720, 8UC4, 3)   0.57 ms    0.34 ms     1.7 x
medianBlur::Size_MatType_kSize::(1280x720, 8UC4, 5)   3.20 ms    2.41 ms     1.3 x
medianBlur::Size_MatType_kSize::(320x240, 16SC1, 3)   0.03 ms    0.01 ms     3.0 x
medianBlur::Size_MatType_kSize::(320x240, 16SC1, 5)   0.74 ms    0.10 ms     7.4 x
medianBlur::Size_MatType_kSize::(320x240, 16UC1, 3)   0.03 ms    0.01 ms     3.0 x
medianBlur::Size_MatType_kSize::(320x240, 16UC1, 5)   0.73 ms    0.10 ms     7.3 x
medianBlur::Size_MatType_kSize::(320x240, 32FC1, 3)   0.04 ms    0.03 ms     1.3 x
medianBlur::Size_MatType_kSize::(320x240, 32FC1, 5)   0.67 ms    0.26 ms     2.6 x
medianBlur::Size_MatType_kSize::(320x240, 8UC1, 3)    0.09 ms    0.01 ms     9.0 x
medianBlur::Size_MatType_kSize::(320x240, 8UC1, 5)    0.60 ms    0.11 ms     5.5 x
medianBlur::Size_MatType_kSize::(320x240, 8UC4, 3)    0.11 ms    0.04 ms     2.8 x
medianBlur::Size_MatType_kSize::(320x240, 8UC4, 5)    0.69 ms    0.42 ms     1.6 x
medianBlur::Size_MatType_kSize::(640x480, 16SC1, 3)   0.09 ms    0.05 ms     1.8 x
medianBlur::Size_MatType_kSize::(640x480, 16SC1, 5)   1.61 ms    0.53 ms     3.0 x
medianBlur::Size_MatType_kSize::(640x480, 16UC1, 3)   0.09 ms    0.05 ms     1.8 x
medianBlur::Size_MatType_kSize::(640x480, 16UC1, 5)   1.60 ms    0.54 ms     3.0 x
medianBlur::Size_MatType_kSize::(640x480, 32FC1, 3)   0.15 ms    0.14 ms     1.1 x
medianBlur::Size_MatType_kSize::(640x480, 32FC1, 5)   1.83 ms    1.25 ms     1.5 x
medianBlur::Size_MatType_kSize::(640x480, 8UC1, 3)    0.20 ms    0.04 ms     5.0 x
medianBlur::Size_MatType_kSize::(640x480, 8UC1, 5)    1.27 ms    0.28 ms     4.5 x
medianBlur::Size_MatType_kSize::(640x480, 8UC4, 3)    0.28 ms    0.13 ms     2.2 x
medianBlur::Size_MatType_kSize::(640x480, 8UC4, 5)    1.64 ms    1.10 ms     1.5 x
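
As a reading aid for the `base/patch` column, the following sketch shows how such a ratio can be derived and why some rows print `INF`. The function name `speedup` is hypothetical, and it assumes the ratio is simply base time divided by patched time; the actual report tool presumably works from unrounded timings, so results computed from the printed values can differ slightly in the last digit.

```cpp
#include <limits>

// Ratio in the "base/patch" column: above 1.0 the patch is faster,
// below 1.0 the patch is slower than the baseline.
// Hypothetical helper; inputs are the millisecond values as printed.
double speedup(double base_ms, double patch_ms) {
    if (patch_ms == 0.0) {
        // A patched time that rounds to 0.00 ms leaves the ratio unbounded,
        // which the table shows as INF.
        return std::numeric_limits<double>::infinity();
    }
    return base_ms / patch_ms;
}
```

Read this way, the `INF` rows are timings too small to resolve at two decimal places rather than infinite gains, and the 127x61 `8UC1` and `8UC4` rows in this table have ratios below 1.0.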

@fengyuentau fengyuentau left a comment

Could you also show the performance of this patch on other intrinsic sets, e.g. avx2?
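
The question matters because the kernel that actually runs depends on which instruction sets the CPU supports. A minimal sketch of that idea follows, with entirely hypothetical names (`CpuFeatures`, `select_kernel`, and the three kernel stubs); OpenCV has its own CPU dispatch machinery, which this does not reproduce.

```cpp
// Hypothetical CPU feature flags; a real build would query the CPU at runtime.
struct CpuFeatures {
    bool avx2;
    bool avx512;
};

// Each stub stands in for a median-blur kernel compiled for one instruction set.
using KernelFn = const char* (*)();

const char* kernel_baseline() { return "baseline"; }
const char* kernel_avx2() { return "avx2"; }
const char* kernel_avx512() { return "avx512"; }

// Pick the widest kernel the CPU supports, falling back to narrower ones.
KernelFn select_kernel(const CpuFeatures& cpu) {
    if (cpu.avx512) return kernel_avx512;
    if (cpu.avx2) return kernel_avx2;
    return kernel_baseline;
}
```

Under this scheme a machine with AVX2 but no AVX512 would take the AVX2 path, so benchmarking each instruction set separately shows whether the gain depends on the newly added dispatch.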

madanm3 commented May 13, 2025

> Could you also show the performance of this patch on other intrinsic sets, e.g. avx2?

AVX2:
Name of test                                          base       patch        base/patch
medianBlur::Size_MatType_kSize::(127x61, 16SC1, 3)    0.01 ms    0.00 ms      INF x
medianBlur::Size_MatType_kSize::(127x61, 16SC1, 5)    0.04 ms    0.02 ms      2.0 x
medianBlur::Size_MatType_kSize::(127x61, 16UC1, 3)    0.01 ms    0.00 ms      INF x
medianBlur::Size_MatType_kSize::(127x61, 16UC1, 5)    0.04 ms    0.02 ms      2.0 x
medianBlur::Size_MatType_kSize::(127x61, 32FC1, 3)    0.00 ms    0.00 ms      INF x
medianBlur::Size_MatType_kSize::(127x61, 32FC1, 5)    0.04 ms    0.03 ms      1.3 x
medianBlur::Size_MatType_kSize::(127x61, 8UC1, 3)     0.02 ms    0.00 ms      INF x
medianBlur::Size_MatType_kSize::(127x61, 8UC1, 5)     0.15 ms    0.02 ms      7.5 x
medianBlur::Size_MatType_kSize::(127x61, 8UC4, 3)     0.02 ms    0.01 ms      2.0 x
medianBlur::Size_MatType_kSize::(127x61, 8UC4, 5)     0.14 ms    0.09 ms      1.6 x
medianBlur::Size_MatType_kSize::(1280x720, 16SC1, 3)  0.42 ms    0.17 ms      2.5 x
medianBlur::Size_MatType_kSize::(1280x720, 16SC1, 5)  2.79 ms    1.27 ms      2.2 x
medianBlur::Size_MatType_kSize::(1280x720, 16UC1, 3)  0.38 ms    0.17 ms      2.2 x
medianBlur::Size_MatType_kSize::(1280x720, 16UC1, 5)  2.80 ms    1.31 ms      2.1 x
medianBlur::Size_MatType_kSize::(1280x720, 32FC1, 3)  0.47 ms    0.44 ms      1.1 x
medianBlur::Size_MatType_kSize::(1280x720, 32FC1, 5)  4.15 ms    3.43 ms      1.2 x
medianBlur::Size_MatType_kSize::(1280x720, 8UC1, 3)   0.34 ms    0.10 ms      3.4 x
medianBlur::Size_MatType_kSize::(1280x720, 8UC1, 5)   2.09 ms    0.63 ms      3.3 x
medianBlur::Size_MatType_kSize::(1280x720, 8UC4, 3)   0.57 ms    0.37 ms      1.5 x
medianBlur::Size_MatType_kSize::(1280x720, 8UC4, 5)   3.20 ms    2.51 ms      1.3 x
medianBlur::Size_MatType_kSize::(320x240, 16SC1, 3)   0.03 ms    0.01 ms      3.0 x
medianBlur::Size_MatType_kSize::(320x240, 16SC1, 5)   0.74 ms    0.10 ms      7.4 x
medianBlur::Size_MatType_kSize::(320x240, 16UC1, 3)   0.03 ms    0.01 ms      3.0 x
medianBlur::Size_MatType_kSize::(320x240, 16UC1, 5)   0.74 ms    0.10 ms      7.4 x
medianBlur::Size_MatType_kSize::(320x240, 32FC1, 3)   0.04 ms    0.04 ms      1.0 x
medianBlur::Size_MatType_kSize::(320x240, 32FC1, 5)   0.67 ms    0.26 ms      2.6 x
medianBlur::Size_MatType_kSize::(320x240, 8UC1, 3)    0.09 ms    0.01 ms      9.0 x
medianBlur::Size_MatType_kSize::(320x240, 8UC1, 5)    0.60 ms    0.11 ms      5.5 x
medianBlur::Size_MatType_kSize::(320x240, 8UC4, 3)    0.11 ms    0.05 ms      2.2 x
medianBlur::Size_MatType_kSize::(320x240, 8UC4, 5)    0.69 ms    0.43 ms      1.6 x
medianBlur::Size_MatType_kSize::(640x480, 16SC1, 3)   0.09 ms    0.06 ms      1.5 x
medianBlur::Size_MatType_kSize::(640x480, 16SC1, 5)   1.61 ms    0.57 ms      2.8 x
medianBlur::Size_MatType_kSize::(640x480, 16UC1, 3)   0.08 ms    0.06 ms      1.3 x
medianBlur::Size_MatType_kSize::(640x480, 16UC1, 5)   1.62 ms    0.57 ms      2.8 x
medianBlur::Size_MatType_kSize::(640x480, 32FC1, 3)   0.15 ms    0.14 ms      1.1 x
medianBlur::Size_MatType_kSize::(640x480, 32FC1, 5)   1.85 ms    1.34 ms      1.4 x
medianBlur::Size_MatType_kSize::(640x480, 8UC1, 3)    0.20 ms    0.04 ms      5.0 x
medianBlur::Size_MatType_kSize::(640x480, 8UC1, 5)    1.27 ms    0.29 ms      4.4 x
medianBlur::Size_MatType_kSize::(640x480, 8UC4, 3)    0.28 ms    0.14 ms      2.0 x
medianBlur::Size_MatType_kSize::(640x480, 8UC4, 5)    1.64 ms    1.12 ms      1.5 x

@fengyuentau fengyuentau left a comment

LGTM 👍

@asmorkalov asmorkalov merged commit 84ea77a into opencv:4.x May 19, 2025
28 checks passed
@asmorkalov asmorkalov mentioned this pull request May 27, 2025