
Median and Gaussian Filter optimizations #681

Merged
rrawther merged 20 commits into ROCm:develop from r-abishek:ar/opt_median_gaussian_filter on Mar 30, 2026


@r-abishek (Member)

This PR adds targeted optimizations to the Median and Gaussian filters so that their performance matches, and marginally exceeds, OpenCV's AVX2-level performance.

PERFORMANCE COMPARISONS FOR MEDIAN FILTER AND GAUSSIAN FILTER

[Images: performance comparisons for the Median Filter and Gaussian Filter]


Copilot AI left a comment


Pull request overview

This PR implements significant performance optimizations for the Median and Gaussian filters, aiming for performance comparable to OpenCV's AVX2 implementations. The changes introduce sorting networks for small kernels (3×3, 5×5), histogram-based methods for large kernels with U8 types, and various SIMD optimizations.
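To illustrate what a sorting-network median for a small kernel looks like, here is a minimal scalar sketch for the 3×3 case using the classic 19-exchange median-of-9 network. It is not the PR's code: the names `compareExchange` and `median3x3` are hypothetical, and the 5×5 network mentioned above is not shown. The point of a network is that the sequence of compare-exchanges is fixed and branch-free, so the same steps can be applied to SIMD registers.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Compare-exchange: afterwards a <= b. Uses std::min/std::max so there is no
// data-dependent branch; on SIMD registers this becomes one min + one max.
template <typename T>
inline void compareExchange(T &a, T &b)
{
    T lo = std::min(a, b);
    T hi = std::max(a, b);
    a = lo;
    b = hi;
}

// Median of a 3x3 window given as 9 values in row-major order.
// The window is taken by value because the network reorders it.
template <typename T>
T median3x3(std::array<T, 9> p)
{
    // Sort each row of three ascending: rows (0,1,2), (3,4,5), (6,7,8).
    compareExchange(p[1], p[2]); compareExchange(p[4], p[5]); compareExchange(p[7], p[8]);
    compareExchange(p[0], p[1]); compareExchange(p[3], p[4]); compareExchange(p[6], p[7]);
    compareExchange(p[1], p[2]); compareExchange(p[4], p[5]); compareExchange(p[7], p[8]);
    // p[6] <- max of row minima, p[2] <- min of row maxima, p[4] <- median of row medians.
    compareExchange(p[0], p[3]); compareExchange(p[5], p[8]); compareExchange(p[4], p[7]);
    compareExchange(p[3], p[6]); compareExchange(p[1], p[4]); compareExchange(p[2], p[5]);
    compareExchange(p[4], p[7]);
    // Median of those three candidates is the median of all nine.
    compareExchange(p[4], p[2]); compareExchange(p[6], p[4]); compareExchange(p[4], p[2]);
    return p[4];
}
```

Because every step is a min/max pair, an AVX2 version could apply the same 19 exchanges to whole registers, for example with `_mm256_min_epu8` / `_mm256_max_epu8` for U8 data, producing 32 medians at once. That mapping is an assumption about how such a vectorization can be done, not a description of the PR's exact intrinsics.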

Changes:

  • Median filter: Added specialized 5×5 sorting network implementations for HIP and CPU with AVX2 vectorization
  • Median filter: Introduced histogram-based median calculation for U8 types with large kernels (7×7, 9×9+)
  • Gaussian filter: Pre-computed filter coefficients, optimized kernel generation, added prefetching and loop unrolling
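For U8 data a median can be read from a 256-bin histogram instead of sorting, which pays off for large kernels. The sketch below assumes a Huang-style sliding histogram (build it once per row, then update it by removing the column that leaves the window and adding the column that enters). The source only says "histogram-based", so this particular scheme, the replicate-border handling, and the names `medianFromHistogram` and `medianRowHistogram` are assumptions for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Median of the `count` samples recorded in `hist` (count = k*k, k odd).
inline std::uint8_t medianFromHistogram(const std::array<int, 256> &hist, int count)
{
    const int target = count / 2 + 1;  // 1-based rank of the median
    int cumulative = 0;
    for (int v = 0; v < 256; ++v)
    {
        cumulative += hist[v];
        if (cumulative >= target)
            return static_cast<std::uint8_t>(v);
    }
    return 255;  // not reached if hist holds exactly `count` samples
}

// Median-filters output row `row` of a single-channel U8 image with a k x k
// kernel (k odd). Out-of-range coordinates are clamped (replicate border),
// which is an assumption of this sketch.
std::vector<std::uint8_t> medianRowHistogram(const std::vector<std::uint8_t> &src,
                                             int width, int height, int row, int k)
{
    const int r = k / 2;
    auto at = [&](int y, int x) {
        y = std::clamp(y, 0, height - 1);
        x = std::clamp(x, 0, width - 1);
        return src[static_cast<std::size_t>(y) * width + x];
    };

    // Full histogram for the first window (centred on column 0).
    std::array<int, 256> hist{};
    for (int dy = -r; dy <= r; ++dy)
        for (int dx = -r; dx <= r; ++dx)
            ++hist[at(row + dy, dx)];

    std::vector<std::uint8_t> out(width);
    out[0] = medianFromHistogram(hist, k * k);
    for (int x = 1; x < width; ++x)
    {
        // Slide right by one column: O(k) updates instead of O(k*k) re-counting.
        for (int dy = -r; dy <= r; ++dy)
        {
            --hist[at(row + dy, x - 1 - r)];  // column leaving the window
            ++hist[at(row + dy, x + r)];      // column entering the window
        }
        out[x] = medianFromHistogram(hist, k * k);
    }
    return out;
}
```

The per-pixel cost is O(k) histogram updates plus a scan of at most 256 bins, independent of k², which is why this approach suits 7×7, 9×9 and larger U8 kernels better than a sorting network.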

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| src/modules/tensor/hip/kernel/median_filter.cpp | Fixed double semicolon, added optimized 5×5 median implementation with sorting network, refactored compute_median with histogram method for large U8 kernels |
| src/modules/tensor/cpu/kernel/median_filter.cpp | Added AVX2-vectorized sorting networks for 3×3 and 5×5 median filters across multiple data types (U8, I8, F32, F16), implemented histogram-based median for large U8 kernels |
| src/modules/tensor/hip/kernel/gaussian_filter.cpp | Changed expf to __expf intrinsic, optimized kernel generation with single-pass normalization |
| src/modules/tensor/cpu/kernel/gaussian_filter.cpp | Pre-allocated and broadcast filter coefficients, optimized kernel generation, added prefetching, unrolled convolution loops for 3×3 kernel |
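The Gaussian changes listed above center on generating the kernel cheaply and on a fully unrolled 3×3 convolution with coefficients loaded up front (the HIP version additionally uses the faster, lower-precision `__expf` device intrinsic). A minimal CPU sketch of those two ideas, assuming a single-channel float image and interior pixels only, might look like this. `makeGaussianKernel` and `convolve3x3At` are hypothetical names; the sketch uses `std::exp` and omits SIMD broadcasting and prefetching.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Builds a normalized k x k Gaussian kernel (k odd). The normalization sum is
// accumulated while the weights are generated, so no separate summation pass
// is needed; a single multiply by the reciprocal then normalizes the weights.
std::vector<float> makeGaussianKernel(int k, float sigma)
{
    const int r = k / 2;
    const float inv2Sigma2 = 1.0f / (2.0f * sigma * sigma);
    std::vector<float> kernel(static_cast<std::size_t>(k) * k);
    float sum = 0.0f;
    for (int y = -r; y <= r; ++y)
        for (int x = -r; x <= r; ++x)
        {
            const float w = std::exp(-(x * x + y * y) * inv2Sigma2);
            kernel[static_cast<std::size_t>((y + r) * k + (x + r))] = w;
            sum += w;
        }
    const float invSum = 1.0f / sum;  // one division instead of k*k
    for (float &w : kernel)
        w *= invSum;
    return kernel;
}

// Applies a 3x3 kernel at interior pixel (x, y) of a row-major float image.
// All nine taps are written out explicitly (fully unrolled), so there is no
// inner loop overhead and the compiler can keep the coefficients in registers.
float convolve3x3At(const std::vector<float> &img, int width, int x, int y,
                    const std::vector<float> &k)
{
    const float *r0 = &img[static_cast<std::size_t>((y - 1) * width + (x - 1))];
    const float *r1 = r0 + width;
    const float *r2 = r1 + width;
    return k[0] * r0[0] + k[1] * r0[1] + k[2] * r0[2]
         + k[3] * r1[0] + k[4] * r1[1] + k[5] * r1[2]
         + k[6] * r2[0] + k[7] * r2[1] + k[8] * r2[2];
}
```

In a vectorized version, each of the nine coefficients would typically be broadcast into a SIMD register once, before the pixel loop, rather than reloaded per pixel.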


codecov bot commented Feb 27, 2026

Codecov Report

❌ Patch coverage is 98.93711% with 12 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/modules/tensor/cpu/kernel/median_filter.cpp | 98.91% | 12 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #681      +/-   ##
===========================================
+ Coverage    92.45%   92.48%   +0.03%     
===========================================
  Files          215      215              
  Lines        94897    95987    +1090     
===========================================
+ Hits         87729    88767    +1038     
- Misses        7168     7220      +52     
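As an arithmetic check on the diff above, codecov's coverage is hits divided by coverable lines. The patch line count in the last line is inferred from the reported 98.93711% patch coverage and 12 missing lines; it is not stated directly in the report.

$$
\text{coverage} = \frac{\text{hits}}{\text{lines}}, \qquad
\frac{87729}{94897} \approx 92.447\% \;(\text{develop}), \qquad
\frac{88767}{95987} \approx 92.478\% \;(\#681)
$$

$$
\Delta \approx 92.478\% - 92.447\% \approx +0.03 \text{ percentage points}, \qquad
N_{\text{patch}} = \frac{12}{1 - 0.9893711} \approx 1129, \quad \frac{1117}{1129} \approx 98.937\%
$$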
| Files with missing lines | Coverage Δ |
| --- | --- |
| src/modules/tensor/cpu/kernel/gaussian_filter.cpp | 85.62% <100.00%> (+0.13%) ⬆️ |
| src/modules/tensor/hip/kernel/gaussian_filter.cpp | 99.73% <ø> (ø) |
| src/modules/tensor/hip/kernel/median_filter.cpp | 99.66% <ø> (ø) |
| src/modules/tensor/cpu/kernel/median_filter.cpp | 99.00% <98.91%> (-1.00%) ⬇️ |

... and 3 files with indirect coverage changes


rrawther (Contributor) left a comment


Please address the review comments.

@LakshmiKumar23 (Contributor)

@r-abishek @HazarathKumarM Code coverage dropped by 7%. Please take a look.

LakshmiKumar23 self-requested a review on March 27, 2026, 21:31
@HazarathKumarM (Contributor)

HazarathKumarM commented Mar 30, 2026

@LakshmiKumar23 In the recent CI runs, the code coverage for this PR is 92.48%. We are not observing any drop.

rrawther merged commit 08e9024 into ROCm:develop on Mar 30, 2026
6 checks passed

Labels: ci:precheckin, enhancement (New feature or request)


6 participants