Gaussian filter padding updates with QA support by RooseweltMcW · Pull Request #517 · r-abishek/rpp

RooseweltMcW · 2025-10-31T05:42:22Z

Implemented nearest-neighbor padding logic for the Gaussian filter on the HOST and HIP backends, and updated the QA test

…QA U8 and F32

HazarathKumarM · 2025-10-31T06:17:01Z

@RooseweltMcW please update the branch with the latest develop changes

Copilot

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

HazarathKumarM · 2025-11-03T06:21:43Z

src/include/common/cpu/rpp_cpu_filter.hpp

-        rpp_load16_u8_to_f32_avx(srcPtrTemp[k], &pRow[k * 2]);
-    for (int k = rowKernelLoopLimit * 2; k < 10; k += 2)
-        pRow[k] = pRow[k + 1] = avx_p0;
+    const int radius = 5 - rowKernelLoopLimit;


What is radius here?

HazarathKumarM · 2025-11-03T06:30:10Z

src/include/common/cpu/rpp_cpu_filter.hpp

-        pRow[k] = pRow[k + 1] = avx_p0;
+    const int radius = 7 - rowKernelLoopLimit;
+    int centerRowOffset = padIndex ? radius : 0;    // The offset tells us where the center row is located within srcPtrTemp
+    for (int k = 0; k < 7; k++)


I can see many additional computations here that are not needed. Please check and revert to the older style of loads.

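This applies to all the load functions in this file.

The additional computations are required because, unlike the box filter, the Gaussian filter must preserve the order of the padded rows: box filter weights are uniform, so the row order does not change the sum, whereas each Gaussian weight applies to a specific row position. This is evident in the 3x3 cases, where multiple conditions are handled just to preserve the order.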
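A minimal sketch of the ordering argument, in plain C++ with hypothetical helper names (clamp_row, gather_rows_3 and weighted_sum_3 are not RPP functions, and a single image column stands in for the AVX row buffers). Near the top edge, nearest-neighbor padding fills out-of-range kernel slots with the nearest valid row. With uniform box weights any permutation of the gathered rows gives the same sum, but with Gaussian weights the replicated row has to land in the correct kernel slot.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical helper (not an RPP function): nearest-neighbor (replicate) padding
// maps an out-of-range row index to the closest valid row.
inline int clamp_row(int r, int height)
{
    assert(height > 0);
    return std::min(std::max(r, 0), height - 1);
}

// Gathers the 3 rows used by output row i; slot k + 1 holds input row i + k (clamped),
// so the kernel slot order matches the row order.
inline std::array<float, 3> gather_rows_3(const float *column, int height, int i)
{
    std::array<float, 3> rows{};
    for (int k = -1; k <= 1; ++k)
        rows[k + 1] = column[clamp_row(i + k, height)];
    return rows;
}

// Weighted sum of the gathered rows: one output pixel of a 3-tap vertical pass.
inline float weighted_sum_3(const std::array<float, 3> &rows, const std::array<float, 3> &weights)
{
    float sum = 0.0f;
    for (int k = 0; k < 3; ++k)
        sum = std::fma(rows[k], weights[k], sum);
    return sum;
}
```

For column {10, 20, 30, 40} and output row 0, gather_rows_3 returns {10, 10, 20}; with weights {0.25, 0.5, 0.25} that gives 12.5, while the misordered {10, 20, 10} gives 15. With box weights of 1/3, both orders give about 13.33.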

The functions can be templated; will attempt it.

r-abishek

@Srihari-mcw Please address the comments.

r-abishek · 2025-11-04T04:54:27Z

src/modules/tensor/hip/kernel/gaussian_filter.cpp

-    dst_f8->f1[7] = fmaf(src_f1, filter[7], dst_f8->f1[7]);
-    src_f1 = rpp_hip_unpack3(src_ui4.w);
-    dst_f8->f1[7] = fmaf(src_f1, filter[8], dst_f8->f1[7]);
+    #pragma unroll


Tab in these pragmas. Check other instances.

A tab is there for the pragma in all other places in RPP too; it is indented relative to the loop it is applied to.

r-abishek · 2025-11-04T04:54:43Z

src/modules/tensor/hip/kernel/gaussian_filter.cpp

@@ -309,15 +58,22 @@ __global__ void gaussian_filter_3x3_pkd_tensor(T *srcPtr,
                                               RpptROIPtr roiTensorPtrSrc,
                                               float *filterTensor)
 {
-    int hipThreadIdx_x8 = hipThreadIdx_x << 3;
+     int hipThreadIdx_x8 = hipThreadIdx_x << 3;


Revert the extra space.

r-abishek · 2025-11-04T04:57:40Z

src/modules/tensor/hip/kernel/gaussian_filter.cpp

+    {
+        #pragma unroll
+        for(int k = 0; k < filterSize; ++k)
+            dst_f8->f1[j] = fmaf(srcPtr[j + k], filter[k], dst_f8->f1[j]);


Check U8 old vs. new performance.
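For reference, a host-side C++ sketch of what the templated row compute in the diff does: a compile-time filterSize gives the tap loop a fixed trip count, which is what lets the `#pragma unroll` in the HIP version fully unroll it. The name row_compute_8 and the plain float[8] accumulator are hypothetical stand-ins for gaussian_row_hip_compute and its 8-wide `dst_f8` accumulator. This does not reproduce RPP's U8 load path, so it says nothing about the U8 performance question.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical host analogue of the templated HIP row compute (not RPP code).
// Accumulates a filterSize-tap row filter into 8 consecutive outputs; the caller
// initializes dst, and src must hold at least 8 + filterSize - 1 elements.
template <int filterSize>
inline void row_compute_8(const float *src, float dst[8], const float *filter)
{
    static_assert(filterSize > 0, "filterSize must be positive");
    assert(src != nullptr && dst != nullptr && filter != nullptr);
    for (int j = 0; j < 8; ++j)
        for (int k = 0; k < filterSize; ++k)   // trip count fixed at compile time
            dst[j] = std::fma(src[j + k], filter[k], dst[j]);
}
```

Calling row_compute_8<3> with src = {0, 1, ..., 9}, filter = {0.25, 0.5, 0.25} and a zeroed dst yields dst[j] = j + 1. The 7x7 path in the diff appears to correspond to one row_compute_8<7> call per kernel row and channel.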

r-abishek · 2025-11-04T05:06:23Z

src/modules/tensor/hip/kernel/gaussian_filter.cpp

+        gaussian_row_hip_compute<7>(&src_smem[hipThreadIdx_y_channel.x + 6][hipThreadIdx_x8], &sum_f24.f8[0], filter_row7);
+        gaussian_row_hip_compute<7>(&src_smem[hipThreadIdx_y_channel.y + 6][hipThreadIdx_x8], &sum_f24.f8[1], filter_row7);
+        gaussian_row_hip_compute<7>(&src_smem[hipThreadIdx_y_channel.z + 6][hipThreadIdx_x8], &sum_f24.f8[2], filter_row7);
+        if constexpr (std::is_same<T, float>::value)


Add this for half too.

The pixel check was found to be unnecessary.

r-abishek · 2025-11-04T05:26:04Z

src/include/common/cpu/rpp_cpu_filter.hpp

 template<typename T>
 inline void process_left_border_columns_pln_pln(T **srcPtrTemp, T *dstPtrTemp, Rpp32u kernelSize, Rpp32u padLength,
-                                                Rpp32u unpaddedWidth, Rpp32s rowKernelLoopLimit, Rpp32f *filterTensor)
+                                                Rpp32u unpaddedWidth, Rpp32s rowKernelLoopLimit, Rpp32f *filterTensor, Rpp32s padVertical)


Can padVertical and padHorizontal be booleans or, preferably, enums (TOP_EDGE, etc., or something else)?

r-abishek · 2025-11-04T05:46:22Z

src/include/common/cpu/rpp_cpu_filter.hpp

-}
-
-inline void rpp_load_filter_3x3_pkd_host(__m256 *pRow, Rpp8u **srcPtrTemp, Rpp32s rowKernelLoopLimit)
+inline void rpp_load_filter_3x3_pln_host(__m256 *pRow, Rpp8u **srcPtrTemp, Rpp32s rowKernelLoopLimit, Rpp32s padIndex)


Can these below be templated too?

r-abishek · 2025-11-04T05:49:20Z

src/modules/tensor/cpu/kernel/gaussian_filter.cpp

                        Rpp32s rowKernelLoopLimit = kernelSize;
                        get_kernel_loop_limit(i, rowKernelLoopLimit, padLength, unpaddedHeight);
-                        process_left_border_columns_pln_pln(srcPtrTemp, dstPtrTemp, kernelSize, padLength, unpaddedWidth, rowKernelLoopLimit, filterTensor);
+                        Rpp32s padVertical = i < padLength ? 0 : 1; 


Why? Add an inline comment.

This may get clarified with an enum.

r-abishek · 2025-11-04T05:51:04Z

src/modules/tensor/cpu/kernel/gaussian_filter.cpp

+                        process_left_border_columns_pln_pln(srcPtrTemp, dstPtrTemp, kernelSize, padLength, unpaddedWidth, rowKernelLoopLimit, filterTensor, padVertical);
                        dstPtrTemp += padLength;
 #if __AVX2__
+                        Rpp32s padindex = (padVertical == 1) ?  rowKernelLoopLimit - 1 : 0;


r-abishek · 2025-11-04T05:51:21Z

src/modules/tensor/cpu/kernel/gaussian_filter.cpp

                            }
-
+                            if constexpr (std::is_same<T, Rpp32f>::value)
+                                rpp_pixel_check_0to1(pDst, 2);


r-abishek · 2025-11-04T05:54:37Z

src/modules/tensor/cpu/kernel/gaussian_filter.cpp

+                            if constexpr (std::is_same<T, Rpp32f>::value)
+                                rpp_pixel_check_0to1(pDst, 2);
                            rpp_store_filter_3x3_host(dstPtrTemp, pDst);
                            increment_row_ptrs(srcPtrTemp, kernelSize, 14);


Why 14? This applies to all instances; use either an inline comment or a named variable.
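One way to answer the "why 14" question with a named value. This assumes (it is not confirmed in the thread) that each iteration loads 16 pixels per row and that output j needs loaded pixels j through j + kernelSize - 1, so only 16 - (kernelSize - 1) outputs, i.e. 14 for a 3x3 kernel, are complete per iteration. The constant and function names are hypothetical.

```cpp
// Hypothetical named constant, assuming 16 pixels are loaded per row per iteration.
constexpr int kPixelsLoadedPerIteration = 16;

// Output j reads loaded pixels j .. j + kernelSize - 1, so only the first
// 16 - (kernelSize - 1) outputs are complete; the rest are produced next iteration.
constexpr int valid_outputs_per_iteration(int kernelSize)
{
    return kPixelsLoadedPerIteration - (kernelSize - 1);
}
```

Under that assumption, the call in the diff could read `increment_row_ptrs(srcPtrTemp, kernelSize, valid_outputs_per_iteration(kernelSize))`, which would also document the step for the larger kernel sizes.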

r-abishek

@Srihari-mcw Please address these

r-abishek · 2025-11-11T23:07:49Z

api/rppdefs.h

+{
+    LEFT_EDGE = 0,
+    RIGHT_EDGE
+} RpptBorderHorizontalDirection;


Just like the one below, call this RpptImageBorderEdge instead of RpptImageBorderType.

Are two separate enums needed for horizontal and vertical, or can one suffice?

r-abishek · 2025-11-11T23:17:28Z

src/include/common/cpu/rpp_cpu_filter.hpp

+                                              Rpp32u kernelSize, Rpp32u padLength, Rpp32u unpaddedWidth,
+                                              Rpp32s rowKernelLoopLimit, Rpp32f *filterTensor, Rpp32u channels = 1,
+                                              RpptBorderVerticalDirection padVertical = RpptBorderVerticalDirection::BOTTOM_EDGE,
+                                              RpptBorderHorizontalDirection padHorizontal = RpptBorderHorizontalDirection::RIGHT_EDGE)


With a combined enum, these arguments can be:

RpptImageBorderEdge padVertical = RpptImageBorderEdge::BOTTOM_EDGE, RpptImageBorderEdge padHorizontal = RpptImageBorderEdge::RIGHT_EDGE)
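A minimal sketch of the single combined enum the reviewer suggests, covering the four edge types (left, right, top, bottom) listed in the PR summary. The enumerator order and values, and the helpers vertical_edge_for_row and is_vertical_edge, are hypothetical; the first helper assumes the `i < padLength ? 0 : 1` expression in the earlier diff meant top edge versus bottom edge.

```cpp
// Hypothetical combined border-edge enum (enumerator values are assumptions).
typedef enum
{
    LEFT_EDGE = 0,
    RIGHT_EDGE,
    TOP_EDGE,
    BOTTOM_EDGE
} RpptImageBorderEdge;

// Hypothetical: rows within padLength of the top are padded from the top edge,
// all other border rows from the bottom edge.
constexpr RpptImageBorderEdge vertical_edge_for_row(unsigned i, unsigned padLength)
{
    return (i < padLength) ? RpptImageBorderEdge::TOP_EDGE : RpptImageBorderEdge::BOTTOM_EDGE;
}

// With one enum, a single type can carry both directions; this tells them apart.
constexpr bool is_vertical_edge(RpptImageBorderEdge edge)
{
    return edge == RpptImageBorderEdge::TOP_EDGE || edge == RpptImageBorderEdge::BOTTOM_EDGE;
}
```

One enum keeps the parameter list readable, at the cost of the compiler no longer rejecting a horizontal edge passed as padVertical; a check like is_vertical_edge can cover that.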

Srihari-mcw · 2025-11-12T06:34:39Z

Adds nearest-neighbor padding for the Gaussian filter augmentation (the current ToT version had border pixels fading out; this PR fixes that).
Updated QA support for testing the HOST and HIP backends across layout types: F32 test support was introduced and U8 test support was updated.
Updated the Gaussian filter compute functions on the HOST backend to use fmadd.
Templated the load filter functions on the HOST backend, reducing 36 helper functions to 3 templated functions.
Introduced enums to indicate the image border edge types: left, right, bottom and top.
Separated the compute functions for each bit depth on the HIP backend (avoids additional bit-depth conversions).
The HIP backend upgrades also improve the performance of specific F32 variants compared to the current ToT version: 37% to 82% for kernel sizes 5, 7 and 9 in the PKD3 variants.
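The border fade mentioned in the first point can be shown in 1-D. This sketch (plain C++, hypothetical filter_row_3 and Padding names, not RPP code) applies the 3-tap kernel {0.25, 0.5, 0.25} to a constant row. Treating out-of-range pixels as 0 darkens the edge outputs, while nearest-neighbor padding keeps them equal to the interior. That ToT behaved like the zero-padding branch is an assumption; the thread only states that border pixels faded out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

enum class Padding { Zero, Nearest };

// Applies a 3-tap filter to a 1-D row. Out-of-range taps read 0 (Zero)
// or the nearest in-range pixel (Nearest, i.e. replicate padding).
std::vector<float> filter_row_3(const std::vector<float> &src, const float weights[3], Padding padding)
{
    assert(!src.empty());
    const int width = static_cast<int>(src.size());
    std::vector<float> dst(src.size(), 0.0f);
    for (int x = 0; x < width; ++x)
    {
        float sum = 0.0f;
        for (int k = -1; k <= 1; ++k)
        {
            const int xs = x + k;
            float value;
            if (xs >= 0 && xs < width)
                value = src[xs];
            else if (padding == Padding::Nearest)
                value = src[std::min(std::max(xs, 0), width - 1)];
            else
                value = 0.0f;
            sum = std::fma(value, weights[k + 1], sum);
        }
        dst[x] = sum;
    }
    return dst;
}
```

For a row of four pixels of value 100, the zero-padded result is {75, 100, 100, 75} and the nearest-neighbor result is {100, 100, 100, 100}.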

Srihari-mcw · 2025-11-12T16:11:33Z

The variant mentioned in the description, for which the gains are seen.

Srihari-mcw · 2025-11-12T16:15:50Z

Srihari-mcw · 2025-11-12T16:26:29Z

Srihari-mcw · 2025-11-12T16:29:42Z

…shek#517) Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>

HazarathKumarM added 3 commits October 30, 2025 06:41

Updated gaussian filter padding logic HOST and updated bin files for …

053f123

…QA U8 and F32

Updated gaussian filter for HIP backend with the border padding fix

a25132b

Updated docs image for gaussian filter

771920d

RooseweltMcW changed the title from "Gaussian filter padding updated with QA support" to "Gaussian filter padding updates with QA support" Oct 31, 2025

r-abishek requested a review from Copilot October 31, 2025 18:38

r-abishek assigned HazarathKumarM Oct 31, 2025

r-abishek added the enhancement New feature or request label Oct 31, 2025

Copilot AI reviewed Oct 31, 2025

r-abishek requested a review from Copilot October 31, 2025 21:50

r-abishek changed the base branch from develop to ar/opt_gaussian_filter_qa_f32 October 31, 2025 22:07

Copilot AI reviewed Oct 31, 2025

HazarathKumarM reviewed Nov 3, 2025

Template helper load functions

8199150

r-abishek requested changes Nov 4, 2025

Srihari-mcw added 11 commits November 4, 2025 13:15

Update changes with enum for clear distinction

f318906

Compilation fixes

349960d

Fixes for enum changes

fb6c3b2

Remove unwanted pixel checks

3b507c5

Add modifications to run and update 3x3 kernel

e81ff77

Remove filter load functions for 3x3 kernel

6c5bf25

Standardize store function calls

b7e9f9e

Changes to outer API and variable renaming

2aa9b74

Changes to outer API in headers

164ba43

Additional comments

9cc1a6a

Update the load function with further comments

805acb4

RooseweltMcW force-pushed the apr/gaussian_updates branch from 6893fa3 to 805acb4 November 11, 2025 15:38

r-abishek reviewed Nov 12, 2025

Modifications to edge enum

a4a15a0

r-abishek approved these changes Nov 14, 2025

r-abishek merged commit 3e058d5 into r-abishek:ar/opt_gaussian_filter_qa_f32 Nov 14, 2025

ManasaDattaT pushed a commit to ManasaDattaT/rpp that referenced this pull request Dec 10, 2025

CXX Compiler:G++ -- Update the code to fix host issues for g++ (r-abi…

e76d820

…shek#517) Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>

ManasaDattaT pushed a commit to ManasaDattaT/rpp that referenced this pull request Dec 19, 2025

CXX Compiler:G++ -- Update the code to fix host issues for g++ (r-abi…

77af22a

…shek#517) Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>
