Skip to content

Gaussian filter padding updates with QA support#517

Merged
r-abishek merged 16 commits intor-abishek:ar/opt_gaussian_filter_qa_f32from
RooseweltMcW:apr/gaussian_updates
Nov 14, 2025
Merged

Gaussian filter padding updates with QA support#517
r-abishek merged 16 commits intor-abishek:ar/opt_gaussian_filter_qa_f32from
RooseweltMcW:apr/gaussian_updates

Conversation

@RooseweltMcW
Copy link
Copy Markdown

Implemented nearest neighbor padding logic with respect to gaussian filter HOST and HIP and updated QA test

@RooseweltMcW RooseweltMcW changed the title Gaussian filter padding updated with QA support Gaussian filter padding updates with QA support Oct 31, 2025
@HazarathKumarM
Copy link
Copy Markdown
Collaborator

@RooseweltMcW please update the branch with latest develop changes

@r-abishek r-abishek requested a review from Copilot October 31, 2025 18:38
@r-abishek r-abishek added the enhancement New feature or request label Oct 31, 2025
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@r-abishek r-abishek requested a review from Copilot October 31, 2025 21:50
@r-abishek r-abishek changed the base branch from develop to ar/opt_gaussian_filter_qa_f32 October 31, 2025 22:07
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

rpp_load16_u8_to_f32_avx(srcPtrTemp[k], &pRow[k * 2]);
for (int k = rowKernelLoopLimit * 2; k < 10; k += 2)
pRow[k] = pRow[k + 1] = avx_p0;
const int radius = 5 - rowKernelLoopLimit;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is radius here

pRow[k] = pRow[k + 1] = avx_p0;
const int radius = 7 - rowKernelLoopLimit;
int centerRowOffset = padIndex ? radius : 0; // The offset tells us where the center row is located within srcPtrTemp
for (int k = 0; k < 7; k++)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see many additional computations here that are not needed. Please check and revert to the older style of loads.

applicable to all the load funcs in this file

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The additional computations are required because unlike box filter the order of the padding needs to be preserved. This is evident from 3x3 cases where there are multiple conditions handled just to preserve the order

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The functions can be templated. Will attempt it

Copy link
Copy Markdown
Owner

@r-abishek r-abishek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Srihari-mcw Pls address comments

dst_f8->f1[7] = fmaf(src_f1, filter[7], dst_f8->f1[7]);
src_f1 = rpp_hip_unpack3(src_ui4.w);
dst_f8->f1[7] = fmaf(src_f1, filter[8], dst_f8->f1[7]);
#pragma unroll
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tab in for these pragmas. Check other instances

Copy link
Copy Markdown
Collaborator

@Srihari-mcw Srihari-mcw Nov 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tab is there for the pragma in all other places in RPP too, it is indented with respect to the loop on which its applied

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

@@ -309,15 +58,22 @@ __global__ void gaussian_filter_3x3_pkd_tensor(T *srcPtr,
RpptROIPtr roiTensorPtrSrc,
float *filterTensor)
{
int hipThreadIdx_x8 = hipThreadIdx_x << 3;
int hipThreadIdx_x8 = hipThreadIdx_x << 3;
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert space

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

{
#pragma unroll
for(int k = 0; k < filterSize; ++k)
dst_f8->f1[j] = fmaf(srcPtr[j + k], filter[k], dst_f8->f1[j]);
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

check u8 old vs new performance

gaussian_row_hip_compute<7>(&src_smem[hipThreadIdx_y_channel.x + 6][hipThreadIdx_x8], &sum_f24.f8[0], filter_row7);
gaussian_row_hip_compute<7>(&src_smem[hipThreadIdx_y_channel.y + 6][hipThreadIdx_x8], &sum_f24.f8[1], filter_row7);
gaussian_row_hip_compute<7>(&src_smem[hipThreadIdx_y_channel.z + 6][hipThreadIdx_x8], &sum_f24.f8[2], filter_row7);
if constexpr (std::is_same<T, float>::value)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add for half

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pixel check was found to be not needed

template<typename T>
inline void process_left_border_columns_pln_pln(T **srcPtrTemp, T *dstPtrTemp, Rpp32u kernelSize, Rpp32u padLength,
Rpp32u unpaddedWidth, Rpp32s rowKernelLoopLimit, Rpp32f *filterTensor)
Rpp32u unpaddedWidth, Rpp32s rowKernelLoopLimit, Rpp32f *filterTensor, Rpp32s padVertical)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can padVertical and paddHorizontal be boolean or preferably enums? TOP_EDGE etc or other

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

}

inline void rpp_load_filter_3x3_pkd_host(__m256 *pRow, Rpp8u **srcPtrTemp, Rpp32s rowKernelLoopLimit)
inline void rpp_load_filter_3x3_pln_host(__m256 *pRow, Rpp8u **srcPtrTemp, Rpp32s rowKernelLoopLimit, Rpp32s padIndex)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can these be templated below too?

Rpp32s rowKernelLoopLimit = kernelSize;
get_kernel_loop_limit(i, rowKernelLoopLimit, padLength, unpaddedHeight);
process_left_border_columns_pln_pln(srcPtrTemp, dstPtrTemp, kernelSize, padLength, unpaddedWidth, rowKernelLoopLimit, filterTensor);
Rpp32s padVertical = i < padLength ? 0 : 1;
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why? inline comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may get clarified with enum

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

process_left_border_columns_pln_pln(srcPtrTemp, dstPtrTemp, kernelSize, padLength, unpaddedWidth, rowKernelLoopLimit, filterTensor, padVertical);
dstPtrTemp += padLength;
#if __AVX2__
Rpp32s padindex = (padVertical == 1) ? rowKernelLoopLimit - 1 : 0;
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment

}

if constexpr (std::is_same<T, Rpp32f>::value)
rpp_pixel_check_0to1(pDst, 2);
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is 2?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

if constexpr (std::is_same<T, Rpp32f>::value)
rpp_pixel_check_0to1(pDst, 2);
rpp_store_filter_3x3_host(dstPtrTemp, pDst);
increment_row_ptrs(srcPtrTemp, kernelSize, 14);
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why 14 - all instances - either in-line comment or variable

Copy link
Copy Markdown
Owner

@r-abishek r-abishek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Srihari-mcw Please address these

api/rppdefs.h Outdated
{
LEFT_EDGE = 0,
RIGHT_EDGE
} RpptBorderHorizontalDirection;
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just like the one below, call this RpptImageBorderEdge instead of RpptImageBorderType.

Are two separate enums for horizontal and vertical needed, can one suffice?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

Rpp32u kernelSize, Rpp32u padLength, Rpp32u unpaddedWidth,
Rpp32s rowKernelLoopLimit, Rpp32f *filterTensor, Rpp32u channels = 1,
RpptBorderVerticalDirection padVertical = RpptBorderVerticalDirection::BOTTOM_EDGE,
RpptBorderHorizontalDirection padHorizontal = RpptBorderHorizontalDirection::RIGHT_EDGE)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With a combined enum, these arguments can be:

RpptImageBorderEdge padVertical = RpptImageBorderEdge::BOTTOM_EDGE,
RpptImageBorderEdge padHorizontal = RpptImageBorderEdge::RIGHT_EDGE)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

@Srihari-mcw
Copy link
Copy Markdown
Collaborator

Srihari-mcw commented Nov 12, 2025

  • Adds nearest neighbors padding for box filter augmentation (Current ToT version had border pixels fading out, the PR attempts to fix the same )
  • Added updated QA support for testing the HOST and HIP Backends for various layout types - F32 test support was introduced, U8 test support was updated
  • Updated the compute functions for gaussian filter with fmadd on HOST Backend
  • Templated the load filter functions significantly on HOST Backend - 36 helper functions reduced to 3 templated functions
  • Introduced enums to indicate the various image border edge types - left, right, bottom and top
  • Separated the compute functions of all bit depths on HIP Backend (Avoids additional bit depth conversions)
  • HIP Backend upgrades also help in performance improvements of specific F32 variants compared to existing ToT version : 37% - 82% for kernel Sizes 5,7 and 9 for PKD3 variants

@Srihari-mcw
Copy link
Copy Markdown
Collaborator

Srihari-mcw commented Nov 12, 2025

Variant which is mentioned in description and for which gains are seen
image
image

@Srihari-mcw
Copy link
Copy Markdown
Collaborator

Srihari-mcw commented Nov 12, 2025

image image

@Srihari-mcw
Copy link
Copy Markdown
Collaborator

image image

@Srihari-mcw
Copy link
Copy Markdown
Collaborator

image image

@r-abishek r-abishek merged commit 3e058d5 into r-abishek:ar/opt_gaussian_filter_qa_f32 Nov 14, 2025
ManasaDattaT pushed a commit to ManasaDattaT/rpp that referenced this pull request Dec 10, 2025
…shek#517)

Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>
ManasaDattaT pushed a commit to ManasaDattaT/rpp that referenced this pull request Dec 19, 2025
…shek#517)

Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants