RPP Dilate on HOST and HIP #554

Merged
r-abishek merged 18 commits into r-abishek:ar/opt_dilate from HazarathKumarM:dilate_rebased
Jan 27, 2026

Conversation

@HazarathKumarM
Collaborator

No description provided.

blend_shuffle_max_7x7_host<7, 63, 1, 15, 127, 3>(&pxTemp[0], pxMaskPkd, blendRegisterOrder);
blend_shuffle_max_7x7_host<7, 63, 1, 15, 127, 3>(&pxTemp[1], pxMaskPkd, blendRegisterOrder);

if constexpr (std::is_same<T, Rpp8s>::value)
Collaborator

Remove the extra space before if.

Done.

@@ -26,287 +26,38 @@ SOFTWARE.

// -------------------- Set 0 - dilate device helpers --------------------

__device__ void dilate_3x3_row_hip_compute(uchar *srcPtr, d_float8 *dst_f8)
// Templated dilate row compute function - works for any filter size (3, 5, 7, 9)
Collaborator

__device__ void erode_row_hip_compute(T *srcPtr, d_float8 *dst_f8)
{
    #pragma unroll
    for (int k = 0; k < 8; k++)
    {
        float minVal = static_cast<float>(srcPtr[k]);
        for (int j = 1; j < filterSize; j++)
            minVal = fminf(minVal, static_cast<float>(srcPtr[k + j]));
        dst_f8->f1[k] = fminf(dst_f8->f1[k], minVal);
    }
}

Modify the dilate function to follow the same structure as this erode version.

Done
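
For readers following the thread, here is a minimal host-side sketch of what the dilate counterpart of the erode snippet might look like. It swaps the minimum for a maximum. The `Float8` struct is a hypothetical stand-in for `d_float8`, the function name and template parameter order are assumptions, and the code is plain C++ rather than the HIP device code in the PR.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the d_float8 type used in the HIP kernels:
// eight floats that accumulate the running result for eight output pixels.
struct Float8
{
    float f1[8];
};

// Host-side sketch of a templated dilate row step (names are assumptions).
// For each of the 8 output pixels, take the maximum over filterSize
// horizontally adjacent source values and fold it into the accumulator.
// srcPtr must point to at least 8 + filterSize - 1 readable elements.
template <int filterSize, typename T>
void dilate_row_compute_sketch(const T *srcPtr, Float8 *dst_f8)
{
    static_assert(filterSize >= 1 && filterSize % 2 == 1, "filter size must be odd");
    assert(srcPtr != nullptr && dst_f8 != nullptr);
    for (int k = 0; k < 8; k++)
    {
        float maxVal = static_cast<float>(srcPtr[k]);
        for (int j = 1; j < filterSize; j++)
            maxVal = std::max(maxVal, static_cast<float>(srcPtr[k + j]));
        dst_f8->f1[k] = std::max(dst_f8->f1[k], maxVal);
    }
}
```

Calling the function once per filter row with the same accumulator yields the window maximum, provided the accumulator starts at a value no larger than any input (for example the lowest value of the pixel type).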

dilate_row_hip_compute<7>(&src_smem[hipThreadIdx_y + 4][hipThreadIdx_x8], &sum_f8);
dilate_row_hip_compute<7>(&src_smem[hipThreadIdx_y + 5][hipThreadIdx_x8], &sum_f8);
dilate_row_hip_compute<7>(&src_smem[hipThreadIdx_y + 6][hipThreadIdx_x8], &sum_f8);
if constexpr (std::is_same<T, Rpp8s>::value)
Collaborator

Don't use an if/else here; just call:

rpp_hip_pack_float8_and_store8(dstPtr + dstIdx, &sum_f8);

Done
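
One way a single store call can replace a per-type branch is overload resolution on the destination pointer type. The sketch below illustrates that idea in host C++ only. The `_sketch` functions, the `Float8` stand-in, and the rounding and clamping rules are all assumptions made for illustration, not the conversions RPP performs.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical stand-in for d_float8.
struct Float8
{
    float f1[8];
};

// Clamp helper written out by hand so the sketch does not depend on C++17.
inline float clamp_sketch(float v, float lo, float hi)
{
    return std::min(std::max(v, lo), hi);
}

// Hypothetical overload set: the destination pointer type selects the conversion.
inline void pack_float8_and_store8_sketch(unsigned char *dstPtr, const Float8 *src_f8)
{
    assert(dstPtr != nullptr && src_f8 != nullptr);
    for (int i = 0; i < 8; i++)
        dstPtr[i] = static_cast<unsigned char>(std::lround(clamp_sketch(src_f8->f1[i], 0.0f, 255.0f)));
}

inline void pack_float8_and_store8_sketch(signed char *dstPtr, const Float8 *src_f8)
{
    assert(dstPtr != nullptr && src_f8 != nullptr);
    for (int i = 0; i < 8; i++)
        dstPtr[i] = static_cast<signed char>(std::lround(clamp_sketch(src_f8->f1[i], -128.0f, 127.0f)));
}

inline void pack_float8_and_store8_sketch(float *dstPtr, const Float8 *src_f8)
{
    assert(dstPtr != nullptr && src_f8 != nullptr);
    for (int i = 0; i < 8; i++)
        dstPtr[i] = src_f8->f1[i];
}

// The templated caller needs no if/else on T: the compiler picks the overload.
template <typename T>
void store_result_sketch(T *dstPtr, int dstIdx, const Float8 *sum_f8)
{
    pack_float8_and_store8_sketch(dstPtr + dstIdx, sum_f8);
}
```

With this shape, any type-specific handling lives in the store helper rather than in every kernel that calls it.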

int clampedX = roiBeginX + max(0, min(id_x_i + i, (roiWidth - 1)));
int clampedIdx = (id_z * srcStridesNH.x) + (clampedY * srcStridesNH.y) + (clampedX * 3);

src_smem[hipThreadIdx_y_channel.x][hipThreadIdx_x8 + i] = srcPtr[clampedIdx]; // R
Collaborator

Please align the // R comment with the // G and // B comments everywhere in the code.

Done

if (roiTypeSrc == RpptRoiType::LTRB)
convert_roi(roiTensorPtrDst, RpptRoiType::XYWH, dstDescPtr->n);

Collaborator

Remove the whitespace

Done


// If DEBUG_MODE is set to 1 dump the outputs to csv files for debugging
if(DEBUG_MODE && iterCount == 0)
if (DEBUG_MODE && iterCount == 0)
Collaborator

Revert all of these unnecessary changes.

Done

Owner

@r-abishek left a comment

lgtm

@r-abishek changed the base branch from develop to ar/opt_dilate on January 21, 2026 03:53
@r-abishek added the enhancement (New feature or request) label on Jan 21, 2026
@r-abishek requested a review from Copilot on January 21, 2026 03:54

Copilot AI left a comment

Pull request overview

This PR adds HOST backend support for the RPP (ROCm Performance Primitives) dilate morphological operation, extending the existing HIP-only implementation.

Changes:

  • Enabled dilate operation for both HOST and HIP backends in the test suite configuration
  • Added CPU implementation of dilate operation with support for multiple data types (U8, I8, F16, F32)
  • Implemented SIMD-optimized helper functions for efficient dilate computation across different kernel sizes (3x3, 5x5, 7x7, 9x9)
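
As a point of reference for the summary above, a scalar dilate for a single-channel image can be sketched as follows. This is not the SIMD code in the PR. The function name is hypothetical, and the replicate-border policy (clamping window coordinates to the image edge) is an assumption modeled on the clamped indexing visible in the HIP snippet earlier in this conversation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference sketch for dilation of a single-channel, row-major image.
// Each output pixel is the maximum over a kernelSize x kernelSize window
// centred on it. Window coordinates outside the image are clamped to the
// nearest edge (assumed border policy).
template <typename T>
std::vector<T> dilate_reference_sketch(const std::vector<T> &src, int width, int height, int kernelSize)
{
    assert(kernelSize >= 1 && kernelSize % 2 == 1);
    assert(width > 0 && height > 0);
    assert(src.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));

    const int pad = kernelSize / 2;
    std::vector<T> dst(src.size());
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
        {
            T maxVal = src[static_cast<std::size_t>(y) * width + x];
            for (int dy = -pad; dy <= pad; dy++)
            {
                const int yy = std::max(0, std::min(y + dy, height - 1));
                for (int dx = -pad; dx <= pad; dx++)
                {
                    const int xx = std::max(0, std::min(x + dx, width - 1));
                    maxVal = std::max(maxVal, src[static_cast<std::size_t>(yy) * width + xx]);
                }
            }
            dst[static_cast<std::size_t>(y) * width + x] = maxVal;
        }
    }
    return dst;
}
```

A reference like this is mainly useful for checking an optimized implementation against known outputs for each kernel size (3, 5, 7, 9) and data type.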

Reviewed changes

Copilot reviewed 7 out of 17 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| utilities/test_suite/common.py | Updated dilate backend support from HIP-only to include HOST |
| utilities/test_suite/HOST/runImageTests.py | Added dilate to kernel size test configurations |
| utilities/test_suite/HOST/Tensor_image_host.cpp | Implemented dilate test case with HOST backend API call |
| src/modules/tensor/rppt_tensor_morphological_operations.cpp | Added rppt_dilate_host function with multi-datatype support |
| src/include/tensor/host_tensor_executors.hpp | Added function declarations for dilate HOST implementations |
| src/include/common/cpu/rpp_cpu_filter.hpp | Added SIMD helper functions and morphological operation utilities |
| api/rppt_tensor_morphological_operations.h | Added HOST API documentation and fixed HIP documentation typo |

Comment on lines +1870 to +1873

for (int i = 0; i < oBufferSize; i++)
    refFile << static_cast<int>(*(outputu8 + i)) << ",";

Copilot AI Jan 21, 2026

The added blank lines (1870 and 1873) create inconsistent spacing around the for loop. This section appears to be unrelated to the dilate implementation and these formatting changes are unnecessary.

Suggested change
for (int i = 0; i < oBufferSize; i++)
    refFile << static_cast<int>(*(outputu8 + i)) << ",";

Collaborator Author

Done

{
/* For PLN inputs | For PKD inputs
pSrc[0] - [X01|X02|X03|X04|X05|X06|X07|X08], pSrc[1] - [X09|X10|X11|X12|X13|X14|X15|X16]| pSrc[0] - [R01|G01|B01|R02|G02|B02|R03|G03], pSrc[1] - [B03|R04|G04|B04|R05|G05|B05|R06],
[X02|X03|X04|X05|X06|X07|X08|X09] (blend with mask [0000 0001] and permute) pSrc[2] - [G06|B06|R07|G07|B07|R08|G08|B08], pSrc[3] - [R09|G09|B09|R10|G10|B10|R11|G11]
Copilot AI Jan 21, 2026

Missing pipe character '|' separator before 'pSrc[2]' in this comment line. All other similar comment blocks in this file consistently use '|' to separate PLN and PKD input descriptions.

Collaborator Author

Done
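
The "blend with mask [0000 0001] and permute" step in the PLN half of that code comment can be illustrated with a scalar emulation. The sketch assumes 8-lane float registers, that bit 0 of the mask corresponds to lane 0, and that the helpers follow the semantics of the AVX intrinsics `_mm256_blend_ps` and `_mm256_permutevar8x32_ps`. The `_sketch` names are hypothetical and this is not the code from the PR.

```cpp
#include <array>
#include <cassert>

// Stands in for one 8-float SIMD register.
using Lane8 = std::array<float, 8>;

// Scalar emulation of a lane-wise blend: lane i comes from b when bit i
// of mask is set, otherwise from a.
inline Lane8 blend8_sketch(const Lane8 &a, const Lane8 &b, unsigned mask)
{
    Lane8 out{};
    for (int i = 0; i < 8; i++)
        out[i] = ((mask >> i) & 1u) ? b[i] : a[i];
    return out;
}

// Scalar emulation of a cross-lane permute: lane i takes a[idx[i]].
inline Lane8 permute8_sketch(const Lane8 &a, const std::array<int, 8> &idx)
{
    Lane8 out{};
    for (int i = 0; i < 8; i++)
    {
        assert(idx[i] >= 0 && idx[i] < 8);
        out[i] = a[idx[i]];
    }
    return out;
}

// Builds [X02|...|X09] from pSrc[0] = [X01|...|X08] and pSrc[1] = [X09|...|X16].
inline Lane8 shift_window_by_one_sketch(const Lane8 &p0, const Lane8 &p1)
{
    // Blend with mask 0000 0001: lane 0 is replaced by X09, giving [X09|X02|...|X08].
    const Lane8 blended = blend8_sketch(p0, p1, 0x01u);
    // Rotate left by one lane so X09 moves to the end: [X02|...|X08|X09].
    const std::array<int, 8> rotateLeft = {{1, 2, 3, 4, 5, 6, 7, 0}};
    return permute8_sketch(blended, rotateLeft);
}
```

Wider shifts follow the same pattern with more mask bits set and a larger rotation, which is presumably how the larger kernel sizes obtain their neighbouring columns.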

@r-abishek merged commit ad42d02 into r-abishek:ar/opt_dilate on Jan 27, 2026
Labels

enhancement New feature or request

6 participants