
RPP Tensor Support - Swap channels on HOST and HIP#400

Merged
r-abishek merged 15 commits into r-abishek:ar/opt_swap_channels from Dineshbabu-Ravichandran:db/randomChannelPermute
Apr 22, 2025

Conversation

@Dineshbabu-Ravichandran

  • Extends tensor support for the Swap Channels augmentation to all permutations on the HOST backend, optimized using AVX2
  • Extends tensor support for the Swap Channels augmentation to all permutations on the HIP backend
  • Adds unit and performance test support for the Swap Channels augmentation in the test suite

@r-abishek r-abishek requested a review from Copilot April 17, 2025 19:22

Copilot AI left a comment


Pull Request Overview

This PR extends tensor support for the Swap Channels Augmentation on both HOST and HIP backends and adds corresponding unit and performance tests. Key changes include:

  • Adding SWAP_CHANNELS to the set of augmentation cases.
  • Updating test suite scripts and executor functions to handle a new permTensor parameter.
  • Modifying both HOST and HIP kernel invocations and API declarations to accommodate the swap channels permutation.

Reviewed Changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.

File Description
utilities/test_suite/rpp_test_suite_image.h Added SWAP_CHANNELS to augmentation test cases and compare_output logic
utilities/test_suite/HOST/runImageTests.py Added a new branch to run swap channels test variants on HOST
utilities/test_suite/HOST/Tensor_image_host.cpp Updated swap channels handling by incorporating a permTensor parameter
utilities/test_suite/HIP/runImageTests.py Added a new branch to run swap channels test variants on HIP
utilities/test_suite/HIP/Tensor_image_hip.cpp Updated swap channels handling with HIP host memory allocation for permTensor
src/modules/tensor/rppt_tensor_data_exchange_operations.cpp Modified swap channels functions to accept a permTensor parameter
src/modules/tensor/hip/kernel/swap_channels.cpp Updated GPU kernel functions to incorporate the permTensor parameter
src/include/tensor/host_tensor_executors.hpp Updated host function prototypes for swap channels operations
src/include/tensor/hip_tensor_executors.hpp Updated HIP function prototypes for swap channels operations
api/rppt_tensor_data_exchange_operations.h Updated API declarations to reflect the new swap channels parameter
Comments suppressed due to low confidence (1)

utilities/test_suite/HOST/runImageTests.py:85

  • [nitpick] The use of the literal "85" for the swap channels test case may reduce clarity and maintainability. Consider introducing a defined constant (e.g., SWAP_CHANNELS) to replace the magic number.
elif case == "85":

{
testCaseName = "swap_channels";

Rpp32u *permTensor = nullptr;

Copilot AI Apr 17, 2025


The memory allocated with hipHostMalloc for 'permTensor' is not freed after usage. Consider adding a call to hipHostFree(permTensor) at the end of its scope to prevent a memory leak.

Collaborator


Done

Owner

@r-abishek r-abishek left a comment


@HazarathKumarM Please address comments to merge this PR

* \param [in] srcDescPtr source tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = 3)
* \param [out] dstPtr destination tensor in HOST memory
* \param [in] dstDescPtr destination tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = same as that of srcDescPtr)
* \param [in] permTensor permutation tensor for swap channels operation
Owner


Just like hip, also say "in HOST memory" here

Collaborator


Done

pSwap[0] = p[permTensor[0] * 2]; // channel swap
pSwap[1] = p[permTensor[0] * 2 + 1]; // channel swap
pSwap[2] = p[permTensor[1] * 2]; // channel swap
pSwap[3] = p[permTensor[1] * 2 + 1]; // channel swap
Owner


Init an array [6] outside = {permTensor[0] * 2, permTensor[0] * 2 + 1, ...}

Collaborator


Done

// Adjust permutation tensor for NHWC - NCHW layout conversion
if((srcDescPtr->c == 3) && (srcDescPtr->layout == RpptLayout::NHWC) && (dstDescPtr->layout == RpptLayout::NCHW))
{
Rpp32u count = 0;
Owner


Why do we have this adjustment? If this is common to all bit depths, perhaps create a helper function at the top?

Collaborator


Done

testCaseName = "swap_channels";

Rpp32u *permTensor = nullptr;
CHECK_RETURN_STATUS(hipHostMalloc(&permTensor, 3 * sizeof(Rpp32u)));
Owner


copilot says there's no free for this.

Collaborator


Done

elif case == "85":
    swapOrderRange = 6
    # Run all variants of swap channels functions with additional argument of perm order
    for swapOrder in range(swapOrderRange):
Owner


This needs a little more explanation in that comment.
Not quite clear what is swapOrderRange etc and what's being run

Collaborator


Done

elif case == "85":
    swapOrderRange = 6
    # Run all variants of swap channel functions with additional argument of swapOrder (0 - 5)
    for swapOrder in range(swapOrderRange):
Owner


same comment

Collaborator


Done

Rpp8u mapping[][3] = {
{0, 1, 2}, // axisMask 0 → R, G, B
{0, 2, 1}, // axisMask 1 → R, B, G
{1, 0, 2}, // axisMask 2 → G, R, B
Owner


Add content from this comment in one line in the python comment too. This is more clear.

Collaborator


Done

for (; vectorLoopCount < alignedLength; vectorLoopCount += vectorIncrement)
{
__m256 p[6], pSwap[6];
rpp_simd_load(rpp_load48_u8pkd3_to_f32pln3_avx, srcPtrTemp, p); // simd loads
Collaborator


Can't this preferably be done entirely with u8 loads and stores instead of a conversion, since it is a load-store operation with some permutations?

Collaborator


Please check and change such instances across the file

Collaborator


Done

@r-abishek r-abishek changed the base branch from develop to ar/opt_swap_channels April 22, 2025 23:11
@r-abishek r-abishek added the enhancement New feature or request label Apr 22, 2025
@r-abishek r-abishek added this to the sow12ms3 milestone Apr 22, 2025
@r-abishek r-abishek merged commit acecafe into r-abishek:ar/opt_swap_channels Apr 22, 2025
ManasaDattaT pushed a commit to ManasaDattaT/rpp that referenced this pull request Dec 19, 2025
* Fix header levels in changelog

* Wrap bare URL in changelog


5 participants