RPP Tensor Support - Swap channels on HOST and HIP#400
Conversation
Dineshbabu-Ravichandran
commented
Jan 31, 2025
- Extends tensor support of Swap channels Augmentation for all permutation and optimized using AVX2 on HOST backend
- Extends tensor support of Swap channels Augmentation for all permutation on HIP backend
- Adds unit and performance tests support for the Swap channels Augmentation in test suite
There was a problem hiding this comment.
Pull Request Overview
This PR extends tensor support for the Swap Channels Augmentation on both HOST and HIP backends and adds corresponding unit and performance tests. Key changes include:
- Adding SWAP_CHANNELS to the set of augmentation cases.
- Updating test suite scripts and executor functions to handle a new permTensor parameter.
- Modifying both HOST and HIP kernel invocations and API declarations to accommodate the swap channels permutation.
Reviewed Changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| utilities/test_suite/rpp_test_suite_image.h | Added SWAP_CHANNELS to augmentation test cases and compare_output logic |
| utilities/test_suite/HOST/runImageTests.py | Added a new branch to run swap channels test variants on HOST |
| utilities/test_suite/HOST/Tensor_image_host.cpp | Updated swap channels handling by incorporating a permTensor parameter |
| utilities/test_suite/HIP/runImageTests.py | Added a new branch to run swap channels test variants on HIP |
| utilities/test_suite/HIP/Tensor_image_hip.cpp | Updated swap channels handling with HIP host memory allocation for permTensor |
| src/modules/tensor/rppt_tensor_data_exchange_operations.cpp | Modified swap channels functions to accept a permTensor parameter |
| src/modules/tensor/hip/kernel/swap_channels.cpp | Updated GPU kernel functions to incorporate the permTensor parameter |
| src/include/tensor/host_tensor_executors.hpp | Updated host function prototypes for swap channels operations |
| src/include/tensor/hip_tensor_executors.hpp | Updated HIP function prototypes for swap channels operations |
| api/rppt_tensor_data_exchange_operations.h | Updated API declarations to reflect the new swap channels parameter |
Comments suppressed due to low confidence (1)
utilities/test_suite/HOST/runImageTests.py:85
- [nitpick] The use of the literal "85" for the swap channels test case may reduce clarity and maintainability. Consider introducing a defined constant (e.g., SWAP_CHANNELS) to replace the magic number.
elif case == "85":
| { | ||
| testCaseName = "swap_channels"; | ||
|
|
||
| Rpp32u *permTensor = nullptr; |
There was a problem hiding this comment.
The memory allocated with hipHostMalloc for 'permTensor' is not freed after usage. Consider adding a call to hipFree(permTensor) at the end of its scope to prevent a memory leak.
r-abishek
left a comment
There was a problem hiding this comment.
@HazarathKumarM Please address comments to merge this PR
| * \param [in] srcDescPtr source tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = 3) | ||
| * \param [out] dstPtr destination tensor in HOST memory | ||
| * \param [in] dstDescPtr destination tensor descriptor (Restrictions - numDims = 4, offsetInBytes >= 0, dataType = U8/F16/F32/I8, layout = NCHW/NHWC, c = same as that of srcDescPtr) | ||
| * \param [in] permTensor permutation tensor for swap channels operation |
There was a problem hiding this comment.
Just like hip, also say "in HOST memory" here
| pSwap[0] = p[permTensor[0] * 2]; // channel swap | ||
| pSwap[1] = p[permTensor[0] * 2 + 1]; // channel swap | ||
| pSwap[2] = p[permTensor[1] * 2]; // channel swap | ||
| pSwap[3] = p[permTensor[1] * 2 + 1]; // channel swap |
There was a problem hiding this comment.
Init an array [6] outside = {permTensor[0] * 2, permTensor[0] * 2 + 1, ...}
| // Adjust permutation tensor for NHWC - NCHW layout conversion | ||
| if((srcDescPtr->c == 3) && (srcDescPtr->layout == RpptLayout::NHWC) && (dstDescPtr->layout == RpptLayout::NCHW)) | ||
| { | ||
| Rpp32u count = 0; |
There was a problem hiding this comment.
Why do we have this adjustment? If this is common to all bit depths, perhaps create a helper function at the top?
| testCaseName = "swap_channels"; | ||
|
|
||
| Rpp32u *permTensor = nullptr; | ||
| CHECK_RETURN_STATUS(hipHostMalloc(&permTensor, 3 * sizeof(Rpp32u))); |
There was a problem hiding this comment.
copilot says there's no free for this.
| elif case == "85": | ||
| swapOrderRange = 6 | ||
| # Run all variants of swap channels functions with additional argument of perm order | ||
| for swapOrder in range(swapOrderRange): |
There was a problem hiding this comment.
This needs a little more explanation in that comment.
Not quite clear what is swapOrderRange etc and what's being run
| elif case == "85": | ||
| swapOrderRange = 6 | ||
| # Run all variants of swap channel functions with additional argument of swapOrder (0 - 5) | ||
| for swapOrder in range(swapOrderRange): |
| Rpp8u mapping[][3] = { | ||
| {0, 1, 2}, // axisMask 0 → R, G, B | ||
| {0, 2, 1}, // axisMask 1 → R, B, G | ||
| {1, 0, 2}, // axisMask 2 → G, R, B |
There was a problem hiding this comment.
Add content from this comment in one line in the python comment too. This is more clear.
| for (; vectorLoopCount < alignedLength; vectorLoopCount += vectorIncrement) | ||
| { | ||
| __m256 p[6], pSwap[6]; | ||
| rpp_simd_load(rpp_load48_u8pkd3_to_f32pln3_avx, srcPtrTemp, p); // simd loads |
There was a problem hiding this comment.
Can't this be preferrably done with u8 load and store completely instead of conversion as it is a load store operation with some permutations?
There was a problem hiding this comment.
Pls check and change across the file such instances
| for (; vectorLoopCount < alignedLength; vectorLoopCount += vectorIncrement) | ||
| { | ||
| __m256 p[6], pSwap[6]; | ||
| rpp_simd_load(rpp_load48_u8pkd3_to_f32pln3_avx, srcPtrTemp, p); // simd loads |
There was a problem hiding this comment.
Pls check and change across the file such instances
* Fix header levels in changelog * Wrap bare URL in changelog