Solarize HIP and HOST implementation by HazarathKumarM · Pull Request #454 · r-abishek/rpp

HazarathKumarM · 2025-06-26T15:07:42Z

Solarize HIP and HOST tensor implementation

r-abishek

@HazarathKumarM Please address comments

r-abishek · 2025-08-08T00:19:45Z

utilities/test_suite/HIP/Tensor_image_hip.cpp

+                    testCaseName = "solarize";
+
+
+                    for (int i = 0; i < batchSize; i++)


Extra blank lines

r-abishek · 2025-08-08T00:21:04Z

src/modules/tensor/rppt_tensor_effects_augmentations.cpp

+                             RpptDescPtr srcDescPtr,
+                             RppPtr_t dstPtr,
+                             RpptDescPtr dstDescPtr,
+                             Rpp32f *tresholdTensor,


Typo - thresholdTensor

r-abishek · 2025-08-08T00:22:10Z

src/modules/tensor/rppt_tensor_effects_augmentations.cpp

+{
+    RppLayoutParams layoutParams = get_layout_params(srcDescPtr->layout, srcDescPtr->c);
+
+    if ((srcDescPtr->dataType == RpptDataType::U8) && (dstDescPtr->dataType == RpptDataType::U8))


Can thresholdTensor have any float value with no restriction or checks?
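One way to address this question is to validate every per-image threshold before dispatching to a kernel. The sketch below is only an illustration: the helper name `thresholds_in_range` and the default [0, 1] range are assumptions, not the checks this PR actually added.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical helper (not part of RPP): returns true only if every threshold
// in the batch is finite and lies inside [minVal, maxVal].
// The default [0, 1] range is an assumption made for illustration.
inline bool thresholds_in_range(const float *thresholdTensor, std::size_t batchSize,
                                float minVal = 0.0f, float maxVal = 1.0f)
{
    if (thresholdTensor == nullptr)
        return false;

    for (std::size_t i = 0; i < batchSize; i++)
    {
        const float t = thresholdTensor[i];
        if (!std::isfinite(t) || t < minVal || t > maxVal)
            return false;    // reject NaN, infinity, and out-of-range values
    }

    return true;
}
```

A caller could return an error status when this helper reports false, instead of launching the kernel with an unchecked value.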

r-abishek · 2025-08-08T00:22:56Z

api/rppt_tensor_effects_augmentations.h

+RppStatus rppt_solarize_host(RppPtr_t srcPtr, RpptDescPtr srcDescPtr, RppPtr_t dstPtr, RpptDescPtr dstDescPtr, Rpp32f* thresholdTensor, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle);
+
+#ifdef GPU_SUPPORT
+RppStatus rppt_solarize_gpu(RppPtr_t srcPtr, RpptDescPtr srcDescPtr, RppPtr_t dstPtr, RpptDescPtr dstDescPtr, Rpp32f *thresholdTensor, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle);


Header documentation to be added?

r-abishek · 2025-08-08T00:23:54Z

api/rppt_tensor_effects_augmentations.h

 RppStatus rppt_fog_gpu(RppPtr_t srcPtr, RpptDescPtr srcDescPtr, RppPtr_t dstPtr, RpptDescPtr dstDescPtr, Rpp32f *intensityFactor, Rpp32f *greyFactor, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle);
 #endif // GPU_SUPPORT
+
+RppStatus rppt_solarize_host(RppPtr_t srcPtr, RpptDescPtr srcDescPtr, RppPtr_t dstPtr, RpptDescPtr dstDescPtr, Rpp32f* thresholdTensor, RpptROIPtr roiTensorPtrSrc, RpptRoiType roiType, rppHandle_t rppHandle);


ROCm docs input/output images to be added?
Bin images too for QA

r-abishek · 2025-08-08T00:26:53Z

src/include/common/cpu/rpp_cpu_simd_load_store.hpp

    px[2] = _mm256_shuffle_epi8(_mm256_unpacklo_epi8(pxSrc[2], pxSrc[3]), pxMaskRGB);    /* unpack 8 lo-pixels of pxSrc[2] and pxSrc[3] to get B01-16 */
 }

+inline void rpp_load96_i8pkd3_to_i8pln3(Rpp8s *srcPtr, __m256i *px)


This rpp_load96_i8pkd3_to_i8pln3() and the helper above it, rpp_load96_u8pkd3_to_u8pln3(), are exactly the same on Diffchecker. Why are two needed? Can they be templated?

All the changes in this file have been reverted, as they are no longer needed in the updated version.

r-abishek · 2025-08-08T00:30:38Z

src/include/common/cpu/rpp_cpu_simd_load_store.hpp

    _mm256_storeu_si256((__m256i *)dstPtrB, px[2]);    /* store [B01|B02|B03|B04|B05|B06|B07|B08|B09|B10|B11|B12|B13|B14|B15|B16|B17|B18|B19|B20|B21|B22|B23|B24|B25|B26|B27|B28|B29|B30|B31|B32] */
 }

+inline void rpp_store96_i8pln3_to_i8pln3(Rpp8s *dstPtrR, Rpp8s *dstPtrG, Rpp8s *dstPtrB, __m256i *px)


Same comment

All the changes in this file have been reverted, as they are no longer needed in the updated version.

r-abishek · 2025-08-08T00:36:24Z

src/include/common/cpu/rpp_cpu_simd_load_store.hpp

    _mm_storeu_si128((__m128i *)(dstPtr + 84), _mm256_extractf128_si256(pxDst[7], 1));    /* store [R29|G29|B29|R30|G30|B30|R31|G31|B31|R32|G32|B32|00|00|00|00] */
 }

+inline void rpp_store96_i8pln3_to_i8pkd3(Rpp8s *dstPtr, __m256i *px)


Same comment:

I would either template or suggest this naming without even templating:

rpp_load96_u8pkd3_to_u8pln3(), rpp_store96_i8pln3_to_i8pln3() and rpp_store96_i8pln3_to_i8pkd3() seem to be used only in bitwise_not, bitwise_xor and channel_permute.
If this functionality really works correctly for both u8 and i8, let's rename these 3 to rpp_load96_8bitpkd3_to_8bitpln3(), rpp_store96_8bitpln3_to_8bitpln3() and rpp_store96_8bitpln3_to_8bitpkd3() respectively.

Then just call the same 3 helpers everywhere.

Please add comments above the helper functions stating that they are meant to accept only U8/I8.

All the changes in this file have been reverted, as they are no longer needed in the updated version.

r-abishek · 2025-08-08T00:41:27Z

src/modules/tensor/hip/kernel/solarize.cpp

+
+#include "hip_tensor_executors.hpp"
+
+__device__ void solarize_hip_rgb_compute(d_float24 *pix_f24, float thresholdParam, float maxVal)


Change the parameters to float &thresholdParam, float &maxVal.
These are already local variables inside the GPU kernel, so there is no need to re-initialize another local copy.

r-abishek · 2025-08-08T00:55:12Z

src/modules/tensor/cpu/kernel/solarize.cpp

+                // {
+                //     __m256i p[3];
+                //     rpp_simd_load(rpp_load96_u8pkd3_to_u8pln3, srcPtrTemp, p);                               // simd loads
+                //     compute_solarize_96_host(p, pxThresholdParam);                                           // threshold adjustment


Need to remove commented code

r-abishek

@HazarathKumarM Pls check two comments

r-abishek · 2025-08-25T22:28:02Z

utilities/test_suite/common.py

 ImageAugmentationGroupMap = {
    "color_augmentations" : [0, 1, 2, 3, 4, 13, 31, 34, 36, 42, 43, 45, 81],
-    "effects_augmentations" : [5, 6, 8, 10, 11, 29, 30, 32, 35, 46, 82, 83, 84],
+    "effects_augmentations" : [5, 6, 8, 10, 11, 29, 30, 32, 35, 46, 82, 83, 84, 95],


Let's make a change in this PR for this, or, before issuing this PR, issue a separate PR to fix it.

Let's ensure that once case numbers have been tagged to function names, we use only the names and never the numbers again.
(ROCm#586 (comment))

Sure, Abishek.
We will issue a separate PR for these test suite changes.

r-abishek · 2025-08-25T22:33:48Z

utilities/test_suite/HOST/Tensor_image_host.cpp

+
+                    startWallTime = omp_get_wtime();
+                    startCpuTime = clock();
+                    if (inputBitDepth == 0 || inputBitDepth == 1 || inputBitDepth == 2 || inputBitDepth == 5)


In another separate PR, please also convert the 0/1/2/3 literals to an enum, for all test suite types.
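As a rough illustration of that suggestion, the literals in the condition above could be replaced with a scoped enum. The enumerator names and their meanings below are assumptions for illustration only; only the numeric values 0, 1, 2 and 5 come from the reviewed snippet.

```cpp
// Hypothetical enum (not part of the RPP test suite). The numeric values mirror
// the literals in the reviewed condition; the names and meanings are assumed.
enum class InputBitDepth : int
{
    U8_TO_U8   = 0,
    F16_TO_F16 = 1,
    F32_TO_F32 = 2,
    I8_TO_I8   = 5
};

// Equivalent of: inputBitDepth == 0 || inputBitDepth == 1 || inputBitDepth == 2 || inputBitDepth == 5
inline bool is_same_type_bit_depth(int inputBitDepth)
{
    switch (static_cast<InputBitDepth>(inputBitDepth))
    {
        case InputBitDepth::U8_TO_U8:
        case InputBitDepth::F16_TO_F16:
        case InputBitDepth::F32_TO_F32:
        case InputBitDepth::I8_TO_I8:
            return true;
        default:
            return false;
    }
}
```

With named enumerators, the test case condition reads as intent rather than as a list of magic numbers.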

Copilot

Pull Request Overview

This PR implements the solarize image augmentation effect for both HIP (GPU) and HOST (CPU) tensor operations. Solarize is an image effect that inverts pixel values above a specified threshold, creating a photographic negative effect for those regions.
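For reference, the per-pixel rule can be sketched as a scalar U8 loop. This is a minimal illustration that follows the comparison used in the scalar tail of the reviewed CPU kernel (`>=` against the threshold, then `maxVal - value`); the function names and the choice to express the threshold in pixel units (0 to 255) are assumptions.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference (not the PR's kernel): invert a U8 pixel when it
// is at or above the threshold. The threshold is assumed to be in pixel units.
inline std::uint8_t solarize_pixel_u8(std::uint8_t value, float threshold)
{
    const float maxVal = 255.0f;
    return (value >= threshold) ? static_cast<std::uint8_t>(maxVal - value) : value;
}

// Apply the rule to a contiguous buffer of U8 samples.
inline void solarize_u8(const std::uint8_t *src, std::uint8_t *dst, std::size_t length, float threshold)
{
    for (std::size_t i = 0; i < length; i++)
        dst[i] = solarize_pixel_u8(src[i], threshold);
}
```

A loop like this is also a convenient baseline for checking vectorized output element by element.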

Key changes include:

Complete implementation of solarize augmentation supporting multiple data types (U8, F32, F16, I8) and tensor layouts (NCHW, NHWC)
Integration of solarize into test suites and augmentation groupings
Addition of proper API headers and function declarations

Reviewed Changes

Copilot reviewed 10 out of 13 changed files in this pull request and generated 4 comments.

File	Description
utilities/test_suite/rpp_test_suite_image.h	Adds solarize enum (95) and string mapping for test framework
utilities/test_suite/common.py	Registers solarize in augmentation maps and effect groupings
utilities/test_suite/HOST/Tensor_image_host.cpp	Implements HOST test case for solarize with threshold parameter
utilities/test_suite/HIP/Tensor_image_hip.cpp	Implements HIP test case with memory allocation and kernel invocation
src/modules/tensor/rppt_tensor_effects_augmentations.cpp	Core solarize implementation for both HOST and GPU backends
src/modules/tensor/hip/kernel/solarize.cpp	HIP kernel implementation with CUDA device functions
src/modules/tensor/cpu/kernel/solarize.cpp	CPU implementation with SIMD optimizations using AVX2
src/include/tensor/host_tensor_executors.hpp	Function declarations for HOST solarize implementations
src/include/tensor/hip_tensor_executors.hpp	Template function declaration for HIP solarize executor
api/rppt_tensor_effects_augmentations.h	Public API documentation and function declarations

utilities/test_suite/common.py

src/modules/tensor/cpu/kernel/solarize.cpp

Srihari-mcw · 2025-08-26T05:30:27Z

src/include/tensor/hip_tensor_executors.hpp

                                RpptRoiType roiType,
                                rpp::Handle& handle);

+// -------------------- Solarize --------------------


Lowercase here?

Srihari-mcw · 2025-08-26T06:00:24Z

src/modules/tensor/cpu/kernel/solarize.cpp

+        Rpp32u bufferLength = roi.xywhROI.roiWidth * layoutParams.bufferMultiplier;
+        Rpp32u vectorIncrement = 48;
+        Rpp32u vectorIncrementPerChannel = 16;
+        Rpp32u alignedLength = (bufferLength / vectorIncrement) * vectorIncrement;


Please put alignedLength inside AVX2 condition

Srihari-mcw · 2025-08-26T06:03:45Z

src/modules/tensor/cpu/kernel/solarize.cpp

+        srcPtrChannel = srcPtrImage + (roi.xywhROI.xy.y * srcDescPtr->strides.hStride) + (roi.xywhROI.xy.x * layoutParams.bufferMultiplier);
+        dstPtrChannel = dstPtrImage;
+
+        Rpp32u bufferLength = roi.xywhROI.roiWidth * layoutParams.bufferMultiplier;


Declare buffer length alone before srcPtrChannel to maintain uniformity across kernels

Srihari-mcw · 2025-08-26T06:04:56Z

src/modules/tensor/cpu/kernel/solarize.cpp

+            dstPtrRowR = dstPtrChannel;
+            dstPtrRowG = dstPtrRowR + dstDescPtr->strides.cStride;
+            dstPtrRowB = dstPtrRowG + dstDescPtr->strides.cStride;
+            for(int i = 0; i < roi.xywhROI.roiHeight; i++)


Add an empty line before

Srihari-mcw · 2025-08-26T06:05:16Z

src/modules/tensor/cpu/kernel/solarize.cpp

+                dstPtrTempR = dstPtrRowR;
+                dstPtrTempG = dstPtrRowG;
+                dstPtrTempB = dstPtrRowB;
+                int vectorLoopCount = 0;


Add an empty line before vectorLoopCount = 0

Srihari-mcw · 2025-08-26T06:06:27Z

src/modules/tensor/cpu/kernel/solarize.cpp

+                {
+                    __m256 p[6];
+                    rpp_simd_load(rpp_load48_u8pkd3_to_f32pln3_avx, srcPtrTemp, p);                               // simd loads
+                    compute_solarize_host<6>(p, pxThresholdParam, avx_p255);                                           // threshold adjustment


Keep the trailing comments vertically aligned across lines

Srihari-mcw · 2025-08-26T06:08:26Z

src/modules/tensor/cpu/kernel/solarize.cpp

+                    *dstPtrTempR++ = (*(srcPtrTemp) >= thresholdParam) ? (maxVal - *(srcPtrTemp)) : *(srcPtrTemp);
+                    *dstPtrTempG++ = (*(srcPtrTemp + 1) >= thresholdParam) ? (maxVal - *(srcPtrTemp + 1)) : *(srcPtrTemp + 1);
+                    *dstPtrTempB++ = (*(srcPtrTemp + 2) >= thresholdParam) ? (maxVal - *(srcPtrTemp + 2)) : *(srcPtrTemp + 2);
+                    srcPtrTemp += 3;


Please add empty lines appropriately before and after these lines, similar to other files

Srihari-mcw · 2025-08-26T06:11:14Z

src/modules/tensor/cpu/kernel/solarize.cpp

+        }
+        else
+        {
+            Rpp32u alignedLength = bufferLength & ~(vectorIncrementPerChannel - 1);


Enclose this and others inside AVX2 flag
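The kernel snippets in this review compute alignedLength in two ways: divide-then-multiply for the 48-element packed increment, and a bit mask for the 16-element per-channel increment. The sketch below uses hypothetical helper names to show that the mask form is only valid when the increment is a power of two.

```cpp
#include <cstdint>

// Hypothetical helper: largest multiple of `increment` that is <= length.
// Works for any non-zero increment (for example 48).
inline std::uint32_t align_down(std::uint32_t length, std::uint32_t increment)
{
    return (length / increment) * increment;
}

// Hypothetical helper: the mask form. It matches align_down only when
// `increment` is a power of two (for example 16).
inline std::uint32_t align_down_pow2(std::uint32_t length, std::uint32_t increment)
{
    return length & ~(increment - 1);
}
```

Either form only matters for the vectorized loop, which is why it is reasonable to compute alignedLength inside the same AVX2 guard as that loop.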

Srihari-mcw · 2025-08-26T06:23:58Z

src/modules/tensor/cpu/kernel/solarize.cpp

+}
+
+RppStatus solarize_f16_f16_host_tensor(Rpp16f *srcPtr,
+                                        RpptDescPtr srcDescPtr,


Please align all parameters to the same column, and check for similar places

Srihari-mcw · 2025-08-26T06:25:29Z

src/modules/tensor/cpu/kernel/solarize.cpp

+        Rpp32u bufferLength = roi.xywhROI.roiWidth * layoutParams.bufferMultiplier;
+        const Rpp32u vectorIncrement = 48;
+        const Rpp32u vectorIncrementPerChannel = 16;
+        Rpp32u alignedLength = (bufferLength / vectorIncrement) * vectorIncrement;


Same comment as earlier, pls check for instances across the file. Thanks

Srihari-mcw · 2025-08-26T06:33:29Z

src/modules/tensor/cpu/kernel/solarize.cpp

+                    for (; vectorLoopCount < alignedLength; vectorLoopCount += vectorIncrementPerChannel)
+                    {
+#if __AVX2__
+                        __m256 p[1];


Please use __m256 p; and pass the address. Thanks - For maintaining consistency

Srihari-mcw · 2025-08-26T06:40:54Z

src/modules/tensor/cpu/kernel/solarize.cpp

+                    }
+                    for (; vectorLoopCount < bufferLength; vectorLoopCount++)
+                    {
+                        *dstPtrTemp++ = ((*srcPtrTemp + offset) >= thresholdParam) ? (maxVal - (*srcPtrTemp)) : *srcPtrTemp;


Can we add a comment explaining why the offset is required?
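A possible reading of the offset, stated here as an assumption rather than as the PR's documented behavior: for I8 data, adding 128 shifts the signed value into the 0 to 255 range so it can be compared against a threshold expressed in U8 terms. Under that assumption the inversion constant for I8 would be -1, since (255 - (value + 128)) - 128 = -1 - value. The function name and both constants below are hypothetical.

```cpp
#include <cstdint>

// Hypothetical I8 variant (assumptions: offset = 128, maxVal = -1).
// The offset moves the signed pixel into U8 range for the comparison only;
// the inversion itself is done directly in the signed domain.
inline std::int8_t solarize_pixel_i8(std::int8_t value, float thresholdU8)
{
    const int offset = 128;    // assumed: maps [-128, 127] onto [0, 255]
    const int maxVal = -1;     // assumed: (255 - (value + 128)) - 128 == -1 - value
    const int shifted = value + offset;
    return (shifted >= thresholdU8) ? static_cast<std::int8_t>(maxVal - value) : value;
}
```

With these constants, an I8 pixel and its U8 counterpart (value + 128) produce outputs that also differ by exactly 128.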

api/rppt_tensor_effects_augmentations.h

src/modules/tensor/hip/kernel/solarize.cpp

Srihari-mcw

Please address review comments

r-abishek changed the base branch from master to develop August 8, 2025 00:18

r-abishek requested a review from Copilot August 8, 2025 00:18

r-abishek assigned HazarathKumarM Aug 8, 2025

r-abishek added the enhancement label Aug 8, 2025

r-abishek requested changes Aug 8, 2025

HazarathKumarM added 5 commits August 19, 2025 10:41

Solarize HIP and HOST implementation

eb33698

cleanup the code and fix pkd3-pkd3 performance

eaf72b8

Add golden output and doxygen comments

4c9037e

Add cheks for Threshold param

4723b89

modified case num for solarize

d9f65d3

HazarathKumarM force-pushed the hk/solarize branch from 84437fe to d9f65d3 Compare August 19, 2025 10:44

minor fix

e3677cd

r-abishek reviewed Aug 26, 2025

r-abishek requested a review from Copilot August 26, 2025 00:22

Copilot AI reviewed Aug 26, 2025

fix load/store calls

226f330

Srihari-mcw reviewed Aug 26, 2025

api/rppt_tensor_effects_augmentations.h (outdated)

Srihari-mcw reviewed Aug 26, 2025

src/modules/tensor/hip/kernel/solarize.cpp

Srihari-mcw reviewed Aug 26, 2025

src/modules/tensor/hip/kernel/solarize.cpp (outdated)

Srihari-mcw requested changes Aug 26, 2025

Maddisetty and others added 2 commits August 26, 2025 06:56

Address review comments

1c44d7f

Merge remote-tracking branch '/develop' into hk/solarize

04e143c

r-abishek approved these changes Aug 28, 2025

r-abishek changed the base branch from develop to ar/solarize August 28, 2025 06:08

r-abishek merged commit 2564faa into r-abishek:ar/solarize Aug 28, 2025
