JPEG Compression distortion Tensor HIP support by HazarathKumarM · Pull Request #413 · r-abishek/rpp

HazarathKumarM · 2025-03-20T03:40:00Z

No description provided.

…ocs/sphinx (r-abishek#487) Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.10.0 to 1.11.0. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](ROCm/rocm-docs-core@v1.10.0...v1.11.0) --- updated-dependencies: - dependency-name: rocm-docs-core[api_reference] dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* adds Fog tensor support * modified comments minor fix in HIP kernel * changed c style casting to static_cast inside HIP kernel modified few variable names * added note in documentation added fog test case to randomOutputCase in HIP and HOST test suites * Introduced Greyness factor on Host FOG Kernel * Introduced Greyness factor on HIP Side * Added Grey scale support for Raw C code and PLN3 variant * Modified Converstion factor variable * Doxygen Outputs changed * Variable name changes * Changes in HUe Saturation Matrix * Fix output issue * Update CHANGELOG.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Updates to 1.9.10 including fog feature --------- Co-authored-by: sampath1117 <sampath.rachumallu@multicorewareinc.com> Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com> Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com> Co-authored-by: Lakshmi Kumar <lakshmi.kumar@amd.com> Co-authored-by: Rajy Rawther <Rajy.MeeyakhanRawther@amd.com> Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>

* Make initial changes for raw CPP version of warp perspective * Fix calls to compute_warp_perspective_src_loc function * Update changes to go through nearest neighbours case * AVX HOST codes for warp perspective initial * Fixes for accuracy in warp perspective * More fixes for accuracy in warp perspective * Update the cide for AVX version of Planar to Planar * Add bilinear u8 host code for warp perspective * Make updates to include functions for F32 data type * Make updates to use cast instead of set and fix issues with raw C implementation * Add i8 host codes * Add updates for F16 Bilinear Code * Update the initial HIP code for warp perspective * Update fixes for HIP code * Add Warp Perspective Nearest Neighbors F16 code for PKD3_to_PLN3 and PLN3_to_PLN3 * Add updates for PLN to PLN configuration * Add updates for PKD3 to PKD3 case * Rename variables * Update changes to log images separately for Bilinear and Nearest Neighbors * fixed bug in raw c code of PKD-PLN variant * minor bug fix for F16 PLN variants * minor fixes in HOST test suite * Update the HIP code for review comments and refactoring of device functions * Update the comments alignment * Rename functions and add cases in HOST and HIP runTests.py * Update indentations for compuatations and rename vectors * Update documentations and add more reference variables * Make more formatting changes * Make further updates by including test cases * Make updates to use reinterpret cast * Update reinterpret casts for PLN to PLN configuration u8 and i8 codes * Make updates to enclose code inside AVX2 flag * Make further changes to update type casting * Update the version * Make updates to add warp perspective image * Modify comments, update CHANGELOG and update flags * Update further comments in warp perspective * Add more comments for warp perspective * Update based on further review comments * Update the case number for warp_perspective in common.py * Address review comments * Make initial changes for raw CPP version of warp perspective * Fix calls to compute_warp_perspective_src_loc function * Update changes to go through nearest neighbours case * AVX HOST codes for warp perspective initial * Fixes for accuracy in warp perspective * More fixes for accuracy in warp perspective * Update the cide for AVX version of Planar to Planar * Add bilinear u8 host code for warp perspective * Make updates to include functions for F32 data type * Make updates to use cast instead of set and fix issues with raw C implementation * Add i8 host codes * Add updates for F16 Bilinear Code * Update the initial HIP code for warp perspective * Update fixes for HIP code * Add Warp Perspective Nearest Neighbors F16 code for PKD3_to_PLN3 and PLN3_to_PLN3 * Add updates for PLN to PLN configuration * Add updates for PKD3 to PKD3 case * Rename variables * Update changes to log images separately for Bilinear and Nearest Neighbors * fixed bug in raw c code of PKD-PLN variant * minor bug fix for F16 PLN variants * minor fixes in HOST test suite * Update the HIP code for review comments and refactoring of device functions * Update the comments alignment * Rename functions and add cases in HOST and HIP runTests.py * Update indentations for compuatations and rename vectors * Update documentations and add more reference variables * Make more formatting changes * Make further updates by including test cases * Make updates to use reinterpret cast * Update reinterpret casts for PLN to PLN configuration u8 and i8 codes * Make updates to enclose code inside AVX2 flag * Make further changes to update type casting * Make updates to add warp perspective image * Modify comments, update CHANGELOG and update flags * Update further comments in warp perspective * Add more comments for warp perspective * Update based on further review comments * Update the case number for warp_perspective in common.py * Address review comments * Fix conflits with warp perspective * Update version details * Merge branch 'ar/opt_warp_perspective' of https://github.com/r-abishek/rpp into opt_warp_perspective_rebased * Update version to 1.9.10 including warp perspective * Updates to convert to XYWH from LTRB instead of opposite * Update CHANGELOG.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Revert changes and convert to ltrb instead of xywh --------- Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com> Co-authored-by: sampath1117 <sampath.rachumallu@multicorewareinc.com> Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com> Co-authored-by: Rajy Rawther <Rajy.MeeyakhanRawther@amd.com> Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Package - remove clang from test * CMakeLists - remove BUILD_WITH_AMD_ADVANCE * Package - Add OMP dependency * Find Packages - Updates * Test Package - Deps * Test - backend Info * Tests - Fix HIP Test Add * Test - Add HIP Path * Test - Find HIP Updates * Tests - Fix HIP compilation

* Jenkins - Fix Test * Test - Create sepreate test folder * FileSystem - Find and process * Find Filesystem - Updates * Test - Compiler Updates

…ocs/sphinx (r-abishek#491) Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.11.0 to 1.12.0. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](ROCm/rocm-docs-core@v1.11.0...v1.12.0) --- updated-dependencies: - dependency-name: rocm-docs-core[api_reference] dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

…ocs/sphinx (r-abishek#495) Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.12.0 to 1.12.1. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](ROCm/rocm-docs-core@v1.12.0...v1.12.1) --- updated-dependencies: - dependency-name: rocm-docs-core[api_reference] dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Test Suite - Fix HIP Link * Tests - HIP Updates * Test - Link OpenCV * Test - Link dir updates * CMakeLists - Updates * Test Package - Deps * CXX Compiler & ROCm path updates * ROCm Path - Display info * OpenMP - Updates * Package Deps - Remove OpenMP

* pthread removal * c++14 removal --------- Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>

…ocs/sphinx (r-abishek#500) Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.12.1 to 1.13.0. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](ROCm/rocm-docs-core@v1.12.1...v1.13.0) --- updated-dependencies: - dependency-name: rocm-docs-core[api_reference] dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* pthread removal * c++14 removal --------- Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>

* Readme - Updates * Package - Deps to OpenMP

* Rename Tensor_host to Tensor_image_host, and runTests to runImageTests * Rename Tensor_hip to Tensor_image_hip, and runTests to runImageTests * Rename all readme runTests.py to runImageTests.py --------- Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com> Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>

* added initial support for U8 PLN1-PLN1 variant * added support for U8 PKD3, PLN3 variants * modified algorithm to give RGB output for RGB images * moved common code outside the layout branch conditions * added support for toggle variation of U8 * added golden output for threshold * added threshold output for doxygen * added support for F32 bit depth * added support for I8 bitdepth * added F16 bitdepth support * added HIP support for U8 bitdepth * made changes to support remaining bitdepths * fixed output issues with I8 variant * removed commented code in HOST * added threshold test case in maps used in common.py * modified RPP_VERSION_MINOR value and changelog * fixed issues with doxygen modified globalThreads_x value for HIP kernel * made changes in I8 variants as per review comments * added more details for threshold documentation * Update version to 1.9.10 including threshold * Remove duplicate definitions of functions and minor bug fix * Minor docs fix * Minor docs fix --------- Co-authored-by: sampath1117 <sampath.rachumallu@multicorewareinc.com> Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com> Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com> Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>

* Add Intial u8 implementation for Rain * Add I8 implementation and Changes based on the Review comments * Initial HIP implementation Add F32 and F16 Tensor Implementation * Add test case for Rain in HIP test suite code cleanup * minor code cleanup * Modified func names and removed unnecessary code * Resolve Review comments * replaced pinned memory with HIP memory for Rain Layer computation * Modified RGB Rain Mask to planar Rain Mask in HIP * Address review comments * Add Rain compute function * Add version changes and Resolve review comments * fix build warnings * Fix the outputs of f16 toggle variants * Revert Rain width changes * Fix pln3 outputs for u8 and i8 bitdepths * Resolve review comments * Modified load and store routines for planar cases * Resolve review comments * Modify docs image * Fix versioning * Updates to 1.9.10 including rain feature --------- Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com> Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com> Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>

…rmation (r-abishek#504) Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>

…k#506) * Lock nifti_clib commit in readme * Lock nifti_clib commit in common.groovy * Change to sudo apt install half in common.groovy

* experimental changes to detect error and print in the end in image test suite * Error handling added for host side * Error detection implemented on HIP SIDE * Voxel test suite changes * Changed variable name to camelCase * Changes in srcpath * Changes in Voxel test suite for nonQA case * Consolidated the repeated code and move it as common code * Minor changes based on review commands * Modification for bitDepth in voxel host * Merge with develop branch * Fix on error code display and gaussian Filter * F string bug is resolved * Fix for CI failure and some improvement in error detection * Changes based on review comments * Fix for rain and warp_perspective issue of unable to open file * Enhanced display for non implimented functionality --------- Co-authored-by: sampath1117 <sampath.rachumallu@multicorewareinc.com> Co-authored-by: dineshbabu-ravichandran <dineshbabu.ravichandran@multicorewareinc.com> Co-authored-by: dineshbabu-ravichandran <dineshbabu.ravichandran@multicoreinc.com> Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com> Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>

…ility (r-abishek#499) * initial commit * removed the usage of supported case list on HOST backend * Add Augmentation enum * removed supported caselists from all the python scripts in HIP backend * replacing case numbers with the enum's on the test suite files * Fix Indentation * Add enums for the swith cases * update Enum's for new cases * Fix errors in unit tests * Semicolon typo fix * Fix the test case naming in QA mode * removes unncessary print statements --------- Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com> Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>

…ocs/sphinx (r-abishek#511) Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.13.0 to 1.14.1. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](ROCm/rocm-docs-core@v1.13.0...v1.14.1) --- updated-dependencies: - dependency-name: rocm-docs-core[api_reference] dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

r-abishek

@HazarathKumarM Please address few more comments

r-abishek · 2025-03-28T03:12:29Z

src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp

+}
+
+// Horizontal Upsampling and Color conversion to RGB
+__device__ inline void Upsample_and_RGB_hip_compute(float4 Cb_f4, float4 Cr_f4, d_float8 *Ch1_f8, d_float8* Ch2_f8, d_float8 *Ch3_f8)


snake_case - change Upsample to upsample

r-abishek · 2025-03-28T18:41:34Z

src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp

+    int hipThreadIdx_x4 = hipThreadIdx_x * 4;
+
+    int alignedWidth = ((roiTensorPtrSrc[id_z].xywhROI.roiWidth + 15) / 16) * 16;
+    int alignedHeight = ((roiTensorPtrSrc[id_z].xywhROI.roiHeight + 15) / 16) * 16;


Would roiTensorPtrSrc[id_z].xywhROI.roiWidth & ~15 work for these? Please print value on host and check. Its more optimal than the divide and multiply

r-abishek · 2025-03-28T19:13:43Z

src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp

+        // Load valid pixels (only if id_x is within valid range)
+        if (validPixels > 0)
+        {
+            for (int i = 0; i < validPixels; i++)


Change to something like below. Please confirm output match.

for (int i = 0, idx = srcIdx; i < validPixels; i++, idx += 3) { src_smem[hipThreadIdx_y_channel.x][hipThreadIdx_x8 + i] = srcPtr[idx]; src_smem[hipThreadIdx_y_channel.y][hipThreadIdx_x8 + i] = srcPtr[idx + 1]; src_smem[hipThreadIdx_y_channel.z][hipThreadIdx_x8 + i] = srcPtr[idx + 2]; }

r-abishek · 2025-03-28T19:16:57Z

src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp

+        for (int i = validPixels; i < min(validPixels + 16, 8); i++)
+        {
+            src_smem[hipThreadIdx_y_channel.x][hipThreadIdx_x8 + i] = srcPtr[srcIdx + ((validPixels - 1) * 3)];
+            src_smem[hipThreadIdx_y_channel.y][hipThreadIdx_x8 + i] = srcPtr[srcIdx + ((validPixels - 1) * 3) + 1];


Same comment here

r-abishek · 2025-03-28T19:17:53Z

src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp

+    __syncthreads();
+    d_float8 y_f8;
+    int cbcrY = hipThreadIdx_y * 2;
+    y_hip_compute(srcPtr,(d_float8*)&src_smem[hipThreadIdx_y][hipThreadIdx_x8],(d_float8*)&src_smem[hipThreadIdx_y + 16][hipThreadIdx_x8],(d_float8*)&src_smem[hipThreadIdx_y + 32][hipThreadIdx_x8],&y_f8);


Spaces after commas

r-abishek · 2025-03-28T19:25:35Z

src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp

+    {
+        float4 cb_f4, cr_f4;
+        // Downsample RGB and convert to CbCr
+        downsample_cbcr_hip_compute((d_float8*)&src_smem[cbcrY][hipThreadIdx_x8],(d_float8*)&src_smem[cbcrY + 1][hipThreadIdx_x8],(d_float8*)&src_smem[cbcrY + 16][hipThreadIdx_x8],(d_float8*)&src_smem[cbcrY + 17][hipThreadIdx_x8],(d_float8*)&src_smem[cbcrY + 32][hipThreadIdx_x8],(d_float8*)&src_smem[cbcrY + 33][hipThreadIdx_x8],&cb_f4, &cr_f4);


spaces after commas

r-abishek · 2025-03-28T23:14:38Z

src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp

+
+    d_float8 y_f8;
+    int cbcrY = hipThreadIdx_y * 2;
+    y_hip_compute(srcPtr,(d_float8*)&src_smem[hipThreadIdx_y][hipThreadIdx_x8],(d_float8*)&src_smem[hipThreadIdx_y + 16][hipThreadIdx_x8],(d_float8*)&src_smem[hipThreadIdx_y + 32][hipThreadIdx_x8],&y_f8);


Spaces after commas

r-abishek · 2025-03-28T23:17:42Z

src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp

+
+    //----INVERSE STEPS---
+    // 1D column wise IDCT for Y Cb and Cr channels
+    if (col < 128 && col < alignedWidth)


Please introduce parentheses for every instance like these

Copilot

Pull Request Overview

This pull request adds HIP support for JPEG compression distortion tensor tests by updating the test mapping and adjusting the test boundaries accordingly.

Added a new entry for "jpeg_compression_distortion" in the test mapping.
Updated the caseMax value in the HIP test runner to include the new test.

Reviewed Changes

Copilot reviewed 4 out of 10 changed files in this pull request and generated 1 comment.

File	Description
utilities/test_suite/common.py	Added a new mapping entry for "jpeg_compression_distortion" with HIP support.
utilities/test_suite/HIP/runImageTests.py	Updated caseMax to account for the new test mapping entry.

Files not reviewed (6)

api/rppt_tensor_geometric_augmentations.h: Language not supported
src/include/common/hip/rpp_hip_load_store.hpp: Language not supported
src/include/tensor/hip_tensor_executors.hpp: Language not supported
src/modules/tensor/rppt_tensor_geometric_augmentations.cpp: Language not supported
utilities/test_suite/HIP/Tensor_image_hip.cpp: Language not supported
utilities/test_suite/rpp_test_suite_image.h: Language not supported

Comments suppressed due to low confidence (1)

utilities/test_suite/HIP/runImageTests.py:43

Ensure that the update of caseMax to 93 aligns correctly with the updated test mapping in common.py, accounting for all test cases.

caseMax = 93

Copilot · 2025-04-01T17:12:51Z

utilities/test_suite/common.py

    91: ["tensor_stddev", "HOST", "HIP"],
-    92: ["slice", "HOST", "HIP"]
+    92: ["slice", "HOST", "HIP"],
+    93: ["jpeg_compression_distortion","HIP"],


The new test mapping entry for jpeg_compression_distortion only includes 'HIP' whereas similar entries include both 'HOST' and 'HIP'. If intended to be consistent with other mappings, please add 'HOST' as the missing parameter.

r-abishek

@HazarathKumarM Please address this round of comments

r-abishek · 2025-04-02T03:10:03Z

src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp

+}
+
+// Generic clamping
+__device__ inline void clamp_range(float* values)


clamp_range can be changed to function overloading instead of templating. There will be four instances of the function, but lows and highs could be hard cooded, and no if-statements would exist.

This is for both the clamp_range functions here.

r-abishek · 2025-04-02T03:12:26Z

src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp

+// Horizontal Upsampling and Color conversion to RGB
+__device__ inline void upsample_and_RGB_hip_compute(float4 Cb_f4, float4 Cr_f4, d_float8 *Ch1_f8, d_float8* Ch2_f8, d_float8 *Ch3_f8)
+{
+    d_float8 Cb_f8, Cr_f8, R_f8 , G_f8, B_f8;


rename to cb_f8, cr_f8, r_f8, g_f8, b_f8

r-abishek · 2025-04-02T03:12:51Z

src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp

+{
+    d_float8 Cb_f8, Cr_f8, R_f8 , G_f8, B_f8;
+    // Copy Y values
+    d_float8 y_f8 = *((d_float8*)Ch1_f8);


rename to ch1_f8

r-abishek · 2025-04-02T03:32:28Z

src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp

+
+    const float4 yR_f4 = (float4)0.299f;
+    const float4 yG_f4 = (float4)0.587f;
+    const float4 yB_f4 = (float4)0.114f;


Take these 3 constants out as global device constants, change this y_hip_compute to function overloading instead of templating to avoid the if-statement too.

__device__ const float4 yR_f4 = (float4)0.299f; __device__ const float4 yG_f4 = (float4)0.587f; __device__ const float4 yB_f4 = (float4)0.114f; __device__ const float4 maxVal255_f4 = (float4)255.0f; __device__ const float4 maxVal128_f4 = (float4)128.0f; __device__ inline void rgb_range_fix_hip_compute(float *src , d_float8 *r_f8, d_float8 *g_f8, d_float8 *b_f8) { rpp_hip_math_multiply8_const(r_f8, r_f8, maxVal255_f4); rpp_hip_math_multiply8_const(g_f8, g_f8, maxVal255_f4); rpp_hip_math_multiply8_const(b_f8, b_f8, maxVal255_f4); } __device__ inline void rgb_range_fix_hip_compute(half *src , d_float8 *r_f8, d_float8 *g_f8, d_float8 *b_f8) { rpp_hip_math_multiply8_const(r_f8, r_f8, maxVal255_f4); rpp_hip_math_multiply8_const(g_f8, g_f8, maxVal255_f4); rpp_hip_math_multiply8_const(b_f8, b_f8, maxVal255_f4); } __device__ inline void rgb_range_fix_hip_compute(schar *src , d_float8 *r_f8, d_float8 *g_f8, d_float8 *b_f8) { rpp_hip_math_multiply8_const(r_f8, r_f8, maxVal128_f4); rpp_hip_math_multiply8_const(g_f8, g_f8, maxVal128_f4); rpp_hip_math_multiply8_const(b_f8, b_f8, maxVal128_f4); } __device__ inline void rgb_range_fix_hip_compute(uchar *src , d_float8 *r_f8, d_float8 *g_f8, d_float8 *b_f8) {} __device__ inline void y_hip_compute(float *src , d_float8 *r_f8, d_float8 *g_f8, d_float8 *b_f8, d_float8 *y_f8) { // rgb range fix rgb_range_fix_hip_compute(src, r_f8, g_f8, b_f8); // RGB to Y conversion y_f8->f4[0] = r_f8->f4[0] * yR_f4 + g_f8->f4[0] * yG_f4 + b_f8->f4[0] * yB_f4; y_f8->f4[1] = r_f8->f4[1] * yR_f4 + g_f8->f4[1] * yG_f4 + b_f8->f4[1] * yB_f4; }

r-abishek · 2025-04-02T03:38:39Z

src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp

+    int val = (-128.0f * sub_128);
+    float inp[8];
+    for(int i = 0; i < 8; i ++)
+       inp[i] = vec[i] + val;


You could use two float4 adds for this loop?

r-abishek · 2025-04-02T03:40:58Z

src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp

+// DCT forward 1D implementation
+__device__ inline void dct_fwd_8x8_1d(float *vec, bool sub_128)
+{
+    int val = (-128.0f * sub_128);


Please change the variable name.

We ideally only use "_" followed by text in 3 instances:

_ like _f4 or _f8

_ in id_x, id_y, id_z

_smem if variable is shared memory

r-abishek · 2025-04-02T03:41:18Z

src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp

+// Inverse 1D DCT
+__device__ inline void dct_inv_8x8_1d(float *vec, bool add_128)
+{
+    int val = (128.0f * add_128);


same comment

r-abishek · 2025-04-02T04:03:09Z

src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp

+    vec[1] = normCoeff * ((dctA * temp4) - (dctC * temp5) + (dctD * temp6) - (dctF * temp7));
+    vec[3] = normCoeff * ((dctC * temp4) + (dctF * temp5) - (dctA * temp6) + (dctD * temp7));
+    vec[5] = normCoeff * ((dctD * temp4) + (dctA * temp5) + (dctF * temp6) - (dctC * temp7));
+    vec[7] = normCoeff * ((dctF * temp4) + (dctD * temp5) + (dctC * temp6) + (dctA * temp7));


Please vectorize these as much as possible. Some ideas are below, please explore other ideas and use the best version.

As an example, the variables defined, and their usage could be:

__device__ const float dctACDF = make_float4(1.387039845322148f, ...); __device__ const float dctCFAD = make_float4(1.175875602419359f, ...); __device__ const float dctDAFC = make_float4(0.785694958387102f, ...); __device__ const float dctFDCA = make_float4(0.275899379282943f, ...); { vec_f4 = dctACDF_f4 * temp4567_f4; vec[1] = vec_f4.x - vec_f4.y + vec_f4.z - vec_f4.w; vec_f4 = dctCFAD_f4 * temp4567_f4; vec[3] = vec_f4.x + vec_f4.y - vec_f4.z + vec_f4.w; <so same for vec[5] and vec[7]> Finally once all of vec[0-7] are available, multiply in one shot by a variable 'normCoeff_f8' }

facing this error and not supporting device float4 in HIP

error: dynamic initialization is not supported for __device__, __constant__, __shared__, and __managed__ variables. 163 | __device__ const float4 dctFDCA = make_float4(0.275899379282943f, 0.785694958387102f, 1.175875602419359f, 1.387039845322148f); | ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Yes, we cant use make_float4 globally. First lets change __device__ const to __constant__ if its only device memory. (For all instances).

Secondly, try something like this - https://rocmdocs.amd.com/projects/HIP/en/latest/how-to/hip_cpp_language_extensions.html#constant

r-abishek · 2025-04-02T04:03:53Z

src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp

+
+    float temp8 = temp6 + temp7;
+    float temp9 = temp6 - temp7;
+    float temp10 = (dctC * inp[1]) - (dctD * inp[7]) - (dctF * inp[3]) - (dctA * inp[5]);


similar comment for inverse dct too

facing this error and not supporting device float4 in HIP

error: dynamic initialization is not supported for device, constant, shared, and managed variables.
163 | device const float4 dctFDCA = make_float4(0.275899379282943f, 0.785694958387102f, 1.175875602419359f, 1.387039845322148f);
| ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

same comment as above

r-abishek · 2025-04-04T03:40:10Z

src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp

+// Clamping for signed char (schar)
+__device__ inline void clamp_range(schar *src, float* values)
+{
+    int low = -128, high = 127;


remove additional inits of low and high

r-abishek · 2025-04-04T03:51:34Z

src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp

+{
+    rpp_hip_math_multiply8_const(r_f8, r_f8, maxVal255_f4);
+    rpp_hip_math_multiply8_const(g_f8, g_f8, maxVal255_f4);
+    rpp_hip_math_multiply8_const(b_f8, b_f8, maxVal255_f4);


try with multiply24_const

r-abishek · 2025-04-04T03:55:13Z

src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp

+
+    // Convert to CbCr
+    *cb_f4 = avgR_f4 * (float4)-0.168736f + avgG_f4 * (float4)-0.331264f + avgB_f4 * (float4)0.500000f + (float4)128.0f;
+    *cr_f4 = avgR_f4 * (float4)0.500000f  + avgG_f4 * (float4)-0.418688f + avgB_f4 * (float4)-0.081312f + (float4)128.0f;


Can we consolidate all the 0.5 multiplies to one (float4)0.5f multiply

r-abishek · 2025-04-04T03:55:26Z

src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp

+    d_float8 avgR_f8, avgG_f8, avgB_f8;
+    avgR_f8.f4[0] = (r1_f8->f4[0] + r2_f8->f4[0]) * (float4)0.5f;
+    avgR_f8.f4[1] = (r1_f8->f4[1] + r2_f8->f4[1]) * (float4)0.5f;
+    avgG_f8.f4[0] = (g1_f8->f4[0] + g2_f8->f4[0]) * (float4)0.5f;


Can we consolidate to 0.25

r-abishek · 2025-04-04T04:00:30Z

src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp

+    vec[6] = normCoeff * ((dctBE.y * (temp1_f4.x - temp1_f4.w)) - (dctBE.x * (temp1_f4.y - temp1_f4.z)));
+
+    vec[1] = normCoeff * ((dctACDF.x * temp2_f4.x) - (dctACDF.y * temp2_f4.y) + (dctACDF.z * temp2_f4.z) - (dctACDF.w * temp2_f4.w));
+    vec[3] = normCoeff * ((dctACDF.y * temp2_f4.x) + (dctACDF.w * temp2_f4.y) - (dctACDF.x * temp2_f4.z) + (dctACDF.z * temp2_f4.w));


First priority float4 multiplies, second priority fmaf of single floats, third priority is above

r-abishek · 2025-04-04T04:00:47Z

src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp

+    vec[1] = normCoeff * ((dctACDF.x * temp2_f4.x) - (dctACDF.y * temp2_f4.y) + (dctACDF.z * temp2_f4.z) - (dctACDF.w * temp2_f4.w));
+    vec[3] = normCoeff * ((dctACDF.y * temp2_f4.x) + (dctACDF.w * temp2_f4.y) - (dctACDF.x * temp2_f4.z) + (dctACDF.z * temp2_f4.w));
+    vec[5] = normCoeff * ((dctACDF.z * temp2_f4.x) + (dctACDF.x * temp2_f4.y) + (dctACDF.w * temp2_f4.z) - (dctACDF.y * temp2_f4.w));
+    vec[7] = normCoeff * ((dctACDF.w * temp2_f4.x) + (dctACDF.z * temp2_f4.y) + (dctACDF.y * temp2_f4.z) + (dctACDF.x * temp2_f4.w));


(float4)normCoeff

r-abishek · 2025-04-04T04:08:16Z

src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp

+    upsample_and_RGB_hip_compute(cb_f4, cr_f4, (d_float8*)&src_smem[hipThreadIdx_y_channel.x][hipThreadIdx_x8], (d_float8*)&src_smem[hipThreadIdx_y_channel.y][hipThreadIdx_x8], (d_float8*)&src_smem[hipThreadIdx_y_channel.z][hipThreadIdx_x8]);
+    __syncthreads();
+
+    // Clamp values and store results


Add broader step grouping

r-abishek · 2025-04-08T00:44:35Z

src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp

+}
+
+// Computing Y from R G B
+__device__ inline void y_hip_compute(float* src, d_float24 rgb_f24, d_float8* y_f8)


@HazarathKumarM We are creating a second copy of rgb_f24 here. Pls pass the pointer.

Please receive call-by-reference as d_float24* rgb_f24 and use as below. And again please add parentheses for mul and adds here for all instances.

__device__ inline void y_hip_compute(float* src, d_float24* rgb_f24, d_float8* y_f8) { rpp_hip_math_multiply24_const(rgb_f24, rgb_f24, maxVal255_f4); y_f8->f4[0] = (rgb_f24->f8[0].f4[0] * yR_f4) + (rgb_f24->f8[1].f4[0] * yG_f4) + (rgb_f24->f8[2].f4[0] * yB_f4); y_f8->f4[1] = (rgb_f24->f8[0].f4[1] * yR_f4) + (rgb_f24->f8[1].f4[1] * yG_f4) + (rgb_f24->f8[2].f4[1] * yB_f4); }

r-abishek · 2025-04-08T01:21:02Z

src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp

+}
+
+template <typename T>
+__global__ void jpeg_compression_distortion_pkd3_hip_tensor(T *srcPtr,


Doing a diffchecker on all these variants, I see a lot of common code that can be consolidated rather than having separate versions - should be able to reduce lines of code.

Please see diffs here:

pkd3 vs pln3 -> Entire code from "Step 2" is 99% same, just written with formatting differences -> https://www.diffchecker.com/ApUeog6b/

pkd3 vs pkd3_pln3 -> Lot of similarities -> https://www.diffchecker.com/9Segl36S/

r-abishek · 2025-04-08T01:23:17Z

src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp

+    float temp8 = temp6 + temp7;
+    float temp9 = temp6 - temp7;
+    float temp10 = (dctACDF.y * vec1_f4.y) - (dctACDF.z * vec2_f4.w) - (dctACDF.w * vec1_f4.w) - (dctACDF.x * vec2_f4.y);
+    float temp11 = (dctACDF.z * vec1_f4.y) + (dctACDF.y * vec2_f4.w) - (dctACDF.x * vec1_f4.w) + (dctACDF.w * vec2_f4.y);


Please use fmaf(fmaf(fmaf())) to consolidate temp4, temp5, temp10, temp11 etc

r-abishek · 2025-04-08T01:34:23Z

src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp

+    avgB_f4 = avgB_f4 * (float4)0.25;
+
+    // Convert to CbCr
+    *cb_f4 = (avgR_f4 * (float4)-0.168736f) + (avgG_f4 * (float4)-0.331264f) + (avgB_f4 * (float4)0.500000f) + maxVal128_f4;


Initialize the multiplied vectorized version of the constants as a local variable inside the kernel, and pass into this function.

Premulitply and hard code all the stuff in <>.

float4 cbConsts_f4[3] = {(float4)<-0.168736f * 0.25f>, (float4)<-0.331264f * 0.25f>, (float4)<0.500000f * 0.25f>}; float4 crConsts_f4[3] = {(float4)<0.500000f * 0.25f>, (float4)<-0.418688f * 0.25f>, (float4)<-0.081312f * 0.25f>}; <then pass as reference>: downsample_cbcr_hip_compute(..., ..., cbConsts_f4, crConsts_f4); <receive as>: __device__ inline void downsample_cbcr_hip_compute(d_float8 *r1_f8, d_float8 *r2_f8, d_float8 *g1_f8, d_float8 *g2_f8, d_float8 *b1_f8, d_float8 *b2_f8, float4 *cb_f4, float4 *cr_f4, float4* cbConsts_f4, float4* crConsts_f4) { }

r-abishek · 2025-04-09T03:10:43Z

src/include/common/hip/rpp_hip_load_store.hpp

+}
+
+// For half precision implementation
+__device__ __forceinline__ void rpp_hip_load24_pkd3_to_float24_pln3(half *srcPtr, float **srcPtrs_f24)


"For half precision implementation" isn't clear

@HazarathKumarM For all the 8 new functions added in this file, change comments to follow same convention of other functions in this file - Both phrased and formatted the same way.

dependabot bot and others added 30 commits March 19, 2025 14:20

Jenkins - Fix Test (r-abishek#489)

bb30925

* Jenkins - Fix Test * Test - Create sepreate test folder * FileSystem - Find and process * Find Filesystem - Updates * Test - Compiler Updates

jpeg

0c61a44

Compiler updates - C++ 14 Removal (r-abishek#496)

814df5d

* pthread removal * c++14 removal --------- Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>

zigzag ordering

9a330e2

zigzag ordering

d7c1560

Compiler updates - C++ 14 Removal (r-abishek#496)

c8dca6d

* pthread removal * c++14 removal --------- Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>

Readme & Package - Updates (r-abishek#503)

dca75e5

* Readme - Updates * Package - Deps to OpenMP

minor changes

8a9a331

Docs - Created a more comprehensive installation from the readme info…

3815fb7

…rmation (r-abishek#504) Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>

Test suite build fix - Lock to specific 'nifti_clib' commit (r-abishe…

10fb983

…k#506) * Lock nifti_clib commit in readme * Lock nifti_clib commit in common.groovy * Change to sudo apt install half in common.groovy

Debugging stage

1b2b62c

Test Package - Updates (r-abishek#507)

aac238b

OpenMP - Updates (r-abishek#510)

f2788fd

Update common.groovy

aa3c841

Fixes for Segmentation fault

46c51e3

r-abishek requested changes Mar 28, 2025

View reviewed changes

r-abishek assigned HazarathKumarM Mar 28, 2025

r-abishek added the enhancement New feature or request label Mar 28, 2025

r-abishek added this to the sow12ms2 milestone Mar 28, 2025

HazarathKumarM added 2 commits March 31, 2025 17:44

resolve review comments

736481e

cleanup the code

1f27bd4

r-abishek requested a review from Copilot April 1, 2025 17:12

Copilot AI reviewed Apr 1, 2025

View reviewed changes

r-abishek requested changes Apr 2, 2025

View reviewed changes

HazarathKumarM added 2 commits April 2, 2025 16:30

Resolve review comments

90f0748

Optimize dct fwd and inverse functions

023eda4

r-abishek reviewed Apr 4, 2025

View reviewed changes

HazarathKumarM added 2 commits April 4, 2025 16:08

resolve review comments

4a99271

categorized the code

f343822

r-abishek reviewed Apr 8, 2025

View reviewed changes

moved common code of all function

ad45033

r-abishek reviewed Apr 9, 2025

View reviewed changes

Add comments for load/store functions

3f1fec3

r-abishek approved these changes Apr 9, 2025

View reviewed changes

r-abishek changed the base branch from develop to ar/opt_jpeg_distortion_hip April 9, 2025 03:42

r-abishek merged commit 483a37a into r-abishek:ar/opt_jpeg_distortion_hip Apr 9, 2025

Conversation

HazarathKumarM commented Mar 20, 2025

Uh oh!

r-abishek left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Apr 1, 2025

Choose a reason for hiding this comment

Uh oh!

r-abishek left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

r-abishek Apr 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

r-abishek Apr 2, 2025 •

edited

Loading