Skip to content

JPEG Compression distortion Tensor HIP support#413

Merged
r-abishek merged 69 commits intor-abishek:ar/opt_jpeg_distortion_hipfrom
HazarathKumarM:hk/jpeg_distortion_hip
Apr 9, 2025
Merged

JPEG Compression distortion Tensor HIP support#413
r-abishek merged 69 commits intor-abishek:ar/opt_jpeg_distortion_hipfrom
HazarathKumarM:hk/jpeg_distortion_hip

Conversation

@HazarathKumarM
Copy link
Copy Markdown
Collaborator

No description provided.

dependabot bot and others added 30 commits March 19, 2025 14:20
…ocs/sphinx (r-abishek#487)

Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.10.0 to 1.11.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v1.10.0...v1.11.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* adds Fog tensor support

* modified comments

minor fix in HIP kernel

* changed c style casting to static_cast inside HIP kernel

modified few variable names

* added note in documentation

added fog test case to randomOutputCase in HIP and HOST test suites

* Introduced Greyness factor on Host FOG Kernel

* Introduced Greyness factor on HIP Side

* Added Grey scale support for Raw C code and PLN3 variant

* Modified Converstion factor variable

* Doxygen Outputs changed

* Variable name changes

* Changes in HUe Saturation Matrix

* Fix output issue

* Update CHANGELOG.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Updates to 1.9.10 including fog feature

---------

Co-authored-by: sampath1117 <sampath.rachumallu@multicorewareinc.com>
Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>
Co-authored-by: Lakshmi Kumar <lakshmi.kumar@amd.com>
Co-authored-by: Rajy Rawther <Rajy.MeeyakhanRawther@amd.com>
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>
* Make initial changes for raw CPP version of warp perspective

* Fix calls to compute_warp_perspective_src_loc function

* Update changes to go through nearest neighbours case

* AVX HOST codes for warp perspective initial

* Fixes for accuracy in warp perspective

* More fixes for accuracy in warp perspective

* Update the cide for AVX version of Planar to Planar

* Add bilinear u8 host code for warp perspective

* Make updates to include functions for F32 data type

* Make updates to use cast instead of set and fix issues with raw C implementation

* Add i8 host codes

* Add updates for F16 Bilinear Code

* Update the initial HIP code for warp perspective

* Update fixes for HIP code

* Add Warp Perspective Nearest Neighbors F16 code for PKD3_to_PLN3 and PLN3_to_PLN3

* Add updates for PLN to PLN configuration

* Add updates for PKD3 to PKD3 case

* Rename variables

* Update changes to log images separately for Bilinear and Nearest Neighbors

* fixed bug in raw c code of PKD-PLN variant

* minor bug fix for F16 PLN variants

* minor fixes in HOST test suite

* Update the HIP code for review comments and refactoring of device functions

* Update the comments alignment

* Rename functions and add cases in HOST and HIP runTests.py

* Update indentations for compuatations and rename vectors

* Update documentations and add more reference variables

* Make more formatting changes

* Make further updates by including test cases

* Make updates to use reinterpret cast

* Update reinterpret casts for PLN to PLN configuration u8 and i8 codes

* Make updates to enclose code inside AVX2 flag

* Make further changes to update type casting

* Update the version

* Make updates to add warp perspective image

* Modify comments, update CHANGELOG and update flags

* Update further comments in warp perspective

* Add more comments for warp perspective

* Update based on further review comments

* Update the case number for warp_perspective in common.py

* Address review comments

* Make initial changes for raw CPP version of warp perspective

* Fix calls to compute_warp_perspective_src_loc function

* Update changes to go through nearest neighbours case

* AVX HOST codes for warp perspective initial

* Fixes for accuracy in warp perspective

* More fixes for accuracy in warp perspective

* Update the cide for AVX version of Planar to Planar

* Add bilinear u8 host code for warp perspective

* Make updates to include functions for F32 data type

* Make updates to use cast instead of set and fix issues with raw C implementation

* Add i8 host codes

* Add updates for F16 Bilinear Code

* Update the initial HIP code for warp perspective

* Update fixes for HIP code

* Add Warp Perspective Nearest Neighbors F16 code for PKD3_to_PLN3 and PLN3_to_PLN3

* Add updates for PLN to PLN configuration

* Add updates for PKD3 to PKD3 case

* Rename variables

* Update changes to log images separately for Bilinear and Nearest Neighbors

* fixed bug in raw c code of PKD-PLN variant

* minor bug fix for F16 PLN variants

* minor fixes in HOST test suite

* Update the HIP code for review comments and refactoring of device functions

* Update the comments alignment

* Rename functions and add cases in HOST and HIP runTests.py

* Update indentations for compuatations and rename vectors

* Update documentations and add more reference variables

* Make more formatting changes

* Make further updates by including test cases

* Make updates to use reinterpret cast

* Update reinterpret casts for PLN to PLN configuration u8 and i8 codes

* Make updates to enclose code inside AVX2 flag

* Make further changes to update type casting

* Make updates to add warp perspective image

* Modify comments, update CHANGELOG and update flags

* Update further comments in warp perspective

* Add more comments for warp perspective

* Update based on further review comments

* Update the case number for warp_perspective in common.py

* Address review comments

* Fix conflits with warp perspective

* Update version details

* Merge branch 'ar/opt_warp_perspective' of https://github.com/r-abishek/rpp into opt_warp_perspective_rebased

* Update version to 1.9.10 including warp perspective

* Updates to convert to XYWH from LTRB instead of opposite

* Update CHANGELOG.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Revert changes and convert to ltrb instead of xywh

---------

Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>
Co-authored-by: sampath1117 <sampath.rachumallu@multicorewareinc.com>
Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
Co-authored-by: Rajy Rawther <Rajy.MeeyakhanRawther@amd.com>
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
* Package - remove clang from test

* CMakeLists - remove BUILD_WITH_AMD_ADVANCE

* Package - Add OMP dependency

* Find Packages - Updates

* Test Package - Deps

* Test - backend Info

* Tests - Fix HIP Test Add

* Test - Add HIP Path

* Test - Find HIP Updates

* Tests - Fix HIP compilation
* Jenkins - Fix Test

* Test - Create sepreate test folder

* FileSystem - Find and process

* Find Filesystem - Updates

* Test - Compiler Updates
…ocs/sphinx (r-abishek#491)

Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.11.0 to 1.12.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v1.11.0...v1.12.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ocs/sphinx (r-abishek#495)

Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.12.0 to 1.12.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v1.12.0...v1.12.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Test Suite - Fix HIP Link

* Tests - HIP Updates

* Test - Link OpenCV

* Test - Link dir updates

* CMakeLists - Updates

* Test Package - Deps

* CXX Compiler & ROCm path updates

* ROCm Path - Display info

* OpenMP - Updates

* Package Deps - Remove OpenMP
* pthread removal

* c++14 removal

---------

Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
…ocs/sphinx (r-abishek#500)

Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.12.1 to 1.13.0.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v1.12.1...v1.13.0)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* pthread removal

* c++14 removal

---------

Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
* Readme - Updates

* Package - Deps to OpenMP
* Rename Tensor_host to Tensor_image_host, and runTests to runImageTests

* Rename Tensor_hip to Tensor_image_hip, and runTests to runImageTests

* Rename all readme runTests.py to runImageTests.py

---------

Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>
* Rename Tensor_host to Tensor_image_host, and runTests to runImageTests

* Rename Tensor_hip to Tensor_image_hip, and runTests to runImageTests

* Rename all readme runTests.py to runImageTests.py

---------

Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>
* added initial support for U8 PLN1-PLN1 variant

* added support for U8 PKD3, PLN3 variants

* modified algorithm to give RGB output for RGB images

* moved common code outside the layout branch conditions

* added support for toggle variation of U8

* added golden output for threshold

* added threshold output for doxygen

* added support for F32 bit depth

* added support for I8 bitdepth

* added F16 bitdepth support

* added HIP support for U8 bitdepth

* made changes to support remaining bitdepths

* fixed output issues with I8 variant

* removed commented code in HOST

* added threshold test case in maps used in common.py

* modified RPP_VERSION_MINOR value and changelog

* fixed issues with doxygen

modified globalThreads_x value for HIP kernel

* made changes in I8 variants as per review comments

* added more details for threshold documentation

* Update version to 1.9.10 including threshold

* Remove duplicate definitions of functions and minor bug fix

* Minor docs fix

* Minor docs fix

---------

Co-authored-by: sampath1117 <sampath.rachumallu@multicorewareinc.com>
Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>
Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>
* Add Intial u8 implementation for Rain

* Add I8 implementation and Changes based on the Review comments

* Initial HIP implementation
Add F32 and F16 Tensor Implementation

* Add test case for Rain in HIP test suite
code cleanup

* minor code cleanup

* Modified func names and removed unnecessary code

* Resolve Review comments

* replaced pinned memory with HIP memory for Rain Layer computation

* Modified RGB Rain Mask to planar Rain Mask in HIP

* Address review comments

* Add Rain compute function

* Add version changes and Resolve review comments

* fix build warnings

* Fix the outputs of f16 toggle variants

* Revert Rain width changes

* Fix pln3 outputs for u8 and i8 bitdepths

* Resolve review comments

* Modified load and store routines for planar cases

* Resolve review comments

* Modify docs image

* Fix versioning

* Updates to 1.9.10 including rain feature

---------

Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>
Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>
…rmation (r-abishek#504)

Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
…k#506)

* Lock nifti_clib commit in readme

* Lock nifti_clib commit in common.groovy

* Change to sudo apt install half in common.groovy
* experimental changes to detect error and print in the  end in image test suite

* Error handling added for host side

* Error detection implemented on HIP SIDE

* Voxel test suite changes

* Changed variable name to camelCase

* Changes in srcpath

* Changes in Voxel test suite for nonQA case

* Consolidated the repeated code and move it as common code

* Minor changes based on review commands

* Modification for bitDepth in voxel host

* Merge with develop branch

* Fix on error code display and gaussian Filter

* F string bug is resolved

* Fix for CI failure and some improvement in error detection

* Changes based on review comments

* Fix for rain and warp_perspective issue of unable to open file

* Enhanced display for non implimented functionality

---------

Co-authored-by: sampath1117 <sampath.rachumallu@multicorewareinc.com>
Co-authored-by: dineshbabu-ravichandran <dineshbabu.ravichandran@multicorewareinc.com>
Co-authored-by: dineshbabu-ravichandran <dineshbabu.ravichandran@multicoreinc.com>
Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>
…ility (r-abishek#499)

* initial commit

* removed the usage of supported case list on HOST backend

* Add Augmentation enum

* removed supported caselists from all the python scripts in HIP backend

* replacing case numbers with the enum's on the test suite files

* Fix Indentation

* Add enums for the swith cases

* update Enum's for new cases

* Fix errors in unit tests

* Semicolon typo fix

* Fix the test case naming in QA mode

* removes unncessary print statements

---------

Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>
Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
…ocs/sphinx (r-abishek#511)

Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.13.0 to 1.14.1.
- [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
- [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
- [Commits](ROCm/rocm-docs-core@v1.13.0...v1.14.1)

---
updated-dependencies:
- dependency-name: rocm-docs-core[api_reference]
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Copy link
Copy Markdown
Owner

@r-abishek r-abishek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HazarathKumarM Please address few more comments

}

// Horizontal Upsampling and Color conversion to RGB
__device__ inline void Upsample_and_RGB_hip_compute(float4 Cb_f4, float4 Cr_f4, d_float8 *Ch1_f8, d_float8* Ch2_f8, d_float8 *Ch3_f8)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

snake_case - change Upsample to upsample

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

int hipThreadIdx_x4 = hipThreadIdx_x * 4;

int alignedWidth = ((roiTensorPtrSrc[id_z].xywhROI.roiWidth + 15) / 16) * 16;
int alignedHeight = ((roiTensorPtrSrc[id_z].xywhROI.roiHeight + 15) / 16) * 16;
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would roiTensorPtrSrc[id_z].xywhROI.roiWidth & ~15 work for these? Please print value on host and check. Its more optimal than the divide and multiply

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// Load valid pixels (only if id_x is within valid range)
if (validPixels > 0)
{
for (int i = 0; i < validPixels; i++)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to something like below. Please confirm output match.

for (int i = 0, idx = srcIdx; i < validPixels; i++, idx += 3)
{
    src_smem[hipThreadIdx_y_channel.x][hipThreadIdx_x8 + i] = srcPtr[idx];
    src_smem[hipThreadIdx_y_channel.y][hipThreadIdx_x8 + i] = srcPtr[idx + 1];
    src_smem[hipThreadIdx_y_channel.z][hipThreadIdx_x8 + i] = srcPtr[idx + 2];
}

for (int i = validPixels; i < min(validPixels + 16, 8); i++)
{
src_smem[hipThreadIdx_y_channel.x][hipThreadIdx_x8 + i] = srcPtr[srcIdx + ((validPixels - 1) * 3)];
src_smem[hipThreadIdx_y_channel.y][hipThreadIdx_x8 + i] = srcPtr[srcIdx + ((validPixels - 1) * 3) + 1];
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

__syncthreads();
d_float8 y_f8;
int cbcrY = hipThreadIdx_y * 2;
y_hip_compute(srcPtr,(d_float8*)&src_smem[hipThreadIdx_y][hipThreadIdx_x8],(d_float8*)&src_smem[hipThreadIdx_y + 16][hipThreadIdx_x8],(d_float8*)&src_smem[hipThreadIdx_y + 32][hipThreadIdx_x8],&y_f8);
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spaces after commas

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

{
float4 cb_f4, cr_f4;
// Downsample RGB and convert to CbCr
downsample_cbcr_hip_compute((d_float8*)&src_smem[cbcrY][hipThreadIdx_x8],(d_float8*)&src_smem[cbcrY + 1][hipThreadIdx_x8],(d_float8*)&src_smem[cbcrY + 16][hipThreadIdx_x8],(d_float8*)&src_smem[cbcrY + 17][hipThreadIdx_x8],(d_float8*)&src_smem[cbcrY + 32][hipThreadIdx_x8],(d_float8*)&src_smem[cbcrY + 33][hipThreadIdx_x8],&cb_f4, &cr_f4);
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spaces after commas

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


d_float8 y_f8;
int cbcrY = hipThreadIdx_y * 2;
y_hip_compute(srcPtr,(d_float8*)&src_smem[hipThreadIdx_y][hipThreadIdx_x8],(d_float8*)&src_smem[hipThreadIdx_y + 16][hipThreadIdx_x8],(d_float8*)&src_smem[hipThreadIdx_y + 32][hipThreadIdx_x8],&y_f8);
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spaces after commas

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


//----INVERSE STEPS---
// 1D column wise IDCT for Y Cb and Cr channels
if (col < 128 && col < alignedWidth)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please introduce parentheses for every instance like these

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@r-abishek r-abishek added the enhancement New feature or request label Mar 28, 2025
@r-abishek r-abishek added this to the sow12ms2 milestone Mar 28, 2025
@r-abishek r-abishek requested a review from Copilot April 1, 2025 17:12
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request adds HIP support for JPEG compression distortion tensor tests by updating the test mapping and adjusting the test boundaries accordingly.

  • Added a new entry for "jpeg_compression_distortion" in the test mapping.
  • Updated the caseMax value in the HIP test runner to include the new test.

Reviewed Changes

Copilot reviewed 4 out of 10 changed files in this pull request and generated 1 comment.

File Description
utilities/test_suite/common.py Added a new mapping entry for "jpeg_compression_distortion" with HIP support.
utilities/test_suite/HIP/runImageTests.py Updated caseMax to account for the new test mapping entry.
Files not reviewed (6)
  • api/rppt_tensor_geometric_augmentations.h: Language not supported
  • src/include/common/hip/rpp_hip_load_store.hpp: Language not supported
  • src/include/tensor/hip_tensor_executors.hpp: Language not supported
  • src/modules/tensor/rppt_tensor_geometric_augmentations.cpp: Language not supported
  • utilities/test_suite/HIP/Tensor_image_hip.cpp: Language not supported
  • utilities/test_suite/rpp_test_suite_image.h: Language not supported
Comments suppressed due to low confidence (1)

utilities/test_suite/HIP/runImageTests.py:43

  • Ensure that the update of caseMax to 93 aligns correctly with the updated test mapping in common.py, accounting for all test cases.
caseMax = 93

91: ["tensor_stddev", "HOST", "HIP"],
92: ["slice", "HOST", "HIP"]
92: ["slice", "HOST", "HIP"],
93: ["jpeg_compression_distortion","HIP"],
Copy link

Copilot AI Apr 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new test mapping entry for jpeg_compression_distortion only includes 'HIP' whereas similar entries include both 'HOST' and 'HIP'. If intended to be consistent with other mappings, please add 'HOST' as the missing parameter.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Owner

@r-abishek r-abishek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HazarathKumarM Please address this round of comments

}

// Generic clamping
__device__ inline void clamp_range(float* values)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clamp_range can be changed to function overloading instead of templating. There will be four instances of the function, but lows and highs could be hard cooded, and no if-statements would exist.

This is for both the clamp_range functions here.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// Horizontal Upsampling and Color conversion to RGB
__device__ inline void upsample_and_RGB_hip_compute(float4 Cb_f4, float4 Cr_f4, d_float8 *Ch1_f8, d_float8* Ch2_f8, d_float8 *Ch3_f8)
{
d_float8 Cb_f8, Cr_f8, R_f8 , G_f8, B_f8;
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to cb_f8, cr_f8, r_f8, g_f8, b_f8

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

{
d_float8 Cb_f8, Cr_f8, R_f8 , G_f8, B_f8;
// Copy Y values
d_float8 y_f8 = *((d_float8*)Ch1_f8);
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to ch1_f8

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


const float4 yR_f4 = (float4)0.299f;
const float4 yG_f4 = (float4)0.587f;
const float4 yB_f4 = (float4)0.114f;
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take these 3 constants out as global device constants, change this y_hip_compute to function overloading instead of templating to avoid the if-statement too.

__device__ const float4 yR_f4 = (float4)0.299f;
__device__ const float4 yG_f4 = (float4)0.587f;
__device__ const float4 yB_f4 = (float4)0.114f;
__device__ const float4 maxVal255_f4 = (float4)255.0f;
__device__ const float4 maxVal128_f4 = (float4)128.0f;

__device__ inline void rgb_range_fix_hip_compute(float *src , d_float8 *r_f8, d_float8 *g_f8, d_float8 *b_f8)
{
        rpp_hip_math_multiply8_const(r_f8, r_f8, maxVal255_f4);
        rpp_hip_math_multiply8_const(g_f8, g_f8, maxVal255_f4);
        rpp_hip_math_multiply8_const(b_f8, b_f8, maxVal255_f4);
}

__device__ inline void rgb_range_fix_hip_compute(half *src , d_float8 *r_f8, d_float8 *g_f8, d_float8 *b_f8)
{
        rpp_hip_math_multiply8_const(r_f8, r_f8, maxVal255_f4);
        rpp_hip_math_multiply8_const(g_f8, g_f8, maxVal255_f4);
        rpp_hip_math_multiply8_const(b_f8, b_f8, maxVal255_f4);
}

__device__ inline void rgb_range_fix_hip_compute(schar *src , d_float8 *r_f8, d_float8 *g_f8, d_float8 *b_f8)
{
        rpp_hip_math_multiply8_const(r_f8, r_f8, maxVal128_f4);
        rpp_hip_math_multiply8_const(g_f8, g_f8, maxVal128_f4);
        rpp_hip_math_multiply8_const(b_f8, b_f8, maxVal128_f4);
}

__device__ inline void rgb_range_fix_hip_compute(uchar *src , d_float8 *r_f8, d_float8 *g_f8, d_float8 *b_f8)
{}

__device__ inline void y_hip_compute(float *src , d_float8 *r_f8, d_float8 *g_f8, d_float8 *b_f8, d_float8 *y_f8)
{
    // rgb range fix
    rgb_range_fix_hip_compute(src, r_f8, g_f8, b_f8);

    // RGB to Y conversion
    y_f8->f4[0] = r_f8->f4[0] * yR_f4 + g_f8->f4[0] * yG_f4 + b_f8->f4[0] * yB_f4;
    y_f8->f4[1] = r_f8->f4[1] * yR_f4 + g_f8->f4[1] * yG_f4 + b_f8->f4[1] * yB_f4;
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

int val = (-128.0f * sub_128);
float inp[8];
for(int i = 0; i < 8; i ++)
inp[i] = vec[i] + val;
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use two float4 adds for this loop?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// DCT forward 1D implementation
__device__ inline void dct_fwd_8x8_1d(float *vec, bool sub_128)
{
int val = (-128.0f * sub_128);
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change the variable name.

We ideally only use "_" followed by text in 3 instances:

  • _ like _f4 or _f8
  • _ in id_x, id_y, id_z
  • _smem if variable is shared memory

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// Inverse 1D DCT
__device__ inline void dct_inv_8x8_1d(float *vec, bool add_128)
{
int val = (128.0f * add_128);
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

vec[1] = normCoeff * ((dctA * temp4) - (dctC * temp5) + (dctD * temp6) - (dctF * temp7));
vec[3] = normCoeff * ((dctC * temp4) + (dctF * temp5) - (dctA * temp6) + (dctD * temp7));
vec[5] = normCoeff * ((dctD * temp4) + (dctA * temp5) + (dctF * temp6) - (dctC * temp7));
vec[7] = normCoeff * ((dctF * temp4) + (dctD * temp5) + (dctC * temp6) + (dctA * temp7));
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please vectorize these as much as possible. Some ideas are below, please explore other ideas and use the best version.

As an example, the variables defined, and their usage could be:

__device__ const float dctACDF = make_float4(1.387039845322148f, ...);
__device__ const float dctCFAD = make_float4(1.175875602419359f, ...);
__device__ const float dctDAFC = make_float4(0.785694958387102f, ...);
__device__ const float dctFDCA = make_float4(0.275899379282943f, ...);

{
    vec_f4 = dctACDF_f4 * temp4567_f4;
    vec[1] = vec_f4.x - vec_f4.y + vec_f4.z - vec_f4.w;
    vec_f4 = dctCFAD_f4 * temp4567_f4;
    vec[3] = vec_f4.x + vec_f4.y - vec_f4.z + vec_f4.w;
    <so same for vec[5] and vec[7]>
    
    Finally once all of vec[0-7] are available, multiply in one shot by a variable 'normCoeff_f8'
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

facing this error and not supporting device float4 in HIP

error: dynamic initialization is not supported for __device__, __constant__, __shared__, and __managed__ variables.
  163 | __device__ const float4 dctFDCA = make_float4(0.275899379282943f, 0.785694958387102f, 1.175875602419359f, 1.387039845322148f);
      |                         ^         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Copy link
Copy Markdown
Owner

@r-abishek r-abishek Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we cant use make_float4 globally. First lets change __device__ const to __constant__ if its only device memory. (For all instances).

Secondly, try something like this - https://rocmdocs.amd.com/projects/HIP/en/latest/how-to/hip_cpp_language_extensions.html#constant

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


float temp8 = temp6 + temp7;
float temp9 = temp6 - temp7;
float temp10 = (dctC * inp[1]) - (dctD * inp[7]) - (dctF * inp[3]) - (dctA * inp[5]);
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar comment for inverse dct too

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

facing this error and not supporting device float4 in HIP

error: dynamic initialization is not supported for device, constant, shared, and managed variables.
163 | device const float4 dctFDCA = make_float4(0.275899379282943f, 0.785694958387102f, 1.175875602419359f, 1.387039845322148f);
| ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// Clamping for signed char (schar)
__device__ inline void clamp_range(schar *src, float* values)
{
int low = -128, high = 127;
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove additional inits of low and high

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

{
rpp_hip_math_multiply8_const(r_f8, r_f8, maxVal255_f4);
rpp_hip_math_multiply8_const(g_f8, g_f8, maxVal255_f4);
rpp_hip_math_multiply8_const(b_f8, b_f8, maxVal255_f4);
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try with multiply24_const

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


// Convert to CbCr
*cb_f4 = avgR_f4 * (float4)-0.168736f + avgG_f4 * (float4)-0.331264f + avgB_f4 * (float4)0.500000f + (float4)128.0f;
*cr_f4 = avgR_f4 * (float4)0.500000f + avgG_f4 * (float4)-0.418688f + avgB_f4 * (float4)-0.081312f + (float4)128.0f;
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we consolidate all the 0.5 multiplies to one (float4)0.5f multiply

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

d_float8 avgR_f8, avgG_f8, avgB_f8;
avgR_f8.f4[0] = (r1_f8->f4[0] + r2_f8->f4[0]) * (float4)0.5f;
avgR_f8.f4[1] = (r1_f8->f4[1] + r2_f8->f4[1]) * (float4)0.5f;
avgG_f8.f4[0] = (g1_f8->f4[0] + g2_f8->f4[0]) * (float4)0.5f;
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we consolidate to 0.25

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

vec[6] = normCoeff * ((dctBE.y * (temp1_f4.x - temp1_f4.w)) - (dctBE.x * (temp1_f4.y - temp1_f4.z)));

vec[1] = normCoeff * ((dctACDF.x * temp2_f4.x) - (dctACDF.y * temp2_f4.y) + (dctACDF.z * temp2_f4.z) - (dctACDF.w * temp2_f4.w));
vec[3] = normCoeff * ((dctACDF.y * temp2_f4.x) + (dctACDF.w * temp2_f4.y) - (dctACDF.x * temp2_f4.z) + (dctACDF.z * temp2_f4.w));
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First priority float4 multiplies, second priority fmaf of single floats, third priority is above

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DOne

vec[1] = normCoeff * ((dctACDF.x * temp2_f4.x) - (dctACDF.y * temp2_f4.y) + (dctACDF.z * temp2_f4.z) - (dctACDF.w * temp2_f4.w));
vec[3] = normCoeff * ((dctACDF.y * temp2_f4.x) + (dctACDF.w * temp2_f4.y) - (dctACDF.x * temp2_f4.z) + (dctACDF.z * temp2_f4.w));
vec[5] = normCoeff * ((dctACDF.z * temp2_f4.x) + (dctACDF.x * temp2_f4.y) + (dctACDF.w * temp2_f4.z) - (dctACDF.y * temp2_f4.w));
vec[7] = normCoeff * ((dctACDF.w * temp2_f4.x) + (dctACDF.z * temp2_f4.y) + (dctACDF.y * temp2_f4.z) + (dctACDF.x * temp2_f4.w));
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(float4)normCoeff

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

upsample_and_RGB_hip_compute(cb_f4, cr_f4, (d_float8*)&src_smem[hipThreadIdx_y_channel.x][hipThreadIdx_x8], (d_float8*)&src_smem[hipThreadIdx_y_channel.y][hipThreadIdx_x8], (d_float8*)&src_smem[hipThreadIdx_y_channel.z][hipThreadIdx_x8]);
__syncthreads();

// Clamp values and store results
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add broader step grouping

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

// Computing Y from R G B
__device__ inline void y_hip_compute(float* src, d_float24 rgb_f24, d_float8* y_f8)
Copy link
Copy Markdown
Owner

@r-abishek r-abishek Apr 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HazarathKumarM We are creating a second copy of rgb_f24 here. Pls pass the pointer.

Please receive call-by-reference as d_float24* rgb_f24 and use as below. And again please add parentheses for mul and adds here for all instances.

__device__ inline void y_hip_compute(float* src, d_float24* rgb_f24, d_float8* y_f8)
{
    rpp_hip_math_multiply24_const(rgb_f24, rgb_f24, maxVal255_f4);
    y_f8->f4[0] = (rgb_f24->f8[0].f4[0] * yR_f4) + (rgb_f24->f8[1].f4[0] * yG_f4) + (rgb_f24->f8[2].f4[0] * yB_f4);
    y_f8->f4[1] = (rgb_f24->f8[0].f4[1] * yR_f4) + (rgb_f24->f8[1].f4[1] * yG_f4) + (rgb_f24->f8[2].f4[1] * yB_f4);
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

template <typename T>
__global__ void jpeg_compression_distortion_pkd3_hip_tensor(T *srcPtr,
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing a diffchecker on all these variants, I see a lot of common code that can be consolidated rather than having separate versions - should be able to reduce lines of code.

Please see diffs here:

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

float temp8 = temp6 + temp7;
float temp9 = temp6 - temp7;
float temp10 = (dctACDF.y * vec1_f4.y) - (dctACDF.z * vec2_f4.w) - (dctACDF.w * vec1_f4.w) - (dctACDF.x * vec2_f4.y);
float temp11 = (dctACDF.z * vec1_f4.y) + (dctACDF.y * vec2_f4.w) - (dctACDF.x * vec1_f4.w) + (dctACDF.w * vec2_f4.y);
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use fmaf(fmaf(fmaf())) to consolidate temp4, temp5, temp10, temp11 etc

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

avgB_f4 = avgB_f4 * (float4)0.25;

// Convert to CbCr
*cb_f4 = (avgR_f4 * (float4)-0.168736f) + (avgG_f4 * (float4)-0.331264f) + (avgB_f4 * (float4)0.500000f) + maxVal128_f4;
Copy link
Copy Markdown
Owner

@r-abishek r-abishek Apr 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initialize the multiplied vectorized version of the constants as a local variable inside the kernel, and pass into this function.

Premulitply and hard code all the stuff in <>.

float4 cbConsts_f4[3] = {(float4)<-0.168736f * 0.25f>, (float4)<-0.331264f * 0.25f>, (float4)<0.500000f * 0.25f>};
float4 crConsts_f4[3] = {(float4)<0.500000f * 0.25f>, (float4)<-0.418688f * 0.25f>, (float4)<-0.081312f * 0.25f>};

<then pass as reference>:
downsample_cbcr_hip_compute(..., ..., cbConsts_f4, crConsts_f4);

<receive as>:
__device__ inline void downsample_cbcr_hip_compute(d_float8 *r1_f8, d_float8 *r2_f8, d_float8 *g1_f8, d_float8 *g2_f8, d_float8 *b1_f8, d_float8 *b2_f8, float4 *cb_f4, float4 *cr_f4, float4* cbConsts_f4, float4* crConsts_f4)
{
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

// For half precision implementation
__device__ __forceinline__ void rpp_hip_load24_pkd3_to_float24_pln3(half *srcPtr, float **srcPtrs_f24)
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"For half precision implementation" isn't clear

@HazarathKumarM For all the 8 new functions added in this file, change comments to follow same convention of other functions in this file - Both phrased and formatted the same way.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@r-abishek r-abishek changed the base branch from develop to ar/opt_jpeg_distortion_hip April 9, 2025 03:42
@r-abishek r-abishek merged commit 483a37a into r-abishek:ar/opt_jpeg_distortion_hip Apr 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants