JPEG Compression distortion Tensor HIP support#413
JPEG Compression distortion Tensor HIP support#413r-abishek merged 69 commits intor-abishek:ar/opt_jpeg_distortion_hipfrom
Conversation
…ocs/sphinx (r-abishek#487) Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.10.0 to 1.11.0. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](ROCm/rocm-docs-core@v1.10.0...v1.11.0) --- updated-dependencies: - dependency-name: rocm-docs-core[api_reference] dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* adds Fog tensor support * modified comments minor fix in HIP kernel * changed c style casting to static_cast inside HIP kernel modified few variable names * added note in documentation added fog test case to randomOutputCase in HIP and HOST test suites * Introduced Greyness factor on Host FOG Kernel * Introduced Greyness factor on HIP Side * Added Grey scale support for Raw C code and PLN3 variant * Modified Converstion factor variable * Doxygen Outputs changed * Variable name changes * Changes in HUe Saturation Matrix * Fix output issue * Update CHANGELOG.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Updates to 1.9.10 including fog feature --------- Co-authored-by: sampath1117 <sampath.rachumallu@multicorewareinc.com> Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com> Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com> Co-authored-by: Lakshmi Kumar <lakshmi.kumar@amd.com> Co-authored-by: Rajy Rawther <Rajy.MeeyakhanRawther@amd.com> Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>
* Make initial changes for raw CPP version of warp perspective * Fix calls to compute_warp_perspective_src_loc function * Update changes to go through nearest neighbours case * AVX HOST codes for warp perspective initial * Fixes for accuracy in warp perspective * More fixes for accuracy in warp perspective * Update the cide for AVX version of Planar to Planar * Add bilinear u8 host code for warp perspective * Make updates to include functions for F32 data type * Make updates to use cast instead of set and fix issues with raw C implementation * Add i8 host codes * Add updates for F16 Bilinear Code * Update the initial HIP code for warp perspective * Update fixes for HIP code * Add Warp Perspective Nearest Neighbors F16 code for PKD3_to_PLN3 and PLN3_to_PLN3 * Add updates for PLN to PLN configuration * Add updates for PKD3 to PKD3 case * Rename variables * Update changes to log images separately for Bilinear and Nearest Neighbors * fixed bug in raw c code of PKD-PLN variant * minor bug fix for F16 PLN variants * minor fixes in HOST test suite * Update the HIP code for review comments and refactoring of device functions * Update the comments alignment * Rename functions and add cases in HOST and HIP runTests.py * Update indentations for compuatations and rename vectors * Update documentations and add more reference variables * Make more formatting changes * Make further updates by including test cases * Make updates to use reinterpret cast * Update reinterpret casts for PLN to PLN configuration u8 and i8 codes * Make updates to enclose code inside AVX2 flag * Make further changes to update type casting * Update the version * Make updates to add warp perspective image * Modify comments, update CHANGELOG and update flags * Update further comments in warp perspective * Add more comments for warp perspective * Update based on further review comments * Update the case number for warp_perspective in common.py * Address review comments * Make initial changes for raw CPP version of warp perspective * Fix calls to compute_warp_perspective_src_loc function * Update changes to go through nearest neighbours case * AVX HOST codes for warp perspective initial * Fixes for accuracy in warp perspective * More fixes for accuracy in warp perspective * Update the cide for AVX version of Planar to Planar * Add bilinear u8 host code for warp perspective * Make updates to include functions for F32 data type * Make updates to use cast instead of set and fix issues with raw C implementation * Add i8 host codes * Add updates for F16 Bilinear Code * Update the initial HIP code for warp perspective * Update fixes for HIP code * Add Warp Perspective Nearest Neighbors F16 code for PKD3_to_PLN3 and PLN3_to_PLN3 * Add updates for PLN to PLN configuration * Add updates for PKD3 to PKD3 case * Rename variables * Update changes to log images separately for Bilinear and Nearest Neighbors * fixed bug in raw c code of PKD-PLN variant * minor bug fix for F16 PLN variants * minor fixes in HOST test suite * Update the HIP code for review comments and refactoring of device functions * Update the comments alignment * Rename functions and add cases in HOST and HIP runTests.py * Update indentations for compuatations and rename vectors * Update documentations and add more reference variables * Make more formatting changes * Make further updates by including test cases * Make updates to use reinterpret cast * Update reinterpret casts for PLN to PLN configuration u8 and i8 codes * Make updates to enclose code inside AVX2 flag * Make further changes to update type casting * Make updates to add warp perspective image * Modify comments, update CHANGELOG and update flags * Update further comments in warp perspective * Add more comments for warp perspective * Update based on further review comments * Update the case number for warp_perspective in common.py * Address review comments * Fix conflits with warp perspective * Update version details * Merge branch 'ar/opt_warp_perspective' of https://github.com/r-abishek/rpp into opt_warp_perspective_rebased * Update version to 1.9.10 including warp perspective * Updates to convert to XYWH from LTRB instead of opposite * Update CHANGELOG.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Revert changes and convert to ltrb instead of xywh --------- Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com> Co-authored-by: sampath1117 <sampath.rachumallu@multicorewareinc.com> Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com> Co-authored-by: Rajy Rawther <Rajy.MeeyakhanRawther@amd.com> Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
* Package - remove clang from test * CMakeLists - remove BUILD_WITH_AMD_ADVANCE * Package - Add OMP dependency * Find Packages - Updates * Test Package - Deps * Test - backend Info * Tests - Fix HIP Test Add * Test - Add HIP Path * Test - Find HIP Updates * Tests - Fix HIP compilation
* Jenkins - Fix Test * Test - Create sepreate test folder * FileSystem - Find and process * Find Filesystem - Updates * Test - Compiler Updates
…ocs/sphinx (r-abishek#491) Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.11.0 to 1.12.0. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](ROCm/rocm-docs-core@v1.11.0...v1.12.0) --- updated-dependencies: - dependency-name: rocm-docs-core[api_reference] dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ocs/sphinx (r-abishek#495) Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.12.0 to 1.12.1. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](ROCm/rocm-docs-core@v1.12.0...v1.12.1) --- updated-dependencies: - dependency-name: rocm-docs-core[api_reference] dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Test Suite - Fix HIP Link * Tests - HIP Updates * Test - Link OpenCV * Test - Link dir updates * CMakeLists - Updates * Test Package - Deps * CXX Compiler & ROCm path updates * ROCm Path - Display info * OpenMP - Updates * Package Deps - Remove OpenMP
* pthread removal * c++14 removal --------- Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
…ocs/sphinx (r-abishek#500) Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.12.1 to 1.13.0. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](ROCm/rocm-docs-core@v1.12.1...v1.13.0) --- updated-dependencies: - dependency-name: rocm-docs-core[api_reference] dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* pthread removal * c++14 removal --------- Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
* Readme - Updates * Package - Deps to OpenMP
* Rename Tensor_host to Tensor_image_host, and runTests to runImageTests * Rename Tensor_hip to Tensor_image_hip, and runTests to runImageTests * Rename all readme runTests.py to runImageTests.py --------- Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com> Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>
* Rename Tensor_host to Tensor_image_host, and runTests to runImageTests * Rename Tensor_hip to Tensor_image_hip, and runTests to runImageTests * Rename all readme runTests.py to runImageTests.py --------- Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com> Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>
* added initial support for U8 PLN1-PLN1 variant * added support for U8 PKD3, PLN3 variants * modified algorithm to give RGB output for RGB images * moved common code outside the layout branch conditions * added support for toggle variation of U8 * added golden output for threshold * added threshold output for doxygen * added support for F32 bit depth * added support for I8 bitdepth * added F16 bitdepth support * added HIP support for U8 bitdepth * made changes to support remaining bitdepths * fixed output issues with I8 variant * removed commented code in HOST * added threshold test case in maps used in common.py * modified RPP_VERSION_MINOR value and changelog * fixed issues with doxygen modified globalThreads_x value for HIP kernel * made changes in I8 variants as per review comments * added more details for threshold documentation * Update version to 1.9.10 including threshold * Remove duplicate definitions of functions and minor bug fix * Minor docs fix * Minor docs fix --------- Co-authored-by: sampath1117 <sampath.rachumallu@multicorewareinc.com> Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com> Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com> Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>
* Add Intial u8 implementation for Rain * Add I8 implementation and Changes based on the Review comments * Initial HIP implementation Add F32 and F16 Tensor Implementation * Add test case for Rain in HIP test suite code cleanup * minor code cleanup * Modified func names and removed unnecessary code * Resolve Review comments * replaced pinned memory with HIP memory for Rain Layer computation * Modified RGB Rain Mask to planar Rain Mask in HIP * Address review comments * Add Rain compute function * Add version changes and Resolve review comments * fix build warnings * Fix the outputs of f16 toggle variants * Revert Rain width changes * Fix pln3 outputs for u8 and i8 bitdepths * Resolve review comments * Modified load and store routines for planar cases * Resolve review comments * Modify docs image * Fix versioning * Updates to 1.9.10 including rain feature --------- Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com> Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com> Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>
…rmation (r-abishek#504) Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
…k#506) * Lock nifti_clib commit in readme * Lock nifti_clib commit in common.groovy * Change to sudo apt install half in common.groovy
* experimental changes to detect error and print in the end in image test suite * Error handling added for host side * Error detection implemented on HIP SIDE * Voxel test suite changes * Changed variable name to camelCase * Changes in srcpath * Changes in Voxel test suite for nonQA case * Consolidated the repeated code and move it as common code * Minor changes based on review commands * Modification for bitDepth in voxel host * Merge with develop branch * Fix on error code display and gaussian Filter * F string bug is resolved * Fix for CI failure and some improvement in error detection * Changes based on review comments * Fix for rain and warp_perspective issue of unable to open file * Enhanced display for non implimented functionality --------- Co-authored-by: sampath1117 <sampath.rachumallu@multicorewareinc.com> Co-authored-by: dineshbabu-ravichandran <dineshbabu.ravichandran@multicorewareinc.com> Co-authored-by: dineshbabu-ravichandran <dineshbabu.ravichandran@multicoreinc.com> Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com> Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>
…ility (r-abishek#499) * initial commit * removed the usage of supported case list on HOST backend * Add Augmentation enum * removed supported caselists from all the python scripts in HIP backend * replacing case numbers with the enum's on the test suite files * Fix Indentation * Add enums for the swith cases * update Enum's for new cases * Fix errors in unit tests * Semicolon typo fix * Fix the test case naming in QA mode * removes unncessary print statements --------- Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com> Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
…ocs/sphinx (r-abishek#511) Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.13.0 to 1.14.1. - [Release notes](https://github.com/ROCm/rocm-docs-core/releases) - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md) - [Commits](ROCm/rocm-docs-core@v1.13.0...v1.14.1) --- updated-dependencies: - dependency-name: rocm-docs-core[api_reference] dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
r-abishek
left a comment
There was a problem hiding this comment.
@HazarathKumarM Please address few more comments
| } | ||
|
|
||
| // Horizontal Upsampling and Color conversion to RGB | ||
| __device__ inline void Upsample_and_RGB_hip_compute(float4 Cb_f4, float4 Cr_f4, d_float8 *Ch1_f8, d_float8* Ch2_f8, d_float8 *Ch3_f8) |
There was a problem hiding this comment.
snake_case - change Upsample to upsample
| int hipThreadIdx_x4 = hipThreadIdx_x * 4; | ||
|
|
||
| int alignedWidth = ((roiTensorPtrSrc[id_z].xywhROI.roiWidth + 15) / 16) * 16; | ||
| int alignedHeight = ((roiTensorPtrSrc[id_z].xywhROI.roiHeight + 15) / 16) * 16; |
There was a problem hiding this comment.
Would roiTensorPtrSrc[id_z].xywhROI.roiWidth & ~15 work for these? Please print value on host and check. Its more optimal than the divide and multiply
| // Load valid pixels (only if id_x is within valid range) | ||
| if (validPixels > 0) | ||
| { | ||
| for (int i = 0; i < validPixels; i++) |
There was a problem hiding this comment.
Change to something like below. Please confirm output match.
for (int i = 0, idx = srcIdx; i < validPixels; i++, idx += 3)
{
src_smem[hipThreadIdx_y_channel.x][hipThreadIdx_x8 + i] = srcPtr[idx];
src_smem[hipThreadIdx_y_channel.y][hipThreadIdx_x8 + i] = srcPtr[idx + 1];
src_smem[hipThreadIdx_y_channel.z][hipThreadIdx_x8 + i] = srcPtr[idx + 2];
}
| for (int i = validPixels; i < min(validPixels + 16, 8); i++) | ||
| { | ||
| src_smem[hipThreadIdx_y_channel.x][hipThreadIdx_x8 + i] = srcPtr[srcIdx + ((validPixels - 1) * 3)]; | ||
| src_smem[hipThreadIdx_y_channel.y][hipThreadIdx_x8 + i] = srcPtr[srcIdx + ((validPixels - 1) * 3) + 1]; |
| __syncthreads(); | ||
| d_float8 y_f8; | ||
| int cbcrY = hipThreadIdx_y * 2; | ||
| y_hip_compute(srcPtr,(d_float8*)&src_smem[hipThreadIdx_y][hipThreadIdx_x8],(d_float8*)&src_smem[hipThreadIdx_y + 16][hipThreadIdx_x8],(d_float8*)&src_smem[hipThreadIdx_y + 32][hipThreadIdx_x8],&y_f8); |
| { | ||
| float4 cb_f4, cr_f4; | ||
| // Downsample RGB and convert to CbCr | ||
| downsample_cbcr_hip_compute((d_float8*)&src_smem[cbcrY][hipThreadIdx_x8],(d_float8*)&src_smem[cbcrY + 1][hipThreadIdx_x8],(d_float8*)&src_smem[cbcrY + 16][hipThreadIdx_x8],(d_float8*)&src_smem[cbcrY + 17][hipThreadIdx_x8],(d_float8*)&src_smem[cbcrY + 32][hipThreadIdx_x8],(d_float8*)&src_smem[cbcrY + 33][hipThreadIdx_x8],&cb_f4, &cr_f4); |
|
|
||
| d_float8 y_f8; | ||
| int cbcrY = hipThreadIdx_y * 2; | ||
| y_hip_compute(srcPtr,(d_float8*)&src_smem[hipThreadIdx_y][hipThreadIdx_x8],(d_float8*)&src_smem[hipThreadIdx_y + 16][hipThreadIdx_x8],(d_float8*)&src_smem[hipThreadIdx_y + 32][hipThreadIdx_x8],&y_f8); |
|
|
||
| //----INVERSE STEPS--- | ||
| // 1D column wise IDCT for Y Cb and Cr channels | ||
| if (col < 128 && col < alignedWidth) |
There was a problem hiding this comment.
Please introduce parentheses for every instance like these
There was a problem hiding this comment.
Pull Request Overview
This pull request adds HIP support for JPEG compression distortion tensor tests by updating the test mapping and adjusting the test boundaries accordingly.
- Added a new entry for "jpeg_compression_distortion" in the test mapping.
- Updated the caseMax value in the HIP test runner to include the new test.
Reviewed Changes
Copilot reviewed 4 out of 10 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| utilities/test_suite/common.py | Added a new mapping entry for "jpeg_compression_distortion" with HIP support. |
| utilities/test_suite/HIP/runImageTests.py | Updated caseMax to account for the new test mapping entry. |
Files not reviewed (6)
- api/rppt_tensor_geometric_augmentations.h: Language not supported
- src/include/common/hip/rpp_hip_load_store.hpp: Language not supported
- src/include/tensor/hip_tensor_executors.hpp: Language not supported
- src/modules/tensor/rppt_tensor_geometric_augmentations.cpp: Language not supported
- utilities/test_suite/HIP/Tensor_image_hip.cpp: Language not supported
- utilities/test_suite/rpp_test_suite_image.h: Language not supported
Comments suppressed due to low confidence (1)
utilities/test_suite/HIP/runImageTests.py:43
- Ensure that the update of caseMax to 93 aligns correctly with the updated test mapping in common.py, accounting for all test cases.
caseMax = 93
| 91: ["tensor_stddev", "HOST", "HIP"], | ||
| 92: ["slice", "HOST", "HIP"] | ||
| 92: ["slice", "HOST", "HIP"], | ||
| 93: ["jpeg_compression_distortion","HIP"], |
There was a problem hiding this comment.
The new test mapping entry for jpeg_compression_distortion only includes 'HIP' whereas similar entries include both 'HOST' and 'HIP'. If intended to be consistent with other mappings, please add 'HOST' as the missing parameter.
r-abishek
left a comment
There was a problem hiding this comment.
@HazarathKumarM Please address this round of comments
| } | ||
|
|
||
| // Generic clamping | ||
| __device__ inline void clamp_range(float* values) |
There was a problem hiding this comment.
clamp_range can be changed to function overloading instead of templating. There will be four instances of the function, but lows and highs could be hard cooded, and no if-statements would exist.
This is for both the clamp_range functions here.
| // Horizontal Upsampling and Color conversion to RGB | ||
| __device__ inline void upsample_and_RGB_hip_compute(float4 Cb_f4, float4 Cr_f4, d_float8 *Ch1_f8, d_float8* Ch2_f8, d_float8 *Ch3_f8) | ||
| { | ||
| d_float8 Cb_f8, Cr_f8, R_f8 , G_f8, B_f8; |
There was a problem hiding this comment.
rename to cb_f8, cr_f8, r_f8, g_f8, b_f8
| { | ||
| d_float8 Cb_f8, Cr_f8, R_f8 , G_f8, B_f8; | ||
| // Copy Y values | ||
| d_float8 y_f8 = *((d_float8*)Ch1_f8); |
|
|
||
| const float4 yR_f4 = (float4)0.299f; | ||
| const float4 yG_f4 = (float4)0.587f; | ||
| const float4 yB_f4 = (float4)0.114f; |
There was a problem hiding this comment.
Take these 3 constants out as global device constants, change this y_hip_compute to function overloading instead of templating to avoid the if-statement too.
__device__ const float4 yR_f4 = (float4)0.299f;
__device__ const float4 yG_f4 = (float4)0.587f;
__device__ const float4 yB_f4 = (float4)0.114f;
__device__ const float4 maxVal255_f4 = (float4)255.0f;
__device__ const float4 maxVal128_f4 = (float4)128.0f;
__device__ inline void rgb_range_fix_hip_compute(float *src , d_float8 *r_f8, d_float8 *g_f8, d_float8 *b_f8)
{
rpp_hip_math_multiply8_const(r_f8, r_f8, maxVal255_f4);
rpp_hip_math_multiply8_const(g_f8, g_f8, maxVal255_f4);
rpp_hip_math_multiply8_const(b_f8, b_f8, maxVal255_f4);
}
__device__ inline void rgb_range_fix_hip_compute(half *src , d_float8 *r_f8, d_float8 *g_f8, d_float8 *b_f8)
{
rpp_hip_math_multiply8_const(r_f8, r_f8, maxVal255_f4);
rpp_hip_math_multiply8_const(g_f8, g_f8, maxVal255_f4);
rpp_hip_math_multiply8_const(b_f8, b_f8, maxVal255_f4);
}
__device__ inline void rgb_range_fix_hip_compute(schar *src , d_float8 *r_f8, d_float8 *g_f8, d_float8 *b_f8)
{
rpp_hip_math_multiply8_const(r_f8, r_f8, maxVal128_f4);
rpp_hip_math_multiply8_const(g_f8, g_f8, maxVal128_f4);
rpp_hip_math_multiply8_const(b_f8, b_f8, maxVal128_f4);
}
__device__ inline void rgb_range_fix_hip_compute(uchar *src , d_float8 *r_f8, d_float8 *g_f8, d_float8 *b_f8)
{}
__device__ inline void y_hip_compute(float *src , d_float8 *r_f8, d_float8 *g_f8, d_float8 *b_f8, d_float8 *y_f8)
{
// rgb range fix
rgb_range_fix_hip_compute(src, r_f8, g_f8, b_f8);
// RGB to Y conversion
y_f8->f4[0] = r_f8->f4[0] * yR_f4 + g_f8->f4[0] * yG_f4 + b_f8->f4[0] * yB_f4;
y_f8->f4[1] = r_f8->f4[1] * yR_f4 + g_f8->f4[1] * yG_f4 + b_f8->f4[1] * yB_f4;
}
| int val = (-128.0f * sub_128); | ||
| float inp[8]; | ||
| for(int i = 0; i < 8; i ++) | ||
| inp[i] = vec[i] + val; |
There was a problem hiding this comment.
You could use two float4 adds for this loop?
| // DCT forward 1D implementation | ||
| __device__ inline void dct_fwd_8x8_1d(float *vec, bool sub_128) | ||
| { | ||
| int val = (-128.0f * sub_128); |
There was a problem hiding this comment.
Please change the variable name.
We ideally only use "_" followed by text in 3 instances:
- _ like _f4 or _f8
- _ in id_x, id_y, id_z
- _smem if variable is shared memory
| // Inverse 1D DCT | ||
| __device__ inline void dct_inv_8x8_1d(float *vec, bool add_128) | ||
| { | ||
| int val = (128.0f * add_128); |
| vec[1] = normCoeff * ((dctA * temp4) - (dctC * temp5) + (dctD * temp6) - (dctF * temp7)); | ||
| vec[3] = normCoeff * ((dctC * temp4) + (dctF * temp5) - (dctA * temp6) + (dctD * temp7)); | ||
| vec[5] = normCoeff * ((dctD * temp4) + (dctA * temp5) + (dctF * temp6) - (dctC * temp7)); | ||
| vec[7] = normCoeff * ((dctF * temp4) + (dctD * temp5) + (dctC * temp6) + (dctA * temp7)); |
There was a problem hiding this comment.
Please vectorize these as much as possible. Some ideas are below, please explore other ideas and use the best version.
As an example, the variables defined, and their usage could be:
__device__ const float dctACDF = make_float4(1.387039845322148f, ...);
__device__ const float dctCFAD = make_float4(1.175875602419359f, ...);
__device__ const float dctDAFC = make_float4(0.785694958387102f, ...);
__device__ const float dctFDCA = make_float4(0.275899379282943f, ...);
{
vec_f4 = dctACDF_f4 * temp4567_f4;
vec[1] = vec_f4.x - vec_f4.y + vec_f4.z - vec_f4.w;
vec_f4 = dctCFAD_f4 * temp4567_f4;
vec[3] = vec_f4.x + vec_f4.y - vec_f4.z + vec_f4.w;
<so same for vec[5] and vec[7]>
Finally once all of vec[0-7] are available, multiply in one shot by a variable 'normCoeff_f8'
}
There was a problem hiding this comment.
facing this error and not supporting device float4 in HIP
error: dynamic initialization is not supported for __device__, __constant__, __shared__, and __managed__ variables.
163 | __device__ const float4 dctFDCA = make_float4(0.275899379282943f, 0.785694958387102f, 1.175875602419359f, 1.387039845322148f);
| ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
There was a problem hiding this comment.
Yes, we cant use make_float4 globally. First lets change __device__ const to __constant__ if its only device memory. (For all instances).
Secondly, try something like this - https://rocmdocs.amd.com/projects/HIP/en/latest/how-to/hip_cpp_language_extensions.html#constant
|
|
||
| float temp8 = temp6 + temp7; | ||
| float temp9 = temp6 - temp7; | ||
| float temp10 = (dctC * inp[1]) - (dctD * inp[7]) - (dctF * inp[3]) - (dctA * inp[5]); |
There was a problem hiding this comment.
similar comment for inverse dct too
There was a problem hiding this comment.
facing this error and not supporting device float4 in HIP
error: dynamic initialization is not supported for device, constant, shared, and managed variables.
163 | device const float4 dctFDCA = make_float4(0.275899379282943f, 0.785694958387102f, 1.175875602419359f, 1.387039845322148f);
| ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| // Clamping for signed char (schar) | ||
| __device__ inline void clamp_range(schar *src, float* values) | ||
| { | ||
| int low = -128, high = 127; |
There was a problem hiding this comment.
remove additional inits of low and high
| { | ||
| rpp_hip_math_multiply8_const(r_f8, r_f8, maxVal255_f4); | ||
| rpp_hip_math_multiply8_const(g_f8, g_f8, maxVal255_f4); | ||
| rpp_hip_math_multiply8_const(b_f8, b_f8, maxVal255_f4); |
|
|
||
| // Convert to CbCr | ||
| *cb_f4 = avgR_f4 * (float4)-0.168736f + avgG_f4 * (float4)-0.331264f + avgB_f4 * (float4)0.500000f + (float4)128.0f; | ||
| *cr_f4 = avgR_f4 * (float4)0.500000f + avgG_f4 * (float4)-0.418688f + avgB_f4 * (float4)-0.081312f + (float4)128.0f; |
There was a problem hiding this comment.
Can we consolidate all the 0.5 multiplies to one (float4)0.5f multiply
| d_float8 avgR_f8, avgG_f8, avgB_f8; | ||
| avgR_f8.f4[0] = (r1_f8->f4[0] + r2_f8->f4[0]) * (float4)0.5f; | ||
| avgR_f8.f4[1] = (r1_f8->f4[1] + r2_f8->f4[1]) * (float4)0.5f; | ||
| avgG_f8.f4[0] = (g1_f8->f4[0] + g2_f8->f4[0]) * (float4)0.5f; |
| vec[6] = normCoeff * ((dctBE.y * (temp1_f4.x - temp1_f4.w)) - (dctBE.x * (temp1_f4.y - temp1_f4.z))); | ||
|
|
||
| vec[1] = normCoeff * ((dctACDF.x * temp2_f4.x) - (dctACDF.y * temp2_f4.y) + (dctACDF.z * temp2_f4.z) - (dctACDF.w * temp2_f4.w)); | ||
| vec[3] = normCoeff * ((dctACDF.y * temp2_f4.x) + (dctACDF.w * temp2_f4.y) - (dctACDF.x * temp2_f4.z) + (dctACDF.z * temp2_f4.w)); |
There was a problem hiding this comment.
First priority float4 multiplies, second priority fmaf of single floats, third priority is above
| vec[1] = normCoeff * ((dctACDF.x * temp2_f4.x) - (dctACDF.y * temp2_f4.y) + (dctACDF.z * temp2_f4.z) - (dctACDF.w * temp2_f4.w)); | ||
| vec[3] = normCoeff * ((dctACDF.y * temp2_f4.x) + (dctACDF.w * temp2_f4.y) - (dctACDF.x * temp2_f4.z) + (dctACDF.z * temp2_f4.w)); | ||
| vec[5] = normCoeff * ((dctACDF.z * temp2_f4.x) + (dctACDF.x * temp2_f4.y) + (dctACDF.w * temp2_f4.z) - (dctACDF.y * temp2_f4.w)); | ||
| vec[7] = normCoeff * ((dctACDF.w * temp2_f4.x) + (dctACDF.z * temp2_f4.y) + (dctACDF.y * temp2_f4.z) + (dctACDF.x * temp2_f4.w)); |
| upsample_and_RGB_hip_compute(cb_f4, cr_f4, (d_float8*)&src_smem[hipThreadIdx_y_channel.x][hipThreadIdx_x8], (d_float8*)&src_smem[hipThreadIdx_y_channel.y][hipThreadIdx_x8], (d_float8*)&src_smem[hipThreadIdx_y_channel.z][hipThreadIdx_x8]); | ||
| __syncthreads(); | ||
|
|
||
| // Clamp values and store results |
| } | ||
|
|
||
| // Computing Y from R G B | ||
| __device__ inline void y_hip_compute(float* src, d_float24 rgb_f24, d_float8* y_f8) |
There was a problem hiding this comment.
@HazarathKumarM We are creating a second copy of rgb_f24 here. Pls pass the pointer.
Please receive call-by-reference as d_float24* rgb_f24 and use as below. And again please add parentheses for mul and adds here for all instances.
__device__ inline void y_hip_compute(float* src, d_float24* rgb_f24, d_float8* y_f8)
{
rpp_hip_math_multiply24_const(rgb_f24, rgb_f24, maxVal255_f4);
y_f8->f4[0] = (rgb_f24->f8[0].f4[0] * yR_f4) + (rgb_f24->f8[1].f4[0] * yG_f4) + (rgb_f24->f8[2].f4[0] * yB_f4);
y_f8->f4[1] = (rgb_f24->f8[0].f4[1] * yR_f4) + (rgb_f24->f8[1].f4[1] * yG_f4) + (rgb_f24->f8[2].f4[1] * yB_f4);
}
| } | ||
|
|
||
| template <typename T> | ||
| __global__ void jpeg_compression_distortion_pkd3_hip_tensor(T *srcPtr, |
There was a problem hiding this comment.
Doing a diffchecker on all these variants, I see a lot of common code that can be consolidated rather than having separate versions - should be able to reduce lines of code.
Please see diffs here:
- pkd3 vs pln3 -> Entire code from "Step 2" is 99% same, just written with formatting differences -> https://www.diffchecker.com/ApUeog6b/
- pkd3 vs pkd3_pln3 -> Lot of similarities -> https://www.diffchecker.com/9Segl36S/
| float temp8 = temp6 + temp7; | ||
| float temp9 = temp6 - temp7; | ||
| float temp10 = (dctACDF.y * vec1_f4.y) - (dctACDF.z * vec2_f4.w) - (dctACDF.w * vec1_f4.w) - (dctACDF.x * vec2_f4.y); | ||
| float temp11 = (dctACDF.z * vec1_f4.y) + (dctACDF.y * vec2_f4.w) - (dctACDF.x * vec1_f4.w) + (dctACDF.w * vec2_f4.y); |
There was a problem hiding this comment.
Please use fmaf(fmaf(fmaf())) to consolidate temp4, temp5, temp10, temp11 etc
| avgB_f4 = avgB_f4 * (float4)0.25; | ||
|
|
||
| // Convert to CbCr | ||
| *cb_f4 = (avgR_f4 * (float4)-0.168736f) + (avgG_f4 * (float4)-0.331264f) + (avgB_f4 * (float4)0.500000f) + maxVal128_f4; |
There was a problem hiding this comment.
Initialize the multiplied vectorized version of the constants as a local variable inside the kernel, and pass into this function.
Premulitply and hard code all the stuff in <>.
float4 cbConsts_f4[3] = {(float4)<-0.168736f * 0.25f>, (float4)<-0.331264f * 0.25f>, (float4)<0.500000f * 0.25f>};
float4 crConsts_f4[3] = {(float4)<0.500000f * 0.25f>, (float4)<-0.418688f * 0.25f>, (float4)<-0.081312f * 0.25f>};
<then pass as reference>:
downsample_cbcr_hip_compute(..., ..., cbConsts_f4, crConsts_f4);
<receive as>:
__device__ inline void downsample_cbcr_hip_compute(d_float8 *r1_f8, d_float8 *r2_f8, d_float8 *g1_f8, d_float8 *g2_f8, d_float8 *b1_f8, d_float8 *b2_f8, float4 *cb_f4, float4 *cr_f4, float4* cbConsts_f4, float4* crConsts_f4)
{
}
| } | ||
|
|
||
| // For half precision implementation | ||
| __device__ __forceinline__ void rpp_hip_load24_pkd3_to_float24_pln3(half *srcPtr, float **srcPtrs_f24) |
There was a problem hiding this comment.
"For half precision implementation" isn't clear
@HazarathKumarM For all the 8 new functions added in this file, change comments to follow same convention of other functions in this file - Both phrased and formatted the same way.
No description provided.