Skip to content

F32 QA test suite upgradation#534

Merged
r-abishek merged 28 commits intor-abishek:ar/test_suite_upgrade_13_f32qafrom
RooseweltMcW:apr/f32_qa_set2
Jan 22, 2026
Merged

F32 QA test suite upgradation#534
r-abishek merged 28 commits intor-abishek:ar/test_suite_upgrade_13_f32qafrom
RooseweltMcW:apr/f32_qa_set2

Conversation

@RooseweltMcW
Copy link
Copy Markdown

@RooseweltMcW RooseweltMcW commented Nov 27, 2025

Updated code and include bin files for the following kernels to match QA for F32 bitdepth

  1. pixelate
  2. Resize
  3. Rotate
  4. lens_correction
  5. resize crop mirror
  6. tensor_sum
  7. tensor_min
  8. tensor_max
  9. tensor_mean
  10. tensor_stddev

Srihari-mcw added a commit to Srihari-mcw/rpp that referenced this pull request Dec 8, 2025
* Initial log1p implementation in C++

* Added for nDim = 4 separately instead of recursive loop in log1p

* Test by converting existing input F32 to I16

* log1p_HIP_Implementation

* added abs in AVX2

* HIP calls

* log1p HOST

* log1p HOST

* calls in HIP backend

* #

* Add files via upload

* reference output files for log1p

* log1p HOST implementation

* removed print statements

* Worked on the review comment

* Worked on the review comment

* Update rpp_hip_common.hpp

* Minor changes after review

* Reverted the testsuite changes, which were added in support for I16

* removed the testsuite support

* Resolve review comments

* Add additional blank line

* Add rpp_hip_math_log1p to rpp_hip_math.hpp

* Update header imports of log1p.cpp cpu version and move compute_log_16_host

* Updated file header imports for log1p.cpp hip

* Add test suite support to test log1p

* Removing Templates

* Test suite sepate for log1p and review changes

* Further cleanup

* Rename log1p functions

* Cleanup code further - Moving functions, removing hard coding etc

* Optimize and use one abs instruction lesser

* Add minor cleanup

* Restore code for i16 load

* Revise documentation to have only i16 i/p and f32 o/p

* Minor cleanup

* Declare variables outside condition

* Address minor review comments

* Update the number of threads required for execution

* Further update the definition of log1p

* Update rppt_tensor_arithmetic_operations.h

* Update rppt_tensor_arithmetic_operations.h

* Update CHANGELOG.md

* Update CHANGELOG.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update license to 2025

---------

Co-authored-by: ManasaDattaT <tammisetti.manasadatta@multicorewareinc.com>
Co-authored-by: root <root@ixt-sjc2-52.local.lan>
Co-authored-by: Snehaa Giridharan <snehaa@multicorewareinc.com>
Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>
Co-authored-by: root <root@x1001c1s1b1n0.hostmgmt2001.cm.lockhart.amd.com>
Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>
ManasaDattaT added a commit to ManasaDattaT/rpp that referenced this pull request Dec 19, 2025
* Initial log1p implementation in C++

* Added for nDim = 4 separately instead of recursive loop in log1p

* Test by converting existing input F32 to I16

* log1p_HIP_Implementation

* added abs in AVX2

* HIP calls

* log1p HOST

* log1p HOST

* calls in HIP backend

* #

* Add files via upload

* reference output files for log1p

* log1p HOST implementation

* removed print statements

* Worked on the review comment

* Worked on the review comment

* Update rpp_hip_common.hpp

* Minor changes after review

* Reverted the testsuite changes, which were added in support for I16

* removed the testsuite support

* Resolve review comments

* Add additional blank line

* Add rpp_hip_math_log1p to rpp_hip_math.hpp

* Update header imports of log1p.cpp cpu version and move compute_log_16_host

* Updated file header imports for log1p.cpp hip

* Add test suite support to test log1p

* Removing Templates

* Test suite sepate for log1p and review changes

* Further cleanup

* Rename log1p functions

* Cleanup code further - Moving functions, removing hard coding etc

* Optimize and use one abs instruction lesser

* Add minor cleanup

* Restore code for i16 load

* Revise documentation to have only i16 i/p and f32 o/p

* Minor cleanup

* Declare variables outside condition

* Address minor review comments

* Update the number of threads required for execution

* Further update the definition of log1p

* Update rppt_tensor_arithmetic_operations.h

* Update rppt_tensor_arithmetic_operations.h

* Update CHANGELOG.md

* Update CHANGELOG.md

Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>

* Update license to 2025

---------

Co-authored-by: ManasaDattaT <tammisetti.manasadatta@multicorewareinc.com>
Co-authored-by: root <root@ixt-sjc2-52.local.lan>
Co-authored-by: Snehaa Giridharan <snehaa@multicorewareinc.com>
Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>
Co-authored-by: root <root@x1001c1s1b1n0.hostmgmt2001.cm.lockhart.amd.com>
Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>
@HazarathKumarM HazarathKumarM marked this pull request as draft December 22, 2025 17:05
@HazarathKumarM HazarathKumarM changed the base branch from develop to master December 22, 2025 17:11
@HazarathKumarM HazarathKumarM changed the base branch from master to develop December 22, 2025 17:12
* Removed memcpy and used hipHostMalloc for allocation : blend

* Removed memcpy and used hipHostMalloc for allocation : brightness

* Removed memcpy and used hipHostMalloc for allocation : color cast

* Removed memcpy and used hipHostMalloc for allocation : color twist

* Removed memcpy and used hipHostMalloc for allocation : contrast

* Removed memcpy and used hipHostMalloc for allocation : crop mirror normalize

* Removed memcpy and used hipHostMalloc for allocation : Exposure

* Removed memcpy and used hipHostMalloc for allocation : Gamma correction

* Removed memcpy and used hipHostMalloc for allocation : gaussian filter

* Removed memcpy and used hipHostMalloc for allocation : Noise

* Removed memcpy and used hipHostMalloc for allocation : Non linear blend

* Removed memcpy and used hipHostMalloc for allocation : Resize mirror normalize

* Removed memcpy and used hipHostMalloc for allocation : Water

* Added hipHostFree for all kernels in test suite

* Added hipHostFree for all kernels in test suite

* Removed memcpy and used hipHostMalloc for allocation : Flip, spatter, rcm, color temperature

* Resolved copilot review comments

* Updated version

* Removed unused parameter

* Updated version in cmakeList

* removed the host to device mem copies for warp affine and rotate

* Updated version

* Removed comment

* Updated Chnagelog file

* Update patch version from 2.2.0 to 2.2.1

* Update CHANGELOG

* Address copilot comments for HIP HOST consistent allocation

* Documentation changes for updated memcpy changes

* Update ricap outer API to use pinned memory and remove mem copy

* Fix memory allocation and deallocation for permutationTensor

* Update api/rppt_tensor_effects_augmentations.h

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Fix spelling of noiseProbability and saltProbability

* Fix deallocation

---------

Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>
Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>
Co-authored-by: hmaddise <HazarathKumar.Maddisetty@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@HazarathKumarM HazarathKumarM marked this pull request as ready for review December 22, 2025 18:04
Copy link
Copy Markdown
Collaborator

@Srihari-mcw Srihari-mcw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls make the requested changes and add more description in the PR for changes made. Otherwise looks fine from my side

@Srihari-mcw
Copy link
Copy Markdown
Collaborator

Unified API PR must be updated based on jpeg compression changes

@Srihari-mcw
Copy link
Copy Markdown
Collaborator

Maybe warp perspective should also eventually contain the warp affine changes for consistency. Not required for this PR though

@r-abishek r-abishek changed the base branch from develop to ar/test_suite_upgrade_13_f32qa January 21, 2026 03:54
@r-abishek r-abishek requested a review from Copilot January 21, 2026 03:55
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR upgrades the F32 QA test suite by adding reference outputs, updating comparison logic, fixing kernel implementations, and including binary reference files for F32 bitdepth across multiple image processing kernels including pixelate, resize, rotate, lens_correction, resize_crop_mirror, and tensor reduction operations (sum, min, max, mean, stddev).

Changes:

  • Added F32-specific golden reference outputs for tensor reduction operations
  • Updated comparison functions to support F32 bitdepth with appropriate tolerance thresholds
  • Fixed JPEG compression distortion kernel implementations for proper F32 handling and quality parameter support
  • Corrected bilinear interpolation bounds checking and AVX2 optimizations in geometric augmentation kernels

Reviewed changes

Copilot reviewed 13 out of 25 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
utilities/test_suite/rpp_test_suite_image.h Added F32 reference outputs for tensor operations and updated comparison functions with test-case-specific tolerances
utilities/test_suite/common.py Enabled HOST backend for jpeg_compression_distortion test case
utilities/test_suite/HOST/Tensor_image_host.cpp Updated QA validation to support F32 bitdepth for tensor reduction operations
utilities/test_suite/HIP/Tensor_image_hip.cpp Added quality tensor initialization and fixed duplicate memory allocations
src/modules/tensor/rppt_tensor_geometric_augmentations.cpp Added qualityTensor parameter to JPEG compression distortion GPU API
src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp Refactored quantization to match HOST reference and added proper F32 scaling
src/modules/tensor/cpu/kernel/warp_affine.cpp Fixed AVX2 bilinear interpolation coordinate calculations
src/modules/tensor/cpu/kernel/resize_crop_mirror.cpp Added source location clamping for bilinear interpolation bounds safety
src/modules/tensor/cpu/kernel/resize.cpp Fixed buffer allocation for separable horizontal resample
src/modules/tensor/cpu/kernel/lens_correction.cpp Optimized camera coordinate calculations using FMA instructions
src/modules/tensor/cpu/kernel/jpeg_compression_distortion.cpp Removed incorrect quantization clamping to match reference implementation
src/include/tensor/hip_tensor_executors.hpp Updated function signature to include qualityTensor parameter
api/rppt_tensor_geometric_augmentations.h Updated API signature to include qualityTensor parameter

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +601 to +611
Rpp32u *permutationTensor = nullptr;
if(testCase == CHANNEL_PERMUTE)
CHECK_RETURN_STATUS(hipHostMalloc(&permutationTensor, 3 * batchSize * sizeof(Rpp32u)));
if(testCase == RICAP)
CHECK_RETURN_STATUS(hipHostMalloc(&permutationTensor, 4 * batchSize * sizeof(Rpp32u)));
Rpp32s *qualityTensor = nullptr;
if(testCase == JPEG_COMPRESSION_DISTORTION)
CHECK_RETURN_STATUS(hipHostMalloc(&qualityTensor, batchSize * sizeof(Rpp32s)));
Rpp32f *angle = nullptr;
if(testCase == ROTATE)
CHECK_RETURN_STATUS(hipHostMalloc(&angle, batchSize * sizeof(Rpp32f)));
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate variable declarations: permutationTensor, qualityTensor, and angle are declared twice (lines 601-611 and 613 onwards). The second declarations at lines 613+ will shadow the first set, making the allocations at lines 601-611 unreachable and causing memory leaks. Remove the duplicate declarations starting at line 601.

Suggested change
Rpp32u *permutationTensor = nullptr;
if(testCase == CHANNEL_PERMUTE)
CHECK_RETURN_STATUS(hipHostMalloc(&permutationTensor, 3 * batchSize * sizeof(Rpp32u)));
if(testCase == RICAP)
CHECK_RETURN_STATUS(hipHostMalloc(&permutationTensor, 4 * batchSize * sizeof(Rpp32u)));
Rpp32s *qualityTensor = nullptr;
if(testCase == JPEG_COMPRESSION_DISTORTION)
CHECK_RETURN_STATUS(hipHostMalloc(&qualityTensor, batchSize * sizeof(Rpp32s)));
Rpp32f *angle = nullptr;
if(testCase == ROTATE)
CHECK_RETURN_STATUS(hipHostMalloc(&angle, batchSize * sizeof(Rpp32f)));
Rpp32s *qualityTensor = nullptr;
if(testCase == JPEG_COMPRESSION_DISTORTION)
CHECK_RETURN_STATUS(hipHostMalloc(&qualityTensor, batchSize * sizeof(Rpp32s)));

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, Done.

Comment on lines +2183 to +2184
if (permutationTensor != nullptr)
CHECK_RETURN_STATUS(hipHostFree(permutationTensor));
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

permutationTensor is freed twice at lines 2180 and 2184. This will cause a double-free error. Remove the duplicate free at line 2184.

Suggested change
if (permutationTensor != nullptr)
CHECK_RETURN_STATUS(hipHostFree(permutationTensor));

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed, Done.

Comment on lines +223 to +224
__m128 pOutputChannel[(numVecs + 1) * 3]; // add 1 with numVecs for additional vector for transpose function with zero initialization
set_zeros(pOutputChannel, (numVecs + 1) * 3);
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected spelling of 'Multiply' to match American English standard. The comment should consistently use 'Multiply' as used elsewhere in the codebase.

Copilot uses AI. Check for mistakes.
__syncthreads();

// Inverse DCT
// Row-wise Inverse DCT
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment style is inconsistent with other step comments in this function. Should use the dashed format '// ----------- Step N: Description -----------' to match surrounding comments for steps 3, 4, and 5.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be ignored, since the inverse DCT belongs to step 5, Done.

compute_dst_size_cap_host(&dstImgSize[batchCount], dstDescPtr); // Check if the dstImgSize exceeds dst buffer size
Rpp32f wRatio = ((Rpp32f)(roi.xywhROI.roiWidth)) / ((Rpp32f)(dstImgSize[batchCount].width));
Rpp32f hRatio = ((Rpp32f)(roi.xywhROI.roiHeight)) / ((Rpp32f)(dstImgSize[batchCount].height));
// Bilinear interpolation reads the right/bottom neighbor, so leave a 2-pixel margin to stay in bounds.
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Identical comment repeated 4 times across different data type implementations. Consider extracting this into a shared documentation section or function comment to reduce duplication and improve maintainability.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment in kernel is kept for better understanding and readability, can be ignored. Done.

Rpp32f hRatio = ((Rpp32f)(roi.xywhROI.roiHeight)) / ((Rpp32f)(dstImgSize[batchCount].height));
Rpp32u maxHeightLimit = roi.xywhROI.roiHeight - 1;
Rpp32u maxWidthLimit = (roi.xywhROI.roiWidth - 1) * srcDescPtr->strides.wStride;
// Bilinear interpolation reads the right/bottom neighbor, so leave a 2-pixel margin to stay in bounds.
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Identical comment repeated 4 times across different data type implementations. Consider extracting this into a shared documentation section or function comment to reduce duplication and improve maintainability.

Suggested change
// Bilinear interpolation reads the right/bottom neighbor, so leave a 2-pixel margin to stay in bounds.
// For bilinear interpolation, keep a 2-pixel margin so right/bottom neighbor accesses stay in-bounds.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be ignored, Done.

Rpp32f hRatio = ((Rpp32f)(roi.xywhROI.roiHeight)) / ((Rpp32f)(dstImgSize[batchCount].height));
Rpp32u maxHeightLimit = roi.xywhROI.roiHeight - 1;
Rpp32u maxWidthLimit = (roi.xywhROI.roiWidth - 1) * srcDescPtr->strides.wStride;
// Bilinear interpolation reads the right/bottom neighbor, so leave a 2-pixel margin to stay in bounds.
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Identical comment repeated 4 times across different data type implementations. Consider extracting this into a shared documentation section or function comment to reduce duplication and improve maintainability.

Suggested change
// Bilinear interpolation reads the right/bottom neighbor, so leave a 2-pixel margin to stay in bounds.
// Reserve a 2-pixel border so bilinear interpolation can safely access the right and bottom neighbor samples.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be ignored, Done.

Rpp32f hRatio = ((Rpp32f)(roi.xywhROI.roiHeight)) / ((Rpp32f)(dstImgSize[batchCount].height));
Rpp32u maxHeightLimit = roi.xywhROI.roiHeight - 1;
Rpp32u maxWidthLimit = (roi.xywhROI.roiWidth - 1) * srcDescPtr->strides.wStride;
// Bilinear interpolation reads the right/bottom neighbor, so leave a 2-pixel margin to stay in bounds.
Copy link

Copilot AI Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Identical comment repeated 4 times across different data type implementations. Consider extracting this into a shared documentation section or function comment to reduce duplication and improve maintainability.

Suggested change
// Bilinear interpolation reads the right/bottom neighbor, so leave a 2-pixel margin to stay in bounds.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be ignored, Done.

@RooseweltMcW
Copy link
Copy Markdown
Author

@r-abishek resolved copilot review comments and ready for review.

@r-abishek r-abishek merged commit 99db80d into r-abishek:ar/test_suite_upgrade_13_f32qa Jan 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants