F32 QA test suite upgradation by RooseweltMcW · Pull Request #534 · r-abishek/rpp

RooseweltMcW · 2025-11-27T12:04:43Z

Updated code and included bin files for the following kernels to match QA for F32 bit depth:

pixelate
Resize
Rotate
lens_correction
resize crop mirror
tensor_sum
tensor_min
tensor_max
tensor_mean
tensor_stddev

* Initial log1p implementation in C++
* Added for nDim = 4 separately instead of recursive loop in log1p
* Test by converting existing input F32 to I16
* log1p_HIP_Implementation
* added abs in AVX2
* HIP calls
* log1p HOST
* log1p HOST
* calls in HIP backend
* #
* Add files via upload
* reference output files for log1p
* log1p HOST implementation
* removed print statements
* Worked on the review comment
* Worked on the review comment
* Update rpp_hip_common.hpp
* Minor changes after review
* Reverted the testsuite changes, which were added in support for I16
* removed the testsuite support
* Resolve review comments
* Add additional blank line
* Add rpp_hip_math_log1p to rpp_hip_math.hpp
* Update header imports of log1p.cpp cpu version and move compute_log_16_host
* Updated file header imports for log1p.cpp hip
* Add test suite support to test log1p
* Removing Templates
* Test suite sepate for log1p and review changes
* Further cleanup
* Rename log1p functions
* Cleanup code further - Moving functions, removing hard coding etc
* Optimize and use one abs instruction lesser
* Add minor cleanup
* Restore code for i16 load
* Revise documentation to have only i16 i/p and f32 o/p
* Minor cleanup
* Declare variables outside condition
* Address minor review comments
* Update the number of threads required for execution
* Further update the definition of log1p
* Update rppt_tensor_arithmetic_operations.h
* Update rppt_tensor_arithmetic_operations.h
* Update CHANGELOG.md
* Update CHANGELOG.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
* Update license to 2025
---------
Co-authored-by: ManasaDattaT <tammisetti.manasadatta@multicorewareinc.com>
Co-authored-by: root <root@ixt-sjc2-52.local.lan>
Co-authored-by: Snehaa Giridharan <snehaa@multicorewareinc.com>
Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>
Co-authored-by: root <root@x1001c1s1b1n0.hostmgmt2001.cm.lockhart.amd.com>
Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com>
Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>

…KumarM#9)

* Removed memcpy and used hipHostMalloc for allocation : blend
* Removed memcpy and used hipHostMalloc for allocation : brightness
* Removed memcpy and used hipHostMalloc for allocation : color cast
* Removed memcpy and used hipHostMalloc for allocation : color twist
* Removed memcpy and used hipHostMalloc for allocation : contrast
* Removed memcpy and used hipHostMalloc for allocation : crop mirror normalize
* Removed memcpy and used hipHostMalloc for allocation : Exposure
* Removed memcpy and used hipHostMalloc for allocation : Gamma correction
* Removed memcpy and used hipHostMalloc for allocation : gaussian filter
* Removed memcpy and used hipHostMalloc for allocation : Noise
* Removed memcpy and used hipHostMalloc for allocation : Non linear blend
* Removed memcpy and used hipHostMalloc for allocation : Resize mirror normalize
* Removed memcpy and used hipHostMalloc for allocation : Water
* Added hipHostFree for all kernels in test suite
* Added hipHostFree for all kernels in test suite
* Removed memcpy and used hipHostMalloc for allocation : Flip, spatter, rcm, color temperature
* Resolved copilot review comments
* Updated version
* Removed unused parameter
* Updated version in cmakeList
* removed the host to device mem copies for warp affine and rotate
* Updated version
* Removed comment
* Updated Chnagelog file
* Update patch version from 2.2.0 to 2.2.1
* Update CHANGELOG
* Address copilot comments for HIP HOST consistent allocation
* Documentation changes for updated memcpy changes
* Update ricap outer API to use pinned memory and remove mem copy
* Fix memory allocation and deallocation for permutationTensor
* Update api/rppt_tensor_effects_augmentations.h Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Fix spelling of noiseProbability and saltProbability
* Fix deallocation
---------
Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>
Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>
Co-authored-by: hmaddise <HazarathKumar.Maddisetty@amd.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

src/modules/tensor/cpu/kernel/resize_crop_mirror.cpp

src/modules/tensor/cpu/kernel/warp_affine.cpp

Srihari-mcw

Please make the requested changes and add more description to the PR for the changes made. Otherwise, it looks fine from my side.

Srihari-mcw · 2025-12-30T06:38:31Z

Unified API PR must be updated based on jpeg compression changes

Srihari-mcw · 2025-12-30T06:39:32Z

Maybe warp perspective should also eventually contain the warp affine changes for consistency. Not required for this PR though

Copilot

Pull request overview

This PR upgrades the F32 QA test suite by adding reference outputs, updating comparison logic, fixing kernel implementations, and including binary reference files for F32 bitdepth across multiple image processing kernels including pixelate, resize, rotate, lens_correction, resize_crop_mirror, and tensor reduction operations (sum, min, max, mean, stddev).

Changes:

Added F32-specific golden reference outputs for tensor reduction operations
Updated comparison functions to support F32 bitdepth with appropriate tolerance thresholds
Fixed JPEG compression distortion kernel implementations for proper F32 handling and quality parameter support
Corrected bilinear interpolation bounds checking and AVX2 optimizations in geometric augmentation kernels
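
To illustrate the tolerance-based comparison mentioned above, here is a minimal sketch rather than the test suite's actual code. The helper names `countMismatches` and `selectCutoff` are hypothetical; the two cutoff values (1e-5 as the default, 1e-4 as the relaxed case) are taken from the commit titled "Update default cutoff to 1e-5 and specialized cutoff to 1e-4", and which test cases use the relaxed value is not assumed here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the RPP test suite function): counts the elements
// whose absolute difference from the golden reference exceeds the cutoff.
std::size_t countMismatches(const std::vector<float> &output,
                            const std::vector<float> &reference,
                            float cutoff)
{
    assert(output.size() == reference.size());
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < output.size(); i++)
    {
        float diff = std::fabs(output[i] - reference[i]);
        // Written as !(diff <= cutoff) so that a NaN difference is counted as a mismatch
        if (!(diff <= cutoff))
            mismatches++;
    }
    return mismatches;
}

// Hypothetical cutoff selection: a tight default and a relaxed value for
// test cases that need it (the caller decides which cases those are).
float selectCutoff(bool needsRelaxedCutoff)
{
    return needsRelaxedCutoff ? 1e-4f : 1e-5f;
}
```

A caller would pick the cutoff once per test case and treat a non-zero mismatch count as a QA failure.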

Reviewed changes

Copilot reviewed 13 out of 25 changed files in this pull request and generated 8 comments.

File	Description
utilities/test_suite/rpp_test_suite_image.h	Added F32 reference outputs for tensor operations and updated comparison functions with test-case-specific tolerances
utilities/test_suite/common.py	Enabled HOST backend for jpeg_compression_distortion test case
utilities/test_suite/HOST/Tensor_image_host.cpp	Updated QA validation to support F32 bitdepth for tensor reduction operations
utilities/test_suite/HIP/Tensor_image_hip.cpp	Added quality tensor initialization and fixed duplicate memory allocations
src/modules/tensor/rppt_tensor_geometric_augmentations.cpp	Added qualityTensor parameter to JPEG compression distortion GPU API
src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp	Refactored quantization to match HOST reference and added proper F32 scaling
src/modules/tensor/cpu/kernel/warp_affine.cpp	Fixed AVX2 bilinear interpolation coordinate calculations
src/modules/tensor/cpu/kernel/resize_crop_mirror.cpp	Added source location clamping for bilinear interpolation bounds safety
src/modules/tensor/cpu/kernel/resize.cpp	Fixed buffer allocation for separable horizontal resample
src/modules/tensor/cpu/kernel/lens_correction.cpp	Optimized camera coordinate calculations using FMA instructions
src/modules/tensor/cpu/kernel/jpeg_compression_distortion.cpp	Removed incorrect quantization clamping to match reference implementation
src/include/tensor/hip_tensor_executors.hpp	Updated function signature to include qualityTensor parameter
api/rppt_tensor_geometric_augmentations.h	Updated API signature to include qualityTensor parameter

Copilot · 2026-01-21T03:56:58Z

utilities/test_suite/HIP/Tensor_image_hip.cpp

+    Rpp32u *permutationTensor = nullptr;
+    if(testCase == CHANNEL_PERMUTE)
+        CHECK_RETURN_STATUS(hipHostMalloc(&permutationTensor, 3 * batchSize * sizeof(Rpp32u)));
+    if(testCase == RICAP)
+        CHECK_RETURN_STATUS(hipHostMalloc(&permutationTensor, 4 * batchSize * sizeof(Rpp32u)));
+    Rpp32s *qualityTensor = nullptr;
+    if(testCase == JPEG_COMPRESSION_DISTORTION)
+        CHECK_RETURN_STATUS(hipHostMalloc(&qualityTensor, batchSize * sizeof(Rpp32s)));
+    Rpp32f *angle = nullptr;
+    if(testCase == ROTATE)
+        CHECK_RETURN_STATUS(hipHostMalloc(&angle, batchSize * sizeof(Rpp32f)));


Duplicate variable declarations: permutationTensor, qualityTensor, and angle are declared twice (lines 601-611 and 613 onwards). The second declarations at lines 613+ will shadow the first set, making the allocations at lines 601-611 unreachable and causing memory leaks. Remove the duplicate declarations starting at line 601.

Suggested change

-    Rpp32u *permutationTensor = nullptr;
-    if(testCase == CHANNEL_PERMUTE)
-        CHECK_RETURN_STATUS(hipHostMalloc(&permutationTensor, 3 * batchSize * sizeof(Rpp32u)));
-    if(testCase == RICAP)
-        CHECK_RETURN_STATUS(hipHostMalloc(&permutationTensor, 4 * batchSize * sizeof(Rpp32u)));
-    Rpp32s *qualityTensor = nullptr;
-    if(testCase == JPEG_COMPRESSION_DISTORTION)
-        CHECK_RETURN_STATUS(hipHostMalloc(&qualityTensor, batchSize * sizeof(Rpp32s)));
-    Rpp32f *angle = nullptr;
-    if(testCase == ROTATE)
-        CHECK_RETURN_STATUS(hipHostMalloc(&angle, batchSize * sizeof(Rpp32f)));
+    Rpp32s *qualityTensor = nullptr;
+    if(testCase == JPEG_COMPRESSION_DISTORTION)
+        CHECK_RETURN_STATUS(hipHostMalloc(&qualityTensor, batchSize * sizeof(Rpp32s)));

Removed, Done.

Copilot · 2026-01-21T03:56:59Z

utilities/test_suite/HIP/Tensor_image_hip.cpp

+    if (permutationTensor != nullptr)
+        CHECK_RETURN_STATUS(hipHostFree(permutationTensor));


permutationTensor is freed twice at lines 2180 and 2184. This will cause a double-free error. Remove the duplicate free at line 2184.

Suggested change

-    if (permutationTensor != nullptr)
-        CHECK_RETURN_STATUS(hipHostFree(permutationTensor));

Removed, Done.
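
Both issues above (duplicate allocation and double free) come from pairing allocations and frees by hand. One way to rule them out is to give each buffer a single owner. The sketch below is an illustration under stated assumptions, not the approach taken in this PR: `fakeHostAlloc` and `fakeHostFree` are stand-ins for `hipHostMalloc` and `hipHostFree` so that the example builds without HIP, and `HostBuffer` and `makeHostBuffer` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

// Stand-ins for hipHostMalloc / hipHostFree (assumption: used here only so the
// sketch compiles without HIP). The counter exists purely to observe frees.
static int g_freeCount = 0;
void *fakeHostAlloc(std::size_t bytes) { return std::malloc(bytes); }
void fakeHostFree(void *ptr) { std::free(ptr); g_freeCount++; }

// Deleter invoked by unique_ptr exactly once, and only for a non-null pointer.
struct HostBufferDeleter
{
    void operator()(void *ptr) const { fakeHostFree(ptr); }
};

template <typename T>
using HostBuffer = std::unique_ptr<T[], HostBufferDeleter>;

// Allocates only when the test case needs the buffer; otherwise returns an
// empty owner, which frees nothing when it goes out of scope.
template <typename T>
HostBuffer<T> makeHostBuffer(bool needed, std::size_t count)
{
    if (!needed)
        return HostBuffer<T>();
    return HostBuffer<T>(static_cast<T *>(fakeHostAlloc(count * sizeof(T))));
}
```

With this pattern there is no explicit free call at the end of the function, so a second free of the same pointer cannot be written by accident.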

Copilot · 2026-01-21T03:56:59Z

src/modules/tensor/cpu/kernel/resize.cpp

+                __m128 pOutputChannel[(numVecs + 1) * 3];   // add 1 with numVecs for additional vector for transpose function with zero initialization 
+                set_zeros(pOutputChannel, (numVecs + 1) * 3);


Corrected spelling of 'Multiply' to match American English standard. The comment should consistently use 'Multiply' as used elsewhere in the codebase.

Copilot · 2026-01-21T03:56:59Z

src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp

    __syncthreads();

-    // Inverse DCT
+    // Row-wise Inverse DCT 


Comment style is inconsistent with other step comments in this function. Should use the dashed format '// ----------- Step N: Description -----------' to match surrounding comments for steps 3, 4, and 5.

Can be ignored, since the inverse DCT belongs to step 5, Done.

Copilot · 2026-01-21T03:57:00Z

src/modules/tensor/cpu/kernel/resize_crop_mirror.cpp

        compute_dst_size_cap_host(&dstImgSize[batchCount], dstDescPtr);     // Check if the dstImgSize exceeds dst buffer size
        Rpp32f wRatio = ((Rpp32f)(roi.xywhROI.roiWidth)) / ((Rpp32f)(dstImgSize[batchCount].width));
        Rpp32f hRatio = ((Rpp32f)(roi.xywhROI.roiHeight)) / ((Rpp32f)(dstImgSize[batchCount].height));
+        // Bilinear interpolation reads the right/bottom neighbor, so leave a 2-pixel margin to stay in bounds.


Identical comment repeated 4 times across different data type implementations. Consider extracting this into a shared documentation section or function comment to reduce duplication and improve maintainability.

Comment in kernel is kept for better understanding and readability, can be ignored. Done.
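
The bounds concern in that code comment can be illustrated with a small scalar sketch. This is not the RPP kernel: `sampleBilinear` is a hypothetical single-channel helper, and it shows one common way to keep the right and bottom neighbor reads in bounds, namely clamping the top-left index so that index + 1 always exists.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical single-channel bilinear sampler (illustration only).
// src is a row-major width x height image.
float sampleBilinear(const std::vector<float> &src, int width, int height,
                     float srcX, float srcY)
{
    assert(width >= 2 && height >= 2);
    assert(src.size() == static_cast<std::size_t>(width) * height);

    // Clamp the sampling location into [0, dim - 1]
    srcX = std::min(std::max(srcX, 0.0f), static_cast<float>(width - 1));
    srcY = std::min(std::max(srcY, 0.0f), static_cast<float>(height - 1));

    // Clamp the top-left index to dim - 2 so (x0 + 1, y0 + 1) is always valid
    int x0 = std::min(static_cast<int>(std::floor(srcX)), width - 2);
    int y0 = std::min(static_cast<int>(std::floor(srcY)), height - 2);
    float wx = srcX - static_cast<float>(x0);   // reaches 1.0 on the right edge
    float wy = srcY - static_cast<float>(y0);   // reaches 1.0 on the bottom edge

    float topLeft     = src[y0 * width + x0];
    float topRight    = src[y0 * width + x0 + 1];
    float bottomLeft  = src[(y0 + 1) * width + x0];
    float bottomRight = src[(y0 + 1) * width + x0 + 1];

    // Interpolate horizontally on both rows, then vertically between them
    float top = topLeft + wx * (topRight - topLeft);
    float bottom = bottomLeft + wx * (bottomRight - bottomLeft);
    return top + wy * (bottom - top);
}
```

Sampling past the last row or column then returns the edge pixel instead of reading outside the buffer.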

Copilot · 2026-01-21T03:57:00Z

src/modules/tensor/cpu/kernel/resize_crop_mirror.cpp

        Rpp32f hRatio = ((Rpp32f)(roi.xywhROI.roiHeight)) / ((Rpp32f)(dstImgSize[batchCount].height));
-        Rpp32u maxHeightLimit = roi.xywhROI.roiHeight - 1;
-        Rpp32u maxWidthLimit = (roi.xywhROI.roiWidth - 1) * srcDescPtr->strides.wStride;
+        // Bilinear interpolation reads the right/bottom neighbor, so leave a 2-pixel margin to stay in bounds.


Identical comment repeated 4 times across different data type implementations. Consider extracting this into a shared documentation section or function comment to reduce duplication and improve maintainability.

Suggested change

-        // Bilinear interpolation reads the right/bottom neighbor, so leave a 2-pixel margin to stay in bounds.
+        // For bilinear interpolation, keep a 2-pixel margin so right/bottom neighbor accesses stay in-bounds.

Can be ignored, Done.

Copilot · 2026-01-21T03:57:00Z

src/modules/tensor/cpu/kernel/resize_crop_mirror.cpp

        Rpp32f hRatio = ((Rpp32f)(roi.xywhROI.roiHeight)) / ((Rpp32f)(dstImgSize[batchCount].height));
-        Rpp32u maxHeightLimit = roi.xywhROI.roiHeight - 1;
-        Rpp32u maxWidthLimit = (roi.xywhROI.roiWidth - 1) * srcDescPtr->strides.wStride;
+        // Bilinear interpolation reads the right/bottom neighbor, so leave a 2-pixel margin to stay in bounds.


Identical comment repeated 4 times across different data type implementations. Consider extracting this into a shared documentation section or function comment to reduce duplication and improve maintainability.

Suggested change

-        // Bilinear interpolation reads the right/bottom neighbor, so leave a 2-pixel margin to stay in bounds.
+        // Reserve a 2-pixel border so bilinear interpolation can safely access the right and bottom neighbor samples.

Can be ignored, Done.

Copilot · 2026-01-21T03:57:01Z

src/modules/tensor/cpu/kernel/resize_crop_mirror.cpp

        Rpp32f hRatio = ((Rpp32f)(roi.xywhROI.roiHeight)) / ((Rpp32f)(dstImgSize[batchCount].height));
-        Rpp32u maxHeightLimit = roi.xywhROI.roiHeight - 1;
-        Rpp32u maxWidthLimit = (roi.xywhROI.roiWidth - 1) * srcDescPtr->strides.wStride;
+        // Bilinear interpolation reads the right/bottom neighbor, so leave a 2-pixel margin to stay in bounds.


Identical comment repeated 4 times across different data type implementations. Consider extracting this into a shared documentation section or function comment to reduce duplication and improve maintainability.

Suggested change

// Bilinear interpolation reads the right/bottom neighbor, so leave a 2-pixel margin to stay in bounds.

Can be ignored, Done.

RooseweltMcW · 2026-01-21T06:22:59Z

@r-abishek resolved copilot review comments and ready for review.

ManasaDattaT force-pushed the apr/f32_qa_set2 branch from 858ef77 to 4d5e9c5 Compare December 19, 2025 08:26

HazarathKumarM marked this pull request as draft December 22, 2025 17:05

HazarathKumarM changed the base branch from develop to master December 22, 2025 17:11

HazarathKumarM changed the base branch from master to develop December 22, 2025 17:12

kiritigowda and others added 24 commits December 22, 2025 12:56

Travis CI - key error fix

516ea9b

Fix Bug in ColorTwist (HazarathKumarM#6) (HazarathKumarM#8) (Hazarath…

73e6f31

…KumarM#9)

Added golden outputs and resolved HOST backend

d25aeda

Updated bin files for median filter and resize crop mirror

b4e4153

Updated bin files

e59c033

Updated bin files for the next set of kernel F32 QA

7a2d46f

Updated bin files for jpeg_compression_distortion

67429cd

Fixed resize QA failures

0e18158

Fix for Resize bilinear F32 QA HOST and HIP

15d4716

Fix for lens correction QA f32 for HOST and HIP for 1e-4 precision

417cb26

Fixed HIP rcm QA

f8234c8

updates for warp Affine F32 QA

b43bf9e

Fix for RCM QA match for U8 and F32 updates AVX

09d1288

Fix for lens correction AVX

9acbf25

Removed space

b092b1c

Fixed warp affine for every other varient with the updated changes

3d98a42

Add fixes to match precision in quantization

48f704c

Fix Precision mismatches

0fba494

Update default cutoff to 1e-5 and specialized cutoff to 1e-4

0fbe2a1

F32 QA Fix

8951bc7

Made Quality percentage as arg from testsuite

6219425

Resolved copilot comments

a0cbdd3

Resolved the copilot comments

679d899

Resolved Codex comments

cd375a3

HazarathKumarM force-pushed the apr/f32_qa_set2 branch from 6d99978 to dc24897 Compare December 22, 2025 18:03

HazarathKumarM marked this pull request as ready for review December 22, 2025 18:04

Srihari-mcw reviewed Dec 30, 2025

src/modules/tensor/cpu/kernel/resize_crop_mirror.cpp

Srihari-mcw reviewed Dec 30, 2025

src/modules/tensor/cpu/kernel/resize_crop_mirror.cpp

Srihari-mcw reviewed Dec 30, 2025

src/modules/tensor/cpu/kernel/warp_affine.cpp (outdated)

Srihari-mcw reviewed Dec 30, 2025

HazarathKumarM added 2 commits January 7, 2026 01:30

resolved review comments

82515a7

minor comment change

da2ed27

r-abishek changed the base branch from develop to ar/test_suite_upgrade_13_f32qa January 21, 2026 03:54

r-abishek requested a review from Copilot January 21, 2026 03:55

Copilot AI reviewed Jan 21, 2026

Resolved copilot review comments

9deaf25

r-abishek approved these changes Jan 22, 2026

r-abishek merged commit 99db80d into r-abishek:ar/test_suite_upgrade_13_f32qa Jan 22, 2026

RooseweltMcW commented Nov 27, 2025 • edited by Srihari-mcw

8 participants
