F32 QA test suite upgradation#534
F32 QA test suite upgradation#534r-abishek merged 28 commits intor-abishek:ar/test_suite_upgrade_13_f32qafrom
Conversation
* Initial log1p implementation in C++ * Added for nDim = 4 separately instead of recursive loop in log1p * Test by converting existing input F32 to I16 * log1p_HIP_Implementation * added abs in AVX2 * HIP calls * log1p HOST * log1p HOST * calls in HIP backend * # * Add files via upload * reference output files for log1p * log1p HOST implementation * removed print statements * Worked on the review comment * Worked on the review comment * Update rpp_hip_common.hpp * Minor changes after review * Reverted the testsuite changes, which were added in support for I16 * removed the testsuite support * Resolve review comments * Add additional blank line * Add rpp_hip_math_log1p to rpp_hip_math.hpp * Update header imports of log1p.cpp cpu version and move compute_log_16_host * Updated file header imports for log1p.cpp hip * Add test suite support to test log1p * Removing Templates * Test suite sepate for log1p and review changes * Further cleanup * Rename log1p functions * Cleanup code further - Moving functions, removing hard coding etc * Optimize and use one abs instruction lesser * Add minor cleanup * Restore code for i16 load * Revise documentation to have only i16 i/p and f32 o/p * Minor cleanup * Declare variables outside condition * Address minor review comments * Update the number of threads required for execution * Further update the definition of log1p * Update rppt_tensor_arithmetic_operations.h * Update rppt_tensor_arithmetic_operations.h * Update CHANGELOG.md * Update CHANGELOG.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update license to 2025 --------- Co-authored-by: ManasaDattaT <tammisetti.manasadatta@multicorewareinc.com> Co-authored-by: root <root@ixt-sjc2-52.local.lan> Co-authored-by: Snehaa Giridharan <snehaa@multicorewareinc.com> Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com> Co-authored-by: root <root@x1001c1s1b1n0.hostmgmt2001.cm.lockhart.amd.com> Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com> Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>
* Initial log1p implementation in C++ * Added for nDim = 4 separately instead of recursive loop in log1p * Test by converting existing input F32 to I16 * log1p_HIP_Implementation * added abs in AVX2 * HIP calls * log1p HOST * log1p HOST * calls in HIP backend * # * Add files via upload * reference output files for log1p * log1p HOST implementation * removed print statements * Worked on the review comment * Worked on the review comment * Update rpp_hip_common.hpp * Minor changes after review * Reverted the testsuite changes, which were added in support for I16 * removed the testsuite support * Resolve review comments * Add additional blank line * Add rpp_hip_math_log1p to rpp_hip_math.hpp * Update header imports of log1p.cpp cpu version and move compute_log_16_host * Updated file header imports for log1p.cpp hip * Add test suite support to test log1p * Removing Templates * Test suite sepate for log1p and review changes * Further cleanup * Rename log1p functions * Cleanup code further - Moving functions, removing hard coding etc * Optimize and use one abs instruction lesser * Add minor cleanup * Restore code for i16 load * Revise documentation to have only i16 i/p and f32 o/p * Minor cleanup * Declare variables outside condition * Address minor review comments * Update the number of threads required for execution * Further update the definition of log1p * Update rppt_tensor_arithmetic_operations.h * Update rppt_tensor_arithmetic_operations.h * Update CHANGELOG.md * Update CHANGELOG.md Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> * Update license to 2025 --------- Co-authored-by: ManasaDattaT <tammisetti.manasadatta@multicorewareinc.com> Co-authored-by: root <root@ixt-sjc2-52.local.lan> Co-authored-by: Snehaa Giridharan <snehaa@multicorewareinc.com> Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com> Co-authored-by: root <root@x1001c1s1b1n0.hostmgmt2001.cm.lockhart.amd.com> Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com> Co-authored-by: spolifroni-amd <Sandra.Polifroni@amd.com> Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>
858ef77 to
4d5e9c5
Compare
* Removed memcpy and used hipHostMalloc for allocation : blend * Removed memcpy and used hipHostMalloc for allocation : brightness * Removed memcpy and used hipHostMalloc for allocation : color cast * Removed memcpy and used hipHostMalloc for allocation : color twist * Removed memcpy and used hipHostMalloc for allocation : contrast * Removed memcpy and used hipHostMalloc for allocation : crop mirror normalize * Removed memcpy and used hipHostMalloc for allocation : Exposure * Removed memcpy and used hipHostMalloc for allocation : Gamma correction * Removed memcpy and used hipHostMalloc for allocation : gaussian filter * Removed memcpy and used hipHostMalloc for allocation : Noise * Removed memcpy and used hipHostMalloc for allocation : Non linear blend * Removed memcpy and used hipHostMalloc for allocation : Resize mirror normalize * Removed memcpy and used hipHostMalloc for allocation : Water * Added hipHostFree for all kernels in test suite * Added hipHostFree for all kernels in test suite * Removed memcpy and used hipHostMalloc for allocation : Flip, spatter, rcm, color temperature * Resolved copilot review comments * Updated version * Removed unused parameter * Updated version in cmakeList * removed the host to device mem copies for warp affine and rotate * Updated version * Removed comment * Updated Chnagelog file * Update patch version from 2.2.0 to 2.2.1 * Update CHANGELOG * Address copilot comments for HIP HOST consistent allocation * Documentation changes for updated memcpy changes * Update ricap outer API to use pinned memory and remove mem copy * Fix memory allocation and deallocation for permutationTensor * Update api/rppt_tensor_effects_augmentations.h Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Fix spelling of noiseProbability and saltProbability * Fix deallocation --------- Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com> Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com> Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com> Co-authored-by: hmaddise <HazarathKumar.Maddisetty@amd.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
6d99978 to
dc24897
Compare
Srihari-mcw
left a comment
There was a problem hiding this comment.
Pls make the requested changes and add more description in the PR for changes made. Otherwise looks fine from my side
|
Unified API PR must be updated based on jpeg compression changes |
|
Maybe warp perspective should also eventually contain the warp affine changes for consistency. Not required for this PR though |
There was a problem hiding this comment.
Pull request overview
This PR upgrades the F32 QA test suite by adding reference outputs, updating comparison logic, fixing kernel implementations, and including binary reference files for F32 bitdepth across multiple image processing kernels including pixelate, resize, rotate, lens_correction, resize_crop_mirror, and tensor reduction operations (sum, min, max, mean, stddev).
Changes:
- Added F32-specific golden reference outputs for tensor reduction operations
- Updated comparison functions to support F32 bitdepth with appropriate tolerance thresholds
- Fixed JPEG compression distortion kernel implementations for proper F32 handling and quality parameter support
- Corrected bilinear interpolation bounds checking and AVX2 optimizations in geometric augmentation kernels
Reviewed changes
Copilot reviewed 13 out of 25 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| utilities/test_suite/rpp_test_suite_image.h | Added F32 reference outputs for tensor operations and updated comparison functions with test-case-specific tolerances |
| utilities/test_suite/common.py | Enabled HOST backend for jpeg_compression_distortion test case |
| utilities/test_suite/HOST/Tensor_image_host.cpp | Updated QA validation to support F32 bitdepth for tensor reduction operations |
| utilities/test_suite/HIP/Tensor_image_hip.cpp | Added quality tensor initialization and fixed duplicate memory allocations |
| src/modules/tensor/rppt_tensor_geometric_augmentations.cpp | Added qualityTensor parameter to JPEG compression distortion GPU API |
| src/modules/tensor/hip/kernel/jpeg_compression_distortion.cpp | Refactored quantization to match HOST reference and added proper F32 scaling |
| src/modules/tensor/cpu/kernel/warp_affine.cpp | Fixed AVX2 bilinear interpolation coordinate calculations |
| src/modules/tensor/cpu/kernel/resize_crop_mirror.cpp | Added source location clamping for bilinear interpolation bounds safety |
| src/modules/tensor/cpu/kernel/resize.cpp | Fixed buffer allocation for separable horizontal resample |
| src/modules/tensor/cpu/kernel/lens_correction.cpp | Optimized camera coordinate calculations using FMA instructions |
| src/modules/tensor/cpu/kernel/jpeg_compression_distortion.cpp | Removed incorrect quantization clamping to match reference implementation |
| src/include/tensor/hip_tensor_executors.hpp | Updated function signature to include qualityTensor parameter |
| api/rppt_tensor_geometric_augmentations.h | Updated API signature to include qualityTensor parameter |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Rpp32u *permutationTensor = nullptr; | ||
| if(testCase == CHANNEL_PERMUTE) | ||
| CHECK_RETURN_STATUS(hipHostMalloc(&permutationTensor, 3 * batchSize * sizeof(Rpp32u))); | ||
| if(testCase == RICAP) | ||
| CHECK_RETURN_STATUS(hipHostMalloc(&permutationTensor, 4 * batchSize * sizeof(Rpp32u))); | ||
| Rpp32s *qualityTensor = nullptr; | ||
| if(testCase == JPEG_COMPRESSION_DISTORTION) | ||
| CHECK_RETURN_STATUS(hipHostMalloc(&qualityTensor, batchSize * sizeof(Rpp32s))); | ||
| Rpp32f *angle = nullptr; | ||
| if(testCase == ROTATE) | ||
| CHECK_RETURN_STATUS(hipHostMalloc(&angle, batchSize * sizeof(Rpp32f))); |
There was a problem hiding this comment.
Duplicate variable declarations: permutationTensor, qualityTensor, and angle are declared twice (lines 601-611 and 613 onwards). The second declarations at lines 613+ will shadow the first set, making the allocations at lines 601-611 unreachable and causing memory leaks. Remove the duplicate declarations starting at line 601.
| Rpp32u *permutationTensor = nullptr; | |
| if(testCase == CHANNEL_PERMUTE) | |
| CHECK_RETURN_STATUS(hipHostMalloc(&permutationTensor, 3 * batchSize * sizeof(Rpp32u))); | |
| if(testCase == RICAP) | |
| CHECK_RETURN_STATUS(hipHostMalloc(&permutationTensor, 4 * batchSize * sizeof(Rpp32u))); | |
| Rpp32s *qualityTensor = nullptr; | |
| if(testCase == JPEG_COMPRESSION_DISTORTION) | |
| CHECK_RETURN_STATUS(hipHostMalloc(&qualityTensor, batchSize * sizeof(Rpp32s))); | |
| Rpp32f *angle = nullptr; | |
| if(testCase == ROTATE) | |
| CHECK_RETURN_STATUS(hipHostMalloc(&angle, batchSize * sizeof(Rpp32f))); | |
| Rpp32s *qualityTensor = nullptr; | |
| if(testCase == JPEG_COMPRESSION_DISTORTION) | |
| CHECK_RETURN_STATUS(hipHostMalloc(&qualityTensor, batchSize * sizeof(Rpp32s))); |
| if (permutationTensor != nullptr) | ||
| CHECK_RETURN_STATUS(hipHostFree(permutationTensor)); |
There was a problem hiding this comment.
permutationTensor is freed twice at lines 2180 and 2184. This will cause a double-free error. Remove the duplicate free at line 2184.
| if (permutationTensor != nullptr) | |
| CHECK_RETURN_STATUS(hipHostFree(permutationTensor)); |
| __m128 pOutputChannel[(numVecs + 1) * 3]; // add 1 with numVecs for additional vector for transpose function with zero initialization | ||
| set_zeros(pOutputChannel, (numVecs + 1) * 3); |
There was a problem hiding this comment.
Corrected spelling of 'Multiply' to match American English standard. The comment should consistently use 'Multiply' as used elsewhere in the codebase.
| __syncthreads(); | ||
|
|
||
| // Inverse DCT | ||
| // Row-wise Inverse DCT |
There was a problem hiding this comment.
Comment style is inconsistent with other step comments in this function. Should use the dashed format '// ----------- Step N: Description -----------' to match surrounding comments for steps 3, 4, and 5.
There was a problem hiding this comment.
Can be ignored, since the inverse DCT belongs to step 5, Done.
| compute_dst_size_cap_host(&dstImgSize[batchCount], dstDescPtr); // Check if the dstImgSize exceeds dst buffer size | ||
| Rpp32f wRatio = ((Rpp32f)(roi.xywhROI.roiWidth)) / ((Rpp32f)(dstImgSize[batchCount].width)); | ||
| Rpp32f hRatio = ((Rpp32f)(roi.xywhROI.roiHeight)) / ((Rpp32f)(dstImgSize[batchCount].height)); | ||
| // Bilinear interpolation reads the right/bottom neighbor, so leave a 2-pixel margin to stay in bounds. |
There was a problem hiding this comment.
Identical comment repeated 4 times across different data type implementations. Consider extracting this into a shared documentation section or function comment to reduce duplication and improve maintainability.
There was a problem hiding this comment.
Comment in kernel is kept for better understanding and readability, can be ignored. Done.
| Rpp32f hRatio = ((Rpp32f)(roi.xywhROI.roiHeight)) / ((Rpp32f)(dstImgSize[batchCount].height)); | ||
| Rpp32u maxHeightLimit = roi.xywhROI.roiHeight - 1; | ||
| Rpp32u maxWidthLimit = (roi.xywhROI.roiWidth - 1) * srcDescPtr->strides.wStride; | ||
| // Bilinear interpolation reads the right/bottom neighbor, so leave a 2-pixel margin to stay in bounds. |
There was a problem hiding this comment.
Identical comment repeated 4 times across different data type implementations. Consider extracting this into a shared documentation section or function comment to reduce duplication and improve maintainability.
| // Bilinear interpolation reads the right/bottom neighbor, so leave a 2-pixel margin to stay in bounds. | |
| // For bilinear interpolation, keep a 2-pixel margin so right/bottom neighbor accesses stay in-bounds. |
| Rpp32f hRatio = ((Rpp32f)(roi.xywhROI.roiHeight)) / ((Rpp32f)(dstImgSize[batchCount].height)); | ||
| Rpp32u maxHeightLimit = roi.xywhROI.roiHeight - 1; | ||
| Rpp32u maxWidthLimit = (roi.xywhROI.roiWidth - 1) * srcDescPtr->strides.wStride; | ||
| // Bilinear interpolation reads the right/bottom neighbor, so leave a 2-pixel margin to stay in bounds. |
There was a problem hiding this comment.
Identical comment repeated 4 times across different data type implementations. Consider extracting this into a shared documentation section or function comment to reduce duplication and improve maintainability.
| // Bilinear interpolation reads the right/bottom neighbor, so leave a 2-pixel margin to stay in bounds. | |
| // Reserve a 2-pixel border so bilinear interpolation can safely access the right and bottom neighbor samples. |
| Rpp32f hRatio = ((Rpp32f)(roi.xywhROI.roiHeight)) / ((Rpp32f)(dstImgSize[batchCount].height)); | ||
| Rpp32u maxHeightLimit = roi.xywhROI.roiHeight - 1; | ||
| Rpp32u maxWidthLimit = (roi.xywhROI.roiWidth - 1) * srcDescPtr->strides.wStride; | ||
| // Bilinear interpolation reads the right/bottom neighbor, so leave a 2-pixel margin to stay in bounds. |
There was a problem hiding this comment.
Identical comment repeated 4 times across different data type implementations. Consider extracting this into a shared documentation section or function comment to reduce duplication and improve maintainability.
| // Bilinear interpolation reads the right/bottom neighbor, so leave a 2-pixel margin to stay in bounds. |
|
@r-abishek resolved copilot review comments and ready for review. |
Updated code and include bin files for the following kernels to match QA for F32 bitdepth