F16 variants - Update loads and stores to AVX2 - Group 5 #637
kiritigowda merged 12 commits into ROCm:develop from
FP16 Load/Store Updates
Pull Request Overview
This PR updates the F16 (half-precision floating-point) kernels to load and store data with AVX2 intrinsics instead of converting each element to FP32 with scalar code. The changes target the blend, color_cast, flip, and crop_mirror_normalize kernels and deliver reported performance improvements of 28.7% to 48.5% for FP16 operations.
- Replaces scalar F16 to F32 conversions with AVX2 SIMD intrinsics
- Introduces new AVX2 load/store functions for F16 data with mirroring support
- Restructures conditional branching (else-if chains) for clarity
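To make the load, compute-in-F32, store pattern concrete, here is a minimal C++ sketch, not RPP's code. It assumes F16 pixels are held as raw IEEE 754 binary16 bit patterns in `uint16_t`, and that the vector path needs F16C as well as AVX2 (`_mm256_cvtph_ps` and `_mm256_cvtps_ph` are F16C intrinsics), so it is guarded at compile time with `__AVX2__` and `__F16C__`. The names `blend_f16`, `half_to_float`, and `float_to_half`, and the blend formula `(src1 - src2) * alpha + src2`, are hypothetical. The scalar loop stands in for the per-element conversion the PR replaces and also handles the tail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__AVX2__) && defined(__F16C__)
#include <immintrin.h>
#endif

// Scalar binary16 -> binary32 conversion (exact for every half value).
inline float half_to_float(uint16_t h) {
    uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (mant << 13);          // Inf / NaN (payload kept)
    } else if (exp != 0) {
        bits = sign | ((exp + 112u) << 23) | (mant << 13); // normal: rebias 15 -> 127
    } else if (mant == 0) {
        bits = sign;                                       // signed zero
    } else {
        exp = 113u;                                        // subnormal: normalize mantissa
        while ((mant & 0x400u) == 0) { mant <<= 1; --exp; }
        bits = sign | (exp << 23) | ((mant & 0x3FFu) << 13);
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Scalar binary32 -> binary16 conversion, round-to-nearest-even
// (adapted from Fabian Giesen's public float_to_half_fast3_rtne routine).
inline uint16_t float_to_half(float f) {
    const uint32_t f32_inf = 255u << 23;
    const uint32_t f16_max = (127u + 16u) << 23;                          // 65536.0f
    const uint32_t denorm_magic = ((127u - 15u) + (23u - 10u) + 1u) << 23; // 0.5f
    uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    const uint32_t sign = u & 0x80000000u;
    u ^= sign;
    uint16_t out;
    if (u >= f16_max) {
        out = static_cast<uint16_t>(u > f32_inf ? 0x7E00u : 0x7C00u);    // NaN or Inf
    } else if (u < (113u << 23)) {
        // Result is subnormal or zero: adding 0.5f lets the FPU do the rounding.
        float tmp, magic;
        std::memcpy(&tmp, &u, sizeof tmp);
        std::memcpy(&magic, &denorm_magic, sizeof magic);
        tmp += magic;
        std::memcpy(&u, &tmp, sizeof u);
        out = static_cast<uint16_t>(u - denorm_magic);
    } else {
        const uint32_t mant_odd = (u >> 13) & 1u;
        u += (static_cast<uint32_t>(15 - 127) << 23) + 0xFFFu;            // rebias and round
        u += mant_odd;                                                    // ties to even
        out = static_cast<uint16_t>(u >> 13);
    }
    return static_cast<uint16_t>(out | (sign >> 16));
}

// Hypothetical F16 blend: dst = (src1 - src2) * alpha + src2, computed in F32.
void blend_f16(const uint16_t* src1, const uint16_t* src2, uint16_t* dst,
               std::size_t n, float alpha) {
    assert(src1 != nullptr && src2 != nullptr && dst != nullptr);
    std::size_t i = 0;
#if defined(__AVX2__) && defined(__F16C__)
    const __m256 pAlpha = _mm256_set1_ps(alpha);
    for (; i + 8 <= n; i += 8) {
        // One 128-bit load brings in 8 halves; vcvtph2ps widens them to 8 floats.
        __m256 p1 = _mm256_cvtph_ps(_mm_loadu_si128(reinterpret_cast<const __m128i*>(src1 + i)));
        __m256 p2 = _mm256_cvtph_ps(_mm_loadu_si128(reinterpret_cast<const __m128i*>(src2 + i)));
        __m256 r = _mm256_add_ps(_mm256_mul_ps(_mm256_sub_ps(p1, p2), pAlpha), p2);
        // vcvtps2ph narrows 8 floats back to 8 halves (round to nearest even).
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i),
                         _mm256_cvtps_ph(r, _MM_FROUND_TO_NEAREST_INT));
    }
#endif
    // Scalar path: the whole buffer without AVX2/F16C, otherwise only the tail.
    for (; i < n; ++i) {
        const float a = half_to_float(src1[i]);
        const float b = half_to_float(src2[i]);
        dst[i] = float_to_half((a - b) * alpha + b);
    }
}
```

Built without `-mavx2 -mf16c` (or an equivalent `-march`), only the scalar loop runs, so compiling twice is an easy way to check that both paths agree.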
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/modules/tensor/cpu/kernel/flip.cpp | Updates flip kernel to use AVX2 F16 load/store functions, removes temporary F32 buffers, adjusts flip factor calculation for RGB channels |
| src/modules/tensor/cpu/kernel/crop_mirror_normalize.cpp | Replaces scalar F16 conversions with AVX2 intrinsics, improves conditional structure with else if |
| src/modules/tensor/cpu/kernel/color_cast.cpp | Adds AVX2 code paths guarded by preprocessor directives, updates to use F16 load/store functions |
| src/modules/tensor/cpu/kernel/blend.cpp | Converts to AVX2 F16 operations, adds compile-time AVX2 feature detection |
| src/include/common/cpu/rpp_cpu_simd_load_store.hpp | Adds new F16 mirror load functions for AVX2 (pkd3 and pln3 variants) |
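The mirror-load helpers can be pictured with a simplified single-channel sketch. It assumes a mirror load reads 8 consecutive halves, widens them to F32, and reverses the lane order, so a horizontal flip can walk the source row backwards while writing forwards; on AVX2 the reversal is a single `_mm256_permutevar8x32_ps` with indices 7 down to 0. `load8_f16_mirror` and `flip_row_f16_to_f32` are hypothetical names, and the real pkd3/pln3 variants also have to handle channel interleaving, which this omits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__AVX2__) && defined(__F16C__)
#include <immintrin.h>
#endif

// Scalar binary16 -> binary32 conversion (fallback path and tail).
inline float half_to_float(uint16_t h) {
    uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (mant << 13);          // Inf / NaN
    } else if (exp != 0) {
        bits = sign | ((exp + 112u) << 23) | (mant << 13); // normal
    } else if (mant == 0) {
        bits = sign;                                       // signed zero
    } else {
        exp = 113u;                                        // subnormal
        while ((mant & 0x400u) == 0) { mant <<= 1; --exp; }
        bits = sign | (exp << 23) | ((mant & 0x3FFu) << 13);
    }
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Hypothetical mirror load: out[k] = src[7 - k] for k = 0..7, widened to F32.
inline void load8_f16_mirror(const uint16_t* src, float out[8]) {
#if defined(__AVX2__) && defined(__F16C__)
    const __m256 p = _mm256_cvtph_ps(_mm_loadu_si128(reinterpret_cast<const __m128i*>(src)));
    const __m256i reverse = _mm256_setr_epi32(7, 6, 5, 4, 3, 2, 1, 0);
    _mm256_storeu_ps(out, _mm256_permutevar8x32_ps(p, reverse));
#else
    for (int k = 0; k < 8; ++k) out[k] = half_to_float(src[7 - k]);
#endif
}

// Horizontal flip of one F16 row into F32: dst[i] = src[n - 1 - i].
void flip_row_f16_to_f32(const uint16_t* src, float* dst, std::size_t n) {
    assert(n == 0 || (src != nullptr && dst != nullptr));
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)
        load8_f16_mirror(src + (n - i - 8), dst + i); // next 8 source values from the end, reversed
    for (; i < n; ++i)
        dst[i] = half_to_float(src[n - 1 - i]);       // scalar tail
}
```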
_MM_TRANSPOSE4_PS(p128[4], p128[5], p128[6], p128[7]); /* Transpose the 4x4 matrix and forms [[R05 R06 R07 R08][B05 B06 B07 B08][G05 G06 G07 G08][R06 R07 R08 R09]] */
p[0] = _mm256_setr_m128(p128[0], p128[4]); /* packs as R01-R08 */
p[1] = _mm256_setr_m128(p128[1], p128[5]); /* packs as G01-G08 */
p[2] = _mm256_setr_m128(p128[2], p128[6]); /* packs as B01-R08 */
The comment incorrectly states "B01-R08" when it should be "B01-B08" to match the pattern of the other channels and correctly describe what is being packed.
Suggested change:
- p[2] = _mm256_setr_m128(p128[2], p128[6]); /* packs as B01-R08 */
+ p[2] = _mm256_setr_m128(p128[2], p128[6]); /* packs as B01-B08 */
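To make the channel order under discussion concrete, here is a minimal SSE sketch of the transpose step for 8 packed RGB pixels, with a scalar fallback that doubles as a reference. It assumes the four 4-float rows are loaded at float offsets 0, 3, 6, and 9, so each row starts with R and the fourth transposed row holds the R values shifted by one pixel, consistent with the `[R06 R07 R08 R09]` row in the snippet. Under that assumption the third row of each transpose holds B, which is why `p[2]` packs B01-B08. `deinterleave8_pkd3_f32` is a hypothetical name and works on F32 data (after F16 widening) for simplicity.

```cpp
#include <cassert>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Hypothetical helper: split 8 packed RGB pixels (R G B R G B ...) into planar R, G, B.
// The SSE path reads one float past the 24 it uses (the R of a 9th pixel),
// so src must have at least 25 readable floats.
void deinterleave8_pkd3_f32(const float* src, float r[8], float g[8], float b[8]) {
    assert(src != nullptr && r != nullptr && g != nullptr && b != nullptr);
#if defined(__SSE__)
    for (int half = 0; half < 2; ++half) {
        const float* s = src + 12 * half;  // 4 pixels = 12 floats; pixel indices below are relative to s
        __m128 row0 = _mm_loadu_ps(s);     // [R1 G1 B1 R2]
        __m128 row1 = _mm_loadu_ps(s + 3); // [R2 G2 B2 R3]
        __m128 row2 = _mm_loadu_ps(s + 6); // [R3 G3 B3 R4]
        __m128 row3 = _mm_loadu_ps(s + 9); // [R4 G4 B4 R5]
        _MM_TRANSPOSE4_PS(row0, row1, row2, row3);
        // Now row0 = [R1 R2 R3 R4], row1 = [G1..G4], row2 = [B1..B4], row3 = [R2..R5] (unused).
        _mm_storeu_ps(r + 4 * half, row0);
        _mm_storeu_ps(g + 4 * half, row1);
        _mm_storeu_ps(b + 4 * half, row2);
    }
#else
    for (int k = 0; k < 8; ++k) {
        r[k] = src[3 * k];
        g[k] = src[3 * k + 1];
        b[k] = src[3 * k + 2];
    }
#endif
}
```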
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #637      +/-   ##
===========================================
+ Coverage    88.24%   88.28%   +0.04%
===========================================
  Files          195      195
  Lines        82712    82619      -93
===========================================
- Hits         72985    72934      -51
+ Misses        9727     9685      -42
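The headline percentages follow from the table as hits divided by tracked lines (hits plus misses), assuming no partially covered lines, since the report lists none:

```latex
\text{coverage} = \frac{\text{hits}}{\text{hits} + \text{misses}}, \qquad
\text{develop: } \frac{72985}{72985 + 9727} = \frac{72985}{82712} \approx 88.24\%, \qquad
\text{\#637: } \frac{72934}{72934 + 9685} = \frac{72934}{82619} \approx 88.28\%
```

The difference is about +0.04 percentage points, matching the Coverage row.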
Fix comment in common function
@rrawther @LakshmiKumar23 The CI failure is due only to the (-0.09%) reduction in coverage.
* Updates for crop mirror normalize
* Updated flip F16 rawC and load store modifications
* Updated blend with AVX support for F16 bitdepth
* Updated color cast with AVX support for F16 bitdepth
* Remove empty lines
* Update comments
* Fix comment in common function

---------

Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>
Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
* F16 variants - Update loads and stores to AVX2 - Group 4 (#627)
  * Make changes for exposure, log and spatter
  * Updates for crop mirror normalize
  * Fix memory issues with log 1D
  * Remove changes for crop mirror normalize and restore rpp_cpu_simd_load_store.hpp
  * Update the alignedLength for log
  Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>
  Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
  Co-authored-by: Lakshmi Kumar <lakshmi.kumar@amd.com>
* Package - Enable Lintian Support rpp (#633)
  * fix lintian errors
  * fix lintian overrides static error
  * lintian errors fixed
  * move lintian overrides into if deb check
  * use existing changelog. fix formatting
  * not installing lintian overrides. keeping original changelog name
  * remove overrides
  Co-authored-by: Lakshmi Kumar <lakshmi.kumar@amd.com>
  Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
* Docs - Bump rocm-docs-core[api_reference] from 1.27.0 to 1.29.0 in /docs/sphinx (#638)
  Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.27.0 to 1.29.0.
  - [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
  - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
  - [Commits](ROCm/rocm-docs-core@v1.27.0...v1.29.0)
  updated-dependencies: dependency-name: rocm-docs-core[api_reference], dependency-version: 1.29.0, dependency-type: direct:production, update-type: version-update:semver-minor
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
  Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
* Test suite - Add QA pass/fail tests for F32 bit depth (#631)
  * Added golden outputs and resolved HOST backend
  * Updated bin files for median filter and resize crop mirror
  * Fix for median filter F32 QA
  * Updated bin files
  * Updated rcm review comments
  * Updated comments for rmn
  * Modified bitdepths and resolved review comments
  * Fix typo
  * resolve review comments
  Co-authored-by: sampath117 <snehaa@multicorewareinc.com>
  Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>
  Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
  Co-authored-by: Lakshmi Kumar <lakshmi.kumar@amd.com>
* Test Suite - Error Code Capture for all tests (#635)
  * Updates to capture error code
  * Intialize RPP_SUCCESS as default value
  * Update the code to display error status as part of the C++ code execution
  * Update rpp_test_suite_common.h
  * Update utilities/test_suite/HIP/Tensor_audio_hip.cpp (Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>)
  * Update utilities/test_suite/HIP/Tensor_image_hip.cpp (Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>)
  * Update utilities/test_suite/HIP/Tensor_misc_hip.cpp (Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>)
  * Update utilities/test_suite/HIP/Tensor_voxel_hip.cpp (Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>)
  * Update utilities/test_suite/HOST/Tensor_audio_host.cpp (Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>)
  * Update utilities/test_suite/HOST/Tensor_image_host.cpp (Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>)
  * Update utilities/test_suite/HOST/Tensor_misc_host.cpp (Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>)
  * Update utilities/test_suite/HOST/Tensor_voxel_host.cpp (Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>)
  * Fixes for CI issues
  * Restore naming convention in voxel test suite
  * Fix compilation issues
  * Update the code to use func for funcName
  * Modify error message
  * Modify the print statements
  Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
  Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
* F16 variants - Update loads and stores to AVX2 - Group 5 (#637)
  * Updates for crop mirror normalize
  * Updated flip F16 rawC and load store modifications
  * Updated blend with AVX support for F16 bitdepth
  * Updated color cast with AVX support for F16 bitdepth
  * Remove empty lines
  * Update comments
  * Fix comment in common function
  Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>
  Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
* Docs - Bump rocm-docs-core[api_reference] from 1.29.0 to 1.30.0 in /docs/sphinx (#640)
  Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.29.0 to 1.30.0.
  - [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
  - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
  - [Commits](ROCm/rocm-docs-core@v1.29.0...v1.30.0)
  updated-dependencies: dependency-name: rocm-docs-core[api_reference], dependency-version: 1.30.0, dependency-type: direct:production, update-type: version-update:semver-minor
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* HOST and HIP - pinned buffers for respective API (#628)
  * Removed memcpy and used hipHostMalloc for allocation : blend
  * Removed memcpy and used hipHostMalloc for allocation : brightness
  * Removed memcpy and used hipHostMalloc for allocation : color cast
  * Removed memcpy and used hipHostMalloc for allocation : color twist
  * Removed memcpy and used hipHostMalloc for allocation : contrast
  * Removed memcpy and used hipHostMalloc for allocation : crop mirror normalize
  * Removed memcpy and used hipHostMalloc for allocation : Exposure
  * Removed memcpy and used hipHostMalloc for allocation : Gamma correction
  * Removed memcpy and used hipHostMalloc for allocation : gaussian filter
  * Removed memcpy and used hipHostMalloc for allocation : Noise
  * Removed memcpy and used hipHostMalloc for allocation : Non linear blend
  * Removed memcpy and used hipHostMalloc for allocation : Resize mirror normalize
  * Removed memcpy and used hipHostMalloc for allocation : Water
  * Added hipHostFree for all kernels in test suite
  * Added hipHostFree for all kernels in test suite
  * Removed memcpy and used hipHostMalloc for allocation : Flip, spatter, rcm, color temperature
  * Resolved copilot review comments
  * Updated version
  * Removed unused parameter
  * Updated version in cmakeList
  * removed the host to device mem copies for warp affine and rotate
  * Updated version
  * Removed comment
  * Updated Chnagelog file
  * Update patch version from 2.2.0 to 2.2.1
  * Update CHANGELOG
  * Address copilot comments for HIP HOST consistent allocation
  * Documentation changes for updated memcpy changes
  * Update ricap outer API to use pinned memory and remove mem copy
  * Fix memory allocation and deallocation for permutationTensor
  * Update api/rppt_tensor_effects_augmentations.h (Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>)
  * Fix spelling of noiseProbability and saltProbability
  * Fix deallocation
  Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>
  Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>
  Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>
  Co-authored-by: hmaddise <HazarathKumar.Maddisetty@amd.com>
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Docs - Bump rocm-docs-core[api_reference] from 1.30.0 to 1.30.1 in /docs/sphinx (#643)
  Bumps [rocm-docs-core[api_reference]](https://github.com/ROCm/rocm-docs-core) from 1.30.0 to 1.30.1.
  - [Release notes](https://github.com/ROCm/rocm-docs-core/releases)
  - [Changelog](https://github.com/ROCm/rocm-docs-core/blob/develop/CHANGELOG.md)
  - [Commits](ROCm/rocm-docs-core@v1.30.0...v1.30.1)
  updated-dependencies: dependency-name: rocm-docs-core[api_reference], dependency-version: 1.30.1, dependency-type: direct:production, update-type: version-update:semver-patch
  Signed-off-by: dependabot[bot] <support@github.com>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* CMakelists - Add optional GPU targets (#641)
  * add optional gpu targets
  * add addiitonal gpu targets
* Rename function - hip_exec_roi_converison_ltrb_to_xywh to hip_exec_roi_conversion_ltrb_to_xywh (#645)
  Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>
* Docs - Update CHANGELOG.md (#646)
  Updates

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Abishek <52214183+r-abishek@users.noreply.github.com>
Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>
Co-authored-by: Lakshmi Kumar <lakshmi.kumar@amd.com>
Co-authored-by: jonatluu <jonatluu@amd.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: sampath117 <snehaa@multicorewareinc.com>
Co-authored-by: HazarathKumarM <hazarathkumar@multicorewareinc.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: hmaddise <HazarathKumar.Maddisetty@amd.com>
F16 load/store updates for blend, color_cast, flip, and crop_mirror_normalize.