F16 variants - Update loads and stores to AVX2 - Group 3 #579
Merged
kiritigowda merged 10 commits into ROCm:develop on Jul 25, 2025
Member
r-abishek
commented
Jul 9, 2025
- Replaces scalar load/store and FP32 conversion with AVX2 intrinsics; no additions to or removals from the external user API.
- 4-12% performance improvement for the updated kernels at FP16 bit depth.
- F16 load/store updates for vignette, magnitude, contrast, and brightness.
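For readers unfamiliar with the pattern described above, here is a minimal sketch of an FP16 load, FP32 compute, FP16 store loop using the standard F16C/AVX intrinsics. It is not the RPP implementation: the function name `scale_f16_avx2`, the `gain` parameter, and the raw `uint16_t` storage for FP16 values are assumptions made for illustration, and the code assumes an x86 target built with GCC or Clang.

```cpp
#include <immintrin.h>   // AVX / F16C intrinsics (x86 only)
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not an RPP function: multiplies n FP16 values by `gain`.
// FP16 values are held as raw 16-bit patterns.
__attribute__((target("avx2,f16c")))
void scale_f16_avx2(const std::uint16_t* src, std::uint16_t* dst, std::size_t n, float gain)
{
    const __m256 pGain = _mm256_set1_ps(gain);
    std::size_t i = 0;

    // Main loop: 8 FP16 values per iteration.
    for (; i + 8 <= n; i += 8)
    {
        __m128i h = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + i)); // load 8 x FP16
        __m256 f = _mm256_cvtph_ps(h);                                          // FP16 -> FP32
        f = _mm256_mul_ps(f, pGain);                                            // compute in FP32
        __m128i r = _mm256_cvtps_ph(f, _MM_FROUND_TO_NEAREST_INT);              // FP32 -> FP16
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + i), r);               // store 8 x FP16
    }

    // Scalar tail for lengths that are not a multiple of 8.
    for (; i < n; ++i)
    {
        float f = _cvtsh_ss(src[i]) * gain;
        dst[i] = _cvtss_sh(f, _MM_FROUND_TO_NEAREST_INT);
    }
}
```

Callers would typically guard the call with a runtime check such as `__builtin_cpu_supports("avx2")`; in practice, CPUs that report AVX2 also provide F16C, though that pairing is an assumption about real hardware rather than an architectural guarantee.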

…nels F16 load/store updates for 4 kernels
Contributor
Pull Request Overview
This PR enhances performance of FP16 processing kernels by replacing scalar conversions with AVX2 intrinsics for loads and stores, and adds boundary checks in brightness routines without altering the external API.
- Swapped out scalar Rpp32f conversion loops for direct FP16-to-FP32 and FP32-to-FP16 AVX2 intrinsics in vignette, magnitude, contrast, and brightness kernels.
- Inserted `rpp_pixel_check_0to1` boundary checks for brightness in both f32 and f16 paths.
- No changes to public API; purely internal performance optimizations.
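As an illustration of what a 0-to-1 boundary check does in a brightness path, the sketch below clamps each result to the normalized range. The names `pixel_check_0to1` and `brightness_f32` are hypothetical stand-ins, and both the clamp semantics and the `src * alpha + beta` formula are assumptions; the actual `rpp_pixel_check_0to1` definition is not shown in this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a 0-to-1 pixel check: clamps a normalized value to [0, 1].
inline float pixel_check_0to1(float v)
{
    return std::min(std::max(v, 0.0f), 1.0f);
}

// Hypothetical scalar brightness kernel (assumed formula): dst = clamp(src * alpha + beta).
void brightness_f32(const float* src, float* dst, std::size_t n, float alpha, float beta)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = pixel_check_0to1(src[i] * alpha + beta);
}
```

Without the clamp, a normalized input such as 0.9 with `alpha = 2` would produce 1.8, which falls outside the valid range for f32 and f16 image data.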
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/modules/tensor/cpu/kernel/vignette.cpp | Replaced scalar loops with rpp_simd_load/store FP16 intrinsics |
| src/modules/tensor/cpu/kernel/magnitude.cpp | Updated magnitude kernels to use FP16 AVX2 load/store intrinsics |
| src/modules/tensor/cpu/kernel/contrast.cpp | Swapped in FP16 intrinsics for contrast kernels |
| src/modules/tensor/cpu/kernel/brightness.cpp | Applied FP16 intrinsics and added boundary checks to brightness kernels |
Comments suppressed due to low confidence (1)
src/modules/tensor/cpu/kernel/vignette.cpp:1076
- [nitpick] New AVX2 intrinsics for FP16 load/store have been introduced here; consider adding or updating unit tests to cover both aligned and unaligned lengths and verify correctness of the FP16 paths.
rpp_simd_load(rpp_load24_f16pkd3_to_f32pln3_avx, srcPtrTemp, p); // simd loads
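A test of the kind suggested in this comment could compare a blocked kernel against a scalar reference across lengths that are and are not multiples of the vector width. The sketch below is a portable stand-in under stated assumptions: `add_blocked` only mimics the main-loop-plus-tail structure of a SIMD kernel, and every name in it is hypothetical rather than part of RPP or its test suite.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scalar reference: dst = src + beta.
void add_ref(const float* src, float* dst, std::size_t n, float beta)
{
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i] + beta;
}

// Hypothetical blocked kernel: 8 elements per step, then a scalar tail.
// The inner loop stands in for a single 8-wide SIMD operation.
void add_blocked(const float* src, float* dst, std::size_t n, float beta)
{
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)
        for (std::size_t k = 0; k < 8; ++k)
            dst[i + k] = src[i + k] + beta;
    for (; i < n; ++i)
        dst[i] = src[i] + beta;
}

// Returns true when the blocked kernel matches the reference for length n.
bool matches_reference(std::size_t n)
{
    std::vector<float> src(n), expected(n), actual(n);
    for (std::size_t i = 0; i < n; ++i)
        src[i] = static_cast<float>(i) * 0.125f;
    add_ref(src.data(), expected.data(), n, 0.5f);
    add_blocked(src.data(), actual.data(), n, 0.5f);
    return expected == actual;
}
```

Calling `matches_reference` for lengths such as 0, 7, 8, 9, 24, and 25 exercises the empty case, the tail-only case, exact multiples of the block width, and mixed cases.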
F16 Review Comments - Update the brightness pixel check - SSE Code
Codecov Report
✅ All modified and coverable lines are covered by tests.

| Metric | develop | #579 | +/- |
|---|---|---|---|
| Coverage | 87.97% | 87.95% | -0.02% |
| Files | 190 | 190 | |
| Lines | 80802 | 80675 | -127 |
| Hits | 71080 | 70951 | -129 |
| Misses | 9722 | 9724 | +2 |
rrawther
approved these changes
Jul 23, 2025
ManasaDattaT
pushed a commit
to ManasaDattaT/rpp
that referenced
this pull request
Dec 19, 2025
* Updates for 4 kernels
* Update the brightness pixel check

Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>
Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>