
F16 variants - Update loads and stores to AVX2 - Group 3#579

Merged
kiritigowda merged 10 commits into ROCm:develop from r-abishek:ar/opt_f16_loads_stores_3
Jul 25, 2025
Conversation

@r-abishek
Member

  • Replaces scalar load/store and FP32 conversion code with AVX2 intrinsics; no additions or removals to the external user API.
  • 4-12% performance improvement in the updated kernels for the FP16 bit depth.
  • F16 load/store updates for vignette, magnitude, contrast, and brightness.

Contributor

Copilot AI left a comment


Pull Request Overview

This PR enhances performance of FP16 processing kernels by replacing scalar conversions with AVX2 intrinsics for loads and stores, and adds boundary checks in brightness routines without altering the external API.

  • Swapped out scalar Rpp32f conversion loops for direct FP16-to-FP32 and FP32-to-FP16 AVX2 intrinsics in vignette, magnitude, contrast, and brightness kernels.
  • Inserted rpp_pixel_check_0to1 boundary checks for brightness in both f32 and f16 paths.
  • No changes to public API; purely internal performance optimizations.
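The kind of change described above can be sketched as follows. This is an assumed, minimal 8-lane version, not RPP's actual helpers (the real helpers, e.g. `rpp_load24_f16pkd3_to_f32pln3_avx`, handle 24-element packed/planar layouts); the function name and layout here are illustrative only.

```cpp
#include <immintrin.h>
#include <cstdint>

// Sketch (hypothetical): widen 8 packed FP16 pixels to FP32 with
// F16C/AVX2, apply a brightness-style multiply-add, and narrow back
// to FP16 — replacing a scalar per-pixel conversion loop.
__attribute__((target("avx2,fma,f16c")))
void brightness_f16_sketch(const uint16_t *src, uint16_t *dst,
                           float alpha, float beta)
{
    // Load 8 half-precision values and convert to 8 single-precision floats
    __m128i pH = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src));
    __m256 p = _mm256_cvtph_ps(pH);

    // brightness: pixel * alpha + beta, as one fused multiply-add
    p = _mm256_fmadd_ps(p, _mm256_set1_ps(alpha), _mm256_set1_ps(beta));

    // Convert back to FP16 (round to nearest even) and store
    __m128i out = _mm256_cvtps_ph(p, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
    _mm_storeu_si128(reinterpret_cast<__m128i *>(dst), out);
}
```

The `target` attribute enables AVX2/FMA/F16C for this one function, so no special compiler flags are needed; the speedup reported in the PR comes from replacing per-element scalar conversions with these one-instruction-per-8-pixels conversions.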

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File | Description
src/modules/tensor/cpu/kernel/vignette.cpp | Replaced scalar loops with rpp_simd_load/store FP16 intrinsics
src/modules/tensor/cpu/kernel/magnitude.cpp | Updated magnitude kernels to use FP16 AVX2 load/store intrinsics
src/modules/tensor/cpu/kernel/contrast.cpp | Swapped in FP16 intrinsics for contrast kernels
src/modules/tensor/cpu/kernel/brightness.cpp | Applied FP16 intrinsics and added boundary checks to brightness kernels
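The boundary check added to the brightness kernels (`rpp_pixel_check_0to1`) presumably clamps normalized pixel values into [0, 1]. A minimal sketch of such a clamp with AVX min/max, assuming that semantics (the actual RPP helper may differ):

```cpp
#include <immintrin.h>

// Illustrative 0-to-1 pixel clamp for 8 floats; the helper name is
// hypothetical, not RPP's implementation.
__attribute__((target("avx")))
void clamp_0to1_sketch(const float *src, float *dst)
{
    __m256 p = _mm256_loadu_ps(src);
    p = _mm256_max_ps(p, _mm256_setzero_ps());   // floor at 0.0
    p = _mm256_min_ps(p, _mm256_set1_ps(1.0f));  // cap at 1.0
    _mm256_storeu_ps(dst, p);
}
```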
Comments suppressed due to low confidence (1)

src/modules/tensor/cpu/kernel/vignette.cpp:1076

  • [nitpick] New AVX2 intrinsics for FP16 load/store have been introduced here; consider adding or updating unit tests to cover both aligned and unaligned lengths and verify correctness of the FP16 paths.
                    rpp_simd_load(rpp_load24_f16pkd3_to_f32pln3_avx, srcPtrTemp, p);                                     // simd loads
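The coverage the review comment asks for boils down to exercising both the vectorized body and the scalar tail, i.e. lengths that are and are not multiples of the vector width. A hypothetical sketch of that loop structure (not RPP's actual kernel):

```cpp
#include <immintrin.h>
#include <cstddef>

// Sketch: 8 floats per AVX iteration, scalar fallback for the tail,
// so lengths that are not multiples of 8 are still handled correctly.
__attribute__((target("avx")))
void scale_sketch(const float *src, float *dst, size_t n, float scale)
{
    size_t i = 0;
    __m256 s = _mm256_set1_ps(scale);
    for (; i + 8 <= n; i += 8)   // vectorized body
        _mm256_storeu_ps(dst + i, _mm256_mul_ps(_mm256_loadu_ps(src + i), s));
    for (; i < n; ++i)           // scalar tail
        dst[i] = src[i] * scale;
}
```

A unit test along these lines would run the kernel at several lengths (e.g. 8, 16, 11) and compare against a scalar reference for each element.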

@kiritigowda kiritigowda self-assigned this Jul 21, 2025
@codecov

codecov bot commented Jul 22, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #579      +/-   ##
===========================================
- Coverage    87.97%   87.95%   -0.02%     
===========================================
  Files          190      190              
  Lines        80802    80675     -127     
===========================================
- Hits         71080    70951     -129     
- Misses        9722     9724       +2     
Files with missing lines Coverage Δ
src/modules/tensor/cpu/kernel/brightness.cpp 94.07% <100.00%> (-0.24%) ⬇️
src/modules/tensor/cpu/kernel/contrast.cpp 100.00% <100.00%> (ø)
src/modules/tensor/cpu/kernel/magnitude.cpp 100.00% <100.00%> (ø)
src/modules/tensor/cpu/kernel/vignette.cpp 100.00% <100.00%> (ø)

... and 2 files with indirect coverage changes


@kiritigowda kiritigowda merged commit efa5f69 into ROCm:develop Jul 25, 2025
22 checks passed
ManasaDattaT pushed a commit to ManasaDattaT/rpp that referenced this pull request Dec 19, 2025
* Updates for 4 kernels

* Update the brightness pixel check

---------

Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>
Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>


5 participants