
F16 variants - Update loads and stores to AVX2 - Group 3#579

Merged
kiritigowda merged 10 commits into ROCm:develop from r-abishek:ar/opt_f16_loads_stores_3
Jul 25, 2025
Conversation

@r-abishek
Member

  • Replaces scalar load/store and FP32 conversion code with AVX2 intrinsics; no additions or removals to the external user API.
  • 4-12% performance improvement in the updated kernels for the FP16 bit depth.
  • F16 load/store updates for vignette, magnitude, contrast, and brightness.

Contributor

Copilot AI left a comment


Pull Request Overview

This PR enhances performance of FP16 processing kernels by replacing scalar conversions with AVX2 intrinsics for loads and stores, and adds boundary checks in brightness routines without altering the external API.

  • Swapped out scalar Rpp32f conversion loops for direct FP16-to-FP32 and FP32-to-FP16 AVX2 intrinsics in vignette, magnitude, contrast, and brightness kernels.
  • Inserted rpp_pixel_check_0to1 boundary checks for brightness in both f32 and f16 paths.
  • No changes to public API; purely internal performance optimizations.
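The kind of change described above can be sketched as follows. This is an assumed, minimal 8-lane version, not RPP's actual helpers (the real helpers, e.g. `rpp_load24_f16pkd3_to_f32pln3_avx`, handle 24-element packed/planar layouts); the function name and layout here are illustrative only.

```cpp
#include <immintrin.h>
#include <cstdint>

// Sketch (hypothetical): widen 8 packed FP16 pixels to FP32 with
// F16C/AVX2, apply a brightness-style multiply-add, and narrow back
// to FP16 — replacing a scalar per-pixel conversion loop.
__attribute__((target("avx2,fma,f16c")))
void brightness_f16_sketch(const uint16_t *src, uint16_t *dst,
                           float alpha, float beta)
{
    // Load 8 half-precision values and convert to 8 single-precision floats
    __m128i pH = _mm_loadu_si128(reinterpret_cast<const __m128i *>(src));
    __m256 p = _mm256_cvtph_ps(pH);

    // brightness: pixel * alpha + beta, as one fused multiply-add
    p = _mm256_fmadd_ps(p, _mm256_set1_ps(alpha), _mm256_set1_ps(beta));

    // Convert back to FP16 (round to nearest even) and store
    __m128i out = _mm256_cvtps_ph(p, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
    _mm_storeu_si128(reinterpret_cast<__m128i *>(dst), out);
}
```

The `target` attribute enables AVX2/FMA/F16C for this one function, so no special compiler flags are needed; the speedup reported in the PR comes from replacing per-element scalar conversions with these one-instruction-per-8-pixels conversions.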

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File | Description
src/modules/tensor/cpu/kernel/vignette.cpp | Replaced scalar loops with rpp_simd_load/store FP16 intrinsics
src/modules/tensor/cpu/kernel/magnitude.cpp | Updated magnitude kernels to use FP16 AVX2 load/store intrinsics
src/modules/tensor/cpu/kernel/contrast.cpp | Swapped in FP16 intrinsics for contrast kernels
src/modules/tensor/cpu/kernel/brightness.cpp | Applied FP16 intrinsics and added boundary checks to brightness kernels
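The boundary check added to the brightness kernels (`rpp_pixel_check_0to1`) presumably clamps normalized pixel values into [0, 1]. A minimal sketch of such a clamp with AVX min/max, assuming that semantics (the actual RPP helper may differ):

```cpp
#include <immintrin.h>

// Illustrative 0-to-1 pixel clamp for 8 floats; the helper name is
// hypothetical, not RPP's implementation.
__attribute__((target("avx")))
void clamp_0to1_sketch(const float *src, float *dst)
{
    __m256 p = _mm256_loadu_ps(src);
    p = _mm256_max_ps(p, _mm256_setzero_ps());   // floor at 0.0
    p = _mm256_min_ps(p, _mm256_set1_ps(1.0f));  // cap at 1.0
    _mm256_storeu_ps(dst, p);
}
```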
Comments suppressed due to low confidence (1)

src/modules/tensor/cpu/kernel/vignette.cpp:1076

  • [nitpick] New AVX2 intrinsics for FP16 load/store have been introduced here; consider adding or updating unit tests to cover both aligned and unaligned lengths and verify correctness of the FP16 paths.
                    rpp_simd_load(rpp_load24_f16pkd3_to_f32pln3_avx, srcPtrTemp, p);                                     // simd loads
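The coverage the review comment asks for boils down to exercising both the vectorized body and the scalar tail, i.e. lengths that are and are not multiples of the vector width. A hypothetical sketch of that loop structure (not RPP's actual kernel):

```cpp
#include <immintrin.h>
#include <cstddef>

// Sketch: 8 floats per AVX iteration, scalar fallback for the tail,
// so lengths that are not multiples of 8 are still handled correctly.
__attribute__((target("avx")))
void scale_sketch(const float *src, float *dst, size_t n, float scale)
{
    size_t i = 0;
    __m256 s = _mm256_set1_ps(scale);
    for (; i + 8 <= n; i += 8)   // vectorized body
        _mm256_storeu_ps(dst + i, _mm256_mul_ps(_mm256_loadu_ps(src + i), s));
    for (; i < n; ++i)           // scalar tail
        dst[i] = src[i] * scale;
}
```

A unit test along these lines would run the kernel at several lengths (e.g. 8, 16, 11) and compare against a scalar reference for each element.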

@kiritigowda kiritigowda self-assigned this Jul 21, 2025
@codecov

codecov bot commented Jul 22, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #579      +/-   ##
===========================================
- Coverage    87.97%   87.95%   -0.02%     
===========================================
  Files          190      190              
  Lines        80802    80675     -127     
===========================================
- Hits         71080    70951     -129     
- Misses        9722     9724       +2     
Files with missing lines Coverage Δ
src/modules/tensor/cpu/kernel/brightness.cpp 94.07% <100.00%> (-0.24%) ⬇️
src/modules/tensor/cpu/kernel/contrast.cpp 100.00% <100.00%> (ø)
src/modules/tensor/cpu/kernel/magnitude.cpp 100.00% <100.00%> (ø)
src/modules/tensor/cpu/kernel/vignette.cpp 100.00% <100.00%> (ø)

... and 2 files with indirect coverage changes


@kiritigowda kiritigowda merged commit efa5f69 into ROCm:develop Jul 25, 2025
22 checks passed
ManasaDattaT pushed a commit to ManasaDattaT/rpp that referenced this pull request Dec 19, 2025
* Updates for 4 kernels

* Update the brightness pixel check

---------

Co-authored-by: Srihari-mcw <srihari@multicorewareinc.com>
Co-authored-by: Kiriti Gowda <kiritigowda@gmail.com>


5 participants