Crop Mirror Normalize - HOST Tensor AVX2 Support and Vectorized HIP support for U8 - F32 / F16 #92
Merged
r-abishek merged 18 commits into r-abishek:ar/opt_cmn_u8_f from Sep 20, 2022
…v support for crop mirror normalize
r-abishek (Owner) requested changes on Aug 23, 2022 and left a comment:
Please go over these changes.
src/include/cpu/rpp_cpu_simd.hpp
Outdated
inline void rpp_store48_f32pln3_to_f32pkd3_avx(Rpp32f *dstPtr, __m256 *p)
{
    __m128 p128[8];
Owner
You aren't using all 8 of these registers below.
    _mm256_storeu_ps(dstPtrB + 8, p[5]);
}

inline void rpp_store48_f32pln3_to_f32pkd3_avx(Rpp32f *dstPtr, __m256 *p)
Owner
Also, could you just call rpp_store24_f32pln3_to_f32pkd3_avx() twice to process all 48 values?
Author
We cannot use it because the destination R, G, B AVX registers are not stored at contiguous indices.
In p[6]:
p[0], p[2], p[4] - R, G, B registers for locations 0 - 7
p[1], p[3], p[5] - R, G, B registers for locations 8 - 15
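The register layout described in the reply above can be sketched in scalar C++. This is only an illustration, not the RPP implementation: `Lane8` is a hypothetical stand-in for one `__m256` (8 floats), and `store24_pln3_to_pkd3` / `store48_pln3_to_pkd3` are hypothetical names. It assumes a 24-value helper would have to be handed the R, G, B registers explicitly, since the three registers for one group of 8 pixels sit at strided indices (0, 2, 4 and 1, 3, 5) rather than at three consecutive ones.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for one __m256 register holding 8 floats.
using Lane8 = std::array<float, 8>;

// Scalar model of a 24-value planar-to-packed store:
// writes R0 G0 B0 R1 G1 B1 ... R7 G7 B7 for one group of 8 pixels.
void store24_pln3_to_pkd3(float *dst, const Lane8 &r, const Lane8 &g, const Lane8 &b)
{
    assert(dst != nullptr);
    for (std::size_t i = 0; i < 8; ++i)
    {
        dst[3 * i + 0] = r[i];
        dst[3 * i + 1] = g[i];
        dst[3 * i + 2] = b[i];
    }
}

// Scalar model of the 48-value store with the layout from the thread:
// p[0], p[2], p[4] hold R, G, B for pixels 0 - 7,
// p[1], p[3], p[5] hold R, G, B for pixels 8 - 15.
void store48_pln3_to_pkd3(float *dst, const std::array<Lane8, 6> &p)
{
    store24_pln3_to_pkd3(dst, p[0], p[2], p[4]);       // pixels 0 - 7
    store24_pln3_to_pkd3(dst + 24, p[1], p[3], p[5]);  // pixels 8 - 15
}
```

In this model the 24-value helper is reusable only because it takes the three channel registers as separate arguments; a helper that reads three consecutive entries of `p` would pick up R, R, G for the first call instead of R, G, B.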
            roiType,
            rpp::deref(rppHandle));
}
else if ((srcDescPtr->dataType == RpptDataType::U8) && (dstDescPtr->dataType == RpptDataType::F32))
Owner
Keep the same order of calls for host and GPU; either order is fine.
dstIdx += dstStridesNCH.y;

cmnParams_f8.f4[0] = (float4)meanTensor[incrementPerImage + 1];    // Get mean for G channel
cmnParams_f8.f4[1] = (float4)(1 / stdDevTensor[incrementPerImage + 1]);    // Get (1 / stdDev) for G channel
Owner
Add the comments for the R channel too.
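The G-channel lines quoted above load a mean and a reciprocal standard deviation per channel. A minimal scalar sketch of that per-channel normalization for a packed RGB U8 to F32 crop with optional horizontal mirror might look like the following. The `Roi` struct, the function name, and the parameter layout are hypothetical and chosen for illustration only; they are not taken from the PR or from the RPP API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical ROI description: top-left corner plus size, in pixels.
struct Roi { int x, y, width, height; };

// Crops `roi` out of a packed RGB U8 image with `srcWidth` pixels per row,
// optionally mirrors it horizontally, and normalizes each channel as
// (value - mean[c]) * (1 / stdDev[c]).
std::vector<float> cropMirrorNormalizePkd3(const std::vector<std::uint8_t> &src, int srcWidth,
                                           const Roi &roi,
                                           const std::array<float, 3> &mean,
                                           const std::array<float, 3> &stdDev,
                                           bool mirror)
{
    std::array<float, 3> invStdDev{};
    for (std::size_t c = 0; c < 3; ++c)
    {
        assert(stdDev[c] != 0.0f);
        invStdDev[c] = 1.0f / stdDev[c];   // reciprocal computed once per channel (R, G, B)
    }

    std::vector<float> dst(static_cast<std::size_t>(roi.width) * roi.height * 3);
    for (int y = 0; y < roi.height; ++y)
    {
        for (int x = 0; x < roi.width; ++x)
        {
            // When mirroring, read the ROI row from right to left.
            int srcX = roi.x + (mirror ? (roi.width - 1 - x) : x);
            std::size_t srcIdx = (static_cast<std::size_t>(roi.y + y) * srcWidth + srcX) * 3;
            std::size_t dstIdx = (static_cast<std::size_t>(y) * roi.width + x) * 3;
            for (std::size_t c = 0; c < 3; ++c)
                dst[dstIdx + c] = (static_cast<float>(src[srcIdx + c]) - mean[c]) * invStdDev[c];
        }
    }
    return dst;
}
```

Multiplying by a precomputed reciprocal instead of dividing per pixel mirrors the `1 / stdDevTensor[...]` pattern in the quoted kernel lines.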
f57d004 to f010000
f85c0c4 to 21bbecf
3b17725 to 15450ab
ManasaDattaT pushed a commit to ManasaDattaT/rpp that referenced this pull request Dec 19, 2025
* added avx support for exposure u8 variant
* added avx support for f32, f16, i8 variants of exposure; added exposure case in performance tests
* updated the description for exposure tensor function
* cleanup
* temporary changes to resolve merge conflicts
* code cleanup
* removed additional clock() for exposure case in test_suite
* added vectorized hip support for exposure kernel
* fixed bugs in exposure hip pkd3 variant
* fixed minor bug in pln1 case
* restructured exposure hip vectorized codes
* Add ci
* resolved merge conflicts and updated code according to new file structure
* updated exposure hip codes according to new file structure
* minor formatting changes
* minor formatting changes
* Remove ci
Co-authored-by: sampath1117 <sampath.rachumallu@multicorewareinc.com>
* Added support for the U8 - F32 and U8 - F16 variants
* Made changes to support separate normalization for each of the R, G, B channels