Crop Mirror Normalize - HOST Tensor AVX2 Support and Vectorized HIP support for U8 - F32 / F16#92

Merged
r-abishek merged 18 commits into r-abishek:ar/opt_cmn_u8_f from sampath1117:sr/opt_cmn_u8_f
Sep 20, 2022

@sampath1117

* Added support for the U8 - F32 and U8 - F16 variants
* Made changes to support separate normalization for each of the R, G, B channels

@r-abishek r-abishek changed the base branch from master to ar/opt_cmn_u8_f August 23, 2022 03:25
Owner

@r-abishek r-abishek left a comment

Please go over these changes


inline void rpp_store48_f32pln3_to_f32pkd3_avx(Rpp32f *dstPtr, __m256 *p)
{
__m128 p128[8];
Owner

You aren't using 8 registers below

_mm256_storeu_ps(dstPtrB + 8, p[5]);
}

inline void rpp_store48_f32pln3_to_f32pkd3_avx(Rpp32f *dstPtr, __m256 *p)
Owner

Also, could you just call rpp_store24_f32pln3_to_f32pkd3_avx() two times to process all 48?

Author

We cannot use it because the destination R, G, B AVX registers are not stored at contiguous indices.
In p[6]:
p[0], p[2], p[4] - R, G, B registers for locations 0 - 7
p[1], p[3], p[5] - R, G, B registers for locations 8 - 15
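
The register layout described in this reply can be modeled in scalar code. The sketch below is illustrative only and is not the PR's implementation: `Reg8` is a hypothetical stand-in for one `__m256` register (8 floats), and `store48PlanarToPacked` is a hypothetical helper name. It assumes the interleaved ordering from the comment, and shows why a 24-float helper that reads R, G, B from three consecutive registers (an assumption about `rpp_store24_f32pln3_to_f32pkd3_avx`) could not simply be called twice on `p` and `p + 3`.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for one __m256 register: 8 packed floats.
using Reg8 = std::array<float, 8>;

// Writes 16 pixels (48 floats) in packed RGBRGB... order.
// Assumed layout, taken from the review comment:
//   p[0], p[2], p[4] -> R, G, B for pixels 0..7
//   p[1], p[3], p[5] -> R, G, B for pixels 8..15
void store48PlanarToPacked(float *dst, const Reg8 *p)
{
    for (std::size_t half = 0; half < 2; ++half) // 0: pixels 0..7, 1: pixels 8..15
    {
        const Reg8 &r = p[0 + half];
        const Reg8 &g = p[2 + half];
        const Reg8 &b = p[4 + half];
        for (std::size_t i = 0; i < 8; ++i)
        {
            float *px = dst + (half * 8 + i) * 3; // start of this pixel's RGB triple
            px[0] = r[i];
            px[1] = g[i];
            px[2] = b[i];
        }
    }
}
```

In this model, the second group of 8 pixels takes its channels from indices 1, 3, 5 rather than 3, 4, 5, so reusing a 24-float store would need either a different register ordering upstream or a small temporary array of reordered registers.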

Owner

@sampath1117 Could we do the shuffle when it's in u8?

roiType,
rpp::deref(rppHandle));
}
else if ((srcDescPtr->dataType == RpptDataType::U8) && (dstDescPtr->dataType == RpptDataType::F32))
Owner

Keep the same order of calls for host and GPU; either order is fine.

Author

Done

dstIdx += dstStridesNCH.y;

cmnParams_f8.f4[0] = (float4)meanTensor[incrementPerImage + 1]; // Get mean for G channel
cmnParams_f8.f4[1] = (float4)(1 / stdDevTensor[incrementPerImage + 1]); // Get (1 / stdDev) for G channel
Owner

Add the comments for the R channel too

Author

Done
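
For context on the kernel snippet in this thread, which loads a mean and a (1 / stdDev) value for the G channel: a scalar reference for per-channel normalization might look like the sketch below. It is a minimal illustration under stated assumptions, not the PR's HIP kernel. The function name `normalizePkd3U8ToF32` is hypothetical, the formula `(src - mean) * (1 / stdDev)` is assumed from the two parameters in the snippet, and the crop and mirror steps are left out.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference: packed RGB U8 input, packed RGB F32 output.
// Assumes dst = (src - mean[c]) * (1 / stdDev[c]) for channel c in {R, G, B}.
void normalizePkd3U8ToF32(const std::uint8_t *src, float *dst, std::size_t numPixels,
                          const float mean[3], const float stdDev[3])
{
    // Precompute the reciprocals once, as the kernel snippet does with (1 / stdDev).
    const float invStdDev[3] = {1.0f / stdDev[0], 1.0f / stdDev[1], 1.0f / stdDev[2]};

    for (std::size_t i = 0; i < numPixels; ++i)
    {
        for (std::size_t c = 0; c < 3; ++c)
        {
            const std::size_t idx = i * 3 + c;
            dst[idx] = (static_cast<float>(src[idx]) - mean[c]) * invStdDev[c];
        }
    }
}
```

Storing the reciprocal and multiplying keeps the division out of the per-pixel loop, which is the same choice the snippet makes when it stores (1 / stdDev) rather than stdDev.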

@r-abishek r-abishek added the enhancement New feature or request label Aug 23, 2022
@r-abishek r-abishek added this to the sow7ms4 milestone Aug 23, 2022
@sampath1117 sampath1117 changed the base branch from ar/opt_cmn_u8_f to master August 23, 2022 06:26
@sampath1117 sampath1117 changed the base branch from master to ar/opt_cmn_u8_f August 23, 2022 06:26
Owner

@r-abishek r-abishek left a comment

looks good

@r-abishek r-abishek merged commit 5d90ad6 into r-abishek:ar/opt_cmn_u8_f Sep 20, 2022
ManasaDattaT pushed a commit to ManasaDattaT/rpp that referenced this pull request Dec 19, 2025
* added avx support for exposure u8 variant

* added avx support for f32,f16,i8 variants of exposure

added exposure case in performance tests

* updated the description for exposure tensor function

* cleanup

* temporary changes to resolve merge conflicts

* code cleanup

* removed additional clock() for exposure case in test_suite

* added vectorized hip support for exposure kernel

* fixed bugs in exposure hip pkd3 variant

* fixed minor bug in pln1 case

* restructured exposure hip vectorized codes

* Add ci

* resolved merge conflicts and updated codes according to new file structure

* updated exposure hip codes according to new file structure

* minor formatting changes

* minor formatting changes

* Remove ci

Co-authored-by: sampath1117 <sampath.rachumallu@multicorewareinc.com>

2 participants