Resize Bilinear : Tensor Code clean up#36

Merged
r-abishek merged 12 commits into r-abishek:ar/resize_tensor from fiona-gladwin:fg/bilinear_resize_tensor_mods
Dec 22, 2021
Conversation

@fiona-gladwin

  • Modify the code to match the new standard, plus minor changes

@fiona-gladwin fiona-gladwin changed the title Resize Bilinear : Code clean up Resize Bilinear : Tensor Code clean up Dec 14, 2021
@r-abishek r-abishek self-requested a review December 16, 2021 08:11
Owner

@r-abishek r-abishek left a comment


A few minor changes on re-using functions and conventions.

@@ -1504,7 +1438,7 @@ inline RppStatus rpp_store4_f32pln3_to_u8pkd3(Rpp8u* dstPtr, __m128* p)
__m128 p1 = _mm_unpacklo_ps(p[0], p[1]);
Owner


I think the naming convention is changing slightly here. Let's follow the same convention as elsewhere. Isn't this function the same as the rpp_store12_f32pln3_to_f32pkd3() function above, except that it stores to U8? So it should ideally be called rpp_store12_f32pln3_to_u8pkd3().

@@ -1513,30 +1447,92 @@ inline RppStatus rpp_store4_f32pln3_to_u8pkd3(Rpp8u* dstPtr, __m128* p)

inline RppStatus rpp_store4_f32pln3_to_u8pln3(Rpp8u* dstRPtr, Rpp8u* dstGPtr, Rpp8u* dstBPtr, __m128* p)
Owner


This one is similar to rpp_store12_f32pln3_to_f32pln3(), so let's reference it with the number 12: 4 for each color channel.

}

inline RppStatus rpp_bilinear_load4_f16pkd3_to_f32pln3(Rpp16f* srcPtrTopRow, Rpp16f* srcPtrBottomRow, Rpp32u* loc, __m128* p)
inline RppStatus rpp_store4_f32pln1_to_f32pln1(Rpp32f* dstPtr, __m128 p)
Owner


This function is already available as rpp_store4_f32_to_f32(); please call that instead. A similar comment applies to the two functions above this:
Your rpp_store4_f32pln3_to_f32pln3() is already available as rpp_store12_f32pln3_to_f32pln3().
Your rpp_store4_f32pln3_to_f32pkd3() is already available as rpp_store12_f32pln3_to_f32pkd3().

compute_resize_src_loc_sse(pDstLoc, pWRatio, pWidthLimit, srcLocCF, &pWeightParams[2], true);
compute_bilinear_coefficients_sse(pWeightParams, pBilinearCoeffs);

rpp_simd_load(rpp_bilinear_load4_f16pkd3_to_f32pln3, srcRowPtrsForInterp, srcLocCF, pRow);
Owner


Just for F16, let's actually get rid of any additional functions like rpp_bilinear_load4_f16pkd3_to_f32pln3() and the corresponding store function. Let's use the same for loops here in this file along with the same F32 calls to rpp_bilinear_load4_f32pkd3_to_f32pln3() and the F32 store function, like so:

for (int cnt = 0; cnt < 12; cnt++)
{
    *(srcPtrTemp_ps + cnt) = (Rpp32f) *(srcPtrTemp + cnt);
}
__m128 p[4];
rpp_simd_load(rpp_load12_f32pkd3_to_f32pln3, srcPtrTemp_ps, p);    // simd loads
compute_color_cast_12_host(p, pMul, pAdd);    // color_cast adjustment
rpp_simd_store(rpp_store12_f32pln3_to_f32pln3, dstPtrTemp_ps, dstPtrTemp_ps + 4, dstPtrTemp_ps + 8, p);    // simd stores
for (int cnt = 0; cnt < 4; cnt++)
{
    *(dstPtrTempR + cnt) = (Rpp16f) *(dstPtrTemp_ps + cnt);
    *(dstPtrTempG + cnt) = (Rpp16f) *(dstPtrTemp_ps + 4 + cnt);
    *(dstPtrTempB + cnt) = (Rpp16f) *(dstPtrTemp_ps + 8 + cnt);
}

Owner


This is because the F16/F32 type cast is quite suboptimal, and we'll be changing the whole mechanism for all functions in the near future.

Author


rpp_bilinear_load4_f32pkd3_to_f32pln3() loads pixels based on the location values and stores them in vectors.
rpp_bilinear_load4_f16pkd3_to_f32pln3() loads pixels based on the location values, casts them to float, and stores them in vectors.
The required source pixels are not loaded from contiguous memory; their positions are determined by the location factor.
So for loops followed by rpp_bilinear_load4_f32pkd3_to_f32pln3() would not work.

@r-abishek r-abishek added the enhancement New feature or request label Dec 16, 2021
@r-abishek r-abishek added this to the sow6ms5 milestone Dec 16, 2021
Owner

@r-abishek r-abishek left a comment


Looks good, merging.

@r-abishek r-abishek merged commit 30b56a1 into r-abishek:ar/resize_tensor Dec 22, 2021
@fiona-gladwin fiona-gladwin deleted the fg/bilinear_resize_tensor_mods branch December 22, 2021 18:48