Resize Bilinear : Tensor Code clean up #36
r-abishek merged 12 commits into r-abishek:ar/resize_tensor
Conversation
fiona-gladwin
commented
Dec 14, 2021
- Modify the code to match the new standard, plus other minor changes
r-abishek
left a comment
A few minor changes on reusing functions and conventions.
src/include/cpu/rpp_cpu_simd.hpp
@@ -1504,7 +1438,7 @@ inline RppStatus rpp_store4_f32pln3_to_u8pkd3(Rpp8u* dstPtr, __m128* p)
__m128 p1 = _mm_unpacklo_ps(p[0], p[1]);
I think the naming convention is drifting here a little. Let's follow the same convention as elsewhere. Isn't this function the same as the rpp_store12_f32pln3_to_f32pkd3() function above, except that it stores in U8? So it should ideally be called rpp_store12_f32pln3_to_u8pkd3().
src/include/cpu/rpp_cpu_simd.hpp
@@ -1513,30 +1447,92 @@ inline RppStatus rpp_store4_f32pln3_to_u8pkd3(Rpp8u* dstPtr, __m128* p)

inline RppStatus rpp_store4_f32pln3_to_u8pln3(Rpp8u* dstRPtr, Rpp8u* dstGPtr, Rpp8u* dstBPtr, __m128* p)
This one is similar to rpp_store12_f32pln3_to_f32pln3(), so let's reference it with the number 12: 4 values for each color channel.
}

inline RppStatus rpp_bilinear_load4_f16pkd3_to_f32pln3(Rpp16f* srcPtrTopRow, Rpp16f* srcPtrBottomRow, Rpp32u* loc, __m128* p)
inline RppStatus rpp_store4_f32pln1_to_f32pln1(Rpp32f* dstPtr, __m128 p)
This function is already available as rpp_store4_f32_to_f32(). Please call that instead. Similar comment for the two functions above this.
Your rpp_store4_f32pln3_to_f32pln3() is already available as rpp_store12_f32pln3_to_f32pln3().
Your rpp_store4_f32pln3_to_f32pkd3() is already available as rpp_store12_f32pln3_to_f32pkd3().
compute_resize_src_loc_sse(pDstLoc, pWRatio, pWidthLimit, srcLocCF, &pWeightParams[2], true);
compute_bilinear_coefficients_sse(pWeightParams, pBilinearCoeffs);

rpp_simd_load(rpp_bilinear_load4_f16pkd3_to_f32pln3, srcRowPtrsForInterp, srcLocCF, pRow);
Just for F16, let's actually get rid of any additional functions like rpp_bilinear_load4_f16pkd3_to_f32pln3() and the corresponding store function. Let's use the same for loops here in this file, along with the same F32 calls to rpp_bilinear_load4_f32pkd3_to_f32pln3() and the F32 store function. Like:
rpp/src/modules/cpu/host_tensor_augmentations.hpp
Lines 6136 to 6152 in 915707d
This is because the f16/f32 type cast is quite suboptimal, and we'll be changing the whole mechanism for all functions in the near future.
rpp_bilinear_load4_f32pkd3_to_f32pln3() actually loads the pixels based on the location values and stores them in vectors.
rpp_bilinear_load4_f16pkd3_to_f32pln3() would load the pixels based on the location values, then cast them to float and store them in vectors.
The required source pixels are not loaded from contiguous memory; they are selected by the location factor.
So plain for loops followed by rpp_bilinear_load4_f32pkd3_to_f32pln3() would not work.