Resize Bilinear interpolation - Tensor support by fiona-gladwin · Pull Request #32 · r-abishek/rpp

fiona-gladwin · 2021-11-30T04:53:51Z

Add U8 resize bilinear interpolation support.

r-abishek

Please look into these changes

r-abishek · 2021-11-30T18:27:06Z

utilities/rpp-unittests/HOST_NEW/Tensor_host_pkd3.cpp


-    srcDescPtr->w = ((srcDescPtr->w / 8) * 8) + 8;
-    dstDescPtr->w = ((dstDescPtr->w / 8) * 8) + 8;
+    // srcDescPtr->w = ((srcDescPtr->w / 8) * 8) + 8;


Can we leave it uncommented so as to ensure that it works with a random size too?

r-abishek · 2021-11-30T18:27:43Z

utilities/rpp-unittests/HOST_NEW/Tensor_host_pkd3.cpp


        char temp[1000];
-        strcpy(temp, dst);
+        strcpy(temp, "dst_ten_pkd_");


Why this change?

r-abishek · 2021-11-30T23:11:17Z

src/modules/cpu/host_tensor_augmentations.hpp

+                for (; vectorLoopCount < alignedLength; vectorLoopCount+=4)
+                {
+                    p0 = _mm_setr_ps(vectorLoopCount, vectorLoopCount + 1, vectorLoopCount + 2, vectorLoopCount + 3);
+                    p0 = _mm_mul_ps(p0, pWRatio);


Instead of calling _mm_setr_ps inside the loop, see if you can set up px0 = _mm_setr_epi32(0, 1, 2, 3); and pxFour = _mm_set1_epi32(4); outside the loop. Then check whether replacing this line with px0 = _mm_add_epi32(px0, pxFour); p0 = _mm_castsi128_ps(px0); works.

Did you try this?
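The hoisting idea in this thread can be sketched as below. This is an illustrative example, not code from the PR: the function name scaled_columns_hoisted and the output vector are assumptions made to keep it self-contained. It also deviates from the suggestion in two ways. It uses _mm_cvtepi32_ps for the integer-to-float step, because _mm_castsi128_ps only reinterprets the bits and does not convert the values. It also increments the index vector at the end of each iteration, so that the first iteration still sees 0, 1, 2, 3.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <vector>

// Hypothetical helper (not from the PR): returns dstCol * wRatio for every
// column in [0, alignedLength). alignedLength is assumed to be a multiple of 4.
std::vector<float> scaled_columns_hoisted(int alignedLength, float wRatio)
{
    std::vector<float> out(alignedLength);
    const __m128 pWRatio = _mm_set1_ps(wRatio);
    const __m128i pxFour = _mm_set1_epi32(4);
    __m128i px0 = _mm_setr_epi32(0, 1, 2, 3);    // built once, outside the loop

    for (int vectorLoopCount = 0; vectorLoopCount < alignedLength; vectorLoopCount += 4)
    {
        __m128 p0 = _mm_cvtepi32_ps(px0);        // numeric int -> float conversion
        p0 = _mm_mul_ps(p0, pWRatio);            // scale the four column indices
        _mm_storeu_ps(out.data() + vectorLoopCount, p0);
        px0 = _mm_add_epi32(px0, pxFour);        // advance to the next four columns
    }

    return out;
}
```

Whether this is actually faster than the per-iteration _mm_setr_ps would need to be measured on the target hardware.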

r-abishek · 2021-11-30T23:32:45Z

src/include/cpu/rpp_cpu_simd.hpp

    }
 }

+inline RppStatus rpp_resize_load4_u8pkd3_to_f32pln3(Rpp8u* srcPtrTopRow, Rpp8u* srcPtrBottomRow, Rpp32u* loc, __m128* p)


This kind of double-row load could be useful to any functionality that uses bilinear interpolation. Consider naming it a bilinear load rather than tying it to resize.

r-abishek · 2021-11-30T23:54:03Z

src/include/cpu/rpp_cpu_simd.hpp

+    px[1] = _mm_loadu_si128((__m128i *)(srcPtrTopRow + loc[1]));                /* Load the 1st RGBR pixel from TopRow*/
+    px[2] = _mm_loadu_si128((__m128i *)(srcPtrTopRow + loc[2]));                /* Load the 1st RGBR pixel from TopRow*/
+    px[3] = _mm_loadu_si128((__m128i *)(srcPtrTopRow + loc[3]));                /* Load the 1st RGBR pixel from TopRow*/
+    px[0] = _mm_unpacklo_epi8(px[0], px[2]); // R1 R3 G1 G3 B1 B3 R2 R3 G2 G3 B B


A few things on the comments:
Follow the same /* comment style as the other SIMD helper functions.
Please add a space after /* and before */.
Also leave a minimum of 4 spaces between the semicolon and the start of the comment.
Please see if you can reuse the commenting style of the previous SIMD helpers, for uniformity across all our helpers.

r-abishek · 2021-11-30T23:54:51Z

src/include/cpu/rpp_cpu_simd.hpp

+    p[9] = _mm_cvtepi32_ps(_mm_shuffle_epi8(px[2], maskp4));
+    p[10] = _mm_cvtepi32_ps(_mm_shuffle_epi8(px[3], maskp1));
+    p[11] = _mm_cvtepi32_ps(_mm_shuffle_epi8(px[3], maskp2));
+    return RPP_SUCCESS;


Leave one blank line before every return RPP_SUCCESS; line.

r-abishek · 2021-12-01T00:20:47Z

src/modules/cpu/host_tensor_augmentations.hpp

+        __m128 pChannel = _mm_set1_ps((float) srcDescPtr->c);
+        __m128 p0, p2, p4, p5, p6, p7, pColFloor;
+        __m128i pxColFloor;
+        __m128 pRow[16], pPixels[4];


Are all 16 pRow elements and all 4 pPixels elements actually used?

* Erode initial commit
* Optimize erode for hip
* Add hip tensor test suite support for erode
* Dilate initial commit
* Optimize dilate for hip
* Add hip tensor test suite support for dilate
* Add hip perf tests for erode and dilate
* Add conditions for borders
* Modify ROI override comment for better clarity
* Modify comment style

…Cm#81)

* Tensor color_twist initial commit for U8
* Merge branch 'rr/color_twist_new' of https://github.com/rrawther/rpp into ar/opt_color_twist_host
* Remove two mul instructions
* Add cast to ps
* Add AVX2 version for U8PKD3 color_twist
* Fix all variants under U8
* Common formatting change
* Add support for f32
* Add f16 support
* Add i8 support
* Minor build fix
* Add tensor color_twist unittests
* Add tensor color_twist performancetests
* Fix codacy issue
* Fix codacy issue
* Remove commented code
* Change alignedLength computation in tensor_augmentations
* Modify comment style
* Fix for BatchPD PLN3 color_twist
* Increase static allocation
* Fix for squiggly lines

r-abishek

Please take a look at restructuring and this second round of changes

r-abishek · 2021-12-08T07:15:50Z

include/rppt_tensor_augmentations.h

+// *param[in] srcDesc source tensor descriptor
+// *param[out] dstPtr destination tensor memory
+// *param[in] dstDesc destination tensor descriptor
+// *param[in] roiTensorSrc ROI data for each image in source tensor (2D tensor of size batchSize * 4, in either format - XYWH(xy.x, xy.y, roiWidth, roiHeight) or LTRB(lt.x, lt.y, rb.x, rb.y))


Please describe the dstImgSizes param here, with language similar to that used for alpha in brightness or for the other params in this file.

r-abishek · 2021-12-08T07:20:37Z

src/modules/rppt_tensor_augmentations.cpp

+    if ((srcDescPtr->dataType == RpptDataType::U8) && (dstDescPtr->dataType == RpptDataType::U8))
+    {
+        resize_u8_u8_host_tensor(static_cast<Rpp8u*>(srcPtr) + srcDescPtr->offsetInBytes,
+                                srcDescPtr,


Minor formatting change to maintain tabbing such that all params start at the character after "(". Add one space for L869-L876. F16/F32/I8 are fine.

r-abishek · 2021-12-08T07:24:43Z

src/modules/cpu/host_tensor_augmentations.hpp

+        dstPtrChannel = dstPtrImage;
+        Rpp32f srcLocationRow, srcLocationColumn, pixel;
+        Rpp32s srcLocationRowFloor, srcLocationColumnFloor;
+        Rpp32u alignedLength = (dstImgSize[batchCount].width / 4) * 4;


Can you try using Rpp32u alignedLength = dstImgSize[batchCount].width & ~3; here?
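For reference, the two forms compute the same value for unsigned widths: both round down to the nearest multiple of 4. The sketch below uses hypothetical function names and std::uint32_t in place of Rpp32u so that it stands alone.

```cpp
#include <cstdint>

// Round width down to a multiple of 4 using division (the form in the diff).
inline std::uint32_t aligned_length_div(std::uint32_t width)
{
    return (width / 4) * 4;
}

// Same result using a bit mask: clearing the two low bits.
inline std::uint32_t aligned_length_mask(std::uint32_t width)
{
    return width & ~3u;
}
```

For example, a width of 13 gives 12 with either form. Optimizing compilers often emit the same instructions for both, so the mask form is mainly a readability and convention choice.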

r-abishek · 2021-12-08T23:51:12Z

src/modules/cpu/host_tensor_augmentations.hpp

+                for (; vectorLoopCount < alignedLength; vectorLoopCount+=4)
+                {
+                    p0 = _mm_setr_ps(vectorLoopCount, vectorLoopCount + 1, vectorLoopCount + 2, vectorLoopCount + 3);
+                    p0 = _mm_mul_ps(p0, pWRatio);


Did you try this?

r-abishek · 2021-12-09T01:22:21Z

src/modules/cpu/host_tensor_augmentations.hpp

+
+                for (; vectorLoopCount < alignedLength; vectorLoopCount+=4)
+                {
+                    p0 = _mm_setr_ps(vectorLoopCount, vectorLoopCount + 1, vectorLoopCount + 2, vectorLoopCount + 3);


Please try modularizing L5368-L5382 inside rpp_cpu_common.hpp under compute_resize_column_loc() and reuse it for L5499-L5512, L5613-L5626 and if possible for all other bit depths.

r-abishek · 2021-12-09T01:41:04Z

src/modules/cpu/host_tensor_augmentations.hpp

+                        Rpp32f weightedWidth = srcLocationColumn - srcLocationColumnFloor;
+                        Rpp32f weightedWidth1 = 1 - weightedWidth;
+                        srcLocationColumnFloor = (srcLocationColumnFloor > widthLimit) ? widthLimit : srcLocationColumnFloor;
+                        *dstPtrRow++ = (Rpp32f)(((*(srcPtrTopRowR + srcLocationColumnFloor)) * weightedHeight1 * weightedWidth1)


Could you use the m1/m2/m3/m4 weights here too?

r-abishek · 2021-12-09T01:47:20Z

src/include/cpu/rpp_cpu_simd.hpp

+inline RppStatus rpp_bilinear_load4_f32pkd3_to_f32pln3(Rpp32f* srcPtrTopRow, Rpp32f* srcPtrBottomRow, Rpp32u* loc, __m128* p)
+{
+    __m128 pTemp[8];
+    pTemp[0] = _mm_loadu_ps((float *)(srcPtrTopRow + loc[0]));


Why this typecast?

r-abishek · 2021-12-09T01:47:48Z

src/include/cpu/rpp_cpu_simd.hpp

+inline RppStatus rpp_bilinear_load4_f32pln_to_f32pln(Rpp32f* srcPtrTopRow, Rpp32f* srcPtrBottomRow, Rpp32u* loc, __m128* p)
+{
+    __m128 pTemp[8];
+    pTemp[0] = _mm_loadu_ps((float *)(srcPtrTopRow + loc[0]));


Same here - why this typecast?

r-abishek · 2021-12-09T02:04:51Z

src/include/cpu/rpp_cpu_simd.hpp

+    p[9] = _mm_unpacklo_ps(pTemp[4], pTemp[5]);
+    p[10] = _mm_unpackhi_ps(pTemp[4], pTemp[5]);
+    p[11] = _mm_unpacklo_ps(pTemp[6], pTemp[7]);
+    return RPP_SUCCESS;


Leave one blank line before every return RPP_SUCCESS;

r-abishek · 2021-12-09T02:11:04Z

src/include/cpu/rpp_cpu_simd.hpp

+    return RPP_SUCCESS;
+}
+
+inline RppStatus rpp_store4_f32pln_to_u8pln(Rpp8u* dstPtr, __m128 p)


Is this just for pln1 to pln1? If so, rename it to say f32pln1_to_u8pln1.

* Add crop for hip
* Add support for crop
* Add test suite for crop
* Fix #ifdefs
* Remove unused variable
* Remove unused variable
* Revert hip headers
* Remove unnecessary hip includes
* Add two versions of code - slower removal pending
* Remove usage of buffer_copy helpers

r-abishek

Another round of changes. @fiona-gladwin please take a look.

r-abishek · 2021-12-10T02:34:28Z

src/include/cpu/rpp_cpu_common.hpp

    return product;
 }

+inline void compute_resize_src_loc(Rpp32s dstLocation, Rpp32f scale, Rpp32u limit, Rpp32s &srcLoc, Rpp32f &weight, bool hasRGBChannels=false)


Move this compute_resize_src_loc() as the first function under the compute section at L2094 in this file
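The body of compute_resize_src_loc() is not shown in this thread, so the following is only a sketch of what such a helper might do, inferred from the earlier diff hunks (fractional weight taken before the clamp, floor clamped to the limit). The _sketch suffix, the use of standard integer types in place of Rpp32s/Rpp32u, and the assumption that hasRGBChannels scales the index by 3 for packed RGB are all illustrative.

```cpp
#include <cmath>
#include <cstdint>

// Illustrative sketch only; NOT the body from the PR.
// Assumptions: srcLoc is floor(dstLocation * scale) clamped to limit,
// weight is the fractional part computed before clamping, and
// hasRGBChannels multiplies the index by 3 for packed RGB data.
inline void compute_resize_src_loc_sketch(std::int32_t dstLocation, float scale, std::uint32_t limit,
                                          std::int32_t &srcLoc, float &weight, bool hasRGBChannels = false)
{
    float srcLocation = static_cast<float>(dstLocation) * scale;
    std::int32_t srcLocFloor = static_cast<std::int32_t>(std::floor(srcLocation));
    weight = srcLocation - static_cast<float>(srcLocFloor);    // fractional part

    if (srcLocFloor > static_cast<std::int32_t>(limit))        // clamp to the last valid index
        srcLocFloor = static_cast<std::int32_t>(limit);

    srcLoc = hasRGBChannels ? srcLocFloor * 3 : srcLocFloor;
}
```

With dstLocation = 3 and scale = 0.5, this yields a source index of 1 and a weight of 0.5; with hasRGBChannels set, the index becomes 3.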

r-abishek · 2021-12-10T02:40:37Z

src/modules/cpu/host_tensor_augmentations.hpp

+            for(int i = 0; i < dstImgSize[batchCount].height; i++)
+            {
+                compute_resize_src_loc(i, hRatio, heightLimit, srcLocationRowFloor, weightedHeight);
+                weightedHeight1 = 1 - weightedHeight;


Why is the weightedHeight1 alone computed outside?

r-abishek · 2021-12-10T02:41:08Z

src/modules/cpu/host_tensor_augmentations.hpp

+                {
+                    pDstLoc = _mm_setr_ps(vectorLoopCount, vectorLoopCount + 1, vectorLoopCount + 2, vectorLoopCount + 3);
+                    compute_resize_src_loc_sse(pDstLoc, pWRatio, pWidthLimit, srcLocCF, pWeightedWidth, true);
+                    pWeightedWidth1  = _mm_sub_ps(pOne, pWeightedWidth);


Why is the pWeightedWidth1 alone computed outside?

r-abishek · 2021-12-10T03:03:37Z

src/modules/cpu/host_tensor_augmentations.hpp

+                    for (; vectorLoopCount < dstImgSize[batchCount].width; vectorLoopCount++)
+                    {
+                        compute_resize_src_loc(vectorLoopCount, wRatio, widthLimit, srcLocationColumnFloor, weightedWidth, true);
+                        weightedWidth1 = 1 - weightedWidth;


Compute weightedWidth1 inside too

r-abishek · 2021-12-10T03:05:47Z

src/modules/cpu/host_tensor_augmentations.hpp

+                    {
+                        compute_resize_src_loc(vectorLoopCount, wRatio, widthLimit, srcLocationColumnFloor, weightedWidth, true);
+                        weightedWidth1 = 1 - weightedWidth;
+                        Rpp32f m1 = weightedHeight1 * weightedWidth1;


You could have a templated compute_bilinear_interpolation() just like the compute_bilinear_interpolation_sse() for L5392-L5407 and reuse everywhere
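A scalar helper along these lines might look like the sketch below. The name, signature, and the assignment of m2/m3/m4 to particular neighbours are assumptions for illustration; only m1 = weightedHeight1 * weightedWidth1 is taken from the diff above.

```cpp
// Hypothetical scalar helper; name, signature and m2..m4 ordering are illustrative.
// weightedHeight and weightedWidth are the fractional offsets toward the
// bottom row and the right column respectively.
template <typename T>
inline float compute_bilinear_interpolation_sketch(T topLeft, T topRight, T bottomLeft, T bottomRight,
                                                   float weightedHeight, float weightedWidth)
{
    float weightedHeight1 = 1.0f - weightedHeight;
    float weightedWidth1 = 1.0f - weightedWidth;

    float m1 = weightedHeight1 * weightedWidth1;    // weight of the top-left pixel
    float m2 = weightedHeight1 * weightedWidth;     // weight of the top-right pixel
    float m3 = weightedHeight * weightedWidth1;     // weight of the bottom-left pixel
    float m4 = weightedHeight * weightedWidth;      // weight of the bottom-right pixel

    return topLeft * m1 + topRight * m2 + bottomLeft * m3 + bottomRight * m4;
}
```

Computing both complementary weights inside the helper keeps the call sites short, and the template parameter lets the same code serve U8, I8, and F32 sources, with the caller doing the final cast or saturation.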

r-abishek · 2021-12-10T03:18:18Z

src/include/cpu/rpp_cpu_simd.hpp

+    return RPP_SUCCESS;
+}
+
+inline void compute_resize_src_loc_sse(__m128 &dstLoc, __m128 &scale, __m128 &limit, Rpp32u *srcLoc, __m128 &weight, bool hasRGBChannels = false)


Let's move all these compute functions to rpp_cpu_common under the compute section. Use rpp_cpu_simd only for generalized load/store routine pairs.

r-abishek

Looks much better now

fiona-gladwin added 10 commits November 22, 2021 11:01

Add API for tensor implementation of resize

e79a3d6

Add basic U8 Host pkd version

9cad4a1

Add SSE optimizations for Tensor U8 pkd version

89dccf1

Add U8 tensor PKD to PLN support

5b4843c

Add resize U8 PLN to PKD and PLN to PLN basic tensor implementation

d10798d

Add SSE optimization for U8 PLN to PKD and PLN to PLN tensor resize i…

31365dc

…mplementation

Merge branch 'ar/resize_tensor' of https://github.com/r-abishek/rpp i…

f300c0f

…nto fg/tensor_resize

Add basic F32 tensor resize support

b95ab71

Add SSE optimization for F32 resize bilinear interpolation

d4fdc77

Minor changes

26ab6a3

r-abishek assigned fiona-gladwin Dec 1, 2021

r-abishek added the enhancement New feature or request label Dec 1, 2021

r-abishek added this to the sow6ms5 milestone Dec 1, 2021

r-abishek requested changes Dec 1, 2021

r-abishek reviewed Dec 1, 2021

fiona-gladwin and others added 15 commits November 30, 2021 23:19

Add U8 Host PLN1 tensor support

c79d5f1

Minor changes

181f373

Merge branch 'fg/tensor_resize' into fg/tensor_resize_f32

77f1a4d

Add F32 PLN1 HOST tensor support

bb2a2c5

Modify test suite

Merge branch 'fg/tensor_resize_f32' into fg/tensor_resize

bc94aff

Modify test suite

b0e2ae5

Modify unit tests and performance tests Add ImagePatch struct Introduce dstImageSizes param for resize Fix the pkd version wrt changes

Merge branch 'fg/tensor_resize' into fg/tensor_resize_test

5f12ede

Fix F32 resize HOST version

5466bf4

Add F32 resize HOST calls in test suite

Merge branch 'fg/tensor_resize_test' into fg/tensor_resize

4c110e8

Add support for I8 data type in resize bilinear interpolation

720447a

Minor changes

bf3d78c

Resize F32 host tensor

Add F16 resize bilinear interpolation support HOST

1449a5a

Add I8 and F16 resize calls in test suite

Merge branch 'fg/tensor_resize_i8_f16' into fg/tensor_resize

825761e

Tensor API - Remove OCL_COMPILE and reformat (ROCm#82)

5cfe0d4

* Remove OCL_COMPILE and reformat * Minor #ifdef change

r-abishek requested changes Dec 9, 2021

fiona-gladwin and others added 5 commits December 9, 2021 00:50

Minor changes

9f46cf7

Add API to compute source location

2eecd86

Add API to calculate bilinear coefficients

0e35634

Add API to compute dst pixels for bilinear interpolation

1d6cfa4

r-abishek requested changes Dec 10, 2021

fiona-gladwin added 4 commits December 9, 2021 20:51

Minor changes

bd7ea40

Add API for bilinear interpolation computation

b633e9f

Restructure codes

Merge branch 'master' of https://github.com/GPUOpen-ProfessionalCompu…

e68bd64

…te-Libraries/rpp into fg/tensor_resize

Minor changes

82a5a68

r-abishek approved these changes Dec 11, 2021

r-abishek merged commit b120d67 into r-abishek:ar/resize_tensor Dec 11, 2021

fiona-gladwin deleted the fg/tensor_resize branch December 13, 2021 06:36


3 participants
