Resize Bilinear : Tensor Code clean up #36
r-abishek merged 12 commits into r-abishek:ar/resize_tensor
Conversation
fiona-gladwin
commented
Dec 14, 2021
- Modify the code to match the new standard, plus other minor changes
r-abishek
left a comment
A few minor changes on reusing functions and conventions.
src/include/cpu/rpp_cpu_simd.hpp
@@ -1504,7 +1438,7 @@ inline RppStatus rpp_store4_f32pln3_to_u8pkd3(Rpp8u* dstPtr, __m128* p)
__m128 p1 = _mm_unpacklo_ps(p[0], p[1]);
I think the naming convention is drifting here a little. Let's follow the same convention as elsewhere. Isn't this function the same as the rpp_store12_f32pln3_to_f32pkd3() function above, except that it stores in U8? So it should ideally be called rpp_store12_f32pln3_to_u8pkd3().
src/include/cpu/rpp_cpu_simd.hpp
@@ -1513,30 +1447,92 @@ inline RppStatus rpp_store4_f32pln3_to_u8pkd3(Rpp8u* dstPtr, __m128* p)

inline RppStatus rpp_store4_f32pln3_to_u8pln3(Rpp8u* dstRPtr, Rpp8u* dstGPtr, Rpp8u* dstBPtr, __m128* p)
This one is similar to rpp_store12_f32pln3_to_f32pln3(), so let's reference it with the number 12: 4 values for each color channel.
}

inline RppStatus rpp_bilinear_load4_f16pkd3_to_f32pln3(Rpp16f* srcPtrTopRow, Rpp16f* srcPtrBottomRow, Rpp32u* loc, __m128* p)
inline RppStatus rpp_store4_f32pln1_to_f32pln1(Rpp32f* dstPtr, __m128 p)
This function is already available as rpp_store4_f32_to_f32(). Please call that instead. Similar comment for the two functions above this.
Your rpp_store4_f32pln3_to_f32pln3() is already available as rpp_store12_f32pln3_to_f32pln3().
Your rpp_store4_f32pln3_to_f32pkd3() is already available as rpp_store12_f32pln3_to_f32pkd3().
compute_resize_src_loc_sse(pDstLoc, pWRatio, pWidthLimit, srcLocCF, &pWeightParams[2], true);
compute_bilinear_coefficients_sse(pWeightParams, pBilinearCoeffs);

rpp_simd_load(rpp_bilinear_load4_f16pkd3_to_f32pln3, srcRowPtrsForInterp, srcLocCF, pRow);
Just for F16, let's actually get rid of any additional functions like rpp_bilinear_load4_f16pkd3_to_f32pln3() and the corresponding store function. Let's use the same for loops here in this file, along with the same F32 calls to rpp_bilinear_load4_f32pkd3_to_f32pln3() and the F32 store function. Like:
rpp/src/modules/cpu/host_tensor_augmentations.hpp
Lines 6136 to 6152 in 915707d
This is because the f16/f32 type cast is quite suboptimal, and we'll be changing the whole mechanism for all functions in the near future.
rpp_bilinear_load4_f32pkd3_to_f32pln3() actually loads the pixels based on the location values and stores them in vectors.
rpp_bilinear_load4_f16pkd3_to_f32pln3() would load the pixels based on the location values, then cast them to float and store them in vectors.
The required source pixels are not loaded from contiguous memory; they are selected by the location factor.
So plain for loops followed by rpp_bilinear_load4_f32pkd3_to_f32pln3() would not work.