RPP - float, int & tensor support: required for RALI-SOW3#32
kiritigowda merged 894 commits into ROCm:master
dst_pixIdx += dst_inc[id_z];
}
} else {
for (indextmp = 0; indextmp < channel; indextmp++) {
consider loop unrolling and vector datatypes for better performance
(id_x + id_y * max_dst_width[id_z]) * out_plnpkdind;
if ((id_x < dst_width[id_z]) && (id_y < dst_height[id_z])) {
for (indextmp = 0; indextmp < channel; indextmp++) {
output[dst_pixIdx] = (half)((input[src_pixIdx] - local_mean) / 255.0 * local_std_dev);
avoid division for all constant divisors. Use multiply by inverse instead. Applicable to all kernels
unsigned int pixId;
pixId = id_x + id_y * dest_width + id_z * dest_width * dest_height;
A = srcPtr[x + y * source_width + id_z * source_height * source_width];
consider doing more work by using vector datatypes
const unsigned int dest_height, const unsigned int dest_width,
const unsigned int channel) {
int A, B, C, D, x, y, index, pixVal;
float x_ratio = ((float)(source_width - 1)) / dest_width;
It is better to pass x_ratio and y_ratio in as arguments instead of computing them every time.
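One way this could look, as a host-side C++ sketch rather than actual RPP code: the ratios are computed once per image and handed to the per-pixel routine. `ResizeRatios`, `compute_resize_ratios` and `source_coords` are hypothetical names. The `x_ratio` formula is the one in the diff; the `y_ratio` formula is assumed to be its analogue for height.

```cpp
// Hypothetical container for the scale factors the kernel currently
// recomputes in every work-item.
struct ResizeRatios {
    float x_ratio;
    float y_ratio;
};

// Runs once on the host per image. x_ratio follows the line under review;
// y_ratio is an assumed mirror of it.
inline ResizeRatios compute_resize_ratios(unsigned int source_width,
                                          unsigned int source_height,
                                          unsigned int dest_width,
                                          unsigned int dest_height) {
    return { static_cast<float>(source_width - 1) / dest_width,
             static_cast<float>(source_height - 1) / dest_height };
}

// Stand-in for the per-pixel kernel body: it takes the ratios as inputs,
// so the only per-pixel work left is the coordinate multiply.
inline void source_coords(const ResizeRatios& r, int id_x, int id_y,
                          int& x, int& y) {
    x = static_cast<int>(r.x_ratio * id_x);
    y = static_cast<int>(r.y_ratio * id_y);
}
```

In the kernel this would mean two extra `float` arguments in place of a conversion and a division per work-item.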
int id_y = get_global_id(1);
int id_z = get_global_id(2);

int xc = id_x - dest_width / 2;
color_twist_host(srcPtr, batch_srcSizeMax[batchCount], dstPtr, alpha, beta, hueShift, saturationFactor, chnFormat, channel);
color_twist_host(srcPtrImage, batch_srcSizeMax[batchCount], dstPtrImage, alpha, beta, hueShift, saturationFactor, chnFormat, channel);

if (outputFormatToggle == 1)
this looks very inefficient. Need to revisit
xG = _mm_loadu_ps(srcPtrTempG);
xB = _mm_loadu_ps(srcPtrTempB);

xR = _mm_div_ps(xR, pFactor);
please use mulps instead. True for all constant divisors
kiritigowda left a comment
@rrawther let me know if this is good to merge. It LGTM.
@kiritigowda: Pavel found some issues with the GPU flow. Waiting on the status of that before merging.
Issues
======
+ Solved 1
- Added 6
Complexity increasing per file
==============================
- utilities/rpp-unittests/SOW3_HOST/tensorDifference.py 1
Clones added
============
- utilities/rpp-unittests/OCL/BatchPD_ocl_pkd3.cpp 63
- utilities/rpp-unittests/SOW3_OCL/BatchPD_ocl_pkd3.cpp 24
- src/modules/cl/cl_declarations.hpp 1
- utilities/rpp-unittests/SOW3_HOST/BatchPD_host_pkd3.cpp 22
- utilities/rpp-unittests/HOST/BatchPD_host_pkd3.cpp 101
- utilities/rpp-unittests/HOST/Single_host.cpp 4
- utilities/rpp-unittests/SOW3_HOST/BatchPD_host_pln1.cpp 22
- src/modules/cl/cl_fused_functions.cpp 3
- utilities/rpp-unittests/SOW3_OCL/BatchPD_ocl_pln1.cpp 23
- utilities/rpp-unittests/SOW3_HOST/BatchPD_host_pln3.cpp 24
- utilities/rpp-unittests/SOW3_OCL/BatchPD_ocl_pln3.cpp 25
- utilities/rpp-unittests/HIP/Single_hip.cpp 9
- src/include/cpu/rpp_cpu_common.hpp 12
- utilities/rpp-unittests/HOST/BatchPD_host_pln1.cpp 102
- utilities/rpp-unittests/HIP/BatchPD_hip.cpp 8
- src/modules/cl/cl_color_model_conversions.cpp 1
- utilities/rpp-unittests/OCL/Single_ocl.cpp 6
- include/rppi_fused_functions.h 2
- src/modules/cl/cl_geometry_transforms.cpp 19
See the complete overview on Codacy
Resize Bilinear interpolation - Tensor support
Major Work: