RPP - float, int & tensor support: required for RALI-SOW3 #32

Merged
kiritigowda merged 894 commits into ROCm:master from LokeshBonta:master
Aug 25, 2020

@LokeshBonta
Contributor

Major Work:

  1. Added support for 7 data type variations of the fused functions: u8-u8, u8-f32, u8-f16, u8-i8, i8-i8, f32-f32, f16-f16
  2. Crop, Resize Crop, Crop Mirror Normalize, Rotate, Resize, Resize Crop Mirror
  3. Unit testing framework and script

muthukumaravel7 and others added 30 commits August 8, 2019 13:41
dst_pixIdx += dst_inc[id_z];
}
} else {
for (indextmp = 0; indextmp < channel; indextmp++) {
Contributor

consider loop unrolling and vector datatypes for better performance
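
To illustrate the suggestion, here is a minimal host-side C++ sketch, not the kernel from this PR. The function names `scale_scalar` and `scale_unrolled4` are hypothetical, and the per-element scale stands in for whatever the per-channel loop body does. In an OpenCL kernel the four-wide body would typically be written with a `float4` via `vload4`/`vstore4`; that mapping is an assumption about how the advice would be applied here.

```cpp
#include <cassert>
#include <cstddef>

// Baseline: one element per iteration.
inline void scale_scalar(const float* in, float* out, std::size_t n, float k) {
    assert(in != nullptr && out != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] * k;
    }
}

// Unrolled by four: fewer loop-control instructions per element, and the
// four independent statements are easy for a compiler to turn into one
// vector operation.
inline void scale_unrolled4(const float* in, float* out, std::size_t n, float k) {
    assert(in != nullptr && out != nullptr);
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        out[i + 0] = in[i + 0] * k;
        out[i + 1] = in[i + 1] * k;
        out[i + 2] = in[i + 2] * k;
        out[i + 3] = in[i + 3] * k;
    }
    // Remainder loop for lengths that are not a multiple of four.
    for (; i < n; ++i) {
        out[i] = in[i] * k;
    }
}
```

Both versions produce identical results; whether the unrolled form is actually faster depends on the compiler and target, so it should be measured.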

(id_x + id_y * max_dst_width[id_z]) * out_plnpkdind;
if ((id_x < dst_width[id_z]) && (id_y < dst_height[id_z])) {
for (indextmp = 0; indextmp < channel; indextmp++) {
output[dst_pixIdx] = (half)((input[src_pixIdx] - local_mean) / 255.0 * local_std_dev);
Contributor

avoid division for all constant divisors. Use multiply by inverse instead. Applicable to all kernels
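
A rough sketch of what this could look like, written as host-side C++ rather than the OpenCL kernel above. The names `normalize_div` and `normalize_mul` are hypothetical; the arithmetic mirrors the `(input - mean) / 255.0 * std_dev` expression from the diff. The idea is to fold the constant divisor into a scale factor computed once, outside the loop.

```cpp
#include <cassert>
#include <cstddef>

// As in the diff: one division per element.
inline void normalize_div(const unsigned char* in, float* out, std::size_t n,
                          float mean, float std_dev) {
    assert(in != nullptr && out != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = (in[i] - mean) / 255.0f * std_dev;
    }
}

// Suggested form: fold 1/255 and std_dev into a single factor once,
// leaving one subtract and one multiply per element.
inline void normalize_mul(const unsigned char* in, float* out, std::size_t n,
                          float mean, float std_dev) {
    assert(in != nullptr && out != nullptr);
    const float scale = std_dev * (1.0f / 255.0f);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = (in[i] - mean) * scale;
    }
}
```

The two forms can differ in the last bits of the result because the rounding happens at different points, so any bit-exact unit tests would need a small tolerance.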


unsigned int pixId;
pixId = id_x + id_y * dest_width + id_z * dest_width * dest_height;
A = srcPtr[x + y * source_width + id_z * source_height * source_width];
Contributor

consider doing more work by using vector datatypes

const unsigned int dest_height, const unsigned int dest_width,
const unsigned int channel) {
int A, B, C, D, x, y, index, pixVal;
float x_ratio = ((float)(source_width - 1)) / dest_width;
Contributor

it is better to pass x_ratio and y_ratio in instead of computing them every time
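
One possible shape for this, sketched in C++ for the host side. `ResizeRatios` and `compute_resize_ratios` are hypothetical names, and the `y_ratio` formula is assumed to mirror the `x_ratio` line shown in the diff. The host would compute the pair once per image and pass both values as kernel arguments, so each work-item skips the division.

```cpp
#include <cassert>

// Hypothetical container for the two precomputed scale factors.
struct ResizeRatios {
    float x_ratio;
    float y_ratio;
};

// Compute the ratios once on the host. x_ratio follows the expression in
// the diff; y_ratio is assumed to be the analogous expression for height.
inline ResizeRatios compute_resize_ratios(unsigned int source_width,
                                          unsigned int source_height,
                                          unsigned int dest_width,
                                          unsigned int dest_height) {
    assert(source_width > 0 && source_height > 0);
    assert(dest_width > 0 && dest_height > 0);
    ResizeRatios r;
    r.x_ratio = static_cast<float>(source_width - 1) / dest_width;
    r.y_ratio = static_cast<float>(source_height - 1) / dest_height;
    return r;
}
```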

int id_y = get_global_id(1);
int id_z = get_global_id(2);

int xc = id_x - dest_width / 2;
Contributor

use >>1 instead of /2
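
A small C++ sketch of the two forms, assuming `dest_width` is unsigned as in the kernel signature quoted elsewhere in this review. The helper names are hypothetical. For unsigned operands the shift and the division give the same result, and many compilers already emit a shift for an unsigned divide by two, so the gain may be small. For signed negative values the two are not equivalent, so the substitution should stay limited to unsigned sizes.

```cpp
#include <cassert>
#include <climits>

// Division form, as in the diff: id_x - dest_width / 2.
inline int center_offset_div(int id_x, unsigned int dest_width) {
    assert(dest_width <= static_cast<unsigned int>(INT_MAX));
    return id_x - static_cast<int>(dest_width / 2);
}

// Shift form: identical result for an unsigned dest_width.
inline int center_offset_shift(int id_x, unsigned int dest_width) {
    assert(dest_width <= static_cast<unsigned int>(INT_MAX));
    return id_x - static_cast<int>(dest_width >> 1);
}
```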

color_twist_host(srcPtr, batch_srcSizeMax[batchCount], dstPtr, alpha, beta, hueShift, saturationFactor, chnFormat, channel);
color_twist_host(srcPtrImage, batch_srcSizeMax[batchCount], dstPtrImage, alpha, beta, hueShift, saturationFactor, chnFormat, channel);

if (outputFormatToggle == 1)
Contributor

this looks very inefficient. Need to revisit

xG = _mm_loadu_ps(srcPtrTempG);
xB = _mm_loadu_ps(srcPtrTempB);

xR = _mm_div_ps(xR, pFactor);
Contributor

please use mulps instead. True for all constant divisors
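
For reference, a minimal SSE sketch of the change being requested. The helpers `scale4_div` and `scale4_mul` are hypothetical and process a single group of four floats; `factor` stands in for the constant behind `pFactor` in the diff. In real code the reciprocal vector would be built once outside the pixel loop.

```cpp
#include <cassert>
#include <xmmintrin.h>

// As in the diff: divide four floats by a constant factor (divps).
inline void scale4_div(const float* src, float* dst, float factor) {
    assert(factor != 0.0f);
    const __m128 pFactor = _mm_set1_ps(factor);
    const __m128 x = _mm_loadu_ps(src);
    _mm_storeu_ps(dst, _mm_div_ps(x, pFactor));
}

// Suggested form: broadcast the reciprocal and multiply (mulps).
inline void scale4_mul(const float* src, float* dst, float factor) {
    assert(factor != 0.0f);
    const __m128 pInvFactor = _mm_set1_ps(1.0f / factor);
    const __m128 x = _mm_loadu_ps(src);
    _mm_storeu_ps(dst, _mm_mul_ps(x, pInvFactor));
}
```

The results can differ from the division by a rounding error in the last bit, which is usually acceptable for image data but worth keeping in mind for exact-match tests.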

Collaborator

@kiritigowda left a comment

@rrawther let me know if this is good to merge. It LGTM.

@rrawther
Contributor

@kiritigowda: Pavel found some issues with the GPU flow. Waiting on the status of those before merging.

Collaborator

Codacy: Here is an overview of what got changed by this pull request:

Issues
======
+ Solved 1
- Added 6
           

Complexity increasing per file
==============================
- utilities/rpp-unittests/SOW3_HOST/tensorDifference.py  1
         

Clones added
============
- utilities/rpp-unittests/OCL/BatchPD_ocl_pkd3.cpp  63
- utilities/rpp-unittests/SOW3_OCL/BatchPD_ocl_pkd3.cpp  24
- src/modules/cl/cl_declarations.hpp  1
- utilities/rpp-unittests/SOW3_HOST/BatchPD_host_pkd3.cpp  22
- utilities/rpp-unittests/HOST/BatchPD_host_pkd3.cpp  101
- utilities/rpp-unittests/HOST/Single_host.cpp  4
- utilities/rpp-unittests/SOW3_HOST/BatchPD_host_pln1.cpp  22
- src/modules/cl/cl_fused_functions.cpp  3
- utilities/rpp-unittests/SOW3_OCL/BatchPD_ocl_pln1.cpp  23
- utilities/rpp-unittests/SOW3_HOST/BatchPD_host_pln3.cpp  24
- utilities/rpp-unittests/SOW3_OCL/BatchPD_ocl_pln3.cpp  25
- utilities/rpp-unittests/HIP/Single_hip.cpp  9
- src/include/cpu/rpp_cpu_common.hpp  12
- utilities/rpp-unittests/HOST/BatchPD_host_pln1.cpp  102
- utilities/rpp-unittests/HIP/BatchPD_hip.cpp  8
- src/modules/cl/cl_color_model_conversions.cpp  1
- utilities/rpp-unittests/OCL/Single_ocl.cpp  6
- include/rppi_fused_functions.h  2
- src/modules/cl/cl_geometry_transforms.cpp  19
         

See the complete overview on Codacy

@kiritigowda changed the title from "RPP float, int and tensor support, required for RALI-SOW3" to "RPP - float, int & tensor support: required for RALI-SOW3" on Aug 25, 2020
@kiritigowda merged commit 35d5480 into ROCm:master on Aug 25, 2020
fiona-gladwin pushed a commit to fiona-gladwin/rpp that referenced this pull request Dec 13, 2021
Resize Bilinear interpolation - Tensor support

Labels

enhancement New feature or request

7 participants