Color Twist - Optimizations in sse/avx2 for tensor implementation#81
Color Twist - Optimizations in sse/avx2 for tensor implementation#81kiritigowda merged 22 commits intoROCm:masterfrom
Conversation
…into ar/opt_color_twist_host
| x3 = _mm_shuffle_ps(x3,x3, _MM_SHUFFLE(0,3,2,1)); | ||
|
|
||
| // Un-normalize | ||
| #if 0 |
There was a problem hiding this comment.
please remove the commented code.
| dstPtrChannel = dstPtrImage; | ||
|
|
||
| #if __AVX2__ | ||
| Rpp32u alignedLength = bufferLength & ~23; |
There was a problem hiding this comment.
this logic only works for alignment which are power of 2. Else you need to do x = x/align *align.
applies to all such math
There was a problem hiding this comment.
Done for all occurrences of 47 or 23 in tensor_augmentations hpp.
| // RpptRoiType roiType, | ||
| // rppHandle_t rppHandle) | ||
| // { | ||
| // #ifdef OCL_COMPILE |
There was a problem hiding this comment.
I'll remove all the OCL_COMPILE for all the functions developed in a separate PR along with one other common API level change.
rrawther
left a comment
There was a problem hiding this comment.
please address the review comments
|
@rrawther @r-abishek There are potential issues with this PR. I have contacted Abishek regarding them. I will get the images that look wrong and share them via email. |
…k/rpp into ar/opt_color_twist_host
| __m128 p[4]; | ||
| rpp_simd_load(rpp_load12_f32pkd3_to_f32pln3, srcPtrTemp_ps, p); // simd loads | ||
| compute_color_twist_12_host(p[0], p[1], p[2], pColorTwistParams); // color_twist adjustment | ||
| rpp_simd_store(rpp_store12_f32pln3_to_f32pkd3, dstPtrTemp_ps, p); // simd stores |
There was a problem hiding this comment.
We need to convert the pixels to fp16 and store in the SIMD code to avoid extra conversion at the end. Applies to similar code below
There was a problem hiding this comment.
@rrawther I am unable to use the _mm256_loadu_ph() that directly loads half precision pixels since its under AVX512 instructions -> https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_loadu_&ig_expand=7353,340,383,4325,4324
There was a problem hiding this comment.
Or the 128 vector length _mm_loadu_ph() seems to be under "CPUID Flags: AVX512_FP16 + AVX512VL" too
| // roiTypeSrc = RpptRoiType::LTRB; | ||
| // roiTypeDst = RpptRoiType::LTRB; | ||
| // Uncomment to run test case with an xywhROI override | ||
| /*for (i = 0; i < images; i++) |
There was a problem hiding this comment.
Please avoid commented code for enabling/disabling features. Instead use #deifnes or command_line arguments
rrawther
left a comment
There was a problem hiding this comment.
@r-abishek @paveltc : Are all the tests pass using this PR?
|
@rrawther @r-abishek I ran the tests and I don't see the issues. |
|
@kiritigowda: this is ready to merge |
@rrawther This PR adds the new color_twist tensor based implementation.