Skip to content

Color Twist - Optimizations in sse/avx2 for tensor implementation#81

Merged
kiritigowda merged 22 commits intoROCm:masterfrom
r-abishek:ar/opt_color_twist_host
Dec 6, 2021
Merged

Color Twist - Optimizations in sse/avx2 for tensor implementation#81
kiritigowda merged 22 commits intoROCm:masterfrom
r-abishek:ar/opt_color_twist_host

Conversation

@r-abishek
Copy link
Copy Markdown
Member

  • Adds color_twist optimizations in sse and avx2 for host
  • Uses a tensor based implementation
  • Support for NCHW<->NHWC, U8/F16/F32/I8
  • Adds the corresponding unit tests and performance tests

@rrawther This PR adds the new color_twist tensor based implementation.

x3 = _mm_shuffle_ps(x3,x3, _MM_SHUFFLE(0,3,2,1));

// Un-normalize
#if 0
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove the commented code.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

dstPtrChannel = dstPtrImage;

#if __AVX2__
Rpp32u alignedLength = bufferLength & ~23;
Copy link
Copy Markdown
Contributor

@rrawther rrawther Nov 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this logic only works for alignment which are power of 2. Else you need to do x = x/align *align.
applies to all such math

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done for all occurrences of 47 or 23 in tensor_augmentations hpp.

// RpptRoiType roiType,
// rppHandle_t rppHandle)
// {
// #ifdef OCL_COMPILE
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this commented?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove all the OCL_COMPILE for all the functions developed in a separate PR along with one other common API level change.

Copy link
Copy Markdown
Contributor

@rrawther rrawther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please address the review comments

@kiritigowda kiritigowda added the enhancement New feature or request label Dec 2, 2021
Copy link
Copy Markdown
Contributor

@rrawther rrawther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paveltc to confirm all the tests are passing for this PR

@paveltc
Copy link
Copy Markdown
Contributor

paveltc commented Dec 3, 2021

@rrawther @r-abishek There are potential issues with this PR. I have contacted Abishek regarding them. I will get the images that look wrong and share them via email.

@kiritigowda kiritigowda requested a review from rrawther December 6, 2021 15:34
__m128 p[4];
rpp_simd_load(rpp_load12_f32pkd3_to_f32pln3, srcPtrTemp_ps, p); // simd loads
compute_color_twist_12_host(p[0], p[1], p[2], pColorTwistParams); // color_twist adjustment
rpp_simd_store(rpp_store12_f32pln3_to_f32pkd3, dstPtrTemp_ps, p); // simd stores
Copy link
Copy Markdown
Contributor

@rrawther rrawther Dec 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to convert the pixels to fp16 and store in the SIMD code to avoid extra conversion at the end. Applies to similar code below

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rrawther I am unable to use the _mm256_loadu_ph() that directly loads half precision pixels since its under AVX512 instructions -> https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_loadu_&ig_expand=7353,340,383,4325,4324

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or the 128 vector length _mm_loadu_ph() seems to be under "CPUID Flags: AVX512_FP16 + AVX512VL" too

// roiTypeSrc = RpptRoiType::LTRB;
// roiTypeDst = RpptRoiType::LTRB;
// Uncomment to run test case with an xywhROI override
/*for (i = 0; i < images; i++)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid commented code for enabling/disabling features. Instead use #deifnes or command_line arguments

Copy link
Copy Markdown
Contributor

@rrawther rrawther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@r-abishek @paveltc : Are all the tests pass using this PR?

@r-abishek
Copy link
Copy Markdown
Member Author

@rrawther I've increased dst static allocation by 1 due to the writes. This should fix the issue. @paveltc Please let me know if all the issues are fixed.

@paveltc
Copy link
Copy Markdown
Contributor

paveltc commented Dec 6, 2021

@rrawther @r-abishek I ran the tests and I don't see the issues.

@r-abishek
Copy link
Copy Markdown
Member Author

@paveltc Thanks for verifying, @rrawther I have added the minor fix here - bb0252d

@rrawther
Copy link
Copy Markdown
Contributor

rrawther commented Dec 6, 2021

@kiritigowda: this is ready to merge

@kiritigowda kiritigowda merged commit 10aaf5b into ROCm:master Dec 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants