Reduce LLC loads, stores and multiplies on MulTransposed - 8% faster by ChipKerchner · Pull Request #16375 · opencv/opencv

ChipKerchner · 2020-01-17T13:11:42Z

Reduce LLC (last-level cache) loads, stores and multiplies by 2x in MulTransposed; 8% faster.

ChipKerchner · 2020-01-17T15:29:41Z

The last failure doesn't seem to be related to my check-in.

alalek · 2020-01-17T15:36:26Z

modules/core/src/matmul.simd.hpp

-                double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
-                const sT *tsrc = src + j;
+#if CV_SIMD_64F
+                if (is_same<sT, double>::value && is_same<dT, double>::value)


Please try type traits from OpenCV:

DataType<sT>::depth == CV_64F && DataType<dT>::depth == CV_64F

alalek · 2020-01-20T12:25:05Z

/cc @terfendail BTW, runtime dispatching for this function is blocked by CV_MULTRANSPOSED_BASELINE_ONLY, so only its baseline build is used.

terfendail · 2020-01-23T13:44:28Z

The change looks fine to me.
However, I can't reproduce the performance gain. There is no dedicated performance test for this function, so I used the run time of the accuracy test instead. With an SSE3 baseline I got a gain of about 3 per cent; with an SSE4_2 baseline I got a degradation of 2-3 per cent. Both results look like random fluctuation rather than a stable performance change.

ChipKerchner · 2020-01-23T13:50:18Z

The performance gains were measured on a Power9 VSX system. It is possible that the gains only show up on non-Intel platforms, since memory-access stalls behave differently on this platform.

ChipKerchner added 3 commits January 17, 2020 07:10

Reduce LLC loads, stores and multiplies on MulTransposed - 8% faster

129419b

Add is_same method so c++11 is not required

0fb3f0d

Remove trailing whitespaces.

1bd2e84

alalek reviewed Jan 17, 2020


Change is_same to DataType depth check

4f0c09b

alalek assigned terfendail Jan 23, 2020

alalek merged commit 4d2da2d into opencv:3.4 Jan 24, 2020

ChipKerchner deleted the vectorizeMultTranspose branch January 27, 2020 15:53

alalek mentioned this pull request Jan 28, 2020: Merge 3.4 #16450 (merged)

