Reduce LLC loads, stores and multiplies on MulTransposed - 8% faster #16375
alalek merged 4 commits into opencv:3.4 from ChipKerchner:vectorizeMultTranspose
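The title says the change reduces loads, stores and multiplies in MulTransposed, but this thread does not show exactly how. For context, `cv::mulTransposed` computes scale·(src − delta)^T·(src − delta) (or the src·src^T form), and that result is symmetric. The following is a hypothetical scalar illustration, not the PR's code: `mulTransposedATA` is an illustrative name, and the sketch assumes a row-major `double` matrix, the aTa case, and no `delta`/`scale`. It computes only the upper triangle and mirrors it, which roughly halves the multiplies, and it keeps four independent accumulators per pass, in the spirit of `s0..s3` in the `matmul.simd.hpp` hunk of this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not OpenCV's implementation: dst = src^T * src for a
// row-major rows x cols matrix of doubles (the aTa case of cv::mulTransposed,
// without the delta and scale arguments). dst is cols x cols and symmetric.
std::vector<double> mulTransposedATA(const std::vector<double>& src,
                                     std::size_t rows, std::size_t cols)
{
    assert(src.size() == rows * cols);
    std::vector<double> dst(cols * cols, 0.0);
    for (std::size_t i = 0; i < cols; ++i)
    {
        std::size_t j = i;  // start at the diagonal: upper triangle only
        // Four output columns per pass with independent accumulators, so each
        // loaded src(k, i) value feeds four multiply-adds.
        for (; j + 4 <= cols; j += 4)
        {
            double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
            for (std::size_t k = 0; k < rows; ++k)
            {
                const double a = src[k * cols + i];
                const double* tsrc = &src[k * cols + j];
                s0 += a * tsrc[0];
                s1 += a * tsrc[1];
                s2 += a * tsrc[2];
                s3 += a * tsrc[3];
            }
            double* drow = &dst[i * cols + j];
            drow[0] = s0;
            drow[1] = s1;
            drow[2] = s2;
            drow[3] = s3;
        }
        // Leftover upper-triangle columns of row i.
        for (; j < cols; ++j)
        {
            double s = 0;
            for (std::size_t k = 0; k < rows; ++k)
                s += src[k * cols + i] * src[k * cols + j];
            dst[i * cols + j] = s;
        }
    }
    // Mirror the upper triangle into the lower triangle (no extra multiplies).
    for (std::size_t i = 1; i < cols; ++i)
        for (std::size_t j = 0; j < i; ++j)
            dst[i * cols + j] = dst[j * cols + i];
    return dst;
}
```

Mirroring trades a cheap extra pass over `dst` for roughly half the multiply-adds; how much that helps in practice depends on the platform's memory behavior.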
The last failure doesn't seem to be related to my check-in.
modules/core/src/matmul.simd.hpp (outdated)
double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
const sT *tsrc = src + j;
#if CV_SIMD_64F
if (is_same<sT, double>::value && is_same<dT, double>::value)
Please try type traits from OpenCV:
DataType<sT>::depth == CV_64F && DataType<dT>::depth == CV_64F
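The suggestion swaps `std::is_same` for OpenCV's `DataType<T>::depth`. To keep this sketch compilable without OpenCV, `DataTypeSketch`, `SKETCH_32F` and `SKETCH_64F` are hypothetical stand-ins that mimic the shape of that trait; real OpenCV code would use `cv::DataType<T>::depth` and `CV_64F` directly. Both checks are compile-time constants and agree for `float` and `double`.

```cpp
#include <type_traits>

// Hypothetical stand-ins so this compiles without OpenCV. They mimic the shape
// of cv::DataType<T>::depth; in OpenCV, CV_32F is 5 and CV_64F is 6.
constexpr int SKETCH_32F = 5;
constexpr int SKETCH_64F = 6;

template<typename T> struct DataTypeSketch;  // unsupported types fail to compile
template<> struct DataTypeSketch<float>  { static constexpr int depth = SKETCH_32F; };
template<> struct DataTypeSketch<double> { static constexpr int depth = SKETCH_64F; };

// Reviewer's suggested form: compare the depth codes of both types.
template<typename sT, typename dT>
constexpr bool takesDoublePath()
{
    return DataTypeSketch<sT>::depth == SKETCH_64F &&
           DataTypeSketch<dT>::depth == SKETCH_64F;
}

// Form used in the original diff, via the standard library.
template<typename sT, typename dT>
constexpr bool takesDoublePathStd()
{
    return std::is_same<sT, double>::value && std::is_same<dT, double>::value;
}

// Compile-time usage check: only double -> double selects the 64F path.
static_assert(takesDoublePath<double, double>(), "double -> double uses the 64F path");
static_assert(!takesDoublePath<float, double>(), "float input does not");
```

With a plain `if` on either condition (as in the diff), both branches must still compile for every `sT`/`dT` instantiation; the constant condition only lets the optimizer discard the dead branch.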
/cc @terfendail BTW, dispatching is blocked by
The change looks fine to me.
The performance gains were measured on a Power9 VSX system. They may show up only on non-Intel platforms, because this platform has different memory-access stalls.
Reduce LLC loads, stores and multiplies by 2x on MulTransposed - 8% faster