Vectorize flipHoriz and flipVert functions. #15555
alalek merged 6 commits into opencv:3.4 from ChipKerchner:flipVectorize
Need someone to review the NEON code.
modules/core/src/copy.cpp
Outdated
```cpp
#elif CV_VSX
    static const vec_uchar16 perm = {15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0};
    vec_uchar16 vec = vsx_ld(0, ptr);
    return v_uint8x16(vec_perm(vec, vec, perm));
```
vec_revb(__int128) should do this in a single instruction on POWER9 (P9). It may also help reduce register pressure in tight kernels.
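For readers unfamiliar with VSX: the `perm` table above makes `vec_perm` take source byte 15 for output byte 0, byte 14 for output byte 1, and so on, i.e. a full 16-byte reversal. The portable sketch below emulates that single-input case in plain C++ so the effect can be checked without POWER hardware. `Bytes16`, `perm_bytes`, and `reverse_perm` are hypothetical names for illustration, not AltiVec or OpenCV APIs.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical 16-byte "vector" used only for illustration.
using Bytes16 = std::array<uint8_t, 16>;

// Emulates vec_perm(v, v, perm) when both inputs are the same vector.
// vec_perm picks each output byte from the 32-byte concatenation of its two
// inputs using the low 5 bits of the control byte; with identical inputs,
// only the low 4 bits matter.
inline Bytes16 perm_bytes(const Bytes16& v, const Bytes16& perm)
{
    Bytes16 out{};
    for (int k = 0; k < 16; ++k)
        out[k] = v[perm[k] & 15];
    return out;
}

// The reversal table from the review snippet: {15, 14, ..., 1, 0}.
inline Bytes16 reverse_perm()
{
    Bytes16 p{};
    for (int k = 0; k < 16; ++k)
        p[k] = static_cast<uint8_t>(15 - k);
    return p;
}
```

Per the comment above, vec_revb on a vector __int128 should yield the same byte order without having to keep the `perm` constant in a register.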
Please also check these CI results: https://ocv-power.imavr.com/#/opencv_pullrequests (the PowerPC builds are failing).
I don't get why these
I've confirmed that the current HEAD (6fedc7d) passes on various Arm boards. However, I still recommend putting
I've prepared a change for a universal intrinsic
Could this PR be looked at and approved now that v_reverse works for all vector platforms? |
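The idea behind a v_reverse-based flipHoriz can be sketched in portable C++: load a block of lanes from the left part of the row, reverse the lane order, store it at the mirrored position on the right, and finish with a scalar tail. The sketch assumes 1-byte elements and a 16-lane block; `Lanes`, `load_lanes`, `store_lanes`, `reverse_lanes`, and `flip_row_u8` are hypothetical helpers standing in for OpenCV's `v_load`, `v_reverse`, and `v_store`, not the code merged in this PR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kLanes = 16;           // stands in for v_uint8::nlanes
using Lanes = std::array<uint8_t, kLanes>;   // stands in for a v_uint8 register

// memcpy-based load/store: no alignment requirement on p.
inline Lanes load_lanes(const uint8_t* p) { Lanes v; std::memcpy(v.data(), p, kLanes); return v; }
inline void store_lanes(uint8_t* p, const Lanes& v) { std::memcpy(p, v.data(), kLanes); }

// Portable stand-in for v_reverse: lane k of the result is lane (N-1-k) of the input.
inline Lanes reverse_lanes(const Lanes& v)
{
    Lanes r;
    for (std::size_t k = 0; k < kLanes; ++k)
        r[k] = v[kLanes - 1 - k];
    return r;
}

// Horizontally mirrors one row of `width` bytes from src into dst (buffers must not overlap).
inline void flip_row_u8(const uint8_t* src, uint8_t* dst, std::size_t width)
{
    std::size_t i = 0;
    // Vector part: src block [i, i+N) lands reversed at dst [width-i-N, width-i).
    for (; i + kLanes <= width; i += kLanes)
        store_lanes(dst + width - i - kLanes, reverse_lanes(load_lanes(src + i)));
    // Scalar tail for the remaining width % N bytes.
    for (; i < width; ++i)
        dst[width - 1 - i] = src[i];
}
```

Wider element types would reverse lanes of the matching width (e.g. 4-byte lanes for 32-bit pixels), so that bytes inside each element keep their order.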
modules/core/src/copy.cpp
Outdated
```cpp
#elif CV_VSX
    static const vec_uchar16 perm = {8, 9, 10, 11, 12, 13, 14, 15, 0, 1, 2, 3, 4, 5, 6, 7};
    vec_uchar16 vec = vsx_ld(0, ptr);
    return v_uint8x16(vec_perm(vec, vec, perm));
```
Is this the same as xxswapd (e.g. vec_xxpermdi(vec, vec, 2))?
Outdated code.
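The question above can be checked on paper: the table {8, ..., 15, 0, ..., 7} keeps byte order within each 8-byte half and only exchanges the two halves, which is what a doubleword swap does. The portable C++ sketch below compares the two on a 16-byte value; `permute_halves_bytes` and `swap_doublewords` are hypothetical helpers, not VSX intrinsics.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using Bytes16 = std::array<uint8_t, 16>;

// Byte-level permutation from the snippet: perm[k] = 8..15 for k < 8, 0..7 for k >= 8.
inline Bytes16 permute_halves_bytes(const Bytes16& v)
{
    Bytes16 out{};
    for (int k = 0; k < 16; ++k)
        out[k] = v[(k + 8) % 16];
    return out;
}

// Doubleword swap (the xxswapd effect): exchange the two 64-bit halves as units.
inline Bytes16 swap_doublewords(const Bytes16& v)
{
    uint64_t lo, hi;
    std::memcpy(&lo, v.data(), 8);
    std::memcpy(&hi, v.data() + 8, 8);
    Bytes16 out{};
    std::memcpy(out.data(), &hi, 8);
    std::memcpy(out.data() + 8, &lo, 8);
    return out;
}
```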
modules/core/src/copy.cpp
Outdated
```cpp
T t0, t1;

t0 = *((T*)((uchar*)src + i));
t1 = *((T*)((uchar*)src + j - sizeof(T)));
```
Be careful: at least on ARM, these T* pointers are required to be aligned (see #14710).
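A common portable way to respect that constraint is to read and write through `std::memcpy` instead of dereferencing a cast `T*` pointer. The sketch below is an assumption about how such a scalar path could look, not the fix adopted in this PR; `load_unaligned`, `store_unaligned`, and `flip_row_elems` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Reads a T from a possibly misaligned address. Unlike *(T*)p, std::memcpy has
// no alignment requirement; compilers typically lower it to a plain load on
// targets that allow unaligned access.
template <typename T>
inline T load_unaligned(const void* p)
{
    T v;
    std::memcpy(&v, p, sizeof(T));
    return v;
}

template <typename T>
inline void store_unaligned(void* p, const T& v)
{
    std::memcpy(p, &v, sizeof(T));
}

// Mirrors a row of `width` bytes made of sizeof(T)-byte elements, reading from
// both ends at offsets i and j - sizeof(T) as in the snippet above.
// Assumes width % sizeof(T) == 0 and that src and dst do not overlap.
template <typename T>
inline void flip_row_elems(const uint8_t* src, uint8_t* dst, std::size_t width)
{
    for (std::size_t i = 0, j = width; i < j; i += sizeof(T), j -= sizeof(T))
    {
        T t0 = load_unaligned<T>(src + i);
        T t1 = load_unaligned<T>(src + j - sizeof(T));
        store_unaligned(dst + i, t1);
        store_unaligned(dst + j - sizeof(T), t0);
    }
}
```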
modules/core/src/copy.cpp
Outdated
```cpp
if( ((size_t)src0|(size_t)dst0|(size_t)src1|(size_t)dst1) % sizeof(int) == 0 )
{
#if CV_SIMD
    for( ; i <= size.width - (v_int32::nlanes * 4); i += v_int32::nlanes * 4 )
```
The SIMD code doesn't require the alignment check above, so it can be placed before the `if`.
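Following that suggestion, a vertical flip can run its block loop unconditionally and leave the alignment test only for the scalar remainder. The portable sketch below illustrates that ordering; `memcpy` stands in for unaligned SIMD loads and stores, and `flip_row_pair`, `flip_vert`, and the 16-byte block size are assumptions for illustration rather than the PR's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kBlock = 16;  // stands in for one SIMD register of bytes

// Copies src0 -> dst1 and src1 -> dst0 for one pair of rows, as a vertical flip does.
// The block loop uses alignment-free (memcpy) accesses, so it needs no alignment test.
inline void flip_row_pair(const uint8_t* src0, const uint8_t* src1,
                          uint8_t* dst0, uint8_t* dst1, std::size_t width)
{
    std::size_t i = 0;
    for (; i + kBlock <= width; i += kBlock)
    {
        uint8_t a[kBlock], b[kBlock];  // load both blocks before storing (safe in place)
        std::memcpy(a, src0 + i, kBlock);
        std::memcpy(b, src1 + i, kBlock);
        std::memcpy(dst1 + i, a, kBlock);
        std::memcpy(dst0 + i, b, kBlock);
    }
    // Tail: plain byte loop; a real implementation might use int-sized copies
    // here when all four pointers are suitably aligned.
    for (; i < width; ++i)
    {
        uint8_t a = src0[i], b = src1[i];
        dst1[i] = a;
        dst0[i] = b;
    }
}

// Vertically flips a rows x width byte image whose row stride equals width.
inline void flip_vert(const uint8_t* src, uint8_t* dst, std::size_t rows, std::size_t width)
{
    for (std::size_t y = 0; y < (rows + 1) / 2; ++y)
    {
        std::size_t y2 = rows - 1 - y;
        flip_row_pair(src + y * width, src + y2 * width,
                      dst + y * width, dst + y2 * width, width);
    }
}
```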
It looks like there is a problem with unaligned v_load on the ARM platform. It can be reproduced by running this test case:
I did some quick experiments: https://github.com/tomoaki0705/unalignedLoad
As @mshabunin showed, the error occurs when the load happens with
Also, it happens on Armv7 (32-bit) but not on Armv8 (64-bit).
Here's a summarized table of the results:
So, when loading from an unaligned address using v_uint64x2 on Armv7,
How to fix it is a difficult question.
Vectorize flipHoriz and flipVert functions.
flipVert - 2x faster on VSX and 1.5x on x86
flipHoriz - up to 40x+ faster on VSX and x86.
Also improved for NEON.