Vectorize flipHoriz and flipVert functions. by ChipKerchner · Pull Request #15555 · opencv/opencv

ChipKerchner · 2019-09-20T17:01:21Z

Vectorize flipHoriz and flipVert functions.

flipVert - 2x faster on VSX and 1.5x faster on x86.
flipHoriz - up to 40x+ faster on VSX and x86.

Also improved for NEON.

force_builders=Linux AVX2,Custom
buildworker:Custom=linux-3
build_image:Custom=ubuntu:18.04
CPU_BASELINE:Custom=AVX512_SKX
disable_ipp=ON

ChipKerchner · 2019-09-20T17:22:24Z

Need someone to review the NEON code.

pmur · 2019-09-20T17:15:58Z

modules/core/src/copy.cpp

+#elif CV_VSX
+    static const vec_uchar16 perm = {15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0};
+    vec_uchar16 vec = vsx_ld(0, ptr);
+    return v_uint8x16(vec_perm(vec, vec, perm));


vec_revb(__int128) should do this in one instruction on P9. It may help reduce register pressure in tight kernels.
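For readers unfamiliar with the permute mask in the diff above, here is a minimal scalar sketch of what a byte permute with the mask {15, 14, ..., 0} computes. It is a portable model only: `Bytes16`, `permuteBytes` and `reverseBytes` are hypothetical names, no VSX intrinsics are used, and endianness details of the real instruction are ignored.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes16 = std::array<std::uint8_t, 16>;  // stand-in for a 16-byte vector register

// Scalar model of a byte permute whose two inputs are the same vector:
// output byte k is input byte perm[k].
inline Bytes16 permuteBytes(const Bytes16& v, const Bytes16& perm)
{
    Bytes16 out{};
    for (std::size_t k = 0; k < out.size(); ++k)
    {
        assert(perm[k] < 16);  // with identical inputs only indices 0..15 are meaningful
        out[k] = v[perm[k]];
    }
    return out;
}

// Full byte reversal, i.e. the descending {15, 14, ..., 0} mask from the diff.
inline Bytes16 reverseBytes(const Bytes16& v)
{
    Bytes16 perm{};
    for (std::size_t k = 0; k < perm.size(); ++k)
        perm[k] = static_cast<std::uint8_t>(15 - k);
    return permuteBytes(v, perm);
}
```

Applying `reverseBytes` twice returns the original bytes, which is a quick sanity check for any replacement instruction.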

alalek · 2019-09-24T15:32:20Z

Please also check these CI results: https://ocv-power.imavr.com/#/opencv_pullrequests (the PowerPC builds have failed)

tomoaki0705 · 2019-09-28T00:18:25Z

I don't see why this v_load_mirror series isn't placed in the universal intrinsics.
That's the place to write such wrapper functions.

tomoaki0705 · 2019-10-07T20:46:27Z

I've confirmed that current HEAD ( 6fedc7d ) passes on various Arm boards.
NEON code works fine.

That said, I still recommend putting the v_load_mirror series into the universal intrinsics.

ChipKerchner · 2019-10-08T11:39:19Z

I've prepared a change that adds a universal intrinsic, v_reverse, which addresses the above requests: #15662
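To illustrate how a lane-reversing operation fits a horizontal flip, here is a rough scalar sketch. It does not use OpenCV: `Block`, `kLanes`, `reverseLanes`, `loadBlock`, `storeBlock` and `flipRow` are hypothetical stand-ins for a SIMD register, its lane count, and the load/reverse/store steps, and the actual code in copy.cpp may be organized differently.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kLanes = 4;        // hypothetical lane count of the "register"
using Block = std::array<int, kLanes>;   // stand-in for a SIMD register

// Model of a lane-reversing operation: lane k of the result is lane (kLanes-1-k) of the input.
inline Block reverseLanes(Block b)
{
    std::reverse(b.begin(), b.end());
    return b;
}

inline Block loadBlock(const int* p)
{
    Block b{};
    std::copy(p, p + kLanes, b.begin());
    return b;
}

inline void storeBlock(int* p, const Block& b)
{
    std::copy(b.begin(), b.end(), p);
}

// Flip one row in place: swap reversed blocks from both ends, then finish with scalar swaps.
inline void flipRow(std::vector<int>& row)
{
    std::size_t i = 0, j = row.size();
    while (j - i >= 2 * kLanes)          // two disjoint blocks are still available
    {
        Block left  = loadBlock(row.data() + i);
        Block right = loadBlock(row.data() + j - kLanes);
        storeBlock(row.data() + i, reverseLanes(right));
        storeBlock(row.data() + j - kLanes, reverseLanes(left));
        i += kLanes;
        j -= kLanes;
        assert(i <= j);                  // the two cursors never cross
    }
    while (i + 1 < j)                    // scalar tail for the remaining middle elements
    {
        std::swap(row[i], row[j - 1]);
        ++i;
        --j;
    }
}
```

Both blocks are loaded before either store, so the in-place case stays correct as long as the two blocks do not overlap.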

ChipKerchner · 2019-10-22T12:59:27Z

Could this PR be looked at and approved now that v_reverse works for all vector platforms?

pmur · 2019-10-22T13:44:28Z

modules/core/src/copy.cpp

+#elif CV_VSX
+    static const vec_uchar16 perm = {8, 9, 10, 11, 12, 13, 14, 15, 0, 1, 2, 3, 4, 5, 6, 7};
+    vec_uchar16 vec = vsx_ld(0, ptr);
+    return v_uint8x16(vec_perm(vec, vec, perm));


Is this the same as xxswapd (e.g. vec_xxpermdi(vec, vec, 2))?

Outdated code.

alalek

Thank you!

alalek · 2019-10-29T18:58:49Z

modules/core/src/copy.cpp

+            T t0, t1;
+
+            t0 = *((T*)((uchar*)src + i));
+            t1 = *((T*)((uchar*)src + j - sizeof(T)));


Be careful.
At least on ARM these T* pointers are required to be aligned - see #14710
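One common way to avoid dereferencing a possibly misaligned `T*` is to copy through `std::memcpy`, which compilers typically lower to a plain load or store where the target allows it. The sketch below shows that pattern for the swap in the diff; `loadUnaligned`, `storeUnaligned` and `flipRowElements` are hypothetical names, and this is not necessarily how the PR resolved the issue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Read a T from an arbitrary byte address without assuming alignment.
template <typename T>
inline T loadUnaligned(const unsigned char* p)
{
    static_assert(std::is_trivially_copyable<T>::value, "memcpy requires a trivially copyable type");
    T v;
    std::memcpy(&v, p, sizeof(T));
    return v;
}

// Write a T to an arbitrary byte address without assuming alignment.
template <typename T>
inline void storeUnaligned(unsigned char* p, const T& v)
{
    static_assert(std::is_trivially_copyable<T>::value, "memcpy requires a trivially copyable type");
    std::memcpy(p, &v, sizeof(T));
}

// Reverse the order of the T-sized elements of one row, swapping from both ends.
template <typename T>
inline void flipRowElements(unsigned char* row, std::size_t rowBytes)
{
    assert(rowBytes % sizeof(T) == 0);   // the row must hold a whole number of elements
    std::size_t i = 0, j = rowBytes;
    while (i + sizeof(T) < j)            // at least two distinct elements remain
    {
        T t0 = loadUnaligned<T>(row + i);
        T t1 = loadUnaligned<T>(row + j - sizeof(T));
        storeUnaligned(row + i, t1);
        storeUnaligned(row + j - sizeof(T), t0);
        i += sizeof(T);
        j -= sizeof(T);
    }
}
```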

alalek · 2019-10-29T19:02:07Z

modules/core/src/copy.cpp

        if( ((size_t)src0|(size_t)dst0|(size_t)src1|(size_t)dst1) % sizeof(int) == 0 )
        {
+#if CV_SIMD
+            for( ; i <= size.width - (v_int32::nlanes * 4); i += v_int32::nlanes * 4 )


The SIMD code doesn't require the alignment check above, so it can be placed before the if.

mshabunin · 2019-12-11T15:13:27Z

Looks like there is a problem with unaligned v_load on the ARM platform:

Program received signal SIGBUS, Bus error.
cv::hal_baseline::v_load (ptr=0x7effe83c) at ../opencv/modules/core/include/opencv2/core/hal/intrin_neon.hpp:1212
1212    ../opencv/modules/core/include/opencv2/core/hal/intrin_neon.hpp: No such file or directory.
(gdb) bt
#0  cv::hal_baseline::v_load (ptr=0x7effe83c) at ../opencv/modules/core/include/opencv2/core/hal/intrin_neon.hpp:1212
#1  0x76829e7e in cv::flipHoriz_single<cv::hal_baseline::v_uint64x2> (esz=8, size=..., dstep=32, dst=0x7effe83c "", 
    sstep=32, src=0x7effe83c "") at ../opencv/modules/core/src/copy.cpp:581
#2  cv::flipHoriz (src=0x7effe83c "", sstep=32, dst=0x7effe83c "", dstep=32, size=..., esz=8)
    at ../opencv/modules/core/src/copy.cpp:698
#3  0x7682b2f0 in cv::flip (_src=..., _dst=..., flip_mode=1) at ../opencv/modules/core/src/copy.cpp:977
#4  0x76bc2e72 in cv::intersectConvexConvex (_p1=..., _p2=..., _p12=..., handleNested=true)
    at ../opencv/modules/imgproc/src/geometry.cpp:530
#5  0x004ead06 in opencv_test::(anonymous namespace)::Imgproc_IntersectConvexConvex_intersection_1_Test::Body (
    this=0x8bb830) at ../opencv/modules/imgproc/test/test_intersectconvexconvex.cpp:161
#6  0x004eab5c in opencv_test::(anonymous namespace)::Imgproc_IntersectConvexConvex_intersection_1_Test::TestBody (
    this=0x8bb830) at ../opencv/modules/imgproc/test/test_intersectconvexconvex.cpp:146
<cut>

Can be reproduced by running this test case: ./bin/opencv_test_imgproc --gtest_filter=Imgproc_IntersectConvexConvex.intersection_1

tomoaki0705 · 2019-12-12T08:46:48Z

I did some quick experiments. https://github.com/tomoaki0705/unalignedLoad
@ChipKerchner is right. There is no "unaligned load instruction" in NEON.
Still, I found a corner case in the combination of Armv7 and 64-bit loads.

As @mshabunin showed, the error occurs when the load uses v_uint64x2:

#1  0x76829e7e in cv::flipHoriz_single<cv::hal_baseline::v_uint64x2>

Also, it happens on Armv7 (32-bit) but not on Armv8 (64-bit).

Here is a summary of the results:

| Platform | 32/64-bit | GCC | Loading structure | Memory address | Result |
|---|---|---|---|---|---|
| Jetson TX1 | 64 | 5.4.0 | `uint64x2` | unaligned | success |
| Jetson TX1 | 64 | 5.4.0 | `uint64x2` | aligned | success |
| ODROID-XU4 | 32 | 5.4.0 | `uint64x2` | unaligned | failure (SIGBUS) |
| ODROID-XU4 | 32 | 5.4.0 | `uint64x2` | aligned | success |
| ODROID-XU4 | 32 | 5.4.0 | `uint32x4` | unaligned | success |

So, loading from an unaligned address with v_uint64x2 on Armv7 raises SIGBUS.
I find this hard to believe, but that is what my experiments consistently show.

How to fix it is a difficult question.
Maybe replace v_uint64x2 with v_uint32x4?
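As one possible way to frame the choice, the sketch below picks an element width for typed access based on the addresses and row step involved. It is deliberately more conservative than the table above requires (it also falls back for 32-bit lanes on misaligned data), `isAlignedTo` and `chooseLaneBytes` are hypothetical helpers, and it is not presented as the fix that was eventually adopted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True when p is a multiple of `alignment`, which must be a power of two.
inline bool isAlignedTo(const void* p, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Pick the widest lane size (in bytes) for which every row start is suitably aligned.
// The step is checked as well because each row begins at base + y * step.
inline std::size_t chooseLaneBytes(const void* src, const void* dst, std::size_t step)
{
    if (isAlignedTo(src, 8) && isAlignedTo(dst, 8) && step % 8 == 0)
        return 8;   // 64-bit lanes are safe
    if (isAlignedTo(src, 4) && isAlignedTo(dst, 4) && step % 4 == 0)
        return 4;   // fall back to 32-bit lanes
    return 1;       // byte-wise path
}
```

With 8-byte elements and a 4-byte-aligned buffer this would select the 32-bit path, which matches the combination reported as working on Armv7.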

Vectorize flipHoriz and flipVert functions.

7770cac

pmur reviewed Sep 20, 2019

Change v_load_mirror_1 to use vec_revb for VSX

c115eb4

ChipKerchner added 2 commits September 24, 2019 11:13

Only use vec_revb in ISA3.0

85feed0

Removing vec_revb code since some of the older compilers don't fully support it.

6fedc7d

terfendail reviewed Sep 27, 2019

terfendail reviewed Sep 27, 2019

Use new v_reverse intrinsic and cleanup code.

d89623a

pmur reviewed Oct 22, 2019

alalek approved these changes Oct 29, 2019

alalek assigned terfendail Oct 29, 2019

Ensure there are no alignment issues with copies

f170d42

alalek unassigned terfendail Nov 1, 2019

alalek merged commit ed7e427 into opencv:3.4 Nov 1, 2019

alalek mentioned this pull request Nov 4, 2019

Merge 3.4 #15843

Merged

ChipKerchner deleted the flipVectorize branch November 5, 2019 17:54

alalek reviewed Dec 11, 2019

tomoaki0705 mentioned this pull request Dec 13, 2019

core: Workaround flip horiz #16152

Closed

catree mentioned this pull request Jan 31, 2020

NEP: universal SIMD NEP 38 numpy/numpy#15228

Merged
