Convert HOG from SSE SIMD to HAL - 35-45% faster on Power (VSX) by ChipKerchner · Pull Request #15199 · opencv/opencv

ChipKerchner · 2019-07-31T17:49:06Z

Convert HOG from SSE SIMD to HAL - 35-45% faster on Power (VSX).

force_builders=ARMv8,Custom
buildworker:Custom=linux-1,linux-2,linux-4
docker_image:Custom=powerpc64le

mshabunin · 2019-08-01T08:10:39Z

Shouldn't it be transformed differently?

Before:

#if CV_SSE2
...
#elif CV_NEON
...
#else
...
#endif

After:

#if CV_SIMD128 // or CV_SIMD for wide universal intrinsics
...
#else
...
#endif

ChipKerchner · 2019-08-01T11:36:01Z

I could remove the NEON-specific code and use CV_SIMD128 for all three platforms (SSE2, NEON, VSX) instead.

I just did NOT have a way of testing NEON.

alalek · 2019-08-01T13:07:53Z

Yes, this is the right way.
We will run the tests on NEON-capable hardware.

terfendail · 2019-08-01T20:50:05Z

modules/objdetect/src/hog.cpp

+
+            v_int32x4 sign = (ione & v_reinterpret_as_s32(_angle < fzero));
+            v_int32x4 _hidx = v_trunc(_angle);
+            _hidx -= sign;


I suppose the v_floor intrinsic could be used here to compute _hidx instead of lines 502-504.
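
To make the suggestion concrete, here is a minimal scalar sketch of what the three quoted lines compute per lane, next to a plain floor. The function names `trunc_minus_sign` and `floor_index` are hypothetical and do not appear in hog.cpp. This is a scalar model only, not the universal-intrinsics code itself.

```cpp
#include <cmath>
#include <cstdint>

// Scalar model of the three-line sequence: truncate toward zero,
// then subtract 1 in lanes where the angle is negative.
inline std::int32_t trunc_minus_sign(float angle) {
    std::int32_t sign = (angle < 0.0f) ? 1 : 0;            // ione & (angle < 0)
    std::int32_t hidx = static_cast<std::int32_t>(angle);  // truncation, as v_trunc does
    return hidx - sign;
}

// Scalar model of the suggested replacement: round toward negative infinity.
inline std::int32_t floor_index(float angle) {
    return static_cast<std::int32_t>(std::floor(angle));
}
```

The two agree for non-negative inputs and for negative non-integral inputs (for example, -0.5 gives -1 either way). They differ only when the angle is an exact negative integer: -2.0 gives -3 with truncate-and-subtract but -2 with floor. The thread does not discuss whether such values occur in practice.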

terfendail · 2019-08-02T07:58:33Z

modules/objdetect/src/hog.cpp

+            v_int32x4 mask0 = _hidx >> 31;
+            v_int32x4 it0 = mask0 & _nbins;
+            mask0 = (_hidx < _nbins);
+            v_int32x4 it1 = ~mask0 & _nbins;


I think the code would be simpler if >= were used instead of inverting the result of <.
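
A scalar sketch of the two formulations, for illustration. The helper names `lane_mask`, `wrap_with_lt`, and `wrap_with_ge` are hypothetical. The final step, adding `it0 - it1` to the index, is an assumption about how the masks are combined, since the quoted diff does not show it.

```cpp
#include <cstdint>

// All-ones (-1) when the condition holds, zero otherwise,
// mimicking one lane of a SIMD comparison result.
inline std::int32_t lane_mask(bool cond) { return cond ? -1 : 0; }

// Form quoted in the diff: compare with <, then invert the mask.
inline std::int32_t wrap_with_lt(std::int32_t hidx, std::int32_t nbins) {
    std::int32_t it0 = lane_mask(hidx < 0) & nbins;  // add nbins when negative
    std::int32_t lt  = lane_mask(hidx < nbins);
    std::int32_t it1 = ~lt & nbins;                  // subtract nbins when hidx >= nbins
    return hidx + it0 - it1;                         // assumed combine step
}

// Suggested form: compare with >= directly, no inversion needed.
inline std::int32_t wrap_with_ge(std::int32_t hidx, std::int32_t nbins) {
    std::int32_t it0 = lane_mask(hidx < 0) & nbins;
    std::int32_t it1 = lane_mask(hidx >= nbins) & nbins;
    return hidx + it0 - it1;                         // assumed combine step
}
```

Both forms wrap an index in the range [-nbins, 2*nbins) back into [0, nbins). The second simply drops one bitwise NOT per vector.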

alalek

Well done!

alalek · 2019-08-02T16:25:00Z

modules/objdetect/src/hog.cpp

-        int32x4_t ifour = vdupq_n_s32(4);
+    #if CV_SIMD128
+        const float a[] = { 0.0, 1.0, 2.0, 3.0 };
+        v_float32x4 idx = v_load((float *)a);


Use v_float32x4 idx(0.0f, 1.0f, 2.0f, 3.0f); here and above (line 251).

alalek

Well done! Thank you 👍

Convert SSE SIMD to HAL. 35-45% improvement for Power (VSX)

c613fc1

mshabunin added the optimization label Aug 1, 2019

terfendail reviewed Aug 1, 2019

terfendail reviewed Aug 2, 2019

ChipKerchner added 2 commits August 2, 2019 07:57

Remove CV_NEON code. Use v_floor instead of 3 lines of code.

639b4a5

Invert comparison logic to simplify code.

e83ac6a

alalek reviewed Aug 2, 2019

Change initialization from v_load to constructor type.

38e6e1f

alalek approved these changes Aug 8, 2019

alalek merged commit d513fb4 into opencv:3.4 Aug 8, 2019

ChipKerchner deleted the hogToHal branch August 8, 2019 18:59

alalek mentioned this pull request Aug 13, 2019

Merge 3.4 #15295

Merged

alalek merged 4 commits into opencv:3.4 from ChipKerchner:hogToHal
