core: add 64F intrinsic in HAL NEON by tomoaki0705 · Pull Request #7175 · opencv/opencv

tomoaki0705 · 2016-08-26T09:47:28Z

This pull request makes the following changes:

* use universal intrinsics for the accumulate series using float/double
  * accumulate, accumulateSquare, accumulateProduct and accumulateWeighted
* add v_cvt_f64_high in both SSE/NEON
* add test for conversion v_cvt_f64
* improve some existing universal intrinsics by using new instructions in Aarch64
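
To illustrate what the low/high conversion pair is for, here is a minimal scalar sketch of the lane semantics. It is not the code from this pull request and does not use OpenCV: `F32x4`, `F64x2`, `cvt_f64_low`, `cvt_f64_high` and `accumulate_model` are hypothetical stand-ins, on the assumption that `v_cvt_f64` widens the two low float lanes of a 4-lane register to double and `v_cvt_f64_high` widens the two high lanes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for 128-bit registers (not OpenCV types):
// four float lanes, or two double lanes.
using F32x4 = std::array<float, 4>;
using F64x2 = std::array<double, 2>;

// Assumed model of v_cvt_f64: widen the two LOW float lanes to double.
inline F64x2 cvt_f64_low(const F32x4& v)
{
    F64x2 r = {{ static_cast<double>(v[0]), static_cast<double>(v[1]) }};
    return r;
}

// Assumed model of v_cvt_f64_high: widen the two HIGH float lanes to double.
inline F64x2 cvt_f64_high(const F32x4& v)
{
    F64x2 r = {{ static_cast<double>(v[2]), static_cast<double>(v[3]) }};
    return r;
}

// dst[i] += src[i] with a float source and a double accumulator,
// processing four source elements per iteration.
void accumulate_model(const float* src, double* dst, std::size_t n)
{
    assert(n == 0 || (src != nullptr && dst != nullptr));
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4)
    {
        F32x4 s = {{ src[i], src[i + 1], src[i + 2], src[i + 3] }};
        F64x2 lo = cvt_f64_low(s);   // lanes 0 and 1
        F64x2 hi = cvt_f64_high(s);  // lanes 2 and 3
        dst[i]     += lo[0];
        dst[i + 1] += lo[1];
        dst[i + 2] += hi[0];
        dst[i + 3] += hi[1];
    }
    for (; i < n; ++i)  // scalar tail for the remaining elements
        dst[i] += src[i];
}
```

With both halves available, one 4-lane float load feeds two 2-lane double accumulations, so no source lane has to be reloaded or shuffled by hand.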

tomoaki0705 · 2016-08-26T09:53:24Z

Here are the performance measurements:

Platform	Windows	Windows	ARM(32bit)	ARM(32bit)	ARM(64bit)	ARM(64bit)
implementation	before	after	before	after	before	after
Video_Acc.accuracy	349 ms	228 ms	707 ms	676 ms	867 ms	864 ms
Video_AccSquared.accuracy	217 ms	216 ms	641 ms	639 ms	929 ms	925 ms
Video_AccProduct.accuracy	219 ms	221 ms	683 ms	682 ms	934 ms	931 ms
Video_RunningAvg.accuracy	223 ms	223 ms	643 ms	642 ms	904 ms	897 ms

For the Windows measurements, I removed the AVX implementation and switched off IPP.
Platform information:

Windows: MacBookPro Mid2012 (Windows 7 x64 + VS2012 Update 4 + Corei7 4Core 2.6GHz)
ARM 32bit: Jetson TK1(Ubuntu 14.04 + gcc 4.8.4 + ARM Cortex A15 4Core 2.3GHz)
ARM 64bit: ODROID-C2 (Ubuntu 16.04 + gcc 5.4.0 + ARM Cortex A53 4Core 2.0GHz)

Please note that this change is not intended as a performance improvement, so most of the measurements are unchanged.

Follow-up to #7110

tomoaki0705 · 2016-08-26T10:01:39Z

Here are additional measurement results:

Platform	Raspberry Pi 3	Raspberry Pi 3	PINE64	PINE64
implementation	before	after	before	after
Video_Acc.accuracy	1647 ms	1646 ms	1358 ms	1269 ms
Video_AccSquared.accuracy	1756 ms	1758 ms	1477 ms	1376 ms
Video_AccProduct.accuracy	1822 ms	1820 ms	1479 ms	1382 ms
Video_RunningAvg.accuracy	1815 ms	1817 ms	1403 ms	1315 ms

Raspberry Pi 3 (Raspbian OS Jessie March 2016 + gcc 4.9.2 + ARM Cortex A53 4Core 1.2GHz)
PINE64 (Ubuntu 16.04 + gcc 5.3.1 + ARM Cortex A53 4Core 1.2GHz)

mshabunin · 2016-08-26T12:35:04Z

@tomoaki0705, it looks like gcc from the Android NDK doesn't support several intrinsics. You should probably implement them similarly to the workaround made here: https://github.com/opencv/opencv/pull/6942/files#diff-b6faf5330d7cd50cbadecd85ae1bec5a

* use universal intrinsic for accumulate series using float/double
* accumulate, accumulateSquare, accumulateProduct and accumulateWeighted
* add v_cvt_f64_high in both SSE/NEON
* add test for conversion v_cvt_f64_high in test_intrin.cpp
* improve some existing universal intrinsic by using new instructions in Aarch64
* add workaround for Android build in intrin_neon.hpp

tomoaki0705 · 2016-09-02T06:05:14Z

Sorry, I accidentally pushed the wrong commits and restarted the tests.

It seems that the Android build needs to be triggered manually by someone else, but this test run suggests that the workaround proposed by @mshabunin works well.

How does it look?
Best regards.

alalek · 2016-09-02T10:02:33Z

Looks good to me! 👍

tomoaki0705 force-pushed the featureIntrinsic64 branch 2 times, most recently from 9867582 to 7fef96b, on September 2, 2016 05:59

opencv-pushbot merged commit 7fef96b into opencv:master Sep 2, 2016

opencv-pushbot pushed a commit that referenced this pull request Sep 2, 2016

Merge pull request #7175 from tomoaki0705:featureIntrinsic64

28db4a2

tomoaki0705 deleted the featureIntrinsic64 branch September 2, 2016 12:35

This was referenced Feb 26, 2023

finiteMask() and doubles for patchNaNs() #23098

Merged

core(simd): 64-bit integer EQ/NE without misused 64F guard #23307

Merged

opencv-pushbot merged 1 commit into opencv:master from tomoaki0705:featureIntrinsic64
