core: add 64F intrinsic in HAL NEON #7175

Merged
opencv-pushbot merged 1 commit into opencv:master from
tomoaki0705:featureIntrinsic64
Sep 2, 2016

@tomoaki0705
Contributor

This pull request makes the following changes:

  • use universal intrinsics for the accumulate series with float/double
  • this covers accumulate, accumulateSquare, accumulateProduct and accumulateWeighted
  • add v_cvt_f64_high for both SSE and NEON
  • add a test for the v_cvt_f64 conversion
  • improve some existing universal intrinsics by using new instructions in AArch64
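
To illustrate the conversion pair listed above, here is a minimal scalar model, not the actual OpenCV implementation. In the universal intrinsics, v_cvt_f64 widens the low two lanes of a 4-lane float vector to double and v_cvt_f64_high widens the high two lanes. The type and function names below (`Float32x4`, `Float64x2`, `cvt_f64_low`, `cvt_f64_high`, `accumulate_f32_to_f64`) are hypothetical stand-ins chosen for this sketch.

```cpp
#include <array>
#include <cstddef>

// Scalar stand-ins for 128-bit registers (hypothetical names, not OpenCV types).
using Float32x4 = std::array<float, 4>;
using Float64x2 = std::array<double, 2>;

// Models v_cvt_f64: widen the LOW two float lanes to double.
inline Float64x2 cvt_f64_low(const Float32x4& v)
{
    return {static_cast<double>(v[0]), static_cast<double>(v[1])};
}

// Models v_cvt_f64_high: widen the HIGH two float lanes to double.
inline Float64x2 cvt_f64_high(const Float32x4& v)
{
    return {static_cast<double>(v[2]), static_cast<double>(v[3])};
}

// Sketch of an accumulate kernel: dst[i] += src[i], float source, double destination.
inline void accumulate_f32_to_f64(const float* src, double* dst, std::size_t n)
{
    std::size_t i = 0;
    // "Vector" part: one 4-lane float load feeds two 2-lane double additions.
    for (; i + 4 <= n; i += 4)
    {
        const Float32x4 v = {src[i], src[i + 1], src[i + 2], src[i + 3]};
        const Float64x2 lo = cvt_f64_low(v);
        const Float64x2 hi = cvt_f64_high(v);
        dst[i]     += lo[0];
        dst[i + 1] += lo[1];
        dst[i + 2] += hi[0];
        dst[i + 3] += hi[1];
    }
    // Scalar tail for the remaining 0..3 elements.
    for (; i < n; ++i)
        dst[i] += static_cast<double>(src[i]);
}
```

Without a "high" conversion, the upper two float lanes would have to be moved into the lower half of a register before widening, which is why having both halves available is convenient for float-to-double kernels like this one.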

@tomoaki0705
Contributor Author

Here are the performance measurements:

| Test | Windows (before) | Windows (after) | ARM 32-bit (before) | ARM 32-bit (after) | ARM 64-bit (before) | ARM 64-bit (after) |
|---|---|---|---|---|---|---|
| Video_Acc.accuracy | 349 ms | 228 ms | 707 ms | 676 ms | 867 ms | 864 ms |
| Video_AccSquared.accuracy | 217 ms | 216 ms | 641 ms | 639 ms | 929 ms | 925 ms |
| Video_AccProduct.accuracy | 219 ms | 221 ms | 683 ms | 682 ms | 934 ms | 931 ms |
| Video_RunningAvg.accuracy | 223 ms | 223 ms | 643 ms | 642 ms | 904 ms | 897 ms |

For the Windows measurements, I removed the AVX implementation and switched off IPP.

Platform information:

  • Windows: MacBookPro Mid2012 (Windows 7 x64 + VS2012 Update 4 + Corei7 4Core 2.6GHz)
  • ARM 32bit: Jetson TK1(Ubuntu 14.04 + gcc 4.8.4 + ARM Cortex A15 4Core 2.3GHz)
  • ARM 64bit: ODROID-C2 (Ubuntu 16.04 + gcc 5.4.0 + ARM Cortex A53 4Core 2.0GHz)

Please note that this change is not intended as a performance improvement, so most of the measurements are essentially unchanged.
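
For reading the before/after timings, a small helper like the following can turn a pair of measurements into a percentage reduction. This is only an illustrative sketch; the function name `reduction_percent` is made up for this example and is not part of the PR or of OpenCV.

```cpp
// Percentage reduction in run time when going from before_ms to after_ms.
// A positive result means "after" is faster; zero means no change.
inline double reduction_percent(double before_ms, double after_ms)
{
    return 100.0 * (before_ms - after_ms) / before_ms;
}
```

Applied to the Windows Video_Acc.accuracy figures reported in this thread (349 ms before, 228 ms after), this gives a reduction of roughly 35%, while a pair such as 223 ms and 223 ms gives 0%.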

This is a follow-up to #7110.

@tomoaki0705
Contributor Author

Here are some additional measurements:

| Test | Raspberry Pi 3 (before) | Raspberry Pi 3 (after) | PINE64 (before) | PINE64 (after) |
|---|---|---|---|---|
| Video_Acc.accuracy | 1647 ms | 1646 ms | 1358 ms | 1269 ms |
| Video_AccSquared.accuracy | 1756 ms | 1758 ms | 1477 ms | 1376 ms |
| Video_AccProduct.accuracy | 1822 ms | 1820 ms | 1479 ms | 1382 ms |
| Video_RunningAvg.accuracy | 1815 ms | 1817 ms | 1403 ms | 1315 ms |

  • Raspberry Pi 3 (Raspbian OS Jessie March 2016 + gcc 4.9.2 + ARM Cortex A53 4Core 1.2GHz)
  • PINE64 (Ubuntu 16.04 + gcc 5.3.1 + ARM Cortex A53 4Core 1.2GHz)

@mshabunin
Contributor

@tomoaki0705, it looks like the gcc from the Android NDK doesn't support several intrinsics. You should probably implement them similarly to the workaround made here: https://github.com/opencv/opencv/pull/6942/files#diff-b6faf5330d7cd50cbadecd85ae1bec5a

  * use universal intrinsic for accumulate series using float/double
  * accumulate, accumulateSquare, accumulateProduct and accumulateWeighted
  * add v_cvt_f64_high in both SSE/NEON
  * add test for conversion v_cvt_f64_high in test_intrin.cpp
  * improve some existing universal intrinsic by using new instructions in Aarch64
  * add workaround for Android build in intrin_neon.hpp
tomoaki0705 force-pushed the featureIntrinsic64 branch 2 times, most recently from 9867582 to 7fef96b on September 2, 2016 05:59
@tomoaki0705
Contributor Author

Sorry, I accidentally pushed the wrong commits and restarted the tests.

It seems that the Android build needs to be triggered manually by someone else, but this test run looks promising: the workaround proposed by @mshabunin seems to work well.

How does it look?
Best regards.

@alalek
Member

alalek commented Sep 2, 2016

Looks good to me! 👍
