fast_math.hpp performance improvements by pmur · Pull Request #15122 · opencv/opencv

pmur · 2019-07-22T19:35:23Z

Optimize fast_math.hpp primitives for P8+ and GCC based systems.

Leverage compiler builtins to implement most rounding primitives. This should allow the compiler to choose more efficient instructions for the target architecture. PPC has dedicated rounding instructions. Hopefully this carries over to other architectures too.

Notably, the __builtin_lrint{,f} functions just call out to libm. Instead, include an inline solution similar to the one used for ARM.
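As a rough illustration of the builtin-based approach described above, the sketch below contrasts a floor/ceil built on `__builtin_floor`/`__builtin_ceil` with a cast-and-adjust fallback. The names `sketch_floor` and `sketch_ceil` are hypothetical and are not the functions in fast_math.hpp; the fallback only resembles integer-style rounding and is not copied from OpenCV.

```cpp
// Hypothetical helpers for illustration; not the OpenCV implementation.
static inline int sketch_floor(double value)
{
#if defined(__GNUC__)
    // GCC (and Clang) can expand this builtin to a native rounding
    // instruction when the target has one.
    return static_cast<int>(__builtin_floor(value));
#else
    // Truncate toward zero, then step down if truncation rounded up.
    int i = static_cast<int>(value);
    return i - (i > value);
#endif
}

static inline int sketch_ceil(double value)
{
#if defined(__GNUC__)
    return static_cast<int>(__builtin_ceil(value));
#else
    // Truncate toward zero, then step up if truncation rounded down.
    int i = static_cast<int>(value);
    return i + (i < value);
#endif
}
```

Whether the builtin actually becomes a single instruction depends on the target and compiler flags, so the speedup should be measured rather than assumed.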

This reduces the testing time by 200-300 seconds on POWER9.

allow_multiple_commits=1

alalek · 2019-07-22T23:32:48Z

Optimization patches should go into the 3.4 branch first. We will merge changes from 3.4 into master regularly (weekly/bi-weekly).
Clang mimics GCC (it also defines __GNUC__).
Please try to use this condition (with extra macro to bypass this code path):

#if defined(__GNUC__) && !defined(__clang__) \
    && !defined(OPENCV_SKIP_FASTMATH_GCC_BUILTINS)

It would be nice to have simple perf tests for these functions.
You can check PowerPC build results here: https://ocv-power.imavr.com/#/opencv_pullrequests

alalek · 2019-08-01T16:40:22Z

modules/core/include/opencv2/core/fast_math.hpp

+   without the -fno-math-errno option. */
+#ifdef OPENCV_USE_FASTMATH_GCC_BUILTINS
+#  define _OPENCV_FASTMATH_ENABLE_GCC_MATH_BUILTINS ((defined __GNUC__ && !defined __clang__) \
+                                                     && defined OPENCV_USE_FASTMATH_GCC_BUILTINS)


Does it really work as expected?

#if defined _OPENCV_FASTMATH_ENABLE_GCC_MATH_BUILTINS
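One way to read the concern above: `#if defined X` only asks whether `X` has been defined, not what it expands to, and a `defined` operator produced by macro expansion inside `#if` has undefined behavior in C and C++. A minimal sketch of an alternative, assuming a hypothetical macro name `SKETCH_ENABLE_GCC_MATH_BUILTINS` and a hypothetical helper `sketch_ceil_int`, evaluates the condition once and stores a plain 0 or 1:

```cpp
// Evaluate the compiler check once; the result is a plain 0/1 value.
// SKETCH_ENABLE_GCC_MATH_BUILTINS is a hypothetical name for this example.
#if defined(__GNUC__) && !defined(__clang__) \
    && defined(OPENCV_USE_FASTMATH_GCC_BUILTINS)
#  define SKETCH_ENABLE_GCC_MATH_BUILTINS 1
#else
#  define SKETCH_ENABLE_GCC_MATH_BUILTINS 0
#endif

// Test the value, not mere definedness: "#if defined ..." would be
// true for both branches above.
#if SKETCH_ENABLE_GCC_MATH_BUILTINS
static inline int sketch_ceil_int(double v)
{
    return static_cast<int>(__builtin_ceil(v));
}
#else
static inline int sketch_ceil_int(double v)
{
    int i = static_cast<int>(v);
    return i + (i < v);   // step up when truncation rounded down
}
#endif
```

With this shape, building with `-DOPENCV_USE_FASTMATH_GCC_BUILTINS` on GCC selects the builtin path, and every other configuration takes the fallback.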

alalek · 2019-08-01T16:40:25Z

modules/core/include/opencv2/core/fast_math.hpp

    // 3. version for float
-    #define ARM_ROUND_FLT(value) ARM_ROUND(value, "vcvtr.s32.f32 %[temp], %[value]\n vmov %[res], %[temp]")
+    #define CV_INLINE_ROUND_FLT(value) ARM_ROUND(value, "vcvtr.s32.f32 %[temp], %[value]\n vmov %[res], %[temp]")
+#elif defined __PPC64__ && defined __GNUC__ && defined _ARCH_PWR8


We probably need to apply a __CUDACC__ guard here too:
https://www.ibm.com/developerworks/community/blogs/fe313521-2e95-46f2-817d-44a4f27eba32/entry/how_to_develop_nvidia_cuda_applications_on_ibm_power8?lang=en

alalek · 2019-08-07T15:49:33Z

modules/core/perf/perf_cvround.cpp


 template <typename T>
-static void CvRoundMat(const cv::Mat & src, cv::Mat & dst)
+static void CvRoundMat(const cv::Mat & src, cv::Mat & dst, int (*round)(T))


The performance tests themselves are degraded by the use of a function pointer.
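To illustrate the concern, a call through `int (*round)(T)` is usually harder for the compiler to inline than a callable whose type is a template parameter. The sketch below uses `std::vector` instead of `cv::Mat` to stay self-contained; `sketch_round_all` and `SketchRound` are hypothetical names, not part of the perf test in this PR.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical functor; its call operator is visible at the call site,
// so the compiler is free to inline it into the loop.
struct SketchRound
{
    int operator()(float v) const { return static_cast<int>(std::lrint(v)); }
};

// The rounding operation is a template parameter rather than a
// function pointer, so each instantiation knows the exact callee.
template <typename T, typename Op>
static void sketch_round_all(const std::vector<T>& src, std::vector<int>& dst, Op op)
{
    dst.resize(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        dst[i] = op(src[i]);
}
```

A call would look like `sketch_round_all(src, dst, SketchRound());`, with one instantiation per operation being measured.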

Add a basic sanity test to verify the rounding functions work as expected. Likewise, extend the rounding performance test to cover the additional float -> int fast math functions.
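A sanity check of this kind could look roughly like the sketch below, which compares stand-in helpers against the `<cmath>` reference over a few values, including negatives and exact integers. The helpers `sketch_fast_floor` and `sketch_fast_ceil` and the function `sketch_rounding_sanity` are hypothetical; the actual test added in this PR is not reproduced here.

```cpp
#include <cmath>

// Hypothetical stand-ins for the functions under test.
static inline int sketch_fast_floor(float v)
{
    int i = static_cast<int>(v);
    return i - (i > v);
}

static inline int sketch_fast_ceil(float v)
{
    int i = static_cast<int>(v);
    return i + (i < v);
}

// Returns true when both helpers agree with std::floor/std::ceil
// on every sample value.
static bool sketch_rounding_sanity()
{
    const float samples[] = { -2.5f, -1.0f, -0.25f, 0.0f, 0.25f, 1.0f, 2.5f, 1000.75f };
    for (float v : samples)
    {
        if (sketch_fast_floor(v) != static_cast<int>(std::floor(v))) return false;
        if (sketch_fast_ceil(v) != static_cast<int>(std::ceil(v))) return false;
    }
    return true;
}
```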

Add a new macro definition OPENCV_USE_FASTMATH_GCC_BUILTINS to enable usage of GCC inline math functions, if available and requested by the user. Likewise, enable it for POWER. This is nearly always a substantial improvement over using integer manipulation, as most operations can be done in several instructions with no branching. The result is a 1.5-1.8x speedup in the ceil/floor operations, as tested with the AT 12.0-1 (GCC 8.3.1) compiler on P9 LE.

Implement cvRound using inline asm. No compiler support exists today to properly optimize this. This results in about a 4x speedup over the default rounding. Likewise, simplify the growing number of rounding function overloads.

For P9 enabled targets, utilize the classification testing instruction to test for Inf/NaN values. Operation speedup is about 1.2x for FP32 and 1.5x for FP64 operands. For P8 targets, fall back to the GCC NaN inline. It provides a 1.1x/1.4x improvement for FP32/FP64 arguments.
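For context on the Inf/NaN checks, the sketch below contrasts an integer-manipulation test on the IEEE 754 bit pattern with the GCC builtin `__builtin_isnan`. All function names are hypothetical, the code assumes a 32-bit IEEE 754 `float`, and the P9 classification instruction itself is not shown.

```cpp
#include <cstdint>
#include <cstring>

// Integer-manipulation check: NaN has all exponent bits set and a
// non-zero mantissa, so its magnitude bits exceed those of infinity.
static inline bool sketch_isnan_bits(float v)
{
    std::uint32_t bits;
    std::memcpy(&bits, &v, sizeof(bits));
    return (bits & 0x7fffffffu) > 0x7f800000u;
}

// Infinity has all exponent bits set and a zero mantissa.
static inline bool sketch_isinf_bits(float v)
{
    std::uint32_t bits;
    std::memcpy(&bits, &v, sizeof(bits));
    return (bits & 0x7fffffffu) == 0x7f800000u;
}

// Builtin-based check, leaving instruction selection to the compiler.
static inline bool sketch_isnan_builtin(float v)
{
#if defined(__GNUC__)
    return __builtin_isnan(v) != 0;
#else
    return v != v;   // NaN is the only value unequal to itself
#endif
}
```

Note that flags such as `-ffast-math` allow the compiler to assume NaN never occurs, which can make the builtin variant unreliable; the bit-pattern variant does not depend on that assumption.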

alalek

Looks good to me! Thank you 👍

pmur force-pushed the fast-math-improvements branch from d5417f8 to d6f8a47 Compare July 25, 2019 19:44

pmur changed the base branch from master to 3.4 July 25, 2019 19:45

pmur force-pushed the fast-math-improvements branch 2 times, most recently from cf75cf7 to d1769c4 Compare July 29, 2019 20:30

alalek reviewed Aug 1, 2019

pmur force-pushed the fast-math-improvements branch from d1769c4 to 17584ef Compare August 1, 2019 19:43

alalek reviewed Aug 7, 2019

pmur added 3 commits August 7, 2019 14:59

fast_math: add extra perf/unit tests

b2135be

pmur force-pushed the fast-math-improvements branch from 17584ef to f38a61c Compare August 7, 2019 20:03

alalek approved these changes Aug 14, 2019

opencv-pushbot merged commit f38a61c into opencv:3.4 Aug 14, 2019

opencv-pushbot pushed a commit that referenced this pull request Aug 14, 2019

Merge pull request #15122 from pmur:fast-math-improvements

13ecd5b

alalek mentioned this pull request Aug 16, 2019

Merge 3.4 #15323

Merged
