
Merge 3.4 #15323

Merged: alalek merged 13 commits into opencv:master from alalek:merge-3.4 on Aug 16, 2019


alalek (Member) commented Aug 16, 2019

#15101 from alalek:cmake_initialization
#15122 from pmur:fast-math-improvements
#15301 from alalek:backport
#15309 from alalek:backport_15305
#15322 from alalek:backport_15318

Previous "Merge 3.4": #15295

buildworker:Win64 OpenCL=windows-2
buildworker:Custom=linux-1,linux-2,linux-4
build_image:Docs=docs-js
build_image:Custom=javascript
#build_image:Custom=powerpc64le
#build_image:Custom=ubuntu-openvino:16.04
#buildworker:Custom=linux-2
#build_image:Custom=ubuntu-vulkan:16.04
#buildworker:Custom=linux-4
#build_image:Custom=fedora:28
#build_image:Custom=ubuntu-cuda:16.04
#build_image:Custom=ubuntu-clang:18.04
build_image:Custom Mac=openvino-2019r1
build_image:Custom Win=openvino-2019r1
#build_image:Custom Win=msvs2017
#build_image:Custom Win=msvs2019
test_modules:Custom Mac=dnn,java,python3

pmur and others added 13 commits August 7, 2019 14:59
Add a basic sanity test to verify the rounding functions
work as expected.

Likewise, extend the rounding performance test to cover the
additional float -> int fast math functions.
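A minimal sketch of such a sanity test, assuming the fast functions use the common truncate-and-correct integer trick. `int_floor`, `int_ceil` and `rounding_sanity_check` are hypothetical names, not OpenCV's test code. Reference values come from `std::floor`/`std::ceil`, and the inputs deliberately include negatives, halves, signed zero and exact integers, where truncation and flooring disagree.

```cpp
#include <cmath>

// Hypothetical integer-manipulation floor/ceil: truncate toward zero,
// then correct by one when truncation went the wrong way.
inline int int_floor(double v) { int i = static_cast<int>(v); return i - (i > v); }
inline int int_ceil(double v)  { int i = static_cast<int>(v); return i + (i < v); }

// Sanity test: compare the fast versions against <cmath> on tricky inputs.
inline bool rounding_sanity_check() {
    const double inputs[] = { -2.5, -1.5, -1.0, -0.5, -0.0, 0.0, 0.5,
                              1.0, 1.5, 2.5, 1e6 + 0.25, -1e6 - 0.25 };
    for (double v : inputs) {
        if (int_floor(v) != static_cast<int>(std::floor(v))) return false;
        if (int_ceil(v)  != static_cast<int>(std::ceil(v)))  return false;
    }
    return true;
}
```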

Add a new macro definition OPENCV_USE_FASTMATH_GCC_BUILTINS to enable
usage of GCC inline math functions, if available and requested by the
user.

Likewise, enable it for POWER. This is nearly always a substantial
improvement over using integer manipulation as most operations can
be done in several instructions with no branching. The result is a
1.5-1.8x speedup in the ceil/floor operations [1].

[1] As tested with the IBM Advance Toolchain (AT) 12.0-1 (GCC 8.3.1)
compiler on P9 LE.
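A sketch of how such a macro could gate the builtin path, assuming GCC or Clang (`__builtin_floor`/`__builtin_ceil`). Only `OPENCV_USE_FASTMATH_GCC_BUILTINS` comes from the commit; `floor_to_int` and `ceil_to_int` are hypothetical, and the fallback illustrates the integer-manipulation style the commit compares against, not OpenCV's actual code.

```cpp
// OPENCV_USE_FASTMATH_GCC_BUILTINS is the macro named in the commit;
// the function names below are hypothetical illustrations.
#if defined(OPENCV_USE_FASTMATH_GCC_BUILTINS) && defined(__GNUC__)
// Compiler builtins: may lower to a hardware rounding instruction
// on targets that provide one, avoiding compares and corrections.
inline int floor_to_int(double v) { return static_cast<int>(__builtin_floor(v)); }
inline int ceil_to_int(double v)  { return static_cast<int>(__builtin_ceil(v)); }
#else
// Portable fallback: truncate toward zero, then adjust by the
// result of a comparison (no explicit branch in the source).
inline int floor_to_int(double v) { int i = static_cast<int>(v); return i - (i > v); }
inline int ceil_to_int(double v)  { int i = static_cast<int>(v); return i + (i < v); }
#endif
```

Building with `-DOPENCV_USE_FASTMATH_GCC_BUILTINS` selects the builtin branch; both branches should return the same values for inputs whose result fits in an `int`.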

Implement cvRound using inline asm. No compiler support
exists today to properly optimize this. This results in
about a 4x speedup over the default rounding. Likewise,
simplify the growing number of rounding function overloads.
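The inline assembly itself is not reproduced here. As a portable stand-in for the rounding semantics (an assumption, not the PR's code), `std::lrint` rounds using the current floating-point rounding mode, which is round-to-nearest, ties-to-even unless the program changes it. The single template is one hypothetical way to collapse per-type overloads; `round_to_int` is not an OpenCV name.

```cpp
#include <cmath>
#include <type_traits>

// Hypothetical single template replacing separate float/double overloads.
// std::lrint honours the current rounding mode (ties-to-even by default).
template <typename T>
inline int round_to_int(T value) {
    static_assert(std::is_floating_point<T>::value,
                  "round_to_int expects a floating-point type");
    return static_cast<int>(std::lrint(value));
}
```

With the default rounding mode, `round_to_int(2.5)` and `round_to_int(-2.5)` round to the even neighbours 2 and -2, while `round_to_int(3.5)` gives 4.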

For P9-enabled targets, utilize the classification
testing instruction to test for Inf/NaN values. The
speedup is about 1.2x for FP32 and 1.5x for FP64 operands.

For P8 targets, fall back to the GCC NaN inline. It provides
a 1.1x/1.4x improvement for FP32/FP64 arguments.
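The P9 classification instruction and the P8 fallback are not reproduced here. As a hedged, portable illustration of the two kinds of strategy being compared, the sketch below shows a generic IEEE 754 bit-pattern test next to the compiler builtins (`__builtin_isnan`/`__builtin_isinf` on GCC/Clang, `std::isnan`/`std::isinf` otherwise). All function names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559, "IEEE 754 binary64 assumed");

// Bit-pattern classification (binary64): exponent all ones means Inf or NaN;
// a non-zero mantissa distinguishes NaN. The sign bit is masked off.
inline bool is_nan_bits(double x) {
    std::uint64_t u;
    std::memcpy(&u, &x, sizeof u);
    return (u & 0x7fffffffffffffffULL) > 0x7ff0000000000000ULL;
}
inline bool is_inf_bits(double x) {
    std::uint64_t u;
    std::memcpy(&u, &x, sizeof u);
    return (u & 0x7fffffffffffffffULL) == 0x7ff0000000000000ULL;
}

// Compiler-builtin classification. Note: under -ffinite-math-only
// (implied by -ffast-math), GCC may fold these checks to false.
inline bool is_nan_builtin(double x) {
#if defined(__GNUC__)
    return __builtin_isnan(x);
#else
    return std::isnan(x);
#endif
}
inline bool is_inf_builtin(double x) {
#if defined(__GNUC__)
    return __builtin_isinf(x);
#else
    return std::isinf(x);
#endif
}
```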

Found via `codespell -q 3 -S ./3rdparty,./modules -L amin,ang,atleast,dof,endwhile,hist,uint`

backporting of commit: 32aba5e

Found using `codespell -q 3 -S ./3rdparty -L activ,amin,ang,atleast,childs,dof,endwhile,halfs,hist,iff,nd,od,uint`

backporting of commit: ec43292
alalek (Member, Author) commented Aug 16, 2019

👍
