Merge 3.4 #15678

Merged
alalek merged 19 commits into opencv:master from alalek:merge-3.4
Oct 9, 2019
alalek (Member) commented Oct 9, 2019

#15510 from seiko2plus:issue15506
#15544 from mshabunin:disable_posix_memalign
#15642 from alalek:issue_15597
#15653 from tolysz:patch-1
#15654 from sturkmen72:patch-3
#15658 from tolysz:patch-1
#15661 from alalek:fix_android_build_avx2
#15664 from alalek:build_eliminate_cuda_warnings (moved to opencv_contrib)
#15666 from seanm:Wnewline

Previous "Merge 3.4": #15651

buildworker:Win64 OpenCL=windows-2
buildworker:Custom=linux-1,linux-2,linux-4
build_image:Docs=docs-js
build_image:Custom=javascript
#build_image:Custom=powerpc64le
#build_image:Custom=ubuntu-openvino-2019r3.0:16.04
#buildworker:Custom=linux-2
#build_image:Custom=ubuntu-vulkan:16.04
#buildworker:Custom=linux-4
#build_image:Custom=fedora:28
#build_image:Custom=ubuntu-cuda:16.04
#build_image:Custom=ubuntu-clang:18.04
build_image:Custom Mac=openvino-2019r3.0
build_image:Custom Win=openvino-2019r3.0
test_opencl:Custom Win=OFF
#build_image:Custom Win=msvs2017
#build_image:Custom Win=msvs2019
test_modules:Custom Mac=dnn,java,python3

alalek and others added 19 commits October 4, 2019 19:56
```
#define NPP_VER_MAJOR 10
#define NPP_VER_MINOR 2
#define NPP_VER_PATCH 0
#define NPP_VER_BUILD 243

#define NPP_VERSION (NPP_VER_MAJOR * 1000 +     \
                     NPP_VER_MINOR *  100 +     \
                     NPP_VER_PATCH)
```
* core: rework and optimize SIMD implementation of dotProd

  - add new universal intrinsics v_dotprod[int32], v_dotprod_expand[u&int8, u&int16, int32], v_cvt_f64(int64)
  - add a boolean parameter to all v_dotprod and v_dotprod_expand intrinsics that changes the order of addition between
    pairs on some platforms, allowing maximum optimization when only the sum across all lanes matters
  - fix clang build on ppc64le
  - support wide universal intrinsics for dotProd_32s
  - remove raw SIMD and activate universal intrinsics for dotProd_8
  - implement SIMD optimization for dotProd_s16&u16
  - extend performance test data types of dotprod
  - fix GCC VSX workaround of vec_mule and vec_mulo (they must be swapped on little-endian)
  - optimize v_mul_expand(int32) on VSX
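
To make the pairwise behaviour concrete, here is a scalar sketch of a dot-product step over int16 inputs that widens to int32. It is only a model of the semantics: `dotprodPairs` and `laneSum` are hypothetical helper names, not OpenCV functions, and the sketch assumes each output lane holds the sum of two adjacent products.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model, not the OpenCV intrinsic itself:
// output lane i = a[2i]*b[2i] + a[2i+1]*b[2i+1], widened to int32.
std::vector<int32_t> dotprodPairs(const std::vector<int16_t>& a,
                                  const std::vector<int16_t>& b)
{
    assert(a.size() == b.size() && a.size() % 2 == 0);
    std::vector<int32_t> out(a.size() / 2);
    for (std::size_t i = 0; i < out.size(); ++i)
    {
        out[i] = int32_t(a[2 * i]) * int32_t(b[2 * i]) +
                 int32_t(a[2 * i + 1]) * int32_t(b[2 * i + 1]);
    }
    return out;
}

// Horizontal sum of the widened lanes, i.e. the full dot product.
int64_t laneSum(const std::vector<int32_t>& v)
{
    int64_t s = 0;
    for (int32_t x : v)
        s += x;
    return s;
}
```

A dotProd kernel built this way would accumulate the widened lanes over the whole array and reduce them once at the end.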

* core: remove boolean param from v_dotprod&_expand and implement v_dotprod_fast&v_dotprod_expand_fast

  these changes were made in response to review by "terfendail"
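
The reason a separate "fast" variant is safe when only the total matters can be shown with a scalar sketch. Both functions below are hypothetical models, not OpenCV code, and the "split" pairing (low half plus high half) is an assumed example of a platform-friendly order. The individual lanes differ, but the sum over all lanes is the same.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Model of the strict order: lane i = a[2i]*b[2i] + a[2i+1]*b[2i+1].
std::vector<int32_t> dotprodAdjacent(const std::vector<int16_t>& a,
                                     const std::vector<int16_t>& b)
{
    assert(a.size() == b.size() && a.size() % 2 == 0);
    std::vector<int32_t> out(a.size() / 2);
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = int32_t(a[2 * i]) * b[2 * i] + int32_t(a[2 * i + 1]) * b[2 * i + 1];
    return out;
}

// Model of a relaxed order (assumption): lane i = a[i]*b[i] + a[i+h]*b[i+h],
// where h is half the input length.
std::vector<int32_t> dotprodSplit(const std::vector<int16_t>& a,
                                  const std::vector<int16_t>& b)
{
    assert(a.size() == b.size() && a.size() % 2 == 0);
    const std::size_t h = a.size() / 2;
    std::vector<int32_t> out(h);
    for (std::size_t i = 0; i < h; ++i)
        out[i] = int32_t(a[i]) * b[i] + int32_t(a[i + h]) * b[i + h];
    return out;
}

// Sum across all lanes; identical for both pairings.
int64_t totalOf(const std::vector<int32_t>& v)
{
    int64_t s = 0;
    for (int32_t x : v)
        s += x;
    return s;
}
```

Code that inspects individual lanes needs the strict order; a reduction such as dotProd can use whichever order is cheapest on the target.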
- _mm256_bslli_epi128() works in GCC 4.9.3+ only
- Android NDK r10 doesn't support this instruction
* Cuda + OpenGL on ARM

There may be several ways to get OpenCV to compile on the Tegra (NVIDIA Jetson) platform, but most of them modify the CUDA (8, 9, 10, ...) source code; this change fixes the build for all installations
(https://devtalk.nvidia.com/default/topic/1007290/jetson-tx2/building-opencv-with-opengl-support-/post/5141945/#5141945 et al.).
The fix is exactly the one proposed there, except that the code change happens in OpenCV.

* Updated:
The linked post mentions CUDA 8 and 9; I have CUDA 10 and 10.1 and can confirm it is still defined this way.
NVIDIA is probably using some other "secret" backend with Jetson.
* Disable posix_memalign by default

* core: fix memalign parameter handling
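
For context, `posix_memalign(void** memptr, size_t alignment, size_t size)` and `memalign(size_t alignment, size_t size)` both take the alignment before the size, which is easy to get wrong. When neither is used, a common fallback is to over-allocate with `malloc` and align the pointer by hand. The sketch below illustrates that pattern; `alignedMalloc` and `alignedFree` are hypothetical names, and this is not taken from the commits above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: over-allocate, round the pointer up to `alignment`,
// and store the original malloc pointer just below the aligned block.
void* alignedMalloc(std::size_t size, std::size_t alignment)
{
    // Alignment must be a power of two and at least pointer-sized.
    assert(alignment >= sizeof(void*) && (alignment & (alignment - 1)) == 0);

    void* raw = std::malloc(size + sizeof(void*) + alignment);
    if (!raw)
        return nullptr;

    // Leave room for one pointer, then round up to the next aligned address.
    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    std::uintptr_t aligned =
        (start + alignment - 1) & ~(static_cast<std::uintptr_t>(alignment) - 1);

    // Remember the original pointer so alignedFree can release it.
    reinterpret_cast<void**>(aligned)[-1] = raw;
    return reinterpret_cast<void*>(aligned);
}

// Releases a block obtained from alignedMalloc; nullptr is ignored.
void alignedFree(void* ptr)
{
    if (ptr)
        std::free(static_cast<void**>(ptr)[-1]);
}
```

The extra `sizeof(void*) + alignment` bytes per allocation are the cost of not depending on a platform-specific aligned allocator.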

alalek (Member, Author) commented Oct 9, 2019

👍

alalek merged commit 6557378 into opencv:master on Oct 9, 2019
alalek mentioned this pull request on Oct 24, 2019
7 participants