core(IPP): disable SSE4.2 code path in countNonZero() #18986

Merged
opencv-pushbot merged 1 commit into opencv:3.4 from alalek:fix_ipp_17453_2
Dec 2, 2020

Conversation

@alalek
Member

@alalek alalek commented Dec 1, 2020

resolves #17453
relates #17455

Reproduced even on Linux with OPENCV_IPP=sse42
(MacOSX and Win32 platforms don't have AVX2 IPP optimizations)

/cc @eplankin
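
A minimal sketch of a reference check for this kind of miscount, using only the Python standard library. It builds deterministic byte buffers with the element counts that appear in the CountNonZeroBig test log in this thread and compares any counting function against a plain scalar count. The names `make_buffer`, `reference_count_nonzero`, `find_mismatches`, and the `count_fn(buf, w, h)` callback signature are hypothetical helpers for illustration, not part of OpenCV or its test suite.

```python
import random

# Sizes as printed in the CountNonZeroBig test log; only the total element
# count matters here, so the width/height orientation is not interpreted.
SIZES = [(1, 524190), (524190, 1), (3840, 2160)]


def make_buffer(n_bytes, seed=0):
    """Return n_bytes deterministic pseudo-random bytes (roughly 1 in 256 is zero)."""
    rng = random.Random(seed)
    return rng.getrandbits(8 * n_bytes).to_bytes(n_bytes, "little")


def reference_count_nonzero(buf):
    """Scalar reference: total bytes minus the number of zero bytes."""
    return len(buf) - buf.count(0)


def find_mismatches(count_fn, sizes=SIZES, seed=0):
    """Run count_fn(buf, w, h) on each size and collect results that differ
    from the reference count. count_fn is a hypothetical callback that wraps
    the implementation under test."""
    mismatches = []
    for w, h in sizes:
        buf = make_buffer(w * h, seed)
        expected = reference_count_nonzero(buf)
        actual = count_fn(buf, w, h)
        if actual != expected:
            mismatches.append(((w, h), expected, actual))
    return mismatches
```

To exercise OpenCV itself, `count_fn` could wrap `cv2.countNonZero` on a NumPy `uint8` view of `buf` (this assumes the Python bindings and NumPy are installed), with the process started under `OPENCV_IPP=sse42` as described above. An empty result list means every size matched the reference.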

@diablodale
Contributor

Fixes the previously failing test cases on my Win10, debug build, 64-bit target, no AVX2 cpu.

[==========] Running 6 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 6 tests from Core/CountNonZeroBig
[ RUN      ] Core/CountNonZeroBig./0, where GetParam() = (0, 1x524190)
[       OK ] Core/CountNonZeroBig./0 (4 ms)
[ RUN      ] Core/CountNonZeroBig./1, where GetParam() = (0, 524190x1) 
[       OK ] Core/CountNonZeroBig./1 (5 ms)
[ RUN      ] Core/CountNonZeroBig./2, where GetParam() = (0, 3840x2160)
[       OK ] Core/CountNonZeroBig./2 (10 ms)
[ RUN      ] Core/CountNonZeroBig./3, where GetParam() = (5, 1x524190)
[       OK ] Core/CountNonZeroBig./3 (8 ms)
[ RUN      ] Core/CountNonZeroBig./4, where GetParam() = (5, 524190x1)
[       OK ] Core/CountNonZeroBig./4 (27 ms)
[ RUN      ] Core/CountNonZeroBig./5, where GetParam() = (5, 3840x2160)
[       OK ] Core/CountNonZeroBig./5 (53 ms)
[----------] 6 tests from Core/CountNonZeroBig (128 ms total)

For clarity, I want to raise a caution about your note above that MacOSX and Win32 platforms lack AVX2 IPP optimizations. It is my understanding that some OpenCV builds on Win32 platforms do support AVX2 IPP optimizations; 32-bit compile targets, however, do not. When writing, I believe it is important to distinguish between the "Win32 API" and a Windows 32-bit target so that readers don't confuse the two. :-)

@alalek
Member Author

alalek commented Dec 1, 2020

Right, Win32 above means "Windows 32-bit" (not Win32 API) in terms of OpenCV CI builds (64-bit is "Win64").

Also, this note applies only to the current IPPICV package (a subset of Intel IPP functions and optimizations).
A standalone (external) IPP package or future updates of IPPICV may support AVX2 too (at least on Mac).
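
The naming confusion discussed above shows up in other tooling as well. As a small illustration outside OpenCV, Python reports `sys.platform == "win32"` on Windows regardless of whether the interpreter is a 32-bit or 64-bit build, so the target width has to be checked separately. The sketch below does that for the running interpreter; `pointer_bits` and `describe_target` are hypothetical helper names, and the result describes the Python build, not the OpenCV or IPPICV binaries.

```python
import struct
import sys


def pointer_bits():
    """Width of a native pointer in bits for the running interpreter build."""
    return struct.calcsize("P") * 8


def describe_target():
    """Report the platform name and the pointer width side by side.
    On Windows the platform string is "win32" for both 32-bit and 64-bit
    builds, so only pointer_bits distinguishes the two targets."""
    return {"platform": sys.platform, "pointer_bits": pointer_bits()}
```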

@opencv-pushbot opencv-pushbot merged commit e958600 into opencv:3.4 Dec 2, 2020

@diablodale
Contributor

Is the cherry-pick into the master/4.5 branch automated? I don't see this change there yet.
Just checking as an additional set of eyes 🤪

@alalek
Member Author

alalek commented Dec 3, 2020

It is not automated, but it is regular (weekly / bi-weekly): https://github.com/opencv/opencv/wiki/Branches

You may want to pick this related patch too: #18991

@alalek alalek mentioned this pull request Dec 4, 2020
@alalek alalek mentioned this pull request Apr 9, 2021

Development

Successfully merging this pull request may close these issues.

countNonZero returns wrong counts for unit8_t arrays with large dimensions (IPP)