Extended several core functions to support new types#24962
Extended several core functions to support new types#24962vpisarev merged 10 commits intoopencv:5.xfrom
Conversation
…ithmetic functions
…d sum() to support new types (F16, BF16, U32, U64, S64)
* extended findnonzero, hasnonzero with the new types support
…TestGPU.MathOpTest` was disabled - not clear whether to set tolerance - it's not bit-exact operation, as possibly assumed by the test, due to the use of scale and possibly limited accuracy of the intermediate floating-point calculations.
…n Mul, Div and AddWeighted (at least when using OpenCL on Windows x64 or MacOS x64). Disabled the respective tests.
|
Merges to target branch of "Merge 4.x" (#24981) is prohibited. If you don't want to redo conflicts resolving of multi PRs yourself. |
opencv-alalek
left a comment
There was a problem hiding this comment.
There are massive changes of SIMD and other optimizations.
And again, there is no any performance report attached to this PR. What is the problem?
| SIMD_ONLY(for (; x < width; x += simd_width) \ | ||
| { \ | ||
| if (x + simd_width > width) { \ | ||
| if (((x == 0) | (dst == src1) | (dst == src2)) != 0) \ |
There was a problem hiding this comment.
|
Any evidence that this works better than ||? E.g godbolt link.
| SIMD_ONLY(for (; x < width; x += simd_width) \ | ||
| { \ | ||
| if (x + simd_width > width) { \ | ||
| if (((x == 0) | (dst == src1) | (dst == src2)) != 0) \ |
There was a problem hiding this comment.
(dst == src1) | (dst == src2)
these invariant checks should be out of the loop.
| ocv_add_dispatched_file(matmul SSE2 SSE4_1 AVX2 AVX512_SKX NEON_DOTPROD LASX) | ||
| ocv_add_dispatched_file(mean SSE2 AVX2 LASX) | ||
| ocv_add_dispatched_file(merge SSE2 AVX2 LASX) | ||
| ocv_add_dispatched_file(minmax SSE2 SSE4_1 AVX2 VSX3 LASX) |
There was a problem hiding this comment.
Such kind of optimizations should be done on 4.x branch first according to existed policy: https://github.com/opencv/opencv/wiki/Branches
| } | ||
| } | ||
|
|
||
| #ifdef HAVE_OPENCL |
There was a problem hiding this comment.
- Git history of these changes has been lost in this PR (missing explicit git rename/copy).
- This guarantee 100% merge conflicts in the future against 4.x branch (yep, you don't care about "merge 4.x" requests, even no reviewing activity).
- Just take a look on the history of other .dispatch.cpp files (...)
| UVT v_idx_delta = vx_setall_##usuffix((UT)vlanes); \ | ||
| UVT v_invalid_idx = vx_setall_##usuffix((UT)-1); \ | ||
| VT v_minval = vx_setall_##suffix(minVal); \ | ||
| VT v_maxval = vx_setall_##suffix(maxVal); \ |
There was a problem hiding this comment.
Good luck with code debugging in multi-line macros (100 lines).
| //DEFINE_MINMAXIDX_FUNC_NOSIMD(minMaxIdx16bf, bfloat16_t, float) | ||
| DEFINE_MINMAXIDX_FUNC_NOSIMD(minMaxIdx64u, uint64, uint64) | ||
| DEFINE_MINMAXIDX_FUNC_NOSIMD(minMaxIdx64s, int64, int64) | ||
| DEFINE_MINMAXIDX_FUNC_NOSIMD(minMaxIdx32u, unsigned, int64) |
| Values(false))); | ||
|
|
||
| INSTANTIATE_TEST_CASE_P(MulTestGPU, MathOpTest, | ||
| INSTANTIATE_TEST_CASE_P(DISABLED_MulTestGPU, MathOpTest, |
There was a problem hiding this comment.
So, which part has been changed? OpenCV or G-API's OpenCL? Why?
Extended the following functions to support
CV_16F, CV_16BF, CV_32U, CV_64U and CV_64S:The corresponding tests (mainly in test_arithm.cpp) have been extended to test the new functionality.
Some further improvements to those basic functions are expected in this or subsequent PRs, such as:
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.