Extended several core functions to support new types by vpisarev · Pull Request #24962 · opencv/opencv

vpisarev · 2024-02-05T01:54:10Z

Extended the following functions to support CV_16F, CV_16BF, CV_32U, CV_64U and CV_64S:

add(), subtract(), multiply(), divide(), recip(), absdiff(), addWeighted(), scaleAdd(), min(), max(), compare(), inRange(), mixChannels().
countNonZero(), findNonZero(), hasNonZero(), sum(), mean(), meanStdDev(), norm(), minMaxIdx(), minMaxLoc().

The corresponding tests (mainly in test_arithm.cpp) have been extended to test the new functionality.

Some further improvements to those basic functions are expected in this or subsequent PRs, such as:

broadcasting support in binary operations
acceleration of operations on big arrays using parallel loops
faster FP16 processing on ARM v8.2 or later using vector FP16 arithmetic. Currently, FP16 numbers are usually processed by converting them to FP32 on the fly.
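
To illustrate what processing FP16 via FP32 involves, here is a minimal, self-contained sketch in plain C++. It is not the PR's code and does not use OpenCV's own half-precision type or SIMD intrinsics; the names `halfBitsToFloat` and `addHalfArrays` are hypothetical. It decodes IEEE 754 binary16 bit patterns into `float` and performs the arithmetic in FP32.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Decode an IEEE 754 binary16 bit pattern into a float.
// Hypothetical helper for illustration only.
inline float halfBitsToFloat(std::uint16_t h)
{
    const std::uint32_t sign = (h >> 15) & 1u;
    const std::uint32_t exp  = (h >> 10) & 0x1Fu;
    const std::uint32_t frac = h & 0x3FFu;
    float value;
    if (exp == 0)        // zero or subnormal: frac * 2^-24
        value = std::ldexp(static_cast<float>(frac), -24);
    else if (exp == 31)  // infinity (frac == 0) or NaN (frac != 0)
        value = frac ? std::numeric_limits<float>::quiet_NaN()
                     : std::numeric_limits<float>::infinity();
    else                 // normal: (1024 + frac) * 2^(exp - 25)
        value = std::ldexp(static_cast<float>(frac | 0x400u),
                           static_cast<int>(exp) - 25);
    return sign ? -value : value;
}

// Element-wise sum of two FP16 arrays, with every operand widened to FP32
// before the addition. The result is kept in FP32 here for simplicity.
inline void addHalfArrays(const std::uint16_t* a, const std::uint16_t* b,
                          float* dst, int n)
{
    for (int i = 0; i < n; ++i)
        dst[i] = halfBitsToFloat(a[i]) + halfBitsToFloat(b[i]);
}
```

A native FP16 vector path, as mentioned in the list above, would skip the widening step entirely; this sketch only shows the conversion-based fallback.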

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is an accuracy test, a performance test and test data in the opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

…ithmetic functions

…d sum() to support new types (F16, BF16, U32, U64, S64)

* extended findnonzero, hasnonzero with the new types support

…TestGPU.MathOpTest` was disabled - not clear whether to set tolerance - it's not bit-exact operation, as possibly assumed by the test, due to the use of scale and possibly limited accuracy of the intermediate floating-point calculations.

…n Mul, Div and AddWeighted (at least when using OpenCL on Windows x64 or MacOS x64). Disabled the respective tests.

opencv-alalek · 2024-02-11T07:59:07Z

Merging into the target branch of a "Merge 4.x" PR (#24981) is prohibited, unless you want to redo the conflict resolution for multiple PRs yourself.

opencv-alalek

There are massive changes to SIMD code and other optimizations.
And again, there is no performance report attached to this PR. What is the problem?

opencv-alalek · 2024-02-05T07:53:01Z

modules/core/src/arithm.simd.hpp

+        SIMD_ONLY(for (; x < width; x += simd_width) \
+        { \
+            if (x + simd_width > width) { \
+                if (((x == 0) | (dst == src1) | (dst == src2)) != 0) \


|

Is there any evidence that this works better than ||? E.g., a godbolt link.
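
For context, the two spellings under discussion can be put side by side. The sketch below uses hypothetical function names and is not taken from the PR. For side-effect-free comparisons like these, both forms return the same truth value; the difference is that `|` evaluates all operands unconditionally, while `||` short-circuits. Whether either compiles to better code depends on the compiler and target, which is why the comment asks for evidence.

```cpp
// Bitwise form, mirroring the quoted line: all three comparisons are
// evaluated and their bool results are combined with bitwise OR.
inline bool conditionBitwise(int x, const void* dst,
                             const void* src1, const void* src2)
{
    return ((x == 0) | (dst == src1) | (dst == src2)) != 0;
}

// Logical form: evaluation stops at the first comparison that is true.
inline bool conditionLogical(int x, const void* dst,
                             const void* src1, const void* src2)
{
    return x == 0 || dst == src1 || dst == src2;
}
```

Compiling both functions with the same flags and comparing the generated assembly would be one way to answer the question.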

opencv-alalek · 2024-02-05T07:54:13Z

modules/core/src/arithm.simd.hpp

+        SIMD_ONLY(for (; x < width; x += simd_width) \
+        { \
+            if (x + simd_width > width) { \
+                if (((x == 0) | (dst == src1) | (dst == src2)) != 0) \


(dst == src1) | (dst == src2)

These loop-invariant checks should be hoisted out of the loop.
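
A minimal sketch of what hoisting could look like is shown below. It is not the PR's macro: `addRowHoisted` is a hypothetical name, a fixed block of 4 elements stands in for `simd_width`, and the body of the tail branch (re-processing an overlapping last block unless the operation is in-place) is an assumption about what the quoted code does.

```cpp
// Adds two rows element-wise. The pointer comparisons do not change
// between iterations, so they are computed once before the loop.
inline void addRowHoisted(const int* src1, const int* src2, int* dst, int width)
{
    const int step = 4;  // stand-in for simd_width (assumption)
    const bool inplace = (dst == src1) || (dst == src2);  // loop-invariant
    int x = 0;
    for (; x < width; x += step) {
        if (x + step > width) {
            // Too short for even one full block, or overlapping writes
            // would corrupt an in-place input: leave the rest to the
            // scalar tail below.
            if (x == 0 || inplace)
                break;
            x = width - step;  // re-process an overlapping last block
        }
        for (int k = 0; k < step; ++k)  // stand-in for one vector operation
            dst[x + k] = src1[x + k] + src2[x + k];
    }
    for (; x < width; ++x)  // scalar tail
        dst[x] = src1[x] + src2[x];
}
```

Only the `x == 0` test still depends on the loop variable; the two pointer comparisons are evaluated once per call instead of once per iteration.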

opencv-alalek · 2024-02-11T08:10:01Z

modules/core/CMakeLists.txt

 ocv_add_dispatched_file(matmul SSE2 SSE4_1 AVX2 AVX512_SKX NEON_DOTPROD LASX)
 ocv_add_dispatched_file(mean SSE2 AVX2 LASX)
 ocv_add_dispatched_file(merge SSE2 AVX2 LASX)
+ocv_add_dispatched_file(minmax SSE2 SSE4_1 AVX2 VSX3 LASX)


This kind of optimization should be done on the 4.x branch first, according to the existing policy: https://github.com/opencv/opencv/wiki/Branches

opencv-alalek · 2024-02-11T08:12:33Z

modules/core/src/minmax.dispatch.cpp

+    }
+}
+
+#ifdef HAVE_OPENCL


The Git history of these changes has been lost in this PR (an explicit git rename/copy is missing).

This guarantees merge conflicts against the 4.x branch in the future (yep, you don't care about "merge 4.x" requests, there is not even any review activity).

Just take a look at the history of other .dispatch.cpp files (...)

opencv-alalek · 2024-02-11T08:27:29Z

modules/core/src/minmax.simd.hpp

+    UVT v_idx_delta = vx_setall_##usuffix((UT)vlanes); \
+    UVT v_invalid_idx = vx_setall_##usuffix((UT)-1); \
+    VT v_minval = vx_setall_##suffix(minVal); \
+    VT v_maxval = vx_setall_##suffix(maxVal); \


Good luck debugging code inside multi-line macros (100 lines).
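
One common way to keep type-generic code debuggable is a function template instead of a macro, since a debugger can step through a template body line by line. The sketch below is a scalar illustration only: `minMaxIdxScalar` is a hypothetical name, it is not the PR's implementation and not something proposed in this review, and it ignores the SIMD index tracking that the quoted macro handles.

```cpp
#include <cstddef>

// Finds the minimum and maximum of src[0..len) and the index of the first
// occurrence of each. Requires len > 0.
template <typename T>
void minMaxIdxScalar(const T* src, std::size_t len,
                     T& minVal, T& maxVal,
                     std::size_t& minIdx, std::size_t& maxIdx)
{
    minVal = maxVal = src[0];
    minIdx = maxIdx = 0;
    for (std::size_t i = 1; i < len; ++i) {
        // Strict comparisons keep the first occurrence on ties.
        if (src[i] < minVal) { minVal = src[i]; minIdx = i; }
        if (src[i] > maxVal) { maxVal = src[i]; maxIdx = i; }
    }
}
```

The same template instantiates for 64-bit signed and unsigned integers without a per-type macro expansion.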

opencv-alalek · 2024-02-11T08:34:59Z

modules/core/src/minmax.simd.hpp

+//DEFINE_MINMAXIDX_FUNC_NOSIMD(minMaxIdx16bf, bfloat16_t, float)
+DEFINE_MINMAXIDX_FUNC_NOSIMD(minMaxIdx64u, uint64, uint64)
+DEFINE_MINMAXIDX_FUNC_NOSIMD(minMaxIdx64s, int64, int64)
+DEFINE_MINMAXIDX_FUNC_NOSIMD(minMaxIdx32u, unsigned, int64)


opencv-alalek · 2024-02-11T08:45:21Z

modules/gapi/test/gpu/gapi_core_tests_gpu.cpp

                                Values(false)));

-INSTANTIATE_TEST_CASE_P(MulTestGPU, MathOpTest,
+INSTANTIATE_TEST_CASE_P(DISABLED_MulTestGPU, MathOpTest,


So, which part has been changed? OpenCV or G-API's OpenCL? Why?

vpisarev added 10 commits January 28, 2024 08:26

started adding support for new types (16f, 16bf, 32u, 64u, 64s) to ar…

5ae1d56

…ithmetic functions

fixed several tests; refactored and extended sum(), extended inRange().

23a7532

extended countNonZero(), mean(), meanStdDev(), minMaxIdx(), norm() an…

3f8580e

…d sum() to support new types (F16, BF16, U32, U64, S64)

* put missing CV_DEPTH_MAX to some function dispatcher tables

769dbe0

* extended findnonzero, hasnonzero with the new types support

extended mixChannels() to support new types

13b5ed8

minor fix

e057c2f

fixed a few compile errors on Linux and a few failures in core tests

d6ad130

fixed a few more warnings and test failures

2a607af

found that in the current snapshot G-API produces incorrect results i…

fa054fb

…n Mul, Div and AddWeighted (at least when using OpenCL on Windows x64 or MacOS x64). Disabled the respective tests.

vpisarev merged commit 1d18aba into opencv:5.x Feb 11, 2024

vpisarev mentioned this pull request Feb 11, 2024

Basic core operations do not support new types in 5.x #24905

Closed

4 tasks

opencv-alalek reviewed Feb 11, 2024

Kumataro mentioned this pull request Feb 12, 2024

Draft: core: Extend norm() function for u64/s64/u32 #24888

Closed

6 tasks

dkurt added this to the 5.0 milestone Apr 8, 2024

mshabunin mentioned this pull request Jun 25, 2024

Merge 4.x -> 5.x #25745

Merged

vpisarev merged 10 commits into opencv:5.x from vpisarev:arithm_new_types
