Redesign the SIMD macro. by hanliutong · Pull Request #22463 · opencv/opencv

hanliutong · 2022-09-02T07:49:32Z

In this patch, we rename the SIMD macro for more clearer, we also introduce a new macro named CV_SIMD_OR_VECTOR instead of CV_SIMD || CV_SIMD_SCALABLE to make it more concise

old name / usage	new name
CV_SIMD	CV_SIMD
CV_SIMD_SCALABLE	CV_VECTOR_SIMD
CV_SIMD_SCALABLE_64F	CV_VECTOR_SIMD_64F
CV_SIMD \|\| CV_SIMD_SCALABLE	CV_SIMD_OR_VECTOR

And I also think the new macro CV_SIMD_OR_VECTOR is also a temporary solution (please correct me if i'm wrong). Since CV_SIMD is inherently designed for multiple register widths with size-agnostic code, all the code blocks wrapped in CV_SIMD should be compatible with the RVV backend. If we can (and also should) translate/modify all the current CV_SIMD code blocks right now, we do not need the CV_SIMD_OR_VECTOR macro anymore. The reason CV_SIMD_OR_VECTOR is needed for now is that there are 442 use cases of CV_SIMD in 53 files, modifying them would takes a long time.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

hanliutong · 2022-09-02T07:54:12Z

Hello @vpisarev , I have already modified the SIMD macro as we discussed, and I also added the license information in the header of the file. Please review it if you have time.

vpisarev · 2022-09-02T08:22:02Z

@hanliutong, do we need CV_VECTOR_SIMD_64F? Is there possibility that 32-bit floating-point arithmetics is available, but 64-bit one is not available?
CV_SIMD_64F has been introduced because in ARMv7 NEON only supported 32-bit floating-point operations.

hanliutong · 2022-09-02T08:54:55Z

@vpisarev

Is there possibility that 32-bit floating-point arithmetics is available, but 64-bit one is not available?

Yes, according to RISC-V V SPEC Chap 18. Standard Vector Extensions, 64-bit float is not available with Zvl32b, Zve32x and Zve32f.

However, I think it will mostly happen on processors for embedded devices, which is not considered in our current implementation, then we always define CV_VECTOR_SIMD_64F as 1 (for now).

Maybe we can keep it for future expansion.

alalek · 2022-09-02T11:08:23Z

CV_VECTOR_SIMD

SIMD is already vectorized data. I see no logic behind adding VECTOR word here.
"SCALABLE" seems misused too.

May be we want something like VARIABLE_SIMD / SIMD_VARIABLE_VECTOR / SIMD_VVL (variable vector length)?

vpisarev · 2022-09-13T11:12:57Z

@alalek, I would insist to use the proposed name. RVV stands for RISC-V Vector extension, ARM SVE stands for Scalable Vector Extension. In general, those instructions should be classified as vector extension. Just using CV_VECTOR would be confusing, because vector has different meanings. CV_VECTOR_SIMD is optimal, because it explicitly mentions vectors and explicitly mentions SIMD, i.e. our SIMD-based HAL.

alalek · 2022-09-13T11:38:03Z

CV_VECTOR_SIMD

It says nothing about variable or fixed size of used SIMD vectors.

hanliutong · 2022-09-16T09:08:03Z

Hello @vpisarev, @alalek and @asmorkalov , thank you for the suggestion about the designing of SIMD-related macros.

Since there are a lot of different opinions, I did some work to find the best solution, please see below, and sorry for the long message.

SIMD and Vector

I would like to give more information firstly about SIMD, Vector, RISC-V Vector, and ARM SVE.

SIMD is the term coming from Flynn's taxonomy in 1972 as a set of data-level parallelism computer architectures.

Vector is also a set of data level parallelism (DLP) computer architectures but is not included in Flynn's taxonomy because Vector is released in 1977, a few years later than Flynn's paper. And as Vadim says, vector has different meanings, people usually think of this as a name of data structure, not a name set of ISA.

Duncan's taxonomy extends Flynn's taxonomy in order to include vector processes in 1990, it believes that both Vector and SIMD belong to synchronous architectures. However, I think Duncan's taxonomy is not famous as Flynn's taxonomy with rarely used.

John Hennessy and David Patterson give a broader definition of SIMD in their book: Computer Architecture A Quantitative Approach, they define Vector architecture (like RVV), multimedia SIMD instruction set extensions (Neon, SSE, AVX), and graphics processing units (GPUs) as 3 variations of SIMD (this SIMD is from Flynn's taxonomy, I think).

ARM SVE is a next-generation SIMD extension to AArch64. Although there is a "vector" in the name, ARM SVE is also a SIMD extension, rather than a vector architecture. More precisely, ARM SVE is a SIMD extension that supports scalable vector length. (I just learned about this part, anyone more familiar with SVE may give more clearer information)

RISC-V Vector is a real vector architecture with scalable vector length.

By the way, the main different between vector architecture and multimedia SIMD instruction set extensions is how the instructions control is implemented (control register like vl and control instruction like setvl is unique in vector) and communication between execution unit and memory unit (elements are processed sequentially in the pipelined unit on the vector architecture). And whether the register is variable length is not one of the distinguishing features.

SIMD macros

The above information is given from the architectural perspective, but I think when we discuss SIMD-related macros in OpenCV, we should look at the programming model view.

In my personal opinion, maybe we don't care if the architecture is traditional SIMD or SIMD with predication or pipelined Vector, from the perspective of the Universal Intrinsic designing and programming model, we only care if the length of the vector registers is fixed or variable, in another word, if the instructions are designed for fixed-size vector length only or for variable vector length.

and then let's go back to the starting point and think about the significance of our use of SIMD macros

The meaning of the CV_SIMD macro is to indicate that the code in the following code block is data level parallelism by using Universal Intrinsic. This macro does not consider whether the vector is scalable or not, and in practice it refers to fixed length SIMD.
We add a RVV back-end to Universal Intrinsic, this is the first scalable back-end, also introduce a new API on Universal Intrinsic. And then we need a new macro for scalable back-end and the new API, which named as "CV_SIMD_SCALABLE"
The source code in OpenCV wrapped with CV_SIMD macro need to be modified for scalable back-end by using new API and should be wrapped with CV_SIMD_SCALABLE. And also, since there is a compatibility layer in the original Universal Intrinsic, the modified code can/should also wrapped with CV_SIMD, then we use CV_SIMD || CV_SIMD_SCALABLE.

Accordingly, I find that we need a total of 3 macros. one for scalable vector length, one for fixed vector length, one for both vector length.

Proposal

Seems the following proposal is the most logical:

Macro name	describe
CV_SIMD_FIXED (renamed from CV_SIMD)	For fixed vector length
CV_SIMD_SCALABLE (keeping)	For scalable vector length
CV_SIMD (reuse)	For both vector length, defined as CV_SIMD_SCALABLE \|\| CV_SIMD_FIXED

However, we need to modify the current CV_SIMD to CV_SIMD_FIXED first, which may cause confusion. And then we need to change it back from CV_SIMD_FIXED to CV_SIMD when some contributor modifies the loop with the new API. But it is reasonable because as I said in the initial comment in this PR, the macro for scalable vector length and for fixed vector length is temporary, when we modify all the SIMD loops with the new API, we only need one macro named CV_SIMD.

Then here is another proposal, which will not require massive modification of the existing CV_SIMD

Macro name	describe
CV_SIMD (keeping)	For fixed vector length
CV_SIMD_SCALABLE (keeping)	For scalable vector length
CV_SIMD_ANY (new macro)	For both vector length

Maybe CV_ANY_SIMD is more fluent.

alalek · 2022-09-16T09:48:22Z

Macro usage:

We should keep CV_SIMD - to keep working state of already existing code (in meaning of the fixed size).
New code for SVE/RVV should use CV_SIMD_SCALABLE (or alternative name).
We don't really need CV_SIMD_ANY (under any name) as any scalable code should work with "fixed size vectors" by definition (under existed CV_SIMD macro). Similar CV_SIMD || CV_SIMD_SCALABLE should not be used in optimized code.

"Scalable" code requires more limitations, so optimized "scalable" code is a subset of all optimized SIMD/vector code.

Naming:

VECTOR should be avoided as it confuses developers as "vector" word is already widely used (SIMD register types, functions names through v_ prefix, etc)

grep -Rni vector modules/core/include/opencv2/core/hal/

Details

modules/core/include/opencv2/core/hal/intrin.hpp:426:    //! @brief Maximum available vector register capacity 8-bit unsigned integer values
modules/core/include/opencv2/core/hal/intrin.hpp:428:    //! @brief Maximum available vector register capacity 8-bit signed integer values
modules/core/include/opencv2/core/hal/intrin.hpp:430:    //! @brief Maximum available vector register capacity 16-bit unsigned integer values
modules/core/include/opencv2/core/hal/intrin.hpp:432:    //! @brief Maximum available vector register capacity 16-bit signed integer values
modules/core/include/opencv2/core/hal/intrin.hpp:434:    //! @brief Maximum available vector register capacity 32-bit unsigned integer values
modules/core/include/opencv2/core/hal/intrin.hpp:436:    //! @brief Maximum available vector register capacity 32-bit signed integer values
modules/core/include/opencv2/core/hal/intrin.hpp:438:    //! @brief Maximum available vector register capacity 64-bit unsigned integer values
modules/core/include/opencv2/core/hal/intrin.hpp:440:    //! @brief Maximum available vector register capacity 64-bit signed integer values
modules/core/include/opencv2/core/hal/intrin.hpp:442:    //! @brief Maximum available vector register capacity 32-bit floating point values (single precision)
modules/core/include/opencv2/core/hal/intrin.hpp:445:    //! @brief Maximum available vector register capacity 64-bit floating point values (double precision)
modules/core/include/opencv2/core/hal/intrin.hpp:462:    //! @brief Maximum available vector register capacity 8-bit unsigned integer values
modules/core/include/opencv2/core/hal/intrin.hpp:464:    //! @brief Maximum available vector register capacity 8-bit signed integer values
modules/core/include/opencv2/core/hal/intrin.hpp:466:    //! @brief Maximum available vector register capacity 16-bit unsigned integer values
modules/core/include/opencv2/core/hal/intrin.hpp:468:    //! @brief Maximum available vector register capacity 16-bit signed integer values
modules/core/include/opencv2/core/hal/intrin.hpp:470:    //! @brief Maximum available vector register capacity 32-bit unsigned integer values
modules/core/include/opencv2/core/hal/intrin.hpp:472:    //! @brief Maximum available vector register capacity 32-bit signed integer values
modules/core/include/opencv2/core/hal/intrin.hpp:474:    //! @brief Maximum available vector register capacity 64-bit unsigned integer values
modules/core/include/opencv2/core/hal/intrin.hpp:476:    //! @brief Maximum available vector register capacity 64-bit signed integer values
modules/core/include/opencv2/core/hal/intrin.hpp:478:    //! @brief Maximum available vector register capacity 32-bit floating point values (single precision)
modules/core/include/opencv2/core/hal/intrin.hpp:481:    //! @brief Maximum available vector register capacity 64-bit floating point values (double precision)
modules/core/include/opencv2/core/hal/intrin.hpp:501:    //! @brief Maximum available vector register capacity 8-bit unsigned integer values
modules/core/include/opencv2/core/hal/intrin.hpp:503:    //! @brief Maximum available vector register capacity 8-bit signed integer values
modules/core/include/opencv2/core/hal/intrin.hpp:505:    //! @brief Maximum available vector register capacity 16-bit unsigned integer values
modules/core/include/opencv2/core/hal/intrin.hpp:507:    //! @brief Maximum available vector register capacity 16-bit signed integer values
modules/core/include/opencv2/core/hal/intrin.hpp:509:    //! @brief Maximum available vector register capacity 32-bit unsigned integer values
modules/core/include/opencv2/core/hal/intrin.hpp:511:    //! @brief Maximum available vector register capacity 32-bit signed integer values
modules/core/include/opencv2/core/hal/intrin.hpp:513:    //! @brief Maximum available vector register capacity 64-bit unsigned integer values
modules/core/include/opencv2/core/hal/intrin.hpp:515:    //! @brief Maximum available vector register capacity 64-bit signed integer values
modules/core/include/opencv2/core/hal/intrin.hpp:517:    //! @brief Maximum available vector register capacity 32-bit floating point values (single precision)
modules/core/include/opencv2/core/hal/intrin.hpp:520:    //! @brief Maximum available vector register capacity 64-bit floating point values (double precision)
modules/core/include/opencv2/core/hal/intrin.hpp:546:    //! @brief Create maximum available capacity vector with elements set to a specific value
modules/core/include/opencv2/core/hal/intrin.hpp:563:    //! @brief Create maximum available capacity vector with elements set to zero
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:82:"Universal intrinsics" is a types and functions set intended to simplify vectorization of code on
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:92:There are several types representing packed values vector registers, each type is
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:212:These operations allow to reorder or recombine elements in one or multiple vectors.
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:279:In these operations vectors represent matrix rows/columns: @ref v_dotprod, @ref v_dotprod_fast,
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:820:/** @brief Count the 1 bits in the vector lanes and return result as corresponding unsigned type
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:1322:/** @brief Element shift left among vector
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:1327:/** @brief Element shift right among vector
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:1347:/** @brief Sums all elements of each input vector, returns the vector of sums
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:1491:Same as cv::v_expand, but return lower half of the vector.
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:1510:Same as cv::v_expand_low, but expand higher half of the vector instead.
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:1547:/** @brief Interleave two vectors
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:2275:/** @brief Combine vector from first elements of two vectors
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:2297:/** @brief Combine vector from last elements of two vectors
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:2319:/** @brief Combine two vectors from lower and higher parts of two other vectors
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:2338:/** @brief Vector reverse order
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:2340:Reverse the order of the vector
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:2355:/** @brief Vector extract
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:2387:/** @brief Vector extract
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:2407:/** @brief Broadcast i-th element of vector
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:2425:Rounds each value. Input type is float vector ==> output type is int vector.
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:2450:Floor each value. Input type is float vector ==> output type is int vector.
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:2463:Ceil each value. Input type is float vector ==> output type is int vector.
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:2476:Truncate each value. Input type is float vector ==> output type is int vector.
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:2585:/** @brief Convert to double high part of vector
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:2607:/** @brief Convert to double high part of vector
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:2790://! @brief Create new vector with zero elements
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:2836://! @brief Create new vector with elements set to a specific value
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:2884://! @brief Convert vector to different type without modifying underlying data.
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:2969://! @brief Pack values from two vectors to one
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:2971://! Return vector type have twice more elements than input vector types. Variant with _u_ suffix also
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:3004://! @brief Pack values from two vectors to one with rounding shift
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:3006://! Values from the input vectors will be shifted right by _n_ bits with rounding, converted to narrower
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:3007://! type and returned in the result vector. Variant with _u_ suffix converts to unsigned type.
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:3034://! @brief Store values from the input vector into memory with pack
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:3064://! @brief Store values from the input vector into memory with pack
modules/core/include/opencv2/core/hal/intrin_cpp.hpp:3097://! @brief Pack boolean values from multiple vectors to one unsigned 8-bit integer vector
modules/core/include/opencv2/core/hal/intrin_msa.hpp:19://MSA implements 128-bit wide vector registers shared with the 64-bit wide floating-point unit registers.
modules/core/include/opencv2/core/hal/intrin_neon.hpp:1137:    // also adds FMA support both for single- and double-precision floating-point vectors
modules/core/include/opencv2/core/hal/intrin_neon.hpp:1326:#define OPENCV_HAL_IMPL_NEON_REDUCE_OP_16(_Tpvec, _Tpnvec, scalartype, func, vectorfunc, suffix) \
modules/core/include/opencv2/core/hal/intrin_neon.hpp:1329:    return v##vectorfunc##vq_##suffix(a.val); \
modules/core/include/opencv2/core/hal/intrin_neon.hpp:1332:#define OPENCV_HAL_IMPL_NEON_REDUCE_OP_16(_Tpvec, _Tpnvec, scalartype, func, vectorfunc, suffix) \
modules/core/include/opencv2/core/hal/intrin_neon.hpp:1335:    _Tpnvec##_t a0 = vp##vectorfunc##_##suffix(vget_low_##suffix(a.val), vget_high_##suffix(a.val)); \
modules/core/include/opencv2/core/hal/intrin_neon.hpp:1336:    a0 = vp##vectorfunc##_##suffix(a0, a0); \
modules/core/include/opencv2/core/hal/intrin_neon.hpp:1337:    a0 = vp##vectorfunc##_##suffix(a0, a0); \
modules/core/include/opencv2/core/hal/intrin_neon.hpp:1338:    return (scalartype)vget_lane_##suffix(vp##vectorfunc##_##suffix(a0, a0),0); \
modules/core/include/opencv2/core/hal/intrin_neon.hpp:1348:#define OPENCV_HAL_IMPL_NEON_REDUCE_OP_8(_Tpvec, _Tpnvec, scalartype, func, vectorfunc, suffix) \
modules/core/include/opencv2/core/hal/intrin_neon.hpp:1351:    return v##vectorfunc##vq_##suffix(a.val); \
modules/core/include/opencv2/core/hal/intrin_neon.hpp:1354:#define OPENCV_HAL_IMPL_NEON_REDUCE_OP_8(_Tpvec, _Tpnvec, scalartype, func, vectorfunc, suffix) \
modules/core/include/opencv2/core/hal/intrin_neon.hpp:1357:    _Tpnvec##_t a0 = vp##vectorfunc##_##suffix(vget_low_##suffix(a.val), vget_high_##suffix(a.val)); \
modules/core/include/opencv2/core/hal/intrin_neon.hpp:1358:    a0 = vp##vectorfunc##_##suffix(a0, a0); \
modules/core/include/opencv2/core/hal/intrin_neon.hpp:1359:    return (scalartype)vget_lane_##suffix(vp##vectorfunc##_##suffix(a0, a0),0); \
modules/core/include/opencv2/core/hal/intrin_neon.hpp:1369:#define OPENCV_HAL_IMPL_NEON_REDUCE_OP_4(_Tpvec, _Tpnvec, scalartype, func, vectorfunc, suffix) \
modules/core/include/opencv2/core/hal/intrin_neon.hpp:1372:    return v##vectorfunc##vq_##suffix(a.val); \
modules/core/include/opencv2/core/hal/intrin_neon.hpp:1375:#define OPENCV_HAL_IMPL_NEON_REDUCE_OP_4(_Tpvec, _Tpnvec, scalartype, func, vectorfunc, suffix) \
modules/core/include/opencv2/core/hal/intrin_neon.hpp:1378:    _Tpnvec##_t a0 = vp##vectorfunc##_##suffix(vget_low_##suffix(a.val), vget_high_##suffix(a.val)); \
modules/core/include/opencv2/core/hal/intrin_neon.hpp:1379:    return (scalartype)vget_lane_##suffix(vp##vectorfunc##_##suffix(a0, vget_high_##suffix(a.val)),0); \
modules/core/include/opencv2/core/hal/intrin_rvv_scalable.hpp:7:#include <vector>
modules/core/include/opencv2/core/hal/intrin_rvv_scalable.hpp:424:    std::vector<uint> idx_; \
modules/core/include/opencv2/core/hal/intrin_rvv_scalable.hpp:434:    std::vector<uint> idx_; \
modules/core/include/opencv2/core/hal/intrin_rvv_scalable.hpp:938:    // vector add
modules/core/include/opencv2/core/hal/intrin_sse.hpp:75:    typedef __m128i vector_type;
modules/core/include/opencv2/core/hal/intrin_sse.hpp:101:    typedef __m128i vector_type;
modules/core/include/opencv2/core/hal/intrin_sse.hpp:127:    typedef __m128i vector_type;
modules/core/include/opencv2/core/hal/intrin_sse.hpp:150:    typedef __m128i vector_type;
modules/core/include/opencv2/core/hal/intrin_sse.hpp:173:    typedef __m128i vector_type;
modules/core/include/opencv2/core/hal/intrin_sse.hpp:195:    typedef __m128i vector_type;
modules/core/include/opencv2/core/hal/intrin_sse.hpp:217:    typedef __m128 vector_type;
modules/core/include/opencv2/core/hal/intrin_sse.hpp:239:    typedef __m128i vector_type;
modules/core/include/opencv2/core/hal/intrin_sse.hpp:273:    typedef __m128i vector_type;
modules/core/include/opencv2/core/hal/intrin_sse.hpp:307:    typedef __m128d vector_type;
modules/core/include/opencv2/core/hal/intrin_sse.hpp:1516:    return _Tpvec(v_sse_reinterpret_as<typename _Tpvec::vector_type>(
modules/core/include/opencv2/core/hal/intrin_sse.hpp:1526:    return _Tpvec(v_sse_reinterpret_as<typename _Tpvec::vector_type>(
modules/core/include/opencv2/core/hal/intrin_sse.hpp:1536:    return _Tpvec(v_sse_reinterpret_as<typename _Tpvec::vector_type>(
modules/core/include/opencv2/core/hal/intrin_sse.hpp:1547:    return _Tpvec(v_sse_reinterpret_as<typename _Tpvec::vector_type>(
modules/core/include/opencv2/core/hal/intrin_vsx.hpp:424:    const __vector _Tpvn vn = vec_splats((_Tpvn)n);                                                 \
modules/core/include/opencv2/core/hal/intrin_vsx.hpp:425:    const __vector _Tpdel delta = vec_splats((_Tpdel)((_Tpdel)1 << (n-1)));                         \
modules/core/include/opencv2/core/hal/intrin_vsx.hpp:431:    const __vector _Tpvn vn = vec_splats((_Tpvn)n);                                                 \
modules/core/include/opencv2/core/hal/intrin_vsx.hpp:432:    const __vector _Tpdel delta = vec_splats((_Tpdel)((_Tpdel)1 << (n-1)));                         \
modules/core/include/opencv2/core/hal/intrin_wasm.hpp:41:    typedef v128_t vector_type;
modules/core/include/opencv2/core/hal/intrin_wasm.hpp:64:    typedef v128_t vector_type;
modules/core/include/opencv2/core/hal/intrin_wasm.hpp:87:    typedef v128_t vector_type;
modules/core/include/opencv2/core/hal/intrin_wasm.hpp:109:    typedef v128_t vector_type;
modules/core/include/opencv2/core/hal/intrin_wasm.hpp:131:    typedef v128_t vector_type;
modules/core/include/opencv2/core/hal/intrin_wasm.hpp:153:    typedef v128_t vector_type;
modules/core/include/opencv2/core/hal/intrin_wasm.hpp:175:    typedef v128_t vector_type;
modules/core/include/opencv2/core/hal/intrin_wasm.hpp:197:    typedef v128_t vector_type;
modules/core/include/opencv2/core/hal/intrin_wasm.hpp:219:    typedef v128_t vector_type;
modules/core/include/opencv2/core/hal/intrin_wasm.hpp:241:    typedef v128_t vector_type;
modules/core/include/opencv2/core/hal/msa_macros.h:16:/* Define 64 bits vector types */
modules/core/include/opencv2/core/hal/msa_macros.h:17:typedef signed char v8i8 __attribute__ ((vector_size(8), aligned(8)));
modules/core/include/opencv2/core/hal/msa_macros.h:18:typedef unsigned char v8u8 __attribute__ ((vector_size(8), aligned(8)));
modules/core/include/opencv2/core/hal/msa_macros.h:19:typedef short v4i16 __attribute__ ((vector_size(8), aligned(8)));
modules/core/include/opencv2/core/hal/msa_macros.h:20:typedef unsigned short v4u16 __attribute__ ((vector_size(8), aligned(8)));
modules/core/include/opencv2/core/hal/msa_macros.h:21:typedef int v2i32 __attribute__ ((vector_size(8), aligned(8)));
modules/core/include/opencv2/core/hal/msa_macros.h:22:typedef unsigned int v2u32 __attribute__ ((vector_size(8), aligned(8)));
modules/core/include/opencv2/core/hal/msa_macros.h:23:typedef long long v1i64 __attribute__ ((vector_size(8), aligned(8)));
modules/core/include/opencv2/core/hal/msa_macros.h:24:typedef unsigned long long v1u64 __attribute__ ((vector_size(8), aligned(8)));
modules/core/include/opencv2/core/hal/msa_macros.h:25:typedef float v2f32 __attribute__ ((vector_size(8), aligned(8)));
modules/core/include/opencv2/core/hal/msa_macros.h:26:typedef double v1f64 __attribute__ ((vector_size(8), aligned(8)));
modules/core/include/opencv2/core/hal/msa_macros.h:29:/* Load values from the given memory a 64-bit vector. */
modules/core/include/opencv2/core/hal/msa_macros.h:41:/* Load values from the given memory address to a 128-bit vector */
modules/core/include/opencv2/core/hal/msa_macros.h:53:/* Store 64bits vector elements values to the given memory address. */
modules/core/include/opencv2/core/hal/msa_macros.h:65:/* Store the values of elements in the 128 bits vector __a to the given memory address __a. */
modules/core/include/opencv2/core/hal/msa_macros.h:77:/* Store the value of the element with the index __c in vector __a to the given memory address __a. */
modules/core/include/opencv2/core/hal/msa_macros.h:99:/* Duplicate elements for 64-bit doubleword vectors */
modules/core/include/opencv2/core/hal/msa_macros.h:111:/* Duplicate elements for 128-bit quadword vectors */
modules/core/include/opencv2/core/hal/msa_macros.h:131:/* Create a 64 bits vector */
modules/core/include/opencv2/core/hal/msa_macros.h:143:/* Sign extends or zero extends each element in a 64 bits vector to twice its original length, and places the results in a 128 bits vector. */
modules/core/include/opencv2/core/hal/msa_macros.h:166:/* Copies the least significant half of each element of a 128 bits vector into the corresponding elements of a 64 bits vector. */
modules/core/include/opencv2/core/hal/msa_macros.h:262:/* Right shift elements in a 128 bits vector by an immediate value, and places the results in a 64 bits vector. */
modules/core/include/opencv2/core/hal/msa_macros.h:299:/* Right shift elements in a 128 bits vector by an immediate value, and places the results in a 64 bits vector. */
modules/core/include/opencv2/core/hal/msa_macros.h:336:/* Right shift elements in a 128 bits vector by an immediate value, saturate the results and them in a 64 bits vector. */
modules/core/include/opencv2/core/hal/msa_macros.h:379:/* Right shift elements in a 128 bits vector by an immediate value, saturate the results and them in a 64 bits vector.
modules/core/include/opencv2/core/hal/msa_macros.h:505:/* Minimum values between corresponding elements in the two vectors are written to the returned vector. */
modules/core/include/opencv2/core/hal/msa_macros.h:517:/* Maximum values between corresponding elements in the two vectors are written to the returned vector. */
modules/core/include/opencv2/core/hal/msa_macros.h:529:/* Vector type reinterpretion */
modules/core/include/opencv2/core/hal/msa_macros.h:532:/* Add the odd elements in vector __a with the even elements in vector __b to double width elements in the returned vector. */
modules/core/include/opencv2/core/hal/msa_macros.h:540:/* Copy even elements in __a to the left half and even elements in __b to the right half and return the result vector. */
modules/core/include/opencv2/core/hal/msa_macros.h:546:/* Copy even elements in __a to the left half and even elements in __b to the right half and return the result vector. */
modules/core/include/opencv2/core/hal/msa_macros.h:960:/* Vector saturating rounding shift left, qrshl -> ri = ai << bi; */

hanliutong · 2022-09-16T12:59:23Z

Hello @alalek , I agree with you that we should avoid use "vector" in the name. Not only to avoid confuses, but also because “scalable” is a more appropriate statement.

About the macro usage, I have some different opinions:

"Scalable" code requires more limitations, so optimized "scalable" code is a subset of all optimized SIMD/vector code.

I think all the code wrapped by CV_SIMD macro can be used (by necessary modify) for CV_SIMD_SCALABLE. Becasue all the code wrapped in CV_SIMD means it is a wide universal intrinsic code block, which mean it is suitable for 128, 256 and 512 bit. In this sense, it is already variable length and should be compatible with CV_SIMD_SCALABLE. And yes, as you said, there are indeed some codes that require more limitations, but I think all of them are wrapped in CV_SIMD128 and CV_SIMD256, not in CV_SIMD.

Any scalable code should work with "fixed size vectors" by definition

Yes. But the reason of using CV_SIMD_SCALABLE is we propose the new interface (API) of Universal Intrinsic (and the reason for proposing a new API is that the existing API is not compatible with variable-length architectures). In this case, CV_SIMD is not defined in RVV back-end, so we need a macro like CV_SIMD || CV_SIMD_SCALABLE for the code written in new API. Please see the sample code below:

	Origin Universal Intrinsic	New Universal Intrinsic
add two vector	va + vb;	v_add(va, vb);
get nlanes	v_float32::nlanes	Vtraits<v_float32>::vlanes()

#if CV_SIMD //origin SIMD loop
	for (int i = 0; i < N; i += v_float32::nlanes )
		vx_load([i]...)
		vc = va + vb; // not work in rvv back-end, need modify
		v_store([i]...)
#endif

#if CV_SIMD || CV_SIMD_SCALABLE //modified SIMD loop
	for (int i = 0; i < N; i += Vtraits<v_float32>::vlanes() )
		vx_load([i]...)
		vc = v_add(va, vb); //Compatible with CV_SIMD macros because of the compatibility layer
		v_store([i]...)
#endif

And there is a full example in #22520

For more detiles about rvv backed and the compatibility layer, please refer to intrin_rvv_scalable.hpp#L18 and intrin.hpp#L714

alalek · 2022-09-16T13:44:34Z

#if CV_SIMD || CV_SIMD_SCALABLE

If CV_SIMD=1 then CV_SIMD_SCALABLE=1 (by default, until we want to disable it for testing or debug purposes), so complex condition is not needed. Just use:

#if CV_SIMD_SCALABLE

hanliutong · 2022-09-16T15:31:57Z

If CV_SIMD=1 then CV_SIMD_SCALABLE=1

does it means do some thing like this?

#if CV_SIMD
    #define CV_SIMD_SCALABLE 1
#endif

This does avoid complex macros, but I propose not to do this becasue of two reasons

It may cause confusion. CV_SIMD_SCALABLE will enable in non-variable-length (aka fixed length ) backends. I personally think it doesn't make sense. And code block like following looks like it's specifically design for variable length, but in fact, it is for both length.

#if CV_SIMD_SCALABLE 
   for (...)
#endif

CV_SIMD_SCALABLE is not only used to control the SIMD loop, but also used to control the compatibility layer in Universal Intrinsic. Enable it when CV_SIMD=1 will cause redefinition of some universal intrinsic functions, such as vx_func in wide universal intrinsic.

alalek · 2022-09-16T15:41:20Z

Logic behind this is simple:

If you wrote "optimized code" with support of variable vectors/SIMD backend
then this "optimized code" must work with fixed-size SIMD backend.

(the second case is tested regularly on CI in different ways)

hanliutong · 2022-09-16T16:09:21Z

If you wrote "optimized code" with support of variable vectors/SIMD backend
then this "optimized code" must work with fixed-size SIMD backend.

Yes, I totally agree with that! In fact, that is why I introduced CV_SIMD || CV_SIMD_SCALABLE.

I would like to clarify the state of these two macros on different backend:

Backend	CV_SIMD	CV_SIMD_SCALABLE
RVV (or other scalable target)	0	1
SSE/AVX/AVX512, ARM Neon, and others	1	0
cpp (without any SIMD extension)	0	0

The optimized code (written by new API) has aready support both variable backend and fixed-size backend, and then with CV_SIMD || CV_SIMD_SCALABLE the optimized code is enabled for both backend.

I think we're very close to a consensus, the only difference is that I think a single macro that defined as CV_SIMD || CV_SIMD_SCALABLE to make it clearer.

asmorkalov · 2022-10-11T09:09:04Z

Hello @alalek and @hanliutong @vpisarev . I want to refresh the discussion and finally drive to some conclusion. I vote for CV_SIMD_ANY option proposed by Liutong and discussed on the call:

Macro name	describe
CV_SIMD (keeping)	For fixed vector length
CV_SIMD_SCALABLE (keeping)	For scalable vector length
CV_SIMD_ANY (new macro)	For both vector length

Or even do not introduce hew shorter macro al all. Pros for the solution:

No concerns about existing naming and practice in OpenCV. No massive code changes.
We just add or remove extra condition on place, if the loop is tested in scalable mode or found a bug.

alalek · 2022-10-11T11:27:04Z

Just a note, that #if CV_SIMD_ANY code guard would require to write VVL/scalable SIMD code inside. This may confuse.

hanliutong · 2022-10-24T02:37:28Z

Hello @asmorkalov @alalek @vpisarev , I have updated the code according to the previous discussion. Thanks you all for your suggestions!

I would like to clarify the original idea of introducing a shorter macro: the existing wide universal intrinsic code (guarded with CV_SIMD, but not CV_SIMD_<specific width>) can be used for the new RVV backend with a few modified(including use CV_SIMD | CV_SIMD_SCALABLE). the new SIMD code should also write for all backend, then it will be guarded by CV_SIMD | CV_SIMD_SCALABLE macro. In another word, both of CV_SIMD and CV_SIMD_SCALABLE may not be used individually anymore, so we think introducing a shorter macro is make sense.

asmorkalov · 2022-10-26T11:19:29Z

Hello @hanliutong I apologize for large delay. We discussed the naming on core team meeting couple of times and finally have driven to consensus. The team decided not to introduce new macro, because of very low adoption rate for today and no consensus on naming. So,

please remove new macro from the code. All other changes seams to be useful.
please update HSV pipeline to use CV_SIMD || CV_SIMD_SCALABLE and I'll merge it after Vadim's approve.

hanliutong · 2022-10-26T13:46:11Z

Hello @asmorkalov, I have already update this PR and #22520 (HSV) according to your comment.

asmorkalov requested a review from vpisarev September 5, 2022 06:42

asmorkalov added the category: core label Sep 5, 2022

vpisarev approved these changes Sep 13, 2022

View reviewed changes

asmorkalov added the pr: Discussion Required label Sep 14, 2022

hanliutong requested a review from vpisarev October 24, 2022 02:37

hanliutong force-pushed the rvv branch from d7468e8 to 494a8d6 Compare October 24, 2022 11:32

Update SIMD macro for RVV backend.

5462a6b

hanliutong force-pushed the rvv branch from 494a8d6 to 5462a6b Compare October 26, 2022 13:02

asmorkalov added platform: riscv and removed pr: Discussion Required labels Oct 26, 2022

vpisarev approved these changes Oct 26, 2022

View reviewed changes

asmorkalov assigned vpisarev Oct 27, 2022

asmorkalov merged commit 778fadd into opencv:4.x Oct 27, 2022

alalek mentioned this pull request Jan 8, 2023

(5.x) Merge 4.x #23113

Merged

Uh oh!

Conversation

hanliutong commented Sep 2, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request Readiness Checklist

Uh oh!

hanliutong commented Sep 2, 2022

Uh oh!

vpisarev commented Sep 2, 2022

Uh oh!

hanliutong commented Sep 2, 2022

Uh oh!

alalek commented Sep 2, 2022

Uh oh!

vpisarev commented Sep 13, 2022

Uh oh!

alalek commented Sep 13, 2022

Uh oh!

hanliutong commented Sep 16, 2022

SIMD and Vector

SIMD macros

Proposal

Uh oh!

alalek commented Sep 16, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hanliutong commented Sep 16, 2022

Uh oh!

alalek commented Sep 16, 2022

Uh oh!

hanliutong commented Sep 16, 2022

Uh oh!

alalek commented Sep 16, 2022

Uh oh!

hanliutong commented Sep 16, 2022

Uh oh!

asmorkalov commented Oct 11, 2022

Uh oh!

alalek commented Oct 11, 2022

Uh oh!

hanliutong commented Oct 24, 2022

Uh oh!

asmorkalov commented Oct 26, 2022

Uh oh!

hanliutong commented Oct 26, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

hanliutong commented Sep 2, 2022 •

edited

Loading

alalek commented Sep 16, 2022 •

edited

Loading