
New shiny Core module for OpenCV 5.0 #25011

@vpisarev


Introduction

The Core module is the crucial module in OpenCV: all other modules depend on it.
It must form a solid foundation for a future-proof OpenCV.

The desirable key properties of the module are:

  • enough essential functionality that other modules don't have to reimplement it.
  • basic data structures and infrastructure to be used jointly by other modules, to connect them together and provide smooth data flow. By infrastructure we mean memory management, parallel computing framework, HAL, error handling, logging & tracing, basic I/O etc.
  • high efficiency with low overhead.
  • multi-level API to enable both high-level pipeline-style use and more advanced use, where efficient parallel kernels in other modules can reuse low-level primitives from the core module.
  • compact size and small footprint (including low compile-time overhead).

A bit of history, and why the Core module should be somewhat like Python's numpy

OpenCV's Core module in its current form was created in ~2009, when cv::Mat, a multi-dimensional dense array, was introduced as a complete replacement for CvMat, CvMatND and IplImage. The whole OpenCV API (before 2009 it was a C API) was rebuilt around cv::Mat and a few other basic structures like std::vector<> (to handle point clouds etc.). The idea of combining image, matrix and multi-dimensional array (tensor) in one type was borrowed from Matlab, where toolboxes, including the image processing toolbox, the basic linear algebra toolbox, Jean-Yves Bouguet's camera calibration toolbox etc., all happily use Matlab matrices, so it's super-easy to create pipelines that use algorithms from different areas.

Python's famous numerical extension, numpy, seems to have borrowed the same idea, implementing a ubiquitous matrix/array type called ndarray. Bigger packages such as scipy and scikit-learn were developed on top of numpy. An efficient yet quite comprehensive set of basic operations, extended by the derived packages, largely eliminated the problem of the very low speed of hand-written Python code (because all the kernels in numpy are implemented in efficient C or Fortran). That suddenly made Python a sound substitute for Fortran and Matlab in the new century.

With the rise of Deep Learning the idea has been greatly extended. An efficient, GPU-accelerated, comprehensive set of operations (very similar to numpy) that can be composed into graphs, together with automatic differentiation tools, formed the foundation of modern Deep Learning technology. Looking at PyTorch, TensorFlow, JAX, the ONNX specification etc., one will find many similarities with numpy. In particular, many ONNX operations follow numpy quite closely and use numpy to illustrate their semantics. Of course, there are some deep learning-specific operations like Convolution, SoftMax, Dropout or Attention, but most ONNX operations have numpy counterparts.

The Python community (all the aforementioned frameworks, except for OpenCV, are mainly used from Python) noticed this close resemblance between many array processing frameworks and introduced the so-called Python array API standard. It is clearly still an emerging standard: its API lacks some important numpy functionality and some important PyTorch/ONNX operations, and it lacks the notion of an external accelerator (like a GPU or NPU) to which the user may want to transfer an array/tensor, perform a set of operations there and transfer the results back. This is crucial functionality for deep learning frameworks and for OpenCV, including its deep learning module and its GPU-accelerated image processing functionality. So the standard will definitely evolve, but it makes sense for OpenCV 5 to more or less comply with it even now. Besides implementing the already specified API, it is an opportunity for us to offer the community extra kernels that are important for computer vision and image processing use cases.

The list of functions to implement/improve in the Core module for OpenCV 5.0

Basically, OpenCV's Core module should implement a large subset of the "Python array API standard", with certain extensions that we consider useful.

The list of functions below was copied directly from https://data-apis.org/array-api/latest/API_specification/index.html. It would probably be better presented as a table.

  1. Unary/binary arithmetic, math and logic operations. We already implement many of these operations (sometimes under slightly different names), but we need to support broadcasting for binary operations.

    abs    // cv::absdiff with 0 as a second parameter
    acos
    acosh  // via cv::log()
    add    // cv::add
    asin
    asinh  // via cv::log()
    atan
    atan2  // cv::phase() or cv::cartToPolar() (angle is returned in [0, 2*pi), not (-pi, pi])
    atanh  // via cv::log()
    bitwise_and // cv::bitwise_and
    bitwise_left_shift
    bitwise_invert // cv::bitwise_not
    bitwise_or // cv::bitwise_or
    bitwise_right_shift
    bitwise_xor // cv::bitwise_xor
    ceil
    conj
    cos    // cv::polarToCart() with unit magnitude (x output)
    cosh   // via cv::exp()
    divide // cv::divide
    equal  // cv::compare(..., CMP_EQ)
    exp    // cv::exp
    expm1
    floor
    floor_divide
    greater // cv::compare(..., CMP_GT)
    greater_equal // cv::compare(..., CMP_GE)
    imag
    isfinite
    isinf
    isnan
    less  // cv::compare(..., CMP_LT)
    less_equal // cv::compare(..., CMP_LE)
    log   // cv::log
    log1p
    log2
    log10
    logaddexp
    logical_and
    logical_not
    logical_or
    logical_xor
    multiply  // cv::multiply
    negative
    not_equal  // cv::compare(..., CMP_NE)
    positive   // copyTo()
    pow        // cv::pow
    real
    remainder
    round     // convertTo()
    sign
    sin      // cv::polarToCart() with unit magnitude (y output)
    sinh      // via cv::exp()
    square    // via cv::multiply()
    sqrt      // cv::sqrt()
    subtract  // cv::subtract()
    tan
    tanh    // no direct function. Can be computed via cv::exp()
    trunc
    
  2. Linear algebra functions. Same situation.

    matmul // cv::gemm()
    matrix_transpose // cv::transpose()
    tensordot
    vecdot
    
  3. Array manipulation functions. Same situation:

    broadcast_arrays
    broadcast_to   // +
    concat        // in OpenCV we have 2D hconcat and vconcat
    expand_dims
    flip          // 2D only for now
    permute_dims  // in cv::dnn we have general Transpose. in core we have 2D transpose
    reshape
    roll
    squeeze
    stack
    
  4. Statistical functions. Same situation:

    max   // minMaxIdx() computes min, max and their indices.
    mean  // cv::mean
    min   // via minMaxIdx()
    prod
    std   // cv::meanStdDev() computes both mean and standard deviation
    sum   // cv::sum
    var   // via cv::meanStdDev()
    
  5. Misc functions from several other groups. Mostly implemented as well in one form or another:

    // searching functions
    argmax
    argmin
    nonzero // ~ cv::findNonZero() (2D only); cv::countNonZero() only counts them
    where // element-wise ternary operator ?:
    
    // set functions
    unique_all
    unique_counts
    unique_inverse
    unique_values
    
    // sorting
    argsort  // called cv::sortIdx() in OpenCV
    sort     // cv::sort()
    
    // utility functions
    all    // as cv::countNonZero(m) == m.total()
    any   // cv::hasNonZero
    
    // initialization functions
    arange
    asarray // many non-Mat arrays can be converted to cv::Mat using getMat().
            // in Python bindings Mat is implicitly constructed from ndarray and vice versa
    empty   // Mat(size, type) or Mat::create() (data left uninitialized)
    empty_like
    eye     // Mat::eye()
    from_dlpack
    full
    full_like
    linspace
    meshgrid
    ones     // Mat::ones()
    ones_like
    tril
    triu
    zeros    // Mat::zeros()
    zeros_like
    
  6. Some useful extra operations that are not included in the "Python array API standard", but are included in numpy and/or ONNX specifications:

    einsum // already implemented in cv::dnn
    einops.* // a family of operations from excellent einops package:
             // https://github.com/arogozhnikov/einops
    reduce(..., sum|min|max|avg|...) // in core we already have 2D reduce(),
                                    // need to extend it to ND, as in ONNX or cv::dnn
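
Extending reduce() from 2D to N-D mostly comes down to one indexing pattern: view a dense array as [outer, axis, inner] and fold along the middle dimension. Below is a minimal C++ sketch of that idea, assuming a C-contiguous (row-major) buffer and keepdims-style output; `reduceAxis` is a hypothetical helper, not the cv::reduce() or cv::dnn API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Reduce a dense, row-major N-D array along one axis.
// The array is viewed as [outer, len, inner]: outer is the product of the
// dimensions before `axis`, inner the product of those after it.
// The result has outer*inner elements (shape[axis] collapsed to 1).
template <typename T, typename Op>
std::vector<T> reduceAxis(const std::vector<T>& src,
                          const std::vector<std::size_t>& shape,
                          std::size_t axis, T init, Op op)
{
    assert(axis < shape.size());
    std::size_t outer = 1, inner = 1;
    for (std::size_t i = 0; i < axis; ++i) outer *= shape[i];
    for (std::size_t i = axis + 1; i < shape.size(); ++i) inner *= shape[i];
    const std::size_t len = shape[axis];
    assert(src.size() == outer * len * inner);

    std::vector<T> dst(outer * inner, init);
    for (std::size_t o = 0; o < outer; ++o)
        for (std::size_t k = 0; k < len; ++k)          // walk along the reduced axis
            for (std::size_t i = 0; i < inner; ++i)    // contiguous innermost loop
                dst[o * inner + i] = op(dst[o * inner + i],
                                        src[(o * len + k) * inner + i]);
    return dst;
}
```

For a 2x3x2 array holding 0..11, `reduceAxis(a, {2, 3, 2}, 1, 0, std::plus<int>())` yields {6, 9, 24, 27}. Keeping the contiguous `inner` loop innermost is meant to make such a kernel easy to vectorize and to parallelize over `outer`.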
    

Also, Core already has a number of functions that exist in numpy but are missing from the standard, such as various matrix decompositions and back-substitution algorithms (LU, Cholesky, SVD, QR), FFT etc.
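
Back substitution is the final step of solving a linear system once an LU, QR or Cholesky factorization has produced a triangular factor. The textbook algorithm for an upper-triangular matrix is sketched below; `backSubstitute` is a hypothetical name and this is not OpenCV's internal implementation (in OpenCV, such solves are reached through cv::solve() with flags like DECOMP_LU or DECOMP_CHOLESKY).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Solve U * x = b, where U is an n x n upper-triangular matrix stored
// row-major in a flat vector, with a non-zero diagonal.
// Rows are processed bottom-up: x[i] depends only on x[i+1..n-1].
std::vector<double> backSubstitute(const std::vector<double>& U,
                                   const std::vector<double>& b)
{
    const std::size_t n = b.size();
    assert(U.size() == n * n);
    std::vector<double> x(n);
    for (std::size_t i = n; i-- > 0; ) {
        double s = b[i];
        for (std::size_t j = i + 1; j < n; ++j)
            s -= U[i * n + j] * x[j];   // subtract already-known unknowns
        assert(U[i * n + i] != 0.0);
        x[i] = s / U[i * n + i];
    }
    return x;
}
```

With U = [[2, 1, 1], [0, 3, 2], [0, 0, 4]] and b = (7, 12, 12), it returns x = (1, 2, 3).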

As you can see, many of the operations are already implemented in the Core or DNN module.
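
Many entries marked "via cv::exp()" or "via cv::log()" in list 1 rely on standard identities. The scalar C++ sketch below spells them out, with std::exp/std::log on doubles standing in for element-wise passes over a whole array; the function names are hypothetical and only illustrate the math a composed implementation would perform.

```cpp
#include <cassert>
#include <cmath>

// cosh / sinh via exp: two exp() evaluations plus arithmetic.
// For |x| << 1, sinhViaExp loses relative precision to cancellation;
// a dedicated kernel would rather use expm1.
double coshViaExp(double x) { return 0.5 * (std::exp(x) + std::exp(-x)); }
double sinhViaExp(double x) { return 0.5 * (std::exp(x) - std::exp(-x)); }

// tanh(x) = sign(x) * (1 - e^(-2|x|)) / (1 + e^(-2|x|)).
// The exponent is never positive, so exp() cannot overflow for large |x|.
double tanhViaExp(double x)
{
    const double e = std::exp(-2.0 * std::fabs(x));
    const double t = (1.0 - e) / (1.0 + e);
    return x < 0 ? -t : t;
}

// asinh(x) = sign(x) * log(|x| + sqrt(x^2 + 1)); using |x| avoids the
// cancellation that x + sqrt(x^2 + 1) suffers for large negative x.
double asinhViaLog(double x)
{
    const double ax = std::fabs(x);
    const double r = std::log(ax + std::sqrt(ax * ax + 1.0));
    return x < 0 ? -r : r;
}

// acosh(x) = log(x + sqrt(x^2 - 1)), defined for x >= 1.
double acoshViaLog(double x)
{
    assert(x >= 1.0);
    return std::log(x + std::sqrt(x * x - 1.0));
}

// atanh(x) = 0.5 * log((1 + x) / (1 - x)), defined for |x| < 1.
double atanhViaLog(double x)
{
    assert(std::fabs(x) < 1.0);
    return 0.5 * std::log((1.0 + x) / (1.0 - x));
}
```

Composing operations this way costs several passes over memory and can lose precision (see the comments), which is one reason to provide such functions as dedicated single-pass kernels.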

What basically needs to be done:

  • implement the rest of the API (those are mostly element-wise operations)
  • provide parallel implementations, taking Amdahl's law into account (i.e. even a cheap operation with a single-threaded implementation may become a bottleneck in a data processing pipeline on a multi-core machine if all other operations in the same pipeline are parallel)
  • support broadcasting in binary operations. Core supports broadcasting only partially (A op A and A op scalar), while dnn supports it fully; the dnn implementation needs to be merged into core.
  • support multi-dimensional arrays. Most element-wise operations already support them, but reduce(), flip(), transpose() and a few others still don't.
  • support FP16 and BF16 where possible (partly done in core already). These two types have become the main types for heavy data processing nowadays. Fortunately, even architectures without full FP16 and BF16 arithmetic support still provide cheap ways to convert FP16 and BF16 to/from FP32: e.g. Intel/AMD CPUs with AVX2 also provide the F16C conversion instructions for FP16, and widening BF16 to FP32 is just a 16-bit shift. On ARMv8.2 as well as on many modern GPUs, FP16 arithmetic is supported natively, which could give us ~2x acceleration in pipelines that do many arithmetic and matrix operations and/or transfer a lot of data.
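
Broadcasting, as defined by numpy and the array API standard, follows a simple shape rule; once the output shape is known, a broadcast input can be read with a stride of 0 along every dimension it is expanded in. The C++ sketch below shows the rule and a naive element-wise add built on it. `Shape`, `numel`, `broadcastShapes`, `broadcastStrides` and `addBroadcast` are hypothetical names, not the core or dnn API, and a real kernel would collapse contiguous dimensions and run in parallel instead of computing offsets per element.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

using Shape = std::vector<std::size_t>;  // hypothetical alias for an N-D shape

std::size_t numel(const Shape& s)
{
    std::size_t n = 1;
    for (std::size_t d : s) n *= d;
    return n;
}

// Broadcasting rule:
//  1. right-align both shapes; missing leading dimensions count as 1;
//  2. each aligned pair of sizes must be equal, or one of them must be 1;
//  3. the output takes the size that is not 1 (or 1 if both are 1).
// Returns std::nullopt if the shapes are incompatible.
std::optional<Shape> broadcastShapes(const Shape& a, const Shape& b)
{
    const std::size_t n = std::max(a.size(), b.size());
    Shape out(n);
    for (std::size_t i = 0; i < n; ++i) {            // i counts dimensions from the right
        const std::size_t da = i < a.size() ? a[a.size() - 1 - i] : 1;
        const std::size_t db = i < b.size() ? b[b.size() - 1 - i] : 1;
        if (da != db && da != 1 && db != 1)
            return std::nullopt;
        out[n - 1 - i] = (da == 1) ? db : da;
    }
    return out;
}

// Row-major strides of `shape`, aligned to the output shape `out`.
// A broadcast dimension (size 1 or missing) gets stride 0, so every output
// index along it reads the same input element.
std::vector<std::size_t> broadcastStrides(const Shape& shape, const Shape& out)
{
    std::vector<std::size_t> strides(out.size(), 0);
    std::size_t step = 1;
    for (std::size_t i = 0; i < shape.size(); ++i) {
        const std::size_t d = shape.size() - 1 - i;  // input dimension, from the right
        strides[out.size() - 1 - i] = (shape[d] == 1) ? 0 : step;
        step *= shape[d];
    }
    return strides;
}

// Naive element-wise a + b over dense row-major buffers;
// `out` must be the shape returned by broadcastShapes(sa, sb).
std::vector<float> addBroadcast(const std::vector<float>& a, const Shape& sa,
                                const std::vector<float>& b, const Shape& sb,
                                const Shape& out)
{
    assert(a.size() == numel(sa) && b.size() == numel(sb));
    const std::vector<std::size_t> stA = broadcastStrides(sa, out);
    const std::vector<std::size_t> stB = broadcastStrides(sb, out);
    const std::size_t total = numel(out);
    std::vector<float> dst(total);
    std::vector<std::size_t> idx(out.size(), 0);     // N-D index of dst[k]
    for (std::size_t k = 0; k < total; ++k) {
        std::size_t offA = 0, offB = 0;
        for (std::size_t d = 0; d < out.size(); ++d) {
            offA += idx[d] * stA[d];
            offB += idx[d] * stB[d];
        }
        dst[k] = a[offA] + b[offB];
        for (std::size_t d = out.size(); d-- > 0; ) { // advance idx, last dim fastest
            if (++idx[d] < out[d]) break;
            idx[d] = 0;
        }
    }
    return dst;
}
```

For example, shapes {8, 1, 6, 1} and {7, 1, 5} broadcast to {8, 7, 6, 5}, while {2, 3} and {4} are incompatible.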

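The cheap conversions mentioned above are easy to see for BF16, which is simply the upper half of an FP32 value. The scalar C++ sketch below converts between the two, assuming IEEE-754 binary32 floats and round-to-nearest-even on narrowing; the function names are hypothetical, and real implementations would use SIMD instructions where available.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "binary32 float expected");

// BF16 keeps the upper 16 bits of an FP32 value (sign, 8-bit exponent,
// 7-bit mantissa), so widening is just a shift.
float bf16ToFloat(std::uint16_t h)
{
    const std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Narrowing with round-to-nearest-even; NaNs are forced to quiet NaNs
// instead of being rounded (rounding could turn them into infinity).
std::uint16_t floatToBf16(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7fffffffu) > 0x7f800000u)                  // NaN
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
    const std::uint32_t lsb = (bits >> 16) & 1u;             // ties go to even
    bits += 0x7fffu + lsb;
    return static_cast<std::uint16_t>(bits >> 16);
}
```

For instance, `floatToBf16(1.0f)` gives 0x3F80, and 3.14159265f narrows to 0x4049, which widens back to 3.140625. Storing data in BF16 and computing in FP32 halves memory traffic while keeping FP32's exponent range.
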
So, once again, why is this important, beyond being able to declare that we 'sort of implement' the emerging standard?

  • Efficient, high-quality implementations of basic array processing functions will allow us to reduce code duplication, reuse those functions in the DNN module and possibly implement higher-level image processing algorithms in imgproc, photo and other modules more efficiently.

  • We can introduce a more or less future-proof HAL for vendors who would like to accelerate OpenCV 5+. They will see that we ask for the same API (at least at the semantic level) as the whole Python+numpy+PyTorch+... community, which comprises a huge number of people and many companies.

  • The goal for OpenCV 5 is to introduce not just a new CPU HAL, but also a non-CPU HAL. All the above-mentioned functions should be able to use such a HAL. Then all the functionality built on top of this basic API (which we and the community will gradually extend) will automatically run on GPUs or other HAL-supporting accelerators. See the dedicated feature requests where this HAL is described (New CPU HAL for OpenCV 5.0 #25019, Introducing non-CPU HAL for OpenCV 5+ #25025).
