Optimization based on RISC-V P Packed SIMD Extension v0.5.2 #24556

Merged
asmorkalov merged 1 commit into opencv:4.x from plctlab:rvp
Jan 16, 2024

Conversation

@Junyan721113
Contributor

Summary

Provides OpenCV optimizations for the RISC-V P extension (v0.5.2).

  1. Added RVP as a new backend to the OpenCV build system;
  2. Optimized some of the algorithms in the DNN, features2d (feature detection), and imgproc (image processing) modules using RVP Intrinsic functions;
  3. Verified the correctness of the optimized algorithms using the QEMU simulator.

The writer of the code and the author of the PR is an intern at ISCAS (Institute of Software, Chinese Academy of Sciences).

List of RVP optimizations

  • Optimization of three convolution functions for int8 layers of deep neural networks
// modules/dnn/src/int8layers/layers_common.simd.hpp
void cv::dnn::fastConv( ... );
void cv::dnn::fastDepthwiseConv( ... );
void cv::dnn::fastGEMM1T( ... );
  • Optimization of matrix affine transformations
// modules/imgproc/src/imgwarp.rvp.cpp
int cv::opt_RVP::warpAffineBlockline( ... );
  • Optimization of nearest neighbor interpolation for matrix scaling with pix_size 2 or 4
// modules/imgproc/src/resize.rvp.cpp
class cv::opt_RVP::resizeNNInvokerRVP4;
class cv::opt_RVP::resizeNNInvokerRVP2;
  • Optimization of Array Accumulation with Squares or Element Multiplication
// modules/imgproc/src/accum.simd.hpp
void accSqr_simd_( ... );
void accProd_simd_( ... );
  • Optimization of integral for unsigned char arrays
// modules/imgproc/src/sumpixels.simd.hpp
template <>
struct Integral_SIMD<uchar, int, double>;
  • Optimization of FAST corner detection algorithm with patternSize 16
// modules/features2d/src/fast.rvp.cpp
class cv::opt_RVP::FAST_t_patternSize16_RVP;
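
For readers unfamiliar with what the integral optimization computes, here is a minimal scalar sketch of an integral image for unsigned char input with int sums. It is not the code from this PR: the function name `integralU8` and the flat row-major layout are assumptions made for illustration. The output has one extra row and column of zeros, which matches the layout `cv::integral` produces.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar reference (hypothetical helper, not part of OpenCV):
// sum has (rows + 1) x (cols + 1) entries; the first row and column stay zero.
std::vector<int32_t> integralU8(const std::vector<uint8_t>& src, int rows, int cols) {
    const std::size_t stride = static_cast<std::size_t>(cols) + 1;
    std::vector<int32_t> sum((static_cast<std::size_t>(rows) + 1) * stride, 0);
    for (int y = 0; y < rows; ++y) {
        int32_t rowSum = 0;  // running sum of the current source row
        for (int x = 0; x < cols; ++x) {
            rowSum += src[static_cast<std::size_t>(y) * cols + x];
            // value above in the table plus the running row sum
            sum[(y + 1) * stride + (x + 1)] = sum[y * stride + (x + 1)] + rowSum;
        }
    }
    return sum;
}
```

A SIMD version would process several pixels of a row at once; this scalar loop is only the behavior such a version has to reproduce.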

Correctness validation (QEMU)

opencv_test_dnn_rvp: consistent with the control build (before adding the RVP optimizations)

opencv_test_imgproc_rvp: consistent with the control build

opencv_test_features2d_rvp: consistent with the control build

Q&A

Why RVP ?

As a lightweight extension, the P extension has potential for use in the embedded domain.

Why v0.5.2 ?

Although RVP is not frozen, Andes has extensive plans based on version 0.5.2, much as T-Head does with RVV071.

Why not Universal Intrinsics ?

RVP052 has no floating-point arithmetic and only supports parallel arithmetic on up to 64 bits, which makes it poorly suited to implementing Universal Intrinsics. Most of the optimizations here therefore follow existing function-specific optimizations.
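
To illustrate why integer-only, 64-bit packed arithmetic still fits the int8 DNN layers, here is a minimal plain C++ sketch of an 8-lane signed int8 multiply-accumulate over one 64-bit word. It only emulates the idea of packed lanes in a general-purpose register; it does not use any RVP intrinsic, and the name `dotPacked8` is an assumption made for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: treats a and b as 8 packed signed 8-bit lanes each,
// multiplies lane by lane, and adds the products to a 32-bit accumulator.
int32_t dotPacked8(uint64_t a, uint64_t b, int32_t acc) {
    for (int lane = 0; lane < 8; ++lane) {
        // Extract one byte and reinterpret it as a signed value.
        const int8_t x = static_cast<int8_t>((a >> (8 * lane)) & 0xFF);
        const int8_t y = static_cast<int8_t>((b >> (8 * lane)) & 0xFF);
        acc += static_cast<int32_t>(x) * static_cast<int32_t>(y);
    }
    return acc;
}
```

A packed SIMD instruction would do the work of the loop in one step; no floating point is involved at any stage.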

How to perform tests ?

The correctness tests are as follows. (Due to hardware issues, performance test results are not available at this time)

Environment

export RISCV=/opt/andes
export OPENCV_TEST_DATA_PATH=**path_to_opencv_extra**/testdata

Toolchain

nds-gnu-toolchain

build_linux_toolchain.sh

TARGET=riscv64-linux
PREFIX=/opt/andes
ARCH=rv64imafdcxandes
ABI=lp64d
CPU=andes-25-series
XLEN=64
BUILD=`pwd`/build-nds64le-linux-glibc-v5d

Qemu

qemu

../configure --prefix=/opt/andes --target-list=riscv32-linux-user,riscv64-linux-user --disable-werror --static

Build

cmake -D CMAKE_BUILD_TYPE=Debug -D CMAKE_INSTALL_PREFIX=/opt/andes -D BUILD_SHARED_LIBS=OFF --toolchain ../platforms/linux/riscv64-andes-gcc.toolchain.cmake ..

Related Tests

dnn module test

qemu-riscv64 -cpu andes-ax25 -L /opt/riscv/sysroot opencv_test_dnn
# int8layers/layers_common.simd.hpp
# --gtest_filter=*Int8*
# --gtest_filter=*Conv*
# --gtest_filter=*Gemm*

imgproc module test

qemu-riscv64 -cpu andes-ax25 -L /opt/riscv/sysroot opencv_test_imgproc
# imgwarp.rvp.cpp
# --gtest_filter=*Affine*
#
# resize.rvp.cpp
# --gtest_filter=*Resize*
#
# sumpixels.simd.hpp
# --gtest_filter=*Integ*

features2d module test

qemu-riscv64 -cpu andes-ax25 -L /opt/riscv/sysroot opencv_test_features2d
# fast.rvp.cpp
# --gtest_filter=*FAST*
# --gtest_filter=*ORB*

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@asmorkalov
Contributor

cc @hanliutong @vpisarev

@asmorkalov
Contributor

@mshabunin Is it possible to add P extension to QEMU configuration on CI? It should help a lot.

@vpisarev vpisarev self-requested a review November 20, 2023 10:37
@vpisarev
Contributor

vpisarev commented Nov 20, 2023

@Junyan721113, thank you for the contribution! This is a useful effort.

In the long term, however, it will be extremely difficult for our small team to maintain 1000 different branches of the same code. We do it, sometimes, for critical paths in critical modules, such as deep learning convolution etc., but for general-purpose functions, using platform-specific intrinsics is too much. Please consider implementing a universal intrinsics backend instead: https://github.com/opencv/opencv/tree/4.x/modules/core/include/opencv2/core/hal.

In this case many hundreds of optimized loops in OpenCV can immediately make use of these instructions. Many other backends rely on 128-bit extensions, whereas the P-extension is 64-bit, as far as I know. The solution could be to use a pair of registers to emulate a 128-bit SIMD register.
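
The register-pair idea can be sketched in plain C++ as follows. This is only an illustration under stated assumptions: the type `U8x16` and the functions `addU8x8` and `add` are hypothetical names, and the per-byte addition is done with a portable bit trick rather than with any P-extension intrinsic.

```cpp
#include <cstdint>

// Hypothetical 128-bit vector of 16 unsigned bytes, stored as two 64-bit halves.
struct U8x16 {
    uint64_t lo;
    uint64_t hi;
};

// Wrapping add of 8 packed bytes inside one 64-bit word.
// The low 7 bits of each byte are added normally; the top bit is fixed up
// with XOR so that no carry crosses into the neighbouring byte.
inline uint64_t addU8x8(uint64_t a, uint64_t b) {
    const uint64_t low7 = 0x7F7F7F7F7F7F7F7FULL;
    const uint64_t top1 = 0x8080808080808080ULL;
    return ((a & low7) + (b & low7)) ^ ((a ^ b) & top1);
}

// A 128-bit add is two independent 64-bit packed adds.
inline U8x16 add(U8x16 a, U8x16 b) {
    return U8x16{addU8x8(a.lo, b.lo), addU8x8(a.hi, b.hi)};
}
```

On hardware with packed 8-bit adds, `addU8x8` would presumably map to a single instruction, and a universal intrinsics backend would wrap each 128-bit operation around two such calls.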

@vpisarev vpisarev requested a review from mshabunin November 21, 2023 06:52
@mshabunin
Contributor

I have several questions, concerns and suggestions.

Lower level or technical:

  • CPU check uses __nd__ prefix while other code uses __rv__v_ prefix
  • code uses nds_intrinsic.h header, but I have seen other variant - riscv-dsp.h in the T-Head toolchain.
  • you claim that this is v0.5.2, but the P-extension revision history states that the __nds__ prefix was replaced with __rv__ in v0.8
  • you used the -mext-dsp GCC option to enable this extension, but it seems to be a toolchain-specific option because generic GCC doesn't have it. The T-Head toolchain, for example, uses common ISA-string syntax: -mcpu=rv64gcp.

Higher level or more strategic questions and proposals:

  • As you wrote, the P-extension differs from RVV and thus cannot easily be implemented via the Universal Intrinsics mechanism, but there is another HAL mechanism for lower-level CPU optimizations, which is used by the Carotene library on ARM platforms. I suggest moving all non-dnn code to a similar third-party component. For example, the FAST algorithm should allow such an optimization shortcut: see https://github.com/opencv/opencv/blob/4.x/modules/features2d/src/hal_replacement.hpp
    Reference documentation is here:
  • Is T-Head DSP implementation compatible with Andes? Is it possible to implement this optimization in a way compatible with both platforms?
  • P-extension documentation has v0.9.11 already, several incompatible changes have been added there since v0.5.2 and v0.8. For example, all intrinsics should now have __rv_ prefix instead of __rv__. Is it possible to distinguish between the extension revisions and either support multiple of them or only a single one? We already had similar problems with RVV and RVV intrinsics specifications: new spec comes out and our code becomes broken and now we have to support multiple revisions.
  • Is there any consumer-grade hardware available for purchase for real tests?
  • Do you know about any plans to add P-extension support to the mainline GCC and LLVM toolchains and the mainline QEMU? It is OK to use custom toolchain for development for specific device, but we try to use more generic approaches to optimizations.

@Junyan721113
Contributor Author

> @Junyan721113, thank you for the contribution! This is a useful effort.
>
> In the long term, however, it will be extremely difficult for our small team to maintain 1000 different branches of the same code. We do it, sometimes, for critical paths in critical modules, such as deep learning convolution etc., but for general-purpose functions, using platform-specific intrinsics is too much. Please consider implementing a universal intrinsics backend instead: https://github.com/opencv/opencv/tree/4.x/modules/core/include/opencv2/core/hal.

Thank you for your guidance! Most of the current optimizations for P extensions are where other platform-specific optimizations already exist (such as int8layers/layers_common.simd.hpp). I would like to know exactly what parts of the code "critical paths in critical modules" refer to, so that P extensions can be optimized in other ways if Universal Intrinsics is not possible.

> In this case many hundreds of optimized loops in OpenCV can immediately make use of these instructions. Many other backends rely on 128-bit extensions, whereas the P-extension is 64-bit, as far as I know. The solution could be to use a pair of registers to emulate a 128-bit SIMD register.

However, I'm sorry to say that I'm currently having trouble implementing Universal Intrinsics with the P extension for the following reasons:

  1. P extensions do not have floating point instructions, thus making it difficult to implement the floating point vector part of Universal Intrinsics; moreover, P extensions do not have vector registers, limiting many optimization operations.
  2. Another solution is to fall back to a pure C++ implementation of Universal Intrinsics on floating-point vectors, but this may lead to negative optimizations, just as RVV generates redundant Load/Stores. (modules/core/include/opencv2/core/hal/intrin_rvv.hpp)

@Junyan721113
Contributor Author

  • CPU check uses __nd__ prefix while other code uses __rv__v_ prefix
  • you claim that this is v0.5.2, but the P-extension revision history states that the __nds__ prefix was replaced with __rv__ in v0.8

This is my fault. RVP v0.5.2 should use __nds__ prefix rather than __rv__ prefix.

  • code uses nds_intrinsic.h header, but I have seen other variant - riscv-dsp.h in the T-Head toolchain.
  • you used the -mext-dsp GCC option to enable this extension, but it seems to be a toolchain-specific option because generic GCC doesn't have it. The T-Head toolchain, for example, uses common ISA-string syntax: -mcpu=rv64gcp.

I'm sorry, but Andes toolchain uses nds_intrinsic.h as header, and the -mext-dsp option is documented in Andes DSP Library.

As a test outside of this PR, a 3rdparty component called ndsrvp has been created, containing one piece of the non-dnn code (integral_SIMD), and it works very well.
All the non-dnn code in this PR has been removed, so this PR can now focus on dnn optimizations.
This HAL mechanism is quite suitable for RVP optimizations, and all the non-dnn code is expected to be moved into ndsrvp soon.

  • Is T-Head DSP implementation compatible with Andes? Is it possible to implement this optimization in a way compatible with both platforms?

The T-Head DSP implementation does not support the __nds__ prefix and has different intrinsic function definitions using intXLEN_t and uintXLEN_t, so it is probably incompatible. This PR is only intended to add optimizations based on RVP v0.5.2, which is the Andes RVP.

  • P-extension documentation has v0.9.11 already, several incompatible changes have been added there since v0.5.2 and v0.8. For example, all intrinsics should now have __rv_ prefix instead of __rv__. Is it possible to distinguish between the extension revisions and either support multiple of them or only a single one? We already had similar problems with RVV and RVV intrinsics specifications: new spec comes out and our code becomes broken and now we have to support multiple revisions.

Supporting only v0.5.2 might be the best solution of this PR. RVP is renamed to RVP052 in order to distinguish RVP revisions.
Andes has plans for RVP052, just as T-Head has plans for RVV071.

  • Is there any consumer-grade hardware available for purchase for real tests?

We have been in contact with Andes, and a development board will soon be available for performance tests.

  • Do you know about any plans to add P-extension support to the mainline GCC and LLVM toolchains and the mainline QEMU? It is OK to use custom toolchain for development for specific device, but we try to use more generic approaches to optimizations.

I'm sorry, but currently I don't know about any plans related to Andes adding support to mainline.

Contributor

@mshabunin mshabunin left a comment

I suggest simplifying CPU-feature part: instead of adding RVP052 as a separate CPU feature, let's use custom macro defined in cmake toolchain file, like it is done in platforms/linux/riscv64-071-gcc.toolchain.cmake.

Basically you have to revert all core modifications and add some macro definition to the riscv64-andes-gcc.toolchain.cmake (e.g. -D__riscv_andes_rvp052 or maybe there is one built into the compiler already?). Then use plain #ifdef guard for optimized code sections.
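
A minimal sketch of the macro-guard approach suggested above, under stated assumptions: `__riscv_andes_rvp052` is the example macro name from the comment (a toolchain file would pass it as `-D__riscv_andes_rvp052`), while `SKETCH_HAVE_RVP052`, `dotInt8`, and `dotInt8_rvp052` are hypothetical names used only for illustration. The optimized path is selected at compile time, with no runtime CPU check.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical macro: defined only when the toolchain file passes
// -D__riscv_andes_rvp052 (name taken from the suggestion, not from OpenCV).
#if defined(__riscv_andes_rvp052)
#  define SKETCH_HAVE_RVP052 1
#else
#  define SKETCH_HAVE_RVP052 0
#endif

#if SKETCH_HAVE_RVP052
// Hypothetical: declared here, implemented in a separate RVP-only source file.
int32_t dotInt8_rvp052(const int8_t* a, const int8_t* b, std::size_t n);
#endif

// Public entry point: picks the implementation with a plain preprocessor guard.
int32_t dotInt8(const int8_t* a, const int8_t* b, std::size_t n) {
#if SKETCH_HAVE_RVP052
    return dotInt8_rvp052(a, b, n);
#else
    // Generic scalar fallback, used on every other platform.
    int32_t acc = 0;
    for (std::size_t i = 0; i < n; ++i)
        acc += static_cast<int32_t>(a[i]) * static_cast<int32_t>(b[i]);
    return acc;
#endif
}
```

On a host compiler the macro is undefined, so only the scalar branch is built; with the Andes toolchain file the same source would call the optimized function instead.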

Tricky part is dispatched fastConv, fastDepthwiseConv and fastGEMM - I suggest adding new files conv_depthwise.rvp052.cpp/.hpp with your implementation and include/call it if that macro is enabled.

Probably some additional cmake variable should be set in the toolchain file, so that dnn/CMakeLists.txt would know when to add new rvp052.cpp files to the build (or it can be just guarded by the same macro and added to the build unconditionally).

cc @opencv-alalek , what do you think?

@opencv-alalek
Contributor

CPU features use common principles for detection, control, compilation, execution, and diagnostics.
We could work without all of this, but it doesn't look like a reliable process.


platforms/linux/riscv64-071-gcc.toolchain.cmake

Could we reuse generic RISC-V toolchains? (with appropriate CPU_BASELINE/CPU_DISPATCH CMake parameters)

@mshabunin
Contributor

> CPU features use common principles for detection, control, compilation, execution, and diagnostics.
> We could work without all of this, but it doesn't look like a reliable process.

Yes, in general I agree, but this specific case (limited HW availability, specialized toolchain, non-ratified extension that is not available in generic toolchains) looks more like RVV 0.7.1. Also, there is no actual runtime check for this extension, so dispatched implementations do not make sense. In this PR, dispatching was implemented only because of DNN module specifics (no hal::, no Universal Intrinsics, raw SIMD blocks, existing dispatching).

So, IMHO experimental less-invasive approach similar to early RVV 0.7.1 would fit better than generalized P-extension support. Later, when various implementations converge to some stable form and the extension is supported in the upstream, we will implement it as a full-fledged CPU feature.

@Junyan721113
Contributor Author

Junyan721113 commented Dec 12, 2023

> Tricky part is dispatched fastConv, fastDepthwiseConv and fastGEMM - I suggest adding new files conv_depthwise.rvp052.cpp/.hpp with your implementation and include/call it if that macro is enabled.

Files with .rvp052.cpp suffix could trigger CMake CPU dispatch filter, resulting in Excluding from source files list: modules/dnn/src/int8layers/conv_depthwise.rvp052.cpp, so conv_depthwise.dispatch.cpp may be a better solution.

As for macros, there are 2 macros, __ANDES and __riscv_dsp, that fill the need.

Meanwhile, I wonder if it is acceptable to implement all these 3 convolution functions inside one conv_depthwise.dispatch.cpp file (maybe renaming it to layers_common.dispatch.cpp is better?), rather than put them in 3 .cpp files.

In total, is the following code acceptable?

// modules/core/include/opencv2/core/cv_cpu_dispatch.h
#if defined(__riscv) && defined(__riscv_dsp) && defined(__ANDES)
# include <nds_intrinsic.h>
# define CV_RVP052 1
#endif

// modules/dnn/src/int8layers/layers_common.simd.hpp
#include "layers_common.dispatch.hpp"

// modules/dnn/src/int8layers/layers_common.dispatch.cpp
namespace cv {
namespace dnn {
namespace opt_RVP052 {

#if CV_RVP052
//RVP Optimizations

// modules/dnn/src/int8layers/convolution_layer.cpp
#if CV_RVP052
    if(isConv2D)
        opt_RVP052::fastDepthwiseConv(wptr, kernel_h, kernel_w,
            stride_h, stride_w, dilation_h, dilation_w, pad_t, pad_l,
            biasptr, multptr, inptr_, height, width, outptr_, out_d, outH, outW, inpZp, outZp);
    else

Contributor

@mshabunin mshabunin Dec 20, 2023

I suggest renaming files to something like layers_rvp052.cpp/.hpp to avoid confusion with .dispatch files in other modules because they usually serve different purpose.

Disable whole .cpp body if macro is not defined or is false and include .hpp file into layers_common.hpp with the same macro condition.

Contributor Author

Done

else
#endif
#if CV_RVP052
if(useRVP052)
Contributor

useRVP052 is always the same as CV_RVP052 and does not have external interface, so I suggest removing boolean flag completely. Here and in other files.

Contributor Author

In fully_connected_layer.cpp this is absolutely right. But in convolution_layer.cpp, useRVP052 is not always the same as CV_RVP052, because line 769, p.useRVP052 = CV_RVP052 && isConv2D;, introduces a small difference.
So changing this boolean flag to isConv2D might be better.

Contributor

I suggest moving these changes to the dnn module, maybe to int8layers/layers_common.hpp?

Contributor Author

In layers_rvp052.cpp, including layers_common.hpp to get CV_RVP052 could cause HAVE_OPENCL malfunction as follows:

In file included from /home/junyan/opencv_rvp/modules/dnn/src/int8layers/./layers_common.hpp:17,
                 from /home/junyan/opencv_rvp/modules/dnn/src/int8layers/layers_rvp052.cpp:5:
/home/junyan/opencv_rvp/modules/dnn/src/int8layers/./../ocl4dnn/include/ocl4dnn.hpp:196:9: error: 'ocl' does not name a type; did you mean 'ogl'?
  196 |         ocl::Program compileKernel();
      |         ^~~
      |         ogl

So maybe moving them into layers_rvp052.hpp is better.

Contributor

Modifications in this file will not be necessary.

Contributor Author

Done

@asmorkalov asmorkalov added this to the 4.9.0 milestone Dec 20, 2023
@asmorkalov asmorkalov merged commit 99c86bb into opencv:4.x Jan 16, 2024
@asmorkalov asmorkalov mentioned this pull request Jan 23, 2024
@Junyan721113
Contributor Author

Development boards for the accuracy and performance tests have been set up, and results will be available soon.

@Junyan721113
Contributor Author

Junyan721113 commented Mar 2, 2024

Here are the accuracy and performance test results!

TL;DR: EfficientDet_int8 in opencv_perf_dnn has gained a 1.95x performance boost.

The 3 functions optimized by RVP only appeared in the following tests:

./opencv_test_dnn --gtest_filter=*EfficientDet_int8*:*Quant*:*Int8* --gtest_output=xml
./opencv_perf_dnn --gtest_filter=*EfficientDet_int8* --gtest_output=xml

Meanwhile Test_Int8_nets.CaffeNet and Test_Int8_nets.RCNN_ILSVRC13 took up too much memory to be run on the board.

So the final filter is:

./opencv_test_dnn --gtest_filter=*EfficientDet_int8*:*Quant*:*Int8*-*CaffeNet*:*RCNN_ILSVRC13* --gtest_output=xml
./opencv_perf_dnn --gtest_filter=*EfficientDet_int8* --gtest_output=xml
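
As a rough model of how such a filter selects tests, here is a small C++ sketch of the documented GoogleTest rule: a colon-separated list of positive patterns, optionally followed by a single `-` and a colon-separated list of negative patterns, with `*` and `?` as wildcards. It is not GoogleTest's own code; `globMatch`, `splitPatterns`, `matchesAny`, and `isSelected` are hypothetical names, and the filter string in the usage note is written in the single `-` separator form.

```cpp
#include <string>
#include <vector>

// Glob match: '*' matches any run of characters, '?' matches one character.
bool globMatch(const char* pat, const char* str) {
    if (*pat == '\0') return *str == '\0';
    if (*pat == '*')
        return globMatch(pat + 1, str) || (*str != '\0' && globMatch(pat, str + 1));
    return *str != '\0' && (*pat == '?' || *pat == *str) && globMatch(pat + 1, str + 1);
}

// Split a ':'-separated pattern list, skipping empty entries.
std::vector<std::string> splitPatterns(const std::string& list) {
    std::vector<std::string> out;
    std::size_t start = 0;
    while (start <= list.size()) {
        std::size_t end = list.find(':', start);
        if (end == std::string::npos) end = list.size();
        if (end > start) out.push_back(list.substr(start, end - start));
        start = end + 1;
    }
    return out;
}

bool matchesAny(const std::vector<std::string>& pats, const std::string& name) {
    for (const std::string& p : pats)
        if (globMatch(p.c_str(), name.c_str())) return true;
    return false;
}

// True if the full test name ("Suite.Case") passes the filter:
// it must match a positive pattern and no negative pattern.
bool isSelected(const std::string& name, const std::string& filter) {
    const std::size_t dash = filter.find('-');
    std::string positive = filter.substr(0, dash);
    const std::string negative =
        (dash == std::string::npos) ? std::string() : filter.substr(dash + 1);
    if (positive.empty()) positive = "*";  // an empty positive part means "all tests"
    return matchesAny(splitPatterns(positive), name) &&
           !matchesAny(splitPatterns(negative), name);
}
```

With the filter `*EfficientDet_int8*:*Quant*:*Int8*-*CaffeNet*:*RCNN_ILSVRC13*`, a name such as `Test_Int8_layers.Convolution2D/0` is selected, while `Test_Int8_nets.CaffeNet/0` is excluded by the negative part.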

opencv_perf_dnn summary

> python .\misc\summary.py .\opencv_bin_blank\opencv_perf_dnn.xml .\opencv_bin_rvp\opencv_perf_dnn.xml
Geometric mean (ms)

Name of Test                                 opencv_perf_dnn   opencv_perf_dnn   x-factor
                                             (control)         (RVP)             (RVP vs control)
EfficientDet_int8::DNNTestNetwork::OCV/CPU   42451.011         21728.436         1.95
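
The x-factor in this summary is the ratio of the two geometric means. A minimal C++ sketch of that arithmetic follows; the function names `geometricMean` and `speedup` are assumptions made for illustration and are not part of the OpenCV summary script.

```cpp
#include <cmath>
#include <vector>

// Geometric mean of positive samples, computed through logarithms
// to avoid overflow when multiplying many large values.
double geometricMean(const std::vector<double>& samples) {
    double logSum = 0.0;
    for (double s : samples) logSum += std::log(s);
    return std::exp(logSum / static_cast<double>(samples.size()));
}

// Speedup of the optimized run relative to the baseline (higher is better).
double speedup(double baselineMs, double optimizedMs) {
    return baselineMs / optimizedMs;
}
```

For the values reported here, `speedup(42451.011, 21728.436)` is about 1.954, which the summary shows as 1.95.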

opencv_perf_dnn optimized

<?xml version="1.0" encoding="UTF-8"?>
<testsuites tests="1" failures="0" disabled="0" errors="0" timestamp="2020-09-22T16:39:16" time="241.631" cv_module_name="dnn" cv_implementation="plain" cv_num_threads="-1" test_tags="" test_tags_skip="mem_6gb,verylong,debug_verylong" test_tags_force="" cv_version="4.9.0-dev" cv_version_build="4.9.0-dev" cv_vcs_version="4.9.0-216-g09c6961694-dirty" cv_build_type="Debug" cv_build_type_build="Debug" cv_compiler="/home/junyan/opt/andes/bin/riscv64-linux-g++  (ver 10.3.0)" cv_parallel_framework="pthreads" cv_parallel_threads="1" cv_cpu_features="" cv_ocl="disabled" name="AllTests">
  <testsuite name="DNNTestNetwork" tests="1" failures="0" disabled="0" errors="0" time="241.625">
    <testcase name="EfficientDet_int8/0" value_param="OCV/CPU" status="run" time="241.623" classname="DNNTestNetwork">
<properties>
<property name="ocl_memory_usage" value="0"/>
<property name="bytesIn" value="0"/>
<property name="bytesOut" value="0"/>
<property name="term" value="1"/>
<property name="samples" value="10"/>
<property name="outliers" value="0"/>
<property name="frequency" value="1000000000"/>
<property name="min" value="21683049745"/>
<property name="median" value="21712831994"/>
<property name="gmean" value="21728435820"/>
<property name="gstddev" value="0.002588"/>
<property name="mean" value="21728501353"/>
<property name="stddev" value="56310680"/>
</properties>
    </testcase>
  </testsuite>
</testsuites>

opencv_perf_dnn control

<?xml version="1.0" encoding="UTF-8"?>
<testsuites tests="1" failures="0" disabled="0" errors="0" timestamp="2020-09-22T15:10:47" time="469.468" cv_module_name="dnn" cv_implementation="plain" cv_num_threads="-1" test_tags="" test_tags_skip="mem_6gb,verylong,debug_verylong" test_tags_force="" cv_version="4.9.0-dev" cv_version_build="4.9.0-dev" cv_vcs_version="4.9.0-212-g0e44f3a544-dirty" cv_build_type="Debug" cv_build_type_build="Debug" cv_compiler="/home/junyan/opt/andes/bin/riscv64-linux-g++  (ver 10.3.0)" cv_parallel_framework="pthreads" cv_parallel_threads="1" cv_cpu_features="" cv_ocl="disabled" name="AllTests">
  <testsuite name="DNNTestNetwork" tests="1" failures="0" disabled="0" errors="0" time="469.462">
    <testcase name="EfficientDet_int8/0" value_param="OCV/CPU" status="run" time="469.46" classname="DNNTestNetwork">
<properties>
<property name="ocl_memory_usage" value="0"/>
<property name="bytesIn" value="0"/>
<property name="bytesOut" value="0"/>
<property name="term" value="1"/>
<property name="samples" value="10"/>
<property name="outliers" value="0"/>
<property name="frequency" value="1000000000"/>
<property name="min" value="42387522406"/>
<property name="median" value="42406298532"/>
<property name="gmean" value="42451010572"/>
<property name="gstddev" value="0.001703"/>
<property name="mean" value="42451066023"/>
<property name="stddev" value="72351781"/>
</properties>
    </testcase>
  </testsuite>
</testsuites>

opencv_test_dnn summary
[Figure: opencv_test_dnn speedup bar chart]
Test cases shorter than 1 s are not shown above.

How the graph was produced:

import xml.etree.ElementTree as ET
import matplotlib.pyplot as plt

# Read the XML files and extract the mean values

rvp_file = 'opencv_bin_rvp/opencv_test_dnn.xml'
blank_file = 'opencv_bin_blank/opencv_test_dnn.xml'

# parse the XML files

rvp_data = ET.parse(rvp_file).getroot()
blank_data = ET.parse(blank_file).getroot()

print(rvp_data.tag, rvp_data.attrib)

test_names = []
for testsuite in rvp_data.iter(tag='testsuite'):
    # print(testsuite.tag, testsuite.attrib)
    test_names.append('Total: ' + testsuite.attrib['name'])
    for testcase in testsuite.iter(tag='testcase'):
        # print(testcase.tag, testcase.attrib)
        test_names.append(testcase.attrib['name'])

# keyw = 'mean'
keyw = 'time'

rvp_means = []
for testsuite in rvp_data.iter(tag='testsuite'):
    rvp_means.append(float(testsuite.attrib['time']))
    for testcase in testsuite.iter(tag='testcase'):
        # print(testcase.tag, testcase.attrib)
        if keyw not in testcase.attrib:
            continue
        rvp_means.append(float(testcase.attrib[keyw]))

blank_means = []
for testsuite in blank_data.iter(tag='testsuite'):
    blank_means.append(float(testsuite.attrib['time']))
    for testcase in testsuite.iter(tag='testcase'):
        # print(testcase.tag, testcase.attrib)
        if keyw not in testcase.attrib:
            continue
        blank_means.append(float(testcase.attrib[keyw]))

print(rvp_means)
print(blank_means)

# Keep only entries that ran for at least 1 s in the RVP build,
# and select names with the same indices so labels stay aligned with ratios
kept = [i for i in range(len(rvp_means)) if rvp_means[i] >= 1.0]
ratio = [blank_means[i] / rvp_means[i] for i in kept]
test_names = [test_names[i] for i in kept]

# Plot the bar chart
fig, ax = plt.subplots()
ax.bar(range(len(ratio)), ratio, color='b')
ax.set_xlabel('Test case')
ax.set_ylabel('Speedup')
ax.set_title('Speedup of RVP over blank')
ax.set_xticks(range(len(ratio)))
ax.set_xticklabels(test_names, rotation=90)
ax.set_yticks(range(0, 6, 1))
ax.set_yticklabels([f'{i}x' for i in range(0, 6, 1)])
ax.axhline(y=1, color='r', linestyle='--')
ax.grid(True, axis='y')

# margin the plot
plt.tight_layout()

# Save the plot
# plt.savefig('speedup.png')

# Show the plot

plt.show()

opencv_test_dnn optimized

<?xml version="1.0" encoding="UTF-8"?>
<testsuites tests="86" failures="0" disabled="2" errors="0" timestamp="2020-09-22T13:21:45" time="2233.4" cv_version="4.9.0-dev" cv_version_build="4.9.0-dev" cv_vcs_version="4.9.0-216-g09c6961694-dirty" cv_build_type="Debug" cv_build_type_build="Debug" cv_compiler="/home/junyan/opt/andes/bin/riscv64-linux-g++  (ver 10.3.0)" cv_parallel_framework="pthreads" cv_parallel_threads="1" cv_cpu_features="" cv_ocl="disabled" test_tags="" test_tags_skip="mem_6gb,verylong,debug_verylong,dnn_skip_opencv_backend,dnn_skip_cpu,dnn_skip_cpu_fp16,dnn_skip_ocl,dnn_skip_ocl_fp16,dnn_skip_onnx_conformance,dnn_skip_parser" test_tags_force="" name="AllTests">
  <testsuite name="Test_Int8_layers" tests="40" failures="0" disabled="2" errors="0" time="6.545">
    <testcase name="Convolution1D/0" value_param="OCV/CPU" status="run" time="0.091" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Convolution2D/0" value_param="OCV/CPU" status="run" time="0.856" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Convolution3D/0" value_param="OCV/CPU" status="run" time="0.074" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Flatten/0" value_param="OCV/CPU" status="run" time="0.114" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Padding/0" value_param="OCV/CPU" status="run" time="0.266" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="AvePooling/0" value_param="OCV/CPU" status="run" time="0.348" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="MaxPooling/0" value_param="OCV/CPU" status="run" time="0.445" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Reduce/0" value_param="OCV/CPU" status="run" time="0.24" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="ReLU/0" value_param="OCV/CPU" status="run" time="0.126" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="LeakyReLU/0" value_param="OCV/CPU" status="run" time="0.015" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="ReLU6/0" value_param="OCV/CPU" status="run" time="0.065" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Sigmoid/0" value_param="OCV/CPU" status="run" time="0.026" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Sigmoid_dynamic_axes/0" value_param="OCV/CPU" status="run" time="0.037" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Sigmoid_1d/0" value_param="OCV/CPU" status="run" time="0.045" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Mish/0" value_param="OCV/CPU" status="run" time="0.035" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Softmax_Caffe/0" value_param="OCV/CPU" status="run" time="0.174" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Softmax_keras_TF/0" value_param="OCV/CPU" status="run" time="0.02" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Softmax_slim_TF/0" value_param="OCV/CPU" status="run" time="0.026" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Softmax_slim_v2_TF/0" value_param="OCV/CPU" status="run" time="0.036" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Softmax_ONNX/0" value_param="OCV/CPU" status="run" time="0.021" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Softmax_log_ONNX/0" value_param="OCV/CPU" status="run" time="0.02" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="DISABLED_Softmax_unfused_ONNX/0" value_param="OCV/CPU" status="notrun" time="0" classname="Test_Int8_layers" />
    <testcase name="Concat/0" value_param="OCV/CPU" status="run" time="0.22" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="BatchNorm/0" value_param="OCV/CPU" status="run" time="0.411" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Scale/0" value_param="OCV/CPU" status="run" time="0.143" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="InnerProduct/0" value_param="OCV/CPU" status="run" time="1.244" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Reshape/0" value_param="OCV/CPU" status="run" time="0.412" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Permute/0" value_param="OCV/CPU" status="run" time="0.045" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Identity/0" value_param="OCV/CPU" status="run" time="0.077" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Slice_split_tf/0" value_param="OCV/CPU" status="run" time="0.02" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Slice_4d_tf/0" value_param="OCV/CPU" status="run" time="0.022" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Slice_strided_tf/0" value_param="OCV/CPU" status="run" time="0.024" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="DISABLED_Slice_onnx/0" value_param="OCV/CPU" status="notrun" time="0" classname="Test_Int8_layers" />
    <testcase name="Slice_dynamic_axes_onnx/0" value_param="OCV/CPU" status="run" time="0.026" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Slice_steps_2d_onnx11/0" value_param="OCV/CPU" status="run" time="0.042" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Slice_steps_3d_onnx11/0" value_param="OCV/CPU" status="run" time="0.053" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Slice_steps_4d_onnx11/0" value_param="OCV/CPU" status="run" time="0.041" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Slice_steps_5d_onnx11/0" value_param="OCV/CPU" status="run" time="0.042" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Dropout/0" value_param="OCV/CPU" status="run" time="0.143" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Eltwise/0" value_param="OCV/CPU" status="run" time="0.433" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
  </testsuite>
  <testsuite name="Test_Int8_nets" tests="24" failures="0" disabled="0" errors="0" time="2172.39">
    <testcase name="AlexNet/0" value_param="OCV/CPU" status="run" time="81.558" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
<property name="tags" value="mem_512mb"/>
<property name="tags_implied" value=""/>
</properties>
    </testcase>
    <testcase name="GoogLeNet/0" value_param="OCV/CPU" status="run" time="237.368" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="ResNet50/0" value_param="OCV/CPU" status="run" time="0.065" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_512mb,debug_verylong"/>
<property name="tags_implied" value="debug_long"/>
</properties>
    </testcase>
    <testcase name="DenseNet121/0" value_param="OCV/CPU" status="run" time="215.475" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
<property name="tags" value="mem_512mb"/>
<property name="tags_implied" value=""/>
</properties>
    </testcase>
    <testcase name="SqueezeNet_v1_1/0" value_param="OCV/CPU" status="run" time="30.185" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Inception_v2/0" value_param="OCV/CPU" status="run" time="168.073" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="MobileNet_v2/0" value_param="OCV/CPU" status="run" time="38.848" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Shufflenet/0" value_param="OCV/CPU" status="run" time="15.797" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="MobileNet_SSD/0" value_param="OCV/CPU" status="run" time="89.717" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="MobileNet_v1_SSD/0" value_param="OCV/CPU" status="run" time="99.273" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="MobileNet_v1_SSD_PPN/0" value_param="OCV/CPU" status="run" time="92.041" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Inception_v2_SSD/0" value_param="OCV/CPU" status="run" time="368.112" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
<property name="tags" value="mem_512mb"/>
<property name="tags_implied" value=""/>
</properties>
    </testcase>
    <testcase name="opencv_face_detector/0" value_param="OCV/CPU" status="run" time="238.913" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="EfficientDet/0" value_param="OCV/CPU" status="run" time="0.002" classname="Test_Int8_nets">
<properties>
<property name="tags" value="debug_verylong"/>
<property name="tags_implied" value="debug_long"/>
</properties>
    </testcase>
    <testcase name="FasterRCNN_resnet50/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_1gb,long,debug_verylong"/>
<property name="tags_implied" value="mem_512mb,debug_long"/>
</properties>
    </testcase>
    <testcase name="FasterRCNN_inceptionv2/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_1gb,long,debug_verylong"/>
<property name="tags_implied" value="mem_512mb,debug_long"/>
</properties>
    </testcase>
    <testcase name="FasterRCNN_vgg16/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_2gb,long,debug_verylong"/>
<property name="tags_implied" value="mem_1gb,mem_512mb,debug_long"/>
</properties>
    </testcase>
    <testcase name="FasterRCNN_zf/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_512mb,debug_verylong"/>
<property name="tags_implied" value="debug_long"/>
</properties>
    </testcase>
    <testcase name="RFCN/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_512mb,long,debug_verylong"/>
<property name="tags_implied" value="debug_long"/>
</properties>
    </testcase>
    <testcase name="YoloVoc/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_1gb,long,debug_verylong"/>
<property name="tags_implied" value="mem_512mb,debug_long"/>
</properties>
    </testcase>
    <testcase name="TinyYoloVoc/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_512mb,debug_verylong"/>
<property name="tags_implied" value="debug_long"/>
</properties>
    </testcase>
    <testcase name="YOLOv3/0" value_param="OCV/CPU" status="run" time="0" classname="Test_Int8_nets">
<properties>
<property name="tags" value="long,mem_1gb,debug_verylong"/>
<property name="tags_implied" value="debug_long,mem_512mb"/>
</properties>
    </testcase>
    <testcase name="YOLOv4/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="long,mem_1gb,debug_verylong"/>
<property name="tags_implied" value="debug_long,mem_512mb"/>
</properties>
    </testcase>
    <testcase name="YOLOv4_tiny/0" value_param="OCV/CPU" status="run" time="496.879" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_512mb"/>
<property name="tags_implied" value=""/>
</properties>
    </testcase>
  </testsuite>
  <testsuite name="Test_ONNX_layers" tests="20" failures="0" disabled="0" errors="0" time="1.548">
    <testcase name="Quantized_Convolution/0" value_param="OCV/CPU" status="run" time="0.391" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_MatMul/0" value_param="OCV/CPU" status="run" time="0.133" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Gemm/0" value_param="OCV/CPU" status="run" time="0.039" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_MatMul_Variable_Weights/0" value_param="OCV/CPU" status="run" time="0.09" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Eltwise/0" value_param="OCV/CPU" status="run" time="0.051" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Eltwise_Scalar/0" value_param="OCV/CPU" status="run" time="0.041" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Eltwise_Broadcast/0" value_param="OCV/CPU" status="run" time="0.042" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_LeakyReLU/0" value_param="OCV/CPU" status="run" time="0.035" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Sigmoid/0" value_param="OCV/CPU" status="run" time="0.034" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_MaxPool/0" value_param="OCV/CPU" status="run" time="0.036" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_AvgPool/0" value_param="OCV/CPU" status="run" time="0.037" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Split/0" value_param="OCV/CPU" status="run" time="0.045" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Pad/0" value_param="OCV/CPU" status="run" time="0.045" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Reshape/0" value_param="OCV/CPU" status="run" time="0.036" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Transpose/0" value_param="OCV/CPU" status="run" time="0.035" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Squeeze/0" value_param="OCV/CPU" status="run" time="0.035" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Unsqueeze/0" value_param="OCV/CPU" status="run" time="0.035" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Resize/0" value_param="OCV/CPU" status="run" time="0.112" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Concat/0" value_param="OCV/CPU" status="run" time="0.081" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Constant/0" value_param="OCV/CPU" status="run" time="0.159" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
  </testsuite>
  <testsuite name="Test_ONNX_nets" tests="1" failures="0" disabled="0" errors="0" time="28.347">
    <testcase name="ResNet50_Int8/0" value_param="OCV/CPU" status="run" time="28.345" classname="Test_ONNX_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
  </testsuite>
  <testsuite name="Test_TFLite" tests="1" failures="0" disabled="0" errors="0" time="24.551">
    <testcase name="EfficientDet_int8/0" value_param="OCV/CPU" status="run" time="24.55" classname="Test_TFLite">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
  </testsuite>
</testsuites>
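
The RVP report above and the control report below can be compared mechanically rather than by eye. The following is a minimal sketch, not part of the PR or of OpenCV: the helper names `load_statuses` and `diff_reports` are hypothetical, and it assumes both reports use the GoogleTest XML layout shown here, where a failed test carries a `<failure>` child element. It compares only the run status and the presence of a failure, and ignores the `time` attributes, which are expected to differ.

```python
import xml.etree.ElementTree as ET


def load_statuses(xml_text):
    """Map 'classname.name' -> (status, failed) for every <testcase> in a report."""
    root = ET.fromstring(xml_text)
    results = {}
    for case in root.iter("testcase"):
        key = f'{case.get("classname")}.{case.get("name")}'
        # GoogleTest records a failing test as one or more <failure> children.
        failed = case.find("failure") is not None
        results[key] = (case.get("status"), failed)
    return results


def diff_reports(rvp_xml, control_xml):
    """Return the sorted test names whose outcome differs between the two reports.

    A test that appears in only one report is also listed.
    """
    rvp = load_statuses(rvp_xml)
    control = load_statuses(control_xml)
    all_names = rvp.keys() | control.keys()
    return sorted(name for name in all_names if rvp.get(name) != control.get(name))
```

An empty list returned by `diff_reports` means that every test case has the same status and failure state in both runs, which is the sense of "consistent with control" used in this description.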

opencv_test_dnn control (before adding RVP optimization)

<?xml version="1.0" encoding="UTF-8"?>
<testsuites tests="86" failures="0" disabled="2" errors="0" timestamp="2020-09-22T13:59:53" time="2899.68" cv_version="4.9.0-dev" cv_version_build="4.9.0-dev" cv_vcs_version="4.9.0-212-g0e44f3a544-dirty" cv_build_type="Debug" cv_build_type_build="Debug" cv_compiler="/home/junyan/opt/andes/bin/riscv64-linux-g++  (ver 10.3.0)" cv_parallel_framework="pthreads" cv_parallel_threads="1" cv_cpu_features="" cv_ocl="disabled" test_tags="" test_tags_skip="mem_6gb,verylong,debug_verylong,dnn_skip_opencv_backend,dnn_skip_cpu,dnn_skip_cpu_fp16,dnn_skip_ocl,dnn_skip_ocl_fp16,dnn_skip_onnx_conformance,dnn_skip_parser" test_tags_force="" name="AllTests">
  <testsuite name="Test_Int8_layers" tests="40" failures="0" disabled="2" errors="0" time="6.676">
    <testcase name="Convolution1D/0" value_param="OCV/CPU" status="run" time="0.108" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Convolution2D/0" value_param="OCV/CPU" status="run" time="0.902" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Convolution3D/0" value_param="OCV/CPU" status="run" time="0.073" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Flatten/0" value_param="OCV/CPU" status="run" time="0.114" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Padding/0" value_param="OCV/CPU" status="run" time="0.267" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="AvePooling/0" value_param="OCV/CPU" status="run" time="0.292" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="MaxPooling/0" value_param="OCV/CPU" status="run" time="0.489" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Reduce/0" value_param="OCV/CPU" status="run" time="0.238" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="ReLU/0" value_param="OCV/CPU" status="run" time="0.127" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="LeakyReLU/0" value_param="OCV/CPU" status="run" time="0.014" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="ReLU6/0" value_param="OCV/CPU" status="run" time="0.064" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Sigmoid/0" value_param="OCV/CPU" status="run" time="0.025" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Sigmoid_dynamic_axes/0" value_param="OCV/CPU" status="run" time="0.027" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Sigmoid_1d/0" value_param="OCV/CPU" status="run" time="0.024" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Mish/0" value_param="OCV/CPU" status="run" time="0.023" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Softmax_Caffe/0" value_param="OCV/CPU" status="run" time="0.213" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Softmax_keras_TF/0" value_param="OCV/CPU" status="run" time="0.032" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Softmax_slim_TF/0" value_param="OCV/CPU" status="run" time="0.025" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Softmax_slim_v2_TF/0" value_param="OCV/CPU" status="run" time="0.034" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Softmax_ONNX/0" value_param="OCV/CPU" status="run" time="0.021" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Softmax_log_ONNX/0" value_param="OCV/CPU" status="run" time="0.021" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="DISABLED_Softmax_unfused_ONNX/0" value_param="OCV/CPU" status="notrun" time="0" classname="Test_Int8_layers" />
    <testcase name="Concat/0" value_param="OCV/CPU" status="run" time="0.231" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="BatchNorm/0" value_param="OCV/CPU" status="run" time="0.409" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Scale/0" value_param="OCV/CPU" status="run" time="0.095" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="InnerProduct/0" value_param="OCV/CPU" status="run" time="1.34" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Reshape/0" value_param="OCV/CPU" status="run" time="0.41" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Permute/0" value_param="OCV/CPU" status="run" time="0.045" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Identity/0" value_param="OCV/CPU" status="run" time="0.078" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Slice_split_tf/0" value_param="OCV/CPU" status="run" time="0.02" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Slice_4d_tf/0" value_param="OCV/CPU" status="run" time="0.022" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Slice_strided_tf/0" value_param="OCV/CPU" status="run" time="0.024" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="DISABLED_Slice_onnx/0" value_param="OCV/CPU" status="notrun" time="0" classname="Test_Int8_layers" />
    <testcase name="Slice_dynamic_axes_onnx/0" value_param="OCV/CPU" status="run" time="0.026" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Slice_steps_2d_onnx11/0" value_param="OCV/CPU" status="run" time="0.042" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Slice_steps_3d_onnx11/0" value_param="OCV/CPU" status="run" time="0.057" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Slice_steps_4d_onnx11/0" value_param="OCV/CPU" status="run" time="0.044" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Slice_steps_5d_onnx11/0" value_param="OCV/CPU" status="run" time="0.043" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Dropout/0" value_param="OCV/CPU" status="run" time="0.116" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Eltwise/0" value_param="OCV/CPU" status="run" time="0.476" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
  </testsuite>
  <testsuite name="Test_Int8_nets" tests="24" failures="0" disabled="0" errors="0" time="2740.59">
    <testcase name="AlexNet/0" value_param="OCV/CPU" status="run" time="97.623" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
<property name="tags" value="mem_512mb"/>
<property name="tags_implied" value=""/>
</properties>
    </testcase>
    <testcase name="GoogLeNet/0" value_param="OCV/CPU" status="run" time="300.924" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="ResNet50/0" value_param="OCV/CPU" status="run" time="0.031" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_512mb,debug_verylong"/>
<property name="tags_implied" value="debug_long"/>
</properties>
    </testcase>
    <testcase name="DenseNet121/0" value_param="OCV/CPU" status="run" time="272.641" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
<property name="tags" value="mem_512mb"/>
<property name="tags_implied" value=""/>
</properties>
    </testcase>
    <testcase name="SqueezeNet_v1_1/0" value_param="OCV/CPU" status="run" time="38.057" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Inception_v2/0" value_param="OCV/CPU" status="run" time="208.418" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="MobileNet_v2/0" value_param="OCV/CPU" status="run" time="47.593" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Shufflenet/0" value_param="OCV/CPU" status="run" time="18.378" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="MobileNet_SSD/0" value_param="OCV/CPU" status="run" time="112.532" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="MobileNet_v1_SSD/0" value_param="OCV/CPU" status="run" time="123.763" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="MobileNet_v1_SSD_PPN/0" value_param="OCV/CPU" status="run" time="115.315" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Inception_v2_SSD/0" value_param="OCV/CPU" status="run" time="464.413" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
<property name="tags" value="mem_512mb"/>
<property name="tags_implied" value=""/>
</properties>
    </testcase>
    <testcase name="opencv_face_detector/0" value_param="OCV/CPU" status="run" time="304.788" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="EfficientDet/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="debug_verylong"/>
<property name="tags_implied" value="debug_long"/>
</properties>
    </testcase>
    <testcase name="FasterRCNN_resnet50/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_1gb,long,debug_verylong"/>
<property name="tags_implied" value="mem_512mb,debug_long"/>
</properties>
    </testcase>
    <testcase name="FasterRCNN_inceptionv2/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_1gb,long,debug_verylong"/>
<property name="tags_implied" value="mem_512mb,debug_long"/>
</properties>
    </testcase>
    <testcase name="FasterRCNN_vgg16/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_2gb,long,debug_verylong"/>
<property name="tags_implied" value="mem_1gb,mem_512mb,debug_long"/>
</properties>
    </testcase>
    <testcase name="FasterRCNN_zf/0" value_param="OCV/CPU" status="run" time="0.002" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_512mb,debug_verylong"/>
<property name="tags_implied" value="debug_long"/>
</properties>
    </testcase>
    <testcase name="RFCN/0" value_param="OCV/CPU" status="run" time="0.002" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_512mb,long,debug_verylong"/>
<property name="tags_implied" value="debug_long"/>
</properties>
    </testcase>
    <testcase name="YoloVoc/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_1gb,long,debug_verylong"/>
<property name="tags_implied" value="mem_512mb,debug_long"/>
</properties>
    </testcase>
    <testcase name="TinyYoloVoc/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_512mb,debug_verylong"/>
<property name="tags_implied" value="debug_long"/>
</properties>
    </testcase>
    <testcase name="YOLOv3/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="long,mem_1gb,debug_verylong"/>
<property name="tags_implied" value="debug_long,mem_512mb"/>
</properties>
    </testcase>
    <testcase name="YOLOv4/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="long,mem_1gb,debug_verylong"/>
<property name="tags_implied" value="debug_long,mem_512mb"/>
</properties>
    </testcase>
    <testcase name="YOLOv4_tiny/0" value_param="OCV/CPU" status="run" time="636.056" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_512mb"/>
<property name="tags_implied" value=""/>
</properties>
    </testcase>
  </testsuite>
  <testsuite name="Test_ONNX_layers" tests="20" failures="0" disabled="0" errors="0" time="1.534">
    <testcase name="Quantized_Convolution/0" value_param="OCV/CPU" status="run" time="0.343" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_MatMul/0" value_param="OCV/CPU" status="run" time="0.132" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Gemm/0" value_param="OCV/CPU" status="run" time="0.038" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_MatMul_Variable_Weights/0" value_param="OCV/CPU" status="run" time="0.082" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Eltwise/0" value_param="OCV/CPU" status="run" time="0.054" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Eltwise_Scalar/0" value_param="OCV/CPU" status="run" time="0.043" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Eltwise_Broadcast/0" value_param="OCV/CPU" status="run" time="0.059" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_LeakyReLU/0" value_param="OCV/CPU" status="run" time="0.059" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Sigmoid/0" value_param="OCV/CPU" status="run" time="0.035" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_MaxPool/0" value_param="OCV/CPU" status="run" time="0.037" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_AvgPool/0" value_param="OCV/CPU" status="run" time="0.04" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Split/0" value_param="OCV/CPU" status="run" time="0.049" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Pad/0" value_param="OCV/CPU" status="run" time="0.046" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Reshape/0" value_param="OCV/CPU" status="run" time="0.037" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Transpose/0" value_param="OCV/CPU" status="run" time="0.037" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Squeeze/0" value_param="OCV/CPU" status="run" time="0.037" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Unsqueeze/0" value_param="OCV/CPU" status="run" time="0.037" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Resize/0" value_param="OCV/CPU" status="run" time="0.118" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Concat/0" value_param="OCV/CPU" status="run" time="0.077" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Constant/0" value_param="OCV/CPU" status="run" time="0.134" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
  </testsuite>
  <testsuite name="Test_ONNX_nets" tests="1" failures="0" disabled="0" errors="0" time="105.911">
    <testcase name="ResNet50_Int8/0" value_param="OCV/CPU" status="run" time="105.91" classname="Test_ONNX_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
  </testsuite>
  <testsuite name="Test_TFLite" tests="1" failures="0" disabled="0" errors="0" time="44.96">
    <testcase name="EfficientDet_int8/0" value_param="OCV/CPU" status="run" time="44.958" classname="Test_TFLite">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
  </testsuite>
</testsuites>

asmorkalov pushed a commit that referenced this pull request May 28, 2024
3rdparty: NDSRVP - A New 3rdparty Library with Optimizations Based on RISC-V P Extension v0.5.2 - Part 1: Basic Functions #25167

# Summary

### Previous context
From PR #24556: 

>> * As you wrote, the P-extension differs from RVV thus can not be easily implemented via Universal Intrinsics mechanism, but there is another HAL mechanism for lower-level CPU optimizations which is used by the [Carotene](https://github.com/opencv/opencv/tree/4.x/3rdparty/carotene) library on ARM platforms. I suggest moving all non-dnn code to similar third-party component. For example, FAST algorithm should allow such optimization-shortcut: see https://github.com/opencv/opencv/blob/4.x/modules/features2d/src/hal_replacement.hpp
>>   Reference documentation is here:
>>   
>>   * https://docs.opencv.org/4.x/d1/d1b/group__core__hal__interface.html
>>   * https://docs.opencv.org/4.x/dd/d8b/group__imgproc__hal__interface.html
>>   * https://docs.opencv.org/4.x/db/d47/group__features2d__hal__interface.html
>>   * Carotene library is turned on here: https://github.com/opencv/opencv/blob/8bbf08f0de9c387c12afefdb05af7780d989e4c3/CMakeLists.txt#L906-L911

> As a test outside of this PR, a 3rdparty component called ndsrvp was created, containing one piece of the non-dnn code (integral_SIMD), and it works very well.
> All the non-dnn code in this PR has been removed, so this PR can now focus on dnn optimizations.
> This HAL mechanism is quite suitable for rvp optimizations; all the non-dnn code is expected to be moved into ndsrvp soon.
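
To make the HAL idea from the quoted discussion concrete, here is a minimal standalone sketch of the replacement pattern: a default stub declines the call, a third-party header remaps the entry point with a macro, and the caller falls back to generic code when the HAL declines. Every name prefixed with `sketch_` is hypothetical and exists only for this illustration. The real OpenCV HAL entry points take row steps plus width and height instead of a flat length, and nothing here is taken from the ndsrvp sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Status codes modeled on OpenCV's CV_HAL_ERROR_OK / CV_HAL_ERROR_NOT_IMPLEMENTED,
// redefined under hypothetical names so the sketch builds without OpenCV.
constexpr int SKETCH_HAL_OK = 0;
constexpr int SKETCH_HAL_NOT_IMPLEMENTED = 1;

// Default stub: declines every call so the caller uses its generic path.
inline int sketch_default_absdiff8u(const std::uint8_t*, const std::uint8_t*,
                                    std::uint8_t*, std::size_t)
{
    return SKETCH_HAL_NOT_IMPLEMENTED;
}

// Hypothetical third-party implementation (a real one would use packed SIMD).
inline int sketch_vendor_absdiff8u(const std::uint8_t* a, const std::uint8_t* b,
                                   std::uint8_t* dst, std::size_t n)
{
    assert(a && b && dst);
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = static_cast<std::uint8_t>(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
    return SKETCH_HAL_OK;
}

// The library maps the entry point to the default stub first...
#define sketch_hal_absdiff8u sketch_default_absdiff8u
// ...and a third-party HAL header then overrides the mapping.
#undef sketch_hal_absdiff8u
#define sketch_hal_absdiff8u sketch_vendor_absdiff8u

// Caller: try the HAL first, fall back to generic code if it declines.
inline void absdiff8u(const std::uint8_t* a, const std::uint8_t* b,
                      std::uint8_t* dst, std::size_t n)
{
    if (sketch_hal_absdiff8u(a, b, dst, n) == SKETCH_HAL_OK)
        return;
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = static_cast<std::uint8_t>(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
}
```

Because the override is a macro remap, the calling code does not change when a HAL is added or removed, and an implementation can decline inputs it does not handle by returning the not-implemented status.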

### Progress

#### Part 1 (This PR)

- [Core](https://docs.opencv.org/4.x/d1/d1b/group__core__hal__interface.html)
  - [x] Element-wise add and subtract
  - [x] Element-wise minimum or maximum
  - [x] Element-wise absolute difference
  - [x] Bitwise logical operations
  - [x] Element-wise compare
- [ImgProc](https://docs.opencv.org/4.x/dd/d8b/group__imgproc__hal__interface.html)
  - [x] Integral
  - [x] Threshold
  - [x] WarpAffine
  - [x] WarpPerspective
- [Features2D](https://docs.opencv.org/4.x/db/d47/group__features2d__hal__interface.html)

#### Part 2 (Next PR)

**Rough Estimate. Todo List May Change.**

- [Core](https://docs.opencv.org/4.x/d1/d1b/group__core__hal__interface.html)
- [ImgProc](https://docs.opencv.org/4.x/dd/d8b/group__imgproc__hal__interface.html)
  - smaller remap HAL interface
  - AdaptiveThreshold
  - BoxFilter
  - Canny
  - Convert
  - Filter
  - GaussianBlur
  - MedianBlur
  - Morph
  - Pyrdown
  - Resize
  - Scharr
  - SepFilter
  - Sobel
- [Features2D](https://docs.opencv.org/4.x/db/d47/group__features2d__hal__interface.html)
  - FAST

### Performance Tests

The optimization does not contain floating-point operations.

**Absolute Difference**

Geometric mean (ms)

|Name of Test|Baseline (ms)|With NDSRVP (ms)|Baseline vs NDSRVP (x-factor)|
|---|:-:|:-:|:-:|
|Absdiff::OCL_AbsDiffFixture::(640x480, 8UC1)|23.104|5.972|3.87|
|Absdiff::OCL_AbsDiffFixture::(640x480, 32FC1)|39.500|40.830|0.97|
|Absdiff::OCL_AbsDiffFixture::(640x480, 8UC3)|69.155|15.051|4.59|
|Absdiff::OCL_AbsDiffFixture::(640x480, 32FC3)|118.715|120.509|0.99|
|Absdiff::OCL_AbsDiffFixture::(640x480, 8UC4)|93.001|19.770|4.70|
|Absdiff::OCL_AbsDiffFixture::(640x480, 32FC4)|161.136|160.791|1.00|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 8UC1)|69.211|15.140|4.57|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 32FC1)|118.762|119.263|1.00|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 8UC3)|212.414|44.692|4.75|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 32FC3)|367.512|366.569|1.00|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 8UC4)|285.337|59.708|4.78|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 32FC4)|490.395|491.118|1.00|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 8UC1)|158.827|33.462|4.75|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 32FC1)|273.503|273.668|1.00|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 8UC3)|484.175|100.520|4.82|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 32FC3)|828.758|829.689|1.00|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 8UC4)|648.592|137.195|4.73|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 32FC4)|1116.755|1109.587|1.01|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 8UC1)|648.715|134.875|4.81|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 32FC1)|1115.939|1113.818|1.00|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 8UC3)|1944.791|413.420|4.70|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 32FC3)|3354.193|3324.672|1.01|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 8UC4)|2594.585|553.486|4.69|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 32FC4)|4473.543|4438.453|1.01|
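
The x-factor column in the table above matches the first timing column divided by the second, for example 23.104 / 5.972 ≈ 3.87 for the 640x480 8UC1 case. The sketch below shows that ratio together with a log-based geometric mean, which is one common way to compute the statistic named in the caption. The helper names `xFactor` and `geometricMean` are hypothetical, and this is not the code of the OpenCV reporting scripts.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Ratio of the reference time to the compared time; values above 1 mean
// the compared run is faster.
inline double xFactor(double referenceMs, double comparedMs)
{
    assert(comparedMs > 0.0);
    return referenceMs / comparedMs;
}

// Geometric mean of strictly positive samples, computed through logarithms
// to avoid overflow when multiplying many values together.
inline double geometricMean(const std::vector<double>& samples)
{
    assert(!samples.empty());
    double logSum = 0.0;
    for (double s : samples)
    {
        assert(s > 0.0);
        logSum += std::log(s);
    }
    return std::exp(logSum / static_cast<double>(samples.size()));
}
```

With the table values, `xFactor(39.500, 40.830)` gives roughly 0.97, which is the 32FC1 row where no speedup is expected.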

**Bitwise Operation**

Geometric mean (ms)

|Name of Test|Baseline (ms)|With NDSRVP (ms)|Baseline vs NDSRVP (x-factor)|
|---|:-:|:-:|:-:|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 8UC1)|22.542|4.971|4.53|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 32FC1)|90.210|19.917|4.53|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 8UC3)|68.429|15.037|4.55|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 32FC3)|280.168|59.239|4.73|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 8UC4)|90.565|19.735|4.59|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 32FC4)|374.695|79.257|4.73|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 8UC1)|67.824|14.873|4.56|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 32FC1)|279.514|59.232|4.72|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 8UC3)|208.337|44.234|4.71|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 32FC3)|851.211|182.522|4.66|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 8UC4)|279.529|59.095|4.73|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 32FC4)|1132.065|244.877|4.62|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 8UC1)|155.685|33.078|4.71|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 32FC1)|635.253|137.482|4.62|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 8UC3)|474.494|100.166|4.74|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 32FC3)|1907.340|412.841|4.62|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 8UC4)|635.538|134.544|4.72|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 32FC4)|2552.666|556.397|4.59|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 8UC1)|634.736|136.355|4.66|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 32FC1)|2548.283|561.827|4.54|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 8UC3)|1911.454|421.571|4.53|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 32FC3)|7663.803|1677.289|4.57|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 8UC4)|2543.983|562.780|4.52|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 32FC4)|10211.693|2237.393|4.56|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 8UC1)|22.341|4.811|4.64|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 32FC1)|89.975|19.288|4.66|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 8UC3)|67.237|14.643|4.59|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 32FC3)|276.324|58.609|4.71|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 8UC4)|89.587|19.554|4.58|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 32FC4)|370.986|77.136|4.81|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 8UC1)|67.227|14.541|4.62|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 32FC1)|276.357|58.076|4.76|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 8UC3)|206.752|43.376|4.77|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 32FC3)|841.638|177.787|4.73|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 8UC4)|276.773|57.784|4.79|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 32FC4)|1127.740|237.472|4.75|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 8UC1)|153.808|32.531|4.73|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 32FC1)|627.765|129.990|4.83|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 8UC3)|469.799|98.249|4.78|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 32FC3)|1893.591|403.694|4.69|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 8UC4)|627.724|129.962|4.83|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 32FC4)|2529.967|540.744|4.68|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 8UC1)|628.089|130.277|4.82|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 32FC1)|2521.817|540.146|4.67|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 8UC3)|1905.004|404.704|4.71|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 32FC3)|7567.971|1627.898|4.65|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 8UC4)|2531.476|540.181|4.69|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 32FC4)|10075.594|2181.654|4.62|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 8UC1)|22.566|5.076|4.45|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 32FC1)|90.391|19.928|4.54|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 8UC3)|67.758|14.740|4.60|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 32FC3)|279.253|59.844|4.67|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 8UC4)|90.296|19.802|4.56|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 32FC4)|373.972|79.815|4.69|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 8UC1)|67.815|14.865|4.56|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 32FC1)|279.398|60.054|4.65|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 8UC3)|208.643|45.043|4.63|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 32FC3)|850.042|180.985|4.70|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 8UC4)|279.363|60.385|4.63|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 32FC4)|1134.858|243.062|4.67|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 8UC1)|155.212|33.155|4.68|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 32FC1)|634.985|134.911|4.71|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 8UC3)|474.648|100.407|4.73|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 32FC3)|1912.049|414.184|4.62|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 8UC4)|635.252|132.587|4.79|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 32FC4)|2544.471|560.737|4.54|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 8UC1)|634.574|134.966|4.70|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 32FC1)|2545.129|561.498|4.53|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 8UC3)|1910.900|419.365|4.56|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 32FC3)|7662.603|1685.812|4.55|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 8UC4)|2548.971|560.787|4.55|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 32FC4)|10201.407|2237.552|4.56|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 8UC1)|22.718|4.961|4.58|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 32FC1)|91.496|19.831|4.61|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 8UC3)|67.910|15.151|4.48|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 32FC3)|279.612|59.792|4.68|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 8UC4)|91.073|19.853|4.59|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 32FC4)|374.641|79.155|4.73|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 8UC1)|67.704|15.008|4.51|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 32FC1)|279.229|60.088|4.65|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 8UC3)|208.156|44.426|4.69|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 32FC3)|849.501|180.848|4.70|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 8UC4)|279.642|59.728|4.68|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 32FC4)|1129.826|242.880|4.65|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 8UC1)|155.585|33.354|4.66|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 32FC1)|634.090|134.995|4.70|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 8UC3)|474.931|99.598|4.77|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 32FC3)|1910.519|413.138|4.62|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 8UC4)|635.026|135.155|4.70|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 32FC4)|2560.167|560.838|4.56|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 8UC1)|634.893|134.883|4.71|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 32FC1)|2548.166|560.831|4.54|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 8UC3)|1911.392|419.816|4.55|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 32FC3)|7646.634|1677.988|4.56|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 8UC4)|2560.637|560.805|4.57|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 32FC4)|10227.044|2249.458|4.55|
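
Unlike Absdiff, the bitwise tests speed up for 32F inputs as well. That fits the note that the optimization contains no floating-point operations: a bitwise operation depends only on the raw bytes, so the same integer path can serve any element type. The portable sketch below illustrates the idea by processing eight bytes per 64-bit word. It is plain C++ written for illustration, `bitwiseAndBytes` is a hypothetical name, and it does not use RVP intrinsics or reproduce the ndsrvp code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// AND two byte buffers, eight bytes at a time where possible.
// The element type of the underlying image (8U, 32F, ...) is irrelevant:
// only the total byte count matters.
inline void bitwiseAndBytes(const unsigned char* a, const unsigned char* b,
                            unsigned char* dst, std::size_t nbytes)
{
    assert(a && b && dst);
    std::size_t i = 0;
    for (; i + 8 <= nbytes; i += 8)
    {
        std::uint64_t x, y;
        std::memcpy(&x, a + i, 8);   // memcpy avoids unaligned-access issues
        std::memcpy(&y, b + i, 8);
        x &= y;
        std::memcpy(dst + i, &x, 8);
    }
    for (; i < nbytes; ++i)          // scalar tail for the remaining bytes
        dst[i] = static_cast<unsigned char>(a[i] & b[i]);
}
```

A 32FC1 row of `width` pixels would be passed as `width * sizeof(float)` bytes, which is why the 32F timings in the table are about four times the 8U timings at the same resolution.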

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake