Optimization based on RISC-V P Packed SIMD Extension v0.5.2 by Junyan721113 · Pull Request #24556 · opencv/opencv

Junyan721113 · 2023-11-19T09:31:03Z

Summary

Provides OpenCV optimizations for the RISC-V P extension (v0.5.2).

Added RVP as a new backend to the OpenCV build system;
Optimized some of the algorithms in the DNN, features2d (feature detection), and imgproc (image processing) modules using RVP Intrinsic functions;
Verified the correctness of the optimized algorithms using the QEMU simulator.

The writer of the code and the author of the PR is an intern at ISCAS (Institute of Software, Chinese Academy of Sciences).

List of RVP optimizations

Optimization of three convolution functions for int8 layers of deep neural networks

// modules/dnn/src/int8layers/layers_common.simd.hpp
void cv::dnn::fastConv( ... );
void cv::dnn::fastDepthwiseConv( ... );
void cv::dnn::fastGEMM1T( ... );

Optimization of matrix affine transformations

// modules/imgproc/src/imgwarp.rvp.cpp
int cv::opt_RVP::warpAffineBlockline( ... );

Optimization of nearest neighbor interpolation for matrix scaling with pix_size 2 or 4

// modules/imgproc/src/resize.rvp.cpp
class cv::opt_RVP::resizeNNInvokerRVP4;
class cv::opt_RVP::resizeNNInvokerRVP2;

Optimization of array accumulation with squares or element-wise multiplication

// modules/imgproc/src/accum.simd.hpp
void accSqr_simd_( ... );
void accProd_simd_( ... );

Optimization of integral for unsigned char arrays

// modules/imgproc/src/sumpixels.simd.hpp
template <>
struct Integral_SIMD<uchar, int, double>;

Optimization of FAST corner detection algorithm with patternSize 16

// modules/features2d/src/fast.rvp.cpp
class cv::opt_RVP::FAST_t_patternSize16_RVP;

Correctness validation (QEMU)

opencv_test_dnn_rvp Consistent with control (before adding RVP optimization)

opencv_test_imgproc_rvp Consistent with controls

opencv_test_features2d_rvp Consistent with controls

Q&A

Why RVP?

As a lightweight extension, the P extension has some potential for use in the embedded domain.

Why v0.5.2?

Although RVP is not frozen, Andes has extensive plans based on version 0.5.2, just as T-Head has for RVV 0.7.1.

Why not Universal Intrinsics?

RVP052 has no floating-point arithmetic and only supports parallel arithmetic up to 64 bits, which makes it poorly suited to implementing Universal Intrinsics. Most of the optimizations in this PR are therefore modeled on existing function-specific optimizations.

How to perform tests?

The correctness tests are as follows. (Due to hardware issues, performance test results are not available at this time.)

Environment

export RISCV=/opt/andes
export OPENCV_TEST_DATA_PATH=**path_to_opencv_extra**/testdata

Toolchain

nds-gnu-toolchain

build_linux_toolchain.sh

TARGET=riscv64-linux
PREFIX=/opt/andes
ARCH=rv64imafdcxandes
ABI=lp64d
CPU=andes-25-series
XLEN=64
BUILD=`pwd`/build-nds64le-linux-glibc-v5d

Qemu

qemu

../configure --prefix=/opt/andes --target-list=riscv32-linux-user,riscv64-linux-user --disable-werror --static

Build

cmake -D CMAKE_BUILD_TYPE=Debug -D CMAKE_INSTALL_PREFIX=/opt/andes -D BUILD_SHARED_LIBS=OFF --toolchain ../platforms/linux/riscv64-andes-gcc.toolchain.cmake ..

Related Tests

dnn module test

qemu-riscv64 -cpu andes-ax25 -L /opt/riscv/sysroot opencv_test_dnn
# int8layers/layers_common.simd.hpp
# --gtest_filter=*Int8*
# --gtest_filter=*Conv*
# --gtest_filter=*Gemm*

imgproc module test

qemu-riscv64 -cpu andes-ax25 -L /opt/riscv/sysroot opencv_test_imgproc
# imgwarp.rvp.cpp
# --gtest_filter=*Affine*
#
# resize.rvp.cpp
# --gtest_filter=*Resize*
#
# sumpixels.simd.hpp
# --gtest_filter=*Integ*

features2d module test

qemu-riscv64 -cpu andes-ax25 -L /opt/riscv/sysroot opencv_test_features2d
# fast.rvp.cpp
# --gtest_filter=*FAST*
# --gtest_filter=*ORB*

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

asmorkalov · 2023-11-20T07:10:22Z

cc @hanliutong @vpisarev

asmorkalov · 2023-11-20T07:28:54Z

@mshabunin Is it possible to add P extension to QEMU configuration on CI? It should help a lot.

vpisarev · 2023-11-20T10:43:12Z

@Junyan721113, thank you for the contribution! This is a useful effort.

In the long term, however, it will be extremely difficult for our small team to maintain 1000 different branches of the same code. We do it, sometimes, for critical paths in critical modules, such as deep learning convolution etc., but for general-purpose functions using platform-specific intrinsics is too much. Please, consider implementing universal intrinsics backend instead: https://github.com/opencv/opencv/tree/4.x/modules/core/include/opencv2/core/hal.

In this case many hundreds of optimized loops in OpenCV can immediately make use of these instructions. Many other backends rely on 128-bit extensions, whereas P-extension is 64-bit, as far as I know. The solution could be to use a pair of registers to emulate 128-bit simd register.
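The register-pair idea can be illustrated with a minimal sketch. It is not taken from this PR: the type u8x16_pair and the helpers add8_u64 and add_u8x16 are hypothetical names, and the packed byte add is written as portable bit manipulation, standing in for what would be a single packed 8-bit add instruction on a P-extension target.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type: a 128-bit vector of 16 unsigned bytes, emulated with
// two 64-bit halves (one general-purpose register each on RV64).
struct u8x16_pair {
    uint64_t lo;
    uint64_t hi;
};

// Lane-wise wrapping add of eight packed bytes held in one 64-bit word.
// Plain C++ is used so the sketch builds anywhere; on real hardware this
// helper is where a packed 8-bit add would be used instead.
inline uint64_t add8_u64(uint64_t a, uint64_t b)
{
    const uint64_t H = 0x8080808080808080ULL;  // top bit of every byte
    uint64_t low7 = (a & ~H) + (b & ~H);       // add the low 7 bits; carries stay inside each byte
    return low7 ^ ((a ^ b) & H);               // fold in the top bits without carrying into the next byte
}

// The emulated 128-bit operation is simply the 64-bit operation applied twice.
inline u8x16_pair add_u8x16(const u8x16_pair& a, const u8x16_pair& b)
{
    return { add8_u64(a.lo, b.lo), add8_u64(a.hi, b.hi) };
}

// Small self-check: 0xFF + 0x01 wraps to 0x00 without disturbing the neighbouring byte.
inline void demo_self_check()
{
    assert(add8_u64(0x00FFULL, 0x0001ULL) == 0x0000ULL);
}
```

Every operation costs two instructions instead of one in this scheme, so whether it pays off would have to be measured on real hardware.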

mshabunin · 2023-11-21T11:38:34Z

I have several questions, concerns and suggestions.

Lower level or technical:

CPU check uses __nd__ prefix while other code uses __rv__v_ prefix
code uses nds_intrinsic.h header, but I have seen other variant - riscv-dsp.h in the T-Head toolchain.
you claim that this is v0.5.2, but the P-extension revision history states that the __nds__ prefix was replaced with __rv__ in v0.8
you used the -mext-dsp GCC option to enable this extension, but it seems to be a toolchain-specific option because generic GCC doesn't have it. The T-Head toolchain, for example, uses the common ISA-string syntax: -mcpu=rv64gcp.

Higher level or more strategic questions and proposals:

As you wrote, the P-extension differs from RVV thus can not be easily implemented via Universal Intrinsics mechanism, but there is another HAL mechanism for lower-level CPU optimizations which is used by the Carotene library on ARM platforms. I suggest moving all non-dnn code to similar third-party component. For example, FAST algorithm should allow such optimization-shortcut: see https://github.com/opencv/opencv/blob/4.x/modules/features2d/src/hal_replacement.hpp
Reference documentation is here:

https://docs.opencv.org/4.x/d1/d1b/group__core__hal__interface.html
https://docs.opencv.org/4.x/dd/d8b/group__imgproc__hal__interface.html
https://docs.opencv.org/4.x/db/d47/group__features2d__hal__interface.html

Carotene library is turned on here:

opencv/CMakeLists.txt

Lines 906 to 911 in 8bbf08f

    
if(WITH_CAROTENE)
  ocv_debug_message(STATUS "Enable carotene acceleration")
  if(NOT ";${OpenCV_HAL};" MATCHES ";carotene;")
    set(OpenCV_HAL "carotene;${OpenCV_HAL}")
  endif()
endif()

Is T-Head DSP implementation compatible with Andes? Is it possible to implement this optimization in a way compatible with both platforms?
P-extension documentation has v0.9.11 already, several incompatible changes have been added there since v0.5.2 and v0.8. For example, all intrinsics should now have __rv_ prefix instead of __rv__. Is it possible to distinguish between the extension revisions and either support multiple of them or only a single one? We already had similar problems with RVV and RVV intrinsics specifications: new spec comes out and our code becomes broken and now we have to support multiple revisions.
Is there any consumer-grade hardware available for purchase for real tests?
Do you know about any plans to add P-extension support to the mainline GCC and LLVM toolchains and the mainline QEMU? It is OK to use custom toolchain for development for specific device, but we try to use more generic approaches to optimizations.
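One way to tell the revisions apart at compile time is sketched below. Everything in it is an assumption made for illustration: DEMO_RVP_REVISION is a hypothetical macro that the build would have to define, add8 is a placeholder name rather than a real intrinsic, and only the three revisions and prefixes named in this thread are handled. The sketch only builds the prefixed name as a string, so it compiles without any vendor header.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical revision selector: 502 for v0.5.2, 800 for v0.8, 911 for v0.9.11.
#ifndef DEMO_RVP_REVISION
#  define DEMO_RVP_REVISION 502
#endif

// Pick the intrinsic prefix mentioned in the discussion for each revision
// and refuse to build for anything else.
#if DEMO_RVP_REVISION == 502
#  define DEMO_RVP_PREFIX __nds__
#elif DEMO_RVP_REVISION == 800
#  define DEMO_RVP_PREFIX __rv__
#elif DEMO_RVP_REVISION == 911
#  define DEMO_RVP_PREFIX __rv_
#else
#  error "Unsupported P-extension revision for this sketch"
#endif

// Two-level helpers so that macro arguments are expanded before pasting/stringizing.
#define DEMO_CAT_(a, b) a##b
#define DEMO_CAT(a, b)  DEMO_CAT_(a, b)
#define DEMO_STR_(x)    #x
#define DEMO_STR(x)     DEMO_STR_(x)

// DEMO_RVP_INTRIN(add8) becomes <prefix>add8 for the selected revision.
#define DEMO_RVP_INTRIN(name) DEMO_CAT(DEMO_RVP_PREFIX, name)

// Returns the spelled-out name, which makes the selection easy to inspect.
inline const char* demo_add8_name()
{
    return DEMO_STR(DEMO_RVP_INTRIN(add8));
}
```

Renaming alone would not cover revisions whose intrinsics differ in signature or semantics, so supporting a single revision, as proposed later in this thread, remains the simpler option.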

Junyan721113 · 2023-11-21T11:56:04Z

@Junyan721113, thank you for the contribution! This is a useful effort.

In the long term, however, it will be extremely difficult for our small team to maintain 1000 different branches of the same code. We do it, sometimes, for critical paths in critical modules, such as deep learning convolution etc., but for general-purpose functions using platform-specific intrinsics is too much. Please, consider implementing universal intrinsics backend instead: https://github.com/opencv/opencv/tree/4.x/modules/core/include/opencv2/core/hal.

Thank you for your guidance! Most of the current optimizations for P extensions are where other platform-specific optimizations already exist (such as int8layers/layers_common.simd.hpp). I would like to know exactly what parts of the code "critical paths in critical modules" refer to, so that P extensions can be optimized in other ways if Universal Intrinsics is not possible.

In this case many hundreds of optimized loops in OpenCV can immediately make use of these instructions. Many other backends rely on 128-bit extensions, whereas P-extension is 64-bit, as far as I know. The solution could be to use a pair of registers to emulate 128-bit simd register.

However, I'm sorry to say that I'm currently having trouble implementing Universal Intrinsics with the P extension for the following reasons:

P extensions do not have floating point instructions, thus making it difficult to implement the floating point vector part of Universal Intrinsics; moreover, P extensions do not have vector registers, limiting many optimization operations.
Another solution is to fall back to a pure C++ implementation of Universal Intrinsics on floating-point vectors, but this may lead to negative optimizations, just as RVV generates redundant Load/Stores. (modules/core/include/opencv2/core/hal/intrin_rvv.hpp)

Junyan721113 · 2023-12-05T10:42:41Z

CPU check uses __nd__ prefix while other code uses __rv__v_ prefix

you claim that this is v0.5.2, but the P-extension revision history states that the __nds__ prefix was replaced with __rv__ in v0.8

This is my fault. RVP v0.5.2 should use __nds__ prefix rather than __rv__ prefix.

code uses nds_intrinsic.h header, but I have seen other variant - riscv-dsp.h in the T-Head toolchain.

you used the -mext-dsp GCC option to enable this extension, but it seems to be a toolchain-specific option because generic GCC doesn't have it. The T-Head toolchain, for example, uses the common ISA-string syntax: -mcpu=rv64gcp.

I'm sorry, but Andes toolchain uses nds_intrinsic.h as header, and the -mext-dsp option is documented in Andes DSP Library.

As you wrote, the P-extension differs from RVV thus can not be easily implemented via Universal Intrinsics mechanism, but there is another HAL mechanism for lower-level CPU optimizations which is used by the Carotene library on ARM platforms. I suggest moving all non-dnn code to similar third-party component. For example, FAST algorithm should allow such optimization-shortcut: see https://github.com/opencv/opencv/blob/4.x/modules/features2d/src/hal_replacement.hpp
Reference documentation is here:

https://docs.opencv.org/4.x/d1/d1b/group__core__hal__interface.html

https://docs.opencv.org/4.x/dd/d8b/group__imgproc__hal__interface.html

https://docs.opencv.org/4.x/db/d47/group__features2d__hal__interface.html

Carotene library is turned on here:

opencv/CMakeLists.txt

Lines 906 to 911 in 8bbf08f

if(WITH_CAROTENE)
  ocv_debug_message(STATUS "Enable carotene acceleration")
  if(NOT ";${OpenCV_HAL};" MATCHES ";carotene;")
    set(OpenCV_HAL "carotene;${OpenCV_HAL}")
  endif()
endif()

As a test outside of this PR, I created a 3rdparty component called ndsrvp containing one piece of the non-dnn code (integral_SIMD), and it works very well.
All the non-dnn code has been removed from this PR, so it can now focus on the dnn optimizations.
This HAL mechanism suits RVP optimizations well; all the non-dnn code is expected to move into ndsrvp soon.
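The shape of the HAL replacement pattern can be sketched in a few lines. This is not the ndsrvp code: demo_hal_row_sum_u8, row_prefix_sum and the DEMO_HAL_* codes are hypothetical names that only mirror the convention of a hook returning "OK" or "not implemented", with the generic code falling back to its portable loop in the second case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Status codes mirroring the HAL convention (names and values are assumptions of this sketch).
enum { DEMO_HAL_OK = 0, DEMO_HAL_NOT_IMPLEMENTED = 1 };

// Hypothetical accelerated hook, as a third-party HAL component might provide it.
// It handles only the case it is optimized for and declines everything else.
inline int demo_hal_row_sum_u8(const uint8_t* src, size_t n, int32_t* dst)
{
    if (n % 8 != 0)
        return DEMO_HAL_NOT_IMPLEMENTED;       // let the generic code handle it
    int32_t acc = 0;
    for (size_t i = 0; i < n; i += 8)          // stand-in for a packed-SIMD loop over 8 bytes
        for (size_t j = 0; j < 8; ++j) {
            acc += src[i + j];
            dst[i + j] = acc;
        }
    return DEMO_HAL_OK;
}

// Generic caller: try the hook first, then fall back to the portable implementation.
inline void row_prefix_sum(const uint8_t* src, size_t n, int32_t* dst, bool* used_hal = nullptr)
{
    const int status = demo_hal_row_sum_u8(src, n, dst);
    if (used_hal)
        *used_hal = (status == DEMO_HAL_OK);
    if (status == DEMO_HAL_OK)
        return;
    int32_t acc = 0;
    for (size_t i = 0; i < n; ++i) {
        acc += src[i];
        dst[i] = acc;
    }
}
```

The appeal for a non-ratified extension is that the platform-specific code lives entirely in the hook, so the generic module source does not need per-platform branches.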

Is T-Head DSP implementation compatible with Andes? Is it possible to implement this optimization in a way compatible with both platforms?

The T-Head DSP implementation does not support the __nds__ prefix and defines its intrinsic functions differently, using intXLEN_t and uintXLEN_t, so the two are probably incompatible. This PR is only intended to add optimizations based on RVP v0.5.2, which is the Andes RVP.

P-extension documentation has v0.9.11 already, several incompatible changes have been added there since v0.5.2 and v0.8. For example, all intrinsics should now have __rv_ prefix instead of __rv__. Is it possible to distinguish between the extension revisions and either support multiple of them or only a single one? We already had similar problems with RVV and RVV intrinsics specifications: new spec comes out and our code becomes broken and now we have to support multiple revisions.

Supporting only v0.5.2 might be the best solution for this PR. RVP has been renamed to RVP052 to distinguish between RVP revisions.
Andes has plans for RVP052, just as T-Head has plans for RVV071.

Is there any consumer-grade hardware available for purchase for real tests?

Andes has been contacted, and a development board will soon be available for performance tests.

Do you know about any plans to add P-extension support to the mainline GCC and LLVM toolchains and the mainline QEMU? It is OK to use custom toolchain for development for specific device, but we try to use more generic approaches to optimizations.

I'm sorry, but currently I don't know about any plans related to Andes adding support to mainline.

mshabunin

I suggest simplifying CPU-feature part: instead of adding RVP052 as a separate CPU feature, let's use custom macro defined in cmake toolchain file, like it is done in platforms/linux/riscv64-071-gcc.toolchain.cmake.

Basically you have to revert all core modifications and add some macro definition to the riscv64-andes-gcc.toolchain.cmake (e.g. -D__riscv_andes_rvp052 or maybe there is one built into the compiler already?). Then use plain #ifdef guard for optimized code sections.

Tricky part is dispatched fastConv, fastDepthwiseConv and fastGEMM - I suggest adding new files conv_depthwise.rvp052.cpp/.hpp with your implementation and include/call it if that macro is enabled.

Probably some additional cmake variable should be set in the toolchain file, so that dnn/CMakeLists.txt would know when to add new rvp052.cpp files to the build (or it can be just guarded by the same macro and added to the build unconditionally).
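A minimal sketch of the plain guard approach follows. DEMO_RVP052 is a hypothetical macro standing in for whatever the toolchain file would define, and dot_i8 is an illustrative function, not one of the functions in this PR; the extension-specific loop is left as a comment because the real intrinsics are toolchain-specific.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical macro: in the proposal it would be defined by the cmake
// toolchain file (e.g. via a -D flag), not by the CPU-feature machinery.
#ifndef DEMO_RVP052
#  define DEMO_RVP052 0
#endif

// int8 dot product with an optional optimized section selected at compile time.
inline int32_t dot_i8(const int8_t* a, const int8_t* b, size_t n)
{
    int32_t acc = 0;
    size_t i = 0;
#if DEMO_RVP052
    // The extension-specific loop would go here: it would consume several
    // elements per iteration, accumulate into acc and advance i accordingly.
#endif
    for (; i < n; ++i)                         // scalar tail and portable fallback
        acc += int32_t(a[i]) * int32_t(b[i]);
    return acc;
}
```

Because the scalar loop doubles as the tail handler, the function stays correct whether or not the macro is set, and no runtime check is involved.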

cc @opencv-alalek , what do you think?

opencv-alalek · 2023-12-08T19:19:58Z

CPU features use common principles for detection / control / compilation / execution and diagnostics.
We could work without all of this, but it doesn't look like a reliable process.

platforms/linux/riscv64-071-gcc.toolchain.cmake

Could we reuse generic RISC-V toolchains? (with appropriate CPU_BASELINE/CPU_DISPATCH CMake parameters)

mshabunin · 2023-12-08T20:28:16Z

CPU features use common principles for detection / control / compilation / execution and diagnostics.
We could work without all of this, but it doesn't look like a reliable process.

Yes, in general I agree, but in this specific case - limited HW availability, specialized toolchain, non-ratified extension, which is not available in generic toolchains - it looks more like RVV 0.7.1. Also there is no actual runtime check for this extension, so dispatched implementations do not make sense; in this PR dispatching was implemented only because of DNN module specifics (no hal::, no Universal Intrinsics, raw SIMD blocks, existing dispatching).

So, IMHO experimental less-invasive approach similar to early RVV 0.7.1 would fit better than generalized P-extension support. Later, when various implementations converge to some stable form and the extension is supported in the upstream, we will implement it as a full-fledged CPU feature.

Junyan721113 · 2023-12-12T08:08:54Z

Tricky part is dispatched fastConv, fastDepthwiseConv and fastGEMM - I suggest adding new files conv_depthwise.rvp052.cpp/.hpp with your implementation and include/call it if that macro is enabled.

Files with .rvp052.cpp suffix could trigger CMake CPU dispatch filter, resulting in Excluding from source files list: modules/dnn/src/int8layers/conv_depthwise.rvp052.cpp, so conv_depthwise.dispatch.cpp may be a better solution.

As for macros, there are 2 macros, __ANDES and __riscv_dsp, that fill the need.

Meanwhile, I wonder if it is acceptable to implement all 3 of these convolution functions inside one conv_depthwise.dispatch.cpp file (maybe renaming it to layers_common.dispatch.cpp is better?), rather than putting them in 3 .cpp files.

In total, is the following code acceptable?

// modules/core/include/opencv2/core/cv_cpu_dispatch.h
#if defined(__riscv) && defined(__riscv_dsp) && defined(__ANDES)
# include <nds_intrinsic.h>
# define CV_RVP052 1
#endif

// modules/dnn/src/int8layers/layers_common.simd.hpp
#include "layers_common.dispatch.hpp"

// modules/dnn/src/int8layers/layers_common.dispatch.cpp
namespace cv {
namespace dnn {
namespace opt_RVP052 {

#if CV_RVP052
//RVP Optimizations

// modules/dnn/src/int8layers/convolution_layer.cpp
#if CV_RVP052
    if(isConv2D)
        opt_RVP052::fastDepthwiseConv(wptr, kernel_h, kernel_w,
            stride_h, stride_w, dilation_h, dilation_w, pad_t, pad_l,
            biasptr, multptr, inptr_, height, width, outptr_, out_d, outH, outW, inpZp, outZp);
    else

mshabunin · 2023-12-20T09:56:33Z

modules/dnn/src/int8layers/layers_common.dispatch.cpp

I suggest renaming files to something like layers_rvp052.cpp/.hpp to avoid confusion with .dispatch files in other modules because they usually serve different purpose.

Disable whole .cpp body if macro is not defined or is false and include .hpp file into layers_common.hpp with the same macro condition.

mshabunin · 2023-12-20T09:58:12Z

modules/dnn/src/int8layers/convolution_layer.cpp

                            else
+                        #endif
+                        #if CV_RVP052
+                            if(useRVP052)


useRVP052 is always the same as CV_RVP052 and does not have external interface, so I suggest removing boolean flag completely. Here and in other files.

In fully_connected_layer.cpp this is absolutely right. But in convolution_layer.cpp, useRVP052 is not always the same as CV_RVP052, because line 769, p.useRVP052 = CV_RVP052 && isConv2D;, introduces a small difference.
So changing this boolean flag to isConv2D might be better.

mshabunin · 2023-12-20T10:00:35Z

modules/core/include/opencv2/core/cv_cpu_dispatch.h

I suggest moving these changes to the dnn module, maybe to int8layers/layers_common.hpp?

In layers_rvp052.cpp, including layers_common.hpp to get CV_RVP052 could cause HAVE_OPENCL malfunction as follows:

In file included from /home/junyan/opencv_rvp/modules/dnn/src/int8layers/./layers_common.hpp:17,
                 from /home/junyan/opencv_rvp/modules/dnn/src/int8layers/layers_rvp052.cpp:5:
/home/junyan/opencv_rvp/modules/dnn/src/int8layers/./../ocl4dnn/include/ocl4dnn.hpp:196:9: error: 'ocl' does not name a type; did you mean 'ogl'?
  196 |         ocl::Program compileKernel();
      |         ^~~
      |         ogl

So maybe moving them into layers_rvp052.hpp is better.

mshabunin · 2023-12-20T10:01:04Z

modules/dnn/src/int8layers/layers_common.simd.hpp

Modifications in this file will not be necessary.

Junyan721113 · 2024-02-28T04:51:19Z

Development boards for the accuracy and performance tests have been set up; results will come out soon.

Junyan721113 · 2024-03-02T08:37:32Z

Here are the accuracy and performance test results!

TL;DR: EfficientDet_int8 in opencv_perf_dnn has gained a 1.95x performance boost.

The 3 functions optimized by RVP only appeared in the following tests:

./opencv_test_dnn --gtest_filter=*EfficientDet_int8*:*Quant*:*Int8* --gtest_output=xml
./opencv_perf_dnn --gtest_filter=*EfficientDet_int8* --gtest_output=xml

Meanwhile Test_Int8_nets.CaffeNet and Test_Int8_nets.RCNN_ILSVRC13 took up too much memory to be run on the board.

So the final filter is:

./opencv_test_dnn --gtest_filter=*EfficientDet_int8*:*Quant*:*Int8*-*CaffeNet*:*RCNN_ILSVRC13* --gtest_output=xml
./opencv_perf_dnn --gtest_filter=*EfficientDet_int8* --gtest_output=xml
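For readers unfamiliar with the filter syntax: a --gtest_filter value is a ':'-separated list of positive patterns, optionally followed by '-' and a ':'-separated list of negative patterns, where '*' matches any run of characters and '?' matches one. The sketch below is a simplified model of that selection rule, not GoogleTest's own implementation; the helper names and the test names in the check are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// '*' matches any run of characters (including none), '?' matches exactly one.
static bool glob_match(const char* pat, const char* str)
{
    if (*pat == '\0')
        return *str == '\0';
    if (*pat == '*')
        return glob_match(pat + 1, str) || (*str != '\0' && glob_match(pat, str + 1));
    if (*str == '\0')
        return false;
    return (*pat == '?' || *pat == *str) && glob_match(pat + 1, str + 1);
}

// True if name matches at least one pattern of a ':'-separated list.
static bool matches_any(const std::string& patterns, const std::string& name)
{
    std::size_t start = 0;
    while (start <= patterns.size()) {
        std::size_t end = patterns.find(':', start);
        if (end == std::string::npos)
            end = patterns.size();
        if (glob_match(patterns.substr(start, end - start).c_str(), name.c_str()))
            return true;
        start = end + 1;
    }
    return false;
}

// filter = positive[:positive...][-negative[:negative...]]
inline bool demo_filter_selects(const std::string& filter, const std::string& name)
{
    const std::size_t dash = filter.find('-');
    std::string pos = filter.substr(0, dash);
    const std::string neg = (dash == std::string::npos) ? std::string() : filter.substr(dash + 1);
    if (pos.empty())
        pos = "*";                             // an empty positive part selects everything
    return matches_any(pos, name) && (neg.empty() || !matches_any(neg, name));
}

// Example: an Int8 layer test is kept, an excluded network test is dropped.
inline void demo_filter_check()
{
    const std::string f = "*Quant*:*Int8*-*CaffeNet*";
    assert(demo_filter_selects(f, "Test_Int8_layers.Convolution2D/0"));
    assert(!demo_filter_selects(f, "Test_Int8_nets.CaffeNet/0"));
}
```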

opencv_perf_dnn summary

> python .\misc\summary.py .\opencv_bin_blank\opencv_perf_dnn.xml .\opencv_bin_rvp\opencv_perf_dnn.xml
Geometric mean (ms)

               Name of Test                 opencv    opencv     opencv
                                             perf      perf       perf
                                              dnn       dnn       dnn
                                                                   vs
                                                                 opencv
                                                                  perf
                                                                  dnn
                                                               (x-factor)
EfficientDet_int8::DNNTestNetwork::OCV/CPU 42451.011 21728.436    1.95
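As a quick check of the table, assuming the x-factor column is the control geometric mean divided by the optimized one:

```latex
\text{x-factor} = \frac{t_{\text{control}}}{t_{\text{RVP}}}
                = \frac{42451.011\ \text{ms}}{21728.436\ \text{ms}}
                \approx 1.954
```

This rounds to the reported 1.95, and the two millisecond values match the gmean properties in the XML reports below (42451010572 ns and 21728435820 ns at a frequency of 1000000000 ticks per second).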

opencv_perf_dnn optimized

<?xml version="1.0" encoding="UTF-8"?>
<testsuites tests="1" failures="0" disabled="0" errors="0" timestamp="2020-09-22T16:39:16" time="241.631" cv_module_name="dnn" cv_implementation="plain" cv_num_threads="-1" test_tags="" test_tags_skip="mem_6gb,verylong,debug_verylong" test_tags_force="" cv_version="4.9.0-dev" cv_version_build="4.9.0-dev" cv_vcs_version="4.9.0-216-g09c6961694-dirty" cv_build_type="Debug" cv_build_type_build="Debug" cv_compiler="/home/junyan/opt/andes/bin/riscv64-linux-g++  (ver 10.3.0)" cv_parallel_framework="pthreads" cv_parallel_threads="1" cv_cpu_features="" cv_ocl="disabled" name="AllTests">
  <testsuite name="DNNTestNetwork" tests="1" failures="0" disabled="0" errors="0" time="241.625">
    <testcase name="EfficientDet_int8/0" value_param="OCV/CPU" status="run" time="241.623" classname="DNNTestNetwork">
<properties>
<property name="ocl_memory_usage" value="0"/>
<property name="bytesIn" value="0"/>
<property name="bytesOut" value="0"/>
<property name="term" value="1"/>
<property name="samples" value="10"/>
<property name="outliers" value="0"/>
<property name="frequency" value="1000000000"/>
<property name="min" value="21683049745"/>
<property name="median" value="21712831994"/>
<property name="gmean" value="21728435820"/>
<property name="gstddev" value="0.002588"/>
<property name="mean" value="21728501353"/>
<property name="stddev" value="56310680"/>
</properties>
    </testcase>
  </testsuite>
</testsuites>

opencv_perf_dnn control

<?xml version="1.0" encoding="UTF-8"?>
<testsuites tests="1" failures="0" disabled="0" errors="0" timestamp="2020-09-22T15:10:47" time="469.468" cv_module_name="dnn" cv_implementation="plain" cv_num_threads="-1" test_tags="" test_tags_skip="mem_6gb,verylong,debug_verylong" test_tags_force="" cv_version="4.9.0-dev" cv_version_build="4.9.0-dev" cv_vcs_version="4.9.0-212-g0e44f3a544-dirty" cv_build_type="Debug" cv_build_type_build="Debug" cv_compiler="/home/junyan/opt/andes/bin/riscv64-linux-g++  (ver 10.3.0)" cv_parallel_framework="pthreads" cv_parallel_threads="1" cv_cpu_features="" cv_ocl="disabled" name="AllTests">
  <testsuite name="DNNTestNetwork" tests="1" failures="0" disabled="0" errors="0" time="469.462">
    <testcase name="EfficientDet_int8/0" value_param="OCV/CPU" status="run" time="469.46" classname="DNNTestNetwork">
<properties>
<property name="ocl_memory_usage" value="0"/>
<property name="bytesIn" value="0"/>
<property name="bytesOut" value="0"/>
<property name="term" value="1"/>
<property name="samples" value="10"/>
<property name="outliers" value="0"/>
<property name="frequency" value="1000000000"/>
<property name="min" value="42387522406"/>
<property name="median" value="42406298532"/>
<property name="gmean" value="42451010572"/>
<property name="gstddev" value="0.001703"/>
<property name="mean" value="42451066023"/>
<property name="stddev" value="72351781"/>
</properties>
    </testcase>
  </testsuite>
</testsuites>

opencv_test_dnn summary

Testcases shorter than 1s are not shown above.

How the graph was generated:

import xml.etree.ElementTree as ET
import matplotlib.pyplot as plt

# Read the XML files and extract the mean values

rvp_file = 'opencv_bin_rvp/opencv_test_dnn.xml'
blank_file = 'opencv_bin_blank/opencv_test_dnn.xml'

# parse the XML files

rvp_data = ET.parse(rvp_file).getroot()
blank_data = ET.parse(blank_file).getroot()

print(rvp_data.tag, rvp_data.attrib)

test_names = []
for testsuite in rvp_data.iter(tag='testsuite'):
    # print(testsuite.tag, testsuite.attrib)
    test_names.append('Total: ' + testsuite.attrib['name'])
    for testcase in testsuite.iter(tag='testcase'):
        # print(testcase.tag, testcase.attrib)
        test_names.append(testcase.attrib['name'])

# keyw = 'mean'
keyw = 'time'

rvp_means = []
for testsuite in rvp_data.iter(tag='testsuite'):
    rvp_means.append(float(testsuite.attrib['time']))
    for testcase in testsuite.iter(tag='testcase'):
        # print(testcase.tag, testcase.attrib)
        if keyw not in testcase.attrib:
            continue
        rvp_means.append(float(testcase.attrib[keyw]))

blank_means = []
for testsuite in blank_data.iter(tag='testsuite'):
    blank_means.append(float(testsuite.attrib['time']))
    for testcase in testsuite.iter(tag='testcase'):
        # print(testcase.tag, testcase.attrib)
        if keyw not in testcase.attrib:
            continue
        blank_means.append(float(testcase.attrib[keyw]))

print(rvp_means)
print(blank_means)

# Keep only the cases that took at least 1 s in the RVP run (trivial cases are
# dropped), and filter the names with the same indices so labels stay aligned
kept = [i for i in range(len(rvp_means)) if rvp_means[i] >= 1.0]
ratio = [blank_means[i] / rvp_means[i] for i in kept]
test_names = [test_names[i] for i in kept]

# Plot the bar chart
fig, ax = plt.subplots()
ax.bar(range(len(ratio)), ratio, color='b')
ax.set_xlabel('Test case')
ax.set_ylabel('Speedup')
ax.set_title('Speedup of RVP over blank')
ax.set_xticks(range(len(ratio)))
ax.set_xticklabels(test_names, rotation=90)
ax.set_yticks(range(0, 6, 1))
ax.set_yticklabels([f'{i}x' for i in range(0, 6, 1)])
ax.axhline(y=1, color='r', linestyle='--')
ax.grid(True, axis='y')

# margin the plot
plt.tight_layout()

# Save the plot
# plt.savefig('speedup.png')

# Show the plot

plt.show()

opencv_test_dnn optimized

<?xml version="1.0" encoding="UTF-8"?>
<testsuites tests="86" failures="0" disabled="2" errors="0" timestamp="2020-09-22T13:21:45" time="2233.4" cv_version="4.9.0-dev" cv_version_build="4.9.0-dev" cv_vcs_version="4.9.0-216-g09c6961694-dirty" cv_build_type="Debug" cv_build_type_build="Debug" cv_compiler="/home/junyan/opt/andes/bin/riscv64-linux-g++  (ver 10.3.0)" cv_parallel_framework="pthreads" cv_parallel_threads="1" cv_cpu_features="" cv_ocl="disabled" test_tags="" test_tags_skip="mem_6gb,verylong,debug_verylong,dnn_skip_opencv_backend,dnn_skip_cpu,dnn_skip_cpu_fp16,dnn_skip_ocl,dnn_skip_ocl_fp16,dnn_skip_onnx_conformance,dnn_skip_parser" test_tags_force="" name="AllTests">
  <testsuite name="Test_Int8_layers" tests="40" failures="0" disabled="2" errors="0" time="6.545">
    <testcase name="Convolution1D/0" value_param="OCV/CPU" status="run" time="0.091" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Convolution2D/0" value_param="OCV/CPU" status="run" time="0.856" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Convolution3D/0" value_param="OCV/CPU" status="run" time="0.074" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Flatten/0" value_param="OCV/CPU" status="run" time="0.114" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Padding/0" value_param="OCV/CPU" status="run" time="0.266" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="AvePooling/0" value_param="OCV/CPU" status="run" time="0.348" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="MaxPooling/0" value_param="OCV/CPU" status="run" time="0.445" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Reduce/0" value_param="OCV/CPU" status="run" time="0.24" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="ReLU/0" value_param="OCV/CPU" status="run" time="0.126" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="LeakyReLU/0" value_param="OCV/CPU" status="run" time="0.015" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="ReLU6/0" value_param="OCV/CPU" status="run" time="0.065" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Sigmoid/0" value_param="OCV/CPU" status="run" time="0.026" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Sigmoid_dynamic_axes/0" value_param="OCV/CPU" status="run" time="0.037" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Sigmoid_1d/0" value_param="OCV/CPU" status="run" time="0.045" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Mish/0" value_param="OCV/CPU" status="run" time="0.035" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Softmax_Caffe/0" value_param="OCV/CPU" status="run" time="0.174" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Softmax_keras_TF/0" value_param="OCV/CPU" status="run" time="0.02" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Softmax_slim_TF/0" value_param="OCV/CPU" status="run" time="0.026" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Softmax_slim_v2_TF/0" value_param="OCV/CPU" status="run" time="0.036" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Softmax_ONNX/0" value_param="OCV/CPU" status="run" time="0.021" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Softmax_log_ONNX/0" value_param="OCV/CPU" status="run" time="0.02" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="DISABLED_Softmax_unfused_ONNX/0" value_param="OCV/CPU" status="notrun" time="0" classname="Test_Int8_layers" />
    <testcase name="Concat/0" value_param="OCV/CPU" status="run" time="0.22" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="BatchNorm/0" value_param="OCV/CPU" status="run" time="0.411" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Scale/0" value_param="OCV/CPU" status="run" time="0.143" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="InnerProduct/0" value_param="OCV/CPU" status="run" time="1.244" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Reshape/0" value_param="OCV/CPU" status="run" time="0.412" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Permute/0" value_param="OCV/CPU" status="run" time="0.045" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Identity/0" value_param="OCV/CPU" status="run" time="0.077" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Slice_split_tf/0" value_param="OCV/CPU" status="run" time="0.02" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Slice_4d_tf/0" value_param="OCV/CPU" status="run" time="0.022" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Slice_strided_tf/0" value_param="OCV/CPU" status="run" time="0.024" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="DISABLED_Slice_onnx/0" value_param="OCV/CPU" status="notrun" time="0" classname="Test_Int8_layers" />
    <testcase name="Slice_dynamic_axes_onnx/0" value_param="OCV/CPU" status="run" time="0.026" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Slice_steps_2d_onnx11/0" value_param="OCV/CPU" status="run" time="0.042" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Slice_steps_3d_onnx11/0" value_param="OCV/CPU" status="run" time="0.053" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Slice_steps_4d_onnx11/0" value_param="OCV/CPU" status="run" time="0.041" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Slice_steps_5d_onnx11/0" value_param="OCV/CPU" status="run" time="0.042" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Dropout/0" value_param="OCV/CPU" status="run" time="0.143" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Eltwise/0" value_param="OCV/CPU" status="run" time="0.433" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
  </testsuite>
  <testsuite name="Test_Int8_nets" tests="24" failures="0" disabled="0" errors="0" time="2172.39">
    <testcase name="AlexNet/0" value_param="OCV/CPU" status="run" time="81.558" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
<property name="tags" value="mem_512mb"/>
<property name="tags_implied" value=""/>
</properties>
    </testcase>
    <testcase name="GoogLeNet/0" value_param="OCV/CPU" status="run" time="237.368" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="ResNet50/0" value_param="OCV/CPU" status="run" time="0.065" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_512mb,debug_verylong"/>
<property name="tags_implied" value="debug_long"/>
</properties>
    </testcase>
    <testcase name="DenseNet121/0" value_param="OCV/CPU" status="run" time="215.475" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
<property name="tags" value="mem_512mb"/>
<property name="tags_implied" value=""/>
</properties>
    </testcase>
    <testcase name="SqueezeNet_v1_1/0" value_param="OCV/CPU" status="run" time="30.185" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Inception_v2/0" value_param="OCV/CPU" status="run" time="168.073" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="MobileNet_v2/0" value_param="OCV/CPU" status="run" time="38.848" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Shufflenet/0" value_param="OCV/CPU" status="run" time="15.797" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="MobileNet_SSD/0" value_param="OCV/CPU" status="run" time="89.717" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="MobileNet_v1_SSD/0" value_param="OCV/CPU" status="run" time="99.273" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="MobileNet_v1_SSD_PPN/0" value_param="OCV/CPU" status="run" time="92.041" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Inception_v2_SSD/0" value_param="OCV/CPU" status="run" time="368.112" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
<property name="tags" value="mem_512mb"/>
<property name="tags_implied" value=""/>
</properties>
    </testcase>
    <testcase name="opencv_face_detector/0" value_param="OCV/CPU" status="run" time="238.913" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="EfficientDet/0" value_param="OCV/CPU" status="run" time="0.002" classname="Test_Int8_nets">
<properties>
<property name="tags" value="debug_verylong"/>
<property name="tags_implied" value="debug_long"/>
</properties>
    </testcase>
    <testcase name="FasterRCNN_resnet50/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_1gb,long,debug_verylong"/>
<property name="tags_implied" value="mem_512mb,debug_long"/>
</properties>
    </testcase>
    <testcase name="FasterRCNN_inceptionv2/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_1gb,long,debug_verylong"/>
<property name="tags_implied" value="mem_512mb,debug_long"/>
</properties>
    </testcase>
    <testcase name="FasterRCNN_vgg16/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_2gb,long,debug_verylong"/>
<property name="tags_implied" value="mem_1gb,mem_512mb,debug_long"/>
</properties>
    </testcase>
    <testcase name="FasterRCNN_zf/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_512mb,debug_verylong"/>
<property name="tags_implied" value="debug_long"/>
</properties>
    </testcase>
    <testcase name="RFCN/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_512mb,long,debug_verylong"/>
<property name="tags_implied" value="debug_long"/>
</properties>
    </testcase>
    <testcase name="YoloVoc/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_1gb,long,debug_verylong"/>
<property name="tags_implied" value="mem_512mb,debug_long"/>
</properties>
    </testcase>
    <testcase name="TinyYoloVoc/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_512mb,debug_verylong"/>
<property name="tags_implied" value="debug_long"/>
</properties>
    </testcase>
    <testcase name="YOLOv3/0" value_param="OCV/CPU" status="run" time="0" classname="Test_Int8_nets">
<properties>
<property name="tags" value="long,mem_1gb,debug_verylong"/>
<property name="tags_implied" value="debug_long,mem_512mb"/>
</properties>
    </testcase>
    <testcase name="YOLOv4/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="long,mem_1gb,debug_verylong"/>
<property name="tags_implied" value="debug_long,mem_512mb"/>
</properties>
    </testcase>
    <testcase name="YOLOv4_tiny/0" value_param="OCV/CPU" status="run" time="496.879" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_512mb"/>
<property name="tags_implied" value=""/>
</properties>
    </testcase>
  </testsuite>
  <testsuite name="Test_ONNX_layers" tests="20" failures="0" disabled="0" errors="0" time="1.548">
    <testcase name="Quantized_Convolution/0" value_param="OCV/CPU" status="run" time="0.391" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_MatMul/0" value_param="OCV/CPU" status="run" time="0.133" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Gemm/0" value_param="OCV/CPU" status="run" time="0.039" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_MatMul_Variable_Weights/0" value_param="OCV/CPU" status="run" time="0.09" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Eltwise/0" value_param="OCV/CPU" status="run" time="0.051" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Eltwise_Scalar/0" value_param="OCV/CPU" status="run" time="0.041" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Eltwise_Broadcast/0" value_param="OCV/CPU" status="run" time="0.042" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_LeakyReLU/0" value_param="OCV/CPU" status="run" time="0.035" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Sigmoid/0" value_param="OCV/CPU" status="run" time="0.034" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_MaxPool/0" value_param="OCV/CPU" status="run" time="0.036" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_AvgPool/0" value_param="OCV/CPU" status="run" time="0.037" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Split/0" value_param="OCV/CPU" status="run" time="0.045" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Pad/0" value_param="OCV/CPU" status="run" time="0.045" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Reshape/0" value_param="OCV/CPU" status="run" time="0.036" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Transpose/0" value_param="OCV/CPU" status="run" time="0.035" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Squeeze/0" value_param="OCV/CPU" status="run" time="0.035" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Unsqueeze/0" value_param="OCV/CPU" status="run" time="0.035" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Resize/0" value_param="OCV/CPU" status="run" time="0.112" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Concat/0" value_param="OCV/CPU" status="run" time="0.081" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Constant/0" value_param="OCV/CPU" status="run" time="0.159" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
  </testsuite>
  <testsuite name="Test_ONNX_nets" tests="1" failures="0" disabled="0" errors="0" time="28.347">
    <testcase name="ResNet50_Int8/0" value_param="OCV/CPU" status="run" time="28.345" classname="Test_ONNX_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
  </testsuite>
  <testsuite name="Test_TFLite" tests="1" failures="0" disabled="0" errors="0" time="24.551">
    <testcase name="EfficientDet_int8/0" value_param="OCV/CPU" status="run" time="24.55" classname="Test_TFLite">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
  </testsuite>
</testsuites>
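
One way to check that a report like the one above is consistent with the control report that follows is to compare the status of each test case and ignore the timings, which are expected to differ. This is a minimal sketch, assuming both reports are available as gtest XML. The helper names `case_status` and `diff_reports` and the two shortened sample strings are hypothetical and not part of the PR.

```python
import xml.etree.ElementTree as ET


def case_status(xml_text):
    """Map (classname, name) -> (status, failed) for every <testcase>."""
    root = ET.fromstring(xml_text)
    result = {}
    for case in root.iter("testcase"):
        key = (case.get("classname"), case.get("name"))
        # gtest records a failed case with one or more <failure> children.
        failed = case.find("failure") is not None
        result[key] = (case.get("status"), failed)
    return result


def diff_reports(rvp_xml, control_xml):
    """Return the test cases whose status or failure flag differs."""
    rvp = case_status(rvp_xml)
    control = case_status(control_xml)
    keys = sorted(set(rvp) | set(control))
    return [k for k in keys if rvp.get(k) != control.get(k)]


# Hypothetical, heavily shortened samples; only the timings differ.
RVP_SAMPLE = """<testsuites>
  <testsuite name="Test_ONNX_nets">
    <testcase name="ResNet50_Int8/0" status="run" time="28.345" classname="Test_ONNX_nets"/>
  </testsuite>
  <testsuite name="Test_Int8_layers">
    <testcase name="DISABLED_Slice_onnx/0" status="notrun" time="0" classname="Test_Int8_layers"/>
  </testsuite>
</testsuites>"""

CONTROL_SAMPLE = """<testsuites>
  <testsuite name="Test_ONNX_nets">
    <testcase name="ResNet50_Int8/0" status="run" time="105.91" classname="Test_ONNX_nets"/>
  </testsuite>
  <testsuite name="Test_Int8_layers">
    <testcase name="DISABLED_Slice_onnx/0" status="notrun" time="0" classname="Test_Int8_layers"/>
  </testsuite>
</testsuites>"""

mismatches = diff_reports(RVP_SAMPLE, CONTROL_SAMPLE)
print("mismatches:", mismatches)
```

For full reports saved to disk, the same comparison could read each file into a string first. An empty list means every test case has the same status in both runs; it says nothing about performance.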

opencv_test_dnn control

<?xml version="1.0" encoding="UTF-8"?>
<testsuites tests="86" failures="0" disabled="2" errors="0" timestamp="2020-09-22T13:59:53" time="2899.68" cv_version="4.9.0-dev" cv_version_build="4.9.0-dev" cv_vcs_version="4.9.0-212-g0e44f3a544-dirty" cv_build_type="Debug" cv_build_type_build="Debug" cv_compiler="/home/junyan/opt/andes/bin/riscv64-linux-g++  (ver 10.3.0)" cv_parallel_framework="pthreads" cv_parallel_threads="1" cv_cpu_features="" cv_ocl="disabled" test_tags="" test_tags_skip="mem_6gb,verylong,debug_verylong,dnn_skip_opencv_backend,dnn_skip_cpu,dnn_skip_cpu_fp16,dnn_skip_ocl,dnn_skip_ocl_fp16,dnn_skip_onnx_conformance,dnn_skip_parser" test_tags_force="" name="AllTests">
  <testsuite name="Test_Int8_layers" tests="40" failures="0" disabled="2" errors="0" time="6.676">
    <testcase name="Convolution1D/0" value_param="OCV/CPU" status="run" time="0.108" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Convolution2D/0" value_param="OCV/CPU" status="run" time="0.902" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Convolution3D/0" value_param="OCV/CPU" status="run" time="0.073" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Flatten/0" value_param="OCV/CPU" status="run" time="0.114" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Padding/0" value_param="OCV/CPU" status="run" time="0.267" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="AvePooling/0" value_param="OCV/CPU" status="run" time="0.292" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="MaxPooling/0" value_param="OCV/CPU" status="run" time="0.489" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Reduce/0" value_param="OCV/CPU" status="run" time="0.238" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="ReLU/0" value_param="OCV/CPU" status="run" time="0.127" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="LeakyReLU/0" value_param="OCV/CPU" status="run" time="0.014" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="ReLU6/0" value_param="OCV/CPU" status="run" time="0.064" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Sigmoid/0" value_param="OCV/CPU" status="run" time="0.025" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Sigmoid_dynamic_axes/0" value_param="OCV/CPU" status="run" time="0.027" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Sigmoid_1d/0" value_param="OCV/CPU" status="run" time="0.024" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Mish/0" value_param="OCV/CPU" status="run" time="0.023" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Softmax_Caffe/0" value_param="OCV/CPU" status="run" time="0.213" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Softmax_keras_TF/0" value_param="OCV/CPU" status="run" time="0.032" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Softmax_slim_TF/0" value_param="OCV/CPU" status="run" time="0.025" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Softmax_slim_v2_TF/0" value_param="OCV/CPU" status="run" time="0.034" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Softmax_ONNX/0" value_param="OCV/CPU" status="run" time="0.021" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Softmax_log_ONNX/0" value_param="OCV/CPU" status="run" time="0.021" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="DISABLED_Softmax_unfused_ONNX/0" value_param="OCV/CPU" status="notrun" time="0" classname="Test_Int8_layers" />
    <testcase name="Concat/0" value_param="OCV/CPU" status="run" time="0.231" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="BatchNorm/0" value_param="OCV/CPU" status="run" time="0.409" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Scale/0" value_param="OCV/CPU" status="run" time="0.095" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="InnerProduct/0" value_param="OCV/CPU" status="run" time="1.34" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Reshape/0" value_param="OCV/CPU" status="run" time="0.41" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Permute/0" value_param="OCV/CPU" status="run" time="0.045" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Identity/0" value_param="OCV/CPU" status="run" time="0.078" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Slice_split_tf/0" value_param="OCV/CPU" status="run" time="0.02" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Slice_4d_tf/0" value_param="OCV/CPU" status="run" time="0.022" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Slice_strided_tf/0" value_param="OCV/CPU" status="run" time="0.024" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="DISABLED_Slice_onnx/0" value_param="OCV/CPU" status="notrun" time="0" classname="Test_Int8_layers" />
    <testcase name="Slice_dynamic_axes_onnx/0" value_param="OCV/CPU" status="run" time="0.026" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Slice_steps_2d_onnx11/0" value_param="OCV/CPU" status="run" time="0.042" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Slice_steps_3d_onnx11/0" value_param="OCV/CPU" status="run" time="0.057" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Slice_steps_4d_onnx11/0" value_param="OCV/CPU" status="run" time="0.044" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Slice_steps_5d_onnx11/0" value_param="OCV/CPU" status="run" time="0.043" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Dropout/0" value_param="OCV/CPU" status="run" time="0.116" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Eltwise/0" value_param="OCV/CPU" status="run" time="0.476" classname="Test_Int8_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
  </testsuite>
  <testsuite name="Test_Int8_nets" tests="24" failures="0" disabled="0" errors="0" time="2740.59">
    <testcase name="AlexNet/0" value_param="OCV/CPU" status="run" time="97.623" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
<property name="tags" value="mem_512mb"/>
<property name="tags_implied" value=""/>
</properties>
    </testcase>
    <testcase name="GoogLeNet/0" value_param="OCV/CPU" status="run" time="300.924" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="ResNet50/0" value_param="OCV/CPU" status="run" time="0.031" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_512mb,debug_verylong"/>
<property name="tags_implied" value="debug_long"/>
</properties>
    </testcase>
    <testcase name="DenseNet121/0" value_param="OCV/CPU" status="run" time="272.641" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
<property name="tags" value="mem_512mb"/>
<property name="tags_implied" value=""/>
</properties>
    </testcase>
    <testcase name="SqueezeNet_v1_1/0" value_param="OCV/CPU" status="run" time="38.057" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Inception_v2/0" value_param="OCV/CPU" status="run" time="208.418" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="MobileNet_v2/0" value_param="OCV/CPU" status="run" time="47.593" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Shufflenet/0" value_param="OCV/CPU" status="run" time="18.378" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="MobileNet_SSD/0" value_param="OCV/CPU" status="run" time="112.532" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="MobileNet_v1_SSD/0" value_param="OCV/CPU" status="run" time="123.763" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="MobileNet_v1_SSD_PPN/0" value_param="OCV/CPU" status="run" time="115.315" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Inception_v2_SSD/0" value_param="OCV/CPU" status="run" time="464.413" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
<property name="tags" value="mem_512mb"/>
<property name="tags_implied" value=""/>
</properties>
    </testcase>
    <testcase name="opencv_face_detector/0" value_param="OCV/CPU" status="run" time="304.788" classname="Test_Int8_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="EfficientDet/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="debug_verylong"/>
<property name="tags_implied" value="debug_long"/>
</properties>
    </testcase>
    <testcase name="FasterRCNN_resnet50/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_1gb,long,debug_verylong"/>
<property name="tags_implied" value="mem_512mb,debug_long"/>
</properties>
    </testcase>
    <testcase name="FasterRCNN_inceptionv2/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_1gb,long,debug_verylong"/>
<property name="tags_implied" value="mem_512mb,debug_long"/>
</properties>
    </testcase>
    <testcase name="FasterRCNN_vgg16/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_2gb,long,debug_verylong"/>
<property name="tags_implied" value="mem_1gb,mem_512mb,debug_long"/>
</properties>
    </testcase>
    <testcase name="FasterRCNN_zf/0" value_param="OCV/CPU" status="run" time="0.002" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_512mb,debug_verylong"/>
<property name="tags_implied" value="debug_long"/>
</properties>
    </testcase>
    <testcase name="RFCN/0" value_param="OCV/CPU" status="run" time="0.002" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_512mb,long,debug_verylong"/>
<property name="tags_implied" value="debug_long"/>
</properties>
    </testcase>
    <testcase name="YoloVoc/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_1gb,long,debug_verylong"/>
<property name="tags_implied" value="mem_512mb,debug_long"/>
</properties>
    </testcase>
    <testcase name="TinyYoloVoc/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_512mb,debug_verylong"/>
<property name="tags_implied" value="debug_long"/>
</properties>
    </testcase>
    <testcase name="YOLOv3/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="long,mem_1gb,debug_verylong"/>
<property name="tags_implied" value="debug_long,mem_512mb"/>
</properties>
    </testcase>
    <testcase name="YOLOv4/0" value_param="OCV/CPU" status="run" time="0.001" classname="Test_Int8_nets">
<properties>
<property name="tags" value="long,mem_1gb,debug_verylong"/>
<property name="tags_implied" value="debug_long,mem_512mb"/>
</properties>
    </testcase>
    <testcase name="YOLOv4_tiny/0" value_param="OCV/CPU" status="run" time="636.056" classname="Test_Int8_nets">
<properties>
<property name="tags" value="mem_512mb"/>
<property name="tags_implied" value=""/>
</properties>
    </testcase>
  </testsuite>
  <testsuite name="Test_ONNX_layers" tests="20" failures="0" disabled="0" errors="0" time="1.534">
    <testcase name="Quantized_Convolution/0" value_param="OCV/CPU" status="run" time="0.343" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_MatMul/0" value_param="OCV/CPU" status="run" time="0.132" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Gemm/0" value_param="OCV/CPU" status="run" time="0.038" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_MatMul_Variable_Weights/0" value_param="OCV/CPU" status="run" time="0.082" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Eltwise/0" value_param="OCV/CPU" status="run" time="0.054" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Eltwise_Scalar/0" value_param="OCV/CPU" status="run" time="0.043" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Eltwise_Broadcast/0" value_param="OCV/CPU" status="run" time="0.059" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_LeakyReLU/0" value_param="OCV/CPU" status="run" time="0.059" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Sigmoid/0" value_param="OCV/CPU" status="run" time="0.035" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_MaxPool/0" value_param="OCV/CPU" status="run" time="0.037" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_AvgPool/0" value_param="OCV/CPU" status="run" time="0.04" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Split/0" value_param="OCV/CPU" status="run" time="0.049" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Pad/0" value_param="OCV/CPU" status="run" time="0.046" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Reshape/0" value_param="OCV/CPU" status="run" time="0.037" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Transpose/0" value_param="OCV/CPU" status="run" time="0.037" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Squeeze/0" value_param="OCV/CPU" status="run" time="0.037" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Unsqueeze/0" value_param="OCV/CPU" status="run" time="0.037" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Resize/0" value_param="OCV/CPU" status="run" time="0.118" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Concat/0" value_param="OCV/CPU" status="run" time="0.077" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
    <testcase name="Quantized_Constant/0" value_param="OCV/CPU" status="run" time="0.134" classname="Test_ONNX_layers">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
  </testsuite>
  <testsuite name="Test_ONNX_nets" tests="1" failures="0" disabled="0" errors="0" time="105.911">
    <testcase name="ResNet50_Int8/0" value_param="OCV/CPU" status="run" time="105.91" classname="Test_ONNX_nets">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
  </testsuite>
  <testsuite name="Test_TFLite" tests="1" failures="0" disabled="0" errors="0" time="44.96">
    <testcase name="EfficientDet_int8/0" value_param="OCV/CPU" status="run" time="44.958" classname="Test_TFLite">
<properties>
<property name="ocl_memory_usage" value="0"/>
</properties>
    </testcase>
  </testsuite>
</testsuites>

3rdparty: NDSRVP - A New 3rdparty Library with Optimizations Based on RISC-V P Extension v0.5.2 - Part 1: Basic Functions #25167

# Summary

### Previous context

From PR #24556:

>> * As you wrote, the P-extension differs from RVV thus can not be easily implemented via Universal Intrinsics mechanism, but there is another HAL mechanism for lower-level CPU optimizations which is used by the [Carotene](https://github.com/opencv/opencv/tree/4.x/3rdparty/carotene) library on ARM platforms. I suggest moving all non-dnn code to similar third-party component. For example, FAST algorithm should allow such optimization-shortcut: see https://github.com/opencv/opencv/blob/4.x/modules/features2d/src/hal_replacement.hpp
>> Reference documentation is here:
>>
>> * https://docs.opencv.org/4.x/d1/d1b/group__core__hal__interface.html
>> * https://docs.opencv.org/4.x/dd/d8b/group__imgproc__hal__interface.html
>> * https://docs.opencv.org/4.x/db/d47/group__features2d__hal__interface.html
>> * Carotene library is turned on here: https://github.com/opencv/opencv/blob/8bbf08f0de9c387c12afefdb05af7780d989e4c3/CMakeLists.txt#L906-L911

> As a test outside of this PR, a 3rdparty component called ndsrvp was created, containing one piece of the non-dnn code (integral_SIMD), and it works very well.
> All the non-dnn code in this PR has been removed, so this PR can now focus on dnn optimizations.
> This HAL mechanism is quite suitable for rvp optimizations; all the non-dnn code is expected to be moved into ndsrvp soon.
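The HAL mechanism described in the quoted review can be pictured as "try the platform hook first, fall back to the generic loop if it declines". The following is a minimal sketch of that pattern for an 8-bit absolute difference, not the ndsrvp code itself. The names `example_hal_absdiff8u`, `generic_absdiff8u`, `absdiff8u`, `HAL_OK` and `HAL_NOT_IMPLEMENTED` are hypothetical stand-ins chosen for illustration; the argument list follows the data/step/width/height shape used by OpenCV's core HAL hooks, and the macro wiring that OpenCV uses to select a replacement is left out.

```cpp
#include <cassert>
#include <cstddef>

typedef unsigned char uchar;

// Hypothetical local stand-ins for the HAL status codes, so the sketch
// compiles without OpenCV headers.
enum { HAL_OK = 0, HAL_NOT_IMPLEMENTED = 1 };

// Hypothetical platform hook. A real backend would use packed SIMD here;
// this version is scalar and only shows the calling convention.
static int example_hal_absdiff8u(const uchar* src1, size_t step1,
                                 const uchar* src2, size_t step2,
                                 uchar* dst, size_t dstep,
                                 int width, int height)
{
    // Decline inputs the hook does not want to handle; the caller falls back.
    if (width <= 0 || height <= 0)
        return HAL_NOT_IMPLEMENTED;

    for (int y = 0; y < height; ++y) {
        const uchar* a = src1 + y * step1;
        const uchar* b = src2 + y * step2;
        uchar* d = dst + y * dstep;
        for (int x = 0; x < width; ++x)
            d[x] = a[x] > b[x] ? uchar(a[x] - b[x]) : uchar(b[x] - a[x]);
    }
    return HAL_OK;
}

// Generic fallback, used whenever the hook reports "not implemented".
static void generic_absdiff8u(const uchar* src1, size_t step1,
                              const uchar* src2, size_t step2,
                              uchar* dst, size_t dstep,
                              int width, int height)
{
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            int diff = int(src1[y * step1 + x]) - int(src2[y * step2 + x]);
            dst[y * dstep + x] = uchar(diff < 0 ? -diff : diff);
        }
}

// Caller side: try the hook, then fall back.
static void absdiff8u(const uchar* src1, size_t step1,
                      const uchar* src2, size_t step2,
                      uchar* dst, size_t dstep, int width, int height)
{
    assert(src1 && src2 && dst);
    if (example_hal_absdiff8u(src1, step1, src2, step2,
                              dst, dstep, width, height) != HAL_OK)
        generic_absdiff8u(src1, step1, src2, step2, dst, dstep, width, height);
}
```

Calling `absdiff8u` on two rows of bytes writes the per-element absolute difference into `dst`; because the hook can decline, a backend is free to cover only the types and sizes it accelerates.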
### Progress

#### Part 1 (This PR)

- [Core](https://docs.opencv.org/4.x/d1/d1b/group__core__hal__interface.html)
  - [x] Element-wise add and subtract
  - [x] Element-wise minimum or maximum
  - [x] Element-wise absolute difference
  - [x] Bitwise logical operations
  - [x] Element-wise compare
- [ImgProc](https://docs.opencv.org/4.x/dd/d8b/group__imgproc__hal__interface.html)
  - [x] Integral
  - [x] Threshold
  - [x] WarpAffine
  - [x] WarpPerspective
- [Features2D](https://docs.opencv.org/4.x/db/d47/group__features2d__hal__interface.html)

#### Part 2 (Next PR)

**Rough Estimate. Todo List May Change.**

- [Core](https://docs.opencv.org/4.x/d1/d1b/group__core__hal__interface.html)
- [ImgProc](https://docs.opencv.org/4.x/dd/d8b/group__imgproc__hal__interface.html)
  - smaller remap HAL interface
  - AdaptiveThreshold
  - BoxFilter
  - Canny
  - Convert
  - Filter
  - GaussianBlur
  - MedianBlur
  - Morph
  - Pyrdown
  - Resize
  - Scharr
  - SepFilter
  - Sobel
- [Features2D](https://docs.opencv.org/4.x/db/d47/group__features2d__hal__interface.html)
  - FAST

### Performance Tests

The optimization does not contain floating point operations.
**Absolute Difference**

Geometric mean (ms)

|Name of Test|opencv perf core Absdiff|opencv perf core Absdiff|opencv perf core Absdiff vs opencv perf core Absdiff (x-factor)|
|---|:-:|:-:|:-:|
|Absdiff::OCL_AbsDiffFixture::(640x480, 8UC1)|23.104|5.972|3.87|
|Absdiff::OCL_AbsDiffFixture::(640x480, 32FC1)|39.500|40.830|0.97|
|Absdiff::OCL_AbsDiffFixture::(640x480, 8UC3)|69.155|15.051|4.59|
|Absdiff::OCL_AbsDiffFixture::(640x480, 32FC3)|118.715|120.509|0.99|
|Absdiff::OCL_AbsDiffFixture::(640x480, 8UC4)|93.001|19.770|4.70|
|Absdiff::OCL_AbsDiffFixture::(640x480, 32FC4)|161.136|160.791|1.00|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 8UC1)|69.211|15.140|4.57|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 32FC1)|118.762|119.263|1.00|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 8UC3)|212.414|44.692|4.75|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 32FC3)|367.512|366.569|1.00|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 8UC4)|285.337|59.708|4.78|
|Absdiff::OCL_AbsDiffFixture::(1280x720, 32FC4)|490.395|491.118|1.00|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 8UC1)|158.827|33.462|4.75|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 32FC1)|273.503|273.668|1.00|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 8UC3)|484.175|100.520|4.82|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 32FC3)|828.758|829.689|1.00|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 8UC4)|648.592|137.195|4.73|
|Absdiff::OCL_AbsDiffFixture::(1920x1080, 32FC4)|1116.755|1109.587|1.01|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 8UC1)|648.715|134.875|4.81|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 32FC1)|1115.939|1113.818|1.00|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 8UC3)|1944.791|413.420|4.70|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 32FC3)|3354.193|3324.672|1.01|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 8UC4)|2594.585|553.486|4.69|
|Absdiff::OCL_AbsDiffFixture::(3840x2160, 32FC4)|4473.543|4438.453|1.01|

**Bitwise Operation**

Geometric mean (ms)

|Name of Test|opencv perf core Bit|opencv perf core Bit|opencv perf core Bit vs opencv perf core Bit (x-factor)|
|---|:-:|:-:|:-:|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 8UC1)|22.542|4.971|4.53|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 32FC1)|90.210|19.917|4.53|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 8UC3)|68.429|15.037|4.55|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 32FC3)|280.168|59.239|4.73|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 8UC4)|90.565|19.735|4.59|
|Bitwise_and::OCL_BitwiseAndFixture::(640x480, 32FC4)|374.695|79.257|4.73|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 8UC1)|67.824|14.873|4.56|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 32FC1)|279.514|59.232|4.72|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 8UC3)|208.337|44.234|4.71|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 32FC3)|851.211|182.522|4.66|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 8UC4)|279.529|59.095|4.73|
|Bitwise_and::OCL_BitwiseAndFixture::(1280x720, 32FC4)|1132.065|244.877|4.62|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 8UC1)|155.685|33.078|4.71|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 32FC1)|635.253|137.482|4.62|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 8UC3)|474.494|100.166|4.74|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 32FC3)|1907.340|412.841|4.62|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 8UC4)|635.538|134.544|4.72|
|Bitwise_and::OCL_BitwiseAndFixture::(1920x1080, 32FC4)|2552.666|556.397|4.59|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 8UC1)|634.736|136.355|4.66|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 32FC1)|2548.283|561.827|4.54|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 8UC3)|1911.454|421.571|4.53|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 32FC3)|7663.803|1677.289|4.57|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 8UC4)|2543.983|562.780|4.52|
|Bitwise_and::OCL_BitwiseAndFixture::(3840x2160, 32FC4)|10211.693|2237.393|4.56|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 8UC1)|22.341|4.811|4.64|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 32FC1)|89.975|19.288|4.66|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 8UC3)|67.237|14.643|4.59|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 32FC3)|276.324|58.609|4.71|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 8UC4)|89.587|19.554|4.58|
|Bitwise_not::OCL_BitwiseNotFixture::(640x480, 32FC4)|370.986|77.136|4.81|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 8UC1)|67.227|14.541|4.62|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 32FC1)|276.357|58.076|4.76|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 8UC3)|206.752|43.376|4.77|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 32FC3)|841.638|177.787|4.73|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 8UC4)|276.773|57.784|4.79|
|Bitwise_not::OCL_BitwiseNotFixture::(1280x720, 32FC4)|1127.740|237.472|4.75|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 8UC1)|153.808|32.531|4.73|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 32FC1)|627.765|129.990|4.83|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 8UC3)|469.799|98.249|4.78|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 32FC3)|1893.591|403.694|4.69|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 8UC4)|627.724|129.962|4.83|
|Bitwise_not::OCL_BitwiseNotFixture::(1920x1080, 32FC4)|2529.967|540.744|4.68|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 8UC1)|628.089|130.277|4.82|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 32FC1)|2521.817|540.146|4.67|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 8UC3)|1905.004|404.704|4.71|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 32FC3)|7567.971|1627.898|4.65|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 8UC4)|2531.476|540.181|4.69|
|Bitwise_not::OCL_BitwiseNotFixture::(3840x2160, 32FC4)|10075.594|2181.654|4.62|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 8UC1)|22.566|5.076|4.45|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 32FC1)|90.391|19.928|4.54|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 8UC3)|67.758|14.740|4.60|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 32FC3)|279.253|59.844|4.67|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 8UC4)|90.296|19.802|4.56|
|Bitwise_or::OCL_BitwiseOrFixture::(640x480, 32FC4)|373.972|79.815|4.69|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 8UC1)|67.815|14.865|4.56|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 32FC1)|279.398|60.054|4.65|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 8UC3)|208.643|45.043|4.63|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 32FC3)|850.042|180.985|4.70|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 8UC4)|279.363|60.385|4.63|
|Bitwise_or::OCL_BitwiseOrFixture::(1280x720, 32FC4)|1134.858|243.062|4.67|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 8UC1)|155.212|33.155|4.68|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 32FC1)|634.985|134.911|4.71|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 8UC3)|474.648|100.407|4.73|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 32FC3)|1912.049|414.184|4.62|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 8UC4)|635.252|132.587|4.79|
|Bitwise_or::OCL_BitwiseOrFixture::(1920x1080, 32FC4)|2544.471|560.737|4.54|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 8UC1)|634.574|134.966|4.70|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 32FC1)|2545.129|561.498|4.53|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 8UC3)|1910.900|419.365|4.56|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 32FC3)|7662.603|1685.812|4.55|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 8UC4)|2548.971|560.787|4.55|
|Bitwise_or::OCL_BitwiseOrFixture::(3840x2160, 32FC4)|10201.407|2237.552|4.56|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 8UC1)|22.718|4.961|4.58|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 32FC1)|91.496|19.831|4.61|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 8UC3)|67.910|15.151|4.48|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 32FC3)|279.612|59.792|4.68|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 8UC4)|91.073|19.853|4.59|
|Bitwise_xor::OCL_BitwiseXorFixture::(640x480, 32FC4)|374.641|79.155|4.73|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 8UC1)|67.704|15.008|4.51|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 32FC1)|279.229|60.088|4.65|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 8UC3)|208.156|44.426|4.69|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 32FC3)|849.501|180.848|4.70|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 8UC4)|279.642|59.728|4.68|
|Bitwise_xor::OCL_BitwiseXorFixture::(1280x720, 32FC4)|1129.826|242.880|4.65|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 8UC1)|155.585|33.354|4.66|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 32FC1)|634.090|134.995|4.70|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 8UC3)|474.931|99.598|4.77|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 32FC3)|1910.519|413.138|4.62|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 8UC4)|635.026|135.155|4.70|
|Bitwise_xor::OCL_BitwiseXorFixture::(1920x1080, 32FC4)|2560.167|560.838|4.56|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 8UC1)|634.893|134.883|4.71|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 32FC1)|2548.166|560.831|4.54|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 8UC3)|1911.392|419.816|4.55|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 32FC3)|7646.634|1677.988|4.56|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 8UC4)|2560.637|560.805|4.57|
|Bitwise_xor::OCL_BitwiseXorFixture::(3840x2160, 32FC4)|10227.044|2249.458|4.55|

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
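In the performance tables of this PR description, the 32F rows of the bitwise tests speed up about as much as the 8U rows, while the 32F rows of the absolute-difference test stay near 1.00. One plausible reading, which is an assumption here and not something the PR states, is that bitwise operations never interpret the element type and can therefore run over raw bytes in register-sized chunks. The sketch below shows that idea in portable C++ with a hypothetical helper `bitwise_and_bytes`; it uses plain 64-bit words, not RVP intrinsics.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: bitwise AND over two byte buffers of length n.
// The element type (8U, 32F, ...) is irrelevant; only the byte count matters.
static void bitwise_and_bytes(const unsigned char* a, const unsigned char* b,
                              unsigned char* dst, size_t n)
{
    assert(n == 0 || (a && b && dst));

    size_t i = 0;
    // Main loop: eight bytes per iteration through one 64-bit word.
    // memcpy keeps the loads and stores safe for unaligned pointers.
    for (; i + 8 <= n; i += 8) {
        uint64_t x, y;
        std::memcpy(&x, a + i, 8);
        std::memcpy(&y, b + i, 8);
        x &= y;
        std::memcpy(dst + i, &x, 8);
    }
    // Tail: remaining bytes one at a time.
    for (; i < n; ++i)
        dst[i] = static_cast<unsigned char>(a[i] & b[i]);
}
```

A row of `float` pixels can be passed as `reinterpret_cast<const unsigned char*>(row)` with `n = width * sizeof(float)`. An absolute difference on floats has no such shortcut, since it needs floating point arithmetic on each element.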

opencv-alalek added the optimization, category: build/install, platform: riscv, category: imgproc, category: features2d, and category: dnn labels Nov 19, 2023

Junyan721113 force-pushed the rvp branch from c3aea72 to 8e69a62 Compare November 19, 2023 11:39

vpisarev self-requested a review November 20, 2023 10:37

vpisarev requested a review from mshabunin November 21, 2023 06:52

Junyan721113 force-pushed the rvp branch from 8e69a62 to 7f0f83c Compare December 5, 2023 07:46

Junyan721113 force-pushed the rvp branch from 7f0f83c to bf3059b Compare December 5, 2023 11:36

mshabunin reviewed Dec 8, 2023

Junyan721113 requested a review from mshabunin December 19, 2023 15:17

Junyan721113 force-pushed the rvp branch from bf3059b to f44423b Compare December 20, 2023 05:12

mshabunin reviewed Dec 20, 2023

asmorkalov added this to the 4.9.0 milestone Dec 20, 2023

feat: RVP052 Optimization for DNN int8layers

a30c987

Junyan721113 force-pushed the rvp branch from f44423b to a30c987 Compare December 21, 2023 06:57

Junyan721113 requested a review from mshabunin December 21, 2023 06:59

asmorkalov modified the milestones: 4.9.0, 4.10.0 Dec 22, 2023

mshabunin approved these changes Jan 15, 2024

asmorkalov merged commit 99c86bb into opencv:4.x Jan 16, 2024

asmorkalov mentioned this pull request Jan 23, 2024

5.x merge 4.x #24912

Merged

Junyan721113 mentioned this pull request Mar 6, 2024

3rdparty: NDSRVP - A New 3rdparty Library with Optimizations Based on RISC-V P Extension v0.5.2 - Part 1: Basic Functions #25167

Merged

15 tasks

Conversation

Junyan721113 commented Nov 19, 2023

Summary

List of RVP optimizations

Correctness validation (QEMU)

Q&A

Why RVP ?

Why v0.5.2 ?

Why not Universal Intrinsics ?

How to perform tests ?

Environment

Toolchain

Qemu

Build

Related Tests

Pull Request Readiness Checklist

asmorkalov commented Nov 20, 2023

asmorkalov commented Nov 20, 2023

vpisarev commented Nov 20, 2023 • edited

mshabunin commented Nov 21, 2023

Junyan721113 commented Nov 21, 2023

Junyan721113 commented Dec 5, 2023

mshabunin left a comment

opencv-alalek commented Dec 8, 2023

mshabunin commented Dec 8, 2023

Junyan721113 commented Dec 12, 2023 • edited

mshabunin Dec 20, 2023 • edited

Junyan721113 Dec 21, 2023

mshabunin Dec 20, 2023

Junyan721113 Dec 21, 2023

mshabunin Dec 20, 2023

Junyan721113 Dec 21, 2023

mshabunin Dec 20, 2023

Junyan721113 Dec 21, 2023

Junyan721113 commented Feb 28, 2024

Junyan721113 commented Mar 2, 2024 • edited

5 participants
