The function countNonZero produces wrong output on SCALABLE RVV with Debug mode #25193
Description
System Information
OpenCV version: 4.x (commit 1eb061f) and tag 4.9 (commit dad8af6)
Operating System: /opt/riscv/bin/qemu-riscv64 --version
qemu-riscv64 version 8.2.1 (v8.2.1)
Copyright (c) 2003-2023 Fabrice Bellard and the QEMU Project developers
Compiler: /opt/riscv/bin/clang++ --version
clang version 17.0.2 (https://github.com/llvm/llvm-project.git b2417f51dbbd7435eb3aaf203de24de6754da50e)
Target: riscv64-unknown-linux-gnu
Thread model: posix
Detailed description
This issue is similar to one we reported previously (#25191): another test fails on SCALABLE RVV in Debug mode.
As shown below, the test fails only with SCALABLE RVV in a Debug build; the same test passes in a RelWithDebInfo build. The numerical errors also vary with the VLEN configuration.
> /opt/riscv/bin/qemu-riscv64 -L /opt/riscv/sysroot -cpu rv64,v=true,vlen=128,vext_spec=v1.0, -E OPENCV_TEST_DATA_PATH=/workdir/src/opencv_extra/testdata build/Debug/bin/opencv_test_dnn --gtest_filter="Test_TFLite.face_landmark/0"
CTEST_FULL_OUTPUT
OpenCV version: 4.9.0-dev
OpenCV VCS version: 4.9.0-241-g1eb061f89d
Build type: Debug
Compiler: /opt/riscv/bin/clang++ (ver 17.0.2)
[ INFO:0@0.653] global registry_parallel.impl.hpp:96 ParallelBackendRegistry core(parallel): Enabled backends(3, sorted by priority): ONETBB(1000); TBB(990); OPENMP(980)
Parallel framework: pthreads (nthreads=8)
CPU features: RVV
TEST: Skip tests with tags: 'mem_6gb', 'verylong', 'debug_verylong', 'dnn_skip_opencv_backend', 'dnn_skip_cpu', 'dnn_skip_cpu_fp16', 'dnn_skip_ocl', 'dnn_skip_ocl_fp16', 'dnn_skip_onnx_conformance', 'dnn_skip_parser'
Note: Google Test filter = Test_TFLite.face_landmark/0
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Test_TFLite
[ RUN ] Test_TFLite.face_landmark/0, where GetParam() = OCV/CPU
/workdir/src/opencv/modules/dnn/test/test_common.impl.hpp:76: Failure
Expected: (normL1) <= (l1), actual: 2.05424 vs 2e-05
conv2d_30 |ref| = 1.7733331918716431
/workdir/src/opencv/modules/dnn/test/test_common.impl.hpp:79: Failure
Expected: (normInf) <= (lInf), actual: 2.05424 vs 0.0002
conv2d_30 |ref| = 1.7733331918716431
/workdir/src/opencv/modules/dnn/test/test_common.impl.hpp:76: Failure
Expected: (normL1) <= (l1), actual: 14.6945 vs 2e-05
conv2d_20 |ref| = 164.86170959472656
/workdir/src/opencv/modules/dnn/test/test_common.impl.hpp:79: Failure
Expected: (normInf) <= (lInf), actual: 44.1663 vs 0.0002
conv2d_20 |ref| = 164.86170959472656
[ FAILED ] Test_TFLite.face_landmark/0, where GetParam() = OCV/CPU (1919 ms)
[----------] 1 test from Test_TFLite (1921 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (1925 ms total)
[ PASSED ] 0 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] Test_TFLite.face_landmark/0, where GetParam() = OCV/CPU
1 FAILED TEST
> /workdir$ /opt/riscv/bin/qemu-riscv64 -L /opt/riscv/sysroot -cpu rv64,v=true,vlen=1024,vext_spec=v1.0, -E OPENCV_TEST_DATA_PATH=/workdir/src/opencv_extra/testdata build/Debug/bin/opencv_test_dnn --gtest_filter="Test_TFLite.face_landmark/0"
CTEST_FULL_OUTPUT
OpenCV version: 4.9.0-dev
OpenCV VCS version: 4.9.0-241-g1eb061f89d
Build type: Debug
Compiler: /opt/riscv/bin/clang++ (ver 17.0.2)
[ INFO:0@0.680] global registry_parallel.impl.hpp:96 ParallelBackendRegistry core(parallel): Enabled backends(3, sorted by priority): ONETBB(1000); TBB(990); OPENMP(980)
Parallel framework: pthreads (nthreads=8)
CPU features: RVV
TEST: Skip tests with tags: 'mem_6gb', 'verylong', 'debug_verylong', 'dnn_skip_opencv_backend', 'dnn_skip_cpu', 'dnn_skip_cpu_fp16', 'dnn_skip_ocl', 'dnn_skip_ocl_fp16', 'dnn_skip_onnx_conformance', 'dnn_skip_parser'
Note: Google Test filter = Test_TFLite.face_landmark/0
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Test_TFLite
[ RUN ] Test_TFLite.face_landmark/0, where GetParam() = OCV/CPU
/workdir/src/opencv/modules/dnn/test/test_common.impl.hpp:76: Failure
Expected: (normL1) <= (l1), actual: 3.63442 vs 2e-05
conv2d_30 |ref| = 1.7733331918716431
/workdir/src/opencv/modules/dnn/test/test_common.impl.hpp:79: Failure
Expected: (normInf) <= (lInf), actual: 3.63442 vs 0.0002
conv2d_30 |ref| = 1.7733331918716431
/workdir/src/opencv/modules/dnn/test/test_common.impl.hpp:76: Failure
Expected: (normL1) <= (l1), actual: 5.82887 vs 2e-05
conv2d_20 |ref| = 164.86170959472656
/workdir/src/opencv/modules/dnn/test/test_common.impl.hpp:79: Failure
Expected: (normInf) <= (lInf), actual: 26.0545 vs 0.0002
conv2d_20 |ref| = 164.86170959472656
[ FAILED ] Test_TFLite.face_landmark/0, where GetParam() = OCV/CPU (1701 ms)
[----------] 1 test from Test_TFLite (1703 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (1709 ms total)
[ PASSED ] 0 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] Test_TFLite.face_landmark/0, where GetParam() = OCV/CPU
1 FAILED TEST
> /opt/riscv/bin/qemu-riscv64 -L /opt/riscv/sysroot -cpu rv64,v=true,vlen=1024,vext_spec=v1.0, -E OPENCV_TEST_DATA_PATH=/workdir/src/opencv_extra/testdata build/RelWithDebInfo/bin/opencv_test_dnn --gtest_filter="Test_TFLite.face_landmark/0"
CTEST_FULL_OUTPUT
OpenCV version: 4.9.0-dev
OpenCV VCS version: 4.9.0-241-g1eb061f89d
Build type: RelWithDebInfo
WARNING: build value differs from runtime: Release
Compiler: /opt/riscv/bin/clang++ (ver 17.0.2)
Parallel framework: pthreads (nthreads=8)
CPU features: RVV
TEST: Skip tests with tags: 'mem_6gb', 'verylong', 'dnn_skip_opencv_backend', 'dnn_skip_cpu', 'dnn_skip_cpu_fp16', 'dnn_skip_ocl', 'dnn_skip_ocl_fp16', 'dnn_skip_onnx_conformance', 'dnn_skip_parser'
Note: Google Test filter = Test_TFLite.face_landmark/0
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Test_TFLite
[ RUN ] Test_TFLite.face_landmark/0, where GetParam() = OCV/CPU
[ OK ] Test_TFLite.face_landmark/0 (717 ms)
[----------] 1 test from Test_TFLite (719 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (729 ms total)
[ PASSED ] 1 test.
After investigating, we determined that the failure is caused by incorrect output from the function countNonZero. In the code below, the variable scale is a 1-dimensional vector. The condition countNonZero(scale != slope) == 0 should be satisfied only when all values are equal to slope, in which case the PReLU layer is equivalent to a ReLU layer and is replaced by one. The details are shown below:
// modules/dnn/src/layers/elementwise_layers.cpp:L3131-L3157@tag4.9
Ptr<Layer> ChannelsPReLULayer::create(const LayerParams& params)
{
    CV_Assert(params.blobs.size() == 1);
    Mat scale = params.blobs[0];
    float slope = *scale.ptr<float>();
    if (scale.total() == 1 || countNonZero(scale != slope) == 0)
    {
        LayerParams reluParams = params;
        reluParams.set("negative_slope", slope);
        return ReLULayer::create(reluParams);
    }

    Ptr<Layer> l;
    // Check first two dimensions of scale (batch, channels)
    MatShape scaleShape = shape(scale);
    if (std::count_if(scaleShape.begin(), scaleShape.end(), [](int d){ return d != 1;}) > 1)
    {
        l = new ElementWiseLayer<PReLUFunctor>(PReLUFunctor(scale));
    }
    else
    {
        l = new ElementWiseLayer<ChannelsPReLUFunctor>(ChannelsPReLUFunctor(scale));
    }
    l->setParamsFrom(params);
    return l;
}
The errors differ across VLEN values because OpenCV decides whether to call the SCALABLE RVV code based on the VLEN. However, I believe this is unrelated to the current issue. Temporarily removing the second condition, countNonZero(scale != slope) == 0, produces the expected results.
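To cross-check the suspect condition without going through countNonZero, one could compare every element against slope with a plain scalar loop. The following is a minimal sketch, not OpenCV code: the helper name allEqualToSlope is hypothetical, and it takes a std::vector<float> standing in for the contents of scale so that it stays self-contained.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (not part of OpenCV). Returns true only when every
// element equals the first one, which is what the condition
// countNonZero(scale != slope) == 0 is intended to express.
// An empty input is treated as "not uniform" (an assumption of this sketch).
static bool allEqualToSlope(const std::vector<float>& scale)
{
    if (scale.empty())
        return false;

    const float slope = scale.front();  // mirrors: slope = *scale.ptr<float>()
    return std::all_of(scale.begin(), scale.end(),
                       [slope](float v) { return v == slope; });
}
```

If this scalar check and the countNonZero-based condition disagree on the same data in the Debug build, that would point at the vectorized countNonZero path rather than at the layer logic.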
Steps to reproduce
# step 0: clone source
mkdir -p src && \
cd src && \
git clone https://github.com/opencv/opencv.git
git clone https://github.com/opencv/opencv_extra.git
# step 1: generate a CMake configuration
cd opencv && \
cmake -DCMAKE_TOOLCHAIN_FILE=platforms/linux/riscv64-clang.toolchain.cmake \
-DRISCV_CLANG_BUILD_ROOT=/opt/riscv \
-DRISCV_GCC_INSTALL_ROOT=/opt/riscv \
-DCPU_BASELINE=RVV \
-DCPU_BASELINE_REQUIRE=RVV \
-DRISCV_RVV_SCALABLE=ON \
-DWITH_OPENCL=OFF \
-DBUILD_SHARED_LIBS=OFF \
-DBUILD_EXAMPLES=ON \
-DOPENCV_ENABLE_NONFREE=ON \
-DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
-DCMAKE_BUILD_TYPE=Debug \
-B ../../build/Debug \
-S .
# step 2: build the project
cd ../../ && \
cmake --build build/Debug -j4
# step 3: execute unit tests
/opt/riscv/bin/qemu-riscv64 -L /opt/riscv/sysroot -cpu rv64,v=true,vlen=128,vext_spec=v1.0, -E OPENCV_TEST_DATA_PATH=/workdir/src/opencv_extra/testdata build/Debug/bin/opencv_test_dnn --gtest_filter="Test_TFLite.face_landmark/0"
/opt/riscv/bin/qemu-riscv64 -L /opt/riscv/sysroot -cpu rv64,v=true,vlen=1024,vext_spec=v1.0, -E OPENCV_TEST_DATA_PATH=/workdir/src/opencv_extra/testdata build/Debug/bin/opencv_test_dnn --gtest_filter="Test_TFLite.face_landmark/0"
Issue submission checklist
- I report the issue, it's not a question
- I checked the problem with documentation, FAQ, open issues, forum.opencv.org, Stack Overflow, etc and have not found any solution
- I updated to the latest OpenCV version and the issue is still there
- There is reproducer code and related data files (videos, images, onnx, etc)