Description
System Information
OpenCV version: 4.x (commit 1eb061f) and tag 4.9 (commit dad8af6)
Operating System: /opt/riscv/bin/qemu-riscv64 --version
qemu-riscv64 version 8.2.1 (v8.2.1)
Copyright (c) 2003-2023 Fabrice Bellard and the QEMU Project developers
Compiler: /opt/riscv/bin/clang++ --version
clang version 17.0.2 (https://github.com/llvm/llvm-project.git b2417f51dbbd7435eb3aaf203de24de6754da50e)
Target: riscv64-unknown-linux-gnu
Thread model: posix
Detailed description
The excerpts below show that, in a Debug build with RVV enabled, the HAL unit tests fail when VLEN is not equal to 1024: with vlen=128 the first cpu_baseline test aborts on a failed assertion, whereas with vlen=1024 all 23 tests pass.
> /opt/riscv/bin/qemu-riscv64 -L /opt/riscv/sysroot -cpu rv64,v=true,vlen=128,vext_spec=v1.0, -E OPENCV_TEST_DATA_PATH=/workdir/src/opencv_extra/testdata build/Debug/bin/opencv_test_core --gtest_filter="hal*"
CTEST_FULL_OUTPUT
OpenCV version: 4.9.0-dev
OpenCV VCS version: 4.9.0-241-g1eb061f89d
Build type: Debug
Compiler: /opt/riscv/bin/clang++ (ver 17.0.2)
[ INFO:0@0.716] global registry_parallel.impl.hpp:96 ParallelBackendRegistry core(parallel): Enabled backends(3, sorted by priority): ONETBB(1000); TBB(990); OPENMP(980)
Parallel framework: pthreads (nthreads=8)
CPU features: RVV
TEST: Skip tests with tags: 'mem_6gb', 'verylong', 'debug_verylong'
Note: Google Test filter = hal*
[==========] Running 23 tests from 3 test cases.
[----------] Global test environment set-up.
[----------] 21 tests from hal_intrin128
[ RUN ] hal_intrin128.uint8x16_CPP_EMULATOR
SIMD128: void opencv_test::hal::intrin128::opt_EMULATOR_CPP::test_hal_intrin_uint8()
[ OK ] hal_intrin128.uint8x16_CPP_EMULATOR (34 ms)
[ RUN ] hal_intrin128.int8x16_CPP_EMULATOR
SIMD128: void opencv_test::hal::intrin128::opt_EMULATOR_CPP::test_hal_intrin_int8()
[ OK ] hal_intrin128.int8x16_CPP_EMULATOR (24 ms)
[ RUN ] hal_intrin128.uint16x8_CPP_EMULATOR
SIMD128: void opencv_test::hal::intrin128::opt_EMULATOR_CPP::test_hal_intrin_uint16()
[ OK ] hal_intrin128.uint16x8_CPP_EMULATOR (26 ms)
[ RUN ] hal_intrin128.int16x8_CPP_EMULATOR
SIMD128: void opencv_test::hal::intrin128::opt_EMULATOR_CPP::test_hal_intrin_int16()
[ OK ] hal_intrin128.int16x8_CPP_EMULATOR (24 ms)
[ RUN ] hal_intrin128.int32x4_CPP_EMULATOR
SIMD128: void opencv_test::hal::intrin128::opt_EMULATOR_CPP::test_hal_intrin_int32()
[ OK ] hal_intrin128.int32x4_CPP_EMULATOR (28 ms)
[ RUN ] hal_intrin128.uint32x4_CPP_EMULATOR
SIMD128: void opencv_test::hal::intrin128::opt_EMULATOR_CPP::test_hal_intrin_uint32()
[ OK ] hal_intrin128.uint32x4_CPP_EMULATOR (22 ms)
[ RUN ] hal_intrin128.uint64x2_CPP_EMULATOR
SIMD128: void opencv_test::hal::intrin128::opt_EMULATOR_CPP::test_hal_intrin_uint64()
[ OK ] hal_intrin128.uint64x2_CPP_EMULATOR (13 ms)
[ RUN ] hal_intrin128.int64x2_CPP_EMULATOR
SIMD128: void opencv_test::hal::intrin128::opt_EMULATOR_CPP::test_hal_intrin_int64()
[ OK ] hal_intrin128.int64x2_CPP_EMULATOR (15 ms)
[ RUN ] hal_intrin128.float32x4_CPP_EMULATOR
SIMD128: void opencv_test::hal::intrin128::opt_EMULATOR_CPP::test_hal_intrin_float32()
[ OK ] hal_intrin128.float32x4_CPP_EMULATOR (26 ms)
[ RUN ] hal_intrin128.float64x2_CPP_EMULATOR
SIMD128: void opencv_test::hal::intrin128::opt_EMULATOR_CPP::test_hal_intrin_float64()
[ OK ] hal_intrin128.float64x2_CPP_EMULATOR (14 ms)
[ RUN ] hal_intrin128.uint8x16_BASELINE
SIMD128: void opencv_test::hal::intrin128::cpu_baseline::test_hal_intrin_uint8()
opencv_test_core: /workdir/src/opencv/modules/core/include/opencv2/core/hal/intrin_rvv_scalable.hpp:433: v_uint8 cv::hal_baseline::v_load(std::initializer_list<uchar>): Assertion `nScalars.size() == VTraits<v_uint8>::vlanes()' failed.
Segmentation fault (core dumped)
> /opt/riscv/bin/qemu-riscv64 -L /opt/riscv/sysroot -cpu rv64,v=true,vlen=1024,vext_spec=v1.0, -E OPENCV_TEST_DATA_PATH=/workdir/src/opencv_extra/testdata build/Debug/bin/opencv_test_core --gtest_filter="hal*"
CTEST_FULL_OUTPUT
OpenCV version: 4.9.0-dev
OpenCV VCS version: 4.9.0-241-g1eb061f89d
Build type: Debug
Compiler: /opt/riscv/bin/clang++ (ver 17.0.2)
[ INFO:0@0.730] global registry_parallel.impl.hpp:96 ParallelBackendRegistry core(parallel): Enabled backends(3, sorted by priority): ONETBB(1000); TBB(990); OPENMP(980)
Parallel framework: pthreads (nthreads=8)
CPU features: RVV
TEST: Skip tests with tags: 'mem_6gb', 'verylong', 'debug_verylong'
Note: Google Test filter = hal*
[==========] Running 23 tests from 3 test cases.
[----------] Global test environment set-up.
[----------] 21 tests from hal_intrin128
[ RUN ] hal_intrin128.uint8x16_CPP_EMULATOR
SIMD128: void opencv_test::hal::intrin128::opt_EMULATOR_CPP::test_hal_intrin_uint8()
[ OK ] hal_intrin128.uint8x16_CPP_EMULATOR (34 ms)
[ RUN ] hal_intrin128.int8x16_CPP_EMULATOR
SIMD128: void opencv_test::hal::intrin128::opt_EMULATOR_CPP::test_hal_intrin_int8()
[ OK ] hal_intrin128.int8x16_CPP_EMULATOR (24 ms)
[ RUN ] hal_intrin128.uint16x8_CPP_EMULATOR
SIMD128: void opencv_test::hal::intrin128::opt_EMULATOR_CPP::test_hal_intrin_uint16()
[ OK ] hal_intrin128.uint16x8_CPP_EMULATOR (26 ms)
[ RUN ] hal_intrin128.int16x8_CPP_EMULATOR
SIMD128: void opencv_test::hal::intrin128::opt_EMULATOR_CPP::test_hal_intrin_int16()
[ OK ] hal_intrin128.int16x8_CPP_EMULATOR (27 ms)
[ RUN ] hal_intrin128.int32x4_CPP_EMULATOR
SIMD128: void opencv_test::hal::intrin128::opt_EMULATOR_CPP::test_hal_intrin_int32()
[ OK ] hal_intrin128.int32x4_CPP_EMULATOR (28 ms)
[ RUN ] hal_intrin128.uint32x4_CPP_EMULATOR
SIMD128: void opencv_test::hal::intrin128::opt_EMULATOR_CPP::test_hal_intrin_uint32()
[ OK ] hal_intrin128.uint32x4_CPP_EMULATOR (21 ms)
[ RUN ] hal_intrin128.uint64x2_CPP_EMULATOR
SIMD128: void opencv_test::hal::intrin128::opt_EMULATOR_CPP::test_hal_intrin_uint64()
[ OK ] hal_intrin128.uint64x2_CPP_EMULATOR (9 ms)
[ RUN ] hal_intrin128.int64x2_CPP_EMULATOR
SIMD128: void opencv_test::hal::intrin128::opt_EMULATOR_CPP::test_hal_intrin_int64()
[ OK ] hal_intrin128.int64x2_CPP_EMULATOR (9 ms)
[ RUN ] hal_intrin128.float32x4_CPP_EMULATOR
SIMD128: void opencv_test::hal::intrin128::opt_EMULATOR_CPP::test_hal_intrin_float32()
[ OK ] hal_intrin128.float32x4_CPP_EMULATOR (21 ms)
[ RUN ] hal_intrin128.float64x2_CPP_EMULATOR
SIMD128: void opencv_test::hal::intrin128::opt_EMULATOR_CPP::test_hal_intrin_float64()
[ OK ] hal_intrin128.float64x2_CPP_EMULATOR (13 ms)
[ RUN ] hal_intrin128.uint8x16_BASELINE
SIMD1024: void opencv_test::hal::intrin128::cpu_baseline::test_hal_intrin_uint8()
[ OK ] hal_intrin128.uint8x16_BASELINE (91 ms)
[ RUN ] hal_intrin128.int8x16_BASELINE
SIMD1024: void opencv_test::hal::intrin128::cpu_baseline::test_hal_intrin_int8()
[ OK ] hal_intrin128.int8x16_BASELINE (67 ms)
[ RUN ] hal_intrin128.uint16x8_BASELINE
SIMD1024: void opencv_test::hal::intrin128::cpu_baseline::test_hal_intrin_uint16()
[ OK ] hal_intrin128.uint16x8_BASELINE (64 ms)
[ RUN ] hal_intrin128.int16x8_BASELINE
SIMD1024: void opencv_test::hal::intrin128::cpu_baseline::test_hal_intrin_int16()
[ OK ] hal_intrin128.int16x8_BASELINE (51 ms)
[ RUN ] hal_intrin128.int32x4_BASELINE
SIMD1024: void opencv_test::hal::intrin128::cpu_baseline::test_hal_intrin_int32()
[ OK ] hal_intrin128.int32x4_BASELINE (48 ms)
[ RUN ] hal_intrin128.uint32x4_BASELINE
SIMD1024: void opencv_test::hal::intrin128::cpu_baseline::test_hal_intrin_uint32()
[ OK ] hal_intrin128.uint32x4_BASELINE (38 ms)
[ RUN ] hal_intrin128.uint64x2_BASELINE
SIMD1024: void opencv_test::hal::intrin128::cpu_baseline::test_hal_intrin_uint64()
[ OK ] hal_intrin128.uint64x2_BASELINE (25 ms)
[ RUN ] hal_intrin128.int64x2_BASELINE
SIMD1024: void opencv_test::hal::intrin128::cpu_baseline::test_hal_intrin_int64()
[ OK ] hal_intrin128.int64x2_BASELINE (17 ms)
[ RUN ] hal_intrin128.float32x4_BASELINE
SIMD1024: void opencv_test::hal::intrin128::cpu_baseline::test_hal_intrin_float32()
[ OK ] hal_intrin128.float32x4_BASELINE (36 ms)
[ RUN ] hal_intrin128.float64x2_BASELINE
SIMD1024: void opencv_test::hal::intrin128::cpu_baseline::test_hal_intrin_float64()
SKIP: CV_SIMD_64F is not available
[ OK ] hal_intrin128.float64x2_BASELINE (1 ms)
[ RUN ] hal_intrin128.float16x8_FP16
[ SKIP ] Unsupported hardware: FP16 is not available
[ OK ] hal_intrin128.float16x8_FP16 (11 ms)
[----------] 21 tests from hal_intrin128 (674 ms total)
[----------] 1 test from hal_intrin256
[ RUN ] hal_intrin256.float16x16_FP16
[ SKIP ] Unsupported: FP16 is not available
[ OK ] hal_intrin256.float16x16_FP16 (1 ms)
[----------] 1 test from hal_intrin256 (2 ms total)
[----------] 1 test from hal_intrin512
[ RUN ] hal_intrin512.float16x32_FP16
[ SKIP ] Unsupported: FP16 is not available
[ OK ] hal_intrin512.float16x32_FP16 (0 ms)
[----------] 1 test from hal_intrin512 (0 ms total)
[----------] Global test environment tear-down
[ SKIPSTAT ] 3 tests skipped
[ SKIPSTAT ] TAG='skip_other' skip 3 tests
[==========] 23 tests from 3 test cases ran. (683 ms total)
[ PASSED ] 23 tests.
The root cause of this issue is the inconsistent definition of max_nlanes relative to nlanes. For non-RVV backends, nlanes and max_nlanes are identical:
// src/opencv/modules/core/include/opencv2/core/hal/intrin.hpp:L727-L737@tag4.9
template<typename T> struct VTraits {
static inline int vlanes() { return T::nlanes; }
enum { nlanes = T::nlanes, max_nlanes = T::nlanes };
using lane_type = typename T::lane_type;
};
In this context, max_nlanes can be read as the maximum vector length supported by the current hardware. For the RVV case, however, max_nlanes is a predefined constant derived from CV_RVV_MAX_VLEN (1024):
// src/opencv/modules/core/include/opencv2/core/hal/intrin_rvv_scalable.hpp:L90-L97@tag4.9
#define OPENCV_HAL_IMPL_RVV_TRAITS(REG, TYP, SUF, SZ) \
template <> \
struct VTraits<REG> \
{ \
static inline int vlanes() { return __cv_rvv_##SUF##_nlanes; } \
using lane_type = TYP; \
static const int max_nlanes = CV_RVV_MAX_VLEN/SZ; \
};
For the scalable RVV backend, max_nlanes therefore refers to the maximum vector length that OpenCV's implementation supports, not to the vector length of the running hardware (which vlanes() reports). These differing definitions make the meaning of max_nlanes ambiguous.
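To make the difference concrete, the following self-contained C++ sketch models both VTraits flavours with plain arithmetic. It is not OpenCV code: FixedTraits, rvv_vlanes, rvv_max_nlanes and rvv_traits_consistent are hypothetical names, and the sketch assumes that SZ in OPENCV_HAL_IMPL_RVV_TRAITS is the element width in bits and that vlanes() reports VLEN / SZ for a single (LMUL=1) register. Under those assumptions an 8-bit vector has 16 lanes at VLEN=128 while max_nlanes is 1024 / 8 = 128, and the two only agree when VLEN equals CV_RVV_MAX_VLEN.

```cpp
#include <cassert>

// Hypothetical model of the two VTraits flavours quoted above (not OpenCV code).
// Only the arithmetic mirrors the originals:
//   generic backend: max_nlanes == nlanes (both compile-time constants)
//   scalable RVV:    max_nlanes == CV_RVV_MAX_VLEN / SZ, vlanes() == VLEN / SZ at run time
constexpr int kRvvMaxVlen = 1024;  // CV_RVV_MAX_VLEN as quoted above

// Lanes in one LMUL=1 register of sew_bits-wide elements on hardware with vlen_bits.
constexpr int rvv_vlanes(int vlen_bits, int sew_bits) { return vlen_bits / sew_bits; }

// Compile-time bound used by the scalable backend (assumes SZ == element width in bits).
constexpr int rvv_max_nlanes(int sew_bits) { return kRvvMaxVlen / sew_bits; }

// Fixed-width backend: lane count never depends on the hardware at run time.
template <int NLanes>
struct FixedTraits {
    static int vlanes() { return NLanes; }
    enum { nlanes = NLanes, max_nlanes = NLanes };
};

// True when a buffer sized with max_nlanes matches the real vector length.
constexpr bool rvv_traits_consistent(int vlen_bits, int sew_bits) {
    return rvv_vlanes(vlen_bits, sew_bits) == rvv_max_nlanes(sew_bits);
}

static_assert(FixedTraits<16>::max_nlanes == FixedTraits<16>::nlanes, "fixed width: always equal");
static_assert(rvv_traits_consistent(1024, 8), "VLEN=1024: 128 lanes on both sides");
static_assert(!rvv_traits_consistent(128, 8), "VLEN=128: 16 lanes vs max_nlanes=128");
```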
Allocating resources with VTraits<TYPE>::max_nlanes appears to be a common pattern; numerous occurrences can be found with the following command: grep --exclude-dir=build -rn "::max_nlanes". When VLEN is not equal to CV_RVV_MAX_VLEN (1024), this leads to inefficient resource usage and to failing tests. The failures only show up in Debug builds because debug-only checks, such as the assert() in v_load that fails in the log above (and CV_DbgAssert), are compiled out in Release builds, where NDEBUG is defined.
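The hypothetical sketch below models how data sized with max_nlanes can trip a size check shaped like the one in v_load(std::initializer_list<uchar>), and why only Debug builds notice. All names (g_vlen_bits, load_size_ok, fake_v_load, make_test_data) are invented for illustration, the hardware VLEN is simulated with a variable, and it assumes the test data has exactly max_nlanes elements; it does not reproduce OpenCV's actual test code.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical model of the failing pattern; not OpenCV code.
// A real scalable-RVV build gets VLEN from the hardware; here it is a variable.
static int g_vlen_bits = 128;                       // simulated hardware VLEN
constexpr int kMaxVlen = 1024;                      // CV_RVV_MAX_VLEN
inline int vlanes_u8() { return g_vlen_bits / 8; }  // run-time lane count for 8-bit lanes
constexpr int max_nlanes_u8 = kMaxVlen / 8;         // compile-time bound (128)

// Same condition as the failing check: the list must hold exactly vlanes() elements.
// Returned as a bool so it can be inspected without aborting.
inline bool load_size_ok(std::size_t n) { return static_cast<int>(n) == vlanes_u8(); }

// Like the real check, this assert disappears when NDEBUG is defined
// (the usual Release configuration), so only Debug builds abort here.
inline std::vector<unsigned char> fake_v_load(std::initializer_list<unsigned char> xs) {
    assert(load_size_ok(xs.size()) && "nScalars.size() == vlanes()");
    return std::vector<unsigned char>(xs);
}

// Assumed test helper: sizes its data with max_nlanes, independent of the real VLEN.
inline std::vector<unsigned char> make_test_data() {
    return std::vector<unsigned char>(static_cast<std::size_t>(max_nlanes_u8), 1);
}
```

With g_vlen_bits = 128, load_size_ok rejects the 128 elements produced by make_test_data() but accepts 16; compiled with NDEBUG, the assert inside fake_v_load would vanish and a mismatched list would pass unnoticed.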
Furthermore, with scalable RVV the actual vector length of the hardware is not known at compile time, so no compile-time constant can describe it exactly. The current implementation therefore seems to need revision.
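To illustrate why a compile-time bound is awkward here, the sketch below sizes a buffer from the lane count obtained at run time instead. It is only one possible direction, not a proposed OpenCV patch: query_vlen_bits, runtime_vlanes and make_lane_buffer are hypothetical, and the simulated VLEN stands in for the hardware query that OpenCV performs through __cv_rvv_<SUF>_nlanes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch, not the OpenCV API.
// g_simulated_vlen stands in for the vector length reported by the hardware.
static int g_simulated_vlen = 128;
inline int query_vlen_bits() { return g_simulated_vlen; }

// Lane count for one LMUL=1 register of LaneT, computed at run time.
template <typename LaneT>
int runtime_vlanes() {
    return query_vlen_bits() / static_cast<int>(8 * sizeof(LaneT));
}

// Buffer whose size always matches the vector length seen at run time.
template <typename LaneT>
std::vector<LaneT> make_lane_buffer() {
    return std::vector<LaneT>(static_cast<std::size_t>(runtime_vlanes<LaneT>()));
}
```

The cost of this approach is that the size is no longer a compile-time constant, so it cannot be used as a fixed-size array bound the way max_nlanes can.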
Steps to reproduce
# step 0: clone source
mkdir -p src && \
cd src && \
git clone https://github.com/opencv/opencv.git
git clone https://github.com/opencv/opencv_extra.git
# step 1: generate a CMake configuration
cd opencv && \
cmake -DCMAKE_TOOLCHAIN_FILE=platforms/linux/riscv64-clang.toolchain.cmake \
-DRISCV_CLANG_BUILD_ROOT=/opt/riscv \
-DRISCV_GCC_INSTALL_ROOT=/opt/riscv \
-DCPU_BASELINE=RVV \
-DCPU_BASELINE_REQUIRE=RVV \
-DRISCV_RVV_SCALABLE=ON \
-DWITH_OPENCL=OFF \
-DBUILD_SHARED_LIBS=OFF \
-DBUILD_EXAMPLES=ON \
-DOPENCV_ENABLE_NONFREE=ON \
-DCMAKE_EXPORT_COMPILE_COMMANDS=ON \
-DCMAKE_BUILD_TYPE=Debug \
-B ../../build/Debug \
-S .
# step 2: build the project
cd ../../ && \
cmake --build build/Debug -j4
# step 3: execute unit tests
/opt/riscv/bin/qemu-riscv64 -L /opt/riscv/sysroot -cpu rv64,v=true,vlen=128,vext_spec=v1.0, -E OPENCV_TEST_DATA_PATH=/workdir/src/opencv_extra/testdata build/Debug/bin/opencv_test_core --gtest_filter="hal*"
/opt/riscv/bin/qemu-riscv64 -L /opt/riscv/sysroot -cpu rv64,v=true,vlen=1024,vext_spec=v1.0, -E OPENCV_TEST_DATA_PATH=/workdir/src/opencv_extra/testdata build/Debug/bin/opencv_test_core --gtest_filter="hal*"
Issue submission checklist
- I report the issue, it's not a question
- I checked the problem with documentation, FAQ, open issues, forum.opencv.org, Stack Overflow, etc and have not found any solution
- I updated to the latest OpenCV version and the issue is still there
- There is reproducer code and related data files (videos, images, onnx, etc)