
Parallel_for in box Filter and support for 32f box filter in Fastcv hal #27182

Merged
asmorkalov merged 6 commits into opencv:4.x from
CodeLinaro:boxFilter_hal_changes
Apr 16, 2025


@adsha-quic
Contributor

Added parallel_for to the box filter HAL and added support for the 32F box filter.
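To illustrate the idea behind the parallel_for change, here is a minimal, standard-library-only sketch of a row-striped normalized box filter on a single-channel float image. It is not the FastCV HAL code: std::thread stands in for cv::parallel_for_, the naive per-pixel window sum and BORDER_REPLICATE clamping are assumptions chosen for clarity, and the helper names boxFilterRows and boxFilterParallel are hypothetical. Because each output row reads only the source image, disjoint row ranges can be filtered concurrently without locking.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Clamp an index into [0, n - 1], i.e. BORDER_REPLICATE semantics.
static int clampIdx(int i, int n) { return std::min(std::max(i, 0), n - 1); }

// Normalized ksize x ksize box filter for output rows [rowBegin, rowEnd).
// The anchor is the kernel center (ksize / 2), so a 16x16 window spans offsets -8..7.
static void boxFilterRows(const std::vector<float>& src, std::vector<float>& dst,
                          int width, int height, int ksize, int rowBegin, int rowEnd) {
    const int anchor = ksize / 2;
    const float scale = 1.0f / static_cast<float>(ksize * ksize);
    for (int y = rowBegin; y < rowEnd; ++y) {
        for (int x = 0; x < width; ++x) {
            float sum = 0.0f;
            for (int dy = 0; dy < ksize; ++dy) {
                const int sy = clampIdx(y + dy - anchor, height);
                for (int dx = 0; dx < ksize; ++dx) {
                    const int sx = clampIdx(x + dx - anchor, width);
                    sum += src[static_cast<std::size_t>(sy) * width + sx];
                }
            }
            dst[static_cast<std::size_t>(y) * width + x] = sum * scale;
        }
    }
}

// Split the rows into nstripes horizontal stripes and filter each on its own thread.
// Stripes write disjoint parts of dst, so no synchronization is needed beyond join().
static void boxFilterParallel(const std::vector<float>& src, std::vector<float>& dst,
                              int width, int height, int ksize, int nstripes) {
    assert(nstripes >= 1 && dst.size() == src.size());
    const int rowsPerStripe = (height + nstripes - 1) / nstripes;
    std::vector<std::thread> workers;
    for (int begin = 0; begin < height; begin += rowsPerStripe) {
        const int end = std::min(begin + rowsPerStripe, height);
        workers.emplace_back(boxFilterRows, std::cref(src), std::ref(dst),
                             width, height, ksize, begin, end);
    }
    for (std::thread& t : workers) t.join();
}
```

Since every pixel is computed by the same code path whichever stripe owns it, the striped result should match the serial one bit for bit; the stripe count only changes scheduling.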

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@asmorkalov asmorkalov self-requested a review April 1, 2025 13:10
@asmorkalov asmorkalov added optimization platform: arm ARM boards related issues: RPi, NVIDIA TK/TX, etc labels Apr 1, 2025
@asmorkalov asmorkalov added this to the 4.12.0 milestone Apr 1, 2025
@asmorkalov
Contributor

asmorkalov commented Apr 2, 2025

With my Jetson Orin:

./bin/opencv_perf_imgproc --gtest_filter=Size_MatType_BorderType_blur16x16.blur16x16/36
TEST: Skip tests with tags: 'mem_6gb', 'verylong'
CTEST_FULL_OUTPUT
OpenCV version: 4.12.0-dev
OpenCV VCS version: 4.11.0-315-g42de7e6ee8
Build type: Release
Compiler: /usr/bin/c++  (ver 9.4.0)
Algorithm hint: ALGO_HINT_ACCURATE
HAL: YES (carotene (ver 0.0.1) fastcv (ver 0.0.1))
Parallel framework: pthreads (nthreads=12)
CPU features: NEON FP16 *NEON_DOTPROD *NEON_FP16
OpenCL is disabled
Note: Google Test filter = Size_MatType_BorderType_blur16x16.blur16x16/36
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Size_MatType_BorderType_blur16x16
[ RUN      ] Size_MatType_BorderType_blur16x16.blur16x16/36, where GetParam() = (1280x720, 32FC1, BORDER_REPLICATE)
/mnt/flashdrive/opencv/modules/ts/src/ts_perf.cpp:381: Failure
The difference between expect_last and actual_last is 0.001220703125, which exceeds eps, where
expect_last evaluates to -565.76580810546875,
actual_last evaluates to -565.76702880859375, and
eps evaluates to 0.001.
Argument "dst" has unexpected value of the last element

params    = (1280x720, 32FC1, BORDER_REPLICATE)
termination reason:  reached maximum number of iterations
bytesIn   =    3686400
bytesOut  =    3686400
samples   =        100
outliers  =          8
frequency = 1000000000
min       =    1792400 = 1.79ms
median    =    1994482 = 1.99ms
gmean     =    2041647 = 2.04ms
gstddev   = 0.08525334 = 1.06ms for 97% dispersion interval
mean      =    2049113 = 2.05ms
stddev    =     178893 = 0.18ms
[  FAILED  ] Size_MatType_BorderType_blur16x16.blur16x16/36, where GetParam() = (1280x720, 32FC1, BORDER_REPLICATE) (228 ms)
[----------] 1 test from Size_MatType_BorderType_blur16x16 (228 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (228 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Size_MatType_BorderType_blur16x16.blur16x16/36, where GetParam() = (1280x720, 32FC1, BORDER_REPLICATE)
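For scale, both reported values lie in [512, 1024), where adjacent single-precision floats are 2^-14 ≈ 6.1e-5 apart. The 0.001220703125 gap is therefore exactly 20 ULPs, a relative error of roughly 2.2e-6. The sketch below computes that distance; orderedBits and ulpDistance are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Map a float onto a monotonic integer scale: +0 and -0 both map to 0, and
// consecutive representable floats map to consecutive integers.
static std::int64_t orderedBits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    const std::int64_t magnitude = static_cast<std::int64_t>(u & 0x7fffffffu);
    return (u & 0x80000000u) ? -magnitude : magnitude;
}

// Number of representable floats (ULPs) separating a and b.
static std::int64_t ulpDistance(float a, float b) {
    assert(a == a && b == b);  // NaN has no meaningful ULP distance
    const std::int64_t d = orderedBits(a) - orderedBits(b);
    return d < 0 ? -d : d;
}
```

For values of this magnitude, an absolute eps of 0.001 therefore tolerates only about 16 ULPs.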

@asmorkalov asmorkalov self-assigned this Apr 2, 2025
@adsha-quic
Contributor Author

adsha-quic commented Apr 2, 2025

Hi Alex

Is it possible to relax the eps a bit? The reported failure is:
"The difference between expect_last and actual_last is 0.001220703125, which exceeds eps 0.001"
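An absolute eps of 0.001 is tight for outputs whose magnitude is in the hundreds. One common alternative, sketched here as an assumption rather than what the eventual threshold patch did, is to accept a value if it passes either an absolute or a relative bound. The nearlyEqual name and the 1e-5 relative tolerance are illustrative choices.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical comparator: a and b match if they are within absEps of each other,
// or if their difference is within relEps times the larger magnitude.
static bool nearlyEqual(float a, float b, double absEps, double relEps) {
    assert(absEps >= 0.0 && relEps >= 0.0);
    const double da = a, db = b;
    const double diff = std::fabs(da - db);
    if (diff <= absEps) return true;
    return diff <= relEps * std::max(std::fabs(da), std::fabs(db));
}
```

The absolute term keeps near-zero outputs from being judged by a vanishing relative bound, while the relative term scales with outputs such as -565.77.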

@asmorkalov
Contributor

Similar test failure on Android:

umi:/data/local/tmp/fastcv_pr $ ./opencv_perf_imgproc --gtest_filter=*Size_MatType_BorderType_blur16x16.blur16x16/36*
TEST: Skip tests with tags: 'mem_6gb', 'verylong'
CTEST_FULL_OUTPUT
OpenCV version: 4.12.0-dev
OpenCV VCS version: 4.11.0-315-g42de7e6ee8
Build type: Release
Compiler: /mnt/Projects/Android/Sdk/ndk/28.0.12433566/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++  (ver 19.0.0)
Algorithm hint: ALGO_HINT_ACCURATE
HAL: YES (carotene (ver 0.0.1) KleidiCV (ver 0.3.0) fastcv (ver 0.0.1))
Parallel framework: pthreads (nthreads=2)
CPU features: NEON FP16 *NEON_DOTPROD *NEON_FP16 *NEON_BF16?
Note: Google Test filter = *Size_MatType_BorderType_blur16x16.blur16x16/36*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from Size_MatType_BorderType_blur16x16
[ RUN      ] Size_MatType_BorderType_blur16x16.blur16x16/36, where GetParam() = (1280x720, 32FC1, BORDER_REPLICATE)
/mnt/Projects/Projects/opencv/modules/ts/src/ts_perf.cpp:381: Failure
The difference between expect_last and actual_last is 0.001220703125, which exceeds eps, where
expect_last evaluates to -565.76580810546875,
actual_last evaluates to -565.76702880859375, and
eps evaluates to 0.001.
Argument "dst" has unexpected value of the last element

params    = (1280x720, 32FC1, BORDER_REPLICATE)
termination reason:  unknown
bytesIn   =    3686400
bytesOut  =    3686400
samples   =         83 of 100
outliers  =          6
frequency = 1000000000
min       =    3481198 = 3.48ms
median    =    3547604 = 3.55ms
gmean     =    3553248 = 3.55ms
gstddev   = 0.02692594 = 0.57ms for 97% dispersion interval
mean      =    3554609 = 3.55ms
stddev    =     106086 = 0.11ms
[  FAILED  ] Size_MatType_BorderType_blur16x16.blur16x16/36, where GetParam() = (1280x720, 32FC1, BORDER_REPLICATE) (321 ms)
[----------] 1 test from Size_MatType_BorderType_blur16x16 (321 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (321 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Size_MatType_BorderType_blur16x16.blur16x16/36, where GetParam() = (1280x720, 32FC1, BORDER_REPLICATE)
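The Android run reports the same expected and actual values as the Jetson run despite a different compiler, HAL set, and thread count, which suggests a deterministic numerical difference rather than a race between stripes. One general way two correct 16x16 float box filters can disagree in the low bits is accumulation order, since float addition is not associative. The sketch compares a direct sum with a column-then-row sum of one 256-tap window against a double reference; it is not a claim about which order FastCV or OpenCV's own blur uses, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>
#include <vector>

// Sum a k x k window in row-major order into one float accumulator.
static float directSum(const std::vector<float>& w, int k) {
    assert(static_cast<int>(w.size()) == k * k);
    float s = 0.0f;
    for (int i = 0; i < k * k; ++i) s += w[i];
    return s;
}

// Separable-style order: sum each column first, then sum the column sums.
static float columnsThenRows(const std::vector<float>& w, int k) {
    assert(static_cast<int>(w.size()) == k * k);
    float total = 0.0f;
    for (int x = 0; x < k; ++x) {
        float col = 0.0f;
        for (int y = 0; y < k; ++y) col += w[y * k + x];
        total += col;
    }
    return total;
}

// Double-precision reference sum.
static double referenceSum(const std::vector<float>& w) {
    double s = 0.0;
    for (float v : w) s += v;
    return s;
}

// Loose version of the classic recursive-summation bound (n - 1) * u * sum|x|,
// with u = FLT_EPSILON / 2; it also covers the column-then-row order used here.
static double summationErrorBound(const std::vector<float>& w) {
    double sumAbs = 0.0;
    for (float v : w) sumAbs += std::fabs(v);
    return static_cast<double>(w.size()) * FLT_EPSILON * sumAbs;
}
```

Both orders stay within the error bound, yet they need not agree with each other bit for bit, which is the kind of gap a relative tolerance absorbs.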

@adsha-quic
Contributor Author

Hey Alex
I will also push the threshold tuning patch to this PR shortly.

@asmorkalov asmorkalov merged commit 6ffc515 into opencv:4.x Apr 16, 2025
27 of 28 checks passed
@asmorkalov asmorkalov mentioned this pull request Apr 29, 2025

Labels

optimization platform: arm ARM boards related issues: RPi, NVIDIA TK/TX, etc

2 participants