Speed up line merging in INTER_AREA by vrabaud · Pull Request #24412 · opencv/opencv

vrabaud · 2023-10-16T13:17:45Z

This provides a 10 to 20% speed-up.

Related perf test fix: #24417
This is a split of #23525 that will be updated to only deal with column merging.
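As I read it, "line merging" is INTER_AREA's vertical pass. Each destination row is a weighted sum of horizontally resized source rows. The RISC-V build log further down in this thread names the two helpers involved, `inter_area::mul` and `inter_area::muladd`, both with the signature `(const WT* buf, int width, WT beta, WT* sum)`. Their vectorized bodies in that log compute `sum = beta * buf` and `sum += beta * buf`.

Below is a minimal scalar sketch of those semantics, inferred from the log and not copied from the patch. The namespace `sketch` and the `merge_rows` driver are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Scalar reading of the vectorized body shown in the build log:
//   vx_store(sum + dx, vx_setall(beta) * vx_load(buf + dx));
template <typename WT>
inline void mul(const WT* buf, int width, WT beta, WT* sum) {
    for (int dx = 0; dx < width; ++dx)
        sum[dx] = beta * buf[dx];
}

// Scalar reading of:
//   vx_store(sum + dx, vx_load(sum + dx) + vx_setall(beta) * vx_load(buf + dx));
template <typename WT>
inline void muladd(const WT* buf, int width, WT beta, WT* sum) {
    for (int dx = 0; dx < width; ++dx)
        sum[dx] += beta * buf[dx];
}

// Hypothetical driver: merge horizontally resized rows into one destination
// row, using one vertical weight (beta) per contributing source row.
template <typename WT>
std::vector<WT> merge_rows(const std::vector<std::vector<WT>>& rows,
                           const std::vector<WT>& betas) {
    assert(!rows.empty() && rows.size() == betas.size());
    const int width = static_cast<int>(rows[0].size());
    std::vector<WT> sum(rows[0].size());
    for (std::size_t k = 0; k < rows.size(); ++k) {
        if (k == 0)
            mul(rows[k].data(), width, betas[k], sum.data());     // first row initializes sum
        else
            muladd(rows[k].data(), width, betas[k], sum.data());  // later rows accumulate
    }
    return sum;
}

}  // namespace sketch
```

For example, rows {4, 8} and {8, 0} with weights 0.75 and 0.25 merge into {5, 6}. The patch speeds up exactly these loops with universal intrinsics; the sketch only pins down what they compute.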

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake

vrabaud · 2023-10-17T08:49:42Z

Running:

taskset -c 33 python3 .//modules/ts/misc/run.py ./build/bin/ -t imgproc --gtest_filter=*MatInfo_Size_Scale_Area* --perf_min_samples=500 --perf_force_samples=500

Here are the results (the generic INTER_AREA code path, which this patch changes, is only used for non-integer scale factors):

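Columns: imgproc 20231017-104246 (baseline), imgproc patch (patch), and patch vs imgproc 20231017-104246 (x-factor = baseline time / patch time; above 1.00 means the patch is faster). Times in ms.
Name of Test                                                      baseline    patch       x-factor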
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 640x480, 0.1)         0.107       0.099       1.07      
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 640x480, 0.25)        0.164       0.164       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 640x480, 0.81)        1.285       1.119       1.15      
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 960x540, 0.1)         0.167       0.167       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 960x540, 0.25)        0.276       0.276       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 960x540, 0.81)        2.169       1.890       1.15      
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1280x720, 0.1)        0.296       0.297       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1280x720, 0.25)       0.491       0.492       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1280x720, 0.81)       3.898       3.420       1.14      
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1920x1080, 0.1)       0.673       0.673       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1920x1080, 0.25)      1.121       1.121       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1920x1080, 0.81)      8.704       7.642       1.14      
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 3840x2160, 0.1)       2.643       2.644       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 3840x2160, 0.25)      4.434       4.433       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 3840x2160, 0.81)     35.254      30.796       1.14      
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 640x480, 0.1)         0.296       0.296       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 640x480, 0.25)        0.487       0.487       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 640x480, 0.81)        2.990       2.477       1.21      
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 960x540, 0.1)         0.505       0.505       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 960x540, 0.25)        0.843       0.843       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 960x540, 0.81)        5.000       4.189       1.19      
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1280x720, 0.1)        0.889       0.887       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1280x720, 0.25)       1.475       1.475       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1280x720, 0.81)       9.054       7.559       1.20      
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1920x1080, 0.1)       1.996       1.990       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1920x1080, 0.25)      3.289       3.289       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1920x1080, 0.81)     20.178      16.801       1.20      
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 3840x2160, 0.1)       7.963       7.961       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 3840x2160, 0.25)     13.174      13.190       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 3840x2160, 0.81)     81.505      67.946       1.20      
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 640x480, 0.1)         0.400       0.400       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 640x480, 0.25)        0.671       0.671       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 640x480, 0.81)        3.868       3.206       1.21      
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 960x540, 0.1)         0.666       0.666       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 960x540, 0.25)        1.112       1.111       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 960x540, 0.81)        6.553       5.399       1.21      
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1280x720, 0.1)        1.186       1.186       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1280x720, 0.25)       1.962       1.962       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1280x720, 0.81)      11.697       9.631       1.21      
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1920x1080, 0.1)       2.652       2.647       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1920x1080, 0.25)      4.411       4.410       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1920x1080, 0.81)     26.304      21.632       1.22      
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 3840x2160, 0.1)      10.567      10.563       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 3840x2160, 0.25)     17.683      17.692       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 3840x2160, 0.81)     106.167     87.211       1.22
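The x-factor column in these tables checks out as baseline time divided by patched time. For example, 1.285 / 1.119 ≈ 1.148, shown as 1.15. The helper names below are hypothetical. The sketch shows that arithmetic, plus the different "percent of time saved" measure, since the two are easy to conflate when reading "10 to 20%".

```cpp
#include <cassert>
#include <cmath>

// x-factor as in the tables: baseline time / patched time.
// Above 1.0 the patch is faster; below 1.0 it is slower.
inline double x_factor(double baseline_ms, double patched_ms) {
    assert(std::isfinite(baseline_ms) && std::isfinite(patched_ms));
    assert(baseline_ms > 0.0 && patched_ms > 0.0);
    return baseline_ms / patched_ms;
}

// Share of the baseline time saved by the patch, in percent.
inline double time_saved_percent(double baseline_ms, double patched_ms) {
    assert(baseline_ms > 0.0);
    return 100.0 * (baseline_ms - patched_ms) / baseline_ms;
}
```

For the 8UC1 640x480 0.81 row, `x_factor(1.285, 1.119)` is about 1.148, while `time_saved_percent(1.285, 1.119)` is about 12.9%. In general an x-factor of 1.20 corresponds to roughly 17% less time, and a value such as 0.52 means the patched run was slower.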

asmorkalov · 2023-10-17T12:42:39Z

My results for AMD Ryzen 7 2700X (columns: baseline, patch, x-factor):

ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 640x480, 0.1)             0.091       0.091         1.00   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 640x480, 0.25)            0.125       0.125         1.00   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 640x480, 0.81)            0.579       0.371         1.56   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 960x540, 0.1)             0.148       0.149         1.00   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 960x540, 0.25)            0.204       0.392         0.52   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 960x540, 0.81)            0.986       0.554         1.78   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1280x720, 0.1)            0.260       0.257         1.01   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1280x720, 0.25)           0.359       0.358         1.00   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1280x720, 0.81)           1.000       0.756         1.32   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1920x1080, 0.1)           0.590       0.591         1.00   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1920x1080, 0.25)          0.416       0.423         0.98   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1920x1080, 0.81)          1.517       1.378         1.10   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 3840x2160, 0.1)           2.395       2.444         0.98   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 3840x2160, 0.25)          0.720       0.759         0.95   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 3840x2160, 0.81)          4.486       4.018         1.12   
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 640x480, 0.1)             0.273       0.273         1.00   
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 640x480, 0.25)            0.371       0.371         1.00   
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 640x480, 0.81)            0.974       0.903         1.08   
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 960x540, 0.1)             0.440       0.443         0.99   
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 960x540, 0.25)            0.607       0.607         1.00   
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 960x540, 0.81)            1.197       1.184         1.01   
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1280x720, 0.1)            0.782       0.782         1.00   
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1280x720, 0.25)           1.076       1.075         1.00   
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1280x720, 0.81)           1.890       1.642         1.15   
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1920x1080, 0.1)           1.754       1.785         0.98   
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1920x1080, 0.25)          1.216       1.227         0.99   
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1920x1080, 0.81)          3.192       3.043         1.05   
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 3840x2160, 0.1)           7.130       7.174         0.99   
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 3840x2160, 0.25)          2.201       1.928         1.14   
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 3840x2160, 0.81)          9.497       8.959         1.06   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 640x480, 0.1)             0.361       0.361         1.00   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 640x480, 0.25)            0.496       0.499         1.00   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 640x480, 0.81)            1.247       1.057         1.18   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 960x540, 0.1)             0.596       0.605         0.99   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 960x540, 0.25)            0.812       0.834         0.97   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 960x540, 0.81)            1.298       1.103         1.18   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1280x720, 0.1)            1.060       1.075         0.99   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1280x720, 0.25)           1.465       1.492         0.98   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1280x720, 0.81)           2.253       2.042         1.10   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1920x1080, 0.1)           2.786       2.394         1.16   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1920x1080, 0.25)          1.634       1.644         0.99   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1920x1080, 0.81)          4.238       3.883         1.09   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 3840x2160, 0.1)           9.363       9.626         0.97   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 3840x2160, 0.25)          2.429       2.637         0.92   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 3840x2160, 0.81)         12.587      11.660         1.08

asmorkalov · 2023-10-17T12:49:41Z

Less obvious results for Jetson TK1 (Armv7 with NEON):

ubuntu@jetson1:~/Projects/perf-resize$ ../opencv/modules/ts/misc/summary.py ./perf_imgproc-4.x-2.xml ./perf_imgproc-patched-2.xml | grep MatInfo_Size_Scale_Area
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 640x480, 0.1)           0.351      0.351   1.00  
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 640x480, 0.25)          0.057      0.056   1.01  
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 640x480, 0.81)          5.853      5.547   1.06  
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 960x540, 0.1)           0.593      0.594   1.00  
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 960x540, 0.25)          0.088      0.087   1.01  
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 960x540, 0.81)         10.575      9.523   1.11  
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1280x720, 0.1)          1.087      1.067   1.02  
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1280x720, 0.25)         0.151      0.152   1.00  
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1280x720, 0.81)        19.225     18.115   1.06  
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1920x1080, 0.1)         3.074      3.122   0.98  
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1920x1080, 0.25)        0.384      0.389   0.99  
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1920x1080, 0.81)       47.033     44.744   1.05  
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 3840x2160, 0.1)        16.553     16.479   1.00  
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 3840x2160, 0.25)        2.205      2.171   1.02  
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 3840x2160, 0.81)       192.111    183.870  1.04  
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 640x480, 0.1)           1.068      1.064   1.00  
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 640x480, 0.25)          0.223      0.223   1.00  
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 640x480, 0.81)         11.450     10.327   1.11  
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 960x540, 0.1)           2.035      1.957   1.04  
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 960x540, 0.25)          0.390      0.404   0.96  
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 960x540, 0.81)         19.645     18.686   1.05  
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1280x720, 0.1)          4.919      4.907   1.00  
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1280x720, 0.25)         0.959      0.940   1.02  
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1280x720, 0.81)        36.524     33.644   1.09  
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1920x1080, 0.1)        12.503     12.508   1.00  
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1920x1080, 0.25)        2.478      2.431   1.02  
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1920x1080, 0.81)       83.945     76.746   1.09  
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 3840x2160, 0.1)        50.541     50.496   1.00  
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 3840x2160, 0.25)        9.283      9.418   0.99  
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 3840x2160, 0.81)       331.895    314.355  1.06  
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 640x480, 0.1)           1.460      1.445   1.01  
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 640x480, 0.25)          0.140      0.142   0.99  
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 640x480, 0.81)         13.958     13.407   1.04  
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 960x540, 0.1)           3.063      3.128   0.98  
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 960x540, 0.25)          0.271      0.283   0.96  
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 960x540, 0.81)         24.871     23.424   1.06  
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1280x720, 0.1)          7.147      7.103   1.01  
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1280x720, 0.25)         0.758      0.744   1.02  
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1280x720, 0.81)        44.767     43.367   1.03  
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1920x1080, 0.1)        16.615     16.530   1.01  
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1920x1080, 0.25)        2.009      1.992   1.01  
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1920x1080, 0.81)       102.265    98.891   1.03  
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 3840x2160, 0.1)        66.154     66.484   1.00  
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 3840x2160, 0.25)        7.917      7.922   1.00  
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 3840x2160, 0.81)       410.325    390.888  1.05

asmorkalov

The OpenCV team is migrating from fixed-size vector code to scalable vector code to support RISC-V RVV and other platforms. Could you adapt the patch accordingly? Example: #24166

vrabaud · 2023-10-17T14:30:56Z

Thx, I added CV_SIMD_SCALABLE.
To get more precise results, may I suggest you run on a specific core? E.g.

taskset -c 33 python3 .//modules/ts/misc/run.py ......


asmorkalov · 2023-10-18T06:23:58Z

modules/imgproc/src/resize.cpp

+    const v_int32 tmp0 = v_round(vx_load(src + 0 * v_float32::nlanes));
+    const v_int32 tmp1 = v_round(vx_load(src + 1 * v_float32::nlanes));
+    const v_int32 tmp2 = v_round(vx_load(src + 2 * v_float32::nlanes));
+    const v_int32 tmp3 = v_round(vx_load(src + 3 * v_float32::nlanes));


v_float32::nlanes -> VTraits<v_float32>::vlanes()

Thanks, done. BTW, it would be nice to change that API to make it constexpr.
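On scalable targets such as RVV, the number of lanes in a vector register is only known at run time. A compile-time constant like `v_float32::nlanes` therefore cannot exist there, and `VTraits<v_float32>::vlanes()` returns the count as a value instead.

The sketch below mimics the loop shape from the diff (four vectors per iteration, `tmp0..tmp3`) with plain scalar code, taking the lane count as a runtime argument. `round_blocks`, `round_all` and the `lanes` parameter are hypothetical stand-ins, not OpenCV APIs.

```cpp
#include <cassert>
#include <cmath>

// Round n floats to ints, four "vectors" per iteration, where the vector
// width `lanes` is only known at run time (as with vlanes() on scalable
// targets). Each inner `lanes` loop stands in for one v_round(vx_load(...)).
inline int round_blocks(const float* src, int* dst, int n, int lanes) {
    assert(lanes > 0 && n >= 0);
    const int step = 4 * lanes;
    int i = 0;
    for (; i + step <= n; i += step)
        for (int v = 0; v < 4; ++v)
            for (int l = 0; l < lanes; ++l) {
                const int k = i + v * lanes + l;
                dst[k] = static_cast<int>(std::lrint(src[k]));  // round to nearest
            }
    return i;  // first index not covered by full 4-vector blocks
}

// Full conversion: vector-sized blocks first, then a scalar tail.
inline void round_all(const float* src, int* dst, int n, int lanes) {
    int i = round_blocks(src, dst, n, lanes);
    for (; i < n; ++i)
        dst[i] = static_cast<int>(std::lrint(src[i]));
}
```

Because `step` is computed from a runtime value, the same source works whatever width the hardware reports. Only the tail handling has to cope with `n` not being a multiple of `4 * lanes`.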


vrabaud · 2023-10-18T09:38:00Z

@mshabunin, it seems the CI is hanging for RISC-V.

vpisarev · 2023-10-18T11:56:10Z

@vrabaud, thank you for the patch! I wonder why double-precision floating-point arithmetic is used in several places of the patch. Isn't FP32 enough for this algorithm (unless resize is applied to FP64 images)?

I mean, it's fine to use double precision to construct the interpolation tables, but for the actual interpolation and accumulation FP32 should probably be enough, right?

vrabaud · 2023-10-18T13:20:05Z

@vpisarev, indeed float32 is the working type for all integer types and for float32; double is only used for double images:

opencv/modules/imgproc/src/resize.cpp, line 3797 in 2f1d529:

resizeArea_<uchar, float>, 0, resizeArea_<ushort, float>,

This is the current behavior so I kept it.
BTW, any idea as to why int and schar are disabled?
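The answer above relies on the dispatch table in resize.cpp, where each supported input type is paired with a working (accumulation) type. The entries quoted in this thread are `resizeArea_<uchar, float>`, `resizeArea_<ushort, float>`, `resizeArea_<short, float>`, `resizeArea_<float, float>` and `resizeArea_<double, double>`, with `0` in the slots that appear to correspond to schar and int.

A compile-time trait expressing the same mapping could look like the sketch below. `AreaWorkType` and `area_work_t` are hypothetical names, not part of OpenCV.

```cpp
#include <type_traits>

// Hypothetical trait mirroring the resizeArea_<T, WT> pairs quoted above:
// FP32 accumulation for 8/16-bit integers and float, FP64 only for double.
template <typename T> struct AreaWorkType;  // left undefined: no entry (e.g. int, signed char)

template <> struct AreaWorkType<unsigned char>  { using type = float; };
template <> struct AreaWorkType<unsigned short> { using type = float; };
template <> struct AreaWorkType<short>          { using type = float; };
template <> struct AreaWorkType<float>          { using type = float; };
template <> struct AreaWorkType<double>         { using type = double; };

template <typename T>
using area_work_t = typename AreaWorkType<T>::type;

// An unsupported type fails to compile, roughly analogous to the 0 slots.
static_assert(std::is_same<area_work_t<unsigned char>, float>::value,
              "8-bit input accumulates in FP32");
static_assert(std::is_same<area_work_t<double>, double>::value,
              "FP64 input keeps FP64");
```

Written this way, FP64 arithmetic only appears when the input itself is double, which matches the behavior described in the reply.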

asmorkalov · 2023-10-19T07:09:35Z

RISC-V builds are broken in CI now. Error messages from a manual build:

[ 80%] Building CXX object modules/imgproc/CMakeFiles/opencv_imgproc.dir/src/samplers.cpp.o
/opencv/modules/imgproc/src/resize.cpp:3103:44: error: invalid operands to binary expression ('v_float32' (aka '__rvv_float32m1_t') and 'v_float32')
        vx_store(sum + dx, vx_setall(beta) * vx_load(buf + dx));
                           ~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~
/opencv/modules/imgproc/src/resize.cpp:3229:29: note: in instantiation of function template specialization 'cv::inter_area::mul<float>' requested here
                inter_area::mul(buf, dsize.width, beta, sum);
                            ^
/opencv/modules/imgproc/src/resize.cpp:3258:18: note: in instantiation of member function 'cv::ResizeArea_Invoker<unsigned char, float>::operator()' requested here
                 ResizeArea_Invoker<T, WT>(src, dst, xtab, xtab_size, ytab, ytab_size, tabofs),
                 ^
/opencv/modules/imgproc/src/resize.cpp:3892:9: note: in instantiation of function template specialization 'cv::resizeArea_<unsigned char, float>' requested here
        resizeArea_<uchar, float>, 0, resizeArea_<ushort, float>,
        ^
/opencv/modules/imgproc/src/resize.cpp:3117:64: error: invalid operands to binary expression ('v_float32' (aka '__rvv_float32m1_t') and 'v_float32')
        vx_store(sum + dx, vx_load(sum + dx) + vx_setall(beta) * vx_load(buf + dx));
                                               ~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~
/opencv/modules/imgproc/src/resize.cpp:3234:29: note: in instantiation of function template specialization 'cv::inter_area::muladd<float>' requested here
                inter_area::muladd(buf, dsize.width, beta, sum);
                            ^
/opencv/modules/imgproc/src/resize.cpp:3258:18: note: in instantiation of member function 'cv::ResizeArea_Invoker<unsigned char, float>::operator()' requested here
                 ResizeArea_Invoker<T, WT>(src, dst, xtab, xtab_size, ytab, ytab_size, tabofs),
                 ^
/opencv/modules/imgproc/src/resize.cpp:3892:9: note: in instantiation of function template specialization 'cv::resizeArea_<unsigned char, float>' requested here
        resizeArea_<uchar, float>, 0, resizeArea_<ushort, float>,
        ^
/opencv/modules/imgproc/src/resize.cpp:3229:17: error: no matching function for call to 'mul'
                inter_area::mul(buf, dsize.width, beta, sum);
                ^~~~~~~~~~~~~~~
/opencv/modules/imgproc/src/resize.cpp:3258:18: note: in instantiation of member function 'cv::ResizeArea_Invoker<unsigned short, float>::operator()' requested here
                 ResizeArea_Invoker<T, WT>(src, dst, xtab, xtab_size, ytab, ytab_size, tabofs),
                 ^
/opencv/modules/imgproc/src/resize.cpp:3892:39: note: in instantiation of function template specialization 'cv::resizeArea_<unsigned short, float>' requested here
        resizeArea_<uchar, float>, 0, resizeArea_<ushort, float>,
                                      ^
/opencv/modules/imgproc/src/resize.cpp:3098:13: note: candidate template ignored: substitution failure [with WT = float]
inline void mul(const WT* buf, int width, WT beta, WT* sum) {
            ^
/opencv/modules/imgproc/src/resize.cpp:3234:17: error: no matching function for call to 'muladd'
                inter_area::muladd(buf, dsize.width, beta, sum);
                ^~~~~~~~~~~~~~~~~~
/opencv/modules/imgproc/src/resize.cpp:3112:13: note: candidate template ignored: substitution failure [with WT = float]
inline void muladd(const WT* buf, int width, WT beta, WT* sum) {
            ^
/opencv/modules/imgproc/src/resize.cpp:3229:17: error: no matching function for call to 'mul'
                inter_area::mul(buf, dsize.width, beta, sum);
                ^~~~~~~~~~~~~~~
/opencv/modules/imgproc/src/resize.cpp:3258:18: note: in instantiation of member function 'cv::ResizeArea_Invoker<short, float>::operator()' requested here
                 ResizeArea_Invoker<T, WT>(src, dst, xtab, xtab_size, ytab, ytab_size, tabofs),
                 ^
/opencv/modules/imgproc/src/resize.cpp:3893:9: note: in instantiation of function template specialization 'cv::resizeArea_<short, float>' requested here
        resizeArea_<short, float>, 0, resizeArea_<float, float>,
        ^
/opencv/modules/imgproc/src/resize.cpp:3098:13: note: candidate template ignored: substitution failure [with WT = float]
inline void mul(const WT* buf, int width, WT beta, WT* sum) {
            ^
/opencv/modules/imgproc/src/resize.cpp:3234:17: error: no matching function for call to 'muladd'
                inter_area::muladd(buf, dsize.width, beta, sum);
                ^~~~~~~~~~~~~~~~~~
/opencv/modules/imgproc/src/resize.cpp:3112:13: note: candidate template ignored: substitution failure [with WT = float]
inline void muladd(const WT* buf, int width, WT beta, WT* sum) {
            ^
/opencv/modules/imgproc/src/resize.cpp:3229:17: error: no matching function for call to 'mul'
                inter_area::mul(buf, dsize.width, beta, sum);
                ^~~~~~~~~~~~~~~
/opencv/modules/imgproc/src/resize.cpp:3258:18: note: in instantiation of member function 'cv::ResizeArea_Invoker<float, float>::operator()' requested here
                 ResizeArea_Invoker<T, WT>(src, dst, xtab, xtab_size, ytab, ytab_size, tabofs),
                 ^
/opencv/modules/imgproc/src/resize.cpp:3893:39: note: in instantiation of function template specialization 'cv::resizeArea_<float, float>' requested here
        resizeArea_<short, float>, 0, resizeArea_<float, float>,
                                      ^
/opencv/modules/imgproc/src/resize.cpp:3098:13: note: candidate template ignored: substitution failure [with WT = float]
inline void mul(const WT* buf, int width, WT beta, WT* sum) {
            ^
/opencv/modules/imgproc/src/resize.cpp:3234:17: error: no matching function for call to 'muladd'
                inter_area::muladd(buf, dsize.width, beta, sum);
                ^~~~~~~~~~~~~~~~~~
/opencv/modules/imgproc/src/resize.cpp:3112:13: note: candidate template ignored: substitution failure [with WT = float]
inline void muladd(const WT* buf, int width, WT beta, WT* sum) {
            ^
/opencv/modules/imgproc/src/resize.cpp:3103:44: error: invalid operands to binary expression ('v_float64' (aka '__rvv_float64m1_t') and 'v_float64')
        vx_store(sum + dx, vx_setall(beta) * vx_load(buf + dx));
                           ~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~
/opencv/modules/imgproc/src/resize.cpp:3229:29: note: in instantiation of function template specialization 'cv::inter_area::mul<double>' requested here
                inter_area::mul(buf, dsize.width, beta, sum);
                            ^
/opencv/modules/imgproc/src/resize.cpp:3258:18: note: in instantiation of member function 'cv::ResizeArea_Invoker<double, double>::operator()' requested here
                 ResizeArea_Invoker<T, WT>(src, dst, xtab, xtab_size, ytab, ytab_size, tabofs),
                 ^
/opencv/modules/imgproc/src/resize.cpp:3894:9: note: in instantiation of function template specialization 'cv::resizeArea_<double, double>' requested here
        resizeArea_<double, double>, 0
        ^
/opencv/modules/imgproc/src/resize.cpp:3117:64: error: invalid operands to binary expression ('v_float64' (aka '__rvv_float64m1_t') and 'v_float64')
        vx_store(sum + dx, vx_load(sum + dx) + vx_setall(beta) * vx_load(buf + dx));
                                               ~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~
/opencv/modules/imgproc/src/resize.cpp:3234:29: note: in instantiation of function template specialization 'cv::inter_area::muladd<double>' requested here
                inter_area::muladd(buf, dsize.width, beta, sum);
                            ^
/opencv/modules/imgproc/src/resize.cpp:3258:18: note: in instantiation of member function 'cv::ResizeArea_Invoker<double, double>::operator()' requested here
                 ResizeArea_Invoker<T, WT>(src, dst, xtab, xtab_size, ytab, ytab_size, tabofs),
                 ^
/opencv/modules/imgproc/src/resize.cpp:3894:9: note: in instantiation of function template specialization 'cv::resizeArea_<double, double>' requested here
        resizeArea_<double, double>, 0
        ^
10 errors generated.

asmorkalov · 2023-10-19T07:10:35Z

@vrabaud, you need to use v_add, v_mul and the other named functions instead of the overloaded +, * and other operators.


vrabaud · 2023-10-19T08:54:44Z

Thx @asmorkalov , I believe I fixed it using v_add and v_mul as suggested.
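The RISC-V errors in the log come from operator syntax. With RVV, `v_float32` is an alias of the builtin type `__rvv_float32m1_t`, as the log shows. C++ only permits user-defined operator overloads when an operand has class or enumeration type, so a library cannot supply `a * b` for such a type, and the compiler did not provide one either. Named intrinsics like `v_mul` and `v_add` work on every backend.

The sketch below shows a `muladd` written in that call style against a tiny mock 4-lane type, so it compiles without OpenCV. `Vec4f` and the mock `v_setall`/`v_load`/`v_store`/`v_add`/`v_mul` definitions are hypothetical stand-ins, not OpenCV's implementation or the patch's exact code.

```cpp
#include <array>
#include <cassert>

namespace mock {

// Tiny 4-lane stand-in for a SIMD register type; NOT OpenCV's v_float32.
struct Vec4f {
    std::array<float, 4> lane;
};
constexpr int kLanes = 4;

inline Vec4f v_setall(float x) { return Vec4f{{x, x, x, x}}; }
inline Vec4f v_load(const float* p) { return Vec4f{{p[0], p[1], p[2], p[3]}}; }
inline void v_store(float* p, const Vec4f& a) {
    for (int i = 0; i < kLanes; ++i) p[i] = a.lane[i];
}
inline Vec4f v_add(const Vec4f& a, const Vec4f& b) {
    Vec4f r{};
    for (int i = 0; i < kLanes; ++i) r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}
inline Vec4f v_mul(const Vec4f& a, const Vec4f& b) {
    Vec4f r{};
    for (int i = 0; i < kLanes; ++i) r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

// muladd in call style: sum += beta * buf, with no +/* on vector types.
inline void muladd(const float* buf, int width, float beta, float* sum) {
    assert(width >= 0);
    const Vec4f vbeta = v_setall(beta);
    int dx = 0;
    for (; dx + kLanes <= width; dx += kLanes)
        v_store(sum + dx, v_add(v_load(sum + dx), v_mul(vbeta, v_load(buf + dx))));
    for (; dx < width; ++dx)  // scalar tail for the last width % kLanes elements
        sum[dx] += beta * buf[dx];
}

}  // namespace mock
```

Only free functions are called on the vector type, so the same source shape still compiles when the type behind it is a builtin sizeless type with no operators.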

asmorkalov

👍
Tested RISC-V config manually.

asmorkalov · 2023-10-19T10:57:20Z

In summary:
I see a 10-20% speed-up in the cases where the scale coefficient is closer to 1, e.g. 0.81 in our perf test. The other cases, with scales 0.1 and 0.25, perform the same as before. The effect is even more stable in a single-threaded configuration (--perf_threads=1).

vrabaud · 2023-10-19T14:55:30Z

0.1 and 0.25 correspond to integer downscaling factors (a tenth and a quarter of the original image), and all tested dimensions are divisible by 10 and by 4. In that case a different algorithm (areafast) is used, which this pull request does not speed up, so it is normal to see no change there. 0.81 is a non-integer scale, so the regular INTER_AREA path is used, and that is where the speed-up shows.
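The path selection described in this comment can be sketched as a check on the scale factors. This is a simplified sketch under my own assumptions. `uses_area_fast` is a hypothetical helper, not an OpenCV API. It derives each scale from the source and destination sizes and compares it against the nearest integer with `DBL_EPSILON`. `cv::resize` performs a comparable test internally, but its exact conditions may differ.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Hypothetical helper (not an OpenCV API): true when shrinking
// src_w x src_h to dst_w x dst_h uses integer factors on both axes,
// i.e. the "areafast" case rather than the generic INTER_AREA path.
inline bool uses_area_fast(int src_w, int src_h, int dst_w, int dst_h) {
    assert(src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0);
    const double scale_x = static_cast<double>(src_w) / dst_w;
    const double scale_y = static_cast<double>(src_h) / dst_h;
    if (scale_x < 1.0 || scale_y < 1.0)
        return false;  // upscaling: not an area-averaging downscale at all
    return std::abs(scale_x - std::round(scale_x)) < DBL_EPSILON &&
           std::abs(scale_y - std::round(scale_y)) < DBL_EPSILON;
}
```

For example, 640x480 -> 64x48 (factor 10) and 3840x2160 -> 960x540 (factor 4) take the fast path. 640x480 -> 518x389, roughly the 0.81 case, does not, which is why only the 0.81 rows move in the tables.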


vrabaud force-pushed the inter_area1 branch from 0e2da92 to 73e6e96 October 16, 2023 13:28

asmorkalov requested a review from vpisarev October 16, 2023 13:45

asmorkalov added optimization category: imgproc labels Oct 16, 2023

asmorkalov added this to the 4.9.0 milestone Oct 16, 2023

vrabaud force-pushed the inter_area1 branch from 73e6e96 to a9b9004 October 16, 2023 19:13

asmorkalov self-requested a review October 17, 2023 12:57

asmorkalov requested changes Oct 17, 2023


vrabaud force-pushed the inter_area1 branch from a9b9004 to efd65bd October 17, 2023 14:30

vrabaud requested a review from asmorkalov October 17, 2023 14:43

mshabunin reviewed Oct 17, 2023



asmorkalov reviewed Oct 18, 2023


vrabaud force-pushed the inter_area1 branch from 0aee71e to 1315d3e October 18, 2023 08:05

vrabaud added 5 commits October 19, 2023 10:50

Speed up line merging in INTER_AREA

5957763

This provides a 10 to 20% speed-up.

Allow scalable vectors.

fe925e7

Fix nlanes for RISC-V

1ec0dbf

Replace v_float32::nlanes -> VTraits<v_float32>::vlanes()

986e60d

Use v_Add and v_mul instead of +,*

8e61fbe

vrabaud force-pushed the inter_area1 branch from 1315d3e to 8e61fbe October 19, 2023 08:52

asmorkalov approved these changes Oct 19, 2023


asmorkalov assigned vpisarev Oct 19, 2023

asmorkalov merged commit c96f48e into opencv:4.x Oct 19, 2023

vrabaud deleted the inter_area1 branch October 19, 2023 15:24

asmorkalov mentioned this pull request Oct 20, 2023

WIP: Vectorize cv::resize for INTER_AREA #23525 (closed)


asmorkalov mentioned this pull request Nov 3, 2023

(5.x) Merge 4.x #24486 (merged)


4 participants
