Faster implementation of blobFromImages for cpu nchw output #26127

Merged
asmorkalov merged 4 commits into opencv:4.x from alexlyulkov:al/blob-from-images on Dec 23, 2024
@alexlyulkov
Contributor

Faster implementation of blobFromImage and blobFromImages for the HWC cv::Mat images -> NCHW cv::Mat case.
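
To make the layout change concrete, here is a minimal sketch of the HWC -> CHW reordering for a single image, written in plain C++ without OpenCV. `hwcToChw` is a hypothetical helper for illustration only; it is not the code in this patch, and it ignores the scaling, mean subtraction and channel swapping that blobFromImage also supports.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an OpenCV API): converts one interleaved HWC image,
// where the channels of a pixel are adjacent in memory, into planar CHW order,
// where each channel occupies one contiguous height*width plane.
std::vector<float> hwcToChw(const std::vector<unsigned char>& hwc,
                            std::size_t height, std::size_t width,
                            std::size_t channels)
{
    assert(hwc.size() == height * width * channels);
    const std::size_t planeSize = height * width;
    std::vector<float> chw(hwc.size());
    // p walks the pixels in row-major order, c walks the channels of pixel p.
    for (std::size_t p = 0; p < planeSize; ++p)
        for (std::size_t c = 0; c < channels; ++c)
            chw[c * planeSize + p] = static_cast<float>(hwc[p * channels + c]);
    return chw;
}
```

For a batch of N images, as in blobFromImages, image n would be written at offset n * C * H * W of the output buffer, which gives the NCHW layout.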

Running time on my PC, in ms:

blobFromImage

image size            old        new   speed-up
32x32x3             0.008      0.002       4.0x
64x64x3             0.021      0.009       2.3x
128x128x3           0.164      0.037       4.4x
256x256x3           0.728      0.158       4.6x
512x512x3           3.310      0.628       5.2x
1024x1024x3        14.503      3.124       4.6x
2048x2048x3        61.647     28.049       2.2x

blobFromImages

image size            old        new   speed-up
16x32x32x3          0.122      0.041       3.0x
16x64x64x3          0.790      0.165       4.8x
16x128x128x3        3.313      0.652       5.1x
16x256x256x3       13.495      3.127       4.3x
16x512x512x3       58.795     28.127       2.1x
16x1024x1024x3    251.135    121.955       2.1x
16x2048x2048x3   1023.570    487.188       2.1x

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake

@asmorkalov requested a review from vpisarev on September 7, 2024, 11:26
@asmorkalov added this to the 4.11.0 milestone on Sep 7, 2024
@asmorkalov added the "pr: needs test" label (New functionality requires minimal tests set) on Sep 17, 2024
@asmorkalov
Contributor

@alexlyulkov please add a performance test.

@vpisarev requested a review from fengyuentau on December 6, 2024, 07:17
@vpisarev
Contributor

vpisarev commented Dec 6, 2024

@fengyuentau, after you finish with the C3 optimization in the warping functions and before you move on to the bicubic case optimization, may I ask you to take a look at this? We need to compare the speed of this implementation with the existing one in the 4.x and 5.x branches.

Member

@fengyuentau left a comment

@vpisarev @asmorkalov This patch generally brings better performance on both the 4.x and 5.x branches, although I only tested on my MacBook Air with M1. See below for detailed performance testing results. Code for performance testing: fengyuentau@b01f28c

Geometric mean (ms)

Name of Test                                           base-4x  patch-4x  patch-4x vs base-4x (x-factor)
HWC_TO_NCHW::Utils_blobFromImage::{ 32, 32 }             0.005     0.001      5.00
HWC_TO_NCHW::Utils_blobFromImage::{ 64, 64 }             0.013     0.001      8.94
HWC_TO_NCHW::Utils_blobFromImage::{ 128, 128 }           0.052     0.009      5.53
HWC_TO_NCHW::Utils_blobFromImage::{ 256, 256 }           0.205     0.037      5.58
HWC_TO_NCHW::Utils_blobFromImage::{ 512, 512 }           0.935     0.274      3.42
HWC_TO_NCHW::Utils_blobFromImage::{ 1024, 1024 }         3.246     0.671      4.84
HWC_TO_NCHW::Utils_blobFromImage::{ 2048, 2048 }        15.888     5.352      2.97
HWC_TO_NCHW::Utils_blobFromImages::{ 16, 32, 32 }        0.068     0.011      6.41
HWC_TO_NCHW::Utils_blobFromImages::{ 16, 64, 64 }        0.212     0.032      6.68
HWC_TO_NCHW::Utils_blobFromImages::{ 16, 128, 128 }      0.921     0.261      3.53
HWC_TO_NCHW::Utils_blobFromImages::{ 16, 256, 256 }      4.046     1.315      3.08
HWC_TO_NCHW::Utils_blobFromImages::{ 16, 512, 512 }     16.397     5.695      2.88
HWC_TO_NCHW::Utils_blobFromImages::{ 16, 1024, 1024 }   64.182    21.845      2.94
HWC_TO_NCHW::Utils_blobFromImages::{ 16, 2048, 2048 }  255.997    86.815      2.95

Geometric mean (ms)

Name of Test                                           base-5x  patch-5x  patch-5x vs base-5x (x-factor)
HWC_TO_NCHW::Utils_blobFromImage::{ 32, 32 }             0.005     0.001      5.17
HWC_TO_NCHW::Utils_blobFromImage::{ 64, 64 }             0.013     0.001      8.68
HWC_TO_NCHW::Utils_blobFromImage::{ 128, 128 }           0.050     0.009      5.34
HWC_TO_NCHW::Utils_blobFromImage::{ 256, 256 }           0.189     0.036      5.19
HWC_TO_NCHW::Utils_blobFromImage::{ 512, 512 }           0.910     0.433      2.10
HWC_TO_NCHW::Utils_blobFromImage::{ 1024, 1024 }         3.239     0.663      4.88
HWC_TO_NCHW::Utils_blobFromImage::{ 2048, 2048 }        15.499     9.550      1.62
HWC_TO_NCHW::Utils_blobFromImages::{ 16, 32, 32 }        0.067     0.011      5.85
HWC_TO_NCHW::Utils_blobFromImages::{ 16, 64, 64 }        0.207     0.035      5.98
HWC_TO_NCHW::Utils_blobFromImages::{ 16, 128, 128 }      0.902     0.450      2.00
HWC_TO_NCHW::Utils_blobFromImages::{ 16, 256, 256 }      3.893     2.279      1.71
HWC_TO_NCHW::Utils_blobFromImages::{ 16, 512, 512 }     15.899     9.360      1.70
HWC_TO_NCHW::Utils_blobFromImages::{ 16, 1024, 1024 }   61.762    23.486      2.63
HWC_TO_NCHW::Utils_blobFromImages::{ 16, 2048, 2048 }  249.740    94.881      2.63

PR26127_Perf_M1.zip
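
For readers unfamiliar with the table columns, the following sketch shows how a geometric mean of timing samples and an x-factor can be computed. `geometricMean` and `xFactor` are hypothetical names, not part of the OpenCV perf tooling. The assumption that the x-factor is the base time divided by the patched time is inferred from the rows above (for example, 15.888 / 5.352 is about 2.97); ratios recomputed from the rounded millisecond values shown may differ slightly from the printed ones.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of positive timing samples (in ms), computed in log space
// so that the product of many samples cannot overflow or underflow.
double geometricMean(const std::vector<double>& samplesMs)
{
    assert(!samplesMs.empty());
    double logSum = 0.0;
    for (double s : samplesMs) {
        assert(s > 0.0);
        logSum += std::log(s);
    }
    return std::exp(logSum / static_cast<double>(samplesMs.size()));
}

// Assumed definition of the x-factor column: base time over patched time,
// so values above 1.0 mean the patched build is faster.
double xFactor(double baseMs, double patchMs)
{
    assert(patchMs > 0.0);
    return baseMs / patchMs;
}
```

The geometric mean is less sensitive to occasional slow outlier runs than the arithmetic mean, which is one reason it is a common choice for summarizing repeated timing measurements.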


BTW, I need to make the following changes to fix compile errors.

@asmorkalov
Contributor

asmorkalov commented Dec 20, 2024

@fengyuentau Could you push your commit to Alex's branch? He is on vacation now.

Signed-off-by: Yuantao Feng <yuantao.feng@opencv.org.cn>
Signed-off-by: Yuantao Feng <yuantao.feng@opencv.org.cn>
@asmorkalov removed the "pr: needs test" label (New functionality requires minimal tests set) on Dec 23, 2024
@asmorkalov self-assigned this on Dec 23, 2024
@asmorkalov merged commit aa52daf into opencv:4.x on Dec 23, 2024
RoshniUG pushed a commit to RoshniUG/opencv that referenced this pull request Dec 24, 2024
Faster implementation of blobFromImages for cpu nchw output opencv#26127

Update window_cocoa.mm
@asmorkalov mentioned this pull request on Jan 15, 2025
shyama7004 pushed a commit to shyama7004/opencv that referenced this pull request Jan 20, 2025
Faster implementation of blobFromImages for cpu nchw output opencv#26127

NanQin555 pushed a commit to NanQin555/opencv that referenced this pull request Feb 24, 2025
Faster implementation of blobFromImages for cpu nchw output opencv#26127
