Faster implementation of blobFromImages for cpu nchw output #26127
Merged
asmorkalov merged 4 commits into opencv:4.x on Dec 23, 2024
Contributor
@alexlyulkov, please add a performance test.
Contributor
@fengyuentau, after you finish the C3 optimization in the warping functions and before you move on to the bicubic case optimization, may I ask you to take a look at this? We need to compare the speed of this implementation with the existing one in the 4.x and 5.x branches.
fengyuentau (Member) requested changes on Dec 20, 2024 and left a comment:
@vpisarev @asmorkalov This patch generally brings better performance on both the 4.x and 5.x branches, although I only tested it on my MacBook Air with M1. See below for detailed performance testing results. Code for performance testing: fengyuentau@b01f28c
Geometric mean (ms)

| Name of Test | base-4x | patch-4x | patch-4x vs base-4x (x-factor) |
|---|---|---|---|
| HWC_TO_NCHW::Utils_blobFromImage::{ 32, 32 } | 0.005 | 0.001 | 5.00 |
| HWC_TO_NCHW::Utils_blobFromImage::{ 64, 64 } | 0.013 | 0.001 | 8.94 |
| HWC_TO_NCHW::Utils_blobFromImage::{ 128, 128 } | 0.052 | 0.009 | 5.53 |
| HWC_TO_NCHW::Utils_blobFromImage::{ 256, 256 } | 0.205 | 0.037 | 5.58 |
| HWC_TO_NCHW::Utils_blobFromImage::{ 512, 512 } | 0.935 | 0.274 | 3.42 |
| HWC_TO_NCHW::Utils_blobFromImage::{ 1024, 1024 } | 3.246 | 0.671 | 4.84 |
| HWC_TO_NCHW::Utils_blobFromImage::{ 2048, 2048 } | 15.888 | 5.352 | 2.97 |
| HWC_TO_NCHW::Utils_blobFromImages::{ 16, 32, 32 } | 0.068 | 0.011 | 6.41 |
| HWC_TO_NCHW::Utils_blobFromImages::{ 16, 64, 64 } | 0.212 | 0.032 | 6.68 |
| HWC_TO_NCHW::Utils_blobFromImages::{ 16, 128, 128 } | 0.921 | 0.261 | 3.53 |
| HWC_TO_NCHW::Utils_blobFromImages::{ 16, 256, 256 } | 4.046 | 1.315 | 3.08 |
| HWC_TO_NCHW::Utils_blobFromImages::{ 16, 512, 512 } | 16.397 | 5.695 | 2.88 |
| HWC_TO_NCHW::Utils_blobFromImages::{ 16, 1024, 1024 } | 64.182 | 21.845 | 2.94 |
| HWC_TO_NCHW::Utils_blobFromImages::{ 16, 2048, 2048 } | 255.997 | 86.815 | 2.95 |
Geometric mean (ms)

| Name of Test | base-5x | patch-5x | patch-5x vs base-5x (x-factor) |
|---|---|---|---|
| HWC_TO_NCHW::Utils_blobFromImage::{ 32, 32 } | 0.005 | 0.001 | 5.17 |
| HWC_TO_NCHW::Utils_blobFromImage::{ 64, 64 } | 0.013 | 0.001 | 8.68 |
| HWC_TO_NCHW::Utils_blobFromImage::{ 128, 128 } | 0.050 | 0.009 | 5.34 |
| HWC_TO_NCHW::Utils_blobFromImage::{ 256, 256 } | 0.189 | 0.036 | 5.19 |
| HWC_TO_NCHW::Utils_blobFromImage::{ 512, 512 } | 0.910 | 0.433 | 2.10 |
| HWC_TO_NCHW::Utils_blobFromImage::{ 1024, 1024 } | 3.239 | 0.663 | 4.88 |
| HWC_TO_NCHW::Utils_blobFromImage::{ 2048, 2048 } | 15.499 | 9.550 | 1.62 |
| HWC_TO_NCHW::Utils_blobFromImages::{ 16, 32, 32 } | 0.067 | 0.011 | 5.85 |
| HWC_TO_NCHW::Utils_blobFromImages::{ 16, 64, 64 } | 0.207 | 0.035 | 5.98 |
| HWC_TO_NCHW::Utils_blobFromImages::{ 16, 128, 128 } | 0.902 | 0.450 | 2.00 |
| HWC_TO_NCHW::Utils_blobFromImages::{ 16, 256, 256 } | 3.893 | 2.279 | 1.71 |
| HWC_TO_NCHW::Utils_blobFromImages::{ 16, 512, 512 } | 15.899 | 9.360 | 1.70 |
| HWC_TO_NCHW::Utils_blobFromImages::{ 16, 1024, 1024 } | 61.762 | 23.486 | 2.63 |
| HWC_TO_NCHW::Utils_blobFromImages::{ 16, 2048, 2048 } | 249.740 | 94.881 | 2.63 |
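The x-factor column reads as the base time divided by the patched time, for example 15.888 / 5.352 ≈ 2.97 for the 2048x2048 single-image case on 4.x. Below is a minimal sketch of that arithmetic, assuming the summary uses the usual geometric mean over per-run samples. The function names `geometricMean` and `xFactor` are hypothetical and are not taken from the OpenCV performance tooling.

```cpp
#include <cmath>
#include <vector>

// Geometric mean of positive timing samples (ms), computed in log space
// to avoid overflow/underflow when multiplying many values.
// Hypothetical helper for illustration only.
double geometricMean(const std::vector<double>& samplesMs)
{
    double logSum = 0.0;
    for (double s : samplesMs)
        logSum += std::log(s);
    return std::exp(logSum / static_cast<double>(samplesMs.size()));
}

// Speed-up of the patched build relative to the base build.
// A value above 1.0 means the patch is faster.
double xFactor(double baseMs, double patchMs)
{
    return baseMs / patchMs;
}
```

For the smallest sizes the displayed times are rounded to three decimals, so dividing the displayed values will not necessarily reproduce the listed factor; presumably the factor is computed before rounding.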
BTW, I needed to make the following changes to fix compile errors.
Contributor
@fengyuentau Could you push your commit to Alex's branch? He is on vacation now.
Signed-off-by: Yuantao Feng <yuantao.feng@opencv.org.cn>
Signed-off-by: Yuantao Feng <yuantao.feng@opencv.org.cn>
asmorkalov approved these changes on Dec 23, 2024
RoshniUG pushed a commit to RoshniUG/opencv that referenced this pull request on Dec 24, 2024
Faster implementation of blobFromImages for cpu nchw output opencv#26127

Faster implementation of blobFromImage and blobFromImages for HWC cv::Mat images -> NCHW cv::Mat case

Running time on my pc in ms:

**blobFromImage**
```
image size         old        new   speed-up
32x32x3          0.008      0.002   4.0x
64x64x3          0.021      0.009   2.3x
128x128x3        0.164      0.037   4.4x
256x256x3        0.728      0.158   4.6x
512x512x3        3.310      0.628   5.2x
1024x1024x3     14.503      3.124   4.6x
2048x2048x3     61.647     28.049   2.2x
```

**blobFromImages**
```
image size            old        new   speed-up
16x32x32x3          0.122      0.041   3.0x
16x64x64x3          0.790      0.165   4.8x
16x128x128x3        3.313      0.652   5.1x
16x256x256x3       13.495      3.127   4.3x
16x512x512x3       58.795     28.127   2.1x
16x1024x1024x3    251.135    121.955   2.1x
16x2048x2048x3   1023.570    487.188   2.1x
```

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake

Update window_cocoa.mm
shyama7004 pushed a commit to shyama7004/opencv that referenced this pull request on Jan 20, 2025
Faster implementation of blobFromImages for cpu nchw output opencv#26127
NanQin555 pushed a commit to NanQin555/opencv that referenced this pull request on Feb 24, 2025
Faster implementation of blobFromImages for cpu nchw output opencv#26127
Faster implementation of blobFromImage and blobFromImages for the HWC cv::Mat images -> NCHW cv::Mat case.
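For readers unfamiliar with the two layouts: an HWC image stores the channel values of each pixel next to each other, while an NCHW blob stores one full plane per channel, image after image. The sketch below shows the index mapping in plain C++. It assumes 8-bit interleaved input and float output, and it leaves out scaling, mean subtraction, channel swapping and resizing. `hwcToNchw` is a hypothetical helper written for illustration; it is not the code from this patch and not an OpenCV API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration of the HWC -> NCHW index mapping.
// Each input image is a flat buffer of height * width * channels bytes,
// laid out pixel by pixel with interleaved channels (HWC).
// The result is one flat float buffer laid out as N x C x H x W.
std::vector<float> hwcToNchw(const std::vector<std::vector<std::uint8_t>>& images,
                             std::size_t height,
                             std::size_t width,
                             std::size_t channels)
{
    const std::size_t plane = height * width;  // elements in one channel plane
    std::vector<float> blob(images.size() * channels * plane);

    for (std::size_t n = 0; n < images.size(); ++n)
    {
        const std::uint8_t* src = images[n].data();
        float* dst = blob.data() + n * channels * plane;

        // Walk the source once in memory order and scatter each
        // channel value into its own output plane.
        for (std::size_t p = 0; p < plane; ++p)
            for (std::size_t c = 0; c < channels; ++c)
                dst[c * plane + p] = static_cast<float>(src[p * channels + c]);
    }
    return blob;
}
```

Reading the source sequentially and writing to `channels` separate planes keeps the input access pattern contiguous; whether the patch itself uses this loop order is not stated in this conversation.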