Use T-API for critical parts of blobFromImagesWithParams #23894
asmorkalov merged 9 commits into opencv:4.x
@kallaballa, thank you for the pull request. I have some doubts regarding it.
Alright. At least I'll provide some numbers based on my machine.
I modified one of the V4D demos to track face detection time using In this scenario 1000 iterations of FaceDetectorYN::detect take:
I'll add flame graphs. I understand your arguments, but given the simplicity of the patch, the risk/gain ratio isn't so bad :)
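For reference, a measurement like the one described above (total time for 1000 calls) can be sketched with a small timing helper. This is a minimal sketch in plain C++ using std::chrono, not the code from the V4D demo; `timeIterationsMs` is a hypothetical name, and the callable stands in for whatever the demo actually invokes.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper (not part of OpenCV or V4D): runs `work` `iterations`
// times and returns the total elapsed wall-clock time in milliseconds.
inline double timeIterationsMs(const std::function<void()>& work, int iterations) {
    using clock = std::chrono::steady_clock;  // monotonic, suited to interval timing
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    const auto stop = clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

In a setup like the one discussed, the lambda passed as `work` would wrap the detection call, once with Mat input and once with UMat input. It may be worth running a few warm-up iterations before timing, since OpenCL kernels are typically compiled on first use.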
Btw, I made similar patches, e.g. for TrackerKCF, with considerable performance gain. I guess I should drop those?
@kallaballa, thank you for the quick response. The acceleration in your case is noticeable, indeed! What's the resolution of the images that you feed to blobFromImagesWithParams? I think I can propose a compromise solution that will make everybody happy.
960x540
Sounds good. Will implement it like that.
Force-pushed from c231091 to e026820
Force-pushed from 4610c9d to 8f72b3b
fixed separate code paths for face detect
Force-pushed from 8f72b3b to 00804cb
I think that's it. 00804cb
If you are alright with the general approach, I'd improve the implementation a bit more.
Also, I wrote a test that compares detection results. There are differences (some frames are not detected with UMat) that I am trying to track down.
modules/dnn/src/dnn_utils.cpp
Outdated
    void getChannelFromBlob(UMat& m, InputArray blob, int i, int j, int rows, int cols, int type) {
        UMat ublob = blob.getUMat();
    -   int offset = i * cols + j;
    +   int offset = (i * ublob.step.p[0] + j * ublob.step.p[1]) / ublob.elemSize();
I forgot to take into account step() and elemSize() for the offset. Now it works on par.
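The difference between the two offset formulas can be illustrated without OpenCV. The following is a minimal sketch, assuming the strides are given in bytes (as the division by elemSize() in the patch suggests); `elementOffset` and `naiveOffset` are hypothetical names used only for this illustration.

```cpp
#include <cstddef>

// Hypothetical helper: index, counted in elements, of entry (i, j) in a buffer
// whose strides along the two indexed dimensions are given in bytes.
inline std::size_t elementOffset(std::size_t i, std::size_t j,
                                 std::size_t step0Bytes, std::size_t step1Bytes,
                                 std::size_t elemSizeBytes) {
    return (i * step0Bytes + j * step1Bytes) / elemSizeBytes;
}

// The earlier formula: only valid when the data is tightly packed, i.e. when
// the outer stride equals cols * elemSize and the inner stride equals elemSize.
inline std::size_t naiveOffset(std::size_t i, std::size_t j, std::size_t cols) {
    return i * cols + j;
}
```

For a float buffer with 4 columns whose rows are padded to 32 bytes, entry (2, 3) sits at element index (2 * 32 + 3 * 4) / 4 = 19, while the naive formula yields 11. The two agree only for tightly packed data (row stride of 16 bytes in this example).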
I have put more code on the fast path by porting NaryEltwiseLayer to UMat. I understand that, given the developments around 5.0, this doesn't have priority, but there is considerable gain. Should I post figures and make a PR? The only part left to port is ResizeLayer, to keep everything on the GPU all the time.
modules/dnn/src/dnn_utils.cpp
Outdated
    if(blob_.kind() == _InputArray::UMAT)
        blob = blob_.getUMat();
    else if(blob_.kind() == _InputArray::MAT) {
        blob = blob_.getMat().getUMat(flag);
The current UMat design has a limitation on storing the results of .getMat() / .getUMat() anywhere (they should be used locally only): the upstream lifetime check has to pass.
.clone() is overkill.
blob_.getMat() returns a temporary object; it should stay alive until the result of .getUMat(flag) is released.
We don't need this method at all, as there is _InputArray::getUMat(): https://github.com/opencv/opencv/blob/4.8.0/modules/core/src/matrix_wrap.cpp#L126C5-L126C27
modules/dnn/src/dnn_utils.cpp
Outdated
    void blobFromImagesWithParams(InputArrayOfArrays images, OutputArray blob, const Image2BlobParams& param) {
        CV_TRACE_FUNCTION();

        if (images.kind() == _InputArray::STD_VECTOR_UMAT) {

        padWithDivisor(input_image, pad_image);
        // Build blob from input image
        input_blob = dnn::blobFromImage(pad_image);
    } else {
@vpisarev T-API declares that we should not have such code separation on the "user" side.
@opencv-alalek @vpisarev Is it ready for merge?
Yay!
Pertaining Issue: opencv#5697

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake