Use T-API for critical parts of blobFromImagesWithParams by kallaballa · Pull Request #23894 · opencv/opencv

kallaballa · 2023-06-29T15:58:02Z

Pertaining Issue: #5697

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

vpisarev · 2023-06-29T18:19:32Z

@kallaballa, thank you for the pull request. I have some doubts regarding it.

1. It references the issue "opencl has no impact on the performance of face detection" (#5697), which complains about zero performance gain when using OpenCV + T-API for inference. However, a typical deep learning model includes a few dozen, if not 100+, layers, so the preprocessing, which is just a few operations, is likely a small fraction of the total inference time.
2. In the experimental inference engine https://github.com/vpisarev/ficus/tree/master/lib/NN, which we are slowly migrating, part by part, to OpenCV, there is a single-pass "blobFromImage" (it's called differently there) preprocessing operation (https://github.com/vpisarev/ficus/blob/master/lib/NN/Preprocess.fx). I would prefer to migrate it and optimize it using OpenCL rather than use a multi-pass collection of primitive operations.
3. If the whole pipeline is to capture a high-resolution image/video stream, downscale it and run a deep net, then resize might be one of the bottlenecks. In such a case the user could manually copy the input to a UMat, call OpenCV's resize, download the result to the CPU (.getMat()) and then run the rest of the preprocessing on the downscaled image.
4. It would be nice to do a little research and provide numbers on how much this patch improves performance on different hardware: (a) desktop CPU with discrete GPU, (b) mobile Intel/AMD CPU with integrated graphics, (c) ARM with Mali or another OpenCL-capable graphics.
5. If the performance, as measured in item 4, is not noticeably better than with CPU (it could well be the case, see item 1), I would prefer to keep preprocessing on the CPU. From time to time we get various problems with our OpenCL kernels. I would like to keep things more stable and predictable, unless there is a clear advantage.
6. In OpenCV 5 we plan, among other things, to make OpenCV much more friendly to various GPU/NPU accelerators. UMat should become a universal data structure for buffers, directly accessible by various acceleration APIs, including OpenCL, Vulkan, GLSL, CUDA etc. Then we can get back to such a patch. But, as I said in item 2, I would prefer the whole blobFromImage(s) to be converted into a single well-optimized parallel loop. Then it could be ported to OpenCL/GLSL/CUDA etc., and the user could directly call blobFromImages and pass a UMat there. Now is not the proper time yet, in my opinion.

kallaballa · 2023-06-29T19:10:42Z

Alright. At least I'll provide some numbers based on my machine.

kallaballa · 2023-06-29T20:53:56Z

I modified one of the V4D demos to track face detection time using getTickCount().
My machine: 11th Gen Intel(R) Core(TM) i7-1160G7 @ 1.20GHz with Iris Xe Graphics
Video used (from 00:01:00): https://www.youtube.com/watch?v=hUAT8Jm_dvw

In this scenario 1000 iterations of FaceDetectorYN::detect take:

With patch: 25.7823s
Without patch: 34.0648s
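
For reference, the two totals above can be turned into per-call times and a relative speedup. The sketch below is plain arithmetic on the reported numbers; `TimingSummary` and `summarize` are hypothetical names introduced here for illustration and are not part of the patch or of OpenCV.

```cpp
#include <cassert>

// Hypothetical helper type: derived figures for a with/without-patch comparison.
struct TimingSummary {
    double msPerCallWith;     // average milliseconds per call with the patch
    double msPerCallWithout;  // average milliseconds per call without the patch
    double speedup;           // (time without) / (time with)
    double timeSavedPercent;  // share of the original time that was saved
};

// Converts two total wall-clock times (in seconds) over the same number of
// iterations into per-call averages and relative figures.
TimingSummary summarize(double secondsWith, double secondsWithout, int iterations) {
    assert(iterations > 0 && secondsWith > 0.0 && secondsWithout > 0.0);
    TimingSummary s;
    s.msPerCallWith = secondsWith * 1000.0 / iterations;
    s.msPerCallWithout = secondsWithout * 1000.0 / iterations;
    s.speedup = secondsWithout / secondsWith;
    s.timeSavedPercent = (secondsWithout - secondsWith) / secondsWithout * 100.0;
    return s;
}
```

Calling `summarize(25.7823, 34.0648, 1000)` gives roughly 25.8 ms versus 34.1 ms per call, a speedup of about 1.32x, or about 24% less time per detection on this machine.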

I'll add flame graphs.

I understand your arguments but given the simplicity of the patch the risk/gain ratio isn't so bad :)

kallaballa · 2023-06-29T21:28:34Z

Flame graphs zoomed in on FaceDetectorYN::detect, without and with the patch (images not preserved).

kallaballa · 2023-06-29T22:19:23Z

By the way, I made similar patches, e.g. for TrackerKCF, with considerable performance gain. I guess I should drop those?

vpisarev · 2023-06-30T06:43:23Z

@kallaballa, thank you for the quick response. The acceleration in your case is noticeable, indeed! What's the resolution of images that you feed to blobFromImagesWithParams?

I think I can propose a compromise solution that will make everybody happy.
Let's implement the "compute follows data" principle (which is already used in OpenCV when it comes to T-API):
if the inputs are UMats (images.kind() == _InputArray::STD_VECTOR_UMAT), use your new branch, otherwise use the original one.

OpenCV video capture can return the frames in UMats, not just Mats, and some video capture backends support UMat output in a very efficient way.
You can use the .getUMat() function to convert each processed image into a UMat quickly.
Yes, users will have to modify their apps to use your branch, but for now such explicit behaviour is for the better.
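
The dispatch rule proposed above can be sketched without OpenCV. This is only an illustration of the principle: `BatchKind` and `Path` are hypothetical stand-ins (loosely mirroring `_InputArray::STD_VECTOR_MAT` and `_InputArray::STD_VECTOR_UMAT`), and the function is not the code that was merged.

```cpp
#include <cassert>

// Hypothetical stand-ins, NOT OpenCV types: they only model where the
// input images live when they reach the preprocessing function.
enum class BatchKind {
    HostVector,    // roughly: a std::vector of cv::Mat
    DeviceVector   // roughly: a std::vector of cv::UMat
};

enum class Path { Cpu, Accelerated };

// "Compute follows data": the accelerated branch runs only when the caller
// already handed over device-side buffers; everything else keeps using the
// original CPU branch, so existing applications behave as before.
Path choosePreprocessPath(BatchKind kind) {
    return kind == BatchKind::DeviceVector ? Path::Accelerated : Path::Cpu;
}
```

The design choice is that the caller opts in explicitly by choosing the container type, rather than the library silently moving data to the GPU.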

kallaballa · 2023-06-30T08:33:09Z

> @kallaballa, thank you for the quick response. The acceleration in your case is noticeable, indeed! What's the resolution of images that you feed to blobFromImagesWithParams?

960x540

> I think I can propose a compromise solution that will make everybody happy. Let's implement the "compute follows data" principle (which is already used in OpenCV when it comes to T-API): if the inputs are UMats (images.kind() == _InputArray::STD_VECTOR_UMAT), use your new branch, otherwise use the original one.
>
> OpenCV video capture can return the frames in UMats, not just Mats, and some video capture backends support UMat output in a very efficient way.
>
> You can use the .getUMat() function to convert each processed image into a UMat quickly.
>
> Yes, users will have to modify their apps to use your branch, but for now such explicit behaviour is for the better.

Sounds good. Will implement it like that.


kallaballa · 2023-07-16T06:46:49Z

I think that's it. 00804cb

kallaballa · 2023-07-18T16:52:57Z

> I think that's it. 00804cb

If you are alright with the general approach, I'd improve the implementation a bit more.

kallaballa · 2023-07-18T17:21:14Z

Also, I wrote a test that compares detection results. There are differences (some frames are not detected with UMat) that I am trying to track down.

kallaballa · 2023-07-20T04:53:46Z

modules/dnn/src/dnn_utils.cpp

+void getChannelFromBlob(UMat& m, InputArray blob, int i, int j, int rows, int cols, int type) {
    UMat ublob = blob.getUMat();
-    int offset = i * cols + j;
+    int offset = (i * ublob.step.p[0] + j * ublob.step.p[1]) / ublob.elemSize();


I forgot to take into account step() and elemSize() for the offset. Now it works on par.
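
The fix above can be illustrated with a small, OpenCV-free sketch. It assumes a 4-D blob whose first two dimensions are image index and channel index, with strides given in bytes as in the diff; `BlobLayout`, `contiguousLayout` and `channelOffset` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the byte strides of a 4-D blob
// (image, channel, row, column); not an OpenCV type.
struct BlobLayout {
    std::size_t step0;     // bytes between consecutive images   (dimension 0)
    std::size_t step1;     // bytes between consecutive channels (dimension 1)
    std::size_t elemSize;  // bytes per element
};

// Strides of a densely packed blob with the given shape.
BlobLayout contiguousLayout(std::size_t channels, std::size_t rows,
                            std::size_t cols, std::size_t elemSize) {
    BlobLayout l;
    l.elemSize = elemSize;
    l.step1 = rows * cols * elemSize;
    l.step0 = channels * l.step1;
    return l;
}

// Element offset of channel j of image i: sum the byte strides, then
// convert bytes to elements. This mirrors the corrected expression
// (i * step[0] + j * step[1]) / elemSize from the diff.
std::size_t channelOffset(const BlobLayout& l, std::size_t i, std::size_t j) {
    const std::size_t bytes = i * l.step0 + j * l.step1;
    assert(bytes % l.elemSize == 0);
    return bytes / l.elemSize;
}
```

For a packed 3-channel, 4x5, 4-byte-per-element blob, channel 2 of image 1 starts at element `(1 * 240 + 2 * 80) / 4 = 100`, whereas the earlier `i * cols + j` would have produced 7.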

kallaballa · 2023-09-28T20:42:45Z

I have put more code on the fast path by porting NaryEltwiseLayer to UMat. I understand, given the developments around 5.0, that this doesn't have priority, but there is considerable gain. Should I post figures and make a PR? The only part left to port is ResizeLayer, to keep everything on the GPU all the time.

kallaballa · 2023-09-28T21:04:02Z

Here is a flame graph of the current state (image not preserved):

opencv-alalek · 2023-09-15T06:56:36Z

modules/dnn/src/dnn_utils.cpp

+    if(blob_.kind() == _InputArray::UMAT)
+        blob = blob_.getUMat();
+    else if(blob_.kind() == _InputArray::MAT) {
+        blob = blob_.getMat().getUMat(flag);


The current UMat design has a limitation on storing the results of .getMat() / .getUMat() somewhere (they should be used locally only): the upstream lifetime check should pass.

.clone() is overkill.

blob_.getMat() returns a temporary object; it should stay alive until the result of .getUMat(flag) is released.

We don't need this method at all, as there is _InputArray::getUMat(): https://github.com/opencv/opencv/blob/4.8.0/modules/core/src/matrix_wrap.cpp#L126C5-L126C27

opencv-alalek · 2023-09-15T06:58:08Z

modules/dnn/src/dnn_utils.cpp

+void blobFromImagesWithParams(InputArrayOfArrays images, OutputArray blob, const Image2BlobParams& param) {
+    CV_TRACE_FUNCTION();
+
+        if (images.kind() == _InputArray::STD_VECTOR_UMAT) {


broken indentation


asmorkalov · 2023-10-13T09:43:24Z

modules/dnn/src/dnn_utils.cpp:298: tab in indent.
+	if (images.kind() == _InputArray::STD_VECTOR_UMAT) {
modules/dnn/src/dnn_utils.cpp:299: tab in indent.
+		if(blob.kind() == _InputArray::UMAT) {
modules/dnn/src/dnn_utils.cpp:300: tab in indent.
+			UMat& u = blob.getUMatRef();
modules/dnn/src/dnn_utils.cpp:301: tab in indent.
+			blobFromImagesWithParamsImpl<cv::UMat>(images, u, param);
modules/dnn/src/dnn_utils.cpp:302: tab in indent.
+			return;
modules/dnn/src/dnn_utils.cpp:303: tab in indent.
+		} else if(blob.kind() == _InputArray::MAT) {
modules/dnn/src/dnn_utils.cpp:304: tab in indent.
+			UMat u = blob.getMatRef().getUMat(ACCESS_WRITE);
modules/dnn/src/dnn_utils.cpp:305: tab in indent.
+			blobFromImagesWithParamsImpl<cv::UMat>(images, u, param);
modules/dnn/src/dnn_utils.cpp:306: tab in indent.
+			u.copyTo(blob);
modules/dnn/src/dnn_utils.cpp:307: tab in indent.
+			return;
modules/dnn/src/dnn_utils.cpp:308: tab in indent.
+		}
modules/dnn/src/dnn_utils.cpp:309: tab in indent.
+	} else if (images.kind() == _InputArray::STD_VECTOR_MAT) {
modules/dnn/src/dnn_utils.cpp:310: tab in indent.
+		if(blob.kind() == _InputArray::UMAT) {
modules/dnn/src/dnn_utils.cpp:311: tab in indent.
+			Mat m = blob.getUMatRef().getMat(ACCESS_WRITE);
modules/dnn/src/dnn_utils.cpp:312: tab in indent.
+			blobFromImagesWithParamsImpl<cv::Mat>(images, m, param);
modules/dnn/src/dnn_utils.cpp:313: tab in indent.
+			m.copyTo(blob);
modules/dnn/src/dnn_utils.cpp:314: tab in indent.
+			return;
modules/dnn/src/dnn_utils.cpp:315: tab in indent.
+		} else if(blob.kind() == _InputArray::MAT) {
modules/dnn/src/dnn_utils.cpp:316: tab in indent.
+			Mat& m = blob.getMatRef();
modules/dnn/src/dnn_utils.cpp:317: tab in indent.
+			blobFromImagesWithParamsImpl<cv::Mat>(images, m, param);
modules/dnn/src/dnn_utils.cpp:318: tab in indent.
+			return;
modules/dnn/src/dnn_utils.cpp:319: tab in indent.
+		}
modules/dnn/src/dnn_utils.cpp:320: tab in indent.
+	}

kallaballa · 2023-10-14T18:52:39Z

fbe2ccb

opencv-alalek · 2023-10-17T07:39:35Z

modules/objdetect/src/face_detect.cpp

+            padWithDivisor(input_image, pad_image);
+            // Build blob from input image
+            input_blob = dnn::blobFromImage(pad_image);
+        } else {


@vpisarev T-API declares that we should not have such code separation on the "user" side.

asmorkalov · 2023-10-18T07:45:39Z

@opencv-alalek @vpisarev Is it ready for merge?

kallaballa · 2023-10-21T04:10:10Z

Yay!

asmorkalov requested a review from vpisarev June 29, 2023 17:05

asmorkalov added optimization category: dnn labels Jun 29, 2023

kallaballa force-pushed the blobFromImagesWithParams branch 2 times, most recently from c231091 to e026820 Compare July 5, 2023 11:16

kallaballa force-pushed the blobFromImagesWithParams branch from 4610c9d to 8f72b3b Compare July 16, 2023 05:04

Use T-API for critical parts of blobFromImagesWithParams

00804cb

fixed separate code paths for face detect

kallaballa force-pushed the blobFromImagesWithParams branch from 8f72b3b to 00804cb Compare July 16, 2023 06:43

renaming + fix offset in getChannelFromBlob for UMats

1c96a0c

asmorkalov added this to the 4.9.0 milestone Sep 15, 2023

vpisarev approved these changes Oct 6, 2023

opencv-alalek reviewed Oct 9, 2023

kallaballa added 5 commits October 9, 2023 20:16

formatting

e7107e7

support all kinds of InputArrays

86095e6

clone InputArray storage

a030263

merge

5094da6

merge gone wrong

b866de1

use direct getters for UMat and Mat on InputArray

9b85e92

replaced tabs with spaces

fbe2ccb

kallaballa mentioned this pull request Oct 16, 2023

Partially porting dnn resize_layer to T-API #24413

Open

opencv-alalek reviewed Oct 17, 2023

asmorkalov merged commit c2f909f into opencv:4.x Oct 20, 2023

asmorkalov assigned vpisarev Oct 20, 2023

asmorkalov mentioned this pull request Nov 3, 2023

(5.x) Merge 4.x #24486

Merged
