G-API: fix OCL backend issue with UMat lifetime by andrey-golubev · Pull Request #14985 · opencv/opencv

andrey-golubev · 2019-07-05T11:15:22Z

This pullrequest

Fixes issue with cv::UMat lifetime in OCL backend: output may "outlive" the cv::Mat parent (this issue is reproduced locally with the newly added test)
Adds test checking how G-API backends handle re-initialized output

The issue can be described in a code snippet

// assume we have some graph declared
cv::GComputation c(in, out);

// now we want to execute the computation
cv::Mat input_mat = getInput();
cv::Mat output_mat = cv::Mat(sz, type);
c.apply(input_mat, output_mat, cv::compile_args(cv::gapi::core::gpu::kernels())); // (1)

// now we want to execute the computation but our output_mat is changed by us (the user)
show(output_mat);  // assume output is used somewhere else e.g. shown on screen
output_mat = cv::Mat(sz, type); // (2)
c.apply(input_mat, output_mat, cv::compile_args(cv::gapi::core::gpu::kernels())); // (3)
// <segmentation fault>

let's investigate:

step (1) successfully executes the graph and result is written into output_mat.
Internally, (since we use OpenCL backend) output data in form of UMat is still held by the backend. Note: we construct UMat inside the backend
step (2) we reset the output buffer, now remember that at step (1) there's a lonely UMat somewhere inside the G-API backend. Now what we have in this lonely UMat is a dangling pointer to the original data (UMatData::origdata) that still points to already re-allocated Mat::data. What's more: this original data pointer is also "captured" by OCL runtime since we explicitly set this UMatData::origdata as host_ptr to clCreateBuffer()
at step (3) comes the point of no return: we need to store newly arrived output data into internal container inside OpenCL backend to execute our graph and write result to that buffer. We do it via assignment operation and previously created UMat (that lonely guy from step (2)) is queued for destruction: but its host_ptr is dangling so when we do clReleaseMemObject() the UB starts. When we request new UMat from new output buffer (assuming this happens after "invalid" clReleaseMemObject()) there's a segmentation fault in clCreateBuffer() - this is the next call to OCL runtime in my case so it fails

Bottom line: we get segmentation fault inside OpenCL due to the following:

// <on allocation>:
// we set u->origdata as host_ptr
handle = clCreateBuffer(ctx_handle, CL_MEM_USE_HOST_PTR|createFlags,
    u->size, u->origdata, &retval);
u->handle = handle;
...
// between allocation and deallocation u->origdata becomes dangling
// because original cv::Mat is reinitialized
...
// <on deallocation>:
// we release internal OCL stuff and (very likely) access _dangling_ u->origdata <=> host_ptr
cl_int retval = clReleaseMemObject((cl_mem)u->handle); // the handle is the same

I am not sure exactly what happens inside the OpenCL but I see that void* data = clEnqueueMapBuffer(q, (cl_mem)u->handle, CL_TRUE, (CL_MAP_READ | CL_MAP_WRITE), 0, u->size, 0, 0, 0, &retval); returns the same (already dangling) pointer (there's explicit CV_Assert(u->origdata == data)).

So what we get after step (2) is the dangling pointer inside OpenCL and we explicitly ask OpenCL to clear the data by that pointer using the cl_mem handle => undefined behavior

I have the following configuration with which I am able to reproduce the issue:
(using Ubuntu 16.04.1 kernel 4.15.0-54-generic x86_64 GNU/Linux)

CTEST_FULL_OUTPUT
OpenCV version: 4.1.1-pre
OpenCV VCS version: 4.1.0-530-gae8d598-dirty
Build type: Release
Compiler: /usr/bin/c++  (ver 5.5.0)
Parallel framework: tbb
CPU features: SSE SSE2 SSE3 *SSE4.1 *SSE4.2 *FP16 *AVX *AVX2
Intel(R) IPP version: ippIP AVX2 (l9) 2019.0.0 Gold (-) Jul 24 2018
OpenCL Platforms: 
    Intel(R) OpenCL HD Graphics
        iGPU: Intel(R) Gen9 HD Graphics NEO (OpenCL 2.1 NEO )
Current OpenCL device: 
    Type = iGPU
    Name = Intel(R) Gen9 HD Graphics NEO
    Version = OpenCL 2.1 NEO 
    Driver version = 18.24.10921
    Address bits = 64
    Compute units = 24
    Max work group size = 256
    Local memory size = 64 KB
    Max memory allocation size = 3 GB 1023 MB 1016 KB
    Double support = Yes
    Host unified memory = Yes
    Device extensions:
        cl_khr_3d_image_writes
        cl_khr_byte_addressable_store
        cl_khr_fp16
        cl_khr_depth_images
        cl_khr_global_int32_base_atomics
        cl_khr_global_int32_extended_atomics
        cl_khr_icd
        cl_khr_image2d_from_buffer
        cl_khr_local_int32_base_atomics
        cl_khr_local_int32_extended_atomics
        cl_intel_subgroups
        cl_intel_required_subgroup_size
        cl_intel_subgroups_short
        cl_khr_spir
        cl_intel_accelerator
        cl_intel_media_block_io
        cl_intel_driver_diagnostics
        cl_intel_device_side_avc_motion_estimation
        cl_khr_priority_hints
        cl_khr_throttle_hints
        cl_khr_create_command_queue
        cl_khr_fp64
        cl_khr_subgroups
        cl_khr_il_program
        cl_khr_mipmap_image
        cl_khr_mipmap_image_writes
        cl_intel_planar_yuv
        cl_intel_packed_yuv
        cl_intel_motion_estimation
        cl_intel_advanced_motion_estimation
        cl_intel_va_api_media_sharing
    Has AMD Blas = No
    Has AMD Fft = No
    Preferred vector width char = 16
    Preferred vector width short = 8
    Preferred vector width int = 4
    Preferred vector width long = 1
    Preferred vector width float = 1
    Preferred vector width double = 1

Add tests on initialized/uninitialized outputs for all backends

andrey-golubev · 2019-07-05T11:15:56Z

@dmatveev @dbudniko @rgarnov @anton-potapov please review

andrey-golubev · 2019-07-08T07:53:33Z

Tagged "WIP": need to investigate whether clean up procedure is valid

dmatveev · 2019-07-08T13:14:07Z

Please ask @dbudniko, @rgarnov , and maybe @alalek to review the bug/fix.

modules/gapi/test/common/gapi_core_tests_inl.hpp

dmatveev · 2019-07-08T14:01:46Z

modules/gapi/test/cpu/gapi_core_tests_cpu.cpp

+                                       cv::Size(250, 320),      // out size < in size
+                                       cv::Size(1920, 1080),    // out size > in size
+                                       cv::Size(640, 400),      // out width = in width
+                                       cv::Size(10, 480)        // out height = in height


Testing the above contracts by passing a cartesian product of combinations is not the right way to go. Every test should be expessed separately to clearly show what it means.

dbudniko

Please fix tests according to @dmatveev review.

andrey-golubev · 2019-07-09T08:52:57Z

@dmatveev let's make a proper test for output in a different PR, from your feedback it looks like quite a lot of code that can be better reviewed separately. I reduced this test to only reproduce the issue and it takes ~300 ms for this "test suite" to complete (all backends), so I guess we're safe to add it "as is" as an additional coverage

dmatveev · 2019-07-11T15:50:16Z

I see some bug fixed here but can't find any description on that.
What is the root cause of the problem, when it is reproduced, and how it was fixed.

Please add it here.

andrey-golubev · 2019-07-12T09:20:34Z

@dmatveev added the issue to the description

dmatveev · 2019-07-15T14:27:23Z

@andrey-golubev thanks! the description is now really nice & detailed.

at (3) since we do c.apply(...) for the second time and input stays unchanged, the backend is not "re-created" (destroyed and constructed).
Then comes the point of no return: we need store newly arrived output data into internal container inside OpenCL backend to execute our graph and write result to that buffer. But, when we request new UMat from new output buffer, there's a segmentation fault in clCreateBuffer(...)

So according to this text, we do not pass the existing dangling UMat to OCL for execution but instead create a new one for the (changed) output buffer? So why it crashes then?

andrey-golubev · 2019-07-15T16:35:09Z

@dmatveev I think step (3) is misleading: will try to re-phrase.

We have this dangling pointer implicitly already in OCL (at step (2)) because we set it to OCL runtime with clCreateBuffer(..., CL_MEM_USE_HOST_PTR, ...) on first call to c.apply().

Now on the second call, before creating new UMat, we release the previous one: as we release it with cl_mem handle that "remembered" the original pointer (which is now dangling), there's something very bad happening inside OCL when we request clReleaseMemObject(handle) (UB so to say): and on next call to OCL we get a segmentation fault (which happened to be clCreateBuffer in my case)

andrey-golubev · 2019-07-15T16:35:45Z

Maybe @alalek can add up to my explanation

dmatveev · 2019-07-16T15:45:14Z

Now on the second call, before creating new UMat, we release the previous one: as we release it with cl_mem handle that "remembered" the original pointer (which is now dangling), there's something very bad happening inside OCL when we request clReleaseMemObject(handle) (UB so to say): and on next call to OCL we get a segmentation fault (which happened to be clCreateBuffer in my case)

Thanks, now it's clear. So if UMat/Mat relationship is designed this way, it seems the only proper thing to do is to drop our input/output UMat objects after execution?

dmatveev

Did u measure the performance on master against this? How bad is this now? Can u share a diff please? (there's an OpenCV script for that)

dmatveev · 2019-07-16T15:47:24Z

modules/gapi/src/backends/ocl/goclbackend.cpp

+    //     For example, if the original output (cv::Mat) is re-initialized by the user but we still
+    //     hold cv::UMat -> we get cv::UMat that has a parent that was already destroyed. Also,
+    //     since we don't own the data (the user does), there's no point holding it after we're done
+    const auto clean_up = [&input_objs, &output_objs] (cv::gimpl::Mag* p)


due to std::unique_ptr usage below: pointer to object is expected as an input/output parameter

dmatveev · 2019-07-16T15:48:37Z

modules/gapi/src/backends/ocl/goclbackend.cpp

+        for (auto& it : output_objs) umats.erase(it.first.id);
+    };
+    // RAII wrapper to clean-up m_res
+    std::unique_ptr<cv::gimpl::Mag, decltype(clean_up)> cleaner(&m_res, clean_up);


Jez c y. thx!

I am not sure I understand what you mean here. Figured out that std::unique_ptr with custom deleter is less boilerplate than custom Mag wrapper class

andrey-golubev · 2019-07-17T07:27:29Z

@dmatveev answered you below:

Thanks, now it's clear. So if UMat/Mat relationship is designed this way, it seems the only proper thing to do is to drop our input/output UMat objects after execution?

Since we create UMats for input/output internally and in place, yes, I think right now it's the only proper option (the other would be to allow user to pass UMats as input/output but it's a different story).
As a side note: we unconditionally create input/output UMats each time in run() when doing bindInArgs/bindOutArgs so the current logic clearly states that input/output UMats are only required for single run() execution

As for the perf. measurements: didn't check yet, will do and share

andrey-golubev · 2019-07-17T16:08:14Z

Marked as "WIP": please do not merge yet. I have one more rabbit hole to go down

dmatveev · 2019-07-23T18:07:47Z

Didn't we agree to day to merge this stuff?
Please file a task to understand further performance implications & attach this performance data there.

Thanks! 👍

andrey-golubev · 2019-07-24T08:15:31Z

@dmatveev indeed we agreed. Will add a task to our tracking system. Going to check that everything works with latest master and we can merge after that

…ix_ocl_umat

andrey-golubev · 2019-07-24T09:55:12Z

@alalek please feel free to merge

andrey-golubev · 2019-07-25T09:17:08Z

Thanks!

dmatveev · 2019-07-25T21:24:06Z

Thanks!

* G-API: fix GOCLExecutable issue with UMat lifetime Add tests on initialized/uninitialized outputs for all backends * Use proper clean-up procedure for magazine * Rename InitOut test and reduce tested sizes * Enable output allocation test

G-API: fix GOCLExecutable issue with UMat lifetime

ba9af76

Add tests on initialized/uninitialized outputs for all backends

andrey-golubev mentioned this pull request Jul 5, 2019

G-API: clean up accuracy tests #14945

Merged

andrey-golubev requested review from dmatveev and rgarnov and removed request for rgarnov July 8, 2019 06:32

andrey-golubev changed the title ~~G-API: fix OCL backend issue with UMat lifetime && add test on output for backends~~ [WIP] G-API: fix OCL backend issue with UMat lifetime && add test on output for backends Jul 8, 2019

Use proper clean-up procedure for magazine

ae8d598

andrey-golubev changed the title ~~[WIP] G-API: fix OCL backend issue with UMat lifetime && add test on output for backends~~ G-API: fix OCL backend issue with UMat lifetime && add test on output for backends Jul 8, 2019

andrey-golubev requested a review from rgarnov July 8, 2019 08:21

dmatveev reviewed Jul 8, 2019

View reviewed changes

dbudniko approved these changes Jul 9, 2019

View reviewed changes

Rename InitOut test and reduce tested sizes

1f10c5f

andrey-golubev changed the title ~~G-API: fix OCL backend issue with UMat lifetime && add test on output for backends~~ G-API: fix OCL backend issue with UMat lifetime Jul 9, 2019

andrey-golubev requested a review from dmatveev July 9, 2019 14:40

dmatveev approved these changes Jul 16, 2019

View reviewed changes

andrey-golubev closed this Jul 17, 2019

andrey-golubev reopened this Jul 17, 2019

andrey-golubev added 2 commits July 17, 2019 12:55

Merge remote-tracking branch 'upstream/master' into gapi_fix_ocl_umat

d910ddc

Enable output allocation test

aa83353

andrey-golubev changed the title ~~G-API: fix OCL backend issue with UMat lifetime~~ [WIP] G-API: fix OCL backend issue with UMat lifetime Jul 17, 2019

andrey-golubev changed the title ~~[WIP] G-API: fix OCL backend issue with UMat lifetime~~ G-API: fix OCL backend issue with UMat lifetime Jul 24, 2019

Merge branch 'master' of https://github.com/opencv/opencv into gapi_f…

001a9e5

…ix_ocl_umat

alalek assigned dmatveev Jul 24, 2019

alalek merged commit b10ec8e into opencv:master Jul 24, 2019

andrey-golubev deleted the gapi_fix_ocl_umat branch July 25, 2019 09:17

Uh oh!

Conversation

andrey-golubev commented Jul 5, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

This pullrequest

The issue can be described in a code snippet

Uh oh!

andrey-golubev commented Jul 5, 2019

Uh oh!

andrey-golubev commented Jul 8, 2019

Uh oh!

dmatveev commented Jul 8, 2019

Uh oh!

Uh oh!

dmatveev Jul 8, 2019

Choose a reason for hiding this comment

Uh oh!

dbudniko left a comment

Choose a reason for hiding this comment

Uh oh!

andrey-golubev commented Jul 9, 2019

Uh oh!

dmatveev commented Jul 11, 2019

Uh oh!

andrey-golubev commented Jul 12, 2019

Uh oh!

dmatveev commented Jul 15, 2019

Uh oh!

andrey-golubev commented Jul 15, 2019

Uh oh!

andrey-golubev commented Jul 15, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

dmatveev commented Jul 16, 2019

Uh oh!

dmatveev left a comment

Choose a reason for hiding this comment

Uh oh!

dmatveev Jul 16, 2019

Choose a reason for hiding this comment

Uh oh!

andrey-golubev Jul 17, 2019

Choose a reason for hiding this comment

Uh oh!

dmatveev Jul 16, 2019

Choose a reason for hiding this comment

Uh oh!

andrey-golubev Jul 17, 2019

Choose a reason for hiding this comment

Uh oh!

andrey-golubev commented Jul 17, 2019

Uh oh!

andrey-golubev commented Jul 17, 2019

Uh oh!

dmatveev commented Jul 23, 2019

Uh oh!

andrey-golubev commented Jul 24, 2019

Uh oh!

andrey-golubev commented Jul 24, 2019

Uh oh!

andrey-golubev commented Jul 25, 2019

Uh oh!

dmatveev commented Jul 25, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

andrey-golubev commented Jul 5, 2019 •

edited

Loading

andrey-golubev commented Jul 15, 2019 •

edited

Loading