Skip to content

G-API: fix OCL backend issue with UMat lifetime#14985

Merged
alalek merged 6 commits intoopencv:masterfrom
andrey-golubev:gapi_fix_ocl_umat
Jul 24, 2019
Merged

G-API: fix OCL backend issue with UMat lifetime#14985
alalek merged 6 commits intoopencv:masterfrom
andrey-golubev:gapi_fix_ocl_umat

Conversation

@andrey-golubev
Copy link
Copy Markdown
Member

@andrey-golubev andrey-golubev commented Jul 5, 2019

This pullrequest

  • Fixes issue with cv::UMat lifetime in OCL backend: output may "outlive" the cv::Mat parent (this issue is reproduced locally with the newly added test)
  • Adds test checking how G-API backends handle re-initialized output

The issue can be described in a code snippet

// assume we have some graph declared
cv::GComputation c(in, out);

// now we want to execute the computation
cv::Mat input_mat = getInput();
cv::Mat output_mat = cv::Mat(sz, type);
c.apply(input_mat, output_mat, cv::compile_args(cv::gapi::core::gpu::kernels())); // (1)

// now we want to execute the computation but our output_mat is changed by us (the user)
show(output_mat);  // assume output is used somewhere else e.g. shown on screen
output_mat = cv::Mat(sz, type); // (2)
c.apply(input_mat, output_mat, cv::compile_args(cv::gapi::core::gpu::kernels())); // (3)
// <segmentation fault>

let's investigate:

  • step (1) successfully executes the graph and result is written into output_mat.
    Internally, (since we use OpenCL backend) output data in form of UMat is still held by the backend. Note: we construct UMat inside the backend

  • step (2) we reset the output buffer, now remember that at step (1) there's a lonely UMat somewhere inside the G-API backend. Now what we have in this lonely UMat is a dangling pointer to the original data (UMatData::origdata) that still points to already re-allocated Mat::data. What's more: this original data pointer is also "captured" by OCL runtime since we explicitly set this UMatData::origdata as host_ptr to clCreateBuffer()

  • at step (3) comes the point of no return: we need to store newly arrived output data into internal container inside OpenCL backend to execute our graph and write result to that buffer. We do it via assignment operation and previously created UMat (that lonely guy from step (2)) is queued for destruction: but its host_ptr is dangling so when we do clReleaseMemObject() the UB starts. When we request new UMat from new output buffer (assuming this happens after "invalid" clReleaseMemObject()) there's a segmentation fault in clCreateBuffer() - this is the next call to OCL runtime in my case so it fails

Bottom line: we get segmentation fault inside OpenCL due to the following:

// <on allocation>:
// we set u->origdata as host_ptr
handle = clCreateBuffer(ctx_handle, CL_MEM_USE_HOST_PTR|createFlags,
    u->size, u->origdata, &retval);
u->handle = handle;
...
// between allocation and deallocation u->origdata becomes dangling
// because original cv::Mat is reinitialized
...
// <on deallocation>:
// we release internal OCL stuff and (very likely) access _dangling_ u->origdata <=> host_ptr
cl_int retval = clReleaseMemObject((cl_mem)u->handle); // the handle is the same

I am not sure exactly what happens inside the OpenCL but I see that void* data = clEnqueueMapBuffer(q, (cl_mem)u->handle, CL_TRUE, (CL_MAP_READ | CL_MAP_WRITE), 0, u->size, 0, 0, 0, &retval); returns the same (already dangling) pointer (there's explicit CV_Assert(u->origdata == data)).

So what we get after step (2) is the dangling pointer inside OpenCL and we explicitly ask OpenCL to clear the data by that pointer using the cl_mem handle => undefined behavior


I have the following configuration with which I am able to reproduce the issue:
(using Ubuntu 16.04.1 kernel 4.15.0-54-generic x86_64 GNU/Linux)

CTEST_FULL_OUTPUT
OpenCV version: 4.1.1-pre
OpenCV VCS version: 4.1.0-530-gae8d598-dirty
Build type: Release
Compiler: /usr/bin/c++  (ver 5.5.0)
Parallel framework: tbb
CPU features: SSE SSE2 SSE3 *SSE4.1 *SSE4.2 *FP16 *AVX *AVX2
Intel(R) IPP version: ippIP AVX2 (l9) 2019.0.0 Gold (-) Jul 24 2018
OpenCL Platforms: 
    Intel(R) OpenCL HD Graphics
        iGPU: Intel(R) Gen9 HD Graphics NEO (OpenCL 2.1 NEO )
Current OpenCL device: 
    Type = iGPU
    Name = Intel(R) Gen9 HD Graphics NEO
    Version = OpenCL 2.1 NEO 
    Driver version = 18.24.10921
    Address bits = 64
    Compute units = 24
    Max work group size = 256
    Local memory size = 64 KB
    Max memory allocation size = 3 GB 1023 MB 1016 KB
    Double support = Yes
    Host unified memory = Yes
    Device extensions:
        cl_khr_3d_image_writes
        cl_khr_byte_addressable_store
        cl_khr_fp16
        cl_khr_depth_images
        cl_khr_global_int32_base_atomics
        cl_khr_global_int32_extended_atomics
        cl_khr_icd
        cl_khr_image2d_from_buffer
        cl_khr_local_int32_base_atomics
        cl_khr_local_int32_extended_atomics
        cl_intel_subgroups
        cl_intel_required_subgroup_size
        cl_intel_subgroups_short
        cl_khr_spir
        cl_intel_accelerator
        cl_intel_media_block_io
        cl_intel_driver_diagnostics
        cl_intel_device_side_avc_motion_estimation
        cl_khr_priority_hints
        cl_khr_throttle_hints
        cl_khr_create_command_queue
        cl_khr_fp64
        cl_khr_subgroups
        cl_khr_il_program
        cl_khr_mipmap_image
        cl_khr_mipmap_image_writes
        cl_intel_planar_yuv
        cl_intel_packed_yuv
        cl_intel_motion_estimation
        cl_intel_advanced_motion_estimation
        cl_intel_va_api_media_sharing
    Has AMD Blas = No
    Has AMD Fft = No
    Preferred vector width char = 16
    Preferred vector width short = 8
    Preferred vector width int = 4
    Preferred vector width long = 1
    Preferred vector width float = 1
    Preferred vector width double = 1

Add tests on initialized/uninitialized outputs for all
backends
@andrey-golubev
Copy link
Copy Markdown
Member Author

@dmatveev @dbudniko @rgarnov @anton-potapov please review

@andrey-golubev andrey-golubev requested review from dmatveev and rgarnov and removed request for rgarnov July 8, 2019 06:32
@andrey-golubev andrey-golubev changed the title G-API: fix OCL backend issue with UMat lifetime && add test on output for backends [WIP] G-API: fix OCL backend issue with UMat lifetime && add test on output for backends Jul 8, 2019
@andrey-golubev
Copy link
Copy Markdown
Member Author

Tagged "WIP": need to investigate whether clean up procedure is valid

@andrey-golubev andrey-golubev changed the title [WIP] G-API: fix OCL backend issue with UMat lifetime && add test on output for backends G-API: fix OCL backend issue with UMat lifetime && add test on output for backends Jul 8, 2019
@andrey-golubev andrey-golubev requested a review from rgarnov July 8, 2019 08:21
@dmatveev
Copy link
Copy Markdown
Contributor

dmatveev commented Jul 8, 2019

Please ask @dbudniko, @rgarnov , and maybe @alalek to review the bug/fix.

cv::Size(250, 320), // out size < in size
cv::Size(1920, 1080), // out size > in size
cv::Size(640, 400), // out width = in width
cv::Size(10, 480) // out height = in height
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing the above contracts by passing a cartesian product of combinations is not the right way to go. Every test should be expessed separately to clearly show what it means.

Copy link
Copy Markdown

@dbudniko dbudniko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix tests according to @dmatveev review.

@andrey-golubev
Copy link
Copy Markdown
Member Author

@dmatveev let's make a proper test for output in a different PR, from your feedback it looks like quite a lot of code that can be better reviewed separately. I reduced this test to only reproduce the issue and it takes ~300 ms for this "test suite" to complete (all backends), so I guess we're safe to add it "as is" as an additional coverage

@andrey-golubev andrey-golubev changed the title G-API: fix OCL backend issue with UMat lifetime && add test on output for backends G-API: fix OCL backend issue with UMat lifetime Jul 9, 2019
@andrey-golubev andrey-golubev requested a review from dmatveev July 9, 2019 14:40
@dmatveev
Copy link
Copy Markdown
Contributor

I see some bug fixed here but can't find any description on that.
What is the root cause of the problem, when it is reproduced, and how it was fixed.

Please add it here.

@andrey-golubev
Copy link
Copy Markdown
Member Author

@dmatveev added the issue to the description

@dmatveev
Copy link
Copy Markdown
Contributor

@andrey-golubev thanks! the description is now really nice & detailed.

at (3) since we do c.apply(...) for the second time and input stays unchanged, the backend is not "re-created" (destroyed and constructed).
Then comes the point of no return: we need store newly arrived output data into internal container inside OpenCL backend to execute our graph and write result to that buffer. But, when we request new UMat from new output buffer, there's a segmentation fault in clCreateBuffer(...)

So according to this text, we do not pass the existing dangling UMat to OCL for execution but instead create a new one for the (changed) output buffer? So why it crashes then?

@andrey-golubev
Copy link
Copy Markdown
Member Author

@dmatveev I think step (3) is misleading: will try to re-phrase.

We have this dangling pointer implicitly already in OCL (at step (2)) because we set it to OCL runtime with clCreateBuffer(..., CL_MEM_USE_HOST_PTR, ...) on first call to c.apply().

Now on the second call, before creating new UMat, we release the previous one: as we release it with cl_mem handle that "remembered" the original pointer (which is now dangling), there's something very bad happening inside OCL when we request clReleaseMemObject(handle) (UB so to say): and on next call to OCL we get a segmentation fault (which happened to be clCreateBuffer in my case)

@andrey-golubev
Copy link
Copy Markdown
Member Author

andrey-golubev commented Jul 15, 2019

Maybe @alalek can add up to my explanation

@dmatveev
Copy link
Copy Markdown
Contributor

Now on the second call, before creating new UMat, we release the previous one: as we release it with cl_mem handle that "remembered" the original pointer (which is now dangling), there's something very bad happening inside OCL when we request clReleaseMemObject(handle) (UB so to say): and on next call to OCL we get a segmentation fault (which happened to be clCreateBuffer in my case)

Thanks, now it's clear. So if UMat/Mat relationship is designed this way, it seems the only proper thing to do is to drop our input/output UMat objects after execution?

Copy link
Copy Markdown
Contributor

@dmatveev dmatveev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did u measure the performance on master against this? How bad is this now? Can u share a diff please? (there's an OpenCV script for that)

// For example, if the original output (cv::Mat) is re-initialized by the user but we still
// hold cv::UMat -> we get cv::UMat that has a parent that was already destroyed. Also,
// since we don't own the data (the user does), there's no point holding it after we're done
const auto clean_up = [&input_objs, &output_objs] (cv::gimpl::Mag* p)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

y * not &?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

due to std::unique_ptr usage below: pointer to object is expected as an input/output parameter

for (auto& it : output_objs) umats.erase(it.first.id);
};
// RAII wrapper to clean-up m_res
std::unique_ptr<cv::gimpl::Mag, decltype(clean_up)> cleaner(&m_res, clean_up);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jez c y. thx!

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand what you mean here. Figured out that std::unique_ptr with custom deleter is less boilerplate than custom Mag wrapper class

@andrey-golubev
Copy link
Copy Markdown
Member Author

@dmatveev answered you below:

Thanks, now it's clear. So if UMat/Mat relationship is designed this way, it seems the only proper thing to do is to drop our input/output UMat objects after execution?

Since we create UMats for input/output internally and in place, yes, I think right now it's the only proper option (the other would be to allow user to pass UMats as input/output but it's a different story).
As a side note: we unconditionally create input/output UMats each time in run() when doing bindInArgs/bindOutArgs so the current logic clearly states that input/output UMats are only required for single run() execution

As for the perf. measurements: didn't check yet, will do and share

@andrey-golubev andrey-golubev changed the title G-API: fix OCL backend issue with UMat lifetime [WIP] G-API: fix OCL backend issue with UMat lifetime Jul 17, 2019
@andrey-golubev
Copy link
Copy Markdown
Member Author

Marked as "WIP": please do not merge yet. I have one more rabbit hole to go down

@dmatveev
Copy link
Copy Markdown
Contributor

Didn't we agree to day to merge this stuff?
Please file a task to understand further performance implications & attach this performance data there.

Thanks! 👍

@andrey-golubev andrey-golubev changed the title [WIP] G-API: fix OCL backend issue with UMat lifetime G-API: fix OCL backend issue with UMat lifetime Jul 24, 2019
@andrey-golubev
Copy link
Copy Markdown
Member Author

@dmatveev indeed we agreed. Will add a task to our tracking system. Going to check that everything works with latest master and we can merge after that

@andrey-golubev
Copy link
Copy Markdown
Member Author

@alalek please feel free to merge

@alalek alalek merged commit b10ec8e into opencv:master Jul 24, 2019
@andrey-golubev andrey-golubev deleted the gapi_fix_ocl_umat branch July 25, 2019 09:17
@andrey-golubev
Copy link
Copy Markdown
Member Author

Thanks!

1 similar comment
@dmatveev
Copy link
Copy Markdown
Contributor

Thanks!

dvd42 pushed a commit to dvd42/opencv that referenced this pull request Aug 6, 2019
* G-API: fix GOCLExecutable issue with UMat lifetime

Add tests on initialized/uninitialized outputs for all
backends

* Use proper clean-up procedure for magazine

* Rename InitOut test and reduce tested sizes

* Enable output allocation test
a-sajjad72 pushed a commit to a-sajjad72/opencv that referenced this pull request Mar 30, 2023
* G-API: fix GOCLExecutable issue with UMat lifetime

Add tests on initialized/uninitialized outputs for all
backends

* Use proper clean-up procedure for magazine

* Rename InitOut test and reduce tested sizes

* Enable output allocation test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants