Description
When UMat objects are created, they increment the refcount on the ocl::Context that was bound via bind() at the time of their creation. Unfortunately, they:
- do not decrement that refcount when the UMat goes out of scope
- do not decrement it when a different Context is bound via OpenCLExecutionContext.bind()
As a result, the original Context::Impl (the one bound before the new bind()) keeps many refcounts and is not destroyed, even though nothing logically needs it anymore (everything is out of scope and a new Context has been bound).
Also, all the UMats (or their underlying pool of GPU-based data objects) that were created before the bind() are still associated with the original GPU. They retain resources there and could lead to faults if those UMat data objects are used against the newly bound GPU.
All of this was found while debug tracing for #18906.
System information (version)
- OpenCV => 4.5.0
- Operating System / Platform => Windows 10 64-Bit
- Compiler => Visual Studio Community 2019 v16.8.2
Steps to reproduce
- Set up and compile the master branch (the 4.5.0 tag is fine) as a Debug build
- Set up your debugger to run
opencv_test_cored.exe --gtest_filter=OCL_OpenCL*
- Set a breakpoint at the following line
opencv/modules/core/test/test_opencl.cpp
Line 113 in 85b0fb2
| OpenCLExecutionContext ctx = OpenCLExecutionContext::getCurrent(); |
- Start debugging
- When it stops at the above line 113, set a new debug break at the following line (or step the debugger several functions deep until you reach this line)
opencv/modules/core/src/ocl.cpp
Line 2413 in 85b0fb2
| if (configuration.empty() && !container.empty()) |
- Look at the variable container: its size, and the Context::Impl objects that are stored within it.
Result
Container has 1 item. It is the default ExecutionContext created on line 29 of the test case file.
That one Impl has 5 ref counts.
Expected
Container has 1 item. And that 1 item has 1 ref count.
Review the test case code itself between that line and line 113. All objects are out of scope. The only thing still in scope that needs this Context::Impl is the global cv::ocl system, because that context is the one currently bound via bind().
Notes
I suspect these additional 4 refcounts are due to the UMat. The ocl::Context variable that is created in some scopes correctly decrements the refcount when it is destroyed at the scope's end. I also suspect it is the UMat because when I remove calls to the executeUMatCall(), I have the refcount I would expect.
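To make the suspicion above concrete, here is a toy C++ model (standard library only, not OpenCV code) of one way a context could end up with 5 references after four UMat-like objects have gone out of scope. It assumes that each released buffer is parked in a pool that keeps its own reference to the context. ContextImpl, BufferPool, ToyUMat and refsAfterScope are hypothetical names, and whether ocl.cpp actually behaves this way is not confirmed.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for cv::ocl::Context::Impl; not OpenCV code.
struct ContextImpl { int deviceId; };
using ContextRef = std::shared_ptr<ContextImpl>;

// Assumption: a buffer pool outlives the UMat-like objects and keeps one
// strong reference to the owning context per cached allocation.
struct BufferPool {
    std::vector<ContextRef> cached;
    void recycle(const ContextRef& owner) { cached.push_back(owner); }
    void clear() { cached.clear(); }
};

// Hypothetical UMat-like object: when it is destroyed, its allocation goes
// back to the pool together with a reference to its context.
class ToyUMat {
public:
    ToyUMat(ContextRef ctx, BufferPool& pool)
        : ctx_(std::move(ctx)), pool_(pool) {
        assert(ctx_ != nullptr);
    }
    ~ToyUMat() { pool_.recycle(ctx_); }
    ToyUMat(const ToyUMat&) = delete;
    ToyUMat& operator=(const ToyUMat&) = delete;
private:
    ContextRef ctx_;
    BufferPool& pool_;
};

// Creates and destroys `umats` toy objects one after another, then reports
// how many references to the bound context are still alive.
long refsAfterScope(const ContextRef& bound, BufferPool& pool, int umats) {
    for (int i = 0; i < umats; ++i) {
        ToyUMat m(bound, pool);  // destroyed at the end of each iteration
    }
    return bound.use_count();
}
```

In this model, with one reference held by the caller (playing the role of the global binding) and four toy UMats created and destroyed, refsAfterScope returns 5. After pool.clear() the count drops back to 1, the value the Expected section asks for.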
If this behavior is desired, then I would urge caution with any use of UMat and OpenCLExecutionContext in test cases. Why? Because this behavior lets earlier test cases change the state of both individual Contexts and the global Context collection, and these changes persist across test cases. It may therefore surface (or hide) issues that the test cases are trying to test.
I am also concerned about the integrity of UMats before/after a bind() that changes the GPU device. UMats created before the bind() will still be associated with the old GPU. New UMats, and all OCL-enabled functions, will expect only the new GPU. So what happens when you:
- create an OpenCLExecutionContext on a default GPU and bind() it (or let this happen automatically)
- create some UMats
- create a new OpenCLExecutionContext on a different GPU and bind() it
- call a function like cv::remap() with old UMats, or a mix of old and new UMats
This area of concern is perhaps supported by something @alalek wrote:
Unfortunately UMat doesn't support context sharing/migration at all. It is assumed that UMat is created, used and destroyed with the same active OpenCL context. UMat requires redesign to support multiple OpenCL contexts.
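The scenario in the list above can be sketched with another toy C++ model (standard library only). It does not show what OpenCV does in this situation, which is the open question. It only illustrates the mismatch: a buffer remembers the context that was bound when it was created, and a check reports whether all inputs still belong to the currently bound one. ToyContext, ToyBuffer, bindContext and allOnBoundContext are hypothetical names, not OpenCV API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an OpenCL context tied to one GPU device.
struct ToyContext { int deviceId; };

// Plays the role of the globally bound context that bind() would set.
static const ToyContext* g_bound = nullptr;

void bindContext(const ToyContext& ctx) { g_bound = &ctx; }

// A buffer records the context that was bound at the time of its creation.
struct ToyBuffer {
    const ToyContext* owner;
    ToyBuffer() : owner(g_bound) { assert(owner != nullptr); }
};

// True only if every input was created under the currently bound context,
// i.e. the usage pattern the quoted comment says UMat assumes.
bool allOnBoundContext(const std::vector<const ToyBuffer*>& inputs) {
    for (const ToyBuffer* b : inputs) {
        if (b->owner != g_bound) return false;
    }
    return true;
}
```

Binding a first context, creating a buffer, then binding a second context and creating another buffer makes the check fail for the old buffer alone and for the old+new mix, while it still passes for the new buffer alone.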
Issue submission checklist
- I report the issue, it's not a question
- I checked the problem with documentation, FAQ, open issues, answers.opencv.org, Stack Overflow, etc and have not found solution
- I updated to latest OpenCV version and the issue is still there
- There is reproducer code and related data files: videos, images, onnx, etc