
DNN module with OpenCL needs clarification on context, queue, and SVM support #20583

@diablodale

Description


I found another issue while updating the OpenCL code in OpenCV: the DNN module does a few unsafe things that depend on OpenCL implementation internals. The DNN module should instead use the supported APIs, so that the OpenCL implementation stays separated from the modules that use it. This separation is OpenCV's overall design approach with its Impl classes.

These few issues only need clarification and agreement so I can move forward with the code. The code changes will be small and will make the DNN module safe for multi-threaded use with OpenCL context switching and with SVM memory.

System information (version)

  • OpenCV => 4.x
  • Operating System / Platform => all
  • Compiler => all

Detailed description

In general it is unsafe to assume a consistent OpenCL execution context (cl_context + cl_command_queue). Inadequate multi-threading in the module code and the fragility of OpenCV's threading lead to failures that I've exposed in my testing. In addition, because the context can change, device memory must stay associated with the execution context it was created in, so that later uses of it (e.g. submatrices, kernel runs) work.
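To make the hazard concrete, here is a minimal standard-C++ model. It is a sketch only: Context, Queue, ExecCtx, Buffer, setCurrent(), current() and usableOnThreadWith() are hypothetical stand-ins, not OpenCV or OpenCL types. Each thread has its own current execution context (roughly as OpenCV keeps a per-thread current OpenCL execution context), and each buffer records the context it was allocated in. A buffer created on one thread is only valid on another thread whose current context matches.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-ins for cl_context and cl_command_queue (ids only).
struct Context { int id; };
struct Queue   { int id; };

// Hypothetical execution context: the (context, queue) pair kernels run on.
struct ExecCtx { Context ctx; Queue queue; };

// Each thread has its own current execution context.
thread_local ExecCtx g_current{Context{0}, Queue{0}};

void setCurrent(const ExecCtx& e) { g_current = e; }
const ExecCtx& current() { return g_current; }

// A device buffer records the context it was allocated in.
struct Buffer { Context owner; };

Buffer allocate() { return Buffer{current().ctx}; }

// True if the buffer belongs to the calling thread's current context.
// Only the context matters for memory validity; the queue may differ.
bool usableHere(const Buffer& b) { return b.owner.id == current().ctx.id; }

// Runs usableHere() on a new thread that first switches to context `e`.
bool usableOnThreadWith(const Buffer& b, const ExecCtx& e) {
    bool ok = false;
    std::thread t([&] {
        setCurrent(e);
        ok = usableHere(b);
    });
    t.join();
    return ok;
}
```

Code that silently assumes "the" context would use the buffer on the second thread anyway; recording the owning context makes the mismatch detectable.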

OCL4DNNConvSpatial<float>::CreateSubBuffer and the related ocl::convertFromBuffer are problematic and need clarification.

CreateSubBuffer(): can it instead use the well-supported ROI constructor UMat(const UMat &m, const Range *ranges)?
If yes, then this whole function is unneeded and I'll change the calling code to use that constructor.
If no, what is the reason?
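The bookkeeping an ROI view needs is small. The sketch below models it in plain C++, assuming a dense N-D layout described by per-dimension sizes and byte steps (the way UMat describes its layout); Storage, View, makeDense() and roi() are hypothetical names, not OpenCV API. Taking a sub-range only adjusts the byte offset and the sizes, while the storage handle, and therefore its context and ownership, is shared unchanged with the parent.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical storage handle; in OpenCV this role is played by UMatData.
struct Storage { std::vector<unsigned char> bytes; };

// Hypothetical N-D view: shared storage plus offset, sizes and byte steps.
struct View {
    std::shared_ptr<Storage> data;   // shared with the parent, never copied
    std::size_t offset = 0;          // byte offset of element (0, ..., 0)
    std::vector<int> size;           // elements per dimension
    std::vector<std::size_t> step;   // bytes between consecutive indices
};

// Dense row-major view over a new allocation.
View makeDense(std::vector<int> size, std::size_t elemSize) {
    assert(!size.empty());
    View v;
    v.size = std::move(size);
    v.step.assign(v.size.size(), elemSize);
    for (int i = static_cast<int>(v.size.size()) - 2; i >= 0; --i)
        v.step[i] = v.step[i + 1] * static_cast<std::size_t>(v.size[i + 1]);
    v.data = std::make_shared<Storage>();
    v.data->bytes.resize(v.step[0] * static_cast<std::size_t>(v.size[0]));
    return v;
}

// Sub-view for half-open ranges [first, second) per dimension: only the
// offset and sizes change; storage and steps are shared with the parent.
View roi(const View& m, const std::vector<std::pair<int, int>>& ranges) {
    assert(ranges.size() == m.size.size());
    View r = m;
    for (std::size_t i = 0; i < ranges.size(); ++i) {
        const int start = ranges[i].first;
        const int end = ranges[i].second;
        assert(0 <= start && start <= end && end <= m.size[i]);
        r.offset += static_cast<std::size_t>(start) * m.step[i];
        r.size[i] = end - start;
    }
    return r;
}
```

Because no new buffer object is created, there is no second owner whose context has to be reconstructed.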

CreateSubBuffer() is not thread-safe with respect to the ExecContext, and it does not support SVM memory. A subbuffer cannot be created from SVM memory in the same way: SVM memory is simply an address directly accessible by the CPU, and one can offset the pointer as needed. More simply, one can use the standard ROI constructor UMat(const UMat &m, const Range *ranges) to create a submatrix and return it. This latter approach is easy, which is also why I prefer that this function use it universally.
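A minimal sketch of why SVM needs no sub-buffer object: a sub-region is just the base pointer plus an offset. Here ordinary heap memory stands in for an SVM allocation (real code would get the pointer from clSVMAlloc and would need a deleter that calls clSVMFree), and svmAllocate() and subRegion() are hypothetical helpers. The std::shared_ptr aliasing constructor keeps the whole allocation alive while the handle points into its middle.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for an SVM allocation: zero-initialized heap memory
// released with delete[]. Real SVM memory would be released with clSVMFree.
std::shared_ptr<float> svmAllocate(std::size_t count) {
    return std::shared_ptr<float>(new float[count](), std::default_delete<float[]>());
}

// Sub-region starting `firstElem` elements into `block`: no new allocation
// and no driver call, only pointer arithmetic. The aliasing constructor
// shares ownership with `block`, so the whole allocation stays alive as
// long as any sub-region does. The caller must keep firstElem in bounds.
std::shared_ptr<float> subRegion(const std::shared_ptr<float>& block, std::size_t firstElem) {
    return std::shared_ptr<float>(block, block.get() + firstElem);
}
```

Writes through the sub-region are visible through the parent, and dropping the parent handle does not free memory a sub-region still uses.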

ocl::convertFromBuffer() has an ownership issue. When CreateSubBuffer() calls it, no ownership information is passed along. So when ocl::convertFromBuffer() creates a new UMatData to contain the subbuffer, who owns it? In which ExecutionContext and/or queue was it created? This is needed so that later uses of this sub-UMat have the correct ExecutionContext (cl_context + cl_command_queue) for kernels, deallocation, etc.

Recommendation

If possible, I want the DNN module to use the ROI UMat constructor.

Otherwise, I could derive the cl_context by calling native OpenCL APIs, use CV_DbgAssert() to ensure it matches the current ExecContext::Context, and assume the current queue from the current ExecContext. Best case, this is exactly what the running code intended. Worst case, the assert fails and reports the problem to the running code earlier and more precisely than some later API trying to use invalid data. This is one more reason I prefer the ROI UMat constructor: all of this is handled automatically.
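A sketch of that fallback, again with hypothetical stand-ins rather than OpenCV or OpenCL types. NativeBuffer models a cl_mem whose owning context can be queried (in real OpenCL that query is clGetMemObjectInfo with CL_MEM_CONTEXT), ExecCtx models the current execution context, WrappedBuffer plays the role of the new UMatData, and assert plays the role of CV_DbgAssert. queryContext(), belongsTo() and wrapBuffer() are illustrative names only.

```cpp
#include <cassert>

// Hypothetical handles; only ids are modeled.
struct Context { int id; };
struct Queue   { int id; };
struct ExecCtx { Context ctx; Queue queue; };

// Models a cl_mem: the native object knows which context it belongs to.
struct NativeBuffer { Context ctx; };

// Models the wrapper that later kernels and deallocation consult.
struct WrappedBuffer {
    const NativeBuffer* native;
    Context ctx;   // derived from the native object, not assumed
    Queue queue;   // taken from the caller's current execution context
};

// Stand-in for clGetMemObjectInfo(mem, CL_MEM_CONTEXT, ...).
Context queryContext(const NativeBuffer& b) { return b.ctx; }

bool belongsTo(const NativeBuffer& b, const ExecCtx& e) {
    return queryContext(b).id == e.ctx.id;
}

// Derive the context, check it against the current one, adopt the current queue.
WrappedBuffer wrapBuffer(const NativeBuffer& b, const ExecCtx& current) {
    // Debug-only check: a mismatch is reported here, at wrap time,
    // instead of later when a kernel or deallocation uses the buffer.
    assert(belongsTo(b, current));
    return WrappedBuffer{&b, queryContext(b), current.queue};
}
```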

Across OpenCV itself and the contrib modules, ocl::convertFromBuffer() is used only in this one DNN location and in one sample. It is a public API, so we can't remove it. Therefore I still need to fix ocl::convertFromBuffer() for its unknown users, and I recommend the context derivation, assert, and queue derivation described directly above.

Repro steps

Run the standard DNN accuracy test DNNTestNetwork.AlexNet with OpenCL and SVM memory enabled.
It quickly asserts due to invalid OpenCL buffers during subbuffer creation and/or deallocation.

Issue submission checklist
  • I report the issue, it's not a question
  • I checked the problem with documentation, FAQ, open issues,
    forum.opencv.org, Stack Overflow, etc and have not found solution
  • I updated to latest OpenCV version and the issue is still there
  • There is reproducer code and related data files: videos, images, onnx, etc
