add feature: cuda::Stream(cudaStream_t&&) constructor #19285
Description
I propose adding a cuda::Stream constructor that takes ownership of a cudaStream_t, so that when the Stream is destructed it also calls cudaStreamDestroy() on that cudaStream_t. This proposal is driven by the needs of a multi-threaded application that decodes image frames using NVIDIA's decoding SDKs, such as nvJPEG.
System information (version)
- OpenCV => 4.5.1
- Operating System / Platform => Microsoft Windows [Version 10.0.19042.685] 64-bit
- Compiler => Visual Studio 2019 Community v16.8.3
Detailed description
- Each thread needs its own CUDA stream. It can use the single CUDA default stream or create unique CUDA streams per thread. As the OpenCV documentation says, "Please use different Stream objects for different CPU threads."
- It is not possible in OpenCV to construct a cuda::Stream from a cudaStream_t. There is no way to write new cuda::Stream(cudaStream_t).
- cuda::StreamAccessor::wrapStream(cudaStream_t stream) is not a constructor; it is a static function. Therefore, it cannot be used to construct a cuda::Stream and return a pointer to it. It cannot be used with shared_ptr, concurrency::combinable, etc.
- It is unreasonable to create a shared_ptr with a lambda as a custom deleter, wrapping a custom struct created with new that holds the Stream returned from wrapStream(), and so on. This is not a good approach.
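To make the last point concrete, here is a rough sketch of what that workaround tends to look like. It is not OpenCV or CUDA code: FakeCudaStream, fakeStreamCreate, fakeStreamDestroy, WrappedStream, and fakeWrapStream are hypothetical stand-ins so the sketch compiles on its own, and it assumes the Stream returned from wrapStream() does not destroy the raw handle itself.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins so the sketch compiles without CUDA or OpenCV.
using FakeCudaStream = int*;          // plays the role of cudaStream_t
static int g_liveStreams = 0;         // raw streams created and not yet destroyed

FakeCudaStream fakeStreamCreate() { ++g_liveStreams; return new int(0); }
void fakeStreamDestroy(FakeCudaStream s) { --g_liveStreams; delete s; }

// Plays the role of the Stream returned from wrapStream(): it refers to
// the raw handle but (by assumption) does not own it.
struct WrappedStream { FakeCudaStream raw; };

WrappedStream fakeWrapStream(FakeCudaStream s) { return WrappedStream{s}; }

// The workaround: a heap-allocated holder plus a custom deleter that has
// to destroy the raw stream by hand before freeing the holder.
std::shared_ptr<WrappedStream> makeSharedWrappedStream() {
    FakeCudaStream raw = fakeStreamCreate();
    return std::shared_ptr<WrappedStream>(
        new WrappedStream(fakeWrapStream(raw)),
        [](WrappedStream* holder) {
            fakeStreamDestroy(holder->raw);  // manual cleanup of the raw handle
            delete holder;                   // then free the holder itself
        });
}
```

Every call site that needs an owned stream has to repeat or share this deleter logic, which is the bookkeeping an owning constructor would remove.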
Therefore, I propose a new constructor for cuda::Stream. It is a small and focused new
feature, easy to use in code, and meets all my needs. Use of C++11 move semantics makes
it clear that ownership of the cudaStream_t is passed to the cuda::Stream. And when the
Stream is destructed, it cleanly deallocates the cudaStream_t using already existing code.
Personally, I would prefer that the constructor accept a cudaStream_t. However, it is
abundantly clear that the authors of the cuda module do not want to include the needed
CUDA headers in that module. Therefore, the constructor accepts a void*. This also
aligns with the return type of Stream::cudaPtr().
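The ownership hand-over described above can be illustrated with a small standalone sketch. This is not the code from the PR: OwningStream, its Impl, and fakeStreamDestroy are hypothetical names, and the shared Impl is only assumed to mirror how cv::cuda::Stream copies share one implementation object.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for cudaStreamDestroy; counts calls for inspection.
static int g_destroyed = 0;
void fakeStreamDestroy(void* s) { ++g_destroyed; delete static_cast<int*>(s); }

// Hypothetical model of a stream class with an owning void*&& constructor.
class OwningStream {
public:
    // An rvalue reference parameter forces callers to pass a temporary or
    // std::move(...), which makes the transfer of ownership visible.
    explicit OwningStream(void*&& raw)
        : impl_(std::make_shared<Impl>(raw)) { raw = nullptr; }

    void* cudaPtr() const { return impl_->raw; }

private:
    struct Impl {
        explicit Impl(void* r) : raw(r) {}
        ~Impl() { if (raw) fakeStreamDestroy(raw); }  // destroy exactly once
        Impl(const Impl&) = delete;
        Impl& operator=(const Impl&) = delete;
        void* raw;
    };
    std::shared_ptr<Impl> impl_;  // copies of OwningStream share one Impl
};
```

With this shape, copying the wrapper is cheap and the raw handle is destroyed only when the last copy goes away. Nulling the caller's pointer in the constructor is a choice made for this sketch, not something the proposal specifies.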
I have a PR ready and will submit it after this issue is written. It includes the new
constructor, expands the signature of the Stream::Impl as is needed, includes a test,
and documentation.
Proposed signature and usage
Imagine an app that has a pool of threads. Perhaps it is a pipeline, flow graph, or other mechanism that does not manually create a thread with std::thread(func()). Threads are created, destroyed, and scaled up or down automatically by a managed system. Yet every thread needs to initialize some thread-local storage/data, such as a CUDA stream. This creates a need to wrap that data in a class, use TLS tools like concurrency::combinable to initialize it only once, and so on, which in turn requires a Stream constructor that supports this.
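As a rough illustration of the once-per-thread initialization in this scenario, the sketch below uses standard C++ thread_local storage in place of concurrency::combinable. PerThreadStream, threadLocalStream, and runWorkers are hypothetical names, and the struct only counts constructions; in a real application it would hold an owned CUDA stream.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Counts how many per-thread objects have been constructed so far.
static std::atomic<int> g_created{0};

// Hypothetical placeholder for per-thread data such as an owned CUDA stream.
struct PerThreadStream {
    PerThreadStream() : id(++g_created) {}
    int id;
};

// Returns this thread's instance, constructing it on the first call only.
PerThreadStream& threadLocalStream() {
    thread_local PerThreadStream stream;
    return stream;
}

// Starts `threads` workers that each access their stream `calls` times and
// returns how many instances were constructed during the run.
int runWorkers(int threads, int calls) {
    const int before = g_created.load();
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([calls] {
            for (int i = 0; i < calls; ++i) threadLocalStream();
        });
    for (auto& th : pool) th.join();
    return g_created.load() - before;
}
```

However many times a worker asks for its stream, it is constructed once per thread, and that is the point where an owning constructor would be called.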
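    // Proposed constructor: takes ownership of the CUDA stream passed as void*.
    cv::cuda::Stream::Stream(void*&& cudaStream);

    // Usage: create a non-blocking CUDA stream and hand ownership to cv::cuda::Stream.
    // Requires <stdexcept>, <utility>, <cuda_runtime.h>, and <opencv2/core/cuda.hpp>.
    cv::cuda::Stream threadInitLocalData() {
        cudaStream_t stream;
        if (cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking) != cudaSuccess)
            throw std::runtime_error("Can not create CUDA stream");
        return cv::cuda::Stream(std::move((void*)stream));
    }

Issue submission checklist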
- I report the issue, it's not a question
- I checked the problem with documentation, FAQ, open issues, answers.opencv.org, Stack Overflow, etc and have not found solution
- I updated to latest OpenCV version and the issue is still there
- There is reproducer code and related data files: videos, images, onnx, etc