Minimal implementation of GpuMatND #19259
Conversation
Changed the behavior of: these functions now return … Previously these created a header for … Therefore, it seems reasonable to add these two methods for creating a …
alalek left a comment:
Thank you for working on this!
Please take a look at the comments below.
```cpp
element of the submatrix, and it can be different from data_->data.
If this is not a submatrix, then data is always equal to data_->data.
*/
uchar* data;
```
Perhaps `size_t offset` would be more suitable.
`uchar* data` is private now, along with a new member `size_t offset`. I have also added a public member function `getDevicePtr()` to get the first byte.
```cpp
DevicePtr(DevicePtr&&) = delete;
DevicePtr& operator=(DevicePtr&&) = delete;

uchar* data;
```
`DevicePtr`: it makes sense to move this class to an upper level, make it more generic, and change its name to `GpuData` (like `UMatData`), `GpuBuffer`, or `GpuDataContainer`. In the future, we can reuse/share this "container" with the existing `GpuMat`.
Please add a `size_t size` field (size of the allocated buffer in bytes) to perform accurate validation checks.
`GpuData` is a global struct now with a new member `size_t size`.
asmorkalov left a comment:
Looks good to me in general. I added some comments for testing code in contrib. Manual test passed on Ubuntu 18.04 with NVIDIA GeForce 1080ti.
```cpp
@param _step Array of _size.size()-1 steps in case of a multi-dimensional array (the last step is always
set to the element size). If not specified, the matrix is assumed to be continuous.
*/
GpuMatND(SizeArray _size, int _type, void* _data, StepArray _step = StepArray());
```
@asmorkalov Does this ctor have some intersection with the similar `cv::Mat` ctor?
If so, then it makes sense to "wrap" (or move) `cv::Mat` instead of `void*` (perhaps out of the scope of this PR).
`_data` is GPU memory, isn't it?
The `upload` call handles the case with a regular `cv::Mat`.
The main point here is to use a named static function instead of a constructor (to avoid confusion):
`static GpuMatND wrapMemoryPtrGPU(...);`
```cpp
@param _type Array type. Use CV_8UC1, ..., CV_16FC4 to create 1-4 channel matrices, or
CV_8UC(n), ..., CV_64FC(n) to create multi-channel (up to CV_CN_MAX channels) matrices.
*/
GpuMatND(SizeArray _size, int _type);
```
`_size`, `_type`: no need to use underscores in declarations without implementation code. Let's keep the docs and bindings clear.
I'll do that in the next commit.
```cpp
using StepArray = std::vector<size_t>;
using IndexArray = std::vector<int>;

~GpuMatND() = default;
```
`= default`: it makes sense to have an (empty) implementation in the .cpp file for that; there are several non-trivial fields.
I'll move the definition to the .cpp file to hide the implementation.
```cpp
GpuMatND GpuMatND::clone() const
{
    CV_DbgAssert(!empty());
```
Why not just `return clone(Stream::Null());`?
(Reduce duplicated code, DRY principle.)
The one with the empty parameter list calls synchronous CUDA APIs (`cudaMemcpy` and `cudaMemcpy2D`), whereas the overload taking a stream argument calls asynchronous CUDA APIs (`cudaMemcpyAsync` and `cudaMemcpy2DAsync`).
`GpuMat::upload()` and `GpuMat::download()` also have overloads for asynchronous CUDA API calls.
However, I agree that we should reduce duplicated code, so I suggest keeping the two overloads while reducing duplication as much as possible.
I have fixed it, but please look at this comment below.
```cpp
/////////////////////////////////////////////////////
/// clone

static bool next(uchar*& d, const uchar*& s, std::vector<int>& idx, const int dims, const GpuMatND& dst, const GpuMatND& src)
```
This function signature does not look so good, but I'd like to keep it as is for now. Making an internal iterator class that iterates over each 2D plane seems a better design. I'll do that in the next PR, when I implement the `convertTo`, `copyTo`, and `setTo` member functions of `GpuMatND`. A straightforward implementation would be to iterate over each 2D plane of `GpuMatND` and apply the `GpuMat` counterparts; the iterator class would also fit well for this.
```cpp
do
{
    CV_CUDEV_SAFE_CALL(
        cudaMemcpy2D(
            d, ret.step[dims-2], s, step[dims-2],
            size[dims-1]*step[dims-1], size[dims-2], cudaMemcpyDeviceToDevice)
    );
}
while (next(d, s, idx, dims, ret, *this));
```
Did you check the performance of this loop?
If this is a synchronized call, then a sequence of several synchronized calls may show bad performance.
I believe this scheme may perform better:
- schedule async tasks
- wait for completion

(The code block still behaves as synchronized.)
I have changed `cudaMemcpy2D` to `cudaMemcpy2DAsync` and added `cudaStreamSynchronize(0)` to wait for completion.
Minimal implementation of GpuMatND

* GpuMatND - minimal implementation
* GpuMatND - createGpuMatHeader
* GpuMatND - GpuData, offset, getDevicePtr(), license
* reviews
* reviews
make cuda::GpuMatND compatible with InputArray/OutputArray #23913

Continuation of [PR#19259](#19259). Make `cuda::GpuMatND` wrappable in `InputArray`/`OutputArray`. The goal for now is just wrapping; some functions are not supported (`InputArray::size()`, `InputArray::convertTo()`, `InputArray::assign()`, ...). No new feature for `cuda::GpuMatND`.

- [X] I agree to contribute to the project under Apache 2 License.
- [X] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [X] The PR is proposed to the proper branch
- [X] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable. Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
For the first step toward resolving #15897 and #16433, this PR tries to implement a minimal set of functions and entities that a `GpuMatND` class should have.

This PR includes:
- `clone()` for deep copying, with the resulting array always being continuous
- methods for creating `Mat` / `GpuMat` headers

Tests are in opencv/opencv_contrib#2805.
A leak check was also performed with the test.
I hope this PR could be accepted as a minimal implementation.
Further things that should be added include:
- adding `GpuMatND` to the list of classes handled by the proxy classes (`InputArray` and `OutputArray`)
- `cuda::Stream` support
- `copyTo`, `convertTo`, `setTo`, as in `GpuMat`

I'm guessing that these can be made as a separate PR.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.