Conv1D and Pool1D for CUDA backend #19058
Conversation
@YashasSamaga Can you please also review these changes? I also plan to extend the performance tests for the convolutions; could you advise me on that?
```cpp
#endif
    if (inputs[0].dims == 3)
    {
        //Pool1D
```
modules/dnn/src/layers/convolution_layer.cpp

Pool1D? In a convolution layer?

It is very unclear how this condition correlates with the comment. It looks like there are several assumptions that are simply not mentioned here; they should be part of the condition to make the code robust.
```cpp
    if (vector.size() > 1) \
        vector.pop_back();
    REMOVE_EXTRA_ELEM(kernel_size);
    REMOVE_EXTRA_ELEM(strides);
```
Why should we drop part of the information?

All these fields are initialized in the ctor BaseConvolutionLayerImpl().

This finalize() method can be called multiple times. On each finalize() call such fields should have the same state on entry; otherwise we would need code to handle different states (which is hard to test, so we must avoid modifying the "initial layer state"). It is unlikely that such a network would stay alive through a setUp() recall with updated parameters or an input reshape.

BTW, finalize() is not the last stage of the DNN "pipeline" before forward*().
> Why should we drop part of the information?

Because it was unnecessarily added in the first place. This code in layers_common.cpp is reached only when we load Caffe/Darknet networks or create OpenCV layers directly (as in test_layers.cpp). These changes in convolution_layer.cpp and pooling_layer.cpp will run only once and will delete the unnecessary elements from these vectors.
But does it work for Conv1D on CPU?
> But does it work for Conv1D on CPU?

Yes, it works, but for CUDA I need these changes because of checks like this.
So maybe we can move this closer to the CUDA-related part?
Dilations are defined for the spatial dimensions only. So logically, the number of dilation parameters must be equal to the number of spatial dimensions (which is equal to the convolution order). I think this assertion should stay, as it is correct, and this problem should be dealt with elsewhere.
I think that resize in layers_common.hpp is unnecessary (and is probably a bug). Somebody might have added it when OpenCV DNN supported 2D operations only (and it wasn't updated when 3D operations were introduced). It's presumably there for convenience so that the user can specify a single value for all spatial dimensions (the same padding for all dimensions, for example). But this is wrong for convolutions of a different order. It's wrong for 1D, as you needed to patch it using REMOVE_EXTRA_ELEM. It's also wrong for 3D. What if you needed a 3D convolution and provided a single value for padding? Resizing to two might trip some assertions.
If fixing that isn't possible, I think initCUDA is a more appropriate place to do this. The whole REMOVE_EXTRA_ELEM is an attempt to make the parameters stored in BaseConvolutionLayer compatible with the constructor of cuda4dnn::ConvolutionOp. Hence, this code should be at the site where the BaseConvolutionLayer parameters are transferred to ConvolutionOp (which is initCUDA).
Note that preprocessor symbols do not have scope. The REMOVE_EXTRA_ELEM spills to the global namespace. I think you forgot an #undef REMOVE_EXTRA_ELEM. A lambda function is more appropriate in this case (in my opinion).
This is a bug (tested on master dd1494e but likely affects 3.4 too). It was introduced in the 3D convolution PR (#14301). The intention was to broadcast parameters to all spatial dimensions if only one value was provided (the intent is mentioned in the review comments). The number of spatial dimensions was hardcoded to two for some reason.
MCVE (for 3D convolution):
```cpp
#include <opencv2/dnn.hpp>
#include <iterator> // std::size (C++17)

int main()
{
    cv::dnn::Net net;
    {
        cv::dnn::LayerParams lp;
        int ksizes[] = {3}; // expect a broadcast to {3, 3, 3}
        lp.set("kernel_size", cv::dnn::DictValue::arrayInt(ksizes, std::size(ksizes)));
        lp.set("num_output", 10);
        lp.set("bias_term", false);
        lp.type = "Convolution";
        lp.name = "testConv";
        int weightsShape[] = {10, 3, 3, 3, 3};
        cv::Mat weights(5, &weightsShape[0], CV_32F);
        randu(weights, -1.0, 1.0);
        lp.blobs.push_back(weights);
        net.addLayerToPrev(lp.name, lp.type, lp);
    }
    int inputShape[] = {1, 3, 5, 5, 5};
    cv::Mat input(5, &inputShape[0], CV_32F);
    randu(input, 1.0f, 2.0f);
    net.setInput(input);
    net.forward();
    return 0;
}
```
I was getting a floating point exception in getMemoryShapes (presumably due to a garbage value read by the OOB access). So I added the following assertion:
```diff
--- a/modules/dnn/src/layers/convolution_layer.cpp
+++ b/modules/dnn/src/layers/convolution_layer.cpp
@@ -367,7 +367,10 @@ public:
         if (padMode.empty())
         {
             for (int i = 0; i < inpShape.size(); i++)
+            {
+                CV_CheckLT(i, (int)kernel_size.size(), "OOB access in kernel size");
                 outShape.push_back((inpShape[i] + pads_begin[i] + pads_end[i] - dilations[i] * (kernel_size[i] - 1) - 1) / strides[i] + 1);
+            }
         }
```

Output:

```
terminate called after throwing an instance of 'cv::Exception'
  what(): OpenCV(4.5.1-pre) /fakepath/opencv/opencv4/modules/dnn/src/layers/convolution_layer.cpp:371: error: (-2:Unspecified error) in function 'virtual bool cv::dnn::ConvolutionLayerImpl::getMemoryShapes(const std::vector<std::vector<int> >&, int, std::vector<std::vector<int> >&, std::vector<std::vector<int> >&) const'
> OOB access in kernel size (expected: 'i < (int)kernel_size.size()'), where
>     'i' is 2
> must be less than
>     '(int)kernel_size.size()' is 2

Aborted (core dumped)
```
```diff
@@ -207,14 +222,17 @@ namespace cv { namespace dnn { namespace cuda4dnn { namespace kernels {
     }

     /* only max_pooling2d and max_pooling3d are supported */
```
This comment is stale/misleading after adding support for 1D max pooling.
```cpp
    std::size_t input_width = params.input_shape.back();
    params.input_shape.pop_back();
    params.input_shape.push_back(1);
    params.input_shape.push_back(input_width);
```
I don't have strong opinions about this, but I think doing a back(), a pop_back() and then two push_backs is more confusing/convoluted than a single insert call (which also makes the intention explicit). I think this four-line sequence can be written succinctly as:

```cpp
auto& input_shape = params.input_shape;
input_shape.insert(std::begin(input_shape), 1);
```
I don't have any idea how cuDNN handles 1D convolutions. The cuDNN 8 release notes hint that it supports, and has heuristics for, 1D convolutions. I don't know since when cuDNN started supporting 1D convolutions. Maybe cuDNN 8 can work with 3D tensors for convolution, or maybe it still requires 4D tensors but checks whether a 2D convolution reduces to a 1D convolution in its algorithm selection heuristics. I will add support for the cuDNN v8 API in the coming months (currently the v7 API is used with cuDNN 8 too). I'll look into 1D convolution performance then.
@YashasSamaga @alalek I have done some research and found that I need the changes in this PR to make 1D operations work correctly on CUDA without removing the code in layers_common.cpp. To remove that code we would need to edit quite a lot of code for Caffe, Darknet and other loaders, and also change several tests. I think those changes should be part of another PR.
```cpp
    if (backend == DNN_BACKEND_CUDA || backend == DNN_BACKEND_VKCOM) {
        applyTestTag(CV_TEST_TAG_DNN_SKIP_CUDA);   // not supported
        applyTestTag(CV_TEST_TAG_DNN_SKIP_VULKAN); // not supported
    }
```
```cpp
    if (pooling_order != 2 && pooling_order != 3)
        CV_Error(Error::StsNotImplemented, "Only 2D/3D max-pooling are supported.");
    if (pooling_order > 3)
```
Let's combine this condition with the assert above:

```diff
-    if (pooling_order > 3)
+    if (pooling_order < 1 || pooling_order > 3)
```
Conv1D and Pool1D for CUDA backend

* CUDA-independent changes
* Add Conv1D and Pool1D for CUDA backend
* CUDA-independent changes
* Fix typo
* fix comment
* Update fix
* make changes more correct for pooling layer
* Minor fixes for review
* Split skip blocks
relates: #18862 #18783
Enable 1D operations for CUDA.