
Conv1D and Pool1D for CUDA backend#19058

Merged
alalek merged 9 commits into opencv:master from sl-sergei:cuda_1d
Jan 21, 2021

Conversation

@sl-sergei
Contributor

@sl-sergei sl-sergei commented Dec 8, 2020

relates: #18862 #18783

Enable 1D operations for CUDA.

force_builders=Custom,Custom Win,Custom Mac
build_image:Custom=ubuntu-openvino-2021.1.0:20.04
build_image:Custom Win=openvino-2021.1.0
build_image:Custom Mac=openvino-2021.1.0

test_modules:Custom=dnn,python2,python3,java
test_modules:Custom Win=dnn,python2,python3,java
test_modules:Custom Mac=dnn,python2,python3,java

buildworker:Custom=linux-1
# disabled due high memory usage: test_opencl:Custom=ON
test_opencl:Custom=OFF
test_bigdata:Custom=1
test_filter:Custom=*

@sl-sergei sl-sergei requested a review from alalek December 8, 2020 18:54
@sl-sergei
Contributor Author

@YashasSamaga Can you please also review these changes? I also plan to extend the performance tests for the convolutions; could you advise me on that?

#endif
if (inputs[0].dims == 3)
{
//Pool1D
Member

modules/dnn/src/layers/convolution_layer.cpp
Pool1D

Pool?

It is very unclear how this condition correlates with the comment.
It looks like there are several assumptions that are simply not mentioned here (they should be part of the condition to make the code robust).

if (vector.size() > 1)\
vector.pop_back();
REMOVE_EXTRA_ELEM(kernel_size);
REMOVE_EXTRA_ELEM(strides);
Member

Why should we drop part of the information?
All these fields are initialized in the ctor BaseConvolutionLayerImpl().
This finalize() method can be called multiple times.
On each finalize() call such fields should have the same state on entry; otherwise we would need code to handle different states (which is hard to test, so we must avoid modifying the "initial layer state").
It is unlikely that such a network would stay alive through a setUp() recall with updated parameters or an input reshape.

BTW, finalize() is not the last stage of the DNN "pipeline" before forward*().

Contributor Author
@sl-sergei sl-sergei Dec 10, 2020

Why should we drop part of the information?

Because it was unnecessarily added in

parameter.resize(2, parameter[0]);

This code in layers_common.cpp is reached only when we load Caffe/Darknet networks or create OpenCV layers directly (like in test_layers.cpp)

These changes in convolution_layer.cpp and pooling_layer.cpp will work only once and will delete unnecessary elements from these vectors

Member

But does it work for Conv1D on CPU?

Contributor Author

But does it work for Conv1D on CPU?

Yes, it works, but for CUDA I need these changes because of checks like this:

CV_Assert(convolution_order == dilations.size());

Member

So maybe we can move this closer to the CUDA-related part?

Contributor
@YashasSamaga YashasSamaga Dec 12, 2020

Dilations are defined for the spatial dimensions only. So logically, the number of dilation parameters must equal the number of spatial dimensions (which is the convolution order). I think this assertion should stay, as it is correct, and this problem should be dealt with elsewhere.

I think that resize in layers_common.hpp is unnecessary (and is probably a bug). Somebody might have added it when OpenCV DNN supported 2D operations only (and it wasn't updated when 3D operations were introduced). It's presumably there for convenience, so that the user can specify a single value for all spatial dimensions (the same padding for all dimensions, for example). But this is wrong for convolutions of a different order. It's wrong for 1D, as you needed to patch it using REMOVE_EXTRA_ELEM. It's also wrong for 3D. What if you needed a 3D convolution and provided a single value for padding? Resizing to two might trip some assertions.

If fixing that isn't possible, I think initCUDA is a more appropriate place to do this. The whole REMOVE_EXTRA_ELEM is an attempt to make parameters stored in BaseConvolutionLayer compatible with the constructor of cuda4dnn::ConvolutionOp. Hence, this code should be at the site where the BaseConvolutionLayer parameters are transferred to ConvolutionOp (which is initCUDA).

Contributor
@YashasSamaga YashasSamaga Dec 12, 2020

Note that preprocessor symbols do not have scope. The REMOVE_EXTRA_ELEM spills to the global namespace. I think you forgot an #undef REMOVE_EXTRA_ELEM. A lambda function is more appropriate in this case (in my opinion).

Contributor
@YashasSamaga YashasSamaga Dec 12, 2020

This is a bug (tested on master dd1494e but likely affects 3.4 too). It was introduced in the 3D convolution PR (#14301). The intention was to broadcast parameters to all spatial dimensions if only one value was provided (the intent is mentioned in the review comments). The number of spatial dimensions was hardcoded to two for some reason.

MCVE (for 3D convolution):

#include <opencv2/dnn.hpp>
#include <iterator> // for std::size

int main ()
{
    cv::dnn::Net net;

    {
        cv::dnn::LayerParams lp;

        int ksizes[] = {3}; // expect a broadcast to {3, 3, 3}
        lp.set("kernel_size", cv::dnn::DictValue::arrayInt(ksizes, std::size(ksizes)));

        lp.set("num_output", 10);
        lp.set("bias_term", false);
        lp.type = "Convolution";
        lp.name = "testConv";

        int weightsShape[] = {10, 3, 3, 3, 3};
        cv::Mat weights(5, &weightsShape[0], CV_32F);
        randu(weights, -1.0, 1.0);
        lp.blobs.push_back(weights);

        net.addLayerToPrev(lp.name, lp.type, lp);
    }

    int inputShape[] = {1, 3, 5, 5, 5};
    cv::Mat input(5, &inputShape[0], CV_32F);
    randu(input, 1.0f, 2.0f);

    net.setInput(input);
    net.forward();
    return 0;
}

I was getting a floating point exception in getMemoryShapes (presumably due to a garbage value from the OOB access). So I added the following assertion:

--- a/modules/dnn/src/layers/convolution_layer.cpp
+++ b/modules/dnn/src/layers/convolution_layer.cpp
@@ -367,7 +367,10 @@ public:
         if (padMode.empty())
         {
             for (int i = 0; i < inpShape.size(); i++)
+            {
+                CV_CheckLT(i, (int)kernel_size.size(), "OOB access in kernel size");
                 outShape.push_back((inpShape[i] + pads_begin[i] + pads_end[i] - dilations[i] * (kernel_size[i] - 1) - 1) / strides[i] + 1);
+            }
         }

Output:

terminate called after throwing an instance of 'cv::Exception'
  what():  OpenCV(4.5.1-pre) /fakepath/opencv/opencv4/modules/dnn/src/layers/convolution_layer.cpp:371: error: (-2:Unspecified error) in function 'virtual bool cv::dnn::ConvolutionLayerImpl::getMemoryShapes(const std::vector<std::vector<int> >&, int, std::vector<std::vector<int> >&, std::vector<std::vector<int> >&) const'
> OOB access in kernel size (expected: 'i < (int)kernel_size.size()'), where
>     'i' is 2
> must be less than
>     '(int)kernel_size.size()' is 2

Aborted (core dumped)

@@ -207,14 +222,17 @@ namespace cv { namespace dnn { namespace cuda4dnn { namespace kernels {
}

/* only max_pooling2d and max_pooling3d are supported */
Contributor

This comment is stale/misleading after adding support for 1D max pooling.

std::size_t input_width = params.input_shape.back();
params.input_shape.pop_back();
params.input_shape.push_back(1);
params.input_shape.push_back(input_width);
Contributor
@YashasSamaga YashasSamaga Dec 12, 2020

I don't have strong opinions about this but I think doing a back(), pop_back() and then two push_backs is more confusing/convoluted than a single insert call (which also makes the intention explicit). I think this four line sequence can be written succinctly as:

auto& input_shape = params.input_shape;
input_shape.insert(std::begin(input_shape), 1);

@YashasSamaga
Contributor

YashasSamaga commented Dec 12, 2020

I also plan to extend performance tests for the convolutions, could you advise me about it?

I don't have any idea how cuDNN handles 1D convolutions. The cuDNN 8 release notes hint that it supports and has heuristics for 1D convolutions. I don't know when cuDNN started supporting 1D convolutions. Maybe cuDNN 8 can work with 3D tensors for convolution, or maybe it still requires 4D tensors but checks whether a 2D convolution reduces to a 1D convolution in its algorithm selection heuristics.

I will add support for cuDNN v8 API in the coming months (currently the v7 API is used in cuDNN 8 too). I'll look into 1D convolution performance then.

@sl-sergei
Contributor Author

@YashasSamaga @alalek I have done some research and found that I need the changes in this PR to make 1D operations work correctly on CUDA without removing

parameter.resize(2, parameter[0]);

To remove this code in layers_common.cpp we will need to edit quite a lot of code for Caffe, Darknet and other loaders, and also change several tests. I think these changes should be a part of another PR.

@asmorkalov asmorkalov requested a review from alalek January 18, 2021 11:36
Comment on lines +123 to +126
if (backend == DNN_BACKEND_CUDA || backend == DNN_BACKEND_VKCOM) {
applyTestTag(CV_TEST_TAG_DNN_SKIP_CUDA); // not supported

applyTestTag(CV_TEST_TAG_DNN_SKIP_VULKAN); // not supported
}
Member

Do not merge "skip" blocks.


if (pooling_order != 2 && pooling_order != 3)
CV_Error(Error::StsNotImplemented, "Only 2D/3D max-pooling are supported.");
if (pooling_order > 3)
Contributor

Let's combine this condition with the assert above:

CV_Assert(pooling_order >= 1);

Suggested change
if (pooling_order > 3)
if (pooling_order < 1 || pooling_order > 3)

Contributor
@l-bat l-bat left a comment

Thank you 👍

@alalek alalek merged commit ea41f89 into opencv:master Jan 21, 2021
@alalek alalek mentioned this pull request Apr 9, 2021
a-sajjad72 pushed a commit to a-sajjad72/opencv that referenced this pull request Mar 30, 2023
Conv1D and Pool1D for CUDA backend

* CUDA-independent changes

* Add Conv1D and Pool1D for CUDA backend

* CUDA-independent changes

* Fix typo

* fix comment

* Update fix

* make changes more correct for pooling layer

* Minor fixes for review

* Split skip blocks

Labels

category: dnn
category: gpu/cuda (contrib) (OpenCV 4.0+: moved to opencv_contrib)


5 participants