Add `padding='same'` mode to conv{1,2,3}d, nn.Conv{1,2,3}d #42190
peterbell10 wants to merge 3 commits into pytorch:master
Conversation
Force-pushed from 922b06f to 735a7f9
Force-pushed from 48ffdf3 to c2996d8
Okay, it turns out I was mistaken and native_functions.yaml does indeed support str. It was actually failing to compile because it won't accept a str padding without a default while the earlier arguments bias and stride do have defaults.
If padding is given a default value it becomes ambiguous with the int[1] overload and the C++ code fails to compile. At the same time, bias and stride need to have defaults in python for calls like conv1d(input, weight, padding='same'). I found an ugly workaround: define two extra overloads where padding is keyword-only and sits before all the defaulted arguments.
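The shape of that workaround can be illustrated with a purely hypothetical Python sketch (the real overloads are declared in native_functions.yaml and resolved in C++; the function names below are made up for illustration):

```python
# Hypothetical Python sketch of the overload situation; the real
# signatures live in native_functions.yaml and C++, not here.

# Existing overload: numeric padding, with defaults on bias/stride/padding.
def conv1d_int(input, weight, bias=None, stride=1, padding=0):
    return ("int-overload", padding)

# Workaround overload: str padding is keyword-only and placed before the
# defaulted arguments, so it needs no default of its own and cannot be
# ambiguous with the int[1] overload's default.
def conv1d_str(input, weight, *, padding, bias=None, stride=1):
    return ("str-overload", padding)

def conv1d(input, weight, bias=None, stride=1, padding=0):
    # Dispatch on the type of `padding`, as the bindings effectively do.
    if isinstance(padding, str):
        return conv1d_str(input, weight, padding=padding,
                          bias=bias, stride=stride)
    return conv1d_int(input, weight, bias, stride, padding)
```

With this arrangement both `conv1d(input, weight, padding='same')` and the plain numeric call resolve cleanly while `bias` and `stride` keep their defaults.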
Force-pushed from 436104c to c836ae5
aten/src/ATen/native/mkldnn/Conv.cpp
Tensor.contiguous() here doesn't work for mkldnn tensors. There are also other places where the grads are all returned as CPU tensors and autograd will raise an error about expecting layout=Mkldnn.
It seems the mkldnn backend is not actually tested properly with mkldnn tensors as inputs. Is that not the expected use case?
The JIT schema changes look... pretty gnarly. We may need to figure out if there are some core changes we should make first to support this addition.
What do people think of the approach in https://github.com/pytorch/pytorch/pull/22484/files ?
Oh, I somehow missed that PR reading through the issue. The obvious problem is that by calculating the padding in python, there is no support in the C++ frontend.
Force-pushed from 573dfcb to c28fb97
Force-pushed from 6905d13 to 438bca6
@peterbell10 it's true that it doesn't cover the C++ frontend, but I think we should also have some consideration for the op library. Adding overloads like
We're going to talk about this at the composability meeting today
@ezyang what was the outcome of that discussion?
@bhosmer signed up to scribe here
Confirming - I'll put up a summary of our current thinking in the next day or so. [edit: next day or two 😬 ]
Hi @peterbell10, many apologies for the delayed recap. Here are the main points we arrived at in that discussion, along with some subsequent thoughts:
I think our next steps are:
Also cc @eellison - I don't think we gave your misgivings a full airing in our discussion; can you elaborate a little on the downside of adding the padding-mode overloads for
Force-pushed from 438bca6 to aea0c7f
Force-pushed from aea0c7f to 28ff463
I think I've found a nice solution in the latest commit. I noticed that the tracer stops at [...]. If I understand correctly, the tracer will stop at the first function call that isn't in the [...]
This PR was getting pretty large, with changes touching codegen, convolution, and mkldnn. So I've split it into a ghstack of PRs, the main one being #45667
First part of #3867 (Pooling operators still to do)
This adds a `padding='same'` mode to the interface of `conv{n}d` and `nn.Conv{n}d`. This should match the behaviour of tensorflow. I couldn't find it explicitly documented, but through experimentation I found tensorflow returns the shape `ceil(len/stride)` and always adds any extra asymmetric padding onto the right side of the input.

Since the `native_functions.yaml` schema doesn't seem to support strings or enums, I've moved the function interface into python and it now dispatches between the numerically padded `conv{n}d` and the `_conv{n}d_same` variant. Underscores because I couldn't see any way to avoid exporting a function into the `torch` namespace.

A note on asymmetric padding. The total padding required can be odd if both the kernel-length is even and the dilation is odd. mkldnn has native support for asymmetric padding, so there is no overhead there, but for other backends I resort to padding the input tensor by 1 on the right hand side to make the remaining padding symmetrical. In these cases, I use `TORCH_WARN_ONCE` to notify the user of the performance implications.
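The 'same' padding arithmetic described above can be sketched in plain Python (a hypothetical helper matching the observed tensorflow behaviour for one spatial dimension, not code from this PR):

```python
import math

def same_padding(length, kernel, stride=1, dilation=1):
    # Hypothetical helper: output length follows tensorflow's
    # ceil(len/stride) rule for 'same' padding.
    out_len = math.ceil(length / stride)
    effective_kernel = (kernel - 1) * dilation + 1
    total = max((out_len - 1) * stride + effective_kernel - length, 0)
    # Any odd leftover goes on the right, as tensorflow does.
    pad_left = total // 2
    pad_right = total - pad_left
    return pad_left, pad_right

# With stride=1 the total padding is (kernel - 1) * dilation, which is
# odd exactly when the kernel length is even and the dilation is odd:
same_padding(10, 3)  # symmetric: (1, 1)
same_padding(10, 4)  # asymmetric, extra on the right: (1, 2)
```

This is where the single extra right-hand pad comes from on backends without native asymmetric-padding support.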