Use NNPACK for strided convolutions. #27402
AshkanAliabadi wants to merge 16 commits into gh/AshkanAliabadi/3/base
Conversation
  if (params.use_cpu_depthwise3x3_winograd(input, weight)) {
    output = convolution_depthwise3x3_winograd_stub(
-       input.device().type(), input, weight, bias, params.padding, params.stride, params.groups);
+       input.device().type(), input, weight, bias, params.stride, params.padding, params.groups);
wait, was it a bug? do we have a unit test for it?
This is a crazy one. Here's my explanation of how I think it was working: #27117 (comment)
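The bug above swaps two positional arguments (`params.padding` and `params.stride`), which compiles fine but silently changes results. A minimal sketch of the kind of regression test the reviewer is asking about — using a hypothetical naive 1-D convolution, not the actual ATen code — shows why the two orderings produce different outputs:

```python
# Hypothetical sketch (none of these names are from the PyTorch codebase):
# a naive 1-D cross-correlation where transposing the positional
# padding/stride arguments silently changes the result, which is exactly
# the failure mode a regression test should pin down.

def conv1d(x, w, padding, stride):
    """Naive 1-D cross-correlation with zero padding and striding."""
    x = [0.0] * padding + list(x) + [0.0] * padding
    k = len(w)
    return [
        sum(x[i + j] * w[j] for j in range(k))
        for i in range(0, len(x) - k + 1, stride)
    ]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
w = [1.0, 1.0]

correct = conv1d(x, w, padding=1, stride=2)  # intended argument order
swapped = conv1d(x, w, padding=2, stride=1)  # padding/stride transposed

# A regression test only needs to check a strided, padded case with a
# known-good expected output; the swapped ordering fails it.
assert correct == [1.0, 5.0, 9.0]
assert swapped != correct
```

Note the test must exercise a case where padding and stride differ; with `padding == stride` the swap is invisible, which may be why it went unnoticed.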
AshkanAliabadi left a comment
Pending CI, it seems I finally managed to get this working. This is not an ideal solution in my opinion, and definitely not the direction we should take PyTorch mobile in over the long run. The ideal solution would be a single mobile-focused backend, just like the reference or vendor-provided backends we currently have, that handles both floating-point and quantized operations, in both NHWC and NCHW layouts, efficiently on mobile. That is a considerable undertaking though, so if we want to close the gap in the meantime, we unfortunately need short-term solutions like this.
aten/src/ATen/CMakeLists.txt
Outdated
  if(AT_NNPACK_ENABLED)
    include_directories(${NNPACK_INCLUDE_DIRS})
-     list(APPEND ATen_CPU_DEPENDENCY_LIBS nnpack) # cpuinfo is added below
+     list(APPEND ATen_CPU_DEPENDENCY_LIBS nnpack nnpack_reference_layers) # cpuinfo is added below
NNPACK does not support strided convolutions on batches (used during training), hence we fall back to the slower reference implementation here.
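The dispatch decision described above can be sketched as follows. This is a hypothetical illustration of the selection logic, not the actual ATen code — the function and path names are made up:

```python
# Hypothetical sketch of the fallback described in the comment above
# (names are illustrative, not from ATen): NNPACK's fast strided path
# only handles single-image (batch size 1) inference, so batched strided
# convolutions -- the common case during training -- must take the
# slower reference-layers implementation instead.

def choose_nnpack_path(batch_size, stride):
    """Pick between NNPACK's fast kernel and the reference layers."""
    strided = any(s != 1 for s in stride)
    if strided and batch_size > 1:
        # Batched + strided: unsupported by the fast kernel.
        return "nnpack_reference_layers"
    return "nnpack_fast"

# Single-image strided inference stays on the fast path;
# batched strided training input falls back.
assert choose_nnpack_path(1, (2, 2)) == "nnpack_fast"
assert choose_nnpack_path(32, (2, 2)) == "nnpack_reference_layers"
```

This also explains the CMake change: linking `nnpack_reference_layers` makes the slow-but-correct fallback available at runtime.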
aten/src/ATen/native/NNPACK.cpp
Outdated
      weight.data_ptr<float>(),
      bias_.data_ptr<float>(),
      output.data_ptr<float>(),
      nnpack_threadpool());
NNPACK does not support strided convolutions on batches (used during training), hence we fall back to the slower reference implementation here.
Stack from ghstack:
Differential Revision: [D18126265](https://our.internmc.facebook.com/intern/diff/D18126265)