Fold activation permutation inside quantized conv operator #26242
dzhulgakov wants to merge 8 commits into gh/dzhulgakov/3/base
Conversation
According to #19092 we always keep NCHW order and do the handling inside the kernels. This PR fixes it for activations of the qconv by using the MemoryLayout mechanism - activations stay logically NCHW but are strided as NHWC. Note that this version is more aggressive than the eventual MemoryLayout mechanism - the QConv's output is always NHWC regardless of the input striding. I think that's ok since we don't have NCHW quantized kernels anyway - so the very first conv would magically switch the order, but I'm open to suggestions. Btw, it doesn't change behavior - the same happens today in master because of the explicit permute() call. [ghstack-poisoned]
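As an illustration of the mechanism described above (using the Python-level `torch.channels_last` API rather than the C++ change in this PR): the tensor's logical layout stays NCHW - shape and indexing are unchanged - while the strides describe NHWC physical memory.

```python
import torch

# Logical layout stays NCHW: both tensors have shape (N, C, H, W).
x = torch.randn(2, 3, 4, 5)
y = x.contiguous(memory_format=torch.channels_last)

print(x.shape, y.shape)  # identical shapes
print(x.stride())        # NCHW striding: (60, 20, 5, 1)
print(y.stride())        # NHWC striding: (60, 1, 15, 3) - channel stride is 1
```

Kernels that check `is_contiguous(memory_format=torch.channels_last)` can then read the data in NHWC order without any explicit `permute()`.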
jamesr66a
left a comment
Have you tested performance?
// mind. Ideally, we'd be compatible with conv2d behavior and preserve the
// inputs layout as is (doing necessary upconversions).
//
// However, to be more robust, we'd just force output layout to always be
"we'd just..." doesn't make it clear whether this is something you're proposing as a follow-up, or if this is how it's currently implemented
-    outShape, device(kCPU).dtype(kQUInt8), output_scale, output_zero_point);
+    outShape, device(kCPU).dtype(kQUInt8), output_scale, output_zero_point,
+    MemoryFormat::ChannelsLast);
auto buffer = at::zeros_like(output, output.options().dtype(at::kInt));
FYI: buffer here will be non-channels-last. Is that intended?
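The reviewer's point can be demonstrated with the Python-level `*_like` API (an illustration, not the C++ code in the diff): whether a `zeros_like` buffer picks up the source tensor's NHWC striding depends on the `memory_format` argument, and passing it explicitly removes the ambiguity.

```python
import torch

# A channels-last "output" tensor, standing in for the qconv output.
out = torch.randn(1, 8, 4, 4).contiguous(memory_format=torch.channels_last)

# Requesting channels_last explicitly guarantees the buffer matches the
# output's striding, independent of the default *_like behavior.
buf = torch.zeros_like(out, memory_format=torch.channels_last)

print(buf.is_contiguous(memory_format=torch.channels_last))  # True
```

The same option exists on the C++ side (`TensorOptions::memory_format` / the `memory_format` argument of `at::zeros_like`).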
@jamesr66a - thanks for the benchmarking comment. I used benchmarks/operator_benchmarks and the numbers are now the same before and after this diff. I also added a benchmark with chained convolutions to make sure the format is propagated through. @VitalyFedyunin - there's currently a regression in
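A minimal sketch of what a chained-convolution check like the one mentioned above could look like, using the Python quantized-module API (this is an illustration, not the benchmark added in the PR; the layer sizes are arbitrary):

```python
import torch
import torch.nn.quantized as nnq

# Quantize a float NCHW activation to quint8.
x = torch.quantize_per_tensor(
    torch.randn(1, 4, 8, 8), scale=0.1, zero_point=128, dtype=torch.quint8)

# Two quantized convs in a row: the second consumes the first's output
# directly, so any layout conversion between them would show up as overhead.
conv1 = nnq.Conv2d(4, 4, kernel_size=3, padding=1)
conv2 = nnq.Conv2d(4, 4, kernel_size=3, padding=1)

y = conv2(conv1(x))
print(y.dtype)  # torch.quint8
```

With this PR, the intermediate activation stays NHWC-strided, so the second conv reads it without a `permute()`.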
@dzhulgakov I submitted my PR for quantized conv, so you might want to rebase.
Summary: Pull Request resolved: pytorch/pytorch#26242 According to pytorch/pytorch#19092 we always keep NCHW order and do handling inside the kernels. This PR fixes it for activations of the qconv by using MemoryLayout mechanism - activations stay logically as NCHW but strided as NHWC. Note, that this version is more aggressive than eventual MemoryLayout mechanism - the QConv's output is always NHWC regardless of the input striding. I think it's ok as we don't have NCHW quantized kernels anyway - so the very first conv would magically switch the order, but I'm open to suggestions. Btw, it doesn't change behavior - same happens today in master because of the explicit permute() call. Test Plan: Imported from OSS Differential Revision: D17443218 Pulled By: dzhulgakov fbshipit-source-id: cfd136ae0465acd8d8c26ffad87385dac9c88726
@dzhulgakov merged this pull request in af64789.
Stack from ghstack:
Differential Revision: D17443218