
Fold activation permutation inside quantized conv operator #26242

Closed
dzhulgakov wants to merge 8 commits into gh/dzhulgakov/3/base from gh/dzhulgakov/3/head

Conversation

@dzhulgakov (Collaborator) commented Sep 14, 2019

Stack from ghstack:

According to #19092, we always keep the NCHW order and do the layout handling inside the kernels. This PR fixes that for the activations of qconv by using the MemoryLayout mechanism: activations stay logically NCHW but are strided as NHWC.

Note that this version is more aggressive than the eventual MemoryLayout mechanism: QConv's output is always NHWC regardless of the input striding. I think that's ok, as we don't have NCHW quantized kernels anyway, so the very first conv would magically switch the order, but I'm open to suggestions. Btw, it doesn't change behavior: the same thing happens in master today because of the explicit permute() call.
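
For readers unfamiliar with the memory-format mechanism, here is a minimal Python sketch of what "logically NCHW but strided as NHWC" means (the shapes are illustrative, and this uses the torch.channels_last API referenced later in the thread):

```python
import torch

# A 4D tensor is logically NCHW regardless of its memory format.
x = torch.randn(1, 3, 4, 4)
print(x.stride())   # (48, 16, 4, 1) -- dense NCHW strides

# channels_last keeps the logical NCHW shape but lays memory out as NHWC.
y = x.contiguous(memory_format=torch.channels_last)
print(y.shape)      # torch.Size([1, 3, 4, 4]) -- unchanged
print(y.stride())   # (48, 1, 12, 3) -- channel stride is 1, i.e. NHWC in memory
```

Code that only inspects sizes sees no difference; kernels that inspect strides can take the NHWC fast path.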

Differential Revision: D17443218

@dzhulgakov requested a review from apaszke as a code owner September 14, 2019 23:44
@pytorchbot added the module: nn (Related to torch.nn), module: operators, and oncall: quantization (Quantization support in PyTorch) labels Sep 14, 2019
@dzhulgakov changed the title from "permute for activations" to "Fold activation permutation inside quantized conv operator" Sep 14, 2019
Dmytro Dzhulgakov added 2 commits September 17, 2019 20:04
@jamesr66a (Collaborator) left a comment:

Have you tested performance?

```cpp
// mind. Ideally, we'd be compatible with conv2d behavior and preserve the
// inputs layout as is (doing necessary upconversions).
//
// However, to be more robust, we'd just force output layout to always be
```
Comment from a Collaborator:

"we'd just..." doesn't make it clear whether this is something you're proposing as a follow-up, or if this is how it's currently implemented

```diff
-    outShape, device(kCPU).dtype(kQUInt8), output_scale, output_zero_point);
+    outShape, device(kCPU).dtype(kQUInt8), output_scale, output_zero_point,
+    MemoryFormat::ChannelsLast);
     auto buffer = at::zeros_like(output, output.options().dtype(at::kInt));
```
Comment from a Contributor:

FYI: buffer here will be non-channels-last. Is that intended?
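
A minimal Python sketch of the concern (the shapes are made up, and this uses the memory_format argument of the Python-level zeros_like; the PR itself makes this call from C++):

```python
import torch

out = torch.empty(1, 8, 4, 4).contiguous(memory_format=torch.channels_last)

# Without an explicit memory_format, a *_like factory need not match the
# source layout; spelling it out removes the ambiguity raised above.
buf = torch.zeros_like(out, dtype=torch.int32,
                       memory_format=torch.channels_last)
print(buf.is_contiguous(memory_format=torch.channels_last))  # True
```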

@dzhulgakov (Collaborator, Author) replied:

@jamesr66a - thanks for the benchmarking comment. I used benchmarks/operator_benchmarks, and the numbers are now the same before and after this diff. I also added a benchmark with chained convolutions to make sure the format is propagated through.

@VitalyFedyunin - there's currently a regression in the .contiguous(memory_format=torch.channels_last) call compared with permute. I kept it on permute (but inside C++); we can fix it later:

```python
In [8]: orig = torch.empty((10,256,64,64))

In [9]: %timeit orig.contiguous(memory_format=torch.channels_last)
100 loops, best of 3: 8.61 ms per loop

In [10]: %timeit orig.permute([0,2,3,1]).contiguous()
100 loops, best of 3: 4.93 ms per loop
```
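
The two routes being timed produce the same physical NHWC byte layout and differ only in the logical view they present; a minimal sketch, reusing the shape from the benchmark above:

```python
import torch

x = torch.empty(10, 256, 64, 64)

# Route 1: channels_last -- logical shape stays NCHW; only strides change.
a = x.contiguous(memory_format=torch.channels_last)
print(a.shape)   # torch.Size([10, 256, 64, 64])

# Route 2: explicit permute -- logical shape becomes NHWC, memory is dense.
b = x.permute(0, 2, 3, 1).contiguous()
print(b.shape)   # torch.Size([10, 64, 64, 256])

# Either way, the channel values of a pixel sit adjacently in memory, which
# is why the kernel can use the permute route internally without changing
# the operator's NCHW-facing semantics.
```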

@supriyar (Contributor) commented:

@dzhulgakov I submitted my PR for quantized conv, so you might want to rebase.

Dmytro Dzhulgakov added 3 commits September 18, 2019 22:49
zdevito pushed a commit to zdevito/ATen that referenced this pull request Sep 19, 2019
Summary:
Pull Request resolved: pytorch/pytorch#26242


Test Plan: Imported from OSS

Differential Revision: D17443218

Pulled By: dzhulgakov

fbshipit-source-id: cfd136ae0465acd8d8c26ffad87385dac9c88726
@facebook-github-bot (Contributor):

@dzhulgakov merged this pull request in af64789.

@facebook-github-bot facebook-github-bot deleted the gh/dzhulgakov/3/head branch October 28, 2019 22:08
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026

Labels

caffe2 · Merged · module: nn (Related to torch.nn) · oncall: quantization (Quantization support in PyTorch)

Projects

None yet


8 participants