
Fold activation permutation inside quantized conv operator #26242

Closed
dzhulgakov wants to merge 8 commits into gh/dzhulgakov/3/base from gh/dzhulgakov/3/head

Conversation

@dzhulgakov (Collaborator) commented Sep 14, 2019

Stack from ghstack:

According to #19092, we always keep the NCHW order and do the layout handling inside the kernels. This PR fixes that for the activations of qconv by using the MemoryLayout mechanism: activations stay logically NCHW but are strided as NHWC.

Note that this version is more aggressive than the eventual MemoryLayout mechanism: QConv's output is always NHWC regardless of the input striding. I think that's ok, as we don't have NCHW quantized kernels anyway, so the very first conv would magically switch the order, but I'm open to suggestions. Btw, it doesn't change behavior: the same thing happens in master today because of the explicit permute() call.
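
For readers unfamiliar with the memory-format mechanism, here is a minimal Python sketch of what "logically NCHW but strided as NHWC" means (the shapes are illustrative, and this uses the torch.channels_last API referenced later in the thread):

```python
import torch

# A 4D tensor is logically NCHW regardless of its memory format.
x = torch.randn(1, 3, 4, 4)
print(x.stride())   # (48, 16, 4, 1) -- dense NCHW strides

# channels_last keeps the logical NCHW shape but lays memory out as NHWC.
y = x.contiguous(memory_format=torch.channels_last)
print(y.shape)      # torch.Size([1, 3, 4, 4]) -- unchanged
print(y.stride())   # (48, 1, 12, 3) -- channel stride is 1, i.e. NHWC in memory
```

Code that only inspects sizes sees no difference; kernels that inspect strides can take the NHWC fast path.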

Differential Revision: D17443218

@dzhulgakov requested a review from apaszke as a code owner September 14, 2019 23:44
@pytorchbot added the module: nn (Related to torch.nn), module: operators, and oncall: quantization (Quantization support in PyTorch) labels Sep 14, 2019
@dzhulgakov changed the title from "permute for activations" to "Fold activation permutation inside quantized conv operator" Sep 14, 2019
Dmytro Dzhulgakov added 2 commits September 17, 2019 20:04
@jamesr66a (Collaborator) left a comment:

Have you tested performance?

```cpp
// mind. Ideally, we'd be compatible with conv2d behavior and preserve the
// inputs layout as is (doing necessary upconversions).
//
// However, to be more robust, we'd just force output layout to always be
```
Comment from a Collaborator:

"we'd just..." doesn't make it clear whether this is something you're proposing as a follow-up, or if this is how it's currently implemented

```diff
-    outShape, device(kCPU).dtype(kQUInt8), output_scale, output_zero_point);
+    outShape, device(kCPU).dtype(kQUInt8), output_scale, output_zero_point,
+    MemoryFormat::ChannelsLast);
     auto buffer = at::zeros_like(output, output.options().dtype(at::kInt));
```
Comment from a Contributor:

FYI: buffer here will be non-channels-last. Is that intended?
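
A minimal Python sketch of the concern (the shapes are made up, and this uses the memory_format argument of the Python-level zeros_like; the PR itself makes this call from C++):

```python
import torch

out = torch.empty(1, 8, 4, 4).contiguous(memory_format=torch.channels_last)

# Without an explicit memory_format, a *_like factory need not match the
# source layout; spelling it out removes the ambiguity raised above.
buf = torch.zeros_like(out, dtype=torch.int32,
                       memory_format=torch.channels_last)
print(buf.is_contiguous(memory_format=torch.channels_last))  # True
```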

@dzhulgakov (Collaborator, Author) replied:

@jamesr66a - thanks for the benchmarking comment. I used benchmarks/operator_benchmarks, and the numbers are now the same before and after this diff. I also added a benchmark with chained convolutions to make sure the format is propagated through.

@VitalyFedyunin - there's currently a regression in the .contiguous(memory_format=torch.channels_last) call compared with permute. I kept it on permute (but inside C++); we can fix it later:

```python
In [8]: orig = torch.empty((10,256,64,64))

In [9]: %timeit orig.contiguous(memory_format=torch.channels_last)
100 loops, best of 3: 8.61 ms per loop

In [10]: %timeit orig.permute([0,2,3,1]).contiguous()
100 loops, best of 3: 4.93 ms per loop
```
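
The two routes being timed produce the same physical NHWC byte layout and differ only in the logical view they present; a minimal sketch, reusing the shape from the benchmark above:

```python
import torch

x = torch.empty(10, 256, 64, 64)

# Route 1: channels_last -- logical shape stays NCHW; only strides change.
a = x.contiguous(memory_format=torch.channels_last)
print(a.shape)   # torch.Size([10, 256, 64, 64])

# Route 2: explicit permute -- logical shape becomes NHWC, memory is dense.
b = x.permute(0, 2, 3, 1).contiguous()
print(b.shape)   # torch.Size([10, 64, 64, 256])

# Either way, the channel values of a pixel sit adjacently in memory, which
# is why the kernel can use the permute route internally without changing
# the operator's NCHW-facing semantics.
```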

@supriyar (Contributor) commented:

@dzhulgakov I submitted my PR for quantized conv, so you might want to rebase.

Dmytro Dzhulgakov added 3 commits September 18, 2019 22:49
zdevito pushed a commit to zdevito/ATen that referenced this pull request Sep 19, 2019
Summary:
Pull Request resolved: pytorch/pytorch#26242


Test Plan: Imported from OSS

Differential Revision: D17443218

Pulled By: dzhulgakov

fbshipit-source-id: cfd136ae0465acd8d8c26ffad87385dac9c88726
@facebook-github-bot (Contributor):

@dzhulgakov merged this pull request in af64789.

@facebook-github-bot facebook-github-bot deleted the gh/dzhulgakov/3/head branch October 28, 2019 22:08
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026

Labels

caffe2 · Merged · module: nn (Related to torch.nn) · oncall: quantization (Quantization support in PyTorch)

Projects

None yet


8 participants