Add channel shuffle op fp32 + quantized. #36815
kimishpatel wants to merge 13 commits into gh/kimishpatel/4/base from
Conversation
PyTorch does not have a native channel shuffle op. This diff adds one for both fp and quantized tensors. The FP implementation is an inefficient one; for quantized tensors there is a native QNNPACK op for this.

Differential Revision: [D21093841](https://our.internmc.facebook.com/intern/diff/D21093841/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21093841/)!

Pull Request resolved: #36815
💊 Build failures summary (Dr. CI): as of commit e281e94, there are no failures.
| "Number of groups to divide channels in must be positive.", | ||
| " Value of groups:", groups); | ||
| AT_ASSERTM((c % groups) == 0, | ||
| "Number of channels must be divisible gy groups. Got ", |
// For ChannelsFirst, a.k.a. Contiguous, memory format we will also need
// a fast custom implementation perhaps.
return input_reshaped.permute({0 /* b */, 2 /* oc */, 1 /* groups */, 3})
    .contiguous()
Why do we need to call contiguous at all here?
So this will actually rearrange the channels, as we need to reshape it back to the original shape. Was that your question?
What happens if you remove this contiguous call entirely? What happens if you move it after the reshape and give it the input.suggest_memory_format() argument?
For removing contiguous entirely: it depends on how the strides would be computed, since reshape's input shape will be the post-permute shape, and it might actually try to map part of the permuted channels onto the h and w dims, I think. It is not clear without looking at the details of the restriding logic.
For the second question my response is the same.
In its current form the code says: if you have 16 channels divided into 2 groups of 8 channels each, rearrange them so that all groups of a particular channel sit together, i.e. the 2 groups of channel 0, then the 2 groups of channel 1, ..., then the 2 groups of channel 7. Having asked for this kind of channel permute, it seems cleaner to call contiguous on it so that all groups are actually contiguous in memory, and then reshape back to the original shape. You then get the same number of channels as the original, just arranged differently. Doing it the other ways your question asked about requires understanding how the restriding is done.
std::unique_ptr<pytorch_qnnp_operator, QnnpackOperatorDeleter>
    qnnpack_uniq_ptr(qnnpack_operator);
I commented on the old diff as well, but this style ensures that the operator is freed even if we throw an exception.
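For readers unfamiliar with the pattern, here is a minimal standalone sketch of that style: a `std::unique_ptr` with a custom deleter wrapping a C-style handle so the handle is released even when an exception unwinds the stack. The `qnnp_*` names below are hypothetical stand-ins, not the actual QNNPACK API.

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical C-style operator handle and create/destroy calls,
// standing in for the real QNNPACK operator API.
struct qnnp_operator { int dummy; };
qnnp_operator* qnnp_create() { return new qnnp_operator{}; }
void qnnp_delete(qnnp_operator* op) { delete op; }

// Custom deleter: unique_ptr's destructor calls this on the held pointer.
struct QnnpOperatorDeleter {
  void operator()(qnnp_operator* op) const {
    if (op != nullptr) {
      qnnp_delete(op);
    }
  }
};

void run() {
  std::unique_ptr<qnnp_operator, QnnpOperatorDeleter> op(qnnp_create());
  // Any later setup/run call may throw; the operator is still freed because
  // op's destructor runs during stack unwinding.
  throw std::runtime_error("example failure");
}

int main() {
  try {
    run();
  } catch (const std::exception&) {
    // The operator was already released by the deleter before we got here.
  }
}
```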
// of the input. However since the above reshape clobbers h and w
// it may not be safe to do that, since channels_last contiguous
// may think oc and the last dim correspond to h, w?
Yeah, I think this is right.
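If one did want the output to come back in the input's suggested layout, one possibility (not what this PR does, just a sketch of the idea) is to re-materialize the result only after it has been reshaped back to the original sizes, using `Tensor::suggest_memory_format()`. The helper name below is mine.

```cpp
#include <torch/torch.h>

// Sketch: do the shuffle on a plain-contiguous intermediate, then restore
// whatever memory format the input suggests (e.g. ChannelsLast) at the end,
// once the shape is back to the original NCHW sizes.
torch::Tensor channel_shuffle_keep_format(const torch::Tensor& self, int64_t groups) {
  const int64_t b = self.size(0);
  const int64_t oc = self.size(1) / groups;
  auto out = self.reshape({b, groups, oc, -1})
                 .permute({0, 2, 1, 3})
                 .contiguous()
                 .reshape(self.sizes());
  return out.contiguous(self.suggest_memory_format());
}
```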
// a fast custom implementation perhaps.
return input_reshaped.permute({0 /* b */, 2 /* oc */, 1 /* groups */, 3})
    .contiguous()
    .reshape(self.sizes());
After calling reshape, can we also try to preserve the dimension names from the input?
This pull request has been merged in df31ddb.
Summary:
Pull Request resolved: pytorch#36815

PyTorch does not have a native channel shuffle op. This diff adds one for both fp and quantized tensors. The FP implementation is an inefficient one; for quantized tensors there is a native QNNPACK op for this.

ghstack-source-id: 103267234

Test Plan: buck run caffe2/test:quantization -- quantization.test_quantized.TestQuantizedOps.test_channel_shuffle

The x86 implementation of the QNNPACK op is SSE2, so this may not be the most efficient option for x86.

Reviewed By: dreiss

Differential Revision: D21093841

fbshipit-source-id: 5282945f352df43fdffaa8544fe34dba99a5b97e
Stack from ghstack:
PyTorch does not have a native channel shuffle op.
This diff adds one for both fp and quantized tensors.
The FP implementation is an inefficient one; for quantized tensors there is a
native QNNPACK op for this.
Differential Revision: D21093841
NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on Phabricator!