Add channel shuffle op fp32 + quantized. #36815
kimishpatel wants to merge 13 commits into gh/kimishpatel/4/base from
Conversation
PyTorch does not have a native channel shuffle op. This diff adds one for both fp and quantized tensors. The FP implementation is an inefficient one; for quantized tensors there is a native QNNPACK op for this.

Differential Revision: [D21093841](https://our.internmc.facebook.com/intern/diff/D21093841/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D21093841/)!

Pull Request resolved: #36815
💊 Build failures summary (Dr. CI): as of commit e281e94, there are no failures.
| "Number of groups to divide channels in must be positive.", | ||
| " Value of groups:", groups); | ||
| AT_ASSERTM((c % groups) == 0, | ||
| "Number of channels must be divisible gy groups. Got ", |
// For ChannelsFirst, a.k.a. Contiguous, memory format we will also need
// a fast custom implementation perhaps.
return input_reshaped.permute({0 /* b */, 2 /* oc */, 1 /* groups */, 3})
    .contiguous()
Why do we need to call contiguous at all here?
So this will actually rearrange the channels, as we need to reshape it back to the original shape. Was that your question?
What happens if you remove this contiguous call entirely? What happens if you move it after the reshape and give it the input.suggest_memory_format() argument?
For removing contiguous entirely: it depends on how the strides would be computed, since reshape's input shape will be the post-permute shape, and it might actually try to map part of the permuted channels onto the h and w dims, I think. It is not clear without looking at the details of the restriding logic.
For the second question my response is the same.
In its current form the code says: if you have 16 channels divided into 2 groups of 8 channels each, rearrange them so that all groups of a particular channel sit together, i.e. the 2 groups of channel 0, then the 2 groups of channel 1, ..., then the 2 groups of channel 7. Having asked for this kind of channel permute, it seems cleaner to call contiguous on it so that all groups are actually contiguous in memory, and then reshape back to the original shape. You then get the same number of channels as the original, just arranged differently. Doing it the other ways your question asked about requires understanding how the restriding is done.
std::unique_ptr<pytorch_qnnp_operator, QnnpackOperatorDeleter>
    qnnpack_uniq_ptr(qnnpack_operator);
I commented on the old diff as well, but this style ensures that the operator is freed even if we throw an exception.
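For readers unfamiliar with the pattern, here is a minimal standalone sketch of that style: a `std::unique_ptr` with a custom deleter wrapping a C-style handle so the handle is released even when an exception unwinds the stack. The `qnnp_*` names below are hypothetical stand-ins, not the actual QNNPACK API.

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical C-style operator handle and create/destroy calls,
// standing in for the real QNNPACK operator API.
struct qnnp_operator { int dummy; };
qnnp_operator* qnnp_create() { return new qnnp_operator{}; }
void qnnp_delete(qnnp_operator* op) { delete op; }

// Custom deleter: unique_ptr's destructor calls this on the held pointer.
struct QnnpOperatorDeleter {
  void operator()(qnnp_operator* op) const {
    if (op != nullptr) {
      qnnp_delete(op);
    }
  }
};

void run() {
  std::unique_ptr<qnnp_operator, QnnpOperatorDeleter> op(qnnp_create());
  // Any later setup/run call may throw; the operator is still freed because
  // op's destructor runs during stack unwinding.
  throw std::runtime_error("example failure");
}

int main() {
  try {
    run();
  } catch (const std::exception&) {
    // The operator was already released by the deleter before we got here.
  }
}
```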
// of the input. However since the above reshape clobbers h and w
// it may not be safe to do that, since channels_last contiguous
// may think oc and the last dim correspond to h, w?
Yeah, I think this is right.
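If one did want the output to come back in the input's suggested layout, one possibility (not what this PR does, just a sketch of the idea) is to re-materialize the result only after it has been reshaped back to the original sizes, using `Tensor::suggest_memory_format()`. The helper name below is mine.

```cpp
#include <torch/torch.h>

// Sketch: do the shuffle on a plain-contiguous intermediate, then restore
// whatever memory format the input suggests (e.g. ChannelsLast) at the end,
// once the shape is back to the original NCHW sizes.
torch::Tensor channel_shuffle_keep_format(const torch::Tensor& self, int64_t groups) {
  const int64_t b = self.size(0);
  const int64_t oc = self.size(1) / groups;
  auto out = self.reshape({b, groups, oc, -1})
                 .permute({0, 2, 1, 3})
                 .contiguous()
                 .reshape(self.sizes());
  return out.contiguous(self.suggest_memory_format());
}
```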
// a fast custom implementation perhaps.
return input_reshaped.permute({0 /* b */, 2 /* oc */, 1 /* groups */, 3})
    .contiguous()
    .reshape(self.sizes());
After calling reshape, can we also try to preserve the dimension names from the input?
This pull request has been merged in df31ddb.
Summary:
Pull Request resolved: pytorch#36815

PyTorch does not have a native channel shuffle op. This diff adds one for both fp and quantized tensors. The FP implementation is an inefficient one; for quantized tensors there is a native QNNPACK op for this.

ghstack-source-id: 103267234

Test Plan: buck run caffe2/test:quantization -- quantization.test_quantized.TestQuantizedOps.test_channel_shuffle

The x86 implementation of the QNNPACK op is SSE2, so this may not be the most efficient option for x86.

Reviewed By: dreiss

Differential Revision: D21093841

fbshipit-source-id: 5282945f352df43fdffaa8544fe34dba99a5b97e
Stack from ghstack:
PyTorch does not have a native channel shuffle op.
This diff adds one for both fp and quantized tensors.
The FP implementation is an inefficient one; for quantized tensors there is a
native QNNPACK op for this.
Differential Revision: D21093841
NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on Phabricator!