
Add channel shuffle op fp32 + quantized. #36815

Closed
kimishpatel wants to merge 13 commits into gh/kimishpatel/4/base from gh/kimishpatel/4/head

Conversation

@kimishpatel
Contributor

@kimishpatel kimishpatel commented Apr 17, 2020

Stack from ghstack:

PyTorch does not have a native channel shuffle op.
This diff adds one for both fp32 and quantized tensors.
For fp32 the implementation is an inefficient one; for quantized tensors there is a
native QNNPACK op for this.

Differential Revision: D21093841

NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on Phabricator!
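
For reference, the fp32 path described above is essentially the standard reshape-permute-reshape trick. Below is a minimal Python sketch of the equivalent tensor manipulation (not the actual ATen implementation; the function name is purely illustrative):

```python
import torch

def channel_shuffle_reference(x: torch.Tensor, groups: int) -> torch.Tensor:
    # x is NCHW: split C into (groups, C // groups), swap those two dims,
    # then flatten back so channels from different groups are interleaved.
    b, c, h, w = x.shape
    assert c % groups == 0, "Number of channels must be divisible by groups"
    oc = c // groups
    x = x.reshape(b, groups, oc, h * w)      # [b, groups, oc, h*w]
    x = x.permute(0, 2, 1, 3).contiguous()   # [b, oc, groups, h*w]
    return x.reshape(b, c, h, w)
```

This mirrors the permute({0 /* b */, 2 /* oc */, 1 /* groups */, 3}) call visible in the diff snippets reviewed below.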

@kimishpatel kimishpatel requested a review from apaszke as a code owner April 17, 2020 18:26
kimishpatel added a commit that referenced this pull request Apr 17, 2020
ghstack-source-id: 102384115
Pull Request resolved: #36815
@kimishpatel kimishpatel requested review from dreiss and supriyar April 17, 2020 18:34
@dr-ci

dr-ci Bot commented Apr 17, 2020

💊 Build failures summary and remediations

As of commit e281e94 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI. Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker.

See how this bot performed.

This comment has been revised 87 times.

Comment thread aten/src/ATen/native/ChanelShuffle.cpp Outdated
"Number of groups to divide channels in must be positive.",
" Value of groups:", groups);
AT_ASSERTM((c % groups) == 0,
"Number of channels must be divisible gy groups. Got ",
Contributor


"gy" -> "by"

Comment thread aten/src/ATen/native/ChanelShuffle.cpp Outdated
// For ChannelsFirst, a.k.a Contiguous, memory format we will also need
// a fast custom implementation perhaps.
return input_reshaped.permute({0 /* b */, 2 /* oc */, 1 /* groups */, 3})
.contiguous()
Contributor

Why do we need to call contiguous at all here?

Contributor Author

So this will actually rearrange the channels, as we need to reshape it back to the original shape. Was that your question?

Contributor

What happens if you remove this contiguous call entirely? What happens if you move it after the reshape and give it the input.suggest_input_format() argument?

Contributor Author

For removing contiguous entirely: this depends on how the strides would be computed, since reshape's input shape is the post-permute shape, and reshape might try to map part of the permuted channels onto the h and w dims. I think it is not clear without looking at the details of the restriding logic.
For the second question my response is the same.
In the current form it says: if you have 16 channels that you divide into 2 groups of 8 channels each, rearrange them so that the groups of a particular channel sit together, i.e. the 2 groups of channel 0, then the 2 groups of channel 1, ..., then the 2 groups of channel 7. Having asked for this kind of channel permutation, it seems cleaner to call contiguous on it so that the groups are actually contiguous in memory, and then reshape back to the original shape. You then get the same number of channels, just arranged differently. Doing it the other ways your question suggests requires understanding how the restriding is done.
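
To make the 16-channel, 2-group example above concrete, here is a small self-contained demo (a sketch, not taken from the PR's tests) that prints the channel order produced by the permute + contiguous + reshape sequence:

```python
import torch

# 1x16x1x1 tensor whose channel values are just the channel indices 0..15.
x = torch.arange(16.0).reshape(1, 16, 1, 1)

b, c, h, w = x.shape
groups = 2
y = (x.reshape(b, groups, c // groups, h * w)  # [1, 2, 8, 1]: two groups of 8 channels
       .permute(0, 2, 1, 3)                    # [1, 8, 2, 1]: both groups of each channel together
       .contiguous()
       .reshape(b, c, h, w))

print(y.flatten().tolist())
# [0.0, 8.0, 1.0, 9.0, 2.0, 10.0, ..., 7.0, 15.0]
```

The output shows exactly the ordering described above: channel 0 from each group, then channel 1 from each group, and so on.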

Comment on lines +71 to +72
std::unique_ptr<pytorch_qnnp_operator, QnnpackOperatorDeleter>
qnnpack_uniq_ptr(qnnpack_operator);
Contributor

I commented on the old diff as well, but this style ensures that the operator is freed even if we throw an exception.

Comment on lines +28 to +30
// of the input. However since the above reshape clobbers h and w
// it may not be safe to do that, since channels_last contiguous
// may think oc and the last dim correspond to h,w?
Contributor

Yeah, I think this is right.

Comment thread aten/src/ATen/native/ChanelShuffle.cpp Outdated
// a fast custom implementation perhaps.
return input_reshaped.permute({0 /* b */, 2 /* oc */, 1 /* groups */, 3})
.contiguous()
.reshape(self.sizes());
Contributor

After calling reshape, can we also try to preserve the dimension names from the input?

kimishpatel added a commit that referenced this pull request Apr 24, 2020
Pull Request resolved: #36815
ghstack-source-id: 102848193

kimishpatel added a commit that referenced this pull request Apr 25, 2020
Pull Request resolved: #36815
ghstack-source-id: 102892214

kimishpatel added a commit that referenced this pull request Apr 27, 2020
Pull Request resolved: #36815
ghstack-source-id: 102951482

kimishpatel added a commit that referenced this pull request Apr 27, 2020
Pull Request resolved: #36815
ghstack-source-id: 102963073

kimishpatel added a commit that referenced this pull request Apr 27, 2020
Pull Request resolved: #36815
ghstack-source-id: 102976204

kimishpatel added a commit that referenced this pull request Apr 28, 2020
Pull Request resolved: #36815
ghstack-source-id: 103071381

kimishpatel added a commit that referenced this pull request Apr 29, 2020
Pull Request resolved: #36815
ghstack-source-id: 103123471

kimishpatel added a commit that referenced this pull request Apr 29, 2020
Pull Request resolved: #36815
ghstack-source-id: 103143646

kimishpatel added a commit that referenced this pull request Apr 30, 2020
Pull Request resolved: #36815
ghstack-source-id: 103218414

kimishpatel added a commit that referenced this pull request Apr 30, 2020
Pull Request resolved: #36815
ghstack-source-id: 103254925

kimishpatel added a commit that referenced this pull request May 1, 2020
Pull Request resolved: #36815
ghstack-source-id: 103267234

@facebook-github-bot
Contributor

This pull request has been merged in df31ddb.

@facebook-github-bot facebook-github-bot deleted the gh/kimishpatel/4/head branch May 5, 2020 14:17
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
Pull Request resolved: pytorch#36815

PyTorch does not have a native channel shuffle op.
This diff adds one for both fp32 and quantized tensors.
For fp32 the implementation is an inefficient one; for quantized tensors there is a
native QNNPACK op for this.
ghstack-source-id: 103267234

Test Plan:
buck run caffe2/test:quantization -- quantization.test_quantized.TestQuantizedOps.test_channel_shuffle

X86 implementation for QNNPACK is sse2 so this may not be the most efficient for x86.

Reviewed By: dreiss

Differential Revision: D21093841

fbshipit-source-id: 5282945f352df43fdffaa8544fe34dba99a5b97e
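
The Test Plan above exercises the quantized path through the Python-level op. As a usage sketch (assuming the op added here is exposed as torch.channel_shuffle; on a quantized tensor it may dispatch to the QNNPACK kernel when that engine is in use):

```python
import torch

x = torch.randn(1, 16, 4, 4)

# fp32 path
y = torch.channel_shuffle(x, 2)

# quantized path: quantize, shuffle, dequantize to inspect the result
qx = torch.quantize_per_tensor(x, 0.1, 0, torch.quint8)  # scale, zero_point, dtype
qy = torch.channel_shuffle(qx, 2)
print(qy.dequantize().shape)  # torch.Size([1, 16, 4, 4])
```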