Fold weight permutation inside quantized conv operator #26241
dzhulgakov wants to merge 8 commits into gh/dzhulgakov/2/base
Conversation
According to #19092 we always keep NCHW order and do handling inside the kernels. This PR fixes it for the weights of qconv by using the MemoryLayout mechanism. [ghstack-poisoned]
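A minimal sketch of the intent (not the PR's actual kernel code): the caller keeps the conv weight in NCHW order and the operator handles the layout internally. The shapes and scales below are made up, `torch.quantize_linear` is the PR-era name (today `torch.quantize_per_tensor`), and using `torch.channels_last` here is only an assumption about how the memory-format mechanism applies.

```python
import torch

oC, iC, kH, kW = 8, 4, 3, 3
w = torch.randn(oC, iC, kH, kW, dtype=torch.float32)  # weight stays in NCHW (KCRS)
# PR-era quantization API; current PyTorch spells this torch.quantize_per_tensor
qw = torch.quantize_linear(w, scale=1.0 / 128, zero_point=0, dtype=torch.qint8)

# Before this PR the caller permuted to KRSC before prepacking, e.g.
#   torch.ops.quantized.conv_prepack(qw.permute([0, 2, 3, 1]), ...)
# After it, prepack takes the NCHW weight and reorders internally; conceptually
# that is the same as asking for a channels-last copy of the data:
w_cl = w.contiguous(memory_format=torch.channels_last)
print(w_cl.shape)     # logical shape unchanged: (oC, iC, kH, kW)
print(w_cl.stride())  # strides now put the channel dimension innermost
```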
What about quantize linear ops?
What do you mean about linear? Linear ops don't have memory order. Also cc @VitalyFedyunin to verify that the memory format stuff looks good.
b = torch.randn(oC, dtype=torch.float32) if use_bias else None
q_bias = torch.quantize_linear(b, scale=1.0 / 1024, zero_point=0, dtype=torch.qint32) if use_bias else None
q_filters_ref = torch.ops.quantized.conv_prepack(qw.permute([0, 2, 3, 1]),
Don't you also need similar changes to the tests in test_quantized.py?
oh, I think I put it accidentally in #26242 - let me just squash them together
Please don't land, I need to read it carefully.
VitalyFedyunin left a comment:
Memory Format manipulations look good!
According to #19092 we always keep NCHW order and do handling inside the kernels. This PR fixes it for weights of the qconv by using MemoryLayout mechanism. Differential Revision: [D17443219](https://our.internmc.facebook.com/intern/diff/D17443219) [ghstack-poisoned]
  W_KRSC = W.permute([0, 2, 3, 1]).contiguous()
  if channelwise:
-     W_q = torch.quantize_linear_per_channel(W_KRSC,
+     W_q = torch.quantize_linear_per_channel(W,
Thanks for changing this too. Did you test it locally by setting PYTORCH_TEST_WITH_QNNPACK?
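For reference, a small sketch of the per-channel path once the weight stays in NCHW. It uses the current `torch.quantize_per_channel` name (the diff's `quantize_linear_per_channel` is the PR-era spelling), and the shapes and scales are placeholders.

```python
import torch

oC, iC, kH, kW = 8, 4, 3, 3
W = torch.randn(oC, iC, kH, kW)                    # weight kept in NCHW (KCRS)
scales = torch.rand(oC, dtype=torch.double) + 0.1  # one scale per output channel
zero_points = torch.zeros(oC, dtype=torch.long)

# Quantize along the output-channel axis directly; no permute to KRSC needed.
W_q = torch.quantize_per_channel(W, scales, zero_points, axis=0, dtype=torch.qint8)
print(W_q.q_per_channel_scales().shape)  # torch.Size([8])
```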
  scales.toType(kDouble),
  zero_points.toType(kLong),
- {output_channels, kernel_h, kernel_w, C_per_G},
+ {output_channels, C_per_G, kernel_h, kernel_w},
Do we need to change it similarly for QNNPACK as well below since we store it as MemoryFormat::ChannelsLast in the packed struct?
No, you store the original Tensor and it's fine to return the Tensor in ChannelsLast format (it'd just happen to be non-contiguous but still semantically correct).
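A quick illustration of that point, assuming the standard memory-format API: a channels-last tensor keeps its NCHW shape and values and is merely non-contiguous with respect to the default stride order.

```python
import torch

x = torch.randn(2, 4, 5, 5)
x_cl = x.contiguous(memory_format=torch.channels_last)

print(x_cl.shape == x.shape)  # True  -- same logical NCHW shape
print(torch.equal(x_cl, x))   # True  -- same values, semantically identical
print(x_cl.is_contiguous())   # False -- not contiguous in the default (NCHW) order
print(x_cl.is_contiguous(memory_format=torch.channels_last))  # True
```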
Summary:
Pull Request resolved: pytorch/pytorch#26241
According to pytorch/pytorch#19092 we always keep NCHW order and do handling inside the kernels. This PR fixes it for weights of the qconv by using MemoryLayout mechanism.
Test Plan: Imported from OSS
Differential Revision: D17443219
Pulled By: dzhulgakov
fbshipit-source-id: ce0eb92034a9977b3303dafab8b0414575171062
@dzhulgakov merged this pull request in d5daac7.