Implement more support for per-channel quantization #26240
dzhulgakov wants to merge 7 commits into gh/dzhulgakov/1/base from
Conversation
```cpp
use_memory_format);
auto qscheme = self.qscheme();
if (qscheme == kPerTensorAffine) {
  return at::_empty_affine_quantized(self.sizes(), options,
```
we have different ordering of arguments here... maybe we can have:

```cpp
at::_empty_affine_quantized(q_scale, q_zero_point, sizes, options, use_memory_format);
```

and

```cpp
at::_empty_per_channel_affine_quantized_like(q_per_channel_scales, q_per_channel_zero_points, q_per_channel_axis, sizes, options, use_memory_format);
```

But we should do that in a separate PR: @dskhudia, could you change the order for _empty_per_channel_affine_quantized_like?
oh, q_per_channel_axis is added in this PR; then @dzhulgakov, could you change the order as well?
looks like this is addressed in later PRs
So the order and name for _empty_per_channel_affine_quantized_like (note the _like!) were weird as a workaround for a codegen peculiarity; #26243 fixes it.
The order is still not the same: TensorOptions and the additional arguments are in the wrong order. It's C++-only, since the args are marked as kwargs in Python, but we should fix it anyway. Since it's an intrusive change, I'll send a separate PR for it after these ones land.
In particular adds support for empty/empty_like which is needed for memory layouts to work. [ghstack-poisoned]
```python
# change memory format
qlast = qr.contiguous(memory_format=torch.channels_last)
self.assertEqual(qr.stride(), list(reversed(sorted(qr.stride()))))
self.assertNotEqual(qlast.stride(), list(reversed(sorted(qlast.stride()))))
```
what is this for? to check the contiguous operation actually happened?
yeah, just making sure. I can remove it, but it doesn't hurt
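For context, here is why the sorted-stride check verifies that the layout change actually happened (a minimal standalone sketch, not code from this PR): a contiguous NCHW tensor has strictly decreasing strides, and channels_last breaks that ordering by moving the channel stride to the end.

```python
import torch

# Contiguous NCHW: strides decrease monotonically.
x = torch.empty(2, 3, 4, 5)
assert list(x.stride()) == sorted(x.stride(), reverse=True)   # (60, 20, 5, 1)

# channels_last rearranges physical memory to NHWC, so the stride
# sequence is no longer decreasing.
y = x.contiguous(memory_format=torch.channels_last)
assert list(y.stride()) != sorted(y.stride(), reverse=True)   # (60, 1, 15, 3)
```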
```python
self.assertTrue(torch.equal(qr.int_repr(), qlast.int_repr()))
self.assertTrue(qr.q_scale() == qlast.q_scale())
self.assertTrue(qr.q_zero_point() == qlast.q_zero_point())
self.assertTrue(np.array_equal(qlast.dequantize().numpy(), qr.dequantize().numpy()))
```
I think we have self.assertEqual on PyTorch tensors as well, so

```python
self.assertEqual(qlast.dequantize(), qr.dequantize())
```

should work.
ah, good point! when I tried it I typed assertEquals instead of assertEqual and we don't override that one :-P
```python
qlast = qr.contiguous(memory_format=torch.channels_last)
self.assertEqual(qr.stride(), list(reversed(sorted(qr.stride()))))
self.assertNotEqual(qlast.stride(), list(reversed(sorted(qlast.stride()))))
self.assertTrue(torch.equal(qr.int_repr(), qlast.int_repr()))
```
FYI: int_repr() will do contiguous right now, but I think it should preserve the strides, will fix in #25429 when I have some time.
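For reference, int_repr() exposes the underlying integer values of a quantized tensor; a minimal sketch against the current API (not code from this PR):

```python
import torch

# Quantize a float tensor, then look at the raw stored values.
q = torch.quantize_per_tensor(torch.rand(2, 2), scale=0.1, zero_point=0,
                              dtype=torch.quint8)
print(q.int_repr().dtype)  # torch.uint8: the raw quantized values
```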
```cpp
auto result = detail::make_tensor<QTensorImpl>(Storage(self.storage()), self.type_set(), get_qtensorimpl(self)->quantizer());
auto quantizer = get_qtensorimpl(self)->quantizer();
TORCH_CHECK(
    quantizer->qscheme() == QScheme::PER_TENSOR_SYMMETRIC ||
```
We don't have QScheme::PER_TENSOR_SYMMETRIC in the backend; please remove.
See: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/quantized/Quantizer.h#L121
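To illustrate what the backend actually records on a quantized tensor, a hedged sketch against the present-day Python API (not code from this PR):

```python
import torch

# Per-tensor quantization: the resulting tensor reports per_tensor_affine,
# the scheme the backend implements.
x = torch.rand(4)
xq = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)
assert xq.qscheme() == torch.per_tensor_affine
```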
```cpp
    self.q_scale(),
    self.q_zero_point(),
    use_memory_format);
} else if (qscheme == kPerChannelAffine) {
```
Do we also need to support kPerChannelSymmetric and kPerTensorSymmetric here?
again, they are not present in the backend
```cpp
auto result = detail::make_tensor<QTensorImpl>(Storage(self.storage()), self.type_set(), get_qtensorimpl(self)->quantizer());
auto quantizer = get_qtensorimpl(self)->quantizer();
TORCH_CHECK(
    quantizer->qscheme() == QScheme::PER_TENSOR_AFFINE,
```
Similar comments as above for QScheme::PER_TENSOR_SYMMETRIC.
```cpp
  return get_qtensorimpl(self)->quantizer().get();
}

IntArrayRef q_per_channel_axis_quant(const Tensor& self) {
  auto quantizer = get_qtensorimpl(self)->quantizer();
  TORCH_CHECK(quantizer->qscheme() == kPerChannelAffine);
```
Support for PerChannelSymmetric?
```cpp
Tensor quantized_clone(const Tensor& self) {
  // TODO: add per channel support
  TORCH_INTERNAL_ASSERT(
```
Similar comments as above
```python
q = torch._empty_per_channel_affine_quantized_like(scales, zero_points, [numel], [ch_axis], dtype=torch.quint8)
self.assertEqual(scales, q.q_per_channel_scales())
self.assertEqual(zero_points, q.q_per_channel_zero_points())
self.assertEqual([ch_axis], q.q_per_channel_axis())
```
Both per_channel and qtensor creation tests should sweep over symmetric and affine quant.
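A hedged sketch of such a sweep, using the present-day torch.quantize_per_channel name (the PR-era API was torch.quantize_linear_per_channel); symmetric schemes would additionally need backend support, per the replies above:

```python
import torch

scales = torch.tensor([0.5, 0.25])
zero_points = torch.tensor([0, 0], dtype=torch.long)

# Sweep the creation test over dtypes (and, once the backend supports it,
# over affine vs. symmetric schemes as well).
for dtype in (torch.quint8, torch.qint8):
    r = torch.rand(2, 3) * 4 - 2
    q = torch.quantize_per_channel(r, scales, zero_points, axis=0, dtype=dtype)
    assert q.q_per_channel_axis() == 0
```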
```python
self.assertEqual(qlast.dequantize(), qr.dequantize())
```

```python
def test_qtensor_per_channel_permute(self):
    r = torch.rand(20, 10, 2, 2, dtype=torch.float) * 4 - 2
```
Can we allow permutes that do not change the axis that is quantized? Would this work?

```python
# x is in NCHW
x_q = torch.quantize_linear_per_channel(..., axis=[0])
# x_q is also in NCHW
x_nhwc = x_q.permute([0, 2, 3, 1])
```
as I mentioned in another place, we could, but I think we can add it later in a separate diff. And memory format works today already so it should be sufficient
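For illustration, a minimal sketch of the memory-format path that works today (hedged: this uses the modern torch.quantize_per_channel name and is not code from this PR):

```python
import torch

# Per-channel quantize along the channel axis of an NCHW tensor.
x = torch.rand(2, 3, 4, 4)
scales = torch.tensor([0.1, 0.2, 0.3])
zero_points = torch.tensor([0, 0, 0], dtype=torch.long)
xq = torch.quantize_per_channel(x, scales, zero_points, axis=1, dtype=torch.quint8)

# channels_last only rearranges strides; the logical dimension order is
# unchanged, so axis=1 still refers to channels.
xq_nhwc = xq.contiguous(memory_format=torch.channels_last)
assert xq_nhwc.q_per_channel_axis() == 1
```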
```
q_per_channel_zero_points() -> tuple of ints

Given a Tensor quantized by linear (affine) per-channel quantization,
returns a indices of dimensions on which per-channel quantization is applied.
```
Do we really support a tuple for axis? Why not restrict axis to be scalar/list of length 1 and clarify this in the API.
Also NIT: the doc string should say q_per_channel_axis()
How likely are we to have multidim in the future? Having both APIs would be confusing.
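For reference, the API as it eventually settled returns a plain int for the axis; a hedged sketch against current PyTorch:

```python
import torch

q = torch.quantize_per_channel(torch.rand(3, 2),
                               torch.tensor([0.1, 0.2, 0.3]),
                               torch.tensor([0, 0, 0], dtype=torch.long),
                               axis=0, dtype=torch.quint8)
print(q.q_per_channel_axis())  # 0, a single int rather than a tuple
```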
raghuramank100 left a comment:
Please see comments, thanks
In particular adds support for empty/empty_like which is needed for memory layouts to work. Differential Revision: [D17443220](https://our.internmc.facebook.com/intern/diff/D17443220) [ghstack-poisoned]
Summary: Pull Request resolved: pytorch/pytorch#26240
In particular adds support for empty/empty_like which is needed for memory layouts to work.
Test Plan: Imported from OSS
Differential Revision: D17443220
Pulled By: dzhulgakov
fbshipit-source-id: 9c9e25981999c0edaf40be104a5741e9c62a1333
@dzhulgakov merged this pull request in 8c1354c.
```cpp
Quantizer* quantizer(const Tensor& self) {
  return get_qtensorimpl(self)->quantizer().get();
}

IntArrayRef q_per_channel_axis_quant(const Tensor& self) {
```
This is the very first occurrence of an IntArrayRef at the return site of a native function, and I don't like it. The returned ref is non-owning, so what is its lifetime tied to? This is not documented anywhere. (In fact, the lifetime is tied to the quantizer, not the tensor itself, so the semantics don't even match sizes() and strides(), whose lifetime is tied to the tensor.) Furthermore, this would have to be supported properly in the JIT, as all values on the stack must be owning. C.f. the implementation of sizes():

```cpp
Operator(
    "aten::size(Tensor self) -> int[]",
    [](Stack& stack) {
      RECORD_FUNCTION("size", last(stack, 1));
      auto t = std::move(pop(stack)).toTensor();
      pack(stack, t.sizes().vec());
      return 0;
    },
    aliasAnalysisFromSchema()),
```

I suppose, from first principles, the ability to return a primitive array from a function is desirable functionality, so I think we should think carefully about how we want to design this. As a first proposal, the return type of this function should be a std::vector<>, not an IntArrayRef. As a second proposal, this shouldn't be a native function, and should just be hard-coded the same way sizes/strides are.