
Implement more support for per-channel quantization #26240

Closed
dzhulgakov wants to merge 7 commits into gh/dzhulgakov/1/base from gh/dzhulgakov/1/head

Conversation

@dzhulgakov
Collaborator

@dzhulgakov dzhulgakov commented Sep 14, 2019

Stack from ghstack:

In particular adds support for empty/empty_like which is needed for memory layouts to work.

Differential Revision: D17443220

@pytorchbot pytorchbot added module: internals Related to internal abstractions in c10 and ATen module: operators module: pybind Related to our Python bindings / interactions with other Python libraries oncall: quantization Quantization support in PyTorch labels Sep 14, 2019
@dzhulgakov dzhulgakov changed the title implement more per-channel code Implement more support for per-channel quantization Sep 14, 2019
use_memory_format);
auto qscheme = self.qscheme();
if (qscheme == kPerTensorAffine) {
return at::_empty_affine_quantized(self.sizes(), options,
Contributor


we have different ordering of arguments here...
maybe we can have:
at::_empty_affine_quantized(q_scale, q_zero_point, sizes, options, use_memory_format);
and
at::_empty_per_channel_affine_quantized_like(q_per_channel_scales, q_per_channel_zero_points, q_per_channel_axis, sizes, options, use_memory_format);

But should do in a separate PR: @dskhudia could you change the order for _empty_per_channel_affine_quantized_like?

Contributor


oh q_per_channel_axis is added in this PR, then @dzhulgakov could you change the order as well?

Contributor

@jerryzh168 jerryzh168 Sep 17, 2019


looks like this is addressed in later PRs

Collaborator Author


So the order and name for _empty_per_channel_affine_quantized_like (note the _like!) were a weird workaround for a codegen peculiarity - #26243 fixes it

The order is still not the same - TensorOptions and the additional arguments are in the wrong order. It's C++-only, as the args are marked as kwargs in Python, but we should fix it anyway. Since it's an intrusive change, I'll send a separate PR for it after these ones land

Comment thread tools/autograd/gen_python_functions.py Outdated
In particular adds support for empty/empty_like which is needed for memory layouts to work.

[ghstack-poisoned]
@dzhulgakov dzhulgakov requested a review from apaszke as a code owner September 17, 2019 03:52
@pytorchbot pytorchbot added the module: autograd Related to torch.autograd, and the autograd engine in general label Sep 17, 2019
# change memory format
qlast = qr.contiguous(memory_format=torch.channels_last)
self.assertEqual(qr.stride(), list(reversed(sorted(qr.stride()))))
self.assertNotEqual(qlast.stride(), list(reversed(sorted(qlast.stride()))))
Contributor


what is this for? to check the contiguous operation actually happened?

Collaborator Author


yeah, just making sure. I can remove it, but it doesn't hurt
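A minimal, torch-free sketch of the stride check being discussed (helper names are made up for illustration): a contiguous NCHW tensor has monotonically decreasing strides, while a channels_last (NHWC-in-memory) layout does not, which is exactly what the two assertions in the test distinguish.

```python
# Torch-free illustration of the contiguous vs channels_last stride check.

def contiguous_strides(shape):
    """Row-major strides, e.g. (N, C, H, W) -> (C*H*W, H*W, W, 1)."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

def channels_last_strides(shape):
    """NHWC-in-memory strides for a logical NCHW shape: (H*W*C, 1, W*C, C)."""
    n, c, h, w = shape
    return [h * w * c, 1, w * c, c]

shape = (20, 10, 2, 2)  # same shape as the test above
nchw = contiguous_strides(shape)
nhwc = channels_last_strides(shape)

assert nchw == sorted(nchw, reverse=True)  # contiguous: strides descend
assert nhwc != sorted(nhwc, reverse=True)  # channels_last: they do not
```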

Comment thread test/test_quantized_tensor.py Outdated
self.assertTrue(torch.equal(qr.int_repr(), qlast.int_repr()))
self.assertTrue(qr.q_scale() == qlast.q_scale())
self.assertTrue(qr.q_zero_point() == qlast.q_zero_point())
self.assertTrue(np.array_equal(qlast.dequantize().numpy(), qr.dequantize().numpy()))
Contributor


I think we have self.assertEqual on pytorch tensor as well. so

self.assertEqual(qlast.dequantize(), qr.dequantize())

should work

Collaborator Author


ah, good point! when I tried it I typed assertEquals instead of assertEqual and we don't override that one :-P

Comment thread test/test_quantized_tensor.py Outdated
qlast = qr.contiguous(memory_format=torch.channels_last)
self.assertEqual(qr.stride(), list(reversed(sorted(qr.stride()))))
self.assertNotEqual(qlast.stride(), list(reversed(sorted(qlast.stride()))))
self.assertTrue(torch.equal(qr.int_repr(), qlast.int_repr()))
Contributor


FYI: int_repr() will do contiguous right now, but I think it should preserve the strides, will fix in #25429 when I have some time.

Contributor

@jerryzh168 jerryzh168 left a comment


LGTM Thanks

Comment thread aten/src/ATen/native/TensorShape.cpp Outdated
auto result = detail::make_tensor<QTensorImpl>(Storage(self.storage()), self.type_set(), get_qtensorimpl(self)->quantizer());
auto quantizer = get_qtensorimpl(self)->quantizer();
TORCH_CHECK(
quantizer->qscheme() == QScheme::PER_TENSOR_SYMMETRIC ||
Contributor


we don't have QScheme::PER_TENSOR_SYMMETRIC in backend. please remove.

see: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/quantized/Quantizer.h#L121

In particular adds support for empty/empty_like which is needed for memory layouts to work.

[ghstack-poisoned]
@pytorchbot pytorchbot added module: docs Related to our documentation, both in docs/ and docblocks module: printing Issues related to the printing format of tensors labels Sep 18, 2019
In particular adds support for empty/empty_like which is needed for memory layouts to work.

[ghstack-poisoned]
self.q_scale(),
self.q_zero_point(),
use_memory_format);
} else if (qscheme == kPerChannelAffine) {
Contributor


Do we also need to support kPerChannelSymmetric and kPerTensorSymmetric here?

Collaborator Author


again, they are not present in the backend

auto result = detail::make_tensor<QTensorImpl>(Storage(self.storage()), self.type_set(), get_qtensorimpl(self)->quantizer());
auto quantizer = get_qtensorimpl(self)->quantizer();
TORCH_CHECK(
quantizer->qscheme() == QScheme::PER_TENSOR_AFFINE,
Contributor

@raghuramank100 raghuramank100 Sep 18, 2019


Similar comments as above for Qscheme::PER_TENSOR_SYMMETRIC

return get_qtensorimpl(self)->quantizer().get();
IntArrayRef q_per_channel_axis_quant(const Tensor& self) {
auto quantizer = get_qtensorimpl(self)->quantizer();
TORCH_CHECK(quantizer->qscheme() == kPerChannelAffine);
Contributor


Support for PerChannelSymmetric


Tensor quantized_clone(const Tensor& self) {
// TODO: add per channel support
TORCH_INTERNAL_ASSERT(
Contributor


Similar comments as above

q = torch._empty_per_channel_affine_quantized_like(scales, zero_points, [numel], [ch_axis], dtype=torch.quint8)
self.assertEqual(scales, q.q_per_channel_scales())
self.assertEqual(zero_points, q.q_per_channel_zero_points())
self.assertEqual([ch_axis], q.q_per_channel_axis())
Contributor


Both per_channel and qtensor creation tests should sweep over symmetric and affine quant.
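For readers unfamiliar with the scheme being tested: per-channel affine quantization keeps one (scale, zero_point) pair per slice along the chosen axis, instead of a single pair for the whole tensor. A pure-Python sketch for a 2-D input quantized along axis 0 (helper names are made up for illustration):

```python
# Per-channel affine quantize/dequantize, one (scale, zero_point) per row
# (i.e. quantization axis 0 of a 2-D tensor). quint8-style range [0, 255].

def quantize_per_channel(rows, scales, zero_points, qmin=0, qmax=255):
    out = []
    for row, s, zp in zip(rows, scales, zero_points):
        out.append([max(qmin, min(qmax, round(v / s) + zp)) for v in row])
    return out

def dequantize_per_channel(qrows, scales, zero_points):
    # (q - zero_point) * scale, per row
    return [[(q - zp) * s for q in row]
            for row, s, zp in zip(qrows, scales, zero_points)]

rows = [[0.5, 1.0], [1.0, 2.0]]
scales = [0.5, 0.25]        # each row has its own scale...
zero_points = [0, 10]       # ...and its own zero point

q = quantize_per_channel(rows, scales, zero_points)
assert q == [[1, 2], [14, 18]]
# power-of-two scales make the round trip exact here
assert dequantize_per_channel(q, scales, zero_points) == rows
```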

self.assertEqual(qlast.dequantize(), qr.dequantize())

def test_qtensor_per_channel_permute(self):
r = torch.rand(20, 10, 2, 2, dtype=torch.float) * 4 - 2
Contributor


Can we allow for permutes that do not change the axis that is quantized?
Would this work?

# x is in NCHW
x_q = torch.quantize_linear_per_channel(..., axis=[0])
# x_q is also in NCHW
x_nhwc = x_q.permute([0, 2, 3, 1])

Collaborator Author


as I mentioned in another place, we could, but I think we can add it later in a separate diff. And memory format works today already so it should be sufficient
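If permutes on per-channel tensors were supported, the quantization axis would have to follow the dimension it is attached to. A small sketch of the bookkeeping (the function name is hypothetical, not part of this PR): after `permute(perm)`, the old axis ends up at its position within `perm`.

```python
# Illustrative only: track where the quantized axis lands after a permute.

def remap_quant_axis(axis, perm):
    """After tensor.permute(perm), old dimension `axis` lives at perm.index(axis)."""
    return perm.index(axis)

# NCHW -> NHWC with channels quantized along axis 1: axis moves to position 3.
assert remap_quant_axis(1, [0, 2, 3, 1]) == 3
# axis 0 (the reviewer's example above) is untouched by this permutation.
assert remap_quant_axis(0, [0, 2, 3, 1]) == 0
```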

Comment thread torch/_tensor_docs.py
q_per_channel_zero_points() -> tuple of ints

Given a Tensor quantized by linear (affine) per-channel quantization,
returns a indices of dimensions on which per-channel quantization is applied.
Contributor


Do we really support a tuple for axis? Why not restrict axis to be scalar/list of length 1 and clarify this in the API.
Also NIT: the doc string should say q_per_channel_axis()

Collaborator Author


How likely are we to have multidim in the future? Having both APIs would be confusing

Contributor

@raghuramank100 raghuramank100 left a comment


Please see comments, thanks

Dmytro Dzhulgakov added 3 commits September 18, 2019 22:49
In particular adds support for empty/empty_like which is needed for memory layouts to work.

Differential Revision: [D17443220](https://our.internmc.facebook.com/intern/diff/D17443220)

[ghstack-poisoned]
In particular adds support for empty/empty_like which is needed for memory layouts to work.

Differential Revision: [D17443220](https://our.internmc.facebook.com/intern/diff/D17443220)

[ghstack-poisoned]
In particular adds support for empty/empty_like which is needed for memory layouts to work.

Differential Revision: [D17443220](https://our.internmc.facebook.com/intern/diff/D17443220)

[ghstack-poisoned]
zdevito pushed a commit to zdevito/ATen that referenced this pull request Sep 19, 2019
Summary:
Pull Request resolved: pytorch/pytorch#26240

In particular adds support for empty/empty_like which is needed for memory layouts to work.

Test Plan: Imported from OSS

Differential Revision: D17443220

Pulled By: dzhulgakov

fbshipit-source-id: 9c9e25981999c0edaf40be104a5741e9c62a1333
@facebook-github-bot
Contributor

@dzhulgakov merged this pull request in 8c1354c.


Quantizer* quantizer(const Tensor& self) {
return get_qtensorimpl(self)->quantizer().get();
IntArrayRef q_per_channel_axis_quant(const Tensor& self) {
Contributor

@ezyang ezyang Sep 20, 2019


This is the very first occurrence of an IntArrayRef at return site of a native function, and I don't like it. The returned ref is non-owning. So what's its lifetime tied to? This is not documented anywhere. (In fact, the lifetime is tied to the quantizer, not the tensor itself; so the semantics don't even match sizes() and strides(), whose lifetime is tied to the tensor). Furthermore, this would have to be supported properly in the JIT, as all values in the stack must be owning. C.f. implementation of sizes():

    Operator(
        "aten::size(Tensor self) -> int[]",
        [](Stack& stack) {
          RECORD_FUNCTION("size", last(stack, 1));
            
          auto t = std::move(pop(stack)).toTensor();
          pack(stack, t.sizes().vec());
          return 0;
        },    
        aliasAnalysisFromSchema()),

I suppose, from first principles, the ability to return a primitive array from a function is desirable functionality, but I think we should think carefully about how we want to design this. As a first proposal, the return type of this function should be a std::vector<>, not an IntArrayRef. As a second proposal, this shouldn't be a native function, and should just be hard-coded in the same way sizes/strides are.

cc @gchanan @zdevito

@facebook-github-bot facebook-github-bot deleted the gh/dzhulgakov/1/head branch October 28, 2019 22:08
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
Pull Request resolved: pytorch#26240

In particular adds support for empty/empty_like which is needed for memory layouts to work.

Test Plan: Imported from OSS

Differential Revision: D17443220

Pulled By: dzhulgakov

fbshipit-source-id: 9c9e25981999c0edaf40be104a5741e9c62a1333
