
Serialization for per channel qtensor #26339

Closed
dzhulgakov wants to merge 10 commits into gh/dzhulgakov/5/base from gh/dzhulgakov/5/head

Conversation

Collaborator

@dzhulgakov dzhulgakov commented Sep 17, 2019

Stack from ghstack:

Serializes per-channel tensors in both torch.serialization and JIT. Since we haven't bound Quantizer properly yet, I chose to save a tuple representing the quantizer settings. To avoid recursive tensor serialization calls, I'm using a tuple instead of a tensor to store scales and zero points.

@driazati - please check the serialization logic. Is there a good test that checks that JIT serialization and Python serialization are equivalent? (I haven't tested it yet.)

Differential Revision: D17443222
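
Rough end-to-end sketch of the round trip this change is meant to enable, using the per-channel quantization API as named in this PR's timeframe (`torch.quantize_linear_per_channel`, later renamed); the values are illustrative only, not code from the PR:

```python
import io
import torch

r = torch.rand(3, 2, dtype=torch.float) * 4 - 2
scales = torch.tensor([0.2, 0.3], dtype=torch.double)
zero_points = torch.tensor([5, 10], dtype=torch.long)
# per-channel quantize along axis 1 (API name as of this PR)
qr = torch.quantize_linear_per_channel(r, scales, zero_points, [1], torch.quint8)

buf = io.BytesIO()
torch.save(qr, buf)       # python serialization path (torch.serialization)
buf.seek(0)
loaded = torch.load(buf)
assert torch.equal(loaded.dequantize(), qr.dequantize())
```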

dzhulgakov pushed a commit that referenced this pull request Sep 17, 2019
ghstack-source-id: b436ddd
Pull Request resolved: #26339
Contributor

@driazati driazati left a comment


I'm generally apprehensive about the serialization changes; there's a lot of special casing / deeply nested paths in the tensor serialization, and it's going to be hard to ensure there are no forward/backward compatibility problems.

pushInt(tensor.q_zero_point());
push<PickleOpCode>(PickleOpCode::MARK);
pushGlobal("torch", toString(tensor.qscheme()));
// tuple of (qscheme, scale, zp) or (qscheme, scales, zps, axis)
Contributor


Why are there two different schemas here? Is it possible to always do it as (qscheme, scales, zps, axis)? Specializing is going to make the (de)serialization logic much more complicated

Collaborator Author


It depends on the quantizer (which can be per-tensor or per-channel); the per-tensor one doesn't have an axis. I could make axis None in the per-tensor case, but I'm not sure that makes it cleaner. The specialization still has to be there either way, since scale vs. scales is a single value vs. a tuple of values.
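
As a hedged illustration of the two tuple layouts being discussed, here is a small Python sketch that reconstructs the same state in eager mode (the `q_per_channel_*` accessor names are the current public ones and may not match this PR's timeframe exactly; the helper itself is hypothetical, not part of the PR):

```python
import torch

def quantizer_state(qt: torch.Tensor):
    """Return the quantizer settings in the two tuple layouts discussed above."""
    if qt.qscheme() == torch.per_tensor_affine:
        # (qscheme, scale, zero_point) - scale and zero_point are plain scalars
        return (str(qt.qscheme()), qt.q_scale(), qt.q_zero_point())
    if qt.qscheme() == torch.per_channel_affine:
        # (qscheme, scales, zero_points, axis) - scales/zero_points folded into tuples
        return (str(qt.qscheme()),
                tuple(qt.q_per_channel_scales().tolist()),
                tuple(qt.q_per_channel_zero_points().tolist()),
                qt.q_per_channel_axis())
    raise RuntimeError("unsupported qscheme for serialization")
```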

Comment thread torch/csrc/jit/pickler.cpp Outdated
case at::kPerChannelAffine: {
const auto* quantizer = static_cast<at::PerChannelAffineQuantizer*>(
tensor.quantizer().get());
push<PickleOpCode>(PickleOpCode::MARK);
Contributor


You can shorten all of these by creating a c10::tuple and using pushIValue

Comment thread torch/csrc/jit/pickler.cpp Outdated
axis.push_back(x.toInt());
}
result = _empty_per_channel_affine_quantized(
{0}, scales, zero_points, axis, storage_tensor.options());
Contributor


Why are all the parameters for _empty_per_channel_affine_quantized Tensors instead of a list?

Contributor


Because scales is a float array; we don't have a float array type and we don't have plans to support it.

Collaborator Author


Also, very theoretically, we might have multi-axis quantization, in which case scales would be a multidimensional tensor. It's good to have support for it in the API, but it's not likely to appear soon. (Note that in serialization it's still fine to fold the scales into a tuple, as the shape is deducible from the original tensor.)

@driazati
Contributor

As for tests that compare the two, you can try saving with the pickler (e.g. use torch.save in a @script function) and loading it in eager, and vice versa. This test does that for regular tensors:

def _test_pickle_checkpoint(self, device):
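
A hedged sketch of what the quantized counterpart of that cross-format check might look like, modeled on _test_pickle_checkpoint (the helper name, the temp-file handling, and the assertions are illustrative, not code from this PR):

```python
import os
import tempfile
import torch

def check_script_save_eager_load(qtensor: torch.Tensor) -> None:
    """Save with the JIT pickler (torch.save inside @torch.jit.script), load in eager."""
    with tempfile.TemporaryDirectory() as d:
        fname = os.path.join(d, "qtensor.pt")

        @torch.jit.script
        def save_it(t: torch.Tensor, path: str):
            torch.save(t, path)  # JIT pickler path

        save_it(qtensor, fname)
        loaded = torch.load(fname)  # eager unpickler path
        assert loaded.qscheme() == qtensor.qscheme()
        assert torch.equal(loaded.int_repr(), qtensor.int_repr())
```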

Comment thread test/common_utils.py
@@ -722,11 +722,21 @@ def assertTensorsEqual(a, b):
elif x.is_quantized and y.is_quantized:
Contributor


nit: we can remove `and y.is_quantized` here

Comment thread test/test_quantized_tensor.py Outdated
zero_points = torch.round(torch.rand(10) * 20 + 1).to(torch.long)
# quint32 is not supported yet
for dtype in [torch.quint8, torch.qint8]:
qr = torch.quantize_linear_per_channel(r, scales, zero_points, [1], dtype)
Contributor


What is the behavior of slicing (view) on a per-channel quantized tensor right now?

Collaborator Author


Any slicing / stride-setting operation today basically errors out with a nice "can't do it" message. We could support a subset of these operations, but I doubt it's worth it at this point.

Contributor


Is this true for permute also? We could conceptually support it, but again it doesn't seem like it is really needed.

Collaborator Author


Yes, permute is implemented via stride setting, so it gives the same error.
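
A hedged illustration of the behavior described in this thread, using the per-channel quantize API as named at the time of this PR (the behavior and error text may have changed since):

```python
import torch

qr = torch.quantize_linear_per_channel(
    torch.rand(3, 2),
    torch.tensor([0.1, 0.2], dtype=torch.double),
    torch.tensor([0, 0], dtype=torch.long),
    [1],
    torch.quint8,
)
try:
    qr.permute(1, 0)  # stride-changing view op on a per-channel qtensor
except RuntimeError as e:
    # expected at the time of this discussion: a "can't do it" style error
    print("permute is not supported for per-channel quantized tensors:", e)
```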

Contributor

@jerryzh168 jerryzh168 left a comment


Checked the code except jit/pickler.cpp.

dzhulgakov pushed a commit that referenced this pull request Sep 18, 2019
ghstack-source-id: 4e5f0ff
Pull Request resolved: #26339
dzhulgakov pushed a commit that referenced this pull request Sep 18, 2019
ghstack-source-id: df30787
Pull Request resolved: #26339
Comment thread torch/tensor.py
# See Note [Don't serialize hooks]
torch.utils.hooks.warn_if_has_hooks(self)
if self.is_quantized:
if self.qscheme() == torch.per_tensor_affine:
Contributor


We still support torch.per_tensor_symmetric and torch.per_channel_symmetric as valid qschemes. They are currently used only by the observers. We should add checks for these too wherever we have a qscheme-based check.

Collaborator Author


This code will never be hit by observers; you can't create a tensor with that quantizer.

Comment thread torch/csrc/jit/pickler.cpp Outdated
result = at::_empty_affine_quantized(
{0}, storage_tensor.options(), q_scale, q_zero_point);
} break;
case at::kPerChannelAffine: {
Contributor


We should also handle kPerChannelSymmetric and kPerTensorSymmetric here

Collaborator Author


Again, we only have two types of quantizers today.

Contributor

@raghuramank100 raghuramank100 left a comment


Please add support for symmetric qschemes.

dzhulgakov pushed a commit that referenced this pull request Sep 19, 2019
ghstack-source-id: b7b6be2
Pull Request resolved: #26339
dzhulgakov pushed a commit that referenced this pull request Sep 19, 2019
ghstack-source-id: 77edd23
Pull Request resolved: #26339
dzhulgakov pushed a commit that referenced this pull request Sep 19, 2019
ghstack-source-id: 355dbe6
Pull Request resolved: #26339
Dmytro Dzhulgakov added 2 commits September 22, 2019 23:08
@facebook-github-bot
Contributor

@dzhulgakov merged this pull request in ebc2365.

@facebook-github-bot facebook-github-bot deleted the gh/dzhulgakov/5/head branch October 28, 2019 22:08
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
Pull Request resolved: pytorch#26339

Serializes per-channel tensor in both torch.serialization and jit. Since we didn't bind Quantizer properly yet, I chose to save a tuple representing quantizer settings. To avoid recursive tensor serialization calls, I'm using tuple instead of tensor to store scales and zero points.

driazati - please check the serialization logic. Is there a good test that compares that JIT serialization and python serialization are equivalent? (I haven't tested it yet)

Test Plan: Imported from OSS

Differential Revision: D17443222

Pulled By: dzhulgakov

fbshipit-source-id: a34758de1ffd2ec1cdc5355f5baf95284a4ccf4b

Labels

Merged
module: internals (Related to internal abstractions in c10 and ATen)
module: tests (Issues related to tests, not the torch.testing module)
oncall: jit (Add this issue/PR to JIT oncall triage queue)


7 participants