Serialization for per channel qtensor #26339
dzhulgakov wants to merge 10 commits into gh/dzhulgakov/5/base
Conversation
Serializes per-channel tensors in both torch.serialization and jit. Since we didn't bind Quantizer properly yet, I chose to save a tuple representing the quantizer settings. To avoid recursive tensor serialization calls, I'm using a tuple instead of a tensor to store scales and zero points.

@driazati - please check the serialization logic. Is there a good test that compares that JIT serialization and python serialization are equivalent? (I haven't tested it yet)
driazati
left a comment
I'm generally apprehensive about the serialization changes; there's a lot of special casing / deeply nested paths in the tensor serialization, and it's going to be hard to ensure there are no forward/backward compatibility problems.
pushInt(tensor.q_zero_point());
push<PickleOpCode>(PickleOpCode::MARK);
pushGlobal("torch", toString(tensor.qscheme()));
// tuple of (qscheme, scale, zp) or (qscheme, scales, zps, axis)
Why are there two different schemas here? Is it possible to always do it as (qscheme, scales, zps, axis)? Specializing is going to make the (de)serialization logic much more complicated
It depends on the quantizer (which can be per-tensor or per-channel) - the per-tensor one doesn't have an axis. I could make axis None in the other case, but I'm not sure that makes it cleaner. The specialization still has to be there, as scale vs. scales is an int vs. a tuple of ints.
case at::kPerChannelAffine: {
  const auto* quantizer = static_cast<at::PerChannelAffineQuantizer*>(
      tensor.quantizer().get());
  push<PickleOpCode>(PickleOpCode::MARK);
You can shorten all of these by creating a c10::tuple and using pushIValue
axis.push_back(x.toInt());
}
result = _empty_per_channel_affine_quantized(
    {0}, scales, zero_points, axis, storage_tensor.options());
Why are all the parameters for _empty_per_channel_affine_quantized Tensors instead of a list?
Because scales is a float array - we don't have a float array type, and we don't have plans to support one.
Also, very theoretically, we might have multi-axis quantization, in which case scales would be a multidimensional tensor. It's good to have support for that in the API, but it's not likely to appear soon. (Note that in serialization it's still fine to fold the scales into a tuple, as the shape is deducible from the original tensor.)
As for tests that compare, you can try to save with the pickler (e.g. use Line 14797 in 06c69ad).
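A minimal round-trip check on the settings tuple can be done with Python's own pickle module. This only illustrates the round-trip idea; the JIT pickler is a separate C++ implementation, so equivalence between the two still needs a dedicated test.

```python
import pickle

# Round-trip the per-channel quantizer-settings tuple through pickle.
# The tuple layout matches the (qscheme, scales, zps, axis) schema
# discussed above; values are illustrative.
per_channel_settings = ("per_channel_affine", (0.1, 0.2), (0, 1), 1)
restored = pickle.loads(pickle.dumps(per_channel_settings))
```

A real equivalence test would additionally load the JIT-pickled bytes and compare against the Python-pickled result.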
@@ -722,11 +722,21 @@ def assertTensorsEqual(a, b):
    elif x.is_quantized and y.is_quantized:
nit: we can remove and y.is_quantized here
zero_points = torch.round(torch.rand(10) * 20 + 1).to(torch.long)
# quint32 is not supported yet
for dtype in [torch.quint8, torch.qint8]:
    qr = torch.quantize_linear_per_channel(r, scales, zero_points, [1], dtype)
What is the behavior for slicing (view) on a per-channel quantized tensor right now?
Any slicing / stride-setting operation today basically errors out with a nice "can't do it" message. We could support a subset of these operations, but I doubt it's worth it at this point.
Is this true for permute also? We could conceptually support it, but again it doesn't seem like it is really needed.
Yes, permute is implemented as stride setting, so it gives the same error.
jerryzh168
left a comment
Checked the code except jit/pickler.cpp.
# See Note [Don't serialize hooks]
torch.utils.hooks.warn_if_has_hooks(self)
if self.is_quantized:
    if self.qscheme() == torch.per_tensor_affine:
We still support torch.per_tensor_symmetric and torch.per_channel_symmetric as valid qschemes. They are currently used only by the observer. We should add checks for these too whenever we have a qscheme-based check.
This code will never be hit by observers - you can't create a tensor with that quantizer.
result = at::_empty_affine_quantized(
    {0}, storage_tensor.options(), q_scale, q_zero_point);
} break;
case at::kPerChannelAffine: {
We should also handle kPerChannelSymmetric and kPerTensorSymmetric here
Again, we have only two types of quantizers present today.
raghuramank100
left a comment
Please add support for symmetric qschemes.
@dzhulgakov merged this pull request in ebc2365.
Summary: Pull Request resolved: pytorch#26339

Serializes per-channel tensors in both torch.serialization and jit. Since we didn't bind Quantizer properly yet, I chose to save a tuple representing the quantizer settings. To avoid recursive tensor serialization calls, I'm using a tuple instead of a tensor to store scales and zero points.

Test Plan: Imported from OSS
Differential Revision: D17443222
Pulled By: dzhulgakov
fbshipit-source-id: a34758de1ffd2ec1cdc5355f5baf95284a4ccf4b