Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2732
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit f6c9d09 with merge base e6b38bb. This comment was automatically generated by Dr. CI and updates every 15 minutes.
    block_size: the block size for quantization, representing the granularity, for example groupwise quantization will have block_size (1, group_size)
    """

tensor_data_attrs = ["int_data", "scale", "zero_point"]
btw if you update these to tensor_data_names and tensor_attribute_names you'll be able to remove some of the implementations, see docs in https://github.com/pytorch/ao/pull/2710/files#diff-d2a11602a79e83305208472f1abe6a4106f02ce62a7f9524007181813863fcf6R687, example: #2738
I can still override the behavior in TorchAOBaseTensor, right?
For example, it looks like aten._to_copy.default gets auto-populated, but I want to define its dtype variant in addition to device variant.
this should be working, I haven't actively tested this behavior though, I'll try to add a test for this
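To make the `tensor_data_names` / `tensor_attribute_names` suggestion concrete, here is a minimal standalone sketch of the idea: a base class derives boilerplate (`__repr__`, flatten) from the declared names, so subclasses only list their fields. The class names `DeclarativeBase` and `IntxUnpackedLike` are illustrative; this is not torchao's actual `TorchAOBaseTensor` implementation.

```python
# Sketch of deriving boilerplate from declared attribute names, as the
# review comment suggests TorchAOBaseTensor does. Illustration only.
class DeclarativeBase:
    tensor_data_names = []        # names of tensor-like payload attributes
    tensor_attribute_names = []   # names of plain metadata attributes

    def __repr__(self):
        fields = self.tensor_data_names + self.tensor_attribute_names
        body = ", ".join(f"{n}={getattr(self, n)!r}" for n in fields)
        return f"{type(self).__name__}({body})"

    def flatten(self):
        # split payload vs. metadata purely from the declared names
        data = {n: getattr(self, n) for n in self.tensor_data_names}
        attrs = {n: getattr(self, n) for n in self.tensor_attribute_names}
        return data, attrs

class IntxUnpackedLike(DeclarativeBase):
    tensor_data_names = ["qdata", "scale", "zero_point"]
    tensor_attribute_names = ["bit_width", "block_size"]

    def __init__(self, qdata, scale, zero_point, bit_width, block_size):
        self.qdata = qdata
        self.scale = scale
        self.zero_point = zero_point
        self.bit_width = bit_width
        self.block_size = block_size

t = IntxUnpackedLike([1, 2], [0.1], [0], 4, (1, 2))
```

With this pattern, per-subclass `__repr__` and flatten/unflatten implementations become unnecessary, which is the duplication the comment is pointing at.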
)

@classmethod
def from_float(
nit: we are standardizing on from_hp now
What does hp stand for?
scale_dtype: Optional[torch.dtype] = None
layout: Layout = QDQLayout()
packing_format: PackingFormat = PackingFormat.UNPACKED
VERSION: int = 1
nit: we updated the name to version
Any more concerns here @jerryzh168?
This format is intended for torch.export use cases.

Tensor Attributes:
    int_data: int data for quantization.
nit: use qdata to align with other tensors
    block_size=block_size,
)

def get_plain(self):
nit: no longer need this I think
@classmethod
def from_hp(
    cls,
    float_tensor: torch.Tensor,
nit: use hp_tensor to align with the method name
    cls,
    float_tensor: torch.Tensor,
    block_size: Tuple[int],
    dtype: torch.dtype,
nit: rename to target_dtype for more clarity
class IntxUnpackedTensor(TorchAOBaseTensor):
    """
    intx quantization with unpacked format. Subbyte quantized data is represented as int8.
nit: to make this clearer, I think we can add a bit more description here about the subbyte quantized data. We should mention that the range of the quantized values is restricted to the quant_min and quant_max of the target bit width, e.g. for uint4 the values fall into the range 0 to 15.
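For reference, the quant_min/quant_max for a given bit width are straightforward to compute. This is a generic sketch (the helper name is illustrative, not torchao API): qdata is stored as int8, but its values must stay inside the target dtype's range, e.g. uint4 → [0, 15], int4 → [-8, 7].

```python
# Compute the representable range for a signed or unsigned integer
# of a given bit width; these bound the values stored in the int8 qdata.
def quant_range(bit_width: int, signed: bool) -> tuple:
    if signed:
        return -(1 << (bit_width - 1)), (1 << (bit_width - 1)) - 1
    return 0, (1 << bit_width) - 1

print(quant_range(4, signed=False))  # uint4 -> (0, 15)
print(quant_range(4, signed=True))   # int4  -> (-8, 7)
```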
    block_size: Optional[Tuple[int]] = None,
):
    # Check plain data and infer block_size from shapes
    if block_size is None:
would it be easier just to make block_size required? when is block_size None?
Removed. I did use it in the slice implementation, but I just added logic inside slice to recompute the block size.
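The recomputation the reply describes can be sketched as follows. With groupwise quantization each scale element covers one contiguous block, so the block size along each dimension is the ratio of the data shape to the scale shape. The helper name is hypothetical, not the actual slice implementation.

```python
# Sketch: recover block_size from the qdata and scale shapes after a
# slice, assuming scales tile the data evenly along every dimension.
def infer_block_size(data_shape, scale_shape):
    assert len(data_shape) == len(scale_shape)
    for d, s in zip(data_shape, scale_shape):
        assert d % s == 0, "scale shape must evenly divide data shape"
    return tuple(d // s for d, s in zip(data_shape, scale_shape))

# e.g. a (4, 64) weight with scales of shape (4, 2) -> group size 32
print(infer_block_size((4, 64), (4, 2)))  # (1, 32)
```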
self.bit_width = bit_width
self.block_size = block_size

def __repr__(self):
repr is also implemented by default in TorchAOBaseTensor when you define tensor_data_names and tensor_attribute_names btw
device = kwargs.pop("device")
dtype = kwargs.pop("dtype")
assert dtype in _FLOAT_TYPES
return self.__class__(
nit: self.__class__ --> IntxUnpackedTensor to reduce runtime check and align with other code
scale = aten.slice.Tensor(self.scale, dim, start_scale, end_scale, step)
zero_point = aten.slice.Tensor(self.zero_point, dim, start_scale, end_scale, step)

new = self.__class__(
Force-pushed from 93948a4 to 143fe91.
jerryzh168 left a comment:
LG, please add a bit more detail in the PR summary to explain the context for the change, and a Test Plan as well.
* add intx unpacked tensor
* up
* up
* up
* up
* up
This adds IntxUnpackedTensor, where subbyte quantized data is represented as int8. The range of the quantized values is restricted to the quant_min and quant_max of the target_dtype, e.g., if target_dtype=torch.int4, qdata will be an int8 tensor with values in [-8, 7]. Quantization is represented in a decomposed way.
This tensor is intended for export use cases that currently use AQT with QDQLayout.
The test plan is the new unit tests.
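The "decomposed" representation the summary mentions can be illustrated with a small pure-Python sketch: qdata is plain int8-range data clamped to the target dtype's bounds, and dequantization is an explicit `(qdata - zero_point) * scale` that export can trace through. This is an illustration under those assumptions, not the torchao implementation.

```python
# Minimal sketch of decomposed affine quantization for int4
# (quant_min=-8, quant_max=7): round-to-nearest, then clamp.
def quantize(values, scale, zero_point, quant_min=-8, quant_max=7):
    q = [round(v / scale) + zero_point for v in values]
    return [min(max(v, quant_min), quant_max) for v in q]

def dequantize(qdata, scale, zero_point):
    # the explicit decomposed dequant: (q - zero_point) * scale
    return [(q - zero_point) * scale for q in qdata]

q = quantize([0.0, 0.5, -0.4], scale=0.1, zero_point=0)
print(q)  # [0, 5, -4] -- all within the int4 range [-8, 7]
```

Values outside the representable range saturate: quantizing 10.0 with the same scale clamps to 7, the int4 maximum.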