Introduce IntxOpaqueTensor to replace PackedInt8DynamicActivationIntxWeightLayout in AQT #2742
Conversation
    new_bias = None  # bias is packed with weights
else:
    assert packing_format == PackingFormat.UNPACKED_TO_INT8
    new_weight = to_linear_activation_quantized(
btw, we are planning to move away from to_linear_activation_quantized as well. To reduce the abstractions, we are implementing dynamic activation quantization in the tensor itself; see ao/torchao/quantization/quantize_/workflows/float8/float8_tensor.py, lines 244 to 247 (commit 72b35bf).
this can be a separate PR though
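As a rough sketch of that direction (all names below are illustrative stand-ins mirroring the float8_tensor.py pattern, not the actual torchao API):

import torch
from dataclasses import dataclass
from typing import Optional

@dataclass
class QuantizeTensorToInt8Kwargs:
    # Placeholder recipe object; the real tensor would carry a richer config.
    granularity: str = "per_token"

class IntxWeightSketch:
    """Toy stand-in for a weight tensor subclass that stores its own
    activation-quantization recipe, mirroring float8_tensor.py."""
    def __init__(self, qdata, scale, act_quant_kwargs: Optional[QuantizeTensorToInt8Kwargs]):
        self.qdata = qdata  # int8 weight values
        self.scale = scale  # per-channel weight scales
        self.act_quant_kwargs = act_quant_kwargs

def linear_sketch(x, w: IntxWeightSketch, bias=None):
    # Dynamic activation quantization happens inside the linear op itself,
    # driven by the kwargs stored on the weight, so the module no longer needs
    # to be wrapped with to_linear_activation_quantized.
    if w.act_quant_kwargs is not None:
        x_scale = x.abs().amax(dim=-1, keepdim=True).clamp_min(1e-6) / 127.0
        x = torch.clamp((x / x_scale).round(), -128, 127) * x_scale
    w_dq = w.qdata.to(x.dtype) * w.scale
    return torch.nn.functional.linear(x, w_dq, bias)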
Ok, let me change that then
_FLOAT_TYPES: List[torch.dtype] = [torch.float16, torch.bfloat16, torch.float32]


class ActivationQuantization(enum.Enum):
why are we doing this? the recommended way is ao/torchao/quantization/quantize_/workflows/float8/float8_tensor.py, lines 243 to 247 (commit af2cf1e).
The quantization code is inlined in linear in lines 233-249.
The enum was to support applying activation quant (ActivationQuantization.DYNAMIC_INT8_ASYMMETRIC_PER_TOKEN) vs. weight only (None). I made it an enum so it could be extended.
I can use _choose_quant_func_and_quantize_tensor instead, but out of curiosity, why is the design to have _choose_quant_func_and_quantize_tensor in a common directory? It seems its implementation would just have many if/else branches based on the various tensor subclasses?
the reason for _choose_quant_func_and_quantize_tensor having if/else was readability and removing indirections; the alternative is to have a config, I think
but this is not always required, you can also call the activation quantization in the linear op directly, without using this function, especially when you only have a single possible type of activation quantization
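For that single-scheme case, a minimal sketch of quantizing the activation directly inside the linear implementation (the helper below is made up for illustration; it is not torchao's actual quant function):

import torch

def _int8_asymm_per_token_quant(x):
    # Illustrative asymmetric per-token int8 activation quantization.
    x_min = x.amin(dim=-1, keepdim=True)
    x_max = x.amax(dim=-1, keepdim=True)
    scale = (x_max - x_min).clamp_min(1e-6) / 255.0
    zero_point = (-128 - x_min / scale).round()
    x_q = torch.clamp((x / scale + zero_point).round(), -128, 127).to(torch.int8)
    return x_q, scale, zero_point

def linear_with_inline_act_quant(x, w_q, w_scale, bias=None):
    # With only one supported activation scheme, the linear op can call the
    # quant function directly instead of dispatching through
    # _choose_quant_func_and_quantize_tensor or an ActivationQuantization enum.
    x_q, x_scale, x_zp = _int8_asymm_per_token_quant(x)
    x_dq = (x_q.to(x.dtype) - x_zp) * x_scale  # reference (dequantized) math
    w_dq = w_q.to(x.dtype) * w_scale
    return torch.nn.functional.linear(x_dq, w_dq, bias)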
class ComputeTarget(enum.Enum):
    """
    This packs the tensor for PyTorch CPU kernels in ATen.
might be good to add some description on how this differs from KernelPreference
also why is this change here? should this be a separate PR?
This PR is about updating int8_dynamic_activation_intx_weight to version 2. Most of the work to do that is creating v2 of the tile_packed tensor, but it also required updating the unpacked tensor to support dynamic activation quant.
I can move the unpacked changes to a separate PR, though.
@jerryzh168 I've updated this PR to use the new opaque tensor.
# Create packed tensor
if packing_format == PackingFormat.OPAQUE:
    assert compute_target is not None, (
        "Must specify a compute target for PackingFormat.TILE_PACKED"
jerryzh168 left a comment:
lg, is this the last change before we can bump the version?
should be. Just need to do the packing format refactor you mentioned.
This adds IntxOpaqueTensor to replace the AQT tensor with PackedInt8DynamicActivationIntxWeightLayout, since AQT will be removed.
The test plan is the new unit tests.
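For context, the end-user flow this migration targets looks roughly like the sketch below; the config name is based on the existing int8_dynamic_activation_intx_weight flow, and the exact import paths, argument names, and version value that selects the IntxOpaqueTensor path are assumptions, not the final v2 API.

# Hedged usage sketch; import paths, argument names, and version value are
# assumptions based on the existing int8_dynamic_activation_intx_weight flow.
import torch
from torchao.quantization import quantize_
from torchao.quantization.quant_api import Int8DynamicActivationIntxWeightConfig
from torchao.quantization.granularity import PerGroup

model = torch.nn.Sequential(torch.nn.Linear(256, 256))
quantize_(
    model,
    Int8DynamicActivationIntxWeightConfig(
        weight_dtype=torch.int4,
        weight_granularity=PerGroup(32),
        version=2,  # assumed: selects the new IntxOpaqueTensor path instead of AQT
    ),
)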