`GemliteUIntXWeightOnlyConfig` (`class GemliteUIntXWeightOnlyConfig(AOBaseConfig):`, defined in torchao/prototype/quantization/quant_api.py, line 174 at e915c07) is still using the old AQT design, and we'd like to migrate it to the v2 design. See #2752 for details.
There are many examples of the new design; you can trace the code path of `Float8WeightOnlyConfig` (torchao/quantization/quant_api.py, lines 1333 to 1365 at e915c07), for example:

```python
class Float8WeightOnlyConfig(AOBaseConfig):
    """
    Configuration for applying float8 weight-only symmetric per-channel quantization to linear layers.

    Args:
        weight_dtype (torch.dtype): The target data type for weight quantization. Default is torch.float8_e4m3fn.
        set_inductor_config (bool): if True, adjusts `torchinductor` settings to recommended values.
        version (int): the version of the config, version 1 is deprecated, version 2 is using Float8Tensor (default)

    Note:
        The actual matmul will be computed in original precision of the weight tensor.

    Example:

    .. literalinclude:: ../../examples/inference/float8_weight_only.py
        :language: python
    """

    weight_dtype: torch.dtype = e4m3_dtype
    set_inductor_config: bool = True
    version: int = 2

    def __post_init__(self):
        torch._C._log_api_usage_once("torchao.quantization.Float8WeightOnlyConfig")


def _float8_weight_only_quant_tensor(weight, config):
    assert config.version == 2, f"Unexpected version: {config.version}"
    weight_dtype = config.weight_dtype
    new_weight = Float8Tensor.from_hp(
        weight, float8_dtype=weight_dtype, granularity=PerRow()
    )
    return new_weight
```
In the end, we are trying to remove `AffineQuantizedTensor` from torchao, and this is one of the blockers.
For implementation details and file structure, see https://github.com/pytorch/ao/tree/main/torchao/prototype/quantization/float8_static_quant for an example.
```python
# in torchao/prototype/quantization/quant_api.py
UIntxWeightOnlyConfig(..., uintx_packing_format="bit_packed")

# in torchao/prototype/quantization/uintx/uintx_bit_packed_tensor.py
# we will have UIntxBitPackedTensor defined here

# tests will be defined in test/prototype/test_uintx_bit_packed_tensor.py
```
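Since the core job of `UIntxBitPackedTensor` is to store sub-byte unsigned integers densely, here is a minimal pure-Python sketch of the bit-packing idea for the uint4 case (two values per byte). The helper names are hypothetical; the real tensor subclass would operate on torch tensors and integrate with the quantization flow:

```python
def pack_uint4(values):
    # Pack uint4 values (0..15) into bytes, two values per byte
    # (high nibble first); pad with a zero if the count is odd.
    assert all(0 <= v < 16 for v in values), "values must fit in 4 bits"
    if len(values) % 2:
        values = list(values) + [0]
    return bytes(
        (values[i] << 4) | values[i + 1] for i in range(0, len(values), 2)
    )


def unpack_uint4(packed, n):
    # Recover the first n uint4 values from the packed bytes.
    out = []
    for b in packed:
        out.append(b >> 4)   # high nibble
        out.append(b & 0xF)  # low nibble
    return out[:n]
```

A real uintx tensor subclass would additionally carry scale/zero-point metadata so the packed integers can be dequantized back to floating point.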
Note:
- gemlite is requested by the executorch team for desktop use cases
- `GemliteUIntXWeightOnlyConfig` doesn't align with our design (we shouldn't have "gemlite" in the name), so we'll migrate it to `UIntxWeightOnlyConfig`