Add UIntxWeightOnlyConfig #3891

@jerryzh168

GemliteUIntxWeightOnlyConfig (`class GemliteUIntXWeightOnlyConfig(AOBaseConfig):`) is still using the old AQT (AffineQuantizedTensor) design, and we'd like to migrate it to the v2 design. See #2752 for details.

There are many examples of the new design; you can trace the code path of

class Float8WeightOnlyConfig(AOBaseConfig):
    """
    Configuration for applying float8 weight-only symmetric per-channel quantization to linear layers.

    Args:
        weight_dtype (torch.dtype): The target data type for weight quantization. Default is torch.float8_e4m3fn.
        set_inductor_config (bool): if True, adjusts `torchinductor` settings to recommended values.
        version (int): the version of the config, version 1 is deprecated, version 2 is using Float8Tensor (default)

    Note:
        The actual matmul will be computed in original precision of the weight tensor.

    Example:

    .. literalinclude:: ../../examples/inference/float8_weight_only.py
       :language: python
    """

    weight_dtype: torch.dtype = e4m3_dtype
    set_inductor_config: bool = True
    version: int = 2

    def __post_init__(self):
        torch._C._log_api_usage_once("torchao.quantization.Float8WeightOnlyConfig")


def _float8_weight_only_quant_tensor(weight, config):
    assert config.version == 2, f"Unexpected version: {config.version}"
    weight_dtype = config.weight_dtype
    new_weight = Float8Tensor.from_hp(
        weight, float8_dtype=weight_dtype, granularity=PerRow()
    )
    return new_weight
for example.
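
The shape of that code path (a config dataclass, plus a function that turns a high-precision weight into a new tensor type via `from_hp`) can be illustrated without torchao. The following is a minimal, standard-library-only sketch. `ToyConfig`, `ToyQuantTensor`, `register_handler`, and `quantize_weight` are hypothetical names made up for illustration and are not torchao APIs, and a signed integer range stands in for float8 to keep the arithmetic simple.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Type

# Hypothetical stand-ins for illustration only; none of these are torchao APIs.


@dataclass
class ToyConfig:
    """Mirrors the shape of a v2 config: plain fields plus a version."""
    bits: int = 8
    version: int = 2


@dataclass
class ToyQuantTensor:
    """Holds quantized rows and one scale per row (per-row granularity)."""
    qdata: List[List[int]]
    scales: List[float]

    @classmethod
    def from_hp(cls, weight: List[List[float]], bits: int) -> "ToyQuantTensor":
        qmax = 2 ** (bits - 1) - 1  # symmetric range, e.g. 127 for 8 bits
        qdata, scales = [], []
        for row in weight:
            amax = max(abs(v) for v in row)
            # Avoid dividing by zero for an all-zero row.
            scale = amax / qmax if amax > 0 else 1.0
            scales.append(scale)
            qdata.append([round(v / scale) for v in row])
        return cls(qdata, scales)

    def dequantize(self) -> List[List[float]]:
        return [[q * s for q in row] for row, s in zip(self.qdata, self.scales)]


# Registry mapping a config type to the function that converts a weight.
_HANDLERS: Dict[Type, Callable] = {}


def register_handler(config_type):
    def decorator(fn):
        _HANDLERS[config_type] = fn
        return fn
    return decorator


@register_handler(ToyConfig)
def _toy_quant_tensor(weight, config):
    assert config.version == 2, f"Unexpected version: {config.version}"
    return ToyQuantTensor.from_hp(weight, bits=config.bits)


def quantize_weight(weight, config):
    """Look up the handler by config type and apply it to the weight."""
    return _HANDLERS[type(config)](weight, config)
```

Calling `quantize_weight([[1.0, -2.0, 0.5]], ToyConfig())` returns a `ToyQuantTensor` whose single row is scaled by its own absolute maximum. The point of the sketch is only the division of labor: the config carries options, and the tensor type owns the conversion logic.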

Ultimately, we are trying to remove AffineQuantizedTensor from torchao, and this is one of the blockers.

For implementation details and file structure, see https://github.com/pytorch/ao/tree/main/torchao/prototype/quantization/float8_static_quant as an example.

# in torchao/prototype/quantization/quant_api.py
UIntxWeightOnlyConfig(..., uintx_packing_format="bit_packed")

# torchao/prototype/quantization/uintx/uintx_bit_packed_tensor.py
# UIntxBitPackedTensor will be defined here

# test will be defined in test/prototype/test_uintx_bit_packed_tensor.py
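
The issue does not specify what the "bit_packed" layout looks like. As a rough illustration of what packing sub-byte unsigned integers means, the sketch below packs n-bit values into bytes using only the standard library. `pack_uintx` and `unpack_uintx` are hypothetical helpers, and the bit order (lowest bits first, little-endian bytes) is an assumption, not the layout UIntxBitPackedTensor will use.

```python
from typing import List


def pack_uintx(values: List[int], bits: int) -> bytes:
    """Pack unsigned `bits`-wide integers into bytes, lowest bits first."""
    if not 1 <= bits <= 8:
        raise ValueError("bits must be in [1, 8]")
    acc = 0
    for i, v in enumerate(values):
        if not 0 <= v < (1 << bits):
            raise ValueError(f"value {v} does not fit in {bits} bits")
        # Place value i at bit offset i * bits in one big integer.
        acc |= v << (i * bits)
    nbytes = (len(values) * bits + 7) // 8  # round up to whole bytes
    return acc.to_bytes(nbytes, "little")


def unpack_uintx(data: bytes, bits: int, count: int) -> List[int]:
    """Inverse of pack_uintx; `count` is the number of packed values."""
    acc = int.from_bytes(data, "little")
    mask = (1 << bits) - 1
    return [(acc >> (i * bits)) & mask for i in range(count)]
```

With this layout, eight 3-bit values occupy 3 bytes instead of 8, and `unpack_uintx(pack_uintx(v, bits), bits, len(v))` returns `v` unchanged. A round-trip check of this kind is one natural candidate for the test file mentioned above.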

Note:

  • gemlite is requested by the executorch team for desktop use cases
  • GemliteUIntxWeightOnlyConfig doesn't align with our design (we shouldn't have gemlite in the name), so we are migrating it to UIntxWeightOnlyConfig
