| Config | Current state | Plan | Owner | Status |
|---|---|---|---|---|
| Int8WeightOnlyConfig | v1 exists | create v2, then deprecate v1 | ? | #3391 added v2, #3407 cleaned up, can deprecate in 0.17.0, done: #4151 |
| Int8DynamicActivationInt8WeightConfig | v1 exists | create v2, then deprecate v1, delete v1: #4019 | ? | #3391 added v2, #3407 cleaned up, can deprecate in 0.17.0, done: #4151 |
| Int8DynamicActivationInt4WeightConfig | v1 exists | move to prototype | ? | #3491 moved to prototype in 0.16.0, can deprecate in 0.17.0, done: #3884 |
| UIntXWeightOnlyConfig | v1 exists | move to prototype | ? | #3491 moved to prototype in 0.16.0, can deprecate in 0.17.0, done: #3887 |
| GemliteUIntXWeightOnlyConfig | v1 exists | move to prototype | ? | #3491 moved to prototype in 0.16.0, keep for now, need migration: #3891, done: #4082 |
| Float8DynamicActivationFloat8SemiSparseWeightConfig | v1 exists | add a new sparse packing_format for the float8 dynamic quant config, then deprecate v1 | ? | #3361 added v2 in 0.16.0, can be deprecated in 0.17.0, done: #3883 |
| MXFPInferenceConfig | built on v2 | n/a | - | done |
| NVFP4InferenceConfig | built on v2 | n/a | - | done |
| Float8DynamicActivationInt4WeightConfig | built on v2 | n/a | - | done |
| Int4WeightOnlyConfig | v1 and v2 exist | deprecate v1 | ? | #3513, will be deprecated in 0.16.0, done |
| Int8DynamicActivationIntxWeightConfig | v1 and v2 exist | deprecate v1 | ? | #3511, will be deprecated in 0.16.0, done |
| Float8WeightOnlyConfig | v1 and v2 exist | deprecate v1 | ? | #3510, will be deprecated in 0.16.0, done |
| Float8DynamicActivationFloat8WeightConfig | v1 and v2 exist | deprecate v1 | ? | #3510, will be deprecated in 0.16.0, done |
| IntxWeightOnlyConfig | v1 and v2 exist | deprecate v1 | ? | #3512, will be deprecated in 0.16.0, done |
| Int4DynamicActivationInt4WeightConfig | v1 exists | move to prototype | ? | #3491 removed in 0.16.0, done |
| FPXWeightOnlyConfig | v1 exists | move to prototype | ? | #3491 removed in 0.16.0, done |
| Float8StaticActivationFloat8WeightConfig | v1 exists | move to prototype | ? | #3491 moved to prototype in 0.16.0, keep for now |
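The "create v2, then deprecate v1" rows all follow the same mechanical pattern: the config grows a `version` field, `version=2` becomes the default (new tensor subclasses), and constructing a `version=1` config warns before the old path is removed. A minimal, self-contained sketch of that pattern; `ExampleWeightOnlyConfig` and its fields are hypothetical stand-ins, not torchao's actual API:

```python
import warnings
from dataclasses import dataclass


@dataclass
class ExampleWeightOnlyConfig:
    # Hypothetical stand-in for a quantization config; real configs
    # carry quantization parameters (group size, granularity, etc.).
    group_size: int = 128
    version: int = 2  # v2 (new flat tensor subclasses) is the default

    def __post_init__(self):
        if self.version == 1:
            # The v1 (AffineQuantizedTensor + Layout) path warns for one
            # release cycle before being deleted.
            warnings.warn(
                "version=1 of ExampleWeightOnlyConfig is deprecated and "
                "will be removed; please migrate to version=2",
                DeprecationWarning,
            )


# Opting into the old path triggers the deprecation warning:
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ExampleWeightOnlyConfig(version=1)
assert any(issubclass(w.category, DeprecationWarning) for w in caught)
```

Defaulting `version` to 2 means users who never pinned a version move to the new tensors silently, while explicit `version=1` callers get one release of warnings before deletion.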
Note: AffineQuantizedTensor + Layouts are scheduled to be deprecated in 0.17.0 and removed in 0.18.0!
Context:
Previously we used AffineQuantizedTensor for many of our use cases, including int4, float8, intx, and floatx. It introduces complicated abstractions like Layout, which people have said are hard to understand, and there are many indirections in the code.
In an effort to simplify the code base and make it easier to contribute to, we have been adding new features with a different structure in mind. We now want to organize Tensors by "dtype" and "packing_format": for example, we will have Int4PreshuffledTensor, Int8Tensor, and Float8Tensor, instead of AffineQuantizedTensor with multiple layouts.
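The reorganization above can be illustrated schematically: in the old scheme one AffineQuantizedTensor was specialized indirectly through a Layout object, while in the new scheme each dtype + packing-format pair is its own flat tensor class. A toy sketch, assuming nothing about the real torchao implementation (all fields and the Layout classes here are simplified illustrations):

```python
from dataclasses import dataclass


# Old style (simplified): one tensor class, with behavior hidden
# behind a Layout indirection that readers found hard to follow.
class Layout: ...
class TensorCoreTiledLayout(Layout): ...


@dataclass
class AffineQuantizedTensor:
    data: list
    layout: Layout  # dispatch happens indirectly through this object


# New style (simplified): one flat class per dtype + packing format
# (e.g. Int4PreshuffledTensor, Int8Tensor, Float8Tensor). There is no
# Layout layer; the packing format is part of the class identity.
@dataclass
class Int4PreshuffledTensor:
    packed_weight: list  # int4 values packed for a preshuffled kernel
    group_size: int      # quantization group size (illustrative field)


@dataclass
class Int8Tensor:
    data: list
    scale: float         # per-tensor scale (illustrative field)
```

The trade-off is more classes but flatter ones: each class documents exactly one quantization scheme, so a reader no longer has to trace which Layout a given AffineQuantizedTensor was constructed with.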
Please check out our updated docs for the new tensor subclass organization structure and design guide:
Migration status
All layouts have been deleted; we will start deleting AffineQuantizedTensor next, with @andrewor14 currently working on this.
List of things to migrate:
- INT8
- [migration done, TODO: delete old path after all migration is done] INT4 weight only
- [move to prototype] INT4 weight + int8 activation
- UINTx weight only
- [migration done, TODO: delete old path after all migration is done] Int8DynamicActivationIntxWeightConfig
- FP8
- FPx