Migrating from AffineQuantizedTensor + Layouts to new structure of tensor subclasses #2752

@jerryzh168

Note: AffineQuantizedTensor + Layouts are scheduled to be deprecated in 0.17.0 and removed in 0.18.0!
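For downstream code that has to handle several releases, the schedule in the note above can be turned into a small version check. This is only a sketch: `affine_quantized_tensor_status` is a hypothetical helper, not a torchao function, and it assumes the schedule holds exactly as stated (deprecated from 0.17.0, removed from 0.18.0).

```python
def affine_quantized_tensor_status(version: str) -> str:
    """Return the expected status of AffineQuantizedTensor for a version string.

    Hypothetical helper based only on the schedule stated in this issue.
    """
    # Compare on (major, minor) only; patch and any suffix are ignored.
    major_minor = tuple(int(part) for part in version.split(".")[:2])
    if major_minor >= (0, 18):
        return "removed"
    if major_minor >= (0, 17):
        return "deprecated"
    return "supported"


print(affine_quantized_tensor_status("0.16.0"))
print(affine_quantized_tensor_status("0.17.0"))
print(affine_quantized_tensor_status("0.18.0"))
```

In practice the version string would come from the installed package; the check is written against a plain string so it does not depend on any particular import.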

Context:
Previously, we used AffineQuantizedTensor for many of our use cases, including int4, float8, intx, and floatx. It introduces complicated abstractions such as Layout; people have said it is hard to understand, and there are many indirections in the code.

In an effort to simplify the code base and make it easier to contribute to, we have been adding new features with a different structure in mind. We now want to organize tensors by "dtype" and "packing_format": for example, we'll have Int4PreshuffledTensor, Int8Tensor, and Float8Tensor instead of AffineQuantizedTensor with multiple layouts.
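To make the naming pattern concrete, here is a minimal sketch of how a tensor class name could be read as a dtype plus an optional packing format. `tensor_class_name` is a hypothetical helper written for illustration; it only mirrors the three example names above and is not part of torchao.

```python
from typing import Optional


def tensor_class_name(dtype: str, packing_format: Optional[str] = None) -> str:
    """Compose a class name from a dtype and an optional packing format.

    Hypothetical helper: it mirrors the naming of the examples in this
    issue and assumes lowercase inputs such as "int4" or "preshuffled".
    """
    parts = [dtype] if packing_format is None else [dtype, packing_format]
    return "".join(part.capitalize() for part in parts) + "Tensor"


print(tensor_class_name("int4", "preshuffled"))  # Int4PreshuffledTensor
print(tensor_class_name("int8"))                 # Int8Tensor
print(tensor_class_name("float8"))               # Float8Tensor
```

The point of the sketch is that the class name itself carries the information that used to live in a separate layout object, so a reader can tell the data format from the type alone.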

Please check out our updated docs for the new tensor subclass organization structure and guide for design:

migration status

All layouts have been deleted, and we will now start deleting AffineQuantizedTensor. @andrewor14 is currently working on this.

| inference config name | current status | plan | POC | notes |
| --- | --- | --- | --- | --- |
| Int8WeightOnlyConfig | v1 exists | create v2, then deprecate v1 | ? | #3391 added v2, #3407 cleaned up, can deprecate in 0.17.0, done: #4151 |
| Int8DynamicActivationInt8WeightConfig | v1 exists | create v2, then deprecate v1, delete v1: #4019 | ? | #3391 added v2, #3407 cleaned up, can deprecate in 0.17.0, done: #4151 |
| Int8DynamicActivationInt4WeightConfig | v1 exists | move to prototype | ? | #3491 moved to prototype in 0.16.0, can deprecate in 0.17.0, done: #3884 |
| UIntXWeightOnlyConfig | v1 exists | move to prototype | ? | #3491 moved to prototype in 0.16.0, can deprecate in 0.17.0, done: #3887 |
| GemliteUIntXWeightOnlyConfig | v1 exists | move to prototype | ? | #3491 moved to prototype in 0.16.0, keep for now, need migration: #3891, done: #4082 |
| Float8DynamicActivationFloat8SemiSparseWeightConfig | v1 exists | add a new sparse packing_format for float8 dynamic quant config, then deprecate v1 | ? | #3361 added v2 to 0.16.0, can be deprecated in 0.17.0, done #3883 |
| MXFPInferenceConfig | built on v2 | n/a | - | done |
| NVFP4InferenceConfig | built on v2 | n/a | - | done |
| Float8DynamicActivationInt4WeightConfig | built on v2 | n/a | - | done |
| Int4WeightOnlyConfig | v2 and v1 exists | deprecate v1 | ? | #3513, will be deprecated in 0.16.0, done |
| Int8DynamicActivationIntxWeightConfig | v2 and v1 exists | deprecate v1 | ? | #3511, will be deprecated in 0.16.0, done |
| Float8WeightOnlyConfig | v2 and v1 exists | deprecate v1 | ? | #3510, will be deprecated in 0.16.0, done |
| Float8DynamicActivationFloat8WeightConfig | v2 and v1 exists | deprecate v1 | ? | #3510, will be deprecated in 0.16.0, done |
| IntxWeightOnlyConfig | v2 and v1 exists | deprecate v1 | ? | #3512, will be deprecated in 0.16.0, done |
| Int4DynamicActivationInt4WeightConfig | v1 exists | move to prototype | ? | https://github.com//pull/3491 removed in 0.16.0, done |
| FPXWeightOnlyConfig | v1 exists | move to prototype | ? | #3491 removed in 0.16.0, done |
| Float8StaticActivationFloat8WeightConfig | v1 exists | move to prototype | ? | #3491 moved to prototype in 0.16.0, keep for now |

appendix

List of things to migrate:
INT8

[migration done, TODO: delete old path after all migration is done] INT4 weight only

[move to prototype] INT4 weight + int8 activation

UINTx Weight Only

[migration done, TODO: delete old path after all migration is done] Int8DynamicActivationIntxWeightConfig

FP8

FPx
