| Config | Current state | Plan | Owner | Status |
|---|---|---|---|---|
| Int8WeightOnlyConfig | v1 exists | create v2, then deprecate v1 | ? | #3391 added v2, #3407 cleaned up, can deprecate in 0.17.0, done: #4151 |
| Int8DynamicActivationInt8WeightConfig | v1 exists | create v2, then deprecate v1, delete v1: #4019 | ? | #3391 added v2, #3407 cleaned up, can deprecate in 0.17.0, done: #4151 |
| Int8DynamicActivationInt4WeightConfig | v1 exists | move to prototype | ? | #3491 moved to prototype in 0.16.0, can deprecate in 0.17.0, done: #3884 |
| UIntXWeightOnlyConfig | v1 exists | move to prototype | ? | #3491 moved to prototype in 0.16.0, can deprecate in 0.17.0, done: #3887 |
| GemliteUIntXWeightOnlyConfig | v1 exists | move to prototype | ? | #3491 moved to prototype in 0.16.0, keep for now, need migration: #3891, done: #4082 |
| Float8DynamicActivationFloat8SemiSparseWeightConfig | v1 exists | add a new sparse packing_format for the float8 dynamic quant config, then deprecate v1 | ? | #3361 added v2 in 0.16.0, can be deprecated in 0.17.0, done: #3883 |
| MXFPInferenceConfig | built on v2 | n/a | - | done |
| NVFP4InferenceConfig | built on v2 | n/a | - | done |
| Float8DynamicActivationInt4WeightConfig | built on v2 | n/a | - | done |
| Int4WeightOnlyConfig | v1 and v2 exist | deprecate v1 | ? | #3513, will be deprecated in 0.16.0, done |
| Int8DynamicActivationIntxWeightConfig | v1 and v2 exist | deprecate v1 | ? | #3511, will be deprecated in 0.16.0, done |
| Float8WeightOnlyConfig | v1 and v2 exist | deprecate v1 | ? | #3510, will be deprecated in 0.16.0, done |
| Float8DynamicActivationFloat8WeightConfig | v1 and v2 exist | deprecate v1 | ? | #3510, will be deprecated in 0.16.0, done |
| IntxWeightOnlyConfig | v1 and v2 exist | deprecate v1 | ? | #3512, will be deprecated in 0.16.0, done |
| Int4DynamicActivationInt4WeightConfig | v1 exists | move to prototype | ? | #3491 removed in 0.16.0, done |
| FPXWeightOnlyConfig | v1 exists | move to prototype | ? | #3491 removed in 0.16.0, done |
| Float8StaticActivationFloat8WeightConfig | v1 exists | move to prototype | ? | #3491 moved to prototype in 0.16.0, keep for now |
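The "create v2, then deprecate v1" rows all follow the same mechanical pattern: the config grows a `version` field, `version=2` becomes the default (new tensor subclasses), and constructing a `version=1` config warns before the old path is removed. A minimal, self-contained sketch of that pattern; `ExampleWeightOnlyConfig` and its fields are hypothetical stand-ins, not torchao's actual API:

```python
import warnings
from dataclasses import dataclass


@dataclass
class ExampleWeightOnlyConfig:
    # Hypothetical stand-in for a quantization config; real configs
    # carry quantization parameters (group size, granularity, etc.).
    group_size: int = 128
    version: int = 2  # v2 (new flat tensor subclasses) is the default

    def __post_init__(self):
        if self.version == 1:
            # The v1 (AffineQuantizedTensor + Layout) path warns for one
            # release cycle before being deleted.
            warnings.warn(
                "version=1 of ExampleWeightOnlyConfig is deprecated and "
                "will be removed; please migrate to version=2",
                DeprecationWarning,
            )


# Opting into the old path triggers the deprecation warning:
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ExampleWeightOnlyConfig(version=1)
assert any(issubclass(w.category, DeprecationWarning) for w in caught)
```

Defaulting `version` to 2 means users who never pinned a version move to the new tensors silently, while explicit `version=1` callers get one release of warnings before deletion.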
Note: AffineQuantizedTensor + Layouts are scheduled to be deprecated in 0.17.0 and removed in 0.18.0!
Context:
Previously we used AffineQuantizedTensor for many of our use cases, including int4, float8, intx, and floatx. It introduces complicated abstractions like Layout, which people have said are hard to understand, and there are many indirections in the code.
In an effort to simplify the code base and make it easier to contribute to, we have been adding new features with a different structure in mind. We now want to organize Tensors by "dtype" and "packing_format": for example, we will have Int4PreshuffledTensor, Int8Tensor, and Float8Tensor, instead of AffineQuantizedTensor with multiple layouts.
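The reorganization above can be illustrated schematically: in the old scheme one AffineQuantizedTensor was specialized indirectly through a Layout object, while in the new scheme each dtype + packing-format pair is its own flat tensor class. A toy sketch, assuming nothing about the real torchao implementation (all fields and the Layout classes here are simplified illustrations):

```python
from dataclasses import dataclass


# Old style (simplified): one tensor class, with behavior hidden
# behind a Layout indirection that readers found hard to follow.
class Layout: ...
class TensorCoreTiledLayout(Layout): ...


@dataclass
class AffineQuantizedTensor:
    data: list
    layout: Layout  # dispatch happens indirectly through this object


# New style (simplified): one flat class per dtype + packing format
# (e.g. Int4PreshuffledTensor, Int8Tensor, Float8Tensor). There is no
# Layout layer; the packing format is part of the class identity.
@dataclass
class Int4PreshuffledTensor:
    packed_weight: list  # int4 values packed for a preshuffled kernel
    group_size: int      # quantization group size (illustrative field)


@dataclass
class Int8Tensor:
    data: list
    scale: float         # per-tensor scale (illustrative field)
```

The trade-off is more classes but flatter ones: each class documents exactly one quantization scheme, so a reader no longer has to trace which Layout a given AffineQuantizedTensor was constructed with.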
Please check out our updated docs for the new tensor subclass organization structure and design guide:
Migration status
All layouts have been deleted; we will start deleting AffineQuantizedTensor next, with @andrewor14 currently working on this.
List of things to migrate:
- INT8
- [migration done, TODO: delete old path after all migration is done] INT4 weight only
- [move to prototype] INT4 weight + int8 activation
- UINTx weight only
- [migration done, TODO: delete old path after all migration is done] Int8DynamicActivationIntxWeightConfig
- FP8
- FPx