Add all fbgemm kernel Tensors into Int4WeightOnlyConfig and Float8DynamicActivationInt4WeightConfig #2474
…amicActivationInt4WeightConfig

Summary:
* We will deprecate FbgemmConfig later, since it's a single kernel.
* We'd like to categorize things by derived dtype + packing format, e.g. int4 preshuffled, float8 plain.
* Added PackingFormat, which has preshuffled and plain, in Version 2 of Int4WeightOnlyConfig; the older AQT tensor will remain in Version 1.

Test Plan:
python test/quantization/quantize_/workflows/int4/test_int4_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_preshuffled_tensor.py
python test/quantization/quantize_/workflows/float8/test_float8_tensor.py

Reviewers:
Subscribers:
Tasks:
Tags:
stack-info: PR: #2474, branch: jerryzh168/stack/10
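The categorization described in the summary (a derived dtype plus a packing format, selected per config version) can be sketched roughly as below. This is a minimal, self-contained illustration and not the torchao implementation: `PackingFormatSketch`, `Int4WeightOnlyConfigSketch`, and `select_tensor_kind` are hypothetical names, and the members and version numbers only mirror what the summary states.

```python
from dataclasses import dataclass
from enum import Enum


class PackingFormatSketch(str, Enum):
    """Hypothetical stand-in for the PackingFormat mentioned in the summary."""

    PLAIN = "plain"
    PRESHUFFLED = "preshuffled"


@dataclass
class Int4WeightOnlyConfigSketch:
    """Hypothetical config: version 1 keeps the older AQT path,
    version 2 selects a tensor by packing format."""

    group_size: int = 128
    packing_format: PackingFormatSketch = PackingFormatSketch.PLAIN
    version: int = 1


def select_tensor_kind(config: Int4WeightOnlyConfigSketch) -> str:
    """Return a label for the tensor kind a config would pick (illustrative only)."""
    if config.version == 1:
        # Version 1: the packing format is ignored and the older AQT tensor is used.
        return "aqt"
    if config.version == 2:
        # Version 2: derived dtype (int4) + packing format.
        return f"int4_{config.packing_format.value}"
    raise ValueError(f"unsupported version: {config.version}")
```

Under these assumptions, a version 2 config with `PackingFormatSketch.PRESHUFFLED` resolves to the label `int4_preshuffled`, while any version 1 config stays on the AQT path.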
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2474
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 5bb2fd4 with merge base 1114ca0.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
a3d0835 to 4b0c7c7
…micActivationInt4WeightConfig

Summary: As the title says. We will deprecate FbgemmConfig since it's a single kernel. We'd like to categorize things by derived dtype + packed format.

Test Plan:
python test/quantization/quantize_/test_int4_groupwise_preshuffle.py

Reviewers:
Subscribers:
Tasks:
Tags:
stack-info: PR: #2474, branch: jerryzh168/stack/10
4b0c7c7 to f5977ce
04ce2c5 to afd8703
afd8703 to ff4682e
ff4682e to 58f8a2a
acc33bd to 7412903
7412903 to 867c75a
867c75a to 382bb8a
382bb8a to 8a6dcc4
8a6dcc4 to 5bf77b1
5bf77b1 to 6cb63be
class Float8ActivationInt4WeightConfig(AOBaseConfig):
    group_size: int = 128
    use_preshuffle: bool = False
    kernel: str = "fbgemm"
Use GemmKernelChoice here?
updated the structure after some discussions, please take a look again
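For context on the suggestion above, replacing a free-form `kernel: str` field with an enum could look roughly like the sketch below. It is only an illustration: `GemmKernelChoiceSketch` and its members are hypothetical stand-ins (the real `GemmKernelChoice` may define different values), and the thread does not say whether the restructured PR kept a kernel field at all.

```python
from dataclasses import dataclass
from enum import Enum


class GemmKernelChoiceSketch(str, Enum):
    """Hypothetical stand-in; the real GemmKernelChoice may have other members."""

    AUTO = "auto"
    FBGEMM = "fbgemm"


@dataclass
class Float8ActivationInt4WeightConfigSketch:
    """Hypothetical config mirroring the fields in the diff under review."""

    group_size: int = 128
    use_preshuffle: bool = False
    kernel: GemmKernelChoiceSketch = GemmKernelChoiceSketch.FBGEMM

    def __post_init__(self):
        # Accept plain strings for convenience, but fail early on unknown
        # kernel names instead of failing later at dispatch time.
        self.kernel = GemmKernelChoiceSketch(self.kernel)
```

The point of the enum in this sketch is that a misspelled kernel name raises a `ValueError` when the config is constructed, rather than silently passing through as an arbitrary string.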
@@ -44,6 +44,7 @@
from .quant_api import (
    CutlassInt4PackedLayout,
    FbgemmConfig,
What's the plan for FbgemmConfig? Looks like it was added only ~1.5 months ago but it's technically public API. Do we know if anyone's using it already? I don't think it's released yet so wonder if it's OK to just remove it?
We'll remove it. It is used in some internal scripts, but we'll update those as well.
andrewor14 left a comment
Looks great, left some comments mostly about documentation
    preserve_zero: Optional[bool] = None
    # since not all tensors are migrated to the new structure yet,
    # we use `_legacy` to represent the previous layout
    packing_format: PackingFormat = "_legacy"
Can "legacy" mean different things for different configs? I wonder if we should make this optional instead, where None represents "legacy"?
Yeah, legacy just means no packing format (it's implemented with AQT). I plan to remove the support for legacy at some point and don't want to complicate the typing here.
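The two options discussed in this thread can be compared with a small sketch. It is illustrative only: the class and function names are hypothetical, plain strings stand in for the real `PackingFormat` type, and only the `"_legacy"` default is taken from the diff.

```python
from dataclasses import dataclass
from typing import Optional

# Sentinel value as in the diff under review.
LEGACY = "_legacy"


@dataclass
class SentinelConfigSketch:
    """Option kept in the PR: a non-optional field with a legacy sentinel."""

    packing_format: str = LEGACY


@dataclass
class OptionalConfigSketch:
    """Option raised in review: None represents the legacy path."""

    packing_format: Optional[str] = None


def uses_legacy_path_sentinel(cfg: SentinelConfigSketch) -> bool:
    # The field is always a string; legacy is just one more value to compare.
    return cfg.packing_format == LEGACY


def uses_legacy_path_optional(cfg: OptionalConfigSketch) -> bool:
    # Every consumer of the field has to handle None explicitly.
    return cfg.packing_format is None
```

In this sketch, dropping legacy support later means deleting one sentinel value and changing the default in the first variant, whereas the second variant would also need its field type changed from `Optional[str]` back to `str`.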
Stacked PRs:
optional_tensor_names in TorchAOBaseTensor #2710
Add all fbgemm kernel Tensors into Int4WeightOnlyConfig and Float8DynamicActivationInt4WeightConfig
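Summary:
* We will deprecate FbgemmConfig later, since it's a single kernel.
* We'd like to categorize things by derived dtype + packing format, e.g. int4 preshuffled, float8 plain.
* Added PackingFormat, which has preshuffled and plain, in Version 2 of Int4WeightOnlyConfig; the older AQT tensor will remain in Version 1.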
Test Plan:
python test/quantization/quantize_/workflows/int4/test_int4_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_preshuffled_tensor.py
python test/quantization/quantize_/workflows/float8/test_float8_tensor.py
Reviewers:
Subscribers:
Tasks:
Tags: