Add all fbgemm kernel Tensors into Int4WeightOnlyConfig and Float8DynamicActivationInt4WeightConfig #2474

Merged
jerryzh168 merged 1 commit into main from jerryzh168/stack/10
Aug 7, 2025
Conversation

@jerryzh168 jerryzh168 commented Jul 2, 2025

Contributor

Stacked PRs:


Add all fbgemm kernel Tensors into Int4WeightOnlyConfig and Float8DynamicActivationInt4WeightConfig

Summary:

  • We will deprecate FbgemmConfig later, since it's a single kernel.
  • We'd like to categorize things by derived dtype + packed format, e.g. int4 preshuffled, float8 plain.
  • Added PackingFormat, with preshuffled and plain options, in Version 2 of Int4WeightOnlyConfig; the older AQT tensor will remain in Version 1.
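
For readers skimming the thread, the "derived dtype + packed format" idea in the summary can be sketched roughly as below. This is only an illustration under stated assumptions: `Int4WeightOnlySketch` and `select_tensor_kind` are hypothetical names, and the enum members are guessed from the summary wording rather than copied from torchao.

```python
from dataclasses import dataclass
from enum import Enum


class PackingFormat(str, Enum):
    """Memory layout of the packed data (members assumed from the PR summary)."""

    PLAIN = "plain"
    PRESHUFFLED = "preshuffled"


@dataclass
class Int4WeightOnlySketch:
    """Hypothetical stand-in for a versioned int4 weight-only config."""

    group_size: int = 128
    packing_format: PackingFormat = PackingFormat.PLAIN
    version: int = 2


def select_tensor_kind(config: Int4WeightOnlySketch) -> str:
    """Return a label for the tensor type a config would map to."""
    if config.version == 1:
        # Version 1 keeps the older AQT-based tensor.
        return "aqt"
    if config.version != 2:
        raise ValueError(f"unsupported version: {config.version}")
    # Version 2: derived dtype (int4) + packing format picks the tensor.
    return f"int4_{config.packing_format.value}"
```

The point of the split is that the config names what the data is (int4) and how it is packed, instead of naming a kernel library.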

Test Plan:
python test/quantization/quantize_/workflows/int4/test_int4_tensor.py
python test/quantization/quantize_/workflows/int4/test_int4_preshuffled_tensor.py
python test/quantization/quantize_/workflows/float8/test_float8_tensor.py


stack-info: PR: #2474, branch: jerryzh168/stack/10
pytorch-bot Bot commented Jul 2, 2025


🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2474

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 5bb2fd4 with merge base 1114ca0:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@jerryzh168 jerryzh168 force-pushed the jerryzh168/stack/10 branch from a3d0835 to 4b0c7c7 Compare July 2, 2025 01:58
jerryzh168 added a commit that referenced this pull request Jul 2, 2025
…micActivationInt4WeightConfig

Summary:
As the title says, we will deprecate FbgemmConfig since it's a single kernel.
We'd like to categorize things by derived dtype + packed format.

Test Plan:
python test/quantization/quantize_/test_int4_groupwise_preshuffle.py

stack-info: PR: #2474, branch: jerryzh168/stack/10
@facebook-github-bot facebook-github-bot added the CLA Signed label Jul 2, 2025
@jerryzh168 jerryzh168 added the topic: new feature label Jul 2, 2025
@jerryzh168 jerryzh168 changed the base branch from jerryzh168/stack/9 to main July 2, 2025 20:35
jerryzh168 added a commit that referenced this pull request Jul 2, 2025
@jerryzh168 jerryzh168 force-pushed the jerryzh168/stack/10 branch from 4b0c7c7 to f5977ce Compare July 2, 2025 20:36
@jerryzh168 jerryzh168 changed the base branch from main to jerryzh168/stack/9 July 2, 2025 20:36
@jerryzh168 jerryzh168 changed the base branch from jerryzh168/stack/9 to main July 2, 2025 21:42
jerryzh168 added a commit that referenced this pull request Jul 2, 2025
@jerryzh168 jerryzh168 force-pushed the jerryzh168/stack/10 branch 2 times, most recently from 04ce2c5 to afd8703 Compare July 2, 2025 21:42
jerryzh168 added a commit that referenced this pull request Jul 2, 2025
@jerryzh168 jerryzh168 changed the base branch from main to jerryzh168/stack/9 July 2, 2025 21:42
@jerryzh168 jerryzh168 changed the base branch from jerryzh168/stack/9 to main July 2, 2025 23:44
jerryzh168 added a commit that referenced this pull request Jul 2, 2025
@jerryzh168 jerryzh168 force-pushed the jerryzh168/stack/10 branch from afd8703 to ff4682e Compare July 2, 2025 23:44
@jerryzh168 jerryzh168 changed the base branch from main to jerryzh168/stack/9 July 2, 2025 23:44
@jerryzh168 jerryzh168 changed the base branch from jerryzh168/stack/9 to main July 3, 2025 00:09
jerryzh168 added a commit that referenced this pull request Jul 3, 2025
@jerryzh168 jerryzh168 force-pushed the jerryzh168/stack/10 branch from ff4682e to 58f8a2a Compare July 3, 2025 00:09
@jerryzh168 jerryzh168 changed the base branch from main to jerryzh168/stack/9 July 3, 2025 00:09
@jerryzh168 jerryzh168 changed the base branch from jerryzh168/stack/9 to main July 3, 2025 02:18
@jerryzh168 jerryzh168 changed the base branch from jerryzh168/stack/9 to main July 3, 2025 21:57
jerryzh168 added a commit that referenced this pull request Jul 3, 2025
@jerryzh168 jerryzh168 force-pushed the jerryzh168/stack/10 branch from acc33bd to 7412903 Compare July 3, 2025 21:57
@jerryzh168 jerryzh168 changed the base branch from main to jerryzh168/stack/9 July 3, 2025 21:57
@jerryzh168 jerryzh168 changed the base branch from jerryzh168/stack/9 to main July 7, 2025 18:47
jerryzh168 added a commit that referenced this pull request Jul 7, 2025
@jerryzh168 jerryzh168 force-pushed the jerryzh168/stack/10 branch from 7412903 to 867c75a Compare July 7, 2025 18:47
@jerryzh168 jerryzh168 changed the base branch from main to jerryzh168/stack/9 July 7, 2025 18:47
@jerryzh168 jerryzh168 changed the base branch from jerryzh168/stack/9 to main July 7, 2025 19:52
jerryzh168 added a commit that referenced this pull request Jul 7, 2025
@jerryzh168 jerryzh168 force-pushed the jerryzh168/stack/10 branch from 867c75a to 382bb8a Compare July 7, 2025 19:52
@jerryzh168 jerryzh168 changed the base branch from main to jerryzh168/stack/9 July 7, 2025 19:52
@jerryzh168 jerryzh168 changed the base branch from jerryzh168/stack/9 to main July 7, 2025 19:57
jerryzh168 added a commit that referenced this pull request Jul 7, 2025
@jerryzh168 jerryzh168 force-pushed the jerryzh168/stack/10 branch from 382bb8a to 8a6dcc4 Compare July 7, 2025 19:57
@jerryzh168 jerryzh168 changed the base branch from main to jerryzh168/stack/9 July 7, 2025 19:57
@jerryzh168 jerryzh168 changed the base branch from jerryzh168/stack/9 to main July 7, 2025 22:39
jerryzh168 added a commit that referenced this pull request Jul 7, 2025
@jerryzh168 jerryzh168 force-pushed the jerryzh168/stack/10 branch from 8a6dcc4 to 5bf77b1 Compare July 7, 2025 22:39
@jerryzh168 jerryzh168 changed the base branch from main to jerryzh168/stack/9 July 7, 2025 22:39
@jerryzh168 jerryzh168 changed the base branch from jerryzh168/stack/9 to main July 7, 2025 23:14
jerryzh168 added a commit that referenced this pull request Jul 7, 2025
@jerryzh168 jerryzh168 force-pushed the jerryzh168/stack/10 branch from 5bf77b1 to 6cb63be Compare July 7, 2025 23:14
Comment thread torchao/quantization/quant_api.py Outdated
class Float8ActivationInt4WeightConfig(AOBaseConfig):
    group_size: int = 128
    use_preshuffle: bool = False
    kernel: str = "fbgemm"

Contributor

Use GemmKernelChoice here?

Contributor

also add a docstring?
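
One way to read the two suggestions above (an enum instead of a free-form string, plus a docstring) is sketched below. The name `GemmKernelChoice` comes from the review comment, but its members here are assumptions, and `Float8ActivationInt4WeightSketch` is a hypothetical stand-in that does not inherit from `AOBaseConfig`; it is not the code that was eventually merged.

```python
from dataclasses import dataclass
from enum import Enum


class GemmKernelChoice(str, Enum):
    """Illustrative kernel choices; the real enum's members may differ."""

    FBGEMM = "fbgemm"
    CUTLASS = "cutlass"


@dataclass
class Float8ActivationInt4WeightSketch:
    """Hypothetical config: float8 dynamic activation, int4 weight.

    Args:
        group_size: number of weight elements sharing one scale.
        use_preshuffle: whether weights are stored pre-shuffled.
        kernel: which GEMM kernel family to dispatch to.
    """

    group_size: int = 128
    use_preshuffle: bool = False
    kernel: GemmKernelChoice = GemmKernelChoice.FBGEMM

    def __post_init__(self) -> None:
        # Accept plain strings for convenience, but reject unknown
        # kernel names at construction time (raises ValueError).
        self.kernel = GemmKernelChoice(self.kernel)
```

With a typed field, a typo such as `kernel="fbgem"` fails when the config is built rather than at kernel dispatch.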

Contributor Author

updated the structure after some discussions, please take a look again

Comment thread torchao/quantization/quant_api.py Outdated
@@ -44,6 +44,7 @@
 from .quant_api import (
     CutlassInt4PackedLayout,
     FbgemmConfig,

Contributor

What's the plan for FbgemmConfig? Looks like it was added only ~1.5 months ago but it's technically public API. Do we know if anyone's using it already? I don't think it's released yet so wonder if it's OK to just remove it?

Contributor Author

We'll remove it. It is used in some internal scripts, but we'll update those as well.

Comment thread torchao/quantization/quant_api.py
Comment thread torchao/quantization/quant_api.py Outdated

@andrewor14 andrewor14 left a comment

Contributor

Looks great, left some comments mostly about documentation

Comment thread torchao/quantization/quantize_/workflows/packing_format.py Outdated
Comment thread torchao/quantization/quant_api.py Outdated
    preserve_zero: Optional[bool] = None
    # since not all tensors are migrated to the new structure yet,
    # we use `_legacy` to represent the previous layout
    packing_format: PackingFormat = "_legacy"

Contributor

Can "legacy" mean different things for different configs? I wonder if we should make this optional instead, where None represents "legacy"?

@jerryzh168 jerryzh168 Jul 15, 2025

Contributor Author

Yeah, legacy just means no packing format (it's implemented with AQT). I plan to remove support for legacy at some point and don't want to complicate the typing here.
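
The trade-off discussed in this thread can be made concrete with a small sketch. It assumes `PackingFormat` is a string-valued enum with a `"_legacy"` member, which matches the default shown in the snippet but is otherwise a guess; the two helper functions are hypothetical and exist only to contrast the typings.

```python
from enum import Enum
from typing import Optional


class PackingFormat(str, Enum):
    """Assumed shape of the enum; only "_legacy" appears in the thread."""

    PLAIN = "plain"
    PRESHUFFLED = "preshuffled"
    LEGACY = "_legacy"


def is_legacy_sentinel(fmt: PackingFormat) -> bool:
    """Author's approach: a dedicated enum member marks the old layout."""
    return fmt is PackingFormat.LEGACY


def is_legacy_optional(fmt: Optional[PackingFormat]) -> bool:
    """Reviewer's alternative: None stands for the old layout."""
    return fmt is None
```

With the sentinel, the field type stays a plain `PackingFormat` and dropping legacy support later means deleting one member; with `Optional`, every consumer has to handle `None` until the migration is finished.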

Comment thread torchao/quantization/quantize_/workflows/int4/int4_tensor.py
Comment thread torchao/quantization/quant_api.py
Comment thread torchao/quantization/quant_api.py Outdated
Comment thread torchao/quantization/quantize_/workflows/packing_format.py Outdated