[CPU] Enable DA8W4 on CPU by Xia-Weiwen · Pull Request #2128 · pytorch/ao

Xia-Weiwen · 2025-04-25T10:22:16Z

Summary
This PR enables DA8W4 on CPU.

- It adds a new layout, Int8DynamicActInt4WeightCPULayout, and its implementation.
- It adds two custom ops:
  - da8w4_linear_prepack_cpu for weight packing
  - da8w4_linear_cpu for A8W4 GEMM
- It adds C++ kernels for the two new custom ops.

The feature supports both symmetric and asymmetric quantization of activations.

The ops and kernels are available only if all of the following hold:

- torchao is built from source with USE_CPP_KERNELS=1 on Linux with an x86 CPU with AVX512.
- torchao is run on Linux with an x86 CPU with AVX512.
- The PyTorch version is >= 2.7.

To get the best performance, one needs a CPU with AMX support.
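
One way to see which of these requirements a given Linux machine meets is to inspect its CPU flags. The sketch below is not part of torchao: `cpu_flags` and `da8w4_isa_support` are hypothetical helper names, and it assumes the usual Linux `/proc/cpuinfo` flag spellings `avx512f`, `avx512_vnni`, and `amx_int8`.

```python
def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def da8w4_isa_support(flags: set[str]) -> dict[str, bool]:
    """Report the ISA features named in this PR (hypothetical helper, not a torchao API)."""
    return {
        "avx512": "avx512f" in flags,          # required for the ops to be available
        "avx512_vnni": "avx512_vnni" in flags,  # used by the small-M path
        "amx_int8": "amx_int8" in flags,        # needed for best performance
    }
```

On a Linux host this could be called as `da8w4_isa_support(cpu_flags(open("/proc/cpuinfo").read()))`.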

Implementation details

- The weight-packing kernel is implemented with AVX512 intrinsics if available. Otherwise, a reference path is used.
- The GEMM kernel uses the at::cpublas brgemm utilities from PyTorch core if available.
- In the GEMM kernel, if M is large (> 4):
  - if brgemm is available, brgemm is used;
  - otherwise, it falls back to the reference implementation.
- In the GEMM kernel, if M is small (<= 4):
  - if AVX512_VNNI is available, the kernel uses AVX512_VNNI intrinsics;
  - otherwise, it takes the same path as for large M.
- All utility functions used in the kernel are implemented with AVX512 if available. Otherwise, they fall back to a reference implementation.
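
The GEMM path selection described above can be summarized as a small decision function. This is only a sketch of the rules as stated in this description, written in Python for readability. The real dispatch lives in the C++ kernel, and the function name and returned labels are illustrative, not identifiers from the PR.

```python
def select_gemm_path(m: int, has_brgemm: bool, has_avx512_vnni: bool) -> str:
    """Return the GEMM path that the rules in the PR description would pick.

    The threshold of 4 comes from the description; the labels are made up
    for this sketch.
    """
    small_m_threshold = 4
    if m <= small_m_threshold and has_avx512_vnni:
        return "avx512_vnni"
    # Large M, and small M without VNNI, share the same path.
    return "brgemm" if has_brgemm else "reference"
```

For example, `select_gemm_path(1, True, True)` selects the VNNI path, while `select_gemm_path(64, False, True)` ends up on the reference implementation.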

Usage

quantize_(
    model,
    int8_dynamic_activation_int4_weight(
        group_size=32,  # or 64, 128
        layout=Int8DynamicActInt4WeightCPULayout(),
        act_mapping_type=MappingType.SYMMETRIC,  # or MappingType.ASYMMETRIC
    ),
)
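
To make the DA8W4 name concrete: activations are quantized to int8 dynamically at run time, and weights are quantized to int4 per group of `group_size` elements. The pure-Python sketch below is a numerical reference only. It assumes symmetric quantization for both operands, does not use torchao, and `quantize_symmetric` and `da8w4_dot` are hypothetical names.

```python
def quantize_symmetric(values, qmin, qmax):
    """Quantize a list of floats with one shared scale; returns (ints, scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [min(qmax, max(qmin, round(v / scale))) for v in values]
    return q, scale


def da8w4_dot(activation, weight_row, group_size):
    """Approximate dot(activation, weight_row) with int8 x int4 integer products."""
    q_act, s_act = quantize_symmetric(activation, -127, 127)  # one scale per token
    total = 0.0
    for start in range(0, len(weight_row), group_size):
        stop = start + group_size
        q_w, s_w = quantize_symmetric(weight_row[start:stop], -8, 7)  # one scale per group
        # Integer accumulation within the group, then rescale to float.
        int_acc = sum(a * w for a, w in zip(q_act[start:stop], q_w))
        total += int_acc * s_act * s_w
    return total
```

With small inputs the result stays close to the floating-point dot product. The gap is the quantization error that a smaller `group_size` is meant to reduce.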

Test plan

pytest test/quantization/test_quant_api.py -k test_8da4w_cpu

pytorch-bot · 2025-04-25T10:22:19Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2128

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit e3731f7 with merge base 4ebc9c0:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Xia-Weiwen · 2025-05-14T10:22:42Z

@leslie-fang-intel This PR is updated to use a new layout. Please review again. Thanks.

Xia-Weiwen · 2025-05-16T09:51:35Z

Hi @jerryzh168 Could you please review this PR? Thanks.

Xia-Weiwen · 2025-05-19T02:09:46Z

Hi @jerryzh168 Could you please review this PR? Thanks.

Xia-Weiwen · 2025-05-20T14:36:51Z

Hi @jerryzh168 Could you please review this PR? Thanks.

leslie-fang-intel

Please also describe how we choose different implementations based on the CPU Info.

Xia-Weiwen · 2025-06-04T15:16:16Z

> Please also describe how we choose different implementations based on the CPU Info.

I have added more details in the description. Thanks.

Xia-Weiwen · 2025-06-06T01:50:07Z

Hi @jerryzh168 Could you please review this PR? Thanks. It's changed a lot since your last review.

Xia-Weiwen · 2025-06-11T03:04:07Z

Hi @jerryzh168 Could you please review this PR? Thanks.

jerryzh168 · 2025-06-12T17:03:15Z

+
+
+@dataclass(frozen=True)
+class Int8DynamicActInt4WeightCPULayout(Layout):


it looks like you can just reuse Int4CPULayout

can you move the layout and impl to a separate file?

Sure. Done.

jerryzh168 · 2025-06-12T17:15:48Z

+
+
+@register_layout(Int8DynamicActInt4WeightCPULayout)
+class DA8W4CPUAQTTensorImpl(Int4CPUAQTTensorImpl):


oh I see, OK if you need a separate Impl then makes sense to have a separate layout

Yes. We need a different impl from A16W4 because the ISAs (AMX and VNNI) require different weight memory formats depending on whether computation is done in BF16 or INT8. Thanks.
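
As background for the memory-format point: VNNI-style instructions consume several consecutive K elements per 32-bit lane, which is 4 elements for INT8 and 2 for BF16, so the same K x N weight has to be reordered differently for each. The sketch below shows that generic reordering on nested Python lists. It is not the packing code from this PR, the exact layout used by the kernel may differ, and `to_vnni_layout` is a hypothetical name.

```python
def to_vnni_layout(weight_kn, block_k):
    """Reorder a K x N matrix (list of rows) into K/block_k blocks of shape N x block_k.

    block_k = 4 follows the INT8 VNNI convention, block_k = 2 the BF16 one.
    """
    k = len(weight_kn)
    if k % block_k != 0:
        raise ValueError("K must be a multiple of block_k")
    n = len(weight_kn[0])
    return [
        [[weight_kn[kb + i][col] for i in range(block_k)] for col in range(n)]
        for kb in range(0, k, block_k)
    ]
```

For a 4 x 2 weight, `block_k=4` yields a single block whose rows hold four K elements each, while `block_k=2` yields two blocks with two K elements per row.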

jerryzh168 · 2025-06-12T17:32:56Z

+        int_data = (int_data + 8).to(torch.uint8)
+        if scale.dim() == 1:
+            scale.unsqueeze_(-1)
+        scale = scale.to(torch.float)
+        if zero_point.dim() == 1:
+            zero_point.unsqueeze_(-1)
+        zero_point = zero_point.to(torch.int8) + 8


can you configure dtypes of int_data, scale, zero_point and shapes in the call to to_affine_quantized_intx?

Thanks for the suggestion. I have improved this part.
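
For context, the snippet quoted above adds 8 to both int_data and zero_point, which moves signed int4 values in [-8, 7] into the unsigned range [0, 15]. A minimal sketch of why this leaves the dequantized values unchanged follows. Plain Python integers stand in for tensors, and the helper names are hypothetical.

```python
def dequantize(q, zero_point, scale):
    """Affine dequantization: real = (q - zero_point) * scale."""
    return (q - zero_point) * scale


def shift_to_unsigned(q_int4, zero_point, offset=8):
    """Move a signed int4 value in [-8, 7] to unsigned [0, 15], shifting zero_point too."""
    if not -8 <= q_int4 <= 7:
        raise ValueError("expected a signed int4 value")
    return q_int4 + offset, zero_point + offset
```

Because the same offset is added to the value and to the zero point, it cancels in `q - zero_point`, so the scale needs no adjustment.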

jerryzh168 · 2025-06-23T20:56:01Z

    quant_min = -8
    quant_max = 7

+    if isinstance(layout, Int8DynamicActInt4WeightCPULayout):


can this happen in kernel? we have dtype conversions like this:

ao/torchao/dtypes/uintx/plain_layout.py, line 260 in 2898903:

w_vals_int8_t.to(input_tensor.dtype),

Thanks for the comment. I have moved this to _linear_int8_act_int4_weight_cpu_impl.

* [CPU] enable int8_dynamic_activation_int4_weight with Int4CPULayout
* Fix format issue
* Add Int8DynamicActInt4WeightCPULayout
* remove dispatch for t()
* Add cpp kernel for weight packing and GEMM
* Register ATQ linear dispatch for da8w4 linear
* Fix issues with torch.compile
* Fix DA8W4CPUAQTTensorImpl.get_plain
* Test DA8W4CPUAQTTensorImpl.get_plain in UT
* Skip UT if CPP kernel not built
* Add AVX512_VNNI implementation for small M
* improve performance
* Support symmetric quantization of activation
* Refine code
* Refine code
* Put in a separate file
* Bug fix
* refine code

[CPU] enable int8_dynamic_activation_int4_weight with Int4CPULayout

0581451

facebook-github-bot added the CLA Signed label Apr 25, 2025

Merge branch 'main' into da8w4_with_int4_cpu_layout

dffbbab

Xia-Weiwen added the cpu, quantize_, and topic: new feature labels Apr 25, 2025

Xia-Weiwen added 2 commits April 25, 2025 03:27

Fix format issue

9fb7f77

Merge branch 'main' into da8w4_with_int4_cpu_layout

35ece3b

Xia-Weiwen requested a review from leslie-fang-intel April 28, 2025 11:02

jerryzh168 reviewed Apr 28, 2025

Comment thread test/quantization/test_quant_api.py Outdated

leslie-fang-intel approved these changes Apr 29, 2025

Xia-Weiwen marked this pull request as ready for review April 29, 2025 02:01

Xia-Weiwen requested a review from jerryzh168 April 29, 2025 03:16

Xia-Weiwen marked this pull request as draft May 7, 2025 01:17

Xia-Weiwen added 2 commits May 11, 2025 20:08

Merge branch 'main' into da8w4_with_int4_cpu_layout

c5b6d87

Add Int8DynamicActInt4WeightCPULayout

8e80d03

Xia-Weiwen requested a review from leslie-fang-intel May 14, 2025 10:22

Merge branch 'main' into da8w4_with_int4_cpu_layout

51249c3

leslie-fang-intel reviewed May 15, 2025

Comment thread torchao/dtypes/uintx/int4_cpu_layout.py Outdated

Xia-Weiwen changed the title ~~[CPU] enable int8_dynamic_activation_int4_weight with Int4CPULayout~~ [CPU] enable int8_dynamic_activation_int4_weight on CPU May 16, 2025

remove dispatch for t()

3e20172

Xia-Weiwen marked this pull request as ready for review May 16, 2025 05:59

Xia-Weiwen changed the title ~~[CPU] enable int8_dynamic_activation_int4_weight on CPU~~ [CPU] Add a new layout for int8_dynamic_activation_int4_weight on CPU May 16, 2025

Merge branch 'main' into da8w4_with_int4_cpu_layout

e765664

Xia-Weiwen marked this pull request as draft May 21, 2025 02:57

leslie-fang-intel reviewed Jun 4, 2025

Comment thread torchao/csrc/cpu/da8w4_linear.cpp Outdated

Xia-Weiwen added 3 commits June 4, 2025 14:02

Support symmetric quantization of activation

e05b96a

Merge branch 'main' into da8w4_with_int4_cpu_layout

fd6e4b1

Refine code

18335c6

leslie-fang-intel reviewed Jun 5, 2025

Comment thread torchao/csrc/cpu/da8w4_linear.cpp Outdated

Comment thread torchao/csrc/cpu/da8w4_linear.cpp

leslie-fang-intel approved these changes Jun 5, 2025

Xia-Weiwen added 2 commits June 5, 2025 14:53

Refine code

66ab77f

Merge branch 'main' into da8w4_with_int4_cpu_layout

2c5a799

Xia-Weiwen requested a review from jerryzh168 June 6, 2025 01:49

Xia-Weiwen marked this pull request as ready for review June 6, 2025 01:49

Merge branch 'main' into da8w4_with_int4_cpu_layout

131660e

jerryzh168 reviewed Jun 12, 2025

Comment thread torchao/quantization/quant_api.py

jerryzh168 reviewed Jun 12, 2025

Comment thread torchao/dtypes/uintx/int4_cpu_layout.py Outdated

jerryzh168 reviewed Jun 12, 2025

Xia-Weiwen added 2 commits June 14, 2025 17:05

Put in a separate file

75fbd6f

Merge branch 'main' into da8w4_with_int4_cpu_layout

24268fd

Xia-Weiwen requested a review from jerryzh168 June 15, 2025 11:32

jerryzh168 reviewed Jun 20, 2025

Comment thread test/quantization/test_quant_api.py

jerryzh168 reviewed Jun 23, 2025

jerryzh168 approved these changes Jun 23, 2025

Xia-Weiwen merged commit 8b57afe into pytorch:main Jun 25, 2025
35 checks passed

Xia-Weiwen added 3 commits June 25, 2025 13:38

Bug fix

4c0a739

Merge branch 'main' into da8w4_with_int4_cpu_layout

0815d96

refine code

e3731f7



4 participants
