
implement per-group quantization for Int8WeightOnlyConfig #4018

Merged
vkuzo merged 3 commits into main from gh/vkuzo/229/head on Mar 19, 2026

Conversation

@vkuzo
Contributor

vkuzo commented Mar 6, 2026

Summary:

This is needed for #2752, as there are some Meta-only callsites for `Int8WeightOnlyConfig` v1 using per-group quantization.

99% Claude

Test Plan:

```
pytest test/quantization/quantize_/workflows/int8/test_int8_tensor.py -k test_int8_weight_only_v1_v2_per_group_equivalence
```
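For context, "per-group" here means each row of the int8 weight is split into contiguous groups of `group_size` columns, with one scale per group (versus one scale per row). A minimal round-trip sketch of that scheme, using hypothetical helper names that are not the torchao implementation:

```python
import torch

def quantize_int8_per_group(w: torch.Tensor, group_size: int):
    # w: (out_features, in_features); in_features must be divisible by group_size
    out_features, in_features = w.shape
    w_grouped = w.reshape(out_features, in_features // group_size, group_size)
    # One symmetric scale per group, so each group's max magnitude maps to 127
    scale = w_grouped.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.round(w_grouped / scale).clamp(-127, 127).to(torch.int8)
    return q.reshape(out_features, in_features), scale.squeeze(-1)

def dequantize_int8_per_group(q: torch.Tensor, scale: torch.Tensor, group_size: int):
    out_features, in_features = q.shape
    q_grouped = q.reshape(out_features, in_features // group_size, group_size)
    return (q_grouped.float() * scale.unsqueeze(-1)).reshape(out_features, in_features)

w = torch.randn(16, 64)
q, scale = quantize_int8_per_group(w, group_size=32)
w_hat = dequantize_int8_per_group(q, scale, group_size=32)
print((w - w_hat).abs().max())  # round-trip error is bounded by scale / 2 per group
```

Smaller groups track local weight statistics more closely at the cost of storing more scales; the test name suggests it checks that v2 reproduces v1's numerics for exactly this layout.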

[ghstack-poisoned]
@vkuzo
Contributor Author

vkuzo commented Mar 6, 2026

vkuzo added a commit that referenced this pull request Mar 6, 2026
Summary:

This is needed for #2752, as there
are some Meta-only callsites for `Int8WeightOnlyConfig` v1 using
per-group quantization.

99% Claude

Test Plan:

```
pytest test/quantization/quantize_/workflows/int8/test_int8_tensor.py -k test_int8_weight_only_v1_v2_per_group_equivalence
```
ghstack-source-id: 300693d
ghstack-comment-id: 4013529969
Pull-Request: #4018
@pytorch-bot

pytorch-bot Bot commented Mar 6, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4018

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 4d04fad with merge base 637c4ac:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the CLA Signed label Mar 6, 2026
vkuzo added the module: inference label Mar 6, 2026
vkuzo (the PR author) commented on the new per-group branch in the diff:

```python
y = y.reshape(*activation_tensor.shape[:-1], weight_tensor.qdata.shape[0])
if weight_tensor.block_size[-1] < weight_tensor.qdata.shape[-1]:
    # Per-group quantization: dequantize weight, then do FP matmul
    w_dequant = weight_tensor.dequantize().to(output_dtype)
```


same logic as v1:

```python
weight_tensor = weight_tensor.dequantize()
```
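The branch under review is the fallback path: when the group size (`block_size[-1]`) is smaller than the row length, the weight is dequantized and a plain floating-point matmul is used instead of a fused int8 kernel. A hedged sketch of that control flow, in a hypothetical wrapper whose names (`qdata`, `block_size`) mirror the diff but which is not the torchao kernel:

```python
import torch

def int8_weight_only_linear(x, qdata, scale, block_size, output_dtype=torch.float32):
    # qdata: int8 weight, shape (out_features, in_features)
    # scale: per-group scales, shape (out_features, in_features // block_size[-1])
    out_features, in_features = qdata.shape
    group_size = block_size[-1]
    if group_size < in_features:
        # Per-group quantization: dequantize weight, then do FP matmul
        # (the "same logic as v1" the comment above points at).
        w = qdata.reshape(out_features, -1, group_size).to(output_dtype)
        w_dequant = (w * scale.unsqueeze(-1).to(output_dtype)).reshape(out_features, in_features)
        return x.to(output_dtype) @ w_dequant.t()
    # Per-row case: one scale per row; eligible for a fused int8 kernel (not shown).
    y = x.to(output_dtype) @ qdata.to(output_dtype).t()
    return y * scale.reshape(-1).to(output_dtype)
```

This trades the int8 kernel's speed for correctness on the grouped layout, which matches the PR's stated goal of covering the v1 per-group callsites rather than optimizing them.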

vkuzo added 2 commits March 17, 2026 18:27
[ghstack-poisoned]
[ghstack-poisoned]
vkuzo merged commit 6e5ea54 into main Mar 19, 2026, with 55 of 57 checks passed.

Labels

CLA Signed (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed)
module: inference (quantize_ api inference flow)


2 participants