
implement per-group quantization for Int8WeightOnlyConfig #4018

Merged
vkuzo merged 3 commits into main from gh/vkuzo/229/head on Mar 19, 2026

Conversation

@vkuzo
Contributor

vkuzo commented Mar 6, 2026

Summary:

This is needed for #2752, as there are some Meta-only callsites for `Int8WeightOnlyConfig` v1 using per-group quantization.

99% Claude

Test Plan:

```
pytest test/quantization/quantize_/workflows/int8/test_int8_tensor.py -k test_int8_weight_only_v1_v2_per_group_equivalence
```
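For context, "per-group" here means each row of the int8 weight is split into contiguous groups of `group_size` columns, with one scale per group (versus one scale per row). A minimal round-trip sketch of that scheme, using hypothetical helper names that are not the torchao implementation:

```python
import torch

def quantize_int8_per_group(w: torch.Tensor, group_size: int):
    # w: (out_features, in_features); in_features must be divisible by group_size
    out_features, in_features = w.shape
    w_grouped = w.reshape(out_features, in_features // group_size, group_size)
    # One symmetric scale per group, so each group's max magnitude maps to 127
    scale = w_grouped.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.round(w_grouped / scale).clamp(-127, 127).to(torch.int8)
    return q.reshape(out_features, in_features), scale.squeeze(-1)

def dequantize_int8_per_group(q: torch.Tensor, scale: torch.Tensor, group_size: int):
    out_features, in_features = q.shape
    q_grouped = q.reshape(out_features, in_features // group_size, group_size)
    return (q_grouped.float() * scale.unsqueeze(-1)).reshape(out_features, in_features)

w = torch.randn(16, 64)
q, scale = quantize_int8_per_group(w, group_size=32)
w_hat = dequantize_int8_per_group(q, scale, group_size=32)
print((w - w_hat).abs().max())  # round-trip error is bounded by scale / 2 per group
```

Smaller groups track local weight statistics more closely at the cost of storing more scales; the test name suggests it checks that v2 reproduces v1's numerics for exactly this layout.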

[ghstack-poisoned]
@vkuzo
Contributor Author

vkuzo commented Mar 6, 2026

vkuzo added a commit that referenced this pull request Mar 6, 2026
Summary:

This is needed for #2752, as there
are some Meta-only callsites for `Int8WeightOnlyConfig` v1 using
per-group quantization.

99% Claude

Test Plan:

```
pytest test/quantization/quantize_/workflows/int8/test_int8_tensor.py -k test_int8_weight_only_v1_v2_per_group_equivalence
```
ghstack-source-id: 300693d
ghstack-comment-id: 4013529969
Pull-Request: #4018
@pytorch-bot

pytorch-bot Bot commented Mar 6, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/4018

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 4d04fad with merge base 637c4ac:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the CLA Signed label Mar 6, 2026
vkuzo added the module: inference label Mar 6, 2026
vkuzo (the PR author) commented on the new per-group branch in the diff:

```python
y = y.reshape(*activation_tensor.shape[:-1], weight_tensor.qdata.shape[0])
if weight_tensor.block_size[-1] < weight_tensor.qdata.shape[-1]:
    # Per-group quantization: dequantize weight, then do FP matmul
    w_dequant = weight_tensor.dequantize().to(output_dtype)
```


same logic as v1:

```python
weight_tensor = weight_tensor.dequantize()
```
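The branch under review is the fallback path: when the group size (`block_size[-1]`) is smaller than the row length, the weight is dequantized and a plain floating-point matmul is used instead of a fused int8 kernel. A hedged sketch of that control flow, in a hypothetical wrapper whose names (`qdata`, `block_size`) mirror the diff but which is not the torchao kernel:

```python
import torch

def int8_weight_only_linear(x, qdata, scale, block_size, output_dtype=torch.float32):
    # qdata: int8 weight, shape (out_features, in_features)
    # scale: per-group scales, shape (out_features, in_features // block_size[-1])
    out_features, in_features = qdata.shape
    group_size = block_size[-1]
    if group_size < in_features:
        # Per-group quantization: dequantize weight, then do FP matmul
        # (the "same logic as v1" the comment above points at).
        w = qdata.reshape(out_features, -1, group_size).to(output_dtype)
        w_dequant = (w * scale.unsqueeze(-1).to(output_dtype)).reshape(out_features, in_features)
        return x.to(output_dtype) @ w_dequant.t()
    # Per-row case: one scale per row; eligible for a fused int8 kernel (not shown).
    y = x.to(output_dtype) @ qdata.to(output_dtype).t()
    return y * scale.reshape(-1).to(output_dtype)
```

This trades the int8 kernel's speed for correctness on the grouped layout, which matches the PR's stated goal of covering the v1 per-group callsites rather than optimizing them.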

vkuzo added 2 commits March 17, 2026 18:27
[ghstack-poisoned]
[ghstack-poisoned]
vkuzo merged commit 6e5ea54 into main Mar 19, 2026, with 55 of 57 checks passed.

Labels

CLA Signed (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed)
module: inference (quantize_ api inference flow)


2 participants