Introduce int8 quantization api (version 2)#3391
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3391
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit e32508c with merge base d355d1f.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "quantize_"
@namgyu-youn I have confirmed internally; there are some infra issues right now, so the CI jobs didn't show up. Let's just wait for that to be resolved.
This PR was reopened (likely due to being reverted), so your approval was removed. Please request another review.
@jerryzh168 Finally! CI started to run, but it's broken even though the local test passed on an NVIDIA A100. I assume the CI instance calls different kernels than Ampere, but I'm not sure what I should do... can you please help with this? Actually, I didn't understand why the compiler is used here, instead of the profiler.
Thanks for working on this @namgyu-youn!
Summary: Introduce a new tensor subclass API. The main features are:

- Int8Tensor: the main API, which handles quantization and dequantization operations
- Utility operation functions: tensor slice, index selection

This API is integrated into the global variants (Int8WeightOnlyConfig, Int8DynamicActivationInt8WeightConfig) via the version argument, and is not set as the default.

Related Issue/PR: pytorch#3241 (reland)

Test plan: pytest -sv test/quantization/quantize_/workflows/int8/test_int8_tensor.py

PERF Test: https://github.com/pytorch/ao/blob/main/tutorials/quantize_vit/run_vit_b_quant.py with a batch size of 32:

| API | With torch.compile | Without torch.compile |
| --- | --- | --- |
| Old | 65.47 ms | 234.39 ms |
| New | 63.30 ms | 239.30 ms |

Future Plan: pytorch#3241 (review)
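The quantize/dequantize round trip that Int8Tensor is responsible for can be sketched in plain PyTorch. The function names and the symmetric per-row scheme below are illustrative assumptions for this sketch, not the actual torchao implementation:

```python
import torch

def quantize_int8(w: torch.Tensor):
    # Symmetric per-row (per-output-channel) int8 quantization sketch:
    # one float scale per row maps that row's max magnitude to 127.
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Inverse mapping back to float; per-element rounding error is
    # bounded by scale / 2.
    return q.to(torch.float32) * scale

torch.manual_seed(0)
w = torch.randn(4, 8)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
```

A tensor subclass wraps exactly this kind of state (the int8 data plus its scales) and dispatches ops like slicing and index selection so they stay consistent between the two.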