Introduce int8 quantization api (version 2)#3391
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3391
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit e32508c with merge base d355d1f.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "quantize_"
@namgyu-youn I have confirmed internally; there are some infra issues right now, so the CI jobs didn't show up. Let's just wait for that to be resolved.
This PR was reopened (likely due to being reverted), so your approval was removed. Please request another review.
@jerryzh168 Finally! CI started to run, but it's broken even though the local test passed on an NVIDIA A100. I assume the CI instance calls different kernels than Ampere, but I'm not sure what I should do... can you please help with this? Actually, I didn't understand why the compiler is used here, instead of the profiler.
Thanks for working on this @namgyu-youn!
Summary: Introduce a new tensor subclass API. The main features are:

- Int8Tensor: the main API, which handles quantization and dequantization operations
- Utility operation functions: tensor slice, index selection

This API is integrated into the global variants (Int8WeightOnlyConfig, Int8DynamicActivationInt8WeightConfig) via the version argument, and is not set as the default.

Related Issue/PR: pytorch#3241 (reland)

Test plan: pytest -sv test/quantization/quantize_/workflows/int8/test_int8_tensor.py

PERF Test: https://github.com/pytorch/ao/blob/main/tutorials/quantize_vit/run_vit_b_quant.py with a batch size of 32:

| API | With torch.compile | Without torch.compile |
| --- | --- | --- |
| Old | 65.47 ms | 234.39 ms |
| New | 63.30 ms | 239.30 ms |

Future Plan: pytorch#3241 (review)
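The quantize/dequantize round trip that Int8Tensor is responsible for can be sketched in plain PyTorch. The function names and the symmetric per-row scheme below are illustrative assumptions for this sketch, not the actual torchao implementation:

```python
import torch

def quantize_int8(w: torch.Tensor):
    # Symmetric per-row (per-output-channel) int8 quantization sketch:
    # one float scale per row maps that row's max magnitude to 127.
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Inverse mapping back to float; per-element rounding error is
    # bounded by scale / 2.
    return q.to(torch.float32) * scale

torch.manual_seed(0)
w = torch.randn(4, 8)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
```

A tensor subclass wraps exactly this kind of state (the int8 data plus its scales) and dispatches ops like slicing and index selection so they stay consistent between the two.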