Add Float8ActInt4WeightQATQuantizer #2289
from torch.ao.quantization.fx._decomposed import quantized_decomposed_lib  # noqa: F401

from torchao import quantize_
from torchao.float8.config import ScalingGranularity
I kinda hate that we have ScalingGranularity and the Granularity of the other FP8 inference APIs.
I think this is worth fixing before landing. @andrewor14, how about just using rowwise scaling (since I assume that's the one you want) and removing the option to configure it? That will at least keep this problem away from the backward-compatibility (BC) surface of QAT in a way that we can more easily fix later.
vkuzo left a comment
Requesting changes: remove ScalingGranularity from the user-facing API.
Removed ScalingGranularity from the public API.
Summary: This commit adds a QAT quantizer that performs float8 dynamic activation + int4 symmetric per-channel weight fake quantization. Note that there is no corresponding config for float8 QAT yet; it will be added in a future PR.
Test Plan:
python test/quantization/test_qat.py -k test_float8_fake_quantize
python test/quantization/test_qat.py -k test_qat_fp8a4w_quantizer