Add torchao_convert to PARQ's QuantOptimizer#2947
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2947
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 4cfa628 with merge base b99904b ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
d593e74 to
8e39be5
Compare
3bdfd2a to
01e14bd
Compare
a3c4e94 to
e608880
Compare
|
Thanks! Overall looks great! Left some comments. |
1beacfa to
f352a85
Compare
b0f15c8 to
38cf423
Compare
andrewor14
left a comment
There was a problem hiding this comment.
Looks good from my side, will let @metascroy do a final pass/stamp
|
Overall, I think it looks great! Approving PR, but address the comments above before landing. |
ec5e1f7 to
01582ae
Compare
|
Thanks for the comments @andrewor14 @metascroy :) I think I've addressed them all now, but let me know if I missed anything! |
e33264a to
a854d79
Compare
a854d79 to
1ed5b32
Compare
| block_size = (1, group_size) | ||
| target_dtype = torch.int8 | ||
| q_args = (weight, mapping_type, block_size, target_dtype, config.b) | ||
| if config.version == 2: |
There was a problem hiding this comment.
for this it's probably fine to break BC soon since it's a prototype feature?
There was a problem hiding this comment.
Got it, thanks. I kept the version=1 convention for this config to preserve the old functionality (StretchedAffineQuantizedTensor). The new version=2 will convert tensors to IntxUnpackedToInt8Tensor, which is consistent with other config classes like IntxWeightOnlyConfig
@metascroy Should we remove StretchedAffineQuantizedTensor entirely?
There was a problem hiding this comment.
I have no objections to removing it. I assumed you wanted it for Int4CPULayout, but if you don't need that, I think it's fine to remove.
On ExecuTorch side, IntxUnpackedToInt8 tensor will be sufficient
There was a problem hiding this comment.
I think you're right about CPU support. I'll leave it for now but consider removing it in the future
There was a problem hiding this comment.
Int4CPULayout is also moved to v2 already: #2845 you can set int4_packing_format to plain_int32 to get it
The original plan was to manually replace each weight tensor with an instance of
IntxUnpackedToInt8Tensor. However, I found that this already happens inIntxWeightOnlyConfigandInt8DynamicActivationIntxWeightConfigwhen they are initialized withversion=2.quantize_once per regularized param_group inQuantOptimizer.torchao_convertStretchedUnifTorchaoQuantizer, I fetch the qparams, scale, zero_point to initializeIntxUnpackedToInt8Tensor(as suggested by Scott)The PR also adds the
num_stepsproperty toQuantOptimizerto mirror D81526700.