PARQ quantizer support for torchao's weight-only configs #2091
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2091
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 073e1fa with merge base e3db2b2.
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
|
@lisjin can you give a little code snippet of how our QAT prepare/convert would work for this API? I'm having trouble following. Here are some example code snippets from other APIs: https://fb.workplace.com/groups/pytorch.edge2.team/permalink/1186139489308568/ |
|
Hi @lisjin, do you mind adding a code snippet on the main README on what the end-to-end flow would look like? My understanding is you can just replace the |
|
@andrewor14 Thanks for the feedback—I removed config from UnifTorchaoQuantizer. In the README, I've also added a side-by-side comparison of PARQ vs. torchao prepare and convert steps. After PARQ training, we call |
|
Looks great, thanks @lisjin! The README is very clear. One thing I want to discuss is whether we can just use a new What do you think about something like this instead? Also curious if @metascroy has any thoughts on this |
andrewor14 left a comment
Looks good to me other than the recursion comment. @metascroy any other thoughts?
|
Looks good to me! Thanks @lisjin! Can we add an end-to-end test_intx_weight_only_e2e for intx (with various x-values), similar to test_int4_weight_only_e2e? |
@common_utils.parametrize("b", [2, 3, 4, 8])
def test_intx_weight_only_e2e(self, b: int = 2, group_size: int = 32):
@metascroy Thanks for looking it over! I've added this end-to-end test, along with mapping_type=MappingType.SYMMETRIC and target_dtype=torch.int8 defaults for UnifTorchaoQuantizer
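As a rough illustration of why a single int8 target dtype can cover every bit width in the parametrized test, the sketch below computes signed integer ranges for b in [2, 3, 4, 8] and checks that each fits in int8. The helper `signed_range` is hypothetical, and the assumption that the quantized values use the full signed range for each b is mine, not something confirmed in this thread.

```python
# Hypothetical helper, not part of torchao or PARQ: signed range of a b-bit integer.
def signed_range(bits: int) -> tuple[int, int]:
    """Return (qmin, qmax) for a signed integer with the given bit width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


INT8_MIN, INT8_MAX = signed_range(8)

# Bit widths taken from the parametrized test in this review thread.
for b in (2, 3, 4, 8):
    qmin, qmax = signed_range(b)
    fits_in_int8 = INT8_MIN <= qmin and qmax <= INT8_MAX
    print(f"b={b}: qmin={qmin}, qmax={qmax}, fits in int8: {fits_in_int8}")
```

Under that assumption, every tested bit width can be stored in an int8 tensor without overflow, which is consistent with using torch.int8 as a shared default.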
|
@lisjin has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
* Add parq.quant.UnifTorchaoQuantizer for quantize_ API equivalence
* Test IntxWeightOnlyConfig
* Formatting fix
* Per-row IntxWeightOnlyConfig test
* Add end-to-end QAT prepare/convert test case
* Pass explicit layout to int4_weight_only
* Add QuantOptimizer.torchao_quantize_
* Update README, add Int4UnifTorchaoQuantizer
* Add test_intx_weight_only_e2e, set UnifTorchaoQuantizer defaults
* Update PARQ README
This is the first step in supporting torchao.quantize_ for PARQ trained models. I target only Int4WeightOnlyConfig and IntxWeightOnlyConfig for now since PARQ does not have activation quantization.
Instead of converting the state (e.g., scale, zero point) from PARQ's existing quantizers to torchao format, I decided to create a new quantizer UnifTorchaoQuantizer. This quantizer calls torchao's quantization primitives choose_qparams_affine, quantize_affine, dequantize_affine to ensure parity between the two QAT methods.
@metascroy It would be great if you could check the correctness of how the quantizer in TestUnifTorchaoQuantizer.test_intx_weight_only is initialized. I'm not sure if I missed any subtleties with int8.
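For readers unfamiliar with the three-step shape of choosing quantization parameters, quantizing, and dequantizing, here is a minimal pure-Python sketch of symmetric per-group quantization. It is a stand-in under stated assumptions, not torchao's implementation: the function names below are hypothetical, the scale is computed as max|w| / qmax (one common convention, which may differ from what torchao's primitives do), and plain lists replace tensors.

```python
# Illustrative stand-in only. These helpers are hypothetical and do NOT
# reproduce torchao's choose_qparams_affine / quantize_affine /
# dequantize_affine signatures or exact numerics.

def choose_qparams_symmetric(group, bits):
    """Pick a scale and integer range for one group (symmetric, zero point 0)."""
    qmin = -(2 ** (bits - 1))
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(x) for x in group)
    # Guard against an all-zero group to avoid dividing by zero later.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return scale, qmin, qmax


def quantize(group, scale, qmin, qmax):
    """Round each value to the nearest integer level and clamp to [qmin, qmax]."""
    return [min(max(round(x / scale), qmin), qmax) for x in group]


def dequantize(q_values, scale):
    """Map integer levels back to floats."""
    return [q * scale for q in q_values]


def fake_quantize(weights, bits, group_size):
    """Quantize then dequantize a flat weight list, one group at a time."""
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scale, qmin, qmax = choose_qparams_symmetric(group, bits)
        out.extend(dequantize(quantize(group, scale, qmin, qmax), scale))
    return out


if __name__ == "__main__":
    w = [0.5, -1.0, 0.25, 1.0, 0.0, 0.0]
    print(fake_quantize(w, bits=4, group_size=2))
```

The point of routing both QAT methods through the same primitives is that the quantize/dequantize round trip seen during training matches the one applied at conversion time; in this sketch that corresponds to both paths calling the same `fake_quantize`.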