Changes to support int8 weight and fp32 bias in QNNPACK#26307
supriyar wants to merge 21 commits into gh/supriyar/17/base
Conversation
…still expects uint8 so we convert from int8 to uint8 in the operator code.
Add support for FP32 bias. Re-quantize the bias at run time based on the input scale.
If the value of the input scale stored in the packed struct changes, we requantize the bias with the updated input scale.
Test Plan:
python test/test_quantized.py TestQNNPackOps
[ghstack-poisoned]
…GEMM.
Summary
QNNPACK still expects uint8 weights so we convert from int8 to uint8 in the operator code.
Add support for FP32 bias - Re-quantize the bias at run time based on the input scale.
If the value of the input scale stored in the packed struct changes, we requantize the bias with the updated input scale.
Test Plan:
python test/test_quantized.py TestQNNPackOps
ghstack-source-id: 1115029
Pull Request resolved: #26307
auto input_scale = input_contig.q_scale();

// Re-quantizing the bias based on input scale and weight scale.
if (input_scale != pack_ptr.input_scale) {
you probably need to check zero point too
Input zero point is not used in the pre-packing step anymore, so it's fine if it changes at runtime.
For bias, the quantization depends only on the input scale, not the zero point.
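To spell out why the zero point drops out for the bias (a standard quantized-GEMM identity, not code from this PR): in a quantized linear layer the fp32 output is recovered as

$$y = s_x s_w \sum_k (q_{x,k} - z_x)(q_{w,k} - z_w) + b,$$

so the bias $b$ enters the int32 accumulator in units of $s_x s_w$. Quantizing it as $q_b = \operatorname{round}\bigl(b / (s_x s_w)\bigr)$ with zero point 0 therefore depends only on the two scales; neither $z_x$ nor $z_w$ appears in the bias quantization.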
bias = at::quantize_linear(bias, 1.0, 0, kQInt32);
bias_fp32 = at::zeros(out_ch, at::kFloat);
}
auto bias = at::quantize_linear(bias_fp32, 1.0, 0, kQInt32);
I think you don't need this quantization at all - it's never used below
It is passed to the pre-pack function, since QNNPACK expects a quantized bias.
Correct, bias should now be stored as float32
Yes, it is stored in the pack struct as float32, but we still need to quantize it to call the pack function for QNNPACK.
kernel_zp);
auto* qnnp_w_data = qnnp_weight.data_ptr<c10::quint8>();
for (int i = 0; i < weight_contig.numel(); ++i) {
  qnnp_w_data[i] = static_cast<c10::quint8>(w_data[i] + 128);
I am missing something here. Why do we add 128 both here and in the packing step?
The weight tensor we store in the packedWeightStruct is int8, so we need to add 128 here.
In pre-pack, since we call the QNNPACK prepack function, we need to add 128 there as well before calling it.
raghuramank100
left a comment
Please see comments
dzhulgakov
left a comment
Really close, but please simplify the code - a lot of steps are not necessary (sorry for not spotting it earlier)
}
auto wt_ptr =
    guts::make_unique<PackedConvWeightsQnnp>(PackedConvWeightsQnnp{
        guts::make_unique<qnnpack::PrePackConvWeights>(
Can you just skip prepacking here altogether and assign nullptr to this member? You're going to recompute the whole thing on first invocation anyway.
I agree. However, do you think it may throw off someone reading the code to see this as null?
Also, in the future I think we can remove bias from the pre-packing, so it may be okay to leave this here and remove just the bias field later.
If you feel strongly about setting this to nullptr, I can do so :)
It's just no-op code - why have it? I feel it throws off more to have some computation which is not used later.
Okay, changed it to nullptr in pre-pack :)
raghuramank100
left a comment
Once this is done, please add an end-to-end numerics comparison test for QNNPACK in test_quantized_models.py
dzhulgakov
left a comment
If those few comments are fixed - I'm good with landing this
Add support for FP32 bias. Re-quantize the bias at run time based on the input scale.
If the value of the input scale stored in the packed struct changes, we requantize the bias with the updated input scale.
Test Plan:
python test/test_quantized.py TestQNNPackOps
Differential Revision: [D17504253](https://our.internmc.facebook.com/intern/diff/D17504253)
[ghstack-poisoned]
Summary: Pull Request resolved: pytorch/pytorch#26307 Add support for FP32 bias. Re-quantize the bias at run time based on the input scale. If the value of the input scale stored in the packed struct changes, we requantize the bias with the updated input scale. Test Plan: python test/test_quantized.py TestQNNPackOps Differential Revision: D17504253 Pulled By: supriyar fbshipit-source-id: 49fe36a0bee91aaeb085db28eec4ded8c684dcf4
Stack from ghstack:
Add support for FP32 bias. Re-quantize the bias at run time based on the input scale.
If the value of the input scale stored in the packed struct changes, we requantize the bias with the updated input scale.
Test Plan:
python test/test_quantized.py TestQNNPackOps
Differential Revision: D17504253