Changes to support int8 weight and fp32 bias in QNNPACK#26307
supriyar wants to merge 21 commits into gh/supriyar/17/base
Conversation
…still expects uint8 so we convert from int8 to uint8 in the operator code.
Add support for FP32 bias. Re-quantize the bias at run time based on the input scale.
If the value of the input scale stored in the packed struct changes, we requantize the bias with the updated input scale.
Test Plan:
python test/test_quantized.py TestQNNPackOps
[ghstack-poisoned]
…GEMM.
Summary
QNNPACK still expects uint8 weights so we convert from int8 to uint8 in the operator code.
Add support for FP32 bias - Re-quantize the bias at run time based on the input scale.
If the value of the input scale stored in the packed struct changes, we requantize the bias with the updated input scale.
Test Plan:
python test/test_quantized.py TestQNNPackOps
ghstack-source-id: 1115029
Pull Request resolved: #26307
auto input_scale = input_contig.q_scale();

// Re-quantizing the bias based on input scale and weight scale.
if (input_scale != pack_ptr.input_scale) {
you probably need to check zero point too
Input zero point is not used in the pre-packing step anymore, so it's fine if it changes at runtime.
For bias, the quantization depends only on the input scale, not the zero point.
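To spell out why the zero point drops out for the bias (a standard quantized-GEMM identity, not code from this PR): in a quantized linear layer the fp32 output is recovered as

$$y = s_x s_w \sum_k (q_{x,k} - z_x)(q_{w,k} - z_w) + b,$$

so the bias $b$ enters the int32 accumulator in units of $s_x s_w$. Quantizing it as $q_b = \operatorname{round}\bigl(b / (s_x s_w)\bigr)$ with zero point 0 therefore depends only on the two scales; neither $z_x$ nor $z_w$ appears in the bias quantization.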
bias = at::quantize_linear(bias, 1.0, 0, kQInt32);
bias_fp32 = at::zeros(out_ch, at::kFloat);
}
auto bias = at::quantize_linear(bias_fp32, 1.0, 0, kQInt32);
I think you don't need this quantization at all - it's never used below
It is passed to the pre-pack function, since QNNPACK expects a quantized bias.
Correct, bias should now be stored as float32
Yes, it is stored in the pack struct as float32, but we still need to quantize it to call the pack function for QNNPACK.
kernel_zp);
auto* qnnp_w_data = qnnp_weight.data_ptr<c10::quint8>();
for (int i = 0; i < weight_contig.numel(); ++i) {
  qnnp_w_data[i] = static_cast<c10::quint8>(w_data[i] + 128);
I am missing something here. Why do we add 128 both here and in the packing step?
The weight tensor we store in the packedWeightStruct is int8, so we need to add 128 here.
In pre-pack, since we call the QNNPACK prepack function, we need to add 128 there as well before calling it.
raghuramank100
left a comment
Please see comments
dzhulgakov
left a comment
Really close, but please simplify the code - a lot of steps are not necessary (sorry for not spotting it earlier)
}
auto wt_ptr =
    guts::make_unique<PackedConvWeightsQnnp>(PackedConvWeightsQnnp{
        guts::make_unique<qnnpack::PrePackConvWeights>(
Can you just skip prepacking here altogether and assign nullptr to this member? You're going to recompute the whole thing on first invocation anyway.
I agree. However, do you think it may throw off someone reading the code to see this as null?
Also, in the future I think we can remove bias from the pre-packing, so it may be okay to leave this here and remove just the bias field later.
If you feel strongly about setting this to nullptr, I can do so :)
It's just no-op code - why have it? I feel it throws off more to have some computation which is not used later.
Okay, changed it to nullptr in pre-pack :)
raghuramank100
left a comment
Once this is done, please add an end-to-end numerics comparison test for QNNPACK in test_quantized_models.py
dzhulgakov
left a comment
If those few comments are fixed - I'm good with landing this
Add support for FP32 bias. Re-quantize the bias at run time based on the input scale.
If the value of the input scale stored in the packed struct changes, we requantize the bias with the updated input scale.
Test Plan:
python test/test_quantized.py TestQNNPackOps
Differential Revision: [D17504253](https://our.internmc.facebook.com/intern/diff/D17504253)
[ghstack-poisoned]
Summary: Pull Request resolved: pytorch/pytorch#26307 Add support for FP32 bias. Re-quantize the bias at run time based on the input scale. If the value of the input scale stored in the packed struct changes, we requantize the bias with the updated input scale. Test Plan: python test/test_quantized.py TestQNNPackOps Differential Revision: D17504253 Pulled By: supriyar fbshipit-source-id: 49fe36a0bee91aaeb085db28eec4ded8c684dcf4
Stack from ghstack:
Add support for FP32 bias. Re-quantize the bias at run time based on the input scale.
If the value of the input scale stored in the packed struct changes, we requantize the bias with the updated input scale.
Test Plan:
python test/test_quantized.py TestQNNPackOps
Differential Revision: D17504253