Changes to support int8 weight and fp32 bias in QNNPACK #26307

Closed

supriyar wants to merge 21 commits into gh/supriyar/17/base from gh/supriyar/17/head

Conversation

@supriyar (Contributor) commented Sep 16, 2019

Stack from ghstack:

Add support for FP32 bias. Re-quantize the bias at run time based on the input scale.
If the input scale stored in the packed struct changes, we requantize the bias with the updated input scale.

Test Plan:
python test/test_quantized.py TestQNNPackOps

Differential Revision: D17504253
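For readers skimming the diff, here is a minimal sketch of the run-time behaviour described above, assuming hypothetical struct and helper names (the real fields live in the QNNPACK packed-weight structs touched by this PR):

```cpp
#include <ATen/ATen.h>

// Illustrative only: field and function names are made up for this sketch.
struct PackedWeightsSketch {
  at::Tensor bias_fp32;       // original FP32 bias kept in the packed struct
  at::Tensor bias_qint32;     // bias re-quantized for QNNPACK
  double input_scale = -1.0;  // input scale the bias was last quantized with
  double weight_scale = 1.0;
};

// Called from the operator: re-quantize the bias only when the input scale
// observed at run time differs from the one cached in the packed struct.
void maybe_requantize_bias(PackedWeightsSketch& packed, double input_scale) {
  if (input_scale != packed.input_scale) {
    // The bias scale is input_scale * weight_scale, with zero point 0.
    const double bias_scale = input_scale * packed.weight_scale;
    packed.bias_qint32 = at::quantize_linear(
        packed.bias_fp32, bias_scale, /*zero_point=*/0, at::kQInt32);
    packed.input_scale = input_scale;
  }
}
```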

Changes to support int8 weight in QNNPACK similar to FBGEMM. QNNPACK still expects uint8 so we convert from int8 to uint8 in the operator code.

Add support for FP32 bias. Re-quantize the bias at run time based on the input scale.
If the input scale stored in the packed struct changes, we requantize the bias with the updated input scale.

Test Plan:
    python test/test_quantized.py TestQNNPackOps

[ghstack-poisoned]
@supriyar supriyar requested a review from apaszke as a code owner September 16, 2019 20:28
@pytorchbot pytorchbot added labels: module: nn (Related to torch.nn), module: operators, oncall: quantization (Quantization support in PyTorch) Sep 16, 2019
@supriyar supriyar changed the title from "Changes to support int8 weight in QNNPACK similar to FBGEMM. QNNPACK still expects uint8 so we convert from int8 to uint8 in the operator code." to "Changes to support int8 weight in QNNPACK similar to FBGEMM." Sep 16, 2019
@supriyar supriyar changed the title from "Changes to support int8 weight in QNNPACK similar to FBGEMM." to "Changes to support int8 weight and fp32 bias in QNNPACK" Sep 16, 2019
supriyar added a commit that referenced this pull request Sep 16, 2019

Changes to support int8 weight in QNNPACK similar to FBGEMM.

Summary:
QNNPACK still expects uint8 weights so we convert from int8 to uint8 in the operator code.
Add support for FP32 bias - re-quantize the bias at run time based on the input scale.
If the input scale stored in the packed struct changes, we requantize the bias with the updated input scale.

Test Plan:
    python test/test_quantized.py TestQNNPackOps

ghstack-source-id: 1115029
Pull Request resolved: #26307
supriyar added a commit that referenced this pull request Sep 16, 2019

Changes to support int8 weight in QNNPACK similar to FBGEMM.

Summary:
QNNPACK still expects uint8 weights so we convert from int8 to uint8 in the operator code.
Add support for FP32 bias - re-quantize the bias at run time based on the input scale.
If the input scale stored in the packed struct changes, we requantize the bias with the updated input scale.

Test Plan:
    python test/test_quantized.py TestQNNPackOps

ghstack-source-id: 1965c2c
Pull Request resolved: #26307
auto input_scale = input_contig.q_scale();

// Re-quantizing the bias based on input scale and weight scale.
if (input_scale != pack_ptr.input_scale) {
Collaborator:

you probably need to check zero point too

Contributor Author (supriyar):

Input zero point is not used in the pre-packing step anymore, so it's fine if it changes at runtime.

Contributor:

For bias, the quantization depends only on the input scale, not the zero-point

bias = at::quantize_linear(bias, 1.0, 0, kQInt32);
bias_fp32 = at::zeros(out_ch, at::kFloat);
}
auto bias = at::quantize_linear(bias_fp32, 1.0, 0, kQInt32);
Collaborator:

I think you don't need this quantization at all - it's never used below

Contributor Author (supriyar):

It is passed to the pre-pack function since qnnpack expects quantized bias.

Contributor:

Correct, bias should now be stored as float32

Contributor Author (supriyar):

Yes, it is stored in the pack struct as float32, but we still need to quantize it to call the pack function for qnnpack.
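To summarize this thread, a rough sketch of the prepack-time handling described here, with an illustrative name and signature (not the actual q*_prepack code): the bias is kept in FP32 for later re-quantization, and a placeholder quantization is done only because the QNNPACK prepack entry point expects a quantized bias.

```cpp
#include <ATen/ATen.h>

// Illustrative sketch; the real code lives in the prepack files listed below.
at::Tensor bias_for_qnnpack_prepack(const c10::optional<at::Tensor>& bias_in,
                                    int64_t out_ch) {
  // Keep (or create) the bias as float32; this is what gets stored in the
  // packed struct so it can be re-quantized with the real input scale later.
  at::Tensor bias_fp32 = bias_in.has_value()
      ? bias_in->to(at::kFloat)
      : at::zeros({out_ch}, at::kFloat);

  // QNNPACK's prepack function still expects a quantized bias, so quantize
  // with a placeholder scale of 1.0 and zero point 0 just to satisfy it;
  // the values are recomputed at run time once the input scale is known.
  return at::quantize_linear(bias_fp32, /*scale=*/1.0, /*zero_point=*/0,
                             at::kQInt32);
}
```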

Comment thread aten/src/ATen/native/quantized/cpu/qconv_prepack.cpp Outdated
Comment thread aten/src/ATen/native/quantized/cpu/qconv.cpp Outdated
Comment thread aten/src/ATen/native/quantized/cpu/qlinear_prepack.cpp Outdated
Comment thread aten/src/ATen/native/quantized/cpu/qnnpack_utils.h Outdated
kernel_zp);
auto* qnnp_w_data = qnnp_weight.data_ptr<c10::quint8>();
for (int i = 0; i < weight_contig.numel(); ++i) {
qnnp_w_data[i] = static_cast<c10::quint8>(w_data[i] + 128);
Contributor:

I am missing something here. Why do we add 128 both here and in the packing step?

Contributor Author (supriyar):

The weight tensor we store in the packedWeightStruct is int8, so we need to add 128 here.
In pre-pack, since we call the QNNPACK prepack function, we need to add 128 there as well before calling it.
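For context, a minimal standalone sketch of the int8-to-uint8 shift being discussed (variable names are illustrative): adding 128 to both the values and the zero point leaves the represented real values unchanged, since real = scale * (q - zp).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Shift int8 weight values into QNNPACK's uint8 domain.
std::vector<uint8_t> int8_to_uint8(const std::vector<int8_t>& w_int8) {
  std::vector<uint8_t> w_uint8(w_int8.size());
  for (size_t i = 0; i < w_int8.size(); ++i) {
    // -128 -> 0, 0 -> 128, 127 -> 255
    w_uint8[i] = static_cast<uint8_t>(w_int8[i] + 128);
  }
  return w_uint8;
}
// The zero point is shifted by the same 128 so that
// scale * ((q + 128) - (zp + 128)) == scale * (q - zp).
```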

raghuramank100 (Contributor) left a comment

Please see comments

@supriyar supriyar requested a review from dzhulgakov September 19, 2019 01:33
supriyar added a commit that referenced this pull request Sep 19, 2019

Changes to support int8 weight in QNNPACK similar to FBGEMM.

Summary:
QNNPACK still expects uint8 weights so we convert from int8 to uint8 in the operator code.
Add support for FP32 bias - re-quantize the bias at run time based on the input scale.
If the input scale stored in the packed struct changes, we requantize the bias with the updated input scale.

Test Plan:
    python test/test_quantized.py TestQNNPackOps

ghstack-source-id: e077932
Pull Request resolved: #26307
dzhulgakov (Collaborator) left a comment

Really close, but please simplify the code - a lot of steps are not necessary (sorry for not spotting it earlier)

Comment thread aten/src/ATen/native/quantized/cpu/qconv.cpp
Comment thread aten/src/ATen/native/quantized/cpu/qconv.cpp Outdated
Comment thread aten/src/ATen/native/quantized/cpu/qconv.cpp Outdated
Comment thread aten/src/ATen/native/quantized/cpu/qconv.cpp Outdated
}
auto wt_ptr =
guts::make_unique<PackedConvWeightsQnnp>(PackedConvWeightsQnnp{
guts::make_unique<qnnpack::PrePackConvWeights>(
Collaborator:

Can you just skip prepacking here altogether and assign nullptr to this member? You're going to recompute the whole thing on the first invocation anyway.

Contributor Author (supriyar):

I agree - however, do you think it might throw off someone reading the code to see this as null?
Also, in the future I think we can remove bias from the pre-packing, so it may be okay to leave this here and just remove the bias field later.
If you feel strongly about setting this to nullptr, I can do so :)

Collaborator:

It's just no-op code - why have it? I feel it's more confusing to have a computation that is never used later.

Contributor Author (supriyar):

Okay, changed it to nullptr in pre-pack :)
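A rough sketch of the lazy-packing approach settled on here, with stand-in types (the actual member is the qnnpack::PrePackConvWeights pointer inside PackedConvWeightsQnnp): prepack leaves the pointer null, and the first invocation builds and caches the packed weights.

```cpp
#include <memory>

struct PrePackedStub {};  // stand-in for qnnpack::PrePackConvWeights

struct PackedConvWeightsSketch {
  std::unique_ptr<PrePackedStub> w;  // nullptr right after prepack
  double input_scale = -1.0;         // unknown until the first run
  // ... int8 weight tensor, FP32 bias, weight scale, etc.
};

void run_conv(PackedConvWeightsSketch& packed, double input_scale) {
  if (packed.w == nullptr || input_scale != packed.input_scale) {
    // First invocation (or a new input scale): re-quantize the bias and
    // build the QNNPACK packed weights here, then cache them.
    packed.w = std::make_unique<PrePackedStub>();  // placeholder for packing
    packed.input_scale = input_scale;
  }
  // ... dispatch to QNNPACK using packed.w ...
}
```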

@supriyar supriyar requested a review from dzhulgakov September 20, 2019 05:03
raghuramank100 (Contributor) left a comment

Once this is done, please add an end-to-end numerics comparison test for qnnpack in test_quantized_models.py.

dzhulgakov (Collaborator) left a comment

If those few comments are fixed - I'm good with landing this

Comment thread aten/src/ATen/native/quantized/cpu/qconv.cpp Outdated
Comment thread aten/src/ATen/native/quantized/cpu/qlinear.cpp Outdated
Comment thread aten/src/ATen/native/quantized/cpu/qlinear_prepack.cpp Outdated
Add support for FP32 bias. Re-quantize the bias at run time based on the input scale.
If the input scale stored in the packed struct changes, we requantize the bias with the updated input scale.

Test Plan:
    python test/test_quantized.py TestQNNPackOps

Differential Revision: [D17504253](https://our.internmc.facebook.com/intern/diff/D17504253)

[ghstack-poisoned]
zdevito pushed a commit to zdevito/ATen that referenced this pull request Sep 20, 2019
Summary:
Pull Request resolved: pytorch/pytorch#26307

Add support for FP32 bias. Re-quantize the bias at run time based on the input scale.
If the input scale stored in the packed struct changes, we requantize the bias with the updated input scale.

Test Plan: python test/test_quantized.py TestQNNPackOps

Differential Revision: D17504253

Pulled By: supriyar

fbshipit-source-id: 49fe36a0bee91aaeb085db28eec4ded8c684dcf4
@facebook-github-bot (Contributor):

@supriyar merged this pull request in 8c4b7a1.

mingbowan pushed a commit to mingbowan/pytorch that referenced this pull request Sep 23, 2019
Summary:
Pull Request resolved: pytorch#26307

Add support for FP32 bias. Re-quantize the bias at run time based on the input scale.
If the input scale stored in the packed struct changes, we requantize the bias with the updated input scale.

Test Plan: python test/test_quantized.py TestQNNPackOps

Differential Revision: D17504253

Pulled By: supriyar

fbshipit-source-id: 49fe36a0bee91aaeb085db28eec4ded8c684dcf4
@facebook-github-bot facebook-github-bot deleted the gh/supriyar/17/head branch October 28, 2019 22:20
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
Pull Request resolved: pytorch#26307

Add support for FP32 bias. Re-quantize the bias at run time based on the input scale.
If the input scale stored in the packed struct changes, we requantize the bias with the updated input scale.

Test Plan: python test/test_quantized.py TestQNNPackOps

Differential Revision: D17504253

Pulled By: supriyar

fbshipit-source-id: 49fe36a0bee91aaeb085db28eec4ded8c684dcf4

Labels

Merged, module: nn (Related to torch.nn), oncall: quantization (Quantization support in PyTorch)


6 participants