
Add per_tensor_quantize to int8 quantize #21372

Merged
alalek merged 3 commits into opencv:4.x from zihaomu:dnn_quantize_per_tensor on Jul 5, 2022

Conversation

@zihaomu (Member) commented Dec 31, 2021

Hi, the purpose of this PR is to add per-tensor quantization to the model quantization part of OpenCV dnn.

The difference between per-channel and per-tensor quantization:

The existing quantization method in opencv/dnn is per-channel quantization, which can achieve better accuracy than per-tensor quantization. But on some hardware, especially NPU chips, per-tensor quantization is much easier to optimize for speed.

For example, on the NPU of the TIM-VX backend:
ResNet50 int8 (per-channel) takes 525.298 ms on Khadas Vim3,
and ResNet50 int8 (per-tensor) takes 20.01 ms on Khadas Vim3.

Compared with per-channel quantization, per-tensor quantization has the disadvantage of lower accuracy. Some of the original unit tests in dnn/test/test_int8_layers.cpp could not pass because of the lower accuracy, so I relaxed the thresholds in some unit test cases.
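To make the accuracy trade-off concrete, here is a small self-contained sketch (illustrative only, not OpenCV sources) of how a symmetric int8 scale could be chosen per-tensor versus per-channel; a channel with small weights gets a much finer scale under per-channel quantization, which is where the accuracy gap comes from:

```cpp
// Illustrative sketch (not OpenCV code): symmetric int8 scale selection
// for per-tensor vs per-channel quantization. The weight tensor is
// represented as a vector of output channels.
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Symmetric scale: map the largest |value| onto the int8 maximum, 127.
inline double scaleFor(const std::vector<double>& vals)
{
    double m = 0.0;
    for (double v : vals)
        m = std::max(m, std::fabs(v));
    return m > 0.0 ? m / 127.0 : 1.0;
}

// Per-tensor: one scale shared by the whole tensor.
inline double perTensorScale(const std::vector<std::vector<double>>& w)
{
    std::vector<double> flat;
    for (const std::vector<double>& ch : w)
        flat.insert(flat.end(), ch.begin(), ch.end());
    return scaleFor(flat);
}

// Per-channel: one scale per output channel (tighter fit per channel).
inline std::vector<double> perChannelScales(const std::vector<std::vector<double>>& w)
{
    std::vector<double> scales;
    for (const std::vector<double>& ch : w)
        scales.push_back(scaleFor(ch));
    return scales;
}
```

With weights {{0.1, -0.2}, {5.0, -4.0}}, the per-tensor scheme forces the small-valued first channel to use the coarse scale 5.0/127 dictated by the second channel, while per-channel gives it the much finer scale 0.2/127.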

Related PR.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • The feature is well documented and sample code can be built with the project CMake
  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
  • The PR is proposed to proper branch
  • There is reference to original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.

@vpisarev (Contributor)

👍 @zihaomu, please resolve the merge conflicts and this PR can be merged.

@zihaomu zihaomu force-pushed the dnn_quantize_per_tensor branch from 96f6a62 to 955ec56 on March 28, 2022
@zihaomu zihaomu requested review from alalek and rogday March 30, 2022 02:18
@rogday (Member) left a comment

Thank you!

@zihaomu zihaomu force-pushed the dnn_quantize_per_tensor branch from a2f4c13 to 266522a on March 31, 2022
@zihaomu (Member, Author) commented Mar 31, 2022

@rogday I have removed some of the duplicated code, but not that much.

@rogday (Member) commented Mar 31, 2022

> I have removed some of the duplicated code, but not that much.

I was thinking more along these lines. Take a look, feel free to disagree. It has the major disadvantage of taking a lot of parameters, but no code duplication.

@zihaomu (Member, Author) commented Apr 1, 2022

> I was thinking more along these lines. Take a look, feel free to disagree. It has the major disadvantage of taking a lot of parameters, but no code duplication.

Thanks for your reply. My concern is that a function with a lot of parameters may make the code more complicated and harder to understand.

@alalek (Member) commented Apr 1, 2022

Please rebase to resolve merge conflict: modules/dnn/test/test_int8_layers.cpp

@zihaomu zihaomu force-pushed the dnn_quantize_per_tensor branch from 266522a to 22ead8d on April 1, 2022
@asmorkalov (Contributor)

@zihaomu @rogday Friendly reminder.

@rogday (Member) left a comment

Looks good. We have a few duplicated lines, but given the alternative of a function with lots of parameters, it's fine.

@asmorkalov (Contributor)

@zihaomu Friendly reminder.

1 similar comment from @asmorkalov.

@zihaomu (Member, Author) commented Jun 10, 2022

@asmorkalov Thanks for the reminder; at the moment I am fully focused on the fast Conv work. I'll update this PR early next week.

@zihaomu zihaomu force-pushed the dnn_quantize_per_tensor branch from bc7a942 to f52a4cb on June 13, 2022
@zihaomu zihaomu requested a review from rogday June 13, 2022 02:46
@zihaomu (Member, Author) commented Jun 13, 2022

Hi @rogday, the code has been updated.

@rogday (Member) left a comment

LGTM! 👍

Comment on lines +35 to +38
Net Net::Impl::quantize(InputArrayOfArrays calibData, int inputsDtype, int outputsDtype)
{
return quantize(calibData, inputsDtype, outputsDtype, false);
}
Member

Do we need this method? (in internal class)

Member Author

Thanks for the code review.
How about removing Net Net::Impl::quantize(InputArrayOfArrays calibData, int inputsDtype, int outputsDtype) and just keeping Net Net::Impl::quantize(InputArrayOfArrays calibData, int inputsDtype, int outputsDtype, bool perTensor = false)?

Member

We could remove this if it is not used anymore.

Member Author

Fixed.

testDarknetModel(config_file, weights_file, ref.rowRange(0, N0), scoreDiff, iouDiff, confThreshold);

// per-tensor quantize
testDarknetModel(config_file, weights_file, ref.rowRange(0, N0), scoreDiff, 0.16, 0.7, 0.4, true);
Member

BTW, it makes sense to create a dedicated test, or at least use SCOPED_TRACE(); (here and above).

Member Author

Ok, I will try to update it.

Member Author

Hi, SCOPED_TRACE() is now used in every per-tensor quantization test case.

@zihaomu zihaomu force-pushed the dnn_quantize_per_tensor branch from f52a4cb to ca2711f on June 23, 2022
@zihaomu (Member, Author) commented Jun 23, 2022

@alalek There is an API check error because we removed the original Net Net::quantize(InputArrayOfArrays calibData, int inputsDtype, int outputsDtype). CI Report.

@alalek (Member) commented Jun 23, 2022

> because we removed the original Net Net::Impl::quantize(InputArrayOfArrays calibData, int inputsDtype, int outputsDtype)

Net::Impl is internal API; it is not checked by the API compatibility tool.

The tool complains about the public Net::quantize(cv::InputArrayOfArrays calibData, int inputsDtype, int outputsDtype) (we still need a thin wrapper for it over the extended overload).

I believe we could add this call to the "skip" list of this tool, as the quantization API is experimental (and we don't need to maintain compatibility overloads here). I will do this later if there are no objections.
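The compatibility pattern described here, keeping the old public signature as a thin wrapper that forwards to the extended overload, can be sketched generically (hypothetical Net type with stand-in logic, not the actual cv::dnn::Net):

```cpp
// Generic sketch of the "thin wrapper" compatibility pattern discussed
// above (hypothetical Net type, not the actual cv::dnn::Net).
#include <cassert>

struct Net
{
    // Extended overload carrying the new parameter.
    int quantize(int dtype, bool perChannel)
    {
        return perChannel ? dtype * 2 : dtype;  // stand-in for real work
    }

    // Old public signature kept so the exported symbol (and existing
    // callers) survive; it just forwards with the old default behavior.
    int quantize(int dtype)
    {
        return quantize(dtype, /*perChannel=*/true);
    }
};
```

Existing callers of the two-effective-argument form keep compiling and linking unchanged, while new code can opt into the extra flag.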

@vpisarev (Contributor)

@zihaomu, thank you! What about the case when we already have a quantized model stored in ONNX format and load it: does it recognize the "per-tensor" case?

@zihaomu (Member, Author) commented Jun 24, 2022

> @zihaomu, thank you! What about the case when we already have a quantized model stored in ONNX format and load it: does it recognize the "per-tensor" case?

This PR only affects models quantized on the fly. As for pre-quantized ONNX models: if the original model is per-tensor, it will run in the per-tensor way, and if it is per-channel, it will run in the per-channel way.

@zihaomu (Member, Author) commented Jun 26, 2022

Hi @vpisarev @alalek and @rogday, I have changed the API flag from perTensor to perChannel, because ONNX uses the same quantize flag in quantize_static(..., per_channel, ...).
In addition, the onnx_importer can now recognize the quantization type at the QConv and QMatMul layers.


// FIXIT drop from inference API
Net Net::quantize(InputArrayOfArrays calibData, int inputsDtype, int outputsDtype)
Net Net::quantize(InputArrayOfArrays calibData, int inputsDtype, int outputsDtype, bool perTensor = false)
Member

> bool perTensor = false

Default values work properly from .hpp files only. They are useless in .cpp files.
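The point about defaults can be shown in a few lines (hypothetical names, stand-in logic): the default argument has to sit on the declaration callers see, which normally lives in the .hpp; a default written only on the out-of-line .cpp definition is invisible at the call site.

```cpp
// Sketch of the review point (hypothetical names): the default argument
// belongs on the declaration (as in a .hpp); the out-of-line definition
// (as in a .cpp) must not repeat it.
#include <cassert>

struct Net
{
    int quantize(int dtype, bool perChannel = true);  // default lives here
};

// Definition: no default repeated here.
int Net::quantize(int dtype, bool perChannel)
{
    return perChannel ? dtype : -dtype;  // stand-in for real work
}
```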

Member Author

Thanks, fixed.

@zihaomu zihaomu requested a review from alalek June 29, 2022 09:06
@alalek alalek merged commit a80fcac into opencv:4.x Jul 5, 2022
@alalek alalek mentioned this pull request Aug 21, 2022
a-sajjad72 pushed a commit to a-sajjad72/opencv that referenced this pull request Mar 30, 2023
Add per_tensor_quantize to int8 quantize

* add per_tensor_quantize to dnn int8 module.

* change api flag from perTensor to perChannel, and recognize quantize type and onnx importer.

* change the default to hpp


5 participants