Add per_tensor_quantize to int8 quantize#21372
Conversation
(force-pushed 96f6a62 to 955ec56)
(force-pushed a2f4c13 to 266522a)
@rogday I have removed some of the duplicated code, but not that much.
Thanks for your reply. My concern is that a function with a lot of parameters may make the code more complicated and harder to understand.
Please rebase to resolve the merge conflict:
(force-pushed 266522a to 22ead8d)
rogday
left a comment
Looks good. We have a few duplicated lines, but given the alternative, a function with lots of parameters, it's fine.
@zihaomu Friendly reminder.
@asmorkalov Thanks for the reminder. At the moment I'm all in on fast Conv; I'll update this PR early next week.
(force-pushed bc7a942 to f52a4cb)
Hi @rogday, the code has been updated.
modules/dnn/src/net_quantization.cpp (Outdated)

```cpp
Net Net::Impl::quantize(InputArrayOfArrays calibData, int inputsDtype, int outputsDtype)
{
    return quantize(calibData, inputsDtype, outputsDtype, false);
}
```
Do we need this method? (in internal class)
Thanks for the code review.
How about removing Net Net::Impl::quantize(InputArrayOfArrays calibData, int inputsDtype, int outputsDtype) and just keeping Net Net::Impl::quantize(InputArrayOfArrays calibData, int inputsDtype, int outputsDtype, bool perTensor = false)?
We could remove this if it is not used anymore.
```cpp
testDarknetModel(config_file, weights_file, ref.rowRange(0, N0), scoreDiff, iouDiff, confThreshold);

// per-tensor quantize
testDarknetModel(config_file, weights_file, ref.rowRange(0, N0), scoreDiff, 0.16, 0.7, 0.4, true);
```
BTW, it makes sense to create a dedicated test, or at least use SCOPED_TRACE(). (here and above)
Ok, I will try to update it.
Hi, SCOPED_TRACE() is now used in every test case of per-tensor quantization.
(force-pushed f52a4cb to ca2711f)
The tool complains about the public API change. I believe we could add this call to the tool's "skip" list, as the quantization API is experimental (and we don't need to maintain compatibility overloads here).
@zihaomu, thank you! What about the case when we already have a quantized model stored in ONNX format and load it - does it recognize the "per-tensor" case?
This PR only affects models quantized on the fly. As for a pre-quantized ONNX model, if the original model is
Hi @vpisarev @alalek and @rogday, I have changed the API flag from perTensor to perChannel.
modules/dnn/src/net.cpp (Outdated)

```diff
 // FIXIT drop from inference API
-Net Net::quantize(InputArrayOfArrays calibData, int inputsDtype, int outputsDtype)
+Net Net::quantize(InputArrayOfArrays calibData, int inputsDtype, int outputsDtype, bool perTensor = false)
```
bool perTensor = false
Default argument values work properly from .hpp files only; they are useless in .cpp files.
Add per_tensor_quantize to int8 quantize
* add per_tensor_quantize to dnn int8 module.
* change api flag from perTensor to perChannel, and recognize quantize type and onnx importer.
* change the default to hpp
Hi, the purpose of this PR is to add per-tensor quantization to the model quantization part of OpenCV dnn.
The difference between per-channel quantization and per-tensor quantization:
The existing quantization method in opencv/dnn is based on per-channel quantization, which can achieve better accuracy than per-tensor quantization. But on some hardware, especially NPU chips, per-tensor quantization is easier to optimize for speed.
For example, on the NPU of the TIM-VX backend:
ResNet50 int8 (per-channel) takes 525.298 ms on Khadas Vim3,
and ResNet50 int8 (per-tensor) takes 20.01 ms on Khadas Vim3.
Compared with per-channel quantization, per-tensor quantization has the disadvantage of lower accuracy. Some of the original unit tests in dnn/test/test_int8_layer.cpp cannot pass due to the lower accuracy, so I changed the thresholds in some unit test cases. Related PR.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.