Add support for quantized ONNX networks #20188
Description
It's possible to quantize ONNX networks to reduce storage requirements and accelerate inference: https://www.onnxruntime.ai/docs/how-to/quantization.html
However, OpenCV 4.x/pre-5.0 is unable to load such networks because it lacks support for the QLinearConv and QLinearMatMul layers that they contain.
It would be nice to add support for these layers to OpenCV. By default, the weights can be converted to FP32 (or perhaps FP16), but the original INT8 weights should be preserved as well, since we will be adding fixed-point paths to our implementations of the convolution and fully-connected layers.
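To sketch what the FP32 conversion involves: ONNX linear quantization stores a tensor as INT8 values plus a scale and a zero point, so recovering the real-valued weights of a QLinearConv node amounts to computing `(q - zero_point) * scale`. The helper and sample values below are purely illustrative, not part of OpenCV's API:

```python
import numpy as np

def dequantize(q, scale, zero_point):
    # ONNX linear dequantization: real = (q - zero_point) * scale.
    # Widen to int32 first so the subtraction cannot overflow int8.
    return (q.astype(np.int32) - np.int32(zero_point)).astype(np.float32) * scale

# Illustrative INT8 weight tensor with a per-tensor scale/zero-point,
# as a QLinearConv initializer would store it (values made up for the example)
w_int8 = np.array([[-128, 0], [64, 127]], dtype=np.int8)
w_scale, w_zero_point = 0.02, 0

w_fp32 = dequantize(w_int8, w_scale, w_zero_point)  # FP32 weights for a float conv path
```

A loader following the approach proposed here would run this conversion once at import time for the default FP32 path, while keeping `w_int8`, `w_scale`, and `w_zero_point` around for the future fixed-point kernels.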
For testing, here is the original ONNX model:
https://drive.google.com/file/d/1JW6_zrgzjeSZQcKKEDhTvp3aseNu0pe9/view?usp=sharing
and its quantized variant:
https://drive.google.com/file/d/1RHkF8pGMfo0covNR0_GQhB11JvrzogFO/view?usp=sharing
(provided by @SamFC10)