Add support for quantized ONNX networks #20188

@vpisarev

Description

It's possible to quantize ONNX networks to reduce the storage requirements and also accelerate the inference: https://www.onnxruntime.ai/docs/how-to/quantization.html
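The storage saving comes from per-tensor affine quantization: FP32 values are mapped to INT8 via a scale and zero point. A minimal numpy sketch of the symmetric variant (the function name and the symmetric-range choice here are illustrative, not the exact ONNX Runtime implementation):

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor quantization: q = clip(round(x / scale), -128, 127)."""
    scale = np.abs(x).max() / 127.0  # map the largest magnitude to 127
    zero_point = 0                   # symmetric quantization uses a zero offset
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

np.random.seed(0)
w = np.random.randn(16, 8).astype(np.float32)
q, scale, zp = quantize_int8(w)

# INT8 storage is a quarter of FP32 for the same element count
print(w.nbytes // q.nbytes)  # -> 4
```

The scale and zero point travel with the tensor in the ONNX graph, which is what the QLinear* operators consume at inference time.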

However, OpenCV 4.x (pre-5.0) cannot load such networks because it lacks support for the QLinearConv and QLinearMatMul layers they contain.

It would be nice to add support for these layers to OpenCV. By default, the weights can be converted to FP32 (or possibly FP16), but the original INT8 weights should be preserved as well, since we will be adding fixed-point paths to our implementations of convolution and fully-connected layers.
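The default FP32 path amounts to dequantizing the stored INT8 weights with the parameters carried by the QLinear* operators, which define the real value as (q - zero_point) * scale. A sketch of that conversion (the tensor and its quantization parameters below are made up for illustration):

```python
import numpy as np

def dequantize(q, scale, zero_point):
    # QLinear* semantics: real_value = (q - zero_point) * scale;
    # widen to int32 first so the subtraction cannot overflow int8
    return (q.astype(np.int32) - zero_point).astype(np.float32) * scale

# hypothetical INT8 weight tensor with its quantization parameters
q = np.array([[-128, 0, 127]], dtype=np.int8)
scale, zero_point = 0.05, 0

w_fp32 = dequantize(q, scale, zero_point)  # FP32 weights for the default path
```

Keeping the INT8 tensor alongside the dequantized copy would let the planned fixed-point convolution and fully-connected paths skip this step entirely.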

For testing, here is the original ONNX model:
https://drive.google.com/file/d/1JW6_zrgzjeSZQcKKEDhTvp3aseNu0pe9/view?usp=sharing
and its quantized variant:
https://drive.google.com/file/d/1RHkF8pGMfo0covNR0_GQhB11JvrzogFO/view?usp=sharing
(provided by @SamFC10)
