Allow onnxruntime quantization preprocessor for dynamic quantization#196
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
mfuntowicz
left a comment
Thanks for taking care of this, good one!
Got a test not passing; however, I think this has no link with this PR, will rerun: from optimum/tests/onnxruntime/test_onnxruntime.py, lines 129 to 136 in 50cf246. For me it is a little worrying that we use such a loose tolerance there.
@fxmarty 0.8 atol sounds very high
Rerunning, all checks are fine. Weird, not sure why there is non-deterministic behavior.
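For reference, the kind of output-closeness check under discussion can be sketched as below. The array names and values are illustrative (not taken from the actual test); `atol=0.8` mirrors the tolerance questioned above:

```python
import numpy as np

# Toy stand-ins for the outputs of the original and exported models;
# names and values are illustrative, not from optimum's test suite.
original_logits = np.array([0.10, -1.25, 2.40])
exported_logits = np.array([0.55, -0.90, 2.95])  # drifts by up to 0.55

# A loose tolerance like atol=0.8 accepts this drift...
assert np.allclose(original_logits, exported_logits, atol=0.8)
# ...while a stricter tolerance rejects it, which is why a very high atol
# can mask real regressions in the exported model.
assert not np.allclose(original_logits, exported_logits, atol=0.1)
print("checks behaved as expected")
```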
This comparison makes sense for exported models, but not for quantized models (which will very likely give outputs quite different from the original model's). It could be interesting to verify the degradation in accuracy instead, what do you think @mfuntowicz @fxmarty ? + verify that the desired quantization is correctly applied
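One way to phrase the suggestion above as a test: compare a task metric between the two models and bound the drop explicitly, rather than asserting raw output closeness. A minimal sketch with toy predictions (all names, values, and the 0.15 threshold are hypothetical):

```python
import numpy as np

# Toy stand-ins for eval-set labels and the predictions of the original
# and quantized models; values are illustrative only.
labels = np.array([0, 1, 1, 0, 1, 0, 1, 1])
original_preds = np.array([0, 1, 1, 0, 1, 0, 1, 0])
quantized_preds = np.array([0, 1, 0, 0, 1, 0, 1, 0])

original_acc = (original_preds == labels).mean()
quantized_acc = (quantized_preds == labels).mean()

# Tolerate a small, explicit accuracy drop instead of a loose atol on logits.
assert original_acc - quantized_acc <= 0.15
print(original_acc, quantized_acc)  # 0.875 0.75
```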
Oh right, it is comparing with a statically quantized model.
…uggingface#196)

* allow onnxruntime quantization preprocessor for dynamic quantization
* fix names
* trigger checks
* trigger checks

Co-authored-by: Felix Marty <felix@huggingface.co>
What does this PR do?
Currently, for the onnxruntime backend, the `QuantizationPreprocessor` is usable only for static quantization, to exclude nodes from quantization, because the ONNX model needs to already be saved when initializing `QuantizationPreprocessor`; this was handled by the `partial_fit` method used during calibration.

With this PR, it is possible to use `QuantizationPreprocessor` for dynamic quantization (if it happens to be relevant at some point -- at least I would like to test it), while making no change to the current workflow.

Before submitting

`QuantizationPreprocessor` is largely (publicly) untested and undocumented; in a future PR we could improve that.

This follows up #166 , I messed up with my fork.
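As a rough illustration of the node-exclusion idea behind the preprocessor (this is NOT optimum's actual `QuantizationPreprocessor` API; the class and function names below are hypothetical), each "pass" inspects the graph and reports node names that should be kept out of quantization:

```python
# Hypothetical, simplified sketch: each pass scans a (toy) graph and
# collects node names to exclude from quantization. The real optimum
# preprocessor works on an ONNX model; here the graph is just
# (node_name, op_type) pairs for illustration.

class ExcludeNodesByOp:
    """Illustrative pass: exclude every node of a given op type."""

    def __init__(self, op_type):
        self.op_type = op_type

    def find_nodes(self, graph):
        return {name for name, op in graph if op == self.op_type}


def collect_nodes_to_exclude(graph, passes):
    """Union the exclusions reported by all passes."""
    excluded = set()
    for p in passes:
        excluded |= p.find_nodes(graph)
    return sorted(excluded)


# Toy graph as (node_name, op_type) pairs.
graph = [
    ("MatMul_1", "MatMul"),
    ("Gelu_1", "Gelu"),
    ("MatMul_2", "MatMul"),
    ("LayerNorm_1", "LayerNormalization"),
]

passes = [ExcludeNodesByOp("Gelu"), ExcludeNodesByOp("LayerNormalization")]
print(collect_nodes_to_exclude(graph, passes))  # ['Gelu_1', 'LayerNorm_1']
```

The resulting name list would then be handed to the quantizer as nodes to skip; the point of this PR is that such a list can be built for dynamic quantization too, since it no longer requires the calibration step.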