Allow onnxruntime quantization preprocessor for dynamic quantization#196

Merged
echarlaix merged 4 commits into huggingface:main from fxmarty:quantization-preprocessor-dynamic
May 18, 2022

Conversation

@fxmarty (Contributor) commented May 17, 2022

What does this PR do?

Currently, for the onnxruntime backend, QuantizationPreprocessor is usable only for static quantization (to exclude nodes from quantization), because the ONNX model needs to be saved already when QuantizationPreprocessor is initialized; this was handled by the partial_fit method used during calibration.

With this PR, QuantizationPreprocessor can also be used for dynamic quantization (if that turns out to be relevant at some point -- at least I would like to test it), with no change to the current workflow.

Before submitting

  • QuantizationPreprocessor is largely untested and undocumented (publicly); we could improve that in a future PR.

This follows up #166; I messed up my fork.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@mfuntowicz (Member) left a comment

Thanks for taking care of this, good one!

@fxmarty (Contributor, author) commented May 18, 2022

One test is not passing, but I think it is unrelated to this PR; will rerun:

2022-05-18T10:07:30.8335969Z ======================================================================
2022-05-18T10:07:30.8354378Z ERROR: test_dynamic_quantization (test_onnxruntime.TestORTQuantizer) (model_name='facebook/bart-base')
2022-05-18T10:07:30.8354940Z ----------------------------------------------------------------------
2022-05-18T10:07:30.8355263Z Traceback (most recent call last):
2022-05-18T10:07:30.8500187Z   File "/home/runner/work/optimum/optimum/tests/onnxruntime/test_onnxruntime.py", line 129, in test_dynamic_quantization
2022-05-18T10:07:30.8502853Z     validate_model_outputs(
2022-05-18T10:07:30.8503854Z   File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/transformers/onnx/convert.py", line 415, in validate_model_outputs
2022-05-18T10:07:30.8506013Z     raise ValueError(
2022-05-18T10:07:30.8506559Z ValueError: Outputs values doesn't match between reference model and ONNX exported model: Got max absolute difference of: 0.8644396662712097

from

    validate_model_outputs(
        quantizer._onnx_config,
        quantizer.tokenizer,
        quantizer.model,
        q8_model_path,
        list(quantizer._onnx_config.outputs.keys()),
        atol=8e-1,
    )

For me it is a little worrying that we use atol=0.8; why do we allow it to be so large, @mfuntowicz @lewtun? In the transformers conversion, the default appears to be 1e-5 (see https://github.com/huggingface/transformers/blob/d6b8e9cec7301ba02f642588a6f12e78ec3b9798/tests/onnx/test_onnx_v2.py#L278-L285 and probably https://github.com/huggingface/transformers/blob/d6b8e9cec7301ba02f642588a6f12e78ec3b9798/src/transformers/onnx/config.py#L199-L207). Can the difference really be that large simply due to the backend?
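For illustration, the check behind validate_model_outputs boils down to a max-absolute-difference comparison against atol; the numbers below are made up:

```python
import numpy as np

# Made-up reference and ONNX outputs, for illustration only.
ref = np.array([0.12, -0.55, 1.03])
exported = np.array([0.11, -0.54, 1.02])

# validate_model_outputs reports the max absolute difference and raises
# when np.allclose(ref, exported, atol=atol) is not satisfied.
max_abs_diff = np.max(np.abs(ref - exported))
print(round(float(max_abs_diff), 6))          # 0.01
print(np.allclose(ref, exported, atol=1e-5))  # False: fails the strict transformers default
print(np.allclose(ref, exported, atol=8e-1))  # True: passes the loose tolerance used here
```

So atol=8e-1 accepts outputs that differ by up to roughly 0.8 element-wise, which is why the 0.864 difference above still failed.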

@mfuntowicz (Member)

@fxmarty an atol of 0.8 sounds very high

@fxmarty (Contributor, author) commented May 18, 2022

After rerunning, all checks pass. Weird; not sure why the behavior is non-deterministic.

@echarlaix (Collaborator) commented May 18, 2022

This comparison makes sense for exported models, but not for quantized models (which will very likely give outputs quite different from the original model's). It could be interesting to verify the degradation in accuracy instead, what do you think @mfuntowicz @fxmarty? And also to verify that the desired quantization is correctly applied.
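One way to do the verification suggested here is to compare task accuracy before and after quantization rather than raw logits; a toy sketch with made-up labels and predictions:

```python
import numpy as np

# Made-up labels and predictions standing in for the fp32 and int8 models.
labels = np.array([0, 1, 1, 0, 1, 0])
preds_fp32 = np.array([0, 1, 1, 0, 1, 1])  # hypothetical original-model predictions
preds_int8 = np.array([0, 1, 0, 0, 1, 1])  # hypothetical quantized-model predictions

acc_fp32 = float((preds_fp32 == labels).mean())
acc_int8 = float((preds_int8 == labels).mean())
degradation = acc_fp32 - acc_int8
print(acc_fp32, acc_int8, degradation)
```

A small accuracy drop is expected from int8 quantization; a large one would indicate the quantization (or the exclusion list from QuantizationPreprocessor) is misconfigured.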

@echarlaix echarlaix merged commit f5a7413 into huggingface:main May 18, 2022
@fxmarty (Contributor, author) commented May 19, 2022

Oh right, it is comparing against a statically quantized model.

fxmarty added a commit to fxmarty/optimum that referenced this pull request May 20, 2022
…uggingface#196)

* allow onnxruntime quantization preprocessor for dynamic quantization

* fix names

* trigger checks

* trigger checks

Co-authored-by: Felix Marty <felix@huggingface.co>

4 participants