Allow onnxruntime quantization preprocessor for dynamic quantization#196

Merged
echarlaix merged 4 commits into huggingface:main from fxmarty:quantization-preprocessor-dynamic
May 18, 2022

Conversation

@fxmarty (Contributor) commented May 17, 2022

What does this PR do?

Currently, for the onnxruntime backend, QuantizationPreprocessor is usable only for static quantization (to exclude nodes from quantization), because the ONNX model needs to be saved already when QuantizationPreprocessor is initialized; this was handled by the partial_fit method used during calibration.

With this PR, QuantizationPreprocessor can also be used for dynamic quantization (if that turns out to be relevant at some point -- at least I would like to test it), with no change to the current workflow.

Before submitting

  • QuantizationPreprocessor is largely untested and undocumented (publicly); we could improve that in a future PR.

This follows up #166; I messed up my fork.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@mfuntowicz (Member) left a comment

Thanks for taking care of this, good one!

@fxmarty (Contributor, author) commented May 18, 2022

One test is not passing, but I think it is unrelated to this PR; will rerun:

2022-05-18T10:07:30.8335969Z ======================================================================
2022-05-18T10:07:30.8354378Z ERROR: test_dynamic_quantization (test_onnxruntime.TestORTQuantizer) (model_name='facebook/bart-base')
2022-05-18T10:07:30.8354940Z ----------------------------------------------------------------------
2022-05-18T10:07:30.8355263Z Traceback (most recent call last):
2022-05-18T10:07:30.8500187Z   File "/home/runner/work/optimum/optimum/tests/onnxruntime/test_onnxruntime.py", line 129, in test_dynamic_quantization
2022-05-18T10:07:30.8502853Z     validate_model_outputs(
2022-05-18T10:07:30.8503854Z   File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/transformers/onnx/convert.py", line 415, in validate_model_outputs
2022-05-18T10:07:30.8506013Z     raise ValueError(
2022-05-18T10:07:30.8506559Z ValueError: Outputs values doesn't match between reference model and ONNX exported model: Got max absolute difference of: 0.8644396662712097

from

    validate_model_outputs(
        quantizer._onnx_config,
        quantizer.tokenizer,
        quantizer.model,
        q8_model_path,
        list(quantizer._onnx_config.outputs.keys()),
        atol=8e-1,
    )

For me it is a little worrying that we use atol=0.8; why do we allow it to be so large, @mfuntowicz @lewtun? In the transformers conversion, the default appears to be 1e-5 (see https://github.com/huggingface/transformers/blob/d6b8e9cec7301ba02f642588a6f12e78ec3b9798/tests/onnx/test_onnx_v2.py#L278-L285 and probably https://github.com/huggingface/transformers/blob/d6b8e9cec7301ba02f642588a6f12e78ec3b9798/src/transformers/onnx/config.py#L199-L207). Can the difference really be that large simply due to the backend?
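For illustration, the check behind validate_model_outputs boils down to a max-absolute-difference comparison against atol; the numbers below are made up:

```python
import numpy as np

# Made-up reference and ONNX outputs, for illustration only.
ref = np.array([0.12, -0.55, 1.03])
exported = np.array([0.11, -0.54, 1.02])

# validate_model_outputs reports the max absolute difference and raises
# when np.allclose(ref, exported, atol=atol) is not satisfied.
max_abs_diff = np.max(np.abs(ref - exported))
print(round(float(max_abs_diff), 6))          # 0.01
print(np.allclose(ref, exported, atol=1e-5))  # False: fails the strict transformers default
print(np.allclose(ref, exported, atol=8e-1))  # True: passes the loose tolerance used here
```

So atol=8e-1 accepts outputs that differ by up to roughly 0.8 element-wise, which is why the 0.864 difference above still failed.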

@mfuntowicz (Member)

@fxmarty an atol of 0.8 sounds very high

@fxmarty (Contributor, author) commented May 18, 2022

After rerunning, all checks pass. Weird; not sure why the behavior is non-deterministic.

@echarlaix (Collaborator) commented May 18, 2022

This comparison makes sense for exported models, but not for quantized models (which will very likely give outputs quite different from the original model's). It could be interesting to verify the degradation in accuracy instead, what do you think @mfuntowicz @fxmarty? And also to verify that the desired quantization is correctly applied.
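One way to do the verification suggested here is to compare task accuracy before and after quantization rather than raw logits; a toy sketch with made-up labels and predictions:

```python
import numpy as np

# Made-up labels and predictions standing in for the fp32 and int8 models.
labels = np.array([0, 1, 1, 0, 1, 0])
preds_fp32 = np.array([0, 1, 1, 0, 1, 1])  # hypothetical original-model predictions
preds_int8 = np.array([0, 1, 0, 0, 1, 1])  # hypothetical quantized-model predictions

acc_fp32 = float((preds_fp32 == labels).mean())
acc_int8 = float((preds_int8 == labels).mean())
degradation = acc_fp32 - acc_int8
print(acc_fp32, acc_int8, degradation)
```

A small accuracy drop is expected from int8 quantization; a large one would indicate the quantization (or the exclusion list from QuantizationPreprocessor) is misconfigured.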

@echarlaix echarlaix merged commit f5a7413 into huggingface:main May 18, 2022
@fxmarty (Contributor, author) commented May 19, 2022

Oh right, it is comparing against a statically quantized model.

fxmarty added a commit to fxmarty/optimum that referenced this pull request May 20, 2022
…uggingface#196)

* allow onnxruntime quantization preprocessor for dynamic quantization

* fix names

* trigger checks

* trigger checks

Co-authored-by: Felix Marty <felix@huggingface.co>

4 participants