Allow onnxruntime quantization preprocessor for dynamic quantization by fxmarty · Pull Request #166 · huggingface/optimum

fxmarty · 2022-05-06T12:00:27Z

What does this PR do?

Currently, for the onnxruntime backend, QuantizationPreprocessor can only be used with static quantization to exclude nodes from quantization. The reason is that the ONNX model must already be saved when QuantizationPreprocessor is initialized, and that export was handled by the partial_fit method used during calibration.

With this PR, QuantizationPreprocessor can also be used for dynamic quantization (in case it turns out to be relevant at some point; at the very least I would like to test it), with no change to the current workflow.
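To illustrate why node exclusion does not inherently depend on calibration, here is a minimal, self-contained sketch. It is not the code from this PR and does not use the optimum API: `Node`, `ToyPreprocessor`, `exclude_op_types`, and the example graph are all hypothetical stand-ins. It only assumes that a preprocessor inspects an already-available graph and returns the nodes to quantize and the nodes to exclude.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a node of an exported ONNX graph."""
    name: str
    op_type: str


# A pass inspects the graph and returns the names of nodes to leave unquantized.
ExclusionPass = Callable[[List[Node]], Set[str]]


def exclude_op_types(*op_types: str) -> ExclusionPass:
    """Build a pass that excludes every node whose op_type is listed."""
    blocked = set(op_types)

    def run(nodes: List[Node]) -> Set[str]:
        return {node.name for node in nodes if node.op_type in blocked}

    return run


class ToyPreprocessor:
    """Hypothetical, simplified analogue of a quantization preprocessor."""

    def __init__(self) -> None:
        self._passes: List[ExclusionPass] = []

    def register_pass(self, exclusion_pass: ExclusionPass) -> None:
        self._passes.append(exclusion_pass)

    def collect(self, nodes: List[Node]) -> Tuple[Set[str], Set[str]]:
        """Return (nodes_to_quantize, nodes_to_exclude) for the given graph."""
        excluded: Set[str] = set()
        for exclusion_pass in self._passes:
            excluded |= exclusion_pass(nodes)
        to_quantize = {node.name for node in nodes} - excluded
        return to_quantize, excluded


# Made-up graph standing in for a model that has already been exported.
graph = [
    Node("attn_matmul", "MatMul"),
    Node("ln_1", "LayerNormalization"),
    Node("ffn_gelu", "Gelu"),
    Node("ffn_matmul", "MatMul"),
]

preprocessor = ToyPreprocessor()
preprocessor.register_pass(exclude_op_types("LayerNormalization", "Gelu"))
nodes_to_quantize, nodes_to_exclude = preprocessor.collect(graph)

print(sorted(nodes_to_quantize))
print(sorted(nodes_to_exclude))
```

In this sketch, `collect` only needs the graph itself and never touches calibration data. The only prerequisite is that the model exists before the preprocessor runs, which is the constraint the description above refers to.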

Before submitting

QuantizationPreprocessor is largely untested and undocumented (publicly); a future PR could improve that.

HuggingFaceDocBuilderDev · 2022-05-06T12:10:58Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

allow onnxruntime quantization preprocessor for dynamic quantization

9fa5292

This was referenced May 17, 2022

Compare optimized models vs. transformers models #194

Merged

Allow onnxruntime quantization preprocessor for dynamic quantization #196

Merged

This pull request was closed.

fxmarty wants to merge 1 commit into huggingface:main from fxmarty:quantization-preprocessor-dynamic
