
Allow onnxruntime quantization preprocessor for dynamic quantization#166

Closed
fxmarty wants to merge 1 commit into huggingface:main from fxmarty:quantization-preprocessor-dynamic

fxmarty commented on May 6, 2022

What does this PR do?

Currently, with the onnxruntime backend, QuantizationPreprocessor (which is used to exclude nodes from quantization) can only be used for static quantization. The ONNX model must already be saved when QuantizationPreprocessor is initialized, and saving the model is handled by the partial_fit method, which is only called during calibration.

With this PR, QuantizationPreprocessor can also be used for dynamic quantization (should that become relevant at some point; at the very least, I would like to test it), with no change to the current workflow.
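To make the ordering constraint concrete, here is a minimal pure-Python sketch. Everything in it (ToyQuantizer, exclude_layernorm, and the JSON file standing in for an exported ONNX model) is hypothetical and is not the optimum or onnxruntime API. It only assumes, as described above, that the preprocessor needs a model saved on disk, and it shows one possible way (exporting on demand) to satisfy that without a calibration step; it does not claim to reproduce what this PR actually changes.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Set

# Hypothetical toy graph: {"name": ..., "op_type": ...} dicts standing in for ONNX nodes.
Node = Dict[str, str]
ExclusionPass = Callable[[List[Node]], Set[str]]


def exclude_layernorm(nodes: List[Node]) -> Set[str]:
    """Hypothetical pass: keep LayerNormalization nodes in floating point."""
    return {n["name"] for n in nodes if n["op_type"] == "LayerNormalization"}


class ToyQuantizer:
    """Hypothetical quantizer illustrating the ordering constraint; not optimum's API."""

    def __init__(self, nodes: List[Node], workdir: Path) -> None:
        self.nodes = nodes
        self.model_path = workdir / "model.json"

    def export(self) -> Path:
        # Stand-in for ONNX export: write the graph to disk once, then reuse the file.
        if not self.model_path.exists():
            self.model_path.write_text(json.dumps(self.nodes))
        return self.model_path

    def partial_fit(self) -> None:
        # Static quantization: calibration saves the model as a side effect.
        self.export()

    def nodes_to_exclude(self, passes: List[ExclusionPass]) -> Set[str]:
        # Dynamic quantization never calls partial_fit, so the model is exported here
        # on demand; the static path is unaffected because export() is idempotent.
        path = self.export()
        nodes: List[Node] = json.loads(path.read_text())
        excluded: Set[str] = set()
        for exclusion_pass in passes:
            excluded |= exclusion_pass(nodes)
        return excluded


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        quantizer = ToyQuantizer(
            [{"name": "ln_0", "op_type": "LayerNormalization"},
             {"name": "matmul_0", "op_type": "MatMul"}],
            Path(tmp),
        )
        # No calibration step was run, yet the exclusion pass still works.
        print(sorted(quantizer.nodes_to_exclude([exclude_layernorm])))  # ['ln_0']
```

Making export() idempotent is what leaves the static workflow unchanged in this sketch: if partial_fit has already saved the model, the preprocessor simply reuses the existing file.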

Before submitting

  • QuantizationPreprocessor is largely (publicly) untested and undocumented; we could improve that in a future PR.


This pull request was closed.