Quantization tool: Use nanmin, nanmax, nanmean in calibrator
#23749
Description
The calibrator uses `np.max`/`np.min` to get min/max values from the collected data. However, these functions return `nan` if any of the array values is `nan`, which subsequently leads to an invalid scale and a failure during quantization at onnxruntime/onnxruntime/python/tools/quantization/quant_utils.py (line 293 in 93689c5).

With GroupQueryAttention, the intermediate activations corresponding to padded tokens can become `nan`. We can safely ignore such values because they don't contribute to the final model output. Using `np.nanmax`/`np.nanmin` ensures that the calibrator can handle `nan` values. If all values are `nan`, numpy raises a `RuntimeWarning: All-NaN slice encountered` warning, which can help debug the eventual scale failure.

Output
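A minimal sketch of this kind of comparison, for readers who want to reproduce the behavior locally. The sample arrays and the helper name `min_max` are illustrative assumptions, not code from the calibrator; only `np.min`, `np.max`, `np.nanmin`, and `np.nanmax` are the real NumPy calls discussed here.

```python
import warnings

import numpy as np


def min_max(array, ignore_nan):
    """Hypothetical helper: return (min, max), optionally skipping NaN entries."""
    if ignore_nan:
        return float(np.nanmin(array)), float(np.nanmax(array))
    return float(np.min(array)), float(np.max(array))


# Illustrative inputs (assumed): no NaN, some NaN, all NaN.
samples = {
    "no_nan": np.array([1.0, 2.0, 3.0]),
    "some_nan": np.array([1.0, np.nan, 3.0]),
    "all_nan": np.array([np.nan, np.nan, np.nan]),
}

results = {}
for name, array in samples.items():
    # Plain reductions propagate NaN; silence any platform-specific warnings.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        plain = min_max(array, ignore_nan=False)

    # NaN-aware reductions skip NaN; record warnings to detect the all-NaN case.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        nan_aware = min_max(array, ignore_nan=True)
    warned = any(issubclass(w.category, RuntimeWarning) for w in caught)

    results[name] = (plain, nan_aware, warned)
    print(f"{name}: np.min/np.max={plain} np.nanmin/np.nanmax={nan_aware} warned={warned}")
```

Only the all-NaN input should trigger the `RuntimeWarning`; the partially-NaN input yields finite bounds from the NaN-aware calls.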
np.max/np.min: 3.0 1.0
np.nanmax/np.nanmin: 3.0 1.0
np.max/np.min: nan nan
np.nanmax/np.nanmin: 3.0 1.0
np.max/np.min: nan nan
np.nanmax/np.nanmin: nan nan
RuntimeWarning: All-NaN slice encountered
  print("np.nanmax/np.nanmin:", np.nanmax(array), np.nanmin(array))

Motivation and Context