🚨🚨🚨 [Quantization] Store the original dtype in the config as a private attribute 🚨🚨🚨 #26761
Conversation
ArthurZucker
left a comment
Let's make sure we prevent people from casting an already quantised model. WDYT? It should not be a recommended / desirable use case.
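A minimal sketch of the kind of guard being suggested here, assuming a hypothetical `is_quantized` flag rather than the exact transformers internals:

```python
import torch

class GuardedModel(torch.nn.Module):
    # Illustrative only: `is_quantized` stands in for however the library
    # marks a model as quantized after loading.
    is_quantized: bool = False

    def to(self, *args, **kwargs):
        if self.is_quantized:
            # Casting already-quantized weights to a new dtype would corrupt
            # them, so refuse instead of silently mangling the model.
            raise ValueError(
                "`.to` is not supported on quantized models; load the model "
                "with the desired `torch_dtype` instead."
            )
        return super().to(*args, **kwargs)
```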
src/transformers/modeling_utils.py
Outdated
# once the weights have been quantized
# Note that once you have loaded a quantized model, you can't change its dtype so this will
# remain a single source of truth
config._quantization_original_dtype = torch_dtype
Suggested change:
- config._quantization_original_dtype = torch_dtype
+ config._pre_quantization_dtype = torch_dtype
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
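For context, a runnable sketch of where such an attribute would be recorded during loading; everything except `_pre_quantization_dtype` (the helper name, the dummy config, the `quantization_method` argument) is hypothetical:

```python
import torch

class DummyConfig:
    """Minimal stand-in for `PretrainedConfig`."""
    pass

def record_pre_quantization_dtype(config, torch_dtype, quantization_method):
    # Once the weights are quantized, their tensors no longer carry the
    # original precision, so the config becomes the single source of truth.
    if quantization_method is not None:
        config._pre_quantization_dtype = torch_dtype

config = DummyConfig()
record_pre_quantization_dtype(config, torch.float16, quantization_method="bitsandbytes")
print(config._pre_quantization_dtype)  # torch.float16
```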
ArthurZucker
left a comment
In general, LGTM. Let's make sure we don't break the workflow for others, as this is a breaking change (not being able to cast to a dtype after init), and add a 🚨!
# pop the `_pre_quantization_dtype` as torch.dtypes are not serializable.
_ = serializable_config_dict.pop("_pre_quantization_dtype", None)
We pop it because it should not be saved, no?
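A minimal sketch of why the pop is needed before saving: `torch.dtype` values cannot go through `json.dumps`, so a config dict that still contains one would fail to serialize:

```python
import json
import torch

config_dict = {"model_type": "llama", "_pre_quantization_dtype": torch.float16}

# json.dumps(config_dict) would raise a TypeError here, because torch.float16
# is a torch.dtype and json has no encoder for it.
serializable_config_dict = dict(config_dict)
_ = serializable_config_dict.pop("_pre_quantization_dtype", None)
print(json.dumps(serializable_config_dict))  # {"model_type": "llama"}
```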
# Note that once you have loaded a quantized model, you can't change its dtype so this will
# remain a single source of truth
Might be needed in the quantizer config?
I think it is OK, since users can always load quantized models back with a new `torch_dtype`, making that `_pre_quantization_dtype` obsolete.
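A sketch of the reload path being described: a `torch_dtype` passed at load time takes precedence, so any previously recorded `_pre_quantization_dtype` is simply superseded (the checkpoint name is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM

# Reloading a quantized checkpoint with an explicit dtype: the new value wins,
# making whatever `_pre_quantization_dtype` was stored at the first load obsolete.
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",
    torch_dtype=torch.float16,
)
```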
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
🚨🚨🚨 [Quantization] Store the original dtype in the config as a private attribute 🚨🚨🚨 (huggingface#26761)

* First step
* fix
* add adjustments for gptq
* change to `_pre_quantization_dtype`
* Update src/transformers/modeling_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix serialization
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fixup

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
What does this PR do?
First step of an alternative design of #26560
For quantized models, instead of introducing complex logic for retrieving the original weights dtype, I propose to simply add a private attribute `_quantization_original_dtype` in the config object.

The `to` method does not need to be touched here, as `to` cannot be called on quantized models (for GPTQ models you can call `to` to perform device placement only, not dtype casting).

That way we could adapt #26560 to simply check whether the config has the attribute `_quantization_original_dtype`, which is the case only for quantized models, and otherwise retrieve the dtype from the linear layer weights in the classic manner.

cc @LysandreJik
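A sketch of how the dtype lookup from #26560 could then branch on this attribute; the helper is hypothetical and uses the final name `_pre_quantization_dtype` adopted during review:

```python
import torch

def infer_model_dtype(model):
    # Present only on quantized models: the dtype the weights had before
    # quantization, recorded at load time.
    if hasattr(model.config, "_pre_quantization_dtype"):
        return model.config._pre_quantization_dtype
    # Classic fallback: the dtype of the first floating-point parameter
    # (e.g. a linear layer weight).
    for param in model.parameters():
        if param.is_floating_point():
            return param.dtype
    return torch.get_default_dtype()
```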