[bnb] Let's make serialization of int8 models possible
#22177
Conversation
The documentation is not available anymore as the PR was closed or merged.
The design is not easy enough to use. If a user saves a quantized model and pushes to the Hub, it should work directly with `from_pretrained`.

Awesome, ok, I'll work on that. So if there is a quantized config on the repo, we should force-use `load_in_8bit`?
src/transformers/modeling_utils.py
Outdated
```python
# TODO: uncomment this after the next release of bitsandbytes
# can_serialize_bnb = version.parse(
#     importlib_metadata.version("bitsandbytes")
# ) >= version.parse("0.37.2")
```
This will be changed and I will remove the hardcoded value below on the next bitsandbytes release; of course, I don't expect to merge this before the next bnb release.
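For reference, a minimal sketch of what the uncommented check would look like (the `0.37.2` floor mirrors the commented-out code above and may change with the actual release):

```python
import importlib.metadata as importlib_metadata

from packaging import version

# Gate int8 serialization on a minimum bitsandbytes version;
# the exact floor is an assumption pending the next bnb release.
can_serialize_bnb = version.parse(
    importlib_metadata.version("bitsandbytes")
) >= version.parse("0.37.2")
```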
The PR is ready for review @sgugger!
sgugger left a comment
I think we didn't understand each other ;-)
The quantization config should be saved as part of the model config (the model weights cannot be used without it anyway; once the model is quantized, it's really part of the information necessary to load it). That will make all the download code irrelevant and this PR much easier.
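A minimal sketch of the suggested direction, assuming an example base checkpoint and illustrative quantization keys (not necessarily the exact schema the PR settles on):

```python
from transformers import AutoConfig

# Example base checkpoint; the quantization keys below are illustrative.
config = AutoConfig.from_pretrained("bigscience/bloom-560m")
config.quantization_config = {"load_in_8bit": True, "llm_int8_threshold": 6.0}

# Because the quantization settings now live on the model config, they are
# serialized into config.json by save_pretrained and travel with the checkpoint,
# so no separate quantization_config.json (or extra download logic) is needed.
config.save_pretrained("bloom-560m-8bit")
```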
### Load a quantized model from the 🤗 Hub
You can load a quantized model from the Hub by using the `from_pretrained` method. Make sure that the pushed weights are quantized by checking that the file `quantization_config.json` is present in the model repository.
No, it should be part of the model config; it doesn't need to be in its own file.
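Regardless of where the config ends up being stored, loading such a checkpoint would look roughly like this (the repo name is hypothetical, and depending on the final design `load_in_8bit=True` may still need to be passed explicitly):

```python
from transformers import AutoModelForCausalLM

# Hypothetical Hub repo containing int8 weights pushed with this feature.
model = AutoModelForCausalLM.from_pretrained(
    "username/bloom-560m-8bit", device_map="auto"
)
```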
src/transformers/modeling_utils.py
Outdated
```python
    importlib_metadata.version("bitsandbytes")
) >= version.parse("0.37.0")
```
`model.quantization_config = quantization_config`
Everything would be much simpler if you set `model.config.quantization_config = quantization_config`.
Thanks for the heads up! :D
sgugger left a comment
Nice, thanks a lot for iterating!
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
sgugger left a comment
LGTM with a couple last nits!
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
[bnb] Let's make serialization of int8 models possible (#22177)

* make serialization of int8 models possible
* make fixup
* add docs
* add ability to push to hub and save pretrained
* fixes
* more addition
* more tests
* fix issues
* change variable
* clearer message
* adapt from suggestions
* few fixes
* remove unused function
* Update src/transformers/utils/quantization_config.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* address last comments
* last warning
* clarify doc
* protect import
* Update src/transformers/modeling_utils.py
* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
What does this PR do?
Before this PR, it was not possible to save an 8-bit model or load an 8-bit model from the Hub. This PR makes that possible. If this PR gets merged, users can upload 8-bit models to the Hub and/or load 8-bit models from the Hub, hence saving 2x memory compared to half-precision models.
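A rough sketch of the workflow this enables (the checkpoint and repo names are examples; 8-bit loading requires `bitsandbytes` and `accelerate`):

```python
from transformers import AutoModelForCausalLM

# Load a model in 8-bit with bitsandbytes.
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m", load_in_8bit=True, device_map="auto"
)

# With this PR, the int8 weights can be saved locally and pushed to the Hub.
model.save_pretrained("bloom-560m-8bit")
model.push_to_hub("bloom-560m-8bit")
```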
Depends on bitsandbytes-foundation/bitsandbytes#159
Let's keep it as a draft until I address the last TODOs and open questions, and until bitsandbytes-foundation/bitsandbytes#159 gets merged.
TODOs and open questions:
- Save the `BitsAndBytesConfig` together with `pytorch_model.bin`? I would say yes for simplicity reasons, but we need to make sure that a user calls `from_pretrained` with `load_in_8bit`, hence add a warning if there is a `quantization_config.json` on the Hub repo and the user is not passing `load_in_8bit=True` (see the sketch after this list).
- Should we force `load_in_8bit=True` if there is a `quantization_config.json` on the Hub repo?
- How to handle `bnb` versions?

cc @sgugger I left a few open questions, would love to hear your thoughts on these!
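A minimal sketch of the warning idea mentioned in the first item above (the helper name and message are hypothetical, not the PR's actual code):

```python
import logging

logger = logging.getLogger(__name__)

def maybe_warn_about_quantized_checkpoint(
    repo_has_quantization_config: bool, load_in_8bit: bool
) -> None:
    # If the Hub repo ships quantization metadata but the caller did not ask
    # for 8-bit loading, warn that the checkpoint may not load as intended.
    if repo_has_quantization_config and not load_in_8bit:
        logger.warning(
            "This checkpoint appears to contain a quantization config, but "
            "`load_in_8bit=True` was not passed to `from_pretrained`."
        )
```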