Conversation

@younesbelkada
Contributor

@younesbelkada younesbelkada commented Mar 15, 2023

What does this PR do?

Before this PR, it was not possible to save an 8-bit model or to load one from the Hub. This PR makes that possible. Once merged, users can upload 8-bit models to the Hub and/or load 8-bit models from the Hub, saving 2x memory compared to half-precision models.

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m", device_map="auto", load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")

text = "Hello my name is"
inputs = tokenizer(text, return_tensors="pt").to(0)
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0]))

>>> Hello my name is Nate, I am a professional photographer and I am a member of the

model.save_pretrained("./saved_int8")

model = AutoModelForCausalLM.from_pretrained("./saved_int8")

outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0]))
>>> Hello my name is Nate, I am a professional photographer and I am a member of the

Depends on bitsandbytes-foundation/bitsandbytes#159

Let's keep it as a draft until I address the last TODOs and open questions, and until bitsandbytes-foundation/bitsandbytes#159 gets merged.

TODOs and open questions:

  • ability to push BitsAndBytesConfig
  • Do we want to save the serialized model under the name pytorch_model.bin? I would say yes for simplicity, but we need to make sure that a user calls from_pretrained with load_in_8bit, hence add a warning if there is a quantization_config.json on the Hub repo and the user is not passing load_in_8bit=True (see the sketch after this list).
  • Force load_in_8bit=True if there is a quantization_config.json on the Hub repo?
  • Update docs
  • Update warnings
  • Safety checkers for bnb versions
  • Add a test to check if it works using sharded fp16 weights
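
A minimal sketch of the warning mentioned in the pytorch_model.bin item above; the helper name and the exact hook into from_pretrained are assumptions, and list_repo_files from huggingface_hub is used only for illustration:

import warnings

from huggingface_hub import list_repo_files


def maybe_warn_about_quantized_repo(repo_id: str, load_in_8bit: bool) -> None:
    # Illustrative only: warn if the repo ships a quantization config
    # but the user did not ask for 8-bit loading.
    repo_files = list_repo_files(repo_id)
    if "quantization_config.json" in repo_files and not load_in_8bit:
        warnings.warn(
            f"{repo_id} contains a quantization_config.json, which suggests the weights "
            "are serialized in 8-bit. Pass load_in_8bit=True to from_pretrained to load them correctly."
        )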

cc @sgugger I left a few open questions, would love to hear your thoughts on these!

@younesbelkada younesbelkada marked this pull request as draft March 15, 2023 08:46
@younesbelkada younesbelkada requested a review from sgugger March 15, 2023 08:49
@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Mar 15, 2023

The documentation is not available anymore as the PR was closed or merged.

@sgugger
Collaborator

sgugger commented Mar 15, 2023

The design is not easy enough to use. If a user saves a quantized model and pushes it to the Hub, it should work directly with from_pretrained. This is why I insisted that the quantization config should be saved inside the model config. This way you won't need to have the user pass load_in_8bit=True, as you can read it from the config.
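
A minimal sketch of the user-facing flow this suggestion aims for, assuming the quantization config ends up embedded in the model config so that from_pretrained can pick it up (the target repo name is hypothetical):

from transformers import AutoModelForCausalLM

# Quantize once and push; under this design the quantization config
# travels inside the model config rather than as a separate file or flag.
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-560m", device_map="auto", load_in_8bit=True
)
model.push_to_hub("my-user/bloom-560m-8bit")  # hypothetical repo name

# Later, a plain from_pretrained is enough: the 8-bit setting is read from the config.
model = AutoModelForCausalLM.from_pretrained("my-user/bloom-560m-8bit", device_map="auto")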

@younesbelkada
Contributor Author

Awesome, ok, I'll work on that. So if there is a quantization config on the repo, we should force device_map="auto" & load_in_8bit=True in that case.

Comment on lines 2132 to 2135
# TODO: uncomment this after the next release of bitsandbytes
# can_serialize_bnb = version.parse(
# importlib_metadata.version("bitsandbytes")
# ) >= version.parse("0.37.2")
Contributor Author

@younesbelkada younesbelkada Mar 29, 2023


This will be changed and I will remove the hardcoded value below on the next bitsandbytes release. Of course, I don't expect to merge this before the next bnb release.
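
For reference, a sketch of the gate once it is uncommented, assuming the serialization support lands in bitsandbytes 0.37.2 as discussed above:

import importlib.metadata as importlib_metadata

from packaging import version

# True only if the installed bitsandbytes version supports int8 serialization.
can_serialize_bnb = version.parse(
    importlib_metadata.version("bitsandbytes")
) >= version.parse("0.37.2")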

@younesbelkada younesbelkada requested review from sgugger and removed request for sgugger March 29, 2023 11:58
@younesbelkada younesbelkada marked this pull request as ready for review March 29, 2023 12:02
@younesbelkada
Contributor Author

The PR is ready for review @sgugger!
Of course, this PR is not mergeable before the next bnb release.

Collaborator

@sgugger sgugger left a comment


I think we didn't understand each other ;-)
The quantization config should be saved as part of the model config (the model weights cannot be used without it anyway; once the model is quantized, it's really part of the information necessary to load it). That will make all the download code irrelevant and this PR much easier.


### Load a quantized model from the 🤗 Hub

You can load a quantized model from the Hub by using the `from_pretrained` method. Make sure that the pushed weights are quantized by checking that the file `quantization_config.json` is present in the model repository.
Collaborator


No, it should be part of the model config; it doesn't need to be in its own file.

importlib_metadata.version("bitsandbytes")
) >= version.parse("0.37.0")

model.quantization_config = quantization_config
Collaborator


Everything would be much simpler if you set model.config.quantization_config = quantization_config.

@younesbelkada
Contributor Author

Thanks for the heads-up! :D
It should be much better now! As far as I'm concerned, the PR is ready for review.

@younesbelkada younesbelkada requested a review from sgugger March 31, 2023 09:18
Collaborator

@sgugger sgugger left a comment


Nice, thanks a lot for iterating!

Collaborator

@sgugger sgugger left a comment


LGTM with a couple last nits!

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
@sgugger sgugger merged commit 370f0ca into huggingface:main Apr 12, 2023
novice03 pushed a commit to novice03/transformers that referenced this pull request Jun 23, 2023
…#22177)

* make serialization of int8 models possible

* make fixup

* add docs

* add ability to push to hub and save pretrained

* fixes

* more addition

* more tests

* fix issues

* change variable

* clearer message

* adapt from suggestions

* few fixes

* remove unused function

* Update src/transformers/utils/quantization_config.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* address last comments

* last warning

* clarify doc

* protect import

* Update src/transformers/modeling_utils.py

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>