Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available for 30 days after the last update.
|
Thanks @SunMarc! I've tested moving between GPU -> CPU -> GPU, but not yet on multiple GPUs. We'll still see a warning from accelerate.
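For concreteness, a minimal sketch of the round-trip described above (GPU -> CPU -> GPU). It assumes a CUDA device and bitsandbytes >= 0.43.2; the checkpoint name and the logits comparison are illustrative, not taken from the PR:

```python
# Hedged sketch of a GPU -> CPU -> GPU round trip for a 4-bit bnb model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "facebook/opt-125m"  # arbitrary small model, for illustration only
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=BitsAndBytesConfig(load_in_4bit=True)
)

inputs = tokenizer("Hello world", return_tensors="pt").to(model.device)
with torch.no_grad():
    logits_before = model(**inputs).logits.float().cpu()

model.to("cpu")     # previously raised a ValueError for 4-bit models
model.to("cuda:0")  # move back; the quantized weights should survive the round trip

inputs = {k: v.to("cuda:0") for k, v in inputs.items()}
with torch.no_grad():
    logits_after = model(**inputs).logits.float().cpu()

torch.testing.assert_close(logits_before, logits_after)
```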
Reference note: this should fix #24540 for 4-bit. For 8-bit there is still a blocker, bitsandbytes-foundation/bitsandbytes#1332; once that's fixed and released on the bitsandbytes side, we can do an additional PR.
Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
src/transformers/modeling_utils.py (outdated)

      if getattr(self, "quantization_method", None) == QuantizationMethod.BITS_AND_BYTES:
          if getattr(self, "is_loaded_in_4bit", False):
    -         if version.parse(importlib.metadata.version("bitsandbytes")) < version.parse("0.43.0"):
    +         if version.parse(importlib.metadata.version("bitsandbytes")) < version.parse("0.43.2"):
@SunMarc I've bumped this to 0.43.2 since that's when bitsandbytes-foundation/bitsandbytes#1279 landed.
Nice, thanks for updating the PR!
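For reference, the gating pattern from the diff above, shown standalone. This is a sketch: the helper name is hypothetical, and the real check lives inline in src/transformers/modeling_utils.py. 0.43.2 is the first bitsandbytes release containing the fix referenced above:

```python
# Sketch: gate 4-bit device movement on the installed bitsandbytes version.
import importlib.metadata

from packaging import version


def bnb_supports_device_movement() -> bool:  # hypothetical helper name
    try:
        installed = version.parse(importlib.metadata.version("bitsandbytes"))
    except importlib.metadata.PackageNotFoundError:
        return False
    return installed >= version.parse("0.43.2")
```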
LysandreJik left a comment:

Thanks for the PR! This looks good.
    raise ValueError(
        "Calling `cuda()` is not supported for `4-bit` quantized models. Please use the model as it is, since the"
        " model has already been set to the correct devices and casted to the correct `dtype`. "
        "However, if you still want to move the model, you need to install bitsandbytes >= 0.43.2 "
    )
The warning isn't super clear to me in terms of what the user should or should not do: should they install the new version, or should they just leave the model where it is? I'd try to clarify this a bit.
Good feedback, thanks! Updated. I think in most cases the user would be calling `.cuda()` without realizing the model is already on a GPU, so I put the current `model.device` in the message. That should help them work out whether they really meant to move it somewhere else and need to upgrade.
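A hedged sketch of the clarified check described above; the exact wording in the merged PR may differ, and the function name is hypothetical. The key change is surfacing `model.device` so users who call `.cuda()` on an already-placed model can see where it is before deciding to upgrade:

```python
# Sketch: raise a clearer error that includes the model's current device.
import importlib.metadata

from packaging import version


def _check_4bit_movement(model, min_bnb: str = "0.43.2") -> None:  # hypothetical
    if version.parse(importlib.metadata.version("bitsandbytes")) < version.parse(min_bnb):
        raise ValueError(
            "Calling `cuda()` is not supported for `4-bit` quantized models with the installed "
            f"version of bitsandbytes. The current device is `{model.device}`. If you intended "
            f"to move the model, please install bitsandbytes >= {min_bnb}."
        )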
[Cross-referenced commit log from the diffusers quantization work (quantization config, bitsandbytes utilities, NF4/int8 tests, quantization docs; includes "harmonize changes with huggingface/transformers#33122"). Co-authored-by: Vishnu V Jaddipal, Steven Liu, YiYi Xu.]
* remove `to` restriction for 4-bit model
* Update src/transformers/modeling_utils.py (Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>)
* bitsandbytes: prevent dtype casting while allowing device movement with `.to` or `.cuda` (illustrated below)
* quality fix
* Improve warning message for `.to()` and `.cuda()` on bnb quantized models

Co-authored-by: Matthew Douglas <38992547+matthewdouglas@users.noreply.github.com>
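A hedged illustration of the "prevent dtype casting while allowing device movement" commit above, using the same illustrative checkpoint as the earlier sketch (not one named in the PR):

```python
# Sketch: device movement is allowed, dtype casting of quantized weights is not.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",  # illustrative checkpoint
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

model.to("cpu")     # device movement: allowed with bitsandbytes >= 0.43.2
model.to("cuda:0")

try:
    model.to(torch.float16)  # dtype casting: still rejected for quantized weights
except ValueError as err:
    print(err)
```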

What does this PR do?
Since bnb 0.43.0, you can freely move bnb models across devices. This PR removes the restriction we had put in place.
Needs to be tested. cc @matthewdouglas
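For illustration, a minimal sketch of the behavior this unlocks (assumes a CUDA device and bitsandbytes >= 0.43.2; the checkpoint is an arbitrary small model, not one named in the PR):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",  # illustrative checkpoint
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

# Both calls previously raised a ValueError for 4-bit models; with this PR
# (and a recent enough bitsandbytes) the quantized weights move with the module.
model.to("cpu")
model.cuda()
```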